Mining sequential patterns

Year of publication

2001

Authors

Ahola, Jussi

Abstract

Discovering associations is one of the fundamental tasks of data mining. Its aim is to automatically seek for dependencies from vast amounts of data. The task results in socalled association rules, which are of form: If A occurs in the data then B occurs also. Only those rules that occur in the data frequently enough are generated. However, various information sources generate data with an inherent sequential nature, i.e., it is composed of discrete events which have a temporal/spatial ordering. This kind of data can be obtained from, e.g., telecommunications networks, electronic commerce, www-servers of Internet, and various scientific sources, like gene databases. The sequential nature of the data is totally ignored in the generation of the association rules. Thus, a part of the useful information included in the data is discarded. Thus, since the mid 90's the interest in discovering also the sequential associations in the data has arisen among the data mining community. The sequential associations or sequential patterns can be presented in the form: when A occurs, B occurs within some certain time. So, the difference to traditional association rules is that here the time information is included both in the rule itself and also in the mining process in the form of timing constraints. Nowadays there exist several highly efficient methods for mining these kind of patterns. The problem with them is that they assume the input data to be sequences of discrete events including only the information of the ordering, usually the time. Often, however, the events are associated with some additional attributes. The existing methods cannot take this multi-dimensionality of the data into account and so they lose the additional information it involves. Furthermore, the methods are designed for some specific problem, and are not, as such, applicable to different types of sequential data. In this report, a general formulation of the sequential patterns is introduced as it is presented in [1]. By using this approach the last problem of the existing algorithms can be tackled. A survey of the existing algorithm is then done. Three algorithms are presented in detail: WINEPI [2] and GSP [4] as they form the basis of the algorithms, and cSPADE [6] since it seems to be the most promising method proposed for the problem yet. Also the other relevant approaches are shortly introduced. Lastly, the extension of the patterns into the multi-dimensional is considered. Some ideas of handling the problem are given and also the features of the existing algorithms supporting multi-dimensionality are studied.

Organizations and authors

VTT Technical Research Centre of Finland Ltd

Ahola Jussi

Publication type

Publication format

Monograph

Audience

Professional

MINEDU's publication type classification code

D4 Published development or research report or study

Publication channel information

Journal/Series

VTT Research Report

Publisher

VTT Technical Research Centre of Finland

Issue

TTE1-2001-10

Open access

Open access in the publisher’s service

Yes

License of the publisher’s version

Other license

Self-archived

Other information

Fields of science

Computer and information sciences

Keywords

[object Object]

Language

English

Mining sequential patterns

Year of publication

Authors

Abstract

Organizations and authors

Publication type

Publication format

Audience

MINEDU's publication type classification code

Publication channel information

Journal/Series

Publisher

Issue

Open access

Open access in the publisher’s service

License of the publisher’s version

Self-archived

Other information

Fields of science

Keywords

Language

International co-publication

Co-publication with a company

The publication is included in the Ministry of Education and Culture’s Publication data collection

undefined

Mining sequential patterns

Year of publication

Authors

Abstract

Organizations and authors

Publication type

Publication format

Audience

MINEDU's publication type classification code

Publication channel information

Journal/Series

Publisher

Issue

Open access

Open access in the publisher’s service

License of the publisher’s version

Self-archived

Other information

Fields of science

Keywords

Language

International co-publication

Co-publication with a company

The publication is included in the Ministry of Education and Culture’s Publication data collection