title:
 
Progressive CFM-Miner: An Algorithm to Mine CFM – Sequential Patterns from a Progressive Database
publication:
 
IJCIS
volume-issue:   6 - 2
pages:   209 - 222
ISSN:
  1875-6883
DOI:
  doi:10.2991/10.1080/18756891.2013.768432
author(s):
 
Bhawna Mallick, Deepak Garg, P. S. Grover
publication date:
 
April 2013
keywords:
 
Sequential pattern mining, CFM-PrefixSpan, Progressive database, updated CFM-tree, progressive CFM patterns, algorithms
abstract:
 
Sequential pattern mining is a vital data mining task that discovers frequently occurring patterns in sequence databases. As databases grow, maintaining sequential patterns over a long period of time becomes essential, since a large number of new records may be added to a database. To reflect the current state of the database, in which previous sequential patterns may become irrelevant and new ones may appear, efficient algorithms are needed to update, maintain and manage the discovered information. Several efficient algorithms for maintaining sequential patterns have been developed. Here, we present an efficient algorithm to handle the maintenance problem of CFM-sequential patterns (Compact, Frequent, Monetary-constraint-based sequential patterns). To efficiently capture the dynamic nature of data addition and deletion in the mining problem, we first construct the updated CFM-tree using the CFM patterns obtained from the static database. Then, the database is updated from distributed sources whose data may be static, inserted, or deleted. Whenever the database is updated from the multiple sources, the CFM-tree is also updated by including the updated sequences. The updated CFM-tree is then used to mine the progressive CFM-patterns with the proposed tree pattern mining algorithm. Finally, experiments are carried out on synthetic and real-life distributed databases given to the Progressive CFM-Miner. The experimental results and analysis show better results than the existing IncSpan algorithm in terms of the number of generated sequential patterns, execution time, and memory usage.
copyright:
 
© The authors.
This article is distributed under the terms of the Creative Commons Attribution License 4.0, which permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly cited. For details, see: https://creativecommons.org/licenses/by-nc/4.0/
full text: