Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)

A Schema Feature Based Frequent Pattern Mining Algorithm for Semi-structured Data Stream

Authors
Weiqi Fu, Husheng Liao, Xueyun Jin
Corresponding Author
Weiqi Fu
Available Online April 2017.
DOI
https://doi.org/10.2991/fmsmt-17.2017.260
Keywords
frequent pattern mining, semi-structured data stream, schema feature.
Abstract
Data mining extracts useful information from massive data sets, and frequent pattern mining is one of its important tasks. Research on frequent pattern mining for semi-structured data has made progress in recent years, and mining over data streams has also received considerable attention. However, only a few studies address semi-structured data and data streams together. This paper proposes an algorithm named SPrefixTreeISpan. We first segment the semi-structured data stream and then use the pattern-growth method to mine each segment. Finally, we maintain all the results in a structure called patternTree. The mining algorithm is further optimized using the inevitable parent-child and inevitable child-parent relationships extracted from the XML schema. Experiments show that SPrefixTreeISpan has better performance.
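The segment-then-mine-then-merge flow described in the abstract can be illustrated with a deliberately simplified sketch. It is not SPrefixTreeISpan: instead of pattern-growth over subtrees it counts only parent-child tag pairs, a plain counter stands in for patternTree, and the schema-based optimization is omitted. All function names, the segment size, and the support threshold are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import islice


def segments(stream, size):
    """Yield consecutive lists of at most `size` items from an iterable."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def edge_patterns(xml_text):
    """Return the set of (parent_tag, child_tag) pairs found in one document."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def mine_segment(batch):
    """Count, per pattern, how many documents in the segment contain it."""
    counts = Counter()
    for doc in batch:
        counts.update(edge_patterns(doc))
    return counts


def mine_stream(stream, segment_size, min_support):
    """Mine each segment in turn and merge the counts into one summary."""
    summary = Counter()  # hypothetical stand-in for the paper's patternTree
    seen = 0
    for batch in segments(stream, segment_size):
        summary.update(mine_segment(batch))
        seen += len(batch)
    threshold = min_support * seen
    return {p: c for p, c in summary.items() if c >= threshold}


# Toy stream of four small XML documents (made-up data).
docs = [
    "<book><title/><author/></book>",
    "<book><title/></book>",
    "<book><title/><price/></book>",
    "<journal><title/></journal>",
]
frequent = mine_stream(docs, segment_size=2, min_support=0.5)
print(frequent)
```

On this toy input only the pair (book, title) appears in at least half of the documents, so it is the only pattern reported.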
Open Access
This is an open access article distributed under the CC BY-NC license.

Cite this article

TY  - CONF
AU  - Weiqi Fu
AU  - Husheng Liao
AU  - Xueyun Jin
PY  - 2017/04
DA  - 2017/04
TI  - A Schema Feature Based Frequent Pattern Mining Algorithm for Semi-structured Data Stream
BT  - Proceedings of the 2017 5th International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT 2017)
PB  - Atlantis Press
SP  - 1329
EP  - 1336
SN  - 2352-5401
UR  - https://doi.org/10.2991/fmsmt-17.2017.260
DO  - https://doi.org/10.2991/fmsmt-17.2017.260
ID  - Fu2017/04
ER  -