Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science

Improving Suffix Tree Clustering Algorithm for Web Documents

Authors
Yan Zhuang, Youguang Chen
Corresponding Author
Yan Zhuang
Available Online July 2015.
DOI
10.2991/lemcs-15.2015.310
Keywords
Web document clustering; Suffix tree; Suffix tree clustering; Vector space model; Pearson correlation coefficient
Abstract

Clustering Web documents can help users quickly locate the information they need among the results returned by a search engine. Based on the characteristics of the suffix tree structure and the shortcomings of the similarity calculation used for cluster merging in the suffix tree clustering (STC) algorithm, this paper proposes an improved suffix tree clustering method. The method combines the vector space model with the Pearson correlation coefficient: it computes the relevance of each cluster from the document vectors of all clusters, then uses these cluster relevance vectors and the correlations between them to calculate the similarity used for cluster merging, thereby improving the document clustering process. Analysis of the experimental results shows that the method outperforms the original STC algorithm on Web document clustering.
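The abstract does not give the exact similarity formula, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that documents are term-frequency vectors (vector space model), that each base cluster from the suffix tree is represented by the centroid of its documents' vectors, and that two base clusters are linked when the Pearson correlation of their centroids exceeds a hypothetical threshold of 0.5. Linked base clusters are then merged as connected components, as in the original STC. For contrast, `stc_overlap_similar` shows the original STC rule, which links two base clusters when their shared documents make up more than half of each. All function names and the threshold are hypothetical.

```python
import math
from itertools import combinations


def tf_vectors(docs):
    """Vector space model (assumed): raw term-frequency vectors over a sorted vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    vectors = []
    for d in docs:
        words = d.split()
        vectors.append([words.count(t) for t in vocab])
    return vocab, vectors


def pearson(x, y):
    """Pearson correlation of two equal-length vectors; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def centroid(doc_ids, vectors):
    """Hypothetical cluster vector: mean of the member documents' vectors."""
    dim = len(vectors[0])
    return [sum(vectors[i][k] for i in doc_ids) / len(doc_ids) for k in range(dim)]


def stc_overlap_similar(a, b, alpha=0.5):
    """Original STC rule: shared documents exceed alpha of BOTH base clusters."""
    inter = len(a & b)
    return inter / len(a) > alpha and inter / len(b) > alpha


def merge_clusters(base_clusters, vectors, threshold=0.5):
    """Link base clusters whose centroids correlate above threshold (assumed rule),
    then merge each connected component into one final cluster."""
    parent = list(range(len(base_clusters)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    cents = [centroid(c, vectors) for c in base_clusters]
    for i, j in combinations(range(len(base_clusters)), 2):
        if pearson(cents[i], cents[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, c in enumerate(base_clusters):
        groups.setdefault(find(i), set()).update(c)
    return sorted(groups.values(), key=lambda s: sorted(s))


if __name__ == "__main__":
    # Toy document vectors and base clusters (document-id sets), for illustration only
    vectors = [[1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 1, 1, 0]]
    print(merge_clusters([{0, 1}, {1}, {2, 3}, {3}], vectors))  # [{0, 1}, {2, 3}]
```

The design choice to keep STC's connected-component merging and swap only the link test isolates the change the abstract describes: the similarity used for merging, rather than the suffix tree construction itself.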

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
Series
Advances in Intelligent Systems Research
Publication Date
July 2015
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Yan Zhuang
AU  - Youguang Chen
PY  - 2015/07
DA  - 2015/07
TI  - Improving Suffix Tree Clustering Algorithm for Web Documents
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 1557
EP  - 1561
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.310
DO  - 10.2991/lemcs-15.2015.310
ID  - Zhuang2015/07
ER  -