Estimation of Structural Similarity of XML Document Based on Frequency and Path

Xueli Ren; Yubiao Dai

doi:10.2991/emcs-16.2016.66

Authors

Xueli Ren, Yubiao Dai

Corresponding Author

Xueli Ren

Available Online January 2016.

DOI: 10.2991/emcs-16.2016.66
Keywords: XML; Structural similarity; Frequency; Semantic; Tuple
Abstract: With the continued growth of the Internet and of resources on the Web, XML-based information retrieval has emerged, and document similarity is the basis of information retrieval. This paper proposes a method for computing the similarity of XML documents based on path and frequency. An XML document is expressed as a collection of tuples; paths are extracted and recurring paths are deleted to improve efficiency; tags are matched using WordNet; path similarity is then computed using the fuzzy longest common subsequence and frequency; finally, the structural similarity between documents is calculated. Two experiments show that the method is effective: Experiment 1 tests the structural similarity of 15 XML documents from 3 DTDs, and Experiment 2 applies the similarity computation to document classification on real data sets, where the results show that accuracy can reach 100%.
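
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the paper's formulas, so several parts are assumptions: tag matching is reduced to exact string equality as a stand-in for the WordNet lookup, path similarity is the fuzzy LCS score divided by the longer path length, and frequency is used only as a weight when averaging best path matches. All function names (`extract_paths`, `tag_similarity`, `fuzzy_lcs`, `path_similarity`, `document_similarity`) are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def extract_paths(xml_text):
    """Map each distinct root-to-leaf tag path (a tuple) to its frequency.

    Recurring paths are stored once with a count, not repeated.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()

    def walk(node, prefix):
        path = prefix + (node.tag,)
        children = list(node)
        if not children:
            counts[path] += 1
        for child in children:
            walk(child, path)

    walk(root, ())
    return counts


def tag_similarity(a, b):
    # Assumption: exact match only. The paper matches tags with WordNet;
    # a semantic score in [0, 1] could be substituted here.
    return 1.0 if a == b else 0.0


def fuzzy_lcs(p, q, threshold=0.5):
    """LCS score where tags match if their similarity reaches the threshold.

    Each matched pair contributes its similarity score, not a flat 1.
    The threshold value is an assumption.
    """
    dp = [[0.0] * (len(q) + 1) for _ in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            s = tag_similarity(p[i - 1], q[j - 1])
            if s >= threshold:
                best = max(best, dp[i - 1][j - 1] + s)
            dp[i][j] = best
    return dp[len(p)][len(q)]


def path_similarity(p, q):
    # Assumption: normalize by the longer path so the score lies in [0, 1].
    return fuzzy_lcs(p, q) / max(len(p), len(q))


def document_similarity(xml_a, xml_b):
    """Symmetric, frequency-weighted average of best path matches."""
    a, b = extract_paths(xml_a), extract_paths(xml_b)

    def directed(src, dst):
        total = sum(src.values())
        score = sum(
            freq * max(path_similarity(p, q) for q in dst)
            for p, freq in src.items()
        )
        return score / total

    return (directed(a, b) + directed(b, a)) / 2


doc_a = "<book><title/><author><name/></author><author><name/></author></book>"
doc_b = "<book><title/><author><name/></author></book>"
doc_c = "<book><title/><publisher/></book>"

sim_ab = document_similarity(doc_a, doc_b)
sim_ac = document_similarity(doc_a, doc_c)
print(sim_ab, sim_ac)
```

In this sketch `doc_a` and `doc_b` share the same set of distinct paths, so they score higher than `doc_a` and `doc_c`, which share only the `book/title` path and the `book` root.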
Copyright: © 2016, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2016 International Conference on Education, Management, Computer and Society
Series: Advances in Computer Science Research
Publication Date: January 2016
ISBN: 978-94-6252-158-2
ISSN: 2352-538X

Cite this article

TY  - CONF
AU  - Xueli Ren
AU  - Yubiao Dai
PY  - 2016/01
DA  - 2016/01
TI  - Estimation of Structural Similarity of XML Document Based on Frequency and Path
BT  - Proceedings of the 2016 International Conference on Education, Management, Computer and Society
PB  - Atlantis Press
SP  - 272
EP  - 275
SN  - 2352-538X
UR  - https://doi.org/10.2991/emcs-16.2016.66
DO  - 10.2991/emcs-16.2016.66
ID  - Ren2016/01
ER  -