Proceedings of the 2017 5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2017)

Quality Assessment Method of Web Documents Based on Random Forest

Authors
Li He, Li Tang, Ning Wang
Corresponding Author
Li He
Available Online September 2017.
DOI
10.2991/icmmcce-17.2017.190
Keywords
Web document; quality assessment; LDA topic model; random forest
Abstract

This paper proposes a Random Forest (RF) based method for assessing the quality of web documents and formulates a novel quality evaluation index system comprising features of organization structure, network access, and content. To extract the content feature of a document, a topic coverage degree calculation model based on LDA is put forward. Experiments are conducted on two document sets, Wikipedia and Baidu Encyclopedia, and precision, recall, and F-measure are used to verify the validity of the proposed quality assessment method. Experimental results show that the proposed evaluation index system and the RF-based quality assessment method achieve good performance.
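
The abstract names precision, recall, and F-measure as the evaluation metrics but does not say how the labels are encoded or how the scores are averaged. As a minimal sketch, assuming a binary labelling in which 1 marks a high-quality document (a hypothetical encoding, not taken from the paper) and the balanced F1 form of the F-measure, the three metrics could be computed as follows:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class.

    Assumption: labels are plain comparable values and `positive`
    marks the class of interest (here, "high-quality document").
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical labels for six documents (illustrative only, not data from the paper).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

With these illustrative labels there are three true positives, one false positive, and one false negative, so all three scores equal 0.75.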

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2017 5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2017)
Series
Advances in Engineering Research
Publication Date
September 2017
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Li He
AU  - Li Tang
AU  - Ning Wang
PY  - 2017/09
DA  - 2017/09
TI  - Quality Assessment Method of Web Documents Based on Random Forest
BT  - Proceedings of the 2017 5th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2017)
PB  - Atlantis Press
SP  - 1058
EP  - 1065
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmmcce-17.2017.190
DO  - 10.2991/icmmcce-17.2017.190
ID  - He2017/09
ER  -