Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation (ISCCCA 2013)

Website information extraction based on DOM-model

Authors
YaFang Lou, YiChong Zhang, ZhiJun Yuan
Corresponding Author
YaFang Lou
Available Online February 2013.
DOI
10.2991/isccca.2013.199
Keywords
DOM, Information Extraction, Web Introduction
Abstract

With the rapid development of network technology and the spread of its applications, the web has become the main platform for publishing and accessing information. How to obtain the information a user requires from this vast source is a current research focus. This paper presents a DOM-based method for extracting website information that improves search efficiency by preserving only the theme information and filtering out the noise information that users are not interested in.
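
The idea of keeping theme information while filtering noise on a DOM tree can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which this page does not describe. It assumes that noise lives in a fixed set of tags (NOISE_TAGS), and the names Node, TreeBuilder, and extract_theme_text are hypothetical helpers introduced only for illustration. It uses only the standard-library html.parser module.

```python
from html.parser import HTMLParser

# Assumption: these tags usually hold noise (navigation, scripts, page chrome).
NOISE_TAGS = {"script", "style", "nav", "footer", "aside", "form", "header"}
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM node: a tag name plus ordered children (Node or str)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(text)


def extract_theme_text(html):
    """Return the page text with every NOISE_TAGS subtree pruned."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    parts = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str):
                parts.append(child)
            elif child.tag not in NOISE_TAGS:
                walk(child)  # descend only into non-noise subtrees

    walk(builder.root)
    return " ".join(parts)
```

Calling extract_theme_text on a page's HTML returns the text outside the pruned subtrees. A fixed tag list is the simplest possible noise rule; a practical extractor would likely also weigh features such as link density or text length per node, which this sketch leaves out.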

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation (ISCCCA 2013)
Series
Advances in Intelligent Systems Research
Publication Date
February 2013
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - YaFang Lou
AU  - YiChong Zhang
AU  - ZhiJun Yuan
PY  - 2013/02
DA  - 2013/02
TI  - Website information extraction based on DOM-model
BT  - Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation (ISCCCA 2013)
PB  - Atlantis Press
SP  - 792
EP  - 795
SN  - 1951-6851
UR  - https://doi.org/10.2991/isccca.2013.199
DO  - 10.2991/isccca.2013.199
ID  - Lou2013/02
ER  -