Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)

A new method of page standardization based on DOM

Authors
Weicheng Ma, Yong Fan, Wenqian Shang, Fengyan Wu
Corresponding Author
Weicheng Ma
Available Online September 2012.
DOI
10.2991/emeit.2012.57
Keywords
DOM Tree, unreadable codes, charset, page standardization
Abstract

With the rapid development of the Internet, both the volume of information and the number of websites have grown sharply. Because pages differ in style, structure, and content, a single model cannot extract information from all of them, and scanning a page line by line for useful information wastes time because of noise. Arranging the content of a page into a DOM tree for searching is therefore a sensible choice: it raises the likelihood of accurate search, and converting a web page into a tree also helps identify the main frame of the page. In addition, unreadable codes (garbled text), which are caused by invalid conversion between the character sets used for different languages, are a barrier that separates readers from information on websites from other regions of the world. Our work aims to solve these problems so that information from around the world is both accessible and convenient to extract.
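
The two ideas in the abstract, decoding a page with the right charset and arranging it into a DOM tree, can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not describe in detail. The names `decode_page`, `build_tree`, `Node`, and `TreeBuilder`, the fallback encoding list, and the 2048-byte search window are all assumptions made for illustration; only the standard-library modules `re` and `html.parser` are real APIs.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the "current" node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def decode_page(raw, fallbacks=("utf-8", "gb18030")):
    """Decode page bytes, trying the declared charset first (hypothetical helper)."""
    # Look for charset=... near the top of the page (assumed window: 2048 bytes).
    match = re.search(rb'charset=["\']?([\w-]+)', raw[:2048], re.I)
    candidates = ([match.group(1).decode("ascii")] if match else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or wrong encoding: try the next candidate
    # Last resort: keep the page readable and mark undecodable bytes.
    return raw.decode("utf-8", errors="replace")


class Node:
    """One element of the simplified DOM tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML text; tolerant of stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore unmatched end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html_text):
    """Parse HTML text and return the root Node (hypothetical helper)."""
    builder = TreeBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root
```

A page would be processed as `build_tree(decode_page(raw_bytes))`. Trying the declared charset before any fallback is one simple way to avoid garbled text; a production system would likely also consult HTTP headers or a statistical detector.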

Copyright
© 2012, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)
Series
Advances in Intelligent Systems Research
Publication Date
September 2012
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Weicheng Ma
AU  - Yong Fan
AU  - Wenqian Shang
AU  - Fengyan Wu
PY  - 2012/09
DA  - 2012/09
TI  - A new method of page standardization based on DOM
BT  - Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)
PB  - Atlantis Press
SP  - 288
EP  - 292
SN  - 1951-6851
UR  - https://doi.org/10.2991/emeit.2012.57
DO  - 10.2991/emeit.2012.57
ID  - Ma2012/09
ER  -