A new method of page standardization based on DOM

Weicheng Ma; Yong Fan; Wenqian Shang; Fengyan Wu

doi:10.2991/emeit.2012.57

Corresponding Author

Weicheng Ma

Available Online September 2012.

DOI: 10.2991/emeit.2012.57
Keywords: DOM Tree, unreadable codes, charset, page standardization
Abstract: With the rapid development of the Internet, both the volume of information and the number of websites have grown sharply. Because pages differ in style, structure, and content, a single model cannot extract information from all of them, and scanning a page line by line for useful information wastes time because of noise. Organizing the content of a page into a DOM tree is therefore a sensible first step: it makes search more accurate, and it helps identify the main frame of the page. Separately, unreadable codes (garbled text), which are caused by invalid conversion between the character sets of different languages, are a barrier between readers and the information on websites from other regions of the world. Our work aims to solve these problems so that information from around the world is both accessible and convenient to extract.
Copyright: © 2012, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
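
The two ideas named in the abstract, decoding a page with the right charset and arranging its content into a tree, can be illustrated with a minimal Python sketch. This is not the authors' method, which the abstract does not specify; it only uses the standard library, and the names `decode_page`, `build_tree`, `find_text`, `Node`, and `VOID_TAGS` are hypothetical helpers introduced here. It assumes the charset is declared near the top of the page and falls back to UTF-8 otherwise.

```python
import re
from html.parser import HTMLParser

# Elements that never have children (partial list, assumed for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One element of a simplified DOM-like tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class _TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def decode_page(raw, fallback="utf-8"):
    """Decode page bytes using the declared charset, else the fallback."""
    match = re.search(rb'charset\s*=\s*["\']?([\w-]+)', raw[:2048], re.I)
    name = match.group(1).decode("ascii") if match else fallback
    try:
        return raw.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong declaration: replace undecodable bytes instead of failing.
        return raw.decode(fallback, errors="replace")


def build_tree(html):
    """Parse HTML text and return the root of the simplified tree."""
    builder = _TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_text(node, tag):
    """Collect the text of every element named `tag`, depth first."""
    found = [node.text] if node.tag == tag else []
    for child in node.children:
        found.extend(find_text(child, tag))
    return found
```

Calling `build_tree(decode_page(raw))` on the raw bytes of a page gives a tree that `find_text` can search by tag name. A production system would more likely rely on a full HTML5 parser and a statistical charset detector; the sketch only shows the order of operations: decode first, then build the tree.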

Volume Title: Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)
Series: Advances in Intelligent Systems Research
Publication Date: September 2012
ISBN: 978-90-78677-60-4
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Weicheng Ma
AU  - Yong Fan
AU  - Wenqian Shang
AU  - Fengyan Wu
PY  - 2012/09
DA  - 2012/09
TI  - A new method of page standardization based on DOM
BT  - Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)
PB  - Atlantis Press
SP  - 288
EP  - 292
SN  - 1951-6851
UR  - https://doi.org/10.2991/emeit.2012.57
DO  - 10.2991/emeit.2012.57
ID  - Ma2012/09
ER  -