A new method of page standardization based on DOM
- 10.2991/emeit.2012.57
- DOM Tree, unreadable codes, charset, page standardization
With the rapid development of the Internet, the volume of information and the number of websites have grown dramatically. Because pages differ in style, structure, and content, a single model cannot extract information from all of them, and scanning a page line by line for useful information wastes time because of noise. Organizing all the information on a page into a DOM tree for searching is therefore a sensible choice: first, it makes accurate searching more likely; second, converting a web page into a tree helps identify the main frame of the page. In addition, unreadable codes (garbled characters), which are caused by invalid conversion between the character sets of different languages, form a barrier between readers and the information on websites from other regions of the world. Our work aims to solve these problems so that information from all around the world is both accessible and convenient to extract.
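The two ideas in the abstract, decoding a page with the correct charset and arranging its content into a DOM tree, can be illustrated with a minimal sketch. This is not the authors' method, which the abstract does not describe in detail; it assumes the page declares its charset in a `meta` tag, and the names `decode_page`, `build_tree`, `find_all`, `Node`, and `TreeBuilder` are hypothetical helpers built only on the Python standard library.

```python
import re
from html.parser import HTMLParser

# Elements that never have children or a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


def decode_page(raw: bytes, fallback: str = "utf-8") -> str:
    """Decode raw page bytes using the declared charset (assumption: the
    page declares one near the top); fall back to `fallback` otherwise."""
    match = re.search(rb"charset=[\"']?([\w-]+)", raw[:2048], re.I)
    candidates = [match.group(1).decode("ascii")] if match else []
    candidates.append(fallback)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or wrong charset: try the next candidate
    # Last resort: keep the page readable, marking undecodable bytes.
    return raw.decode(fallback, errors="replace")


class Node:
    """One element of the simplified DOM tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a tree of Node objects from parser events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag;
        # a stray closing tag with no match is ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_tree(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_all(node: Node, tag: str) -> list:
    """Depth-first search for every element with the given tag."""
    found = [node] if node.tag == tag else []
    for child in node.children:
        found.extend(find_all(child, tag))
    return found
```

With these helpers, a page fetched as bytes would first go through `decode_page`, and the resulting string through `build_tree`; `find_all(root, "p")` then returns the paragraph nodes without scanning the page line by line. Real pages often omit or misdeclare their charset, so a production system would need a more robust detection step than this regular expression.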
- © 2012, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Weicheng Ma
AU - Yong Fan
AU - Wenqian Shang
AU - Fengyan Wu
PY - 2012/09
DA - 2012/09
TI - A new method of page standardization based on DOM
BT - Proceedings of the 2nd International Conference on Electronic & Mechanical Engineering and Information Technology (EMEIT 2012)
PB - Atlantis Press
SP - 288
EP - 292
SN - 1951-6851
UR - https://doi.org/10.2991/emeit.2012.57
DO - 10.2991/emeit.2012.57
ID - Ma2012/09
ER -