Research of the Web Page Cleaning Technology on Tourism Theme
- DOI
- 10.2991/iccnce.2013.118
- Keywords
- tourist theme pages, html, page cleaning, regex.
- Abstract
With the development of web technology, dynamic web pages and personalized page content have become increasingly common. Page information now varies widely and the structures of different pages differ greatly, so traditional page cleaning approaches have difficulty adapting. This paper proposes a web page cleaning method based on a regex extraction strategy, derived from an analysis of the structural features of tourism-themed web pages. The algorithm avoids the defects of traditional page cleaning technology: it is simple and practical, cleans efficiently and accurately, and reduces system overhead.
- Copyright
- © 2013, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Qi Shen
AU  - Qingming Song
AU  - Meng Zhang
AU  - Yan Tang
PY  - 2013/07
DA  - 2013/07
TI  - Research of the Web Page Cleaning Technology on Tourism Theme
BT  - Proceedings of the International Conference on Computer, Networks and Communication Engineering (ICCNCE 2013)
PB  - Atlantis Press
SP  - 475
EP  - 478
SN  - 1951-6851
UR  - https://doi.org/10.2991/iccnce.2013.118
DO  - 10.2991/iccnce.2013.118
ID  - Shen2013/07
ER  -