Proceedings of the 2nd Annual International Conference on Electronics, Electrical Engineering and Information Science (EEEIS 2016)

Domain adaptation of web data extraction based on bootstrapping method

Authors
Dong-Lan Liu, Xin Liu, Lei Ma, Hao Yu, Yong Zhao, Guo-Dong Lv
Corresponding Author
Dong-Lan Liu
Available Online December 2016.
DOI
10.2991/eeeis-16.2017.49
Keywords
Domain Adaptation; Web Data Extraction; Bootstrapping; Electric Power Enterprise Information; Wrapper.
Abstract

With the rapid informatization of electric power enterprises, specialized structured storage and management systems are becoming increasingly important. Because Web data integration must provide a uniform interface to a multitude of data sources, efficiently extracting the unstructured and semi-structured data found in web pages is a key issue. In this paper, we address the problem of constructing domain-adaptive wrappers for web information extraction. We design a domain adaptation extraction framework based on the bootstrapping method and discuss its main technologies. An extraction model is first obtained for a recruitment site. Pages are then randomly sampled from other power system sites, and this extraction model is applied to them to train new wrappers. In this way, a uniform data access infrastructure for the power system domain can be built. In addition, the resulting wrapper is highly reusable across sites, which enables domain-adaptive extraction. Experimental results show that the proposed approach can improve the extraction accuracy of target data and effectively addresses wrapper adaptation for massive Web data in various fields.
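The abstract only outlines the bootstrapping loop: learn an extraction model on a source (recruitment) site, apply it to randomly sampled pages from other sites, and induce a new wrapper for each. The Python sketch below shows one way such a loop could work; it is not the authors' implementation. It assumes a hypothetical page model in which each page is a flat mapping from DOM path to node text, a wrapper is a mapping from field name to DOM path, and the "extraction model" is a simple token-overlap content model. All names (`induce_wrapper`, `bootstrap`, `sample_size`, `min_score`, the example paths and values) are illustrative assumptions.

```python
import random
import re
from collections import Counter, defaultdict

# Hypothetical page model (assumption): DOM path -> text of that node.
Page = dict[str, str]


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a text node."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def apply_wrapper(wrapper: dict[str, str], page: Page) -> dict[str, str]:
    """Extract each field by looking up its DOM path; missing paths are skipped."""
    return {field: page[path] for field, path in wrapper.items() if path in page}


def build_model(wrapper: dict[str, str], pages: list[Page]) -> dict[str, Counter]:
    """Content model (assumption): token counts of the values extracted per field."""
    model: dict[str, Counter] = defaultdict(Counter)
    for page in pages:
        for field, value in apply_wrapper(wrapper, page).items():
            model[field].update(tokens(value))
    return model


def score(vocab: Counter, text: str) -> float:
    """Fraction of the node's tokens already seen for this field."""
    toks = tokens(text)
    return sum(1 for t in toks if t in vocab) / len(toks) if toks else 0.0


def induce_wrapper(model, target_pages, sample_size=3, min_score=0.5, rng=None):
    """Randomly sample target pages and pick, per field, the DOM path whose
    node texts best match the field's content model."""
    rng = rng or random.Random(0)
    sample = rng.sample(target_pages, min(sample_size, len(target_pages)))
    wrapper = {}
    for field, vocab in model.items():
        votes = Counter()
        for page in sample:
            for path, text in page.items():
                s = score(vocab, text)
                if s >= min_score:
                    votes[path] += s
        if votes:
            wrapper[field] = votes.most_common(1)[0][0]
    return wrapper


def bootstrap(source_wrapper, source_pages, target_sites, **kw):
    """Induce a wrapper per target site, folding each site's extracted values
    back into the content model before moving to the next site."""
    model = build_model(source_wrapper, source_pages)
    wrappers = {}
    for site, pages in target_sites.items():
        wrapper = induce_wrapper(model, pages, **kw)
        wrappers[site] = wrapper
        for field, vocab in build_model(wrapper, pages).items():
            model[field].update(vocab)
    return wrappers


if __name__ == "__main__":
    src = [
        {"div.job>h2": "Electrical Engineer", "div.job>span.loc": "Beijing"},
        {"div.job>h2": "Power Systems Engineer", "div.job>span.loc": "Shanghai"},
    ]
    site_a = [
        {"td.post": "Substation Engineer", "td.city": "Beijing", "div.menu": "Home Contact"},
        {"td.post": "Electrical Engineer", "td.city": "Shanghai", "div.menu": "Home Contact"},
        {"td.post": "Power Engineer", "td.city": "Beijing", "div.menu": "Home Contact"},
    ]
    site_b = [
        {"li.t": "Substation Operator", "li.c": "Beijing"},
        {"li.t": "Substation Technician", "li.c": "Tianjin"},
    ]
    source_wrapper = {"title": "div.job>h2", "location": "div.job>span.loc"}
    print(bootstrap(source_wrapper, src, {"site_a": site_a, "site_b": site_b}))
```

In this toy run, `site_b`'s title path (`li.t`) is found only because the token "substation" was learned while processing `site_a`; running `site_b` directly against the source model recovers the location path but not the title. That carry-over of newly extracted values is the bootstrapping step in this sketch.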

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Download article (PDF)

Series
Advances in Engineering Research
Publication Date
December 2016
ISBN
10.2991/eeeis-16.2017.49
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Dong-Lan Liu
AU  - Xin Liu
AU  - Lei Ma
AU  - Hao Yu
AU  - Yong Zhao
AU  - Guo-Dong Lv
PY  - 2016/12
DA  - 2016/12
TI  - Domain adaptation of web data extraction based on bootstrapping method
BT  - Proceedings of the 2nd Annual International Conference on Electronics, Electrical Engineering and Information Science (EEEIS 2016)
PB  - Atlantis Press
SP  - 372
EP  - 385
SN  - 2352-5401
UR  - https://doi.org/10.2991/eeeis-16.2017.49
DO  - 10.2991/eeeis-16.2017.49
ID  - Liu2016/12
ER  -