Proceedings of the 2015 International Conference on Computational Science and Engineering

Combining Classifiers to Extract Web Data

Authors
Qiang Chu, Yongquan Dong, Ping Ling
Corresponding Author
Qiang Chu
Available Online July 2015.
DOI
10.2991/iccse-15.2015.76
Keywords
Web Data Extraction; Ensemble Learning; Data Integration
Abstract

Much of the data on the web is embedded in semi-structured pages. To process this content automatically, it must be extracted and made available to computer applications, which remains a complex and urgent task. Most current approaches use a single classifier to extract web data, but relying on a single classifier is not sufficient, because different classifiers perform differently on the same problem. In this paper, we combine multiple classifiers to extract web data. First, we identify the main data regions of web pages and construct feature sets for the text nodes in those regions. Second, we choose three kinds of base classifiers and use a voting method to integrate the results of each classifier. Finally, we combine the integrated results with heuristic rules to obtain the final extraction results. The experimental results show that our approach outperforms the baseline approaches.
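
The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which base classifiers, features, labels, or heuristic rules the authors use, so everything named below (the three rule-based stand-in classifiers, the feature keys `has_currency`, `num_words`, and `digit_ratio`, the labels, and the single post-processing rule) is a hypothetical placeholder. The sketch assumes simple majority (hard) voting with ties resolved in favor of the earliest classifier, which is only one possible reading of "the voting method".

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical feature vector for one text node in a data region.
Features = Dict[str, float]
# A base classifier maps a feature vector to a label.
Classifier = Callable[[Features], str]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the earliest voter."""
    if not labels:
        raise ValueError("no labels to vote on")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


# Three stand-in base classifiers (assumptions, not the paper's models).
def by_currency(f: Features) -> str:
    return "price" if f.get("has_currency", 0) > 0 else "other"


def by_length(f: Features) -> str:
    return "title" if f.get("num_words", 0) >= 4 else "other"


def by_digits(f: Features) -> str:
    return "price" if f.get("digit_ratio", 0) > 0.5 else "title"


CLASSIFIERS: List[Classifier] = [by_currency, by_length, by_digits]


def apply_heuristics(f: Features, label: str) -> str:
    """Hypothetical rule: a 'price' node must contain at least one digit."""
    if label == "price" and f.get("digit_ratio", 0) == 0:
        return "other"
    return label


def extract_labels(nodes: Sequence[Features],
                   classifiers: Sequence[Classifier]) -> List[str]:
    """Vote over the base classifiers, then apply the heuristic rule."""
    results = []
    for features in nodes:
        votes = [clf(features) for clf in classifiers]
        results.append(apply_heuristics(features, majority_vote(votes)))
    return results
```

In practice the stand-in functions would be replaced by trained models, and the tie-breaking policy matters whenever the number of labels exceeds two, since three voters can then disagree completely.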

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2015 International Conference on Computational Science and Engineering
Series
Advances in Computer Science Research
Publication Date
July 2015
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Qiang Chu
AU  - Yongquan Dong
AU  - Ping Ling
PY  - 2015/07
DA  - 2015/07
TI  - Combining Classifiers to Extract Web Data
BT  - Proceedings of the 2015 International Conference on Computational Science and Engineering
PB  - Atlantis Press
SP  - 412
EP  - 416
SN  - 2352-538X
UR  - https://doi.org/10.2991/iccse-15.2015.76
DO  - 10.2991/iccse-15.2015.76
ID  - Chu2015/07
ER  -