A Hybrid Method for Extracting Deep Web Information
Yuanpeng Zhang, Li Wang, Kui Jiang, Danmin Qian, Jiancheng Dong
Available Online April 2015.
- https://doi.org/10.2991/amcce-15.2015.138
- information extraction; clinic expert information; domain model; block importance model; SVM
- Previous work shows that more than 60% of the information available on the Web resides in Deep Web databases, where it cannot be directly indexed by search engines. This paper proposes a hybrid method for extracting Deep Web information, composed of a domain model and a block importance model. The domain model classifies forms and identifies whether a given form is a Web query interface (WQI). The block importance model filters noisy information from response pages. Both models are compared with a rule-based method. The experimental results indicate that the domain model yields a precision 6.44% higher than that of the rule-based method, whereas the block importance model yields an F1 measure 10.5% higher than that of the XPath method.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
Cite this article
TY - CONF
AU - Yuanpeng Zhang
AU - Li Wang
AU - Kui Jiang
AU - Danmin Qian
AU - Jiancheng Dong
PY - 2015/04
DA - 2015/04
TI - A Hybrid Method for Extracting Deep Web Information
BT - 2015 International Conference on Automation, Mechanical Control and Computational Engineering
PB - Atlantis Press
SN - 1951-6851
UR - https://doi.org/10.2991/amcce-15.2015.138
DO - https://doi.org/10.2991/amcce-15.2015.138
ID - Zhang2015/04
ER -