Proceedings of the 3rd International Conference on Mechatronics, Robotics and Automation

Gray Tunneling Based on Joint Link for Focused Crawling

Authors
Wei Dong, Hong Ni, Haojiang Deng, Liheng Tuo
Corresponding Author
Wei Dong
Available Online April 2015.
DOI
10.2991/icmra-15.2015.167
Keywords
Focused Crawling; Gray Tunneling; Web Link Machine Learning; Q Learning
Abstract

When a web page covers multiple topics, the measured relevance of even a highly relevant page is weakened, which gives rise to the tunneling problem. In this paper, we propose a novel relevance prediction method for focused crawling to solve gray tunneling. Our approach calculates the relevancy score of a web page from the relevancy scores of its blocks with respect to the topics, and calculates the URL score from its parent pages and its anchor contexts. We join the context similarity with the link similarity, which is based on Q feedback learning. Experimental results showed that the proposed method outperformed the Link-Contexts, Best-First and Breadth-First methods on all test data sets.
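The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the bag-of-words cosine similarity, the use of the maximum block score as the page score, the equal weighting of parent and anchor scores, the mixing weight `w`, and the learning parameters `alpha` and `gamma` are all assumptions chosen for illustration. Only the one-step Q-learning update is the standard textbook rule.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a piece of text (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def page_relevance(blocks, topic):
    """Score a page by its best block (assumption), so that a single
    on-topic block is not diluted by off-topic blocks on the same page."""
    return max((cosine(bow(b), topic) for b in blocks), default=0.0)


def url_score(parent_scores, anchor_context, topic, q_value, w=0.5):
    """Joint score for an unvisited URL (hypothetical weighting):
    a context part from parent pages and anchor text, mixed with a
    link part given by a learned Q value."""
    parent = sum(parent_scores) / len(parent_scores) if parent_scores else 0.0
    context = 0.5 * parent + 0.5 * cosine(bow(anchor_context), topic)
    return w * context + (1 - w) * q_value


def q_update(q, reward, next_best_q, alpha=0.5, gamma=0.8):
    """Standard one-step Q-learning update, with the relevance of the
    fetched page used as the reward signal (assumption)."""
    return q + alpha * (reward + gamma * next_best_q - q)


topic = bow("focused crawler web")
blocks = ["focused crawler web", "cooking recipes pasta"]
block_score = page_relevance(blocks, topic)
whole_page_score = cosine(bow(" ".join(blocks)), topic)
gray_link = url_score([1.0], "cooking pasta", topic, q_value=0.5)
```

In this sketch the block-level score stays high for a mixed-topic page while the whole-page score drops, and a link with an off-topic anchor still receives a non-zero score when its parent page is relevant and its learned Q value is positive, which is the situation gray tunneling is meant to handle.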

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 3rd International Conference on Mechatronics, Robotics and Automation
Series
Advances in Computer Science Research
Publication Date
April 2015
ISBN
10.2991/icmra-15.2015.167
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Wei Dong
AU  - Hong Ni
AU  - Haojiang Deng
AU  - Liheng Tuo
PY  - 2015/04
DA  - 2015/04
TI  - Gray Tunneling Based on Joint Link for Focused Crawling
BT  - Proceedings of the 3rd International Conference on Mechatronics, Robotics and Automation
PB  - Atlantis Press
SP  - 859
EP  - 862
SN  - 2352-538X
UR  - https://doi.org/10.2991/icmra-15.2015.167
DO  - 10.2991/icmra-15.2015.167
ID  - Dong2015/04
ER  -