Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science

Research on Tibetan News Sites’ Web Crawler and Search Engine

Authors
Zhiqiang Han, Guixian Xu, Wei Sun
Corresponding Author
Zhiqiang Han
Available Online July 2015.
DOI
10.2991/lemcs-15.2015.116
Keywords
Tibetan; News sites; Web crawler; Solr; Search engine.
Abstract

In this paper, we describe in detail the features of the Tibetan language and the technologies we use to process Tibetan news web pages by computer. The Tibetan news pages that form the data basis of this project are downloaded with a web crawler. We used Scrapy, an open-source crawler framework, and rewrote its crawling component to make the crawler more accurate and efficient. To support searching Tibetan content, we define and count the statistics that help improve the performance of the search engine, and we use Solr, another open-source package, as the user-facing search interface of the system. The crawler and the search engine are connected through the downloaded web pages to provide the retrieval service. Compared with other work, our system adopts a sufficiently safe and stable framework, which improves the user experience of Tibetan search. Our work has played a positive role in spreading Tibetan culture and has promoted the development of Tibetan-language news search.
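The abstract does not say which statistics the system counts. As a minimal sketch, assuming syllable-level indexing, the Python below splits text into syllables, builds an inverted index with term and document frequencies, and ranks documents with a smoothed TF-IDF score. Tibetan syllables are delimited by the tsheg mark (U+0F0B) and clauses by the shad (U+0F0D). The function names `syllables`, `build_index` and `search` and the scoring formula are illustrative assumptions, not the authors' method. A deployment like the one described would hand indexing and ranking to Solr rather than to hand-written code.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# Tibetan delimiters (Unicode Tibetan block): tsheg U+0F0B, non-breaking
# tsheg U+0F0C, shad U+0F0D, nyis shad U+0F0E. Whitespace is also a separator.
_SPLIT = re.compile(r"[\u0F0B\u0F0C\u0F0D\u0F0E\s]+")


def syllables(text: str) -> list[str]:
    """Segment Tibetan text into syllables (no word segmentation is attempted)."""
    return [s for s in _SPLIT.split(text) if s]


def build_index(docs: dict[str, str]):
    """Return an inverted index (syllable -> {doc_id: tf}) and document frequencies."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    df: Counter = Counter()
    for doc_id, text in docs.items():
        tf = Counter(syllables(text))
        for syl, n in tf.items():
            index[syl][doc_id] = n  # term frequency of syllable in this page
            df[syl] += 1            # number of pages containing the syllable
    return index, df


def search(index, df, n_docs: int, query: str) -> list[str]:
    """Rank documents by summed TF-IDF over the query's syllables (hypothetical scoring)."""
    scores: Counter = Counter()
    for syl in syllables(query):
        if syl not in index:  # membership test avoids creating empty entries
            continue
        idf = math.log((1 + n_docs) / (1 + df[syl])) + 1.0  # smoothed idf
        for doc_id, tf in index[syl].items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]
```

For example, indexing two pages keyed `"a"` and `"b"` and querying a single syllable returns page ids ordered by score. Working on syllables avoids the need for a Tibetan word segmenter, but phrase-level matching would require one.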

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Series
Advances in Intelligent Systems Research
Publication Date
July 2015
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Zhiqiang Han
AU  - Guixian Xu
AU  - Wei Sun
PY  - 2015/07
DA  - 2015/07
TI  - Research on Tibetan News Sites’ Web Crawler and Search Engine
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 607
EP  - 611
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.116
DO  - 10.2991/lemcs-15.2015.116
ID  - Han2015/07
ER  -