Research on Tibetan News Sites’ Web Crawler and Search Engine
- 10.2991/lemcs-15.2015.116
- Tibetan; News sites; Web crawler; Solr; Search engine.
In this paper, we describe in detail the features of the Tibetan language and the techniques we use to process Tibetan news web pages computationally. To obtain Tibetan news content, which forms the basis of this project, we use a web crawler to download Tibetan news pages. We build on Scrapy, an open-source web crawler, and rewrite its crawling component to make it more accurate and efficient. To make Tibetan content searchable, we define and compute the statistics that help improve the performance of our search engine, and we use Solr, another open-source package, as the user interface of the system. The crawler and the search engine are linked through web pages to provide a data retrieval service. Compared with other work, our system adopts a sufficiently safe and stable framework to improve the user experience of Tibetan search. Our work contributes to the spread of Tibetan culture and to the development of search for Tibetan-language news.
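The abstract does not say which statistics the authors compute or how they segment Tibetan text. As a minimal sketch under stated assumptions, the Python below splits Tibetan text into syllables on the tsheg (U+0F0B) and shad (U+0F0D) marks and ranks documents with a plain TF-IDF score over a toy in-memory inverted index. The names `syllables` and `ToyIndex` and the scoring formula are hypothetical illustrations; this is not the authors' pipeline, and it is not Solr's API or ranking function.

```python
import math
import re
from collections import Counter, defaultdict

# Tibetan script separates syllables with the tsheg (U+0F0B) and ends a
# clause with the shad (U+0F0D). Splitting on these and on whitespace yields
# syllable tokens. Assumption: syllables stand in for words here; real Tibetan
# word segmentation needs a dictionary or a trained model.
SEPARATORS = re.compile(r"[\u0F0B\u0F0D\s]+")


def syllables(text: str) -> list[str]:
    """Split Tibetan text into non-empty syllables (hypothetical helper)."""
    return [s for s in SEPARATORS.split(text) if s]


class ToyIndex:
    """Hypothetical in-memory inverted index ranked by raw TF-IDF.

    There is no length normalization, so longer documents can score higher;
    a production engine such as Solr uses more careful scoring.
    """

    def __init__(self) -> None:
        # term -> {doc_id: term frequency in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id: str, text: str) -> None:
        """Index one document; assumes each doc_id is added only once."""
        self.doc_count += 1
        for term, tf in Counter(syllables(text)).items():
            self.postings[term][doc_id] = tf

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        """Return (doc_id, score) pairs, best first, ties broken by doc_id."""
        scores: dict[str, float] = defaultdict(float)
        for term in set(syllables(query)):
            docs = self.postings.get(term)  # .get does not create empty entries
            if not docs:
                continue
            # Smoothed IDF: always positive, rarer syllables weigh more.
            idf = math.log(1 + self.doc_count / len(docs))
            for doc_id, tf in docs.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    index = ToyIndex()
    index.add("news-1", "བོད་ཀྱི་གསར་འགྱུར།")
    index.add("news-2", "རྒྱ་ནག་གི་གསར་འགྱུར། གསར་འགྱུར།")
    # news-2 ranks above news-1 because the query syllables occur twice there.
    print(index.search("གསར་འགྱུར"))
```

In a system like the one the abstract describes, Scrapy would fetch the pages and Solr would handle indexing and ranking; the toy index only illustrates the kind of term statistics such a search engine relies on.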
- © 2015, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY  - CONF
AU  - Zhiqiang Han
AU  - Guixian Xu
AU  - Wei Sun
PY  - 2015/07
DA  - 2015/07
TI  - Research on Tibetan News Sites’ Web Crawler and Search Engine
BT  - Proceedings of the International Conference on Logistics, Engineering, Management and Computer Science
PB  - Atlantis Press
SP  - 607
EP  - 611
SN  - 1951-6851
UR  - https://doi.org/10.2991/lemcs-15.2015.116
DO  - 10.2991/lemcs-15.2015.116
ID  - Han2015/07
ER  - 