Proceedings of the 2015 International Conference on Education, Management and Computing Technology

A novel approach for building Domain-specific Lexical Repository with Chinese Wikipedia

Authors
Zhijian Ruan, Xiu Li
Corresponding Author
Zhijian Ruan
Available Online June 2015.
DOI
10.2991/icemct-15.2015.230
Keywords
Domain-specific lexical repository, Domain Corpus, Domain Relatedness, Modified Explicit Semantic Analysis, Chinese-Wikipedia.
Abstract

A domain ontology is a collection of domain-specific concepts and their interrelationships. It provides an abstract view of the application domain and is used in many areas, such as semantic mining (SM) and natural language processing (NLP). However, constructing a domain ontology manually is labor-intensive and time-consuming, whereas an automatically generated domain-specific lexical repository can serve as an indispensable component for building one. In this paper, we propose a two-stage method for building a domain-specific lexical repository using the dump service of Chinese Wikipedia. The main idea is that only concepts strongly semantically related to the multiple roots we choose are incorporated into the repository. First, in a pre-stage, we use the all-pages dump of Chinese Wikipedia (zhwiki-all-pages.xml) to generate a graph of all Wikipedia concepts. Then, in stage one, we select three top-level nodes as roots and traverse the graph generated in the pre-stage with a BFS-like algorithm, forming spanning trees and computing a rough domain relatedness for each node at the same time. Finally, in stage two, we combine the novel Modified Explicit Semantic Analysis method with the results of stage one to compute the final domain relatedness. The experimental results show that our method can produce a high-quality domain-specific lexical repository.
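
The stage-one traversal described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration only: the function name `rough_relatedness`, the `decay` and `max_depth` parameters, and the rule that a concept's rough score is the sum over roots of `decay ** depth`. The toy graph stands in for the concept graph built in the pre-stage.

```python
from collections import deque


def rough_relatedness(graph, roots, decay=0.5, max_depth=3):
    """Breadth-first traversal from each root over a concept graph.

    ASSUMPTION: the score of a node is the sum, over all roots that
    reach it within max_depth, of decay ** (BFS depth from that root).
    This is a hypothetical scoring rule, not the paper's formula.
    """
    scores = {}
    for root in roots:
        depth = {root: 0}          # BFS depth of each node reached from this root
        queue = deque([root])
        while queue:
            node = queue.popleft()
            d = depth[node]
            scores[node] = scores.get(node, 0.0) + decay ** d
            if d == max_depth:     # do not expand beyond the depth limit
                continue
            for child in graph.get(node, ()):
                if child not in depth:   # first visit fixes the tree edge
                    depth[child] = d + 1
                    queue.append(child)
    return scores


# Toy concept graph (hypothetical data): node -> linked sub-concepts.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

scores = rough_relatedness(toy_graph, roots=["A", "B"])
# Keep only concepts whose rough score passes a (hypothetical) threshold.
selected = sorted(n for n, s in scores.items() if s >= 0.75)
```

Because each node is assigned a depth on its first visit, the traversal from every root implicitly forms a spanning tree of the reachable subgraph. Under this assumed rule, concepts close to several roots accumulate higher scores than concepts reachable from only one.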

Copyright
© 2015, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2015 International Conference on Education, Management and Computing Technology
Series
Advances in Social Science, Education and Humanities Research
Publication Date
June 2015
ISBN
10.2991/icemct-15.2015.230
ISSN
2352-5398

Cite this article

TY  - CONF
AU  - Zhijian Ruan
AU  - Xiu Li
PY  - 2015/06
DA  - 2015/06
TI  - A novel approach for building Domain-specific Lexical Repository with Chinese Wikipedia
BT  - Proceedings of the 2015 International Conference on Education, Management and Computing Technology
PB  - Atlantis Press
SP  - 1093
EP  - 1100
SN  - 2352-5398
UR  - https://doi.org/10.2991/icemct-15.2015.230
DO  - 10.2991/icemct-15.2015.230
ID  - Ruan2015/06
ER  -