Chinese Named Entity Extraction System Based On Word2vec Under Spark Platform
Jialu Yuan, Yongping Xiong
Available Online September 2016.
- https://doi.org/10.2991/amitp-16.2016.74
- Keywords: Spark, word2vec, NER, neural network, machine learning
- This paper proposes a real-time system for Chinese named entity extraction. The system trains a language model with the word2vec algorithm to obtain word vectors, extracts Chinese named entities by calculating the Euclidean distance between word vectors, and ports the algorithm to the Spark platform, using Spark's distributed computing capability to improve training efficiency. First, the system segments the corpus into words with an existing word segmentation tool to obtain a rough corpus. It then trains the word2vec algorithm on the rough corpus to obtain word vectors and extracts a first layer of named entities with a clustering algorithm. Finally, the system applies the Named Entity Extraction (NEE) algorithm to extract the named entities, and the whole pipeline is implemented on the Spark platform.
- Open Access
- This is an open access article distributed under the CC BY-NC license.
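The distance-based step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the NEE algorithm, so the following is an assumption-laden illustration, not the authors' method: the function names `euclidean` and `extract_candidates`, the seed set, the threshold, and the two-dimensional toy vectors are all hypothetical. Real word2vec vectors would come from a trained model and typically have far more dimensions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def extract_candidates(vectors, seeds, threshold):
    """Return (word, distance) pairs for words lying within `threshold`
    of the nearest seed entity, sorted by ascending distance.

    Hypothetical helper: assumes a small set of known entities (seeds)
    is available, which the abstract does not state.
    """
    seed_vecs = [vectors[s] for s in seeds if s in vectors]
    if not seed_vecs:
        return []
    found = []
    for word, vec in vectors.items():
        if word in seeds:
            continue
        nearest = min(euclidean(vec, sv) for sv in seed_vecs)
        if nearest <= threshold:
            found.append((word, nearest))
    return sorted(found, key=lambda pair: pair[1])


# Toy vectors invented for illustration only (not trained word2vec output).
toy_vectors = {
    "北京": [0.90, 0.10],  # Beijing
    "上海": [0.80, 0.20],  # Shanghai
    "广州": [0.85, 0.05],  # Guangzhou
    "吃饭": [0.10, 0.90],  # to eat
    "跑步": [0.20, 0.80],  # to run
}

candidates = extract_candidates(toy_vectors, seeds={"北京"}, threshold=0.2)
for word, dist in candidates:
    print(f"{word}\t{dist:.3f}")
```

With these toy values, the two place names close to the seed are kept and the two verbs are rejected. In a Spark setting the same comparison would presumably be distributed across partitions of the vocabulary, but that layout is likewise an assumption here.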
Cite this article
TY - CONF
AU - Jialu Yuan
AU - Yongping Xiong
PY - 2016/09
DA - 2016/09
TI - Chinese Named Entity Extraction System Based On Word2vec Under Spark Platform
BT - 2016 4th International Conference on Advanced Materials and Information Technology Processing (AMITP 2016)
PB - Atlantis Press
SN - 2352-538X
UR - https://doi.org/10.2991/amitp-16.2016.74
DO - https://doi.org/10.2991/amitp-16.2016.74
ID - Yuan2016/09
ER -