Chinese Named Entity Extraction System Based On Word2vec Under Spark Platform
- https://doi.org/10.2991/amitp-16.2016.74
- Keywords: Spark, word2vec, NER, neural network, machine learning
This paper proposes a real-time system that supports Chinese named entity extraction. The system trains a language model with the word2vec algorithm to obtain word vectors, extracts Chinese named entities by calculating the Euclidean distance between word vectors, and ports the algorithm to the Spark platform, using Spark's distributed computing capability to improve training efficiency. First, the system segments the corpus into words with an existing word segmentation tool to obtain a rough corpus. It then trains word2vec on the rough corpus to obtain word vectors and extracts the first layer of named entities with a clustering algorithm. Finally, the system uses the Named Entity Extraction (NEE) algorithm to extract the named entities, and the algorithm is implemented on the Spark platform.
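The distance-based step can be illustrated with a small sketch. The abstract does not specify how the NEE algorithm works, so everything here is an assumption made for illustration: the helper names `euclidean` and `nearest_to_seeds`, the idea of comparing candidates against a known seed entity, the threshold value, and the toy 3-dimensional vectors (real word2vec vectors would come from training on the segmented corpus and typically have far more dimensions).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_to_seeds(vectors, seeds, threshold):
    """Return (word, distance) pairs for non-seed words whose vector lies
    within `threshold` of at least one seed word, closest first.

    Hypothetical helper: the seed-and-threshold scheme is an assumption,
    not the paper's NEE algorithm.
    """
    hits = []
    for word, vec in vectors.items():
        if word in seeds:
            continue
        # Distance to the closest seed entity.
        d = min(euclidean(vec, vectors[s]) for s in seeds)
        if d <= threshold:
            hits.append((word, d))
    return sorted(hits, key=lambda pair: pair[1])


# Toy vectors invented for illustration only (not trained word2vec output).
TOY_VECTORS = {
    "北京": [0.9, 0.1, 0.0],    # Beijing (used as the seed entity)
    "上海": [0.8, 0.2, 0.1],    # Shanghai
    "广州": [0.85, 0.15, 0.05], # Guangzhou
    "吃饭": [0.0, 0.9, 0.4],    # "to eat"
    "跑步": [0.1, 0.8, 0.5],    # "to run"
}

candidates = nearest_to_seeds(TOY_VECTORS, seeds={"北京"}, threshold=0.3)
for word, dist in candidates:
    print(word, round(dist, 3))
```

With these toy values, only the two place names fall within the threshold of the seed, while the two verbs are more than a unit away. In a distributed setting the same per-word distance computation could be mapped over partitions of the vocabulary, though how the paper does this on Spark is not stated in the abstract.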
- © 2016, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Jialu Yuan
AU - Yongping Xiong
PY - 2016/09
DA - 2016/09
TI - Chinese Named Entity Extraction System Based On Word2vec Under Spark Platform
BT - Proceedings of the 2016 4th International Conference on Advanced Materials and Information Technology Processing (AMITP 2016)
PB - Atlantis Press
SP - 387
EP - 394
SN - 2352-538X
UR - https://doi.org/10.2991/amitp-16.2016.74
DO - https://doi.org/10.2991/amitp-16.2016.74
ID - Yuan2016/09
ER -