A Distributed Chinese Naive Bayes Classifier Based on Word Embedding
Mengke Feng, Guoshi Wu
Available Online March 2016.
- https://doi.org/10.2991/icmmct-16.2016.222
- Naive Bayes, Word Embedding, Distributed Classifier.
- The Naive Bayes classifier is built on the assumption of conditional independence between the attributes of a given class. The algorithm has been shown to be successful in text classification. However, when calculating conditional probabilities, these methods treat two different words as two different features, no matter how close their meanings are. In this paper we propose an algorithm that improves the estimate of the probability that a word belongs to a class by using its related words, found via word embedding; we name this model NBCBWE (Naive Bayes Classifier Based on Word Embedding). Word embedding provides a way of applying deep learning to natural language processing problems: every word is represented by a vector, and related words can be found by calculating the similarity between two word vectors. What's more, as the data set grows larger, storing and classifying text on a single computer becomes very time-consuming. To reduce processing time, we parallelize the Bayes classification algorithm using MapReduce to implement the model on Hadoop. Our experiments show that our model improves precision in text classification and also processes data more efficiently.
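The core idea in the abstract — crediting a word with counts from embedding-similar words when estimating its class-conditional probability — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the toy embeddings, the cosine-similarity threshold, and the function names are all assumptions for the sketch.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def smoothed_word_prob(word, class_counts, vocab, embeddings, sim_threshold=0.8):
    """Estimate P(word | class), blending in similarity-weighted counts
    of embedding-related words (the NBCBWE-style idea, as we read it).
    The 0.8 threshold is an illustrative assumption."""
    total = sum(class_counts.values())
    count = class_counts.get(word, 0)
    wv = embeddings.get(word)
    if wv is not None:
        for other, c in class_counts.items():
            if other == word:
                continue
            ov = embeddings.get(other)
            if ov is not None:
                s = cosine(wv, ov)
                if s >= sim_threshold:
                    # a near-synonym's occurrences partially count for this word
                    count += s * c
    # standard Laplace smoothing over the vocabulary
    return (count + 1) / (total + len(vocab))

# Toy example: "car" never occurs in the class, but "automobile" does,
# and their (made-up) vectors are nearly parallel, so P("car" | class)
# rises well above the plain Laplace estimate.
emb = {"car": (1.0, 0.0), "automobile": (1.0, 0.1), "fish": (0.0, 1.0)}
counts = Counter({"automobile": 5})
vocab = {"car", "automobile", "fish"}
p_smoothed = smoothed_word_prob("car", counts, vocab, emb)
p_plain = (0 + 1) / (5 + len(vocab))
```

In a plain multinomial Naive Bayes, "car" and "automobile" would be unrelated features; here the unseen word inherits weight from its near-synonym, which is the effect the paper attributes to word embeddings.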
- Open Access
- This is an open access article distributed under the CC BY-NC license.
Cite this article
TY  - CONF
AU  - Mengke Feng
AU  - Guoshi Wu
PY  - 2016/03
DA  - 2016/03
TI  - A Distributed Chinese Naive Bayes Classifier Based on Word Embedding
BT  - Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology
PB  - Atlantis Press
SP  - 1120
EP  - 1126
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmmct-16.2016.222
DO  - https://doi.org/10.2991/icmmct-16.2016.222
ID  - Feng2016/03
ER  -