A Distributed Chinese Naive Bayes Classifier Based on Word Embedding
Mengke Feng, Guoshi Wu
Available Online March 2016.
- https://doi.org/10.2991/icmmct-16.2016.222
- Naive Bayes, Word Embedding, Distributed Classifier.
- The Naive Bayes classifier is built on the assumption of conditional independence between the attributes of a given class. The algorithm has been shown to be successful in text classification. However, when calculating conditional probabilities, these methods treat two different words as two different features, no matter how close their meanings are. In this paper we propose an algorithm that improves the estimate of the probability that a word belongs to a class by using its related words, found via word embedding; we name this model NBCBWE (Naive Bayes Classifier Based on Word Embedding). Word embedding provides a way of applying deep learning to natural language processing problems: every word is represented by a vector, and related words can be found by calculating the similarity between two word vectors. What's more, as the data set grows larger, storing and classifying text on a single computer becomes very time-consuming. To reduce processing time, we parallelize the Bayes classification algorithm using MapReduce to implement the model on Hadoop. Our experiments show that our model improves precision in text classification and also processes data more efficiently.
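The core idea in the abstract — crediting a word with counts from embedding-similar words when estimating its class-conditional probability — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the toy embeddings, the cosine-similarity threshold, and the function names are all assumptions for the sketch.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def smoothed_word_prob(word, class_counts, vocab, embeddings, sim_threshold=0.8):
    """Estimate P(word | class), blending in similarity-weighted counts
    of embedding-related words (the NBCBWE-style idea, as we read it).
    The 0.8 threshold is an illustrative assumption."""
    total = sum(class_counts.values())
    count = class_counts.get(word, 0)
    wv = embeddings.get(word)
    if wv is not None:
        for other, c in class_counts.items():
            if other == word:
                continue
            ov = embeddings.get(other)
            if ov is not None:
                s = cosine(wv, ov)
                if s >= sim_threshold:
                    # a near-synonym's occurrences partially count for this word
                    count += s * c
    # standard Laplace smoothing over the vocabulary
    return (count + 1) / (total + len(vocab))

# Toy example: "car" never occurs in the class, but "automobile" does,
# and their (made-up) vectors are nearly parallel, so P("car" | class)
# rises well above the plain Laplace estimate.
emb = {"car": (1.0, 0.0), "automobile": (1.0, 0.1), "fish": (0.0, 1.0)}
counts = Counter({"automobile": 5})
vocab = {"car", "automobile", "fish"}
p_smoothed = smoothed_word_prob("car", counts, vocab, emb)
p_plain = (0 + 1) / (5 + len(vocab))
```

In a plain multinomial Naive Bayes, "car" and "automobile" would be unrelated features; here the unseen word inherits weight from its near-synonym, which is the effect the paper attributes to word embeddings.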
- Open Access
- This is an open access article distributed under the CC BY-NC license.
Cite this article
TY  - CONF
AU  - Mengke Feng
AU  - Guoshi Wu
PY  - 2016/03
DA  - 2016/03
TI  - A Distributed Chinese Naive Bayes Classifier Based on Word Embedding
BT  - Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology
PB  - Atlantis Press
SP  - 1120
EP  - 1126
SN  - 2352-5401
UR  - https://doi.org/10.2991/icmmct-16.2016.222
DO  - https://doi.org/10.2991/icmmct-16.2016.222
ID  - Feng2016/03
ER  -