Proceedings of the 6th International Conference on Information Engineering for Mechanics and Materials

Research on the key technologies of corpus preprocessing in Mongolian-Chinese SMT

Authors
Jin-ting Li, Hong-xu Hou, Jing Wu, Hong-bin Wang, Wen-ting Fan
Corresponding Author
Jin-ting Li
Available Online November 2016.
DOI
10.2991/icimm-16.2016.119
Keywords
Mongolian-Chinese SMT; corpus preprocessing; Mongolian morphological analysis; Chinese word segmentation.
Abstract

The traditional preprocessing method for morphological analysis relies on Mongolian suffix segmentation and stemming. However, Mongolian has many grammatical cases. If case is not processed, the corpus suffers from data sparsity, which leads to poor translation performance. We therefore summarize and examine the existing corpus preprocessing methods and focus on the effect of case processing, with the aim of improving the performance of Mongolian-Chinese SMT through Mongolian morphological analysis. Experiments show that optimizing the preprocessing method yields a relative improvement of about 3.22 in BLEU score over baseline system 1.
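To illustrate why unprocessed case markers can cause data sparsity, here is a minimal sketch in Python. It is not the authors' method: the tokens, the case labels (GEN, DAT, ACC), the "-" separator, and the function names are all hypothetical placeholders chosen for illustration, and no real Mongolian data is used. It only shows the general idea that splitting a case suffix from its stem lets inflected forms of the same word share one stem token.

```python
from collections import Counter


def split_case(token, case_suffixes, sep="-"):
    """Split a token into [stem, suffix] when it ends in a known case suffix.

    The separator and suffix inventory are assumptions for this toy example.
    """
    stem, found, suffix = token.rpartition(sep)
    if found and stem and suffix in case_suffixes:
        return [stem, sep + suffix]
    return [token]


def preprocess(tokens, case_suffixes, sep="-"):
    """Apply split_case to every token and flatten the result."""
    out = []
    for token in tokens:
        out.extend(split_case(token, case_suffixes, sep))
    return out


def singleton_count(tokens):
    """Number of word types that occur exactly once (a rough sparsity signal)."""
    return sum(1 for count in Counter(tokens).values() if count == 1)


# Hypothetical case labels and a made-up toy corpus.
CASES = {"GEN", "DAT", "ACC"}
corpus = ["book-GEN", "book-DAT", "book-ACC", "house-GEN", "house-DAT", "house"]
processed = preprocess(corpus, CASES)

print(len(set(corpus)), singleton_count(corpus))
print(len(set(processed)), singleton_count(processed))
```

In this toy corpus every unsplit form occurs only once, whereas after splitting the stems recur across several tokens, so a translation model would see each stem more often. Whether the suffix token is kept, merged, or dropped afterwards is a separate design choice that this sketch does not address.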

Copyright
© 2016, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 6th International Conference on Information Engineering for Mechanics and Materials
Series
Advances in Engineering Research
Publication Date
November 2016
ISBN
10.2991/icimm-16.2016.119
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Jin-ting Li
AU  - Hong-xu Hou
AU  - Jing Wu
AU  - Hong-bin Wang
AU  - Wen-ting Fan
PY  - 2016/11
DA  - 2016/11
TI  - Research on the key technologies of corpus preprocessing in Mongolian-Chinese SMT
BT  - Proceedings of the 6th International Conference on Information Engineering for Mechanics and Materials
PB  - Atlantis Press
SP  - 665
EP  - 672
SN  - 2352-5401
UR  - https://doi.org/10.2991/icimm-16.2016.119
DO  - 10.2991/icimm-16.2016.119
ID  - Li2016/11
ER  -