11th Joint International Conference on Information Sciences

A Phrase Combination Approach to Patent SMT

Authors
Junguo Zhu 0, Muyun Yang, Tiejun Zhao, Sheng Li, Qi Haoliang
Corresponding Author
Junguo Zhu
0 Harbin Institute of Technology
Available Online December 2008.
DOI
https://doi.org/10.2991/jcis.2008.99
Keywords
statistical machine translation, patent, phrase combination, word segmentation
Abstract
This paper presents a phrase combination approach to patent SMT (statistical machine translation) from Japanese to English. To minimize the segmentation problems caused by the many OOV (out-of-vocabulary) words in patent texts, character-based translation phrases are first introduced to avoid segmentation errors in translation modeling. Word-based translation phrases, which are built to exploit the dependent word-level information, are then combined with the character-based translation table by linearly integrating their probabilities. Our experiments on the NTCIR corpus indicate that the proposed method significantly outperforms the original word-based approach.
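The linear combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_phrase_tables`, the interpolation weight `lam`, the choice to key both tables on the unsegmented source string, and all probabilities below are assumptions made for illustration only. The abstract does not state the weights or the table format actually used.

```python
from typing import Dict, Tuple

# A phrase table maps (source phrase, target phrase) to a translation probability.
# Assumption: both tables are keyed on the unsegmented source string so that
# character-based and word-based entries for the same text can be matched.
PhraseTable = Dict[Tuple[str, str], float]


def combine_phrase_tables(
    char_table: PhraseTable,
    word_table: PhraseTable,
    lam: float = 0.5,
) -> PhraseTable:
    """Linearly interpolate two phrase tables.

    combined(pair) = lam * p_char(pair) + (1 - lam) * p_word(pair)

    A pair missing from one table contributes probability 0.0 from that table.
    `lam` is a hypothetical weight; in practice it would be tuned on held-out data.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    combined: PhraseTable = {}
    for pair in set(char_table) | set(word_table):
        p_char = char_table.get(pair, 0.0)
        p_word = word_table.get(pair, 0.0)
        combined[pair] = lam * p_char + (1.0 - lam) * p_word
    return combined


# Toy tables with made-up probabilities (not taken from the paper).
char_table: PhraseTable = {
    ("特許", "patent"): 0.6,
    ("特許", "license"): 0.4,  # present only in the character-based table
}
word_table: PhraseTable = {
    ("特許", "patent"): 0.8,
}

combined = combine_phrase_tables(char_table, word_table, lam=0.5)
for (src, tgt), prob in sorted(combined.items()):
    print(f"{src} -> {tgt}: {prob:.2f}")
```

With equal weights, a pair seen in both tables receives the average of its two probabilities, while a pair seen in only one table keeps half of its original mass. This is how the character-based table can supply entries for words that the segmenter handled poorly.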
Open Access
This is an open access article distributed under the CC BY-NC license.

Proceedings
11th Joint International Conference on Information Sciences
Part of series
Advances in Intelligent Systems Research
Publication Date
December 2008
ISBN
978-90-78677-18-5
ISSN
1951-6851
DOI
https://doi.org/10.2991/jcis.2008.99

Cite this article

TY  - CONF
AU  - Junguo Zhu
AU  - Muyun Yang
AU  - Tiejun Zhao
AU  - Sheng Li
AU  - Qi Haoliang
PY  - 2008/12
DA  - 2008/12
TI  - A Phrase Combination Approach to Patent SMT
BT  - 11th Joint International Conference on Information Sciences
PB  - Atlantis Press
SP  - 590
EP  - 594
SN  - 1951-6851
UR  - https://doi.org/10.2991/jcis.2008.99
DO  - https://doi.org/10.2991/jcis.2008.99
ID  - Zhu2008/12
ER  -