Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)

An approach to vocabulary expansion for neural network language model by means of hierarchical clustering

Authors
Pavel V. Dudarin, Nadezhda G. Yarushkina
Corresponding Author
Pavel V. Dudarin
Available Online August 2019.
DOI
10.2991/eusflat-19.2019.85
Keywords
NLP, Language model, Neural Network, RNN, ULMFiT, Clustering, Fuzzy graph clustering, Word-to-vec
Abstract

Neural network language models have become the main tool for solving tasks in the NLP field. These models have already shown state-of-the-art results in classification, translation, named entity recognition, and so on. Pre-trained models are freely distributed on the internet and can be reused with the help of transfer learning techniques. However, the domain of a real-life problem can differ from the original domain on which the network was trained. In this paper, an approach to vocabulary expansion for a neural network language model by means of hierarchical clustering is proposed. This technique allows a pre-trained language model to be adapted to a different domain. First, tokens from the language model are hierarchically clustered. Then new words from the problem's domain are matched to the tokens according to the obtained hierarchy. In the experimental part, the proposed approach is demonstrated on a slightly modified ULMFiT language model.
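
The two steps named in the abstract (cluster the model's tokens hierarchically, then match a new word against that hierarchy) can be illustrated with a minimal sketch. It is not the authors' implementation: the keywords mention fuzzy graph clustering, whereas this sketch swaps in plain centroid-linkage agglomerative clustering with cosine distance. The toy embeddings, the word "kitten", its vector, and all function names are hypothetical, and the sketch assumes the new word already has a vector in the same space as the model's tokens.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def leaves(tree):
    """List the tokens under a node; a leaf is a token string."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def node_centroid(tree, embeddings):
    """Centroid of all token vectors under a node."""
    return centroid([embeddings[tok] for tok in leaves(tree)])

def build_hierarchy(embeddings):
    """Repeatedly merge the two clusters with the closest centroids.

    Returns a binary tree of nested 2-tuples whose leaves are tokens.
    """
    clusters = list(embeddings)  # start with one cluster per token
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cosine_distance(node_centroid(clusters[i], embeddings),
                                    node_centroid(clusters[j], embeddings))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]

def match_token(tree, embeddings, vector):
    """Walk down the hierarchy, always taking the closer child, to a leaf."""
    while not isinstance(tree, str):
        tree = min(tree, key=lambda child: cosine_distance(
            node_centroid(child, embeddings), vector))
    return tree

# Hypothetical 2-d embeddings standing in for the language model's tokens.
token_embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.3, 0.9],
}
tree = build_hierarchy(token_embeddings)

# Hypothetical out-of-vocabulary word with an externally obtained vector.
matched = match_token(tree, token_embeddings, [0.95, 0.12])
print("kitten ->", matched)
```

In this toy setup the new word descends into the cluster containing "cat" and "dog" and ends at the nearest existing token. How the matched token is then used inside the language model (for example, to initialise the new word's embedding) is not stated in the abstract and is left out of the sketch.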

Copyright
© 2019, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
Series
Atlantis Studies in Uncertainty Modelling
Publication Date
August 2019
ISSN
2589-6644

Cite this article

TY  - CONF
AU  - Pavel V. Dudarin
AU  - Nadezhda G. Yarushkina
PY  - 2019/08
DA  - 2019/08
TI  - An approach to vocabulary expansion for neural network language model by means of hierarchical clustering
BT  - Proceedings of the 11th Conference of the European Society for Fuzzy Logic and Technology (EUSFLAT 2019)
PB  - Atlantis Press
SP  - 614
EP  - 618
SN  - 2589-6644
UR  - https://doi.org/10.2991/eusflat-19.2019.85
DO  - 10.2991/eusflat-19.2019.85
ID  - Dudarin2019/08
ER  -