Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)

Exploiting Document Boltzmann Machine in Query Extension

Authors
Li-ming HUANG, Xiao-zhao ZHAO, Yue-xian HOU, Ya-ping ZHANG
Corresponding Author
Li-ming HUANG
Available Online December 2016.
DOI
10.2991/cnct-16.2017.80
Keywords
Document Boltzmann Machine, Query Extension, Model Selection, CIF
Abstract

Most work on query extension (QE) assumes that the terms in a document are independent, and the multinomial distribution is widely used to model feedback documents in many QE models. We argue that in QE methods the relevance model (RM), which generates the feedback documents, should be modeled with a more suitable distribution in order to handle term associations in the feedback documents naturally. Recently, the Document Boltzmann Machine (DBM) was proposed for document modeling in information retrieval; this model relaxes the independence assumption, i.e., it captures term dependency naturally. It has been shown that the DBM can be seen as a generalization of the traditional unigram language model and achieves better ad hoc retrieval performance. In this paper, we replace the multinomial distribution in the traditional unigram RM method with the DBM, while leaving the main QE framework unchanged to keep the model uncomplicated. Thus, the relevance model is estimated by a DBM trained on the feedback documents, called the relevance DBM (rDBM). The extended query is generated from the learnt rDBM, and the final extended query likelihood is given by the parameter values of the rDBM. One difficulty in learning the rDBM is data sparseness, which can lead to overfitted rDBMs and harm retrieval performance. To address this problem, we adopt the Confident Information First (CIF) principle for model selection to reduce the complexity of the rDBM, which makes our proposed query extension method more efficient and practical. Experiments on several standard TREC collections show the effectiveness of our QE method with the DBM and the model selection method.
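
As an illustration of where the substitution happens, the following sketch uses standard unigram relevance model (RM1/RM3) notation; the feedback set F, the interpolation weight \lambda, and the original query model \theta_Q are conventional symbols assumed here for exposition, and the paper's exact estimator may differ. In the classical unigram RM, the relevance model is estimated from the feedback documents as

P(w \mid R) \approx \sum_{D \in F} P(w \mid \theta_D)\, P(\theta_D \mid Q),

where each document model P(w \mid \theta_D) is multinomial. The approach described in the abstract replaces this multinomial document model with the distribution defined by a DBM trained on F (the rDBM), so that P(w \mid R) is obtained from the learnt rDBM parameters. Under the usual RM3 convention, the extended query model would then be interpolated with the original query:

P(w \mid \theta_{Q'}) = (1 - \lambda)\, P(w \mid \theta_Q) + \lambda\, P(w \mid R).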

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
Series
Advances in Computer Science Research
Publication Date
December 2016
ISSN
2352-538X
DOI
10.2991/cnct-16.2017.80

Cite this article

TY  - CONF
AU  - Li-ming HUANG
AU  - Xiao-zhao ZHAO
AU  - Yue-xian HOU
AU  - Ya-ping ZHANG
PY  - 2016/12
DA  - 2016/12
TI  - Exploiting Document Boltzmann Machine in Query Extension
BT  - Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
PB  - Atlantis Press
SP  - 585
EP  - 592
SN  - 2352-538X
UR  - https://doi.org/10.2991/cnct-16.2017.80
DO  - 10.2991/cnct-16.2017.80
ID  - HUANG2016/12
ER  -