Journal of Statistical Theory and Applications

Volume 15, Issue 3, September 2016, Pages 221 - 236

Missing Value Imputation for RNA-Sequencing Data Using Statistical Models: A Comparative Study

Authors
Taban Baghfalaki, Mojtaba Ganjali, Damon Berridge
Corresponding Author
Taban Baghfalaki
Received 13 August 2013, Accepted 4 April 2016, Available Online 1 September 2016.
DOI
10.2991/jsta.2016.15.3.3
Keywords
Bayesian approach; Clustering analysis; EM algorithm; Missing data analysis; RNA-seq data set.
Abstract

RNA-seq technology has been widely used as an alternative to traditional microarrays in transcript analysis. Gene expression profiling by sequencing, which generates RNA-seq data sets, sometimes yields missing read counts. These missing values can adversely affect downstream analyses, and most methods for analysing RNA-seq data sets require a complete data matrix. In the past few years, researchers have put a great deal of effort into evaluating different imputation algorithms for microarray gene expression data sets. However, there are few such works for RNA-seq data sets, so a comparative study of the performance of missing value imputation for RNA-seq data is essential. In this paper, we propose the use of several parametric models for single imputation of missing values in RNA-seq count data sets: regression imputation, a Bayesian generalized linear model, a Poisson mixture model, an EM approach, Bayesian Poisson regression, Bayesian quasi-Poisson regression, and bootstrap versions of the latter two. The approaches are also applied to identifying differentially expressed genes in the presence of missing values. Multiple imputation, proposed by Rubin (1978), is also applied to missing RNA-seq counts; this approach allows appropriate assessment of imputation uncertainty for missing values. The performance of the single and multiple imputation methods is investigated using simulation studies, and several real data sets are analyzed using the proposed approaches.

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Journal
Journal of Statistical Theory and Applications
Volume-Issue
15 - 3
Pages
221 - 236
Publication Date
2016/09/01
ISSN (Online)
2214-1766
ISSN (Print)
1538-7887

Cite this article

TY  - JOUR
AU  - Taban Baghfalaki
AU  - Mojtaba Ganjali
AU  - Damon Berridge
PY  - 2016
DA  - 2016/09/01
TI  - Missing Value Imputation for RNA-Sequencing Data Using Statistical Models: A Comparative Study
JO  - Journal of Statistical Theory and Applications
SP  - 221
EP  - 236
VL  - 15
IS  - 3
SN  - 2214-1766
UR  - https://doi.org/10.2991/jsta.2016.15.3.3
DO  - 10.2991/jsta.2016.15.3.3
ID  - Baghfalaki2016
ER  -