Article Details


A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis

[Vol. 14, Issue 1]

Author(s):

Nishith Kumar*, Md. Aminul Hoque, Md. Shahjaman, S.M. Shahinul Islam and Md. Nurul Haque Mollah

Pages 43-52 (10)

Abstract:


Background: The generation and quantification of metabolomics data differ from those of other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based metabolomics data, such as those from gas chromatography mass spectrometry (GC-MS) and liquid chromatography mass spectrometry (LC-MS), frequently contain missing values that complicate quantitative analysis. Metabolomics datasets typically contain 10% to 20% missing values, which arise for analytical, computational, and biological reasons. Imputing these missing values is therefore an important step before further metabolomics data analysis.

Objective: This paper introduces a new algorithm for missing value imputation in the presence of outliers for metabolomics data analysis.

Method: Currently, the best-known missing value imputation techniques for metabolomics data are k-nearest neighbours (kNN), random forest (RF), and zero imputation. However, these techniques are sensitive to outliers. In this paper, we propose an outlier-robust missing value imputation technique that imputes missing values in metabolomics data by minimizing a two-way empirical mean absolute error (MAE) loss function.
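To illustrate why an absolute-error criterion resists outliers, the following minimal sketch imputes a missing entry with a two-way additive fit (overall level plus row and column effects) estimated by Tukey's median polish. This is not the algorithm proposed in the paper, which is not specified in this abstract. Median polish is swapped in here only as a well-known, median-based two-way procedure. The function name `median_polish_impute`, the toy matrix, and the assumption that every row and column has at least one observed value are all hypothetical choices made for the example.

```python
import numpy as np

def median_polish_impute(X, n_iter=20, tol=1e-8):
    """Fill NaNs with a two-way additive fit estimated by median polish.

    Illustrative stand-in, NOT the paper's method. Assumes every row and
    every column contains at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    resid = X.copy()                      # residuals; NaNs stay NaN
    overall = 0.0
    row = np.zeros(X.shape[0])
    col = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Sweep row medians (ignoring NaNs) out of the residuals.
        r = np.nanmedian(resid, axis=1)
        resid -= r[:, None]
        row += r
        # Re-centre column effects, moving their median into the overall level.
        d = np.median(col)
        col -= d
        overall += d
        # Sweep column medians out of the residuals.
        c = np.nanmedian(resid, axis=0)
        resid -= c[None, :]
        col += c
        # Re-centre row effects in the same way.
        d = np.median(row)
        row -= d
        overall += d
        if max(np.abs(r).max(), np.abs(c).max()) < tol:
            break
    fitted = overall + row[:, None] + col[None, :]
    # Observed entries (including outliers) are kept; only NaNs are filled.
    return np.where(missing, fitted, X)

# Toy additive table: X[i, j] = a[i] + b[j], so the true X[0, 4] is 40.
a = np.arange(5.0)
b = 10.0 * np.arange(5.0)
X = a[:, None] + b[None, :]
X[4, 4] = 1000.0      # inject an outlier into column 4
X[0, 4] = np.nan      # hide one value in the same column

imputed = median_polish_impute(X)
robust_value = imputed[0, 4]             # median-based two-way estimate
mean_value = np.nanmean(X[:, 4])         # column-mean imputation for comparison
print(robust_value, mean_value)
```

On this toy table the column-mean estimate is pulled far from the true value by the single outlier, whereas the median-based fit is not. Real metabolomics matrices are noisier and not exactly additive, so the sketch conveys only the robustness idea, not the reported performance.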

Results: We investigated the performance of the proposed missing value imputation technique in comparison with the traditional imputation techniques, using both simulated and real data, in the absence and presence of outliers.

Conclusion: Results of both simulated and real data analyses show that the proposed outlier-robust missing value imputation technique performs better than the traditional missing value imputation methods in both the absence and the presence of outliers.

Keywords:

Metabolomics, missing data, missing value imputation, singular value decomposition (SVD), receiver operating characteristic (ROC) curve, support vector machine.

Affiliation:

Department of Statistics, Rajshahi University, Rajshahi-6205, Department of Statistics, Rajshahi University, Rajshahi-6205, Department of Statistics, Rajshahi University, Rajshahi-6205, Institute of Biological Sciences, Rajshahi University, Rajshahi-6205, Department of Statistics, Rajshahi University, Rajshahi-6205
