Md. Shahjaman*, Nishith Kumar and Md. Nurul Haque Mollah
DNA microarray technology allows researchers to measure the expression levels of thousands of genes simultaneously. The main objective of microarray gene expression (GE) data analysis is to detect biomarker genes that are differentially expressed (DE) between two or more experimental groups/conditions. Several popular statistical methods for selecting biomarker genes exist in the literature. However, most of them tend to produce misleading results in the presence of outliers. Therefore, in this paper, we propose an outlier detection and modification rule to improve the performance of popular gene selection methods. We investigate the performance of the proposed method in comparison with the traditional methods using both simulated and real gene expression data. In an analysis of real colon cancer gene expression data, the proposed method detected fourteen (14) additional DE genes that were not detected by the traditional methods. Using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we observed that these 14 additional DE genes are involved in three important metabolic pathways of cancer. The proposed method also detected nine (9) additional DE genes in a separate head-and-neck cancer gene expression dataset; these genes are involved in the top ten metabolic pathways obtained from the KEGG pathway database.
Gene expression data, Outlier detection and modification, DE gene, Statistical methods and robustness
Department of Statistics, Begum Rokeya University, Rangpur, Rangpur-5400; Department of Statistics, Bangabandhu Sheikh Mujibur Rahman Science and Technology University, Gopalganj; Laboratory of Bioinformatics, University of Rajshahi, Rajshahi-6205