Md. Bipul Hossen* and Md. Siraj-Ud-Doulah. Pages 558-562 (5)
Background: Cluster analysis of gene expression microarray data is of increasing interest in bioinformatics. One reason is the need for molecular-based refinement of broadly defined biological classes, with implications for cancer diagnosis, prognosis and treatment. Many algorithms have been developed for this problem.
Objective: Microarray data frequently include outliers, however, and the objective of this study is to examine how these outliers affect the subsequent clustering analysis.
Method: In this paper, we present a large-scale analysis of seven different agglomerative hierarchical clustering methods and five proximity measures applied to 33 cancer gene expression datasets. As a case study, we used two experimental datasets, Affymetrix and cDNA, and artificially added different percentages of outliers to these datasets.
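As an illustration of one proximity measure named in this abstract, the sketch below computes a Spearman-based distance between two expression profiles in plain Python. The definition used here, one minus the Spearman rank correlation, is a common convention and an assumption on our part; the abstract does not state the exact formula. The function names and the toy profiles are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def spearman_distance(x, y):
    """Assumed definition: 1 - Spearman rho (Pearson correlation of the ranks)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))


def pearson_distance(x, y):
    """1 - Pearson correlation on the raw values, for comparison."""
    return 1.0 - pearson(x, y)


# Toy profiles: y tracks x except for one extreme (outlying) value.
profile_x = [1.0, 2.0, 3.0, 4.0, 5.0]
profile_y = [2.0, 4.0, 6.0, 8.0, 1000.0]
d_spearman = spearman_distance(profile_x, profile_y)
d_pearson = pearson_distance(profile_x, profile_y)
```

Because the outlying value changes the magnitudes but not the ordering, the rank-based distance between the two toy profiles stays at zero while the Pearson-based distance does not. This is one plausible reason a rank-based measure behaves well on contaminated data, though it is not a claim made in the abstract.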
Results: We found that the Ward method gives the highest corrected Rand index value with the Spearman proximity measure, both when the datasets contain outliers and when they do not.
Conclusion: This study shows that the Ward method is more robust than the other clustering methods considered for gene expression data analysis.
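For readers unfamiliar with the evaluation criterion, the following is a minimal sketch of the corrected (adjusted) Rand index in its standard Hubert and Arabie form, which compares a clustering against reference class labels and corrects for chance agreement. We assume this is the variant meant by the abstract; the function name and the example labels are hypothetical.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_true, labels_pred):
    """Corrected (adjusted) Rand index between two partitions of the same items.

    Returns 1.0 for identical partitions (up to relabeling) and values near
    0.0 for agreement at chance level; negative values are possible.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivial agreement

    # Contingency counts: items sharing a (true class, predicted cluster) pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # expected index under chance
    max_index = (sum_rows + sum_cols) / 2        # upper bound of the index
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical labels: the same grouping under different cluster names,
# and a grouping that splits every true class evenly.
same_grouping = corrected_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_grouping = corrected_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

In a comparison like the one described here, each combination of clustering method and proximity measure would yield one such value per dataset, with higher values indicating closer agreement with the known classes.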
Keywords: Agglomerative hierarchical clustering, corrected Rand index, microarray gene expression data, outlier, proximity measures.
Department of Statistics, Faculty of Science, Begum Rokeya University, Rangpur-5400