Sivaraj Rajappan* and DeviPriya Rangasamy, Pages 441-451
Background: Microarray gene expression datasets contain a huge volume of gene data for cancer analysis, but they often suffer from the “curse of dimensionality” and from missing values. These problems prevent analysts from extracting the right knowledge and often lead to unstable results.
Objective: To address both issues, this paper proposes a novel algorithm based on the Genetic Algorithm (GA).
Method: GA is commonly used for feature selection and for treating missing values in microarray datasets. However, it often converges prematurely because of insufficient exploration and exploitation. In the proposed Adaptive Genetic Algorithm (AGA), the genetic parameters are determined dynamically from the values in the current generation to improve the optimality of the solution. The population is divided into two sub-populations, and crossover and mutation are performed on these sub-populations in parallel, both to speed up execution and to make the population modular for these operations. In this paper, missing values are first imputed using AGA, and AGA is then used again to select significant features.
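The abstract does not give the AGA's update rules, so what follows is only a minimal Python sketch of the general idea applied to feature selection, under explicit assumptions: binary chromosomes (1 = gene kept), per-individual crossover and mutation probabilities adapted with a rule in the style of Srinivas and Patnaik's adaptive GA (not necessarily the authors' rule), tournament selection, single-point crossover, and elitism. The population is split into two halves that are evolved independently each generation. The names `adaptive_rates`, `evolve_subpop`, `adaptive_ga`, the constants `k_c`/`k_m`, and the toy fitness are all hypothetical; in practice the fitness would be something like a classifier's accuracy on the selected genes.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]  # 1 = gene (feature) kept, 0 = gene dropped


def adaptive_rates(f: float, f_avg: float, f_max: float,
                   k_c: float = 0.9, k_m: float = 0.1) -> Tuple[float, float]:
    """Hypothetical adaptive crossover/mutation probabilities (Srinivas-Patnaik style).

    Below-average individuals keep the full rates k_c, k_m (exploration);
    above-average ones get rates scaled toward 0 as f approaches f_max (exploitation).
    """
    if f < f_avg or f_max == f_avg:
        return k_c, k_m
    scale = (f_max - f) / (f_max - f_avg)
    return k_c * scale, k_m * scale


def evolve_subpop(pop: List[Chromosome], fitness: Callable[[Chromosome], float],
                  rng: random.Random) -> List[Chromosome]:
    """One generation for a single sub-population, with elitism."""
    scores = [fitness(c) for c in pop]
    f_avg, f_max = sum(scores) / len(scores), max(scores)

    def tournament() -> int:
        a, b = rng.sample(range(len(pop)), 2)
        return a if scores[a] >= scores[b] else b

    children: List[Chromosome] = []
    while len(children) < len(pop):
        i, j = tournament(), tournament()
        p1, p2 = pop[i][:], pop[j][:]
        # Crossover rate depends on the fitter of the two parents.
        p_c, _ = adaptive_rates(max(scores[i], scores[j]), f_avg, f_max)
        if rng.random() < p_c:
            cut = rng.randrange(1, len(p1))  # single-point crossover
            p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        # Mutation rate uses the parent's fitness, since the child is not yet scored.
        for child, f in ((p1, scores[i]), (p2, scores[j])):
            _, p_m = adaptive_rates(f, f_avg, f_max)
            children.append([1 - g if rng.random() < p_m else g for g in child])
    elite = pop[scores.index(f_max)][:]  # keep this sub-population's best individual
    return children[:len(pop) - 1] + [elite]


def adaptive_ga(n_features: int, fitness: Callable[[Chromosome], float],
                pop_size: int = 20, generations: int = 50, seed: int = 0) -> Chromosome:
    """Evolve two sub-populations per generation and return the fittest chromosome."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    half = pop_size // 2
    for _ in range(generations):
        # The two sub-populations are independent, so they could be evolved in parallel.
        pop = evolve_subpop(pop[:half], fitness, rng) + evolve_subpop(pop[half:], fitness, rng)
        rng.shuffle(pop)  # mix the halves before the next split
    return max(pop, key=fitness)


# Toy usage: a made-up fitness that rewards agreement with a hidden "informative gene" mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]


def toy_fitness(c: Chromosome) -> float:
    return sum(int(a == b) for a, b in zip(c, TARGET))


best = adaptive_ga(len(TARGET), toy_fitness, generations=40, seed=3)
print(best, toy_fitness(best))
```

Here the two halves run sequentially with one shared random generator to keep the sketch deterministic; dispatching them to separate workers (for example with `concurrent.futures.ProcessPoolExecutor`) would need one generator per sub-population. Because each half keeps its own best individual, the best fitness in the whole population never decreases from one generation to the next.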
Results: The proposed methodology is applied to several real microarray datasets to impute values at different missing proportions and to select prominent features. The datasets processed with AGA are found to give better results than those processed with standard methods.
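The abstract does not say how the missing proportions were produced or how imputation quality was scored. A common protocol, assumed here purely for illustration, is to hide a chosen fraction of the observed entries, impute them, and compare the imputed values with the hidden true values using the normalized root mean squared error (NRMSE). The Python sketch below uses gene-wise (row) mean imputation as a hypothetical stand-in baseline; an AGA-based imputer would be passed as the `impute` argument instead. The function names, the proportions, and the NRMSE definition (RMSE divided by the standard deviation of the hidden true values) are assumptions, not the paper's reported setup.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Matrix = List[List[Optional[float]]]  # rows = genes, columns = samples; None = missing


def mask_entries(data: Matrix, proportion: float,
                 seed: int = 0) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Hide a fraction of the observed entries; return the masked copy and hidden positions."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(data) for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(observed, round(proportion * len(observed)))
    masked = [row[:] for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def row_mean_impute(data: Matrix) -> List[List[float]]:
    """Stand-in baseline: fill each gap with the mean of that gene's observed values."""
    filled = []
    for row in data:
        seen = [v for v in row if v is not None]
        mean = sum(seen) / len(seen) if seen else 0.0
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(truth: Matrix, imputed: List[List[float]], hidden: List[Tuple[int, int]]) -> float:
    """RMSE over the hidden cells divided by the std of their true values (assumed definition)."""
    t = [truth[i][j] for i, j in hidden]
    e = [imputed[i][j] for i, j in hidden]
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(t, e)) / len(t))
    mu = sum(t) / len(t)
    sd = math.sqrt(sum((a - mu) ** 2 for a in t) / len(t))
    return rmse / sd if sd > 0 else rmse


def evaluate(data: Matrix, impute: Callable[[Matrix], List[List[float]]],
             proportions: Sequence[float] = (0.05, 0.1, 0.2),
             seed: int = 0) -> Dict[float, float]:
    """NRMSE of `impute` at each (hypothetical) missing proportion."""
    scores = {}
    for p in proportions:
        masked, hidden = mask_entries(data, p, seed)
        if hidden:
            scores[p] = nrmse(data, impute(masked), hidden)
    return scores


# Toy usage on a made-up 4-gene x 5-sample matrix.
toy = [[1.0, 1.2, 0.9, 1.1, 1.0],
       [2.0, 2.1, 1.9, 2.2, 2.0],
       [0.5, 0.4, 0.6, 0.5, 0.5],
       [3.0, 2.8, 3.1, 3.0, 2.9]]
print(evaluate(toy, row_mean_impute))
```

Only entries that were originally observed are hidden, so the true value is always known at every scored position; any gaps already present in the data are filled by the imputer but excluded from the score.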
Conclusion: AGA can be applied successfully to any dataset in which the number of features is large and missing values are present. It preprocesses such datasets and prepares them for better classification.
Keywords: Microarray dataset, feature selection, missing values, genetic algorithm, classification, Adaptive Genetic Algorithm.
Department of Computer Science and Engineering, Velalar College of Engineering and Technology, Erode, Tamil Nadu; Department of Information Technology, Kongu Engineering College, Erode, Tamil Nadu