R. Devi Priya* and R. Sivaraj. Pages 1-15 (15)
Background: Microarray gene expression datasets usually contain a large number of genes, which complicates downstream operations such as classification, clustering and other kinds of analysis. During classification, identifying the salient genes is a challenging task that requires careful selection.
Methods: Classification on multiclass datasets is more demanding than binary classification. When there are multiple class labels, the datasets are more likely to be imbalanced. The number of samples belonging to each class can vary widely, and hence the classification process may become biased when unsuitable samples are chosen for training. Little research is available that addresses all three of these scenarios (a large number of genes, multiple classes and class imbalance) together in microarray datasets.
Results and Discussion: The paper fills this gap with the following contributions: i) it selects salient genes for classification using the MultiSURF algorithm; ii) it identifies the right instances from imbalanced datasets using the Retained Tomek Link algorithm; and iii) it performs gene selection for multiclass classification using Dynamic Length Particle Swarm Optimization (DPSO).
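As an illustration of the instance-selection idea in contribution ii), the following is a minimal sketch of how standard Tomek links can be detected: two samples form a Tomek link when they are each other's nearest neighbour and carry different class labels. This shows only the textbook definition, not the paper's Retained Tomek Link variant, whose details are not given here. The function names and the toy data are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the sample closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return pairs (i, j) with i < j that are mutual nearest
    neighbours and belong to different classes (standard Tomek links)."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


# Hypothetical toy data: two expression-like features per sample.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.2, 0.0), (3.0, 0.0), (3.1, 0.0)]
labels = ["A", "A", "A", "B", "B", "B"]

links = tomek_links(points, labels)
print(links)
```

In the toy data, only samples 2 and 3 sit on the class boundary as mutual nearest neighbours with different labels, so they are the only pair reported. Which member of a link is removed or kept is a design decision that differs between Tomek-link variants.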
Conclusion: The proposed method is implemented on multiclass imbalanced microarray datasets, and the final classification performance is encouraging and better than that of the other compared methods.
Keywords: Feature weighting, retained Tomek link, dynamic PSO, apriori algorithm, microarray datasets
Affiliation: Department of Information Technology, Kongu Engineering College; Department of Computer Science and Engineering, Nandha Engineering College