Chaokun Yan, Mengyuan Li, Jingjing Ma, Yi Liao, Huimin Luo*, Jianlin Wang* and Junwei Luo
Background: The massive amount of biomedical data accumulated in the past decades can be utilized for diagnosing disease.
Objective: However, such data are typically high-dimensional, with small sample sizes and many irrelevant features, which degrades both the accuracy and the speed of disease prediction. Without feature selection, some existing machine learning models cannot accurately capture the patterns in these datasets.
Methods: Filter and wrapper methods are the two prevailing approaches to feature selection. Filter methods are fast but yield lower prediction accuracy, while wrapper methods achieve high accuracy at a formidable computational cost. Given the drawbacks of using either approach alone, we propose a novel feature selection method, MRMR-EFPATS, which hybridizes the minimum redundancy maximum relevance (MRMR) filter with a wrapper based on an improved flower pollination algorithm (FPA). First, MRMR is employed to rank the features and quickly retain the most important ones. These features are then used to build the individuals of the population for the wrapper stage, which speeds up convergence and reduces computation time. Then, owing to its efficiency and flexibility, FPA is adopted to search for an optimal feature subset.
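The filter stage described above can be illustrated with a minimal sketch of greedy MRMR ranking. It assumes discrete features, uses the difference form of the criterion (relevance minus mean redundancy), and estimates mutual information from simple counts; the function names `mutual_information` and `mrmr_rank` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log

def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def mrmr_rank(columns, labels, k):
    """Greedily pick k features, each maximizing relevance minus mean redundancy."""
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (assumed for illustration): the label is determined by features a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
a_copy = list(a)                      # fully redundant with a
noise = [0, 1, 0, 1, 0, 1, 0, 1]      # independent of the label
labels = [2 * x + y for x, y in zip(a, b)]
columns = [a, a_copy, b, noise]

top2 = mrmr_rank(columns, labels, 2)
print(top2)  # [0, 2]
```

In this toy case the duplicated feature is as relevant as the original, but the redundancy penalty makes the ranking prefer the complementary feature `b` as the second pick.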
Result: The standard FPA still has some drawbacks: a slow convergence rate, a limited ability to explore new solutions, and a tendency to become trapped in local optima. In our work, an elite strategy is adopted to improve the convergence speed of FPA, while tabu search and adaptive Gaussian mutation are employed to improve its search capability and help it escape from local optima. A KNN classifier with 5-fold cross-validation (5-fold CV) is used to evaluate classification accuracy.
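As a rough illustration of the evaluation step, the sketch below scores a candidate feature subset (a 0/1 mask) by the mean accuracy of a KNN classifier over 5 folds. Several details are assumptions made for the example and are not stated in the abstract: the neighbor count `k=1`, squared Euclidean distance, the interleaved (unshuffled, unstratified) fold assignment, and the names `knn_predict` and `cv_accuracy`.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def cv_accuracy(X, y, mask, k=1, folds=5):
    """Mean cross-validated KNN accuracy using only the features where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    accuracies = []
    for f in range(folds):
        # Assumed fold layout: sample i belongs to fold i % folds.
        test_idx = [i for i in range(len(Xs)) if i % folds == f]
        train_idx = [i for i in range(len(Xs)) if i % folds != f]
        train_X = [Xs[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, Xs[i], k) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / folds

# Toy data (assumed): feature 0 separates the classes, feature 1 is large-scale noise
# arranged so that each sample's nearest neighbor on it belongs to the other class.
y = [i % 2 for i in range(10)]
X = [[10.0 * (i % 2) + 0.1 * (i // 2), 100.0 * (i // 2) + (i % 2)] for i in range(10)]

acc_informative = cv_accuracy(X, y, [1, 0])
acc_all = cv_accuracy(X, y, [1, 1])
print(acc_informative, acc_all)  # 1.0 0.0
```

A wrapper such as the improved FPA would call a function like `cv_accuracy` as the fitness of each candidate mask; the contrived data here simply show how an irrelevant feature can dominate the distance and lower the score.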
Conclusion: Extensive experiments on six public high-dimensional biomedical datasets show that the proposed MRMR-EFPATS achieves superior performance compared with other state-of-the-art methods.
Keywords: Feature selection, flower pollination algorithm, MRMR, elite strategy, adaptive Gaussian mutation, tabu search.
School of Computer and Information Engineering, Henan University, Kaifeng; School of Computer and Information Engineering, Henan University, Kaifeng; Centralized Operation Center, China Mobile Online Service Co. Ltd; Academy of Arts & Design, Tsinghua University, Beijing; School of Computer and Information Engineering, Henan University, Kaifeng; School of Computer and Information Engineering, Henan University, Kaifeng; College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo