Liuyuan Chen, Juntao Li* and Mingming Chang
Diagnosing cancer and identifying disease genes from DNA microarray gene expression data are hot topics in current bioinformatics. This paper is devoted to recent developments in cancer diagnosis and gene selection via statistical machine learning. The support vector machine is first introduced for binary cancer diagnosis. Then the 1-norm support vector machine, the doubly regularized support vector machine, the adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing simultaneous binary cancer classification and gene selection. For multiclass cancer diagnosis, three strategies for reducing multiclass problems to binary ones are introduced, together with methods that consider all classes of data directly in a single learning model (multi-class support vector machine, sparse multinomial regression, adaptive multinomial regression and so on). Limitations and promising directions are also discussed.
Cancer diagnosis, gene selection, machine learning, support vector machine, lasso, group lasso
Journal Editorial Office, Henan Normal University, Xinxiang, 453007; Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007; Henan Engineering Laboratory for Big Data Statistical Analysis and Optimal Control, College of Mathematics and Information Science, Henan Normal University, Xinxiang, 453007