Shicai Liu, Hailin Tang, Hongde Liu and Jinke Wang*. Pages 1-14.
Background: The advancement of bioinformatics and machine learning has facilitated the diagnosis of cancer and discovery of omics-based biomarkers.
Objective: Our study employed a novel data-driven approach to distinguish normal samples from different types of gastrointestinal cancer samples and to identify potential biomarkers for effective diagnosis and prognostic assessment of gastrointestinal cancer patients.
Methods: Several feature selection methods were applied, and the diagnostic performance of the proposed biosignatures was benchmarked using support vector machine (SVM) and random forest (RF) models.
Results: All models showed satisfactory performance, with Multilabel-RF performing best. The Multilabel-RF based model achieved an accuracy of 83.12%, with precision, recall, F1 and Hamming loss of 79.70%, 68.31%, 0.7357 and 0.1688, respectively. Moreover, the proposed biomarker signatures were strongly associated with multiple hallmarks of cancer. We also performed functional enrichment analysis and examined the impact of the biomarker candidates on patient prognosis.
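The reported metrics are internally consistent under standard multi-label definitions: the F1 of 0.7357 is the harmonic mean of the stated precision (79.70%) and recall (68.31%), and accuracy plus Hamming loss sums to 1 (0.8312 + 0.1688), which suggests, though the abstract does not say so, that accuracy was computed label-wise as the complement of Hamming loss. The sketch below shows how these metrics can be computed from binary indicator matrices (rows are samples, columns are labels). It is a minimal illustration only: micro-averaging is an assumption, since the abstract does not state which averaging scheme was used, and the toy matrices are made up for demonstration, not data from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    n_pairs = len(y_true) * len(y_true[0])
    wrong = sum(
        t != p
        for row_t, row_p in zip(y_true, y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / n_pairs


def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall and F1 (assumed averaging scheme)."""
    tp = fp = fn = 0
    for row_t, row_p in zip(y_true, y_pred):
        for t, p in zip(row_t, row_p):
            if t and p:
                tp += 1  # label present and predicted
            elif p:
                fp += 1  # label predicted but absent
            elif t:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): 4 samples, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

hl = hamming_loss(y_true, y_pred)          # 3 wrong pairs out of 12
p, r, f = micro_prf(y_true, y_pred)
print(f"Hamming loss={hl:.4f}, label-wise accuracy={1 - hl:.4f}")
print(f"precision={p:.4f}, recall={r:.4f}, F1={f:.4f}")

# Consistency check on the values reported in the abstract.
print(round(f1_from(0.7970, 0.6831), 4))   # harmonic mean of reported P and R
```

Applying `f1_from` to the reported precision and recall reproduces the reported F1 to four decimal places, which is a quick sanity check on any table of multi-label results.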
Conclusion: We introduce a robust workflow based on multi-label learning with high-throughput omics data for cancer diagnosis and the identification of novel biomarkers. We also propose novel transcriptomic biosignatures that may improve diagnostic accuracy in gastrointestinal cancer and warrant further validation in various clinical settings.
Keywords: gastrointestinal cancer, machine learning, multi-label learning, transcriptomics, diagnostic biomarkers
State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096