Xuefei Peng, Lei Chen* and Jian-Peng Zhou
Pages 1-13 (13)
Cancer is the second leading cause of death worldwide. To date, many factors have been confirmed to cause cancer, and carcinogenic chemicals are widely accepted as one of the most important. Traditional methods for detecting carcinogenic chemicals are inefficient and costly, so effective computational methods that support large-scale detection are urgently needed. In this study, a new computational model is proposed for detecting carcinogenic chemicals. Because the model is data-driven, carcinogenic and non-carcinogenic chemicals were obtained from the Carcinogenic Potency Database (CPDB). Each chemical was represented by features extracted with the network embedding method Mashup from five chemical networks, each encoding one type of chemical association. These features were fed into a recurrent neural network, a deep learning method, to build the model. Under the jackknife test, the model achieved an F-measure of 0.971 and an AUROC of 0.971. Extensive comparisons suggest that the proposed model is superior to models built with traditional machine learning algorithms, a classic chemical encoding scheme, or direct use of chemical associations.
carcinogenicity, carcinogenic chemical, network embedding method, deep learning, recurrent neural network
College of Information Engineering, Shanghai Maritime University, Shanghai 201306
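
The abstract reports a jackknife (leave-one-out) test summarized by F-measure and AUROC. The following is a minimal sketch of that evaluation loop, not the authors' implementation. It assumes toy two-dimensional feature vectors in place of the Mashup embeddings and swaps in a simple nearest-centroid scorer for the recurrent neural network. All function names (`f_measure`, `auroc`, `score_sample`, `jackknife`) and the data are hypothetical.

```python
import math


def f_measure(y_true, y_pred):
    """F-measure (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def score_sample(x, train_X, train_y):
    """Stand-in scorer (assumption): higher means closer to the positive class.

    This nearest-centroid rule replaces the paper's recurrent neural network
    purely to keep the sketch dependency-free.
    """
    pos_c = centroid([r for r, t in zip(train_X, train_y) if t == 1])
    neg_c = centroid([r for r, t in zip(train_X, train_y) if t == 0])
    return math.dist(x, neg_c) - math.dist(x, pos_c)


def jackknife(X, y):
    """Hold out each sample once, score it with a model fit on the rest."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(score_sample(X[i], train_X, train_y))
    preds = [1 if s > 0 else 0 for s in scores]
    return f_measure(y, preds), auroc(y, scores)


# Toy data (hypothetical): 1 = carcinogenic, 0 = non-carcinogenic.
X = [[0.9, 0.8], [1.0, 1.1], [0.8, 1.0], [0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]
y = [1, 1, 1, 0, 0, 0]
f_score, auc = jackknife(X, y)
```

In a jackknife test every chemical is predicted exactly once by a model that never saw it during training, so both metrics are computed over the full dataset from held-out predictions only.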