

Identification of Disease-specific Single Amino Acid Polymorphisms Using a Simple Random Forest at Protein-level

Vol. 16, Issue 10


Jian He, Rongao Yuan, Lei Xu, Yanzhi Guo* and Menglong Li*
Pages 1278-1287 (10 pages)


Background: The number of human genetic variants deposited in publicly available databases has been increasing exponentially. Among these variants, non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), have been shown to be strongly correlated with phenotypic variation in traits and diseases.

Objective: However, the detailed mechanisms governing the disease association of SAPs remain unclear. Because a large number of potentially disease-related SAPs have yet to be characterized, investigating new attributes and improving prediction has become increasingly urgent.

Methods: Based on the Random Forest (RF) algorithm, we first constructed a new model that predicts, from protein sequences alone, whether an SAP is associated with a particular disease. Four common sequence-feature extraction methods were applied separately to select the optimal features. The length of the SAP-containing peptide was then optimized over a range of 12 to 202 residues.
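The abstract does not name the four feature-extraction methods or say exactly how the SAP peptide is cut from the protein. The sketch below is a hypothetical illustration under stated assumptions: a window of `length` residues is taken around a 1-based SAP position with the mutant residue substituted in, positions beyond either terminus are padded with `X`, and amino acid composition (one common sequence feature, not necessarily one the authors used) is computed over the window. The names `sap_window` and `composition`, and the example protein, are illustrative and do not come from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sap_window(sequence: str, position: int, mutant: str, length: int, pad: str = "X") -> str:
    """Hypothetical helper: return a peptide of `length` residues around a
    1-based SAP position, with the mutant residue substituted in.
    The SAP sits at index length // 2; out-of-range positions are padded."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position out of range")
    idx = position - 1
    mutated = sequence[:idx] + mutant + sequence[idx + 1:]
    start = idx - length // 2
    return "".join(
        mutated[i] if 0 <= i < len(mutated) else pad
        for i in range(start, start + length)
    )


def composition(peptide: str) -> list[float]:
    """20-dimensional amino acid composition; padding characters are ignored."""
    counts = [peptide.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    # Hypothetical protein and SAP (position 3, T -> V); not data from the paper.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    for length in (12, 17, 22):
        window = sap_window(protein, 3, "V", length)
        print(length, window, round(sum(composition(window)), 6))
```

Repeating this for every candidate length (for example `range(12, 203, 10)`) would give one feature set per length to train and compare RF classifiers on. The step size there is an assumption, since the abstract reports only the 12 to 202 range.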

Results: The optimal models achieve accuracy above 90% and an Area Under the Curve (AUC) above 0.9 on all 11 external test datasets. In addition, an accuracy above 95% on an independent test set further demonstrates the superiority of our method.
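The abstract reports accuracy and AUC without giving formulas. As a minimal sketch of the standard definitions (not necessarily the authors' implementation), accuracy is the fraction of correct class predictions, and AUC can be computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half (the Mann-Whitney formulation). The 0.5 decision threshold and the example labels and scores below are assumptions for illustration only.

```python
def predict(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Turn classifier scores into 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(labels: list[int], predictions: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as 0.5 (equivalent to the area under the ROC curve)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = disease-associated SAP) and RF scores.
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
    print(accuracy(y, predict(s)), roc_auc(y, s))
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based or library implementation would be preferable for large test sets.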

Conclusion: In this paper, we used Random Forest (RF) to construct 11 disease-association prediction models for SAPs at the protein sequence level. All models achieve prediction accuracy above 90% and AUC above 0.9. Because our method uses only protein sequence information, it is more broadly applicable than methods that depend on additional information about, or predictions for, the proteins.


Keywords: Single amino acid polymorphisms, random forest, protein sequence, disease-specific prediction, binding site, optimal features.


Affiliations: College of Chemistry, Sichuan University, Chengdu (Jian He, Lei Xu, Yanzhi Guo, Menglong Li); College of Computer Science, Sichuan University, Chengdu (Rongao Yuan)

