Sun Can Zhuang and Feng Yonge*
Background: Intrinsically disordered proteins lack a well-defined three-dimensional structure under physiological conditions. They perform multiple functions in life activities and are closely related to many human diseases. Identifying the disordered regions of intrinsically disordered proteins is important for protein function annotation.
Objective: To accurately identify the disordered regions in intrinsically disordered proteins.
Method: In this article, we constructed a multi-feature fusion model based on a support vector machine to predict the disordered regions of intrinsically disordered proteins from the DisProt database. As features, we extracted codon usage frequencies, GC content, protein secondary structure components, hydrophilic-hydrophobic amino acid components, and chemical shifts.
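As an illustration of two of the features listed above, the following minimal sketch computes codon usage frequencies and GC content from a coding sequence and concatenates them into one fused feature vector. The function names (`codon_frequencies`, `gc_content`, `fused_features`), the 64-codon ordering, and the example sequence are assumptions made for illustration; the abstract does not specify the exact feature definitions, any windowing scheme, or the support vector machine settings, so the classifier step is left out.

```python
from itertools import product

BASES = "ACGT"
# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(p) for p in product(BASES, repeat=3)]


def codon_frequencies(cds):
    """Return the relative frequency of each of the 64 codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    # Read the sequence in frame, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]


def gc_content(cds):
    """Return the fraction of G and C among the unambiguous bases."""
    valid = [b for b in cds.upper() if b in "ACGTU"]
    if not valid:
        return 0.0
    return sum(b in "GC" for b in valid) / len(valid)


def fused_features(cds):
    """Concatenate the codon frequencies and GC content into one vector (hypothetical fusion)."""
    return codon_frequencies(cds) + [gc_content(cds)]


# Hypothetical example sequence: ATG GCC GCT TAA
features = fused_features("ATGGCCGCTTAA")
print(len(features))
```

A vector built this way could be passed to any support vector machine implementation; fusing features by simple concatenation is one common choice and is assumed here rather than taken from the article.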
Results: In single-feature prediction, the best accuracy was 82.098%, obtained with codon frequencies. To improve performance, we fused the features and obtained the best result, 83.173%, by combining codon frequencies with chemical shifts.
Conclusion: The results show that our model achieves good performance in predicting the disordered regions of intrinsically disordered proteins. Moreover, the performance of our model is better than that of existing methods.
Keywords: Intrinsically disordered proteins, Support vector machine, GC content, Codon usage frequency, Chemical shifts
College of Science, Inner Mongolia Agriculture University, Hohhot 010018