
Article Details


PEPRF: Identification of Essential Proteins by Integrating Topological Features of PPI Network and Sequence-based Features via Random Forest

[ Vol. 16 , Issue. 9 ]

Author(s):

Chuanyan Wu*, Bentao Lin, Kai Shi, Qingju Zhang, Rui Gao*, Zhiguo Yu*, Yang De Marinis*, Yusen Zhang and Zhi-Ping Liu

Pages 1161 - 1168 (8)

Abstract:


Background: Essential proteins play an important role in the processes of life and can be identified by either experimental or computational methods. Experimental approaches are highly accurate but time- and resource-consuming.

Objective: Herein, we present a computational model (PEPRF) to identify essential proteins based on machine learning.

Methods: Two groups of protein features were extracted. Topological features were derived from the Protein-Protein Interaction (PPI) network. From the protein sequences, graph theory-based features, information-based features, composition and physicochemical features, etc., were extracted. In total, 282 features were constructed. To select the features that contributed most to the identification, a ReliefF-based feature selection method was adopted to measure the weights of these features.
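To illustrate the idea behind Relief-style feature weighting, the following is a minimal sketch of the basic Relief algorithm for two classes, using one nearest hit and one nearest miss per sample. It is not the authors' implementation: ReliefF proper averages over k nearest neighbours and handles more than two classes, and the function name `relief_weights` and the toy data are assumptions made for illustration only.

```python
def relief_weights(X, y):
    """Basic Relief weights (binary labels, 1 nearest hit and 1 nearest miss).

    Simplified stand-in for ReliefF: every sample is visited once instead of
    being drawn at random, and only a single neighbour per class is used.
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        # Manhattan distance over the scaled features.
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss and
            # penalise features that differ on the nearest hit.
            weights[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return weights


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relief_weights(X, y)
ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
```

Features would then be ranked by weight and the top-ranked ones retained; the cutoff used to arrive at the selected subset is not stated in the abstract.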

Results: A total of 212 features were selected to train random forest classifiers. PEPRF achieved an AUC of 0.71 and an accuracy of 0.742.
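For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute them from classifier outputs. The abstract does not describe the evaluation protocol, so the pairwise (Mann-Whitney) formulation of AUC, the 0.5 decision threshold, and the toy labels and scores are all assumptions, not the paper's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical): 1 = essential, 0 = non-essential.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.7, 0.3]          # e.g. predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)
acc = accuracy(y_true, y_pred)
```

AUC depends only on how the scores rank the proteins, whereas accuracy depends on the chosen threshold, which is why the two values can differ.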

Conclusion: Our results show that PEPRF may be applied as an efficient tool to identify essential proteins.

Keywords:

Essential protein prediction, graph energy, feature extraction, ReliefF-based feature selection, random forest classifier, PEPRF.

Affiliation:

School of Intelligent Engineering, Shandong Management University, Jinan 250357, School of Intelligent Engineering, Shandong Management University, Jinan 250357, Department of Traffic Engineering, Shandong Transport Vocational College, Weifang 261000, School of Intelligent Equipment, Shandong University of Science and Technology, Taian 271019, School of Control Science and Engineering, Shandong University, Jinan 250061, School of Intelligent Engineering, Shandong Management University, Jinan 250357, Diabetes and Endocrinology, Lund University, Malmo 20502, School of Mathematics and Statistics, Shandong University, Weihai 264209, School of Control Science and Engineering, Shandong University, Jinan 250061
