Liang Kong*, Lichao Zhang and Shiqian He
Background: Gram-negative bacteria interact with their environment by secreting a wide range of particular substrates (such as proteins) across two lipid bilayers from the cytoplasm to the extracellular space. Determining the types of secreted proteins is beneficial for further research on secreted proteins and secretion systems.
Objective: This study proposes an accurate machine learning-based method for predicting multiple types of Gram-negative bacterial secreted proteins, as an alternative to experimental methods.
Method: The main contribution is the combination of auto-cross-correlation analysis and feature ranking to build an effective support vector machine-based predictor of multi-type Gram-negative bacterial secreted proteins. The specifically designed auto-cross-correlation descriptor captures evolutionary correlation information between amino acid pairs along the protein sequence from position-specific scoring matrices. A feature ranking technique was used to analyze and select the most informative features for building the prediction model.
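As an illustration of the descriptor idea, the following is a minimal sketch of an auto-cross-covariance transform over a position-specific scoring matrix. It assumes the commonly used formulation (lagged products of mean-centred PSSM columns, averaged over positions); the exact descriptor, normalization, and lag range used in this study are not given in the abstract, so the function name `acc_features` and the parameter `max_lag` are hypothetical.

```python
def acc_features(pssm, max_lag):
    """Auto-cross-covariance features from a PSSM.

    pssm    : list of rows, one per residue; each row holds one score per
              amino acid type (20 columns for a standard PSSM).
    max_lag : largest sequence separation considered (assumed parameter).

    Returns a flat list of max_lag * n_cols * n_cols values. Entries with
    a == b are auto-covariances; entries with a != b are cross-covariances.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Column means: average score of each amino acid type over the sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    # Centre every column once so the lagged products are covariances.
    centred = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of residue pairs separated by `lag`
        for a in range(n_cols):
            for b in range(n_cols):
                total = sum(centred[j][a] * centred[j + lag][b]
                            for j in range(span))
                features.append(total / span)
    return features


if __name__ == "__main__":
    # Toy 4-residue, 2-column matrix; a real PSSM has 20 columns.
    toy = [[1, 0], [0, 1], [1, 0], [0, 1]]
    print(acc_features(toy, max_lag=1))
```

Under these assumptions, every protein maps to a fixed-length vector regardless of sequence length, which is what allows the features to be ranked and passed to a support vector machine.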
Results: Prediction accuracies obtained by independent dataset tests are reported on two benchmark datasets. Compared with state-of-the-art prediction methods, the proposed method improves overall accuracy by 2.91% and 2.25% on the two datasets, respectively.
Conclusion: Our study provides an important guide to using protein evolutionary information in further research on bacterial secreted proteins.
Keywords: Gram-negative bacteria, secreted proteins, position-specific scoring matrix, auto-cross-correlation, feature ranking, support vector machine
School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China; School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China