Shulin Zhao, Ying Ju, Xiucai Ye, Jun Zhang* and Shuguang Han. Pages 1-12 (12)
Background: Bioluminescence is a unique and significant phenomenon in nature. It is important for the lifecycle of some organisms and is valuable in biomedical research, including gene expression analysis and bioluminescence imaging technology. In recent years, researchers have identified a number of methods for predicting bioluminescent proteins (BLPs). These methods have increased in accuracy but could be further improved.
Method: In this paper, we propose a new method for predicting bioluminescent proteins based on a voting algorithm. We used four feature extraction methods based on the amino acid sequence. In total, we extracted 314-dimensional features from amino acid composition, physicochemical properties and k-spacer amino acid pair composition. To obtain the highest MCC value and thereby establish the optimal prediction model, we used a voting algorithm to build the model. To create the best-performing model, we discuss the selection of base classifiers and vote counting rules.
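The feature extraction and voting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the k-spacer encoding shown is the generic 400-dimensional version rather than the 314-dimensional feature set reported here, and the vote counting rule is assumed to be simple majority (hard) voting.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-dimensional vector: the fraction of each residue in seq."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]


def k_spacer_pair_composition(seq, k):
    """Return a 400-dimensional vector: the frequency of each residue pair
    whose two members are separated by exactly k other residues."""
    n_pairs = len(seq) - k - 1
    pairs = Counter((seq[i], seq[i + k + 1]) for i in range(n_pairs))
    total = max(n_pairs, 1)  # avoid division by zero on very short sequences
    return [pairs[(a, b)] / total for a, b in product(AMINO_ACIDS, repeat=2)]


def majority_vote(predictions):
    """Hard voting (assumed rule): predictions holds one list of labels per
    base classifier; the most frequent label wins for each sample."""
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


# Three hypothetical base classifiers voting on three samples.
base_outputs = [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
print(majority_vote(base_outputs))
```

Using an odd number of base classifiers avoids ties in a two-class problem. Other vote counting rules, such as weighting votes by classifier confidence, would replace only the `majority_vote` function.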
Results: Our proposed model achieved 93.4% accuracy, 93.4% sensitivity and 91.7% specificity on the test set, outperforming the other methods. We also applied our model building method to a previous prediction of bioluminescent proteins in three lineages, which greatly improved its accuracy.
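For reference, the reported metrics and the MCC named in the Method paragraph follow the standard confusion-matrix definitions. The sketch below shows how they are computed. The function name is hypothetical and the counts in the example are made up for illustration; they are not taken from this study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and MCC from the counts of
    true positives, false positives, true negatives and false negatives."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when the denominator is 0.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (not results from the paper).
print(binary_metrics(tp=9, fp=1, tn=9, fn=1))
```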
Keywords: Bioluminescent proteins, Prediction, Feature extraction, Voting algorithm, Base classifiers, Vote counting rules.
Affiliation: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu; School of Informatics, Xiamen University, Xiamen; Department of Computer Science, University of Tsukuba, Tsukuba Science City; Rehabilitation Department, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin; Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu