HaiXia Long, Mi Wang* and HaiYan Fu. Pages 233-238 (6)
Background: Protein hydroxyproline is one type of post-translational modification (PTM). Because a protein sequence contains many uncharacterized proline (P) residues, the question that needs to be answered is: which ones can be hydroxylated, and which ones cannot? The answer would not only give a deeper understanding of the hydroxylation mechanism but could also support drug development. The ever-growing demand for better handling of protein sequences in the post-genomic age presents new prediction challenges.
Objective: To address these challenges, our objective is to develop computational methods that identify these sites quickly and accurately.
Method: We propose a new approach for predicting hydroxyproline sites using a deep learning model, the convolutional neural network (CNN). We employ pseudo amino acid composition (PseAAC) to identify these proteins and use the position-specific scoring matrix (PSSM) to represent samples as input to the CNN model.
Results and Conclusion: In our experiments, K-fold cross-validation on benchmark datasets demonstrated the potential of the CNN for identifying protein hydroxyproline as well as proteins with other types of PTM.
Keywords: Protein hydroxyproline, deep learning, convolutional neural network, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM).
Department of Information Science and Technology, HaiNan Normal University, HaiKou 571158
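The abstract names candidate proline (P) residues and K-fold cross-validation but gives no implementation details. The following is a minimal sketch of those two steps only, not the authors' pipeline. The helper names `proline_windows` and `k_fold_indices`, the flank width, the padding character `X`, the number of folds, and the example sequence are all illustrative assumptions; PseAAC/PSSM feature extraction and the CNN itself are not shown.

```python
import random


def proline_windows(sequence, flank=6, pad="X"):
    """Return (position, window) pairs for every proline (P) in `sequence`.

    Assumption: each candidate site is represented by a fixed-width window of
    2 * flank + 1 residues centered on the P, padded with `pad` at the ends.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "P":
            # Residue i of `sequence` sits at index i + flank in `padded`,
            # so this slice is centered on it.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for K-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    for position, window in proline_windows("MKPLAPG", flank=2):
        print(position, window)
    for train, test in k_fold_indices(10, k=5):
        print(sorted(test))
```

Each sample is used for testing exactly once across the K folds, which is what makes the averaged score an estimate over the whole benchmark set rather than over a single held-out split.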