


GASPIDs versus Non-GASPIDs - Differentiation Based on Machine Learning Approach

Author(s):

Fawad Ahmad, Saima Ikram, Jamshaid Ahmad*, Waseem Ullah, Fahad Hassan, Saeed Ullah Khattak and Irshad Ur Rehman. Pages 1-10 (10)

Abstract:


Background: Peptidases are a group of enzymes that catalyze the cleavage of peptide bonds. Around 2-3% of the whole genome codes for proteases, and about one third of all known proteases are serine proteases, which are divided into 13 clans and 40 families. They are involved in diverse physiological roles such as digestion, blood coagulation, fibrinolysis, processing of proteins and prohormones, signaling pathways, and complement fixation, and they have a vital role in the immune defense system. Based on their functions, they can broadly be divided into two classes: GASPIDs (Granule Associated Serine Peptidases involved in Immune Defense System) and Non-GASPIDs. GASPIDs, in particular, are involved in immune-associated functions, i.e., initiating apoptosis to kill virally infected and cancerous cells, modulating cytokines to generate inflammatory responses, and directly killing pathogens through phagosomes.

Methods: In this study, sequence-based characterization of these two types of serine proteases is performed. We first identified sequences by analyzing multiple online databases as well as whole genomes of different orthologous and nonorthologous species. Sequences were identified using a distinct criterion devised to differentiate GASPIDs from Non-GASPIDs. The translated versions of these sequences were then subjected to feature extraction. Using these distinctive features, we differentiated GASPIDs from Non-GASPIDs by applying multiple supervised machine learning models.

Results and Conclusion: Among the three classifiers used in this study, the SVM classifier coupled with tripeptides as the feature method showed the best accuracy in classifying sequences as GASPIDs or Non-GASPIDs.
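
As an illustration of what a tripeptide feature method can look like, the sketch below computes a tripeptide composition vector for a protein sequence. The abstract does not describe how the features were computed, so this assumes the common definition: the normalized frequency of each of the 8000 possible overlapping tripeptides over the 20 standard amino acids. The function name, the handling of non-standard residues and the toy sequence are hypothetical choices for the example, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides, in a fixed order so that
# every sequence maps to a vector with the same layout.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]


def tripeptide_composition(sequence):
    """Return the normalized frequency of each overlapping tripeptide.

    Assumption for this sketch: windows containing a non-standard
    residue (e.g. X or B) are skipped rather than counted.
    """
    seq = sequence.upper()
    windows = [seq[i:i + 3] for i in range(len(seq) - 2)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    total = len(valid)
    if total == 0:
        # Sequence too short or entirely non-standard: all-zero vector.
        return [0.0] * len(TRIPEPTIDES)
    counts = Counter(valid)
    return [counts[t] / total for t in TRIPEPTIDES]


# Toy peptide fragment, made up for illustration only.
features = tripeptide_composition("IIGGHEAKPHSRPYMA")
```

Each sequence becomes a fixed-length vector of 8000 values, which is the kind of input a supervised classifier such as an SVM, random forest or k nearest neighbor model expects. Because most entries are zero for any single protein, a sparse representation may be preferable for larger datasets.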

Keywords:

Serine proteases, GASPIDs, immune system, genome analysis, machine learning, support vector machine, random forest, k nearest neighbor.

Affiliation:

Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar; Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar; Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar; College of Software Convergence, Sejong University, Seoul; Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar; Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar; Centre of Biotechnology & Microbiology, University of Peshawar, Peshawar


