
Article Details


A simple protein evolutionary classification method based on the mutual relations between protein sequences

Author(s):

Xiaogeng Wan* and Xinying Tan  

Abstract:


Aims: This paper presents a simple and efficient method for protein evolutionary classification.

Background: Proteins are diverse in their sequences, structures and functions, and it is important to understand how these three aspects are related. Many methods have been developed for protein evolutionary classification. They include machine learning methods, such as support vector machines implemented in LibSVM, and feature-based methods, such as the natural vector method and the protein map. Machine learning methods use pre-labeled training sets to classify protein sequences into disjoint classes. Feature-based methods such as the natural vector and the protein map convert protein sequences into feature vectors and build phylogenetic trees from the distances between those vectors.
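The natural vector baseline mentioned above can be sketched in Python as follows. This is a minimal sketch of the commonly published protein natural vector (the count, mean position and normalized second central moment of each of the 20 amino acids), not code from this paper; the function names and the zero-filling of absent residues are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def natural_vector(seq: str) -> list[float]:
    """60-dimensional natural vector: counts n_k, mean positions mu_k and
    normalized second central moments D2_k for each amino acid k."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        positions = [i + 1 for i, c in enumerate(seq) if c == aa]  # 1-based
        n_k = len(positions)
        if n_k == 0:
            # ASSUMPTION: absent residues contribute zeros to all three blocks
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu_k = sum(positions) / n_k
        # D2_k = sum over positions of (t - mu_k)^2, divided by n_k * n
        d2_k = sum((t - mu_k) ** 2 for t in positions) / (n_k * n)
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments


def euclidean(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two natural vectors."""
    return math.dist(u, v)
```

The pairwise Euclidean distances between natural vectors can then be passed to any distance-based tree method.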

Objective: We propose a simple method that classifies the evolutionary relations of protein sequences using distance maps built on the mutual relations between the sequences. The method is unsupervised and model-free, and it is efficient for the evolutionary classification of proteins.

Method: To quantify the mutual relations and homology of protein sequences, we use normalized mutual information rates between the sequences. We define two distance maps that convert the normalized mutual information rates into distances, and we use UPGMA trees to present the evolutionary classifications of the proteins.
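The pipeline described in the Method paragraph (mutual information, distance map, UPGMA tree) can be sketched as below. The abstract does not define the normalized mutual information rate or the two distance maps, so this sketch rests on loud assumptions: `normalized_mi` is a plug-in estimate of per-symbol mutual information between residues at paired positions, normalized by sqrt(H(X) H(Y)), and `distance_linear` (1 - NMI) and `distance_log` (-ln NMI) are hypothetical stand-ins for the paper's two maps. Only the UPGMA step follows a standard, well-defined algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def _entropy(counts: Counter, total: int) -> float:
    """Shannon entropy (natural log) of an empirical distribution."""
    return -sum(c / total * math.log(c / total) for c in counts.values())


def normalized_mi(a: str, b: str) -> float:
    """ASSUMPTION: plug-in mutual information of residues at paired positions
    (up to the shorter length), normalized to [0, 1] by sqrt(H(X) H(Y)).
    A stand-in, not necessarily the paper's mutual information rate."""
    m = min(len(a), len(b))
    if m == 0:
        return 0.0
    pairs = list(zip(a[:m], b[:m]))
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    pxy = Counter(pairs)
    hx, hy = _entropy(px, m), _entropy(py, m)
    if hx == 0 or hy == 0:  # a constant sequence carries no information
        return 0.0
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
    mi = sum((c / m) * math.log(c * m / (px[x] * py[y])) for (x, y), c in pxy.items())
    return min(1.0, max(0.0, mi / math.sqrt(hx * hy)))


def distance_linear(nmi: float) -> float:
    """HYPOTHETICAL distance map 1: d = 1 - NMI."""
    return 1.0 - nmi


def distance_log(nmi: float, eps: float = 1e-12) -> float:
    """HYPOTHETICAL distance map 2: d = -ln(NMI), clipped to avoid log(0)."""
    return -math.log(max(nmi, eps))


def upgma(names: list[str], d: list[list[float]]) -> str:
    """Average-linkage (UPGMA) clustering of a symmetric distance matrix.
    Returns a rooted tree in Newick format with branch lengths."""
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {(i, j): d[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        (ti, si, hi), (tj, sj, hj) = clusters.pop(i), clusters.pop(j)
        h = dij / 2.0  # the new node sits at half the merge distance
        subtree = f"({ti}:{h - hi:.4f},{tj}:{h - hj:.4f})"
        del dist[(i, j)]
        for k in clusters:  # size-weighted average distance to the merged cluster
            dik = dist.pop((min(i, k), max(i, k)))
            djk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (si * dik + sj * djk) / (si + sj)
        clusters[next_id] = (subtree, si + sj, h)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    seqs = {"P1": "MKVLAAGIVG", "P2": "MKVLAAGIVA", "P3": "GGSWQRTYPL"}  # toy data
    names = list(seqs)
    d = [[distance_linear(normalized_mi(seqs[a], seqs[b])) for b in names] for a in names]
    print(upgma(names, d))
```

Plug-in mutual information estimates are biased upward on short sequences, so the toy sequences only illustrate the mechanics, not realistic distances.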

Result: We demonstrate the new method on four classical protein evolutionary classification examples and compare the results with traditional methods such as the natural vector and the protein map. We use the area under the precision-recall curve (AUPRC) to evaluate the classification quality of the new method and of the traditional methods. The new method with the two distance maps is efficient on the classical examples, and it outperforms the natural vector and the protein map in these evolutionary classifications.
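The abstract does not say how the AUPRC was computed for tree-based classifications. One plausible setup, assumed here, scores every pair of sequences by its negative distance and labels the pair positive when both sequences belong to the same reference class; the area is then computed as step-wise average precision. The function names and the pair-based labeling are hypothetical.

```python
from itertools import combinations


def average_precision(scores: list[float], labels: list[int]) -> float:
    """Step-wise area under the precision-recall curve (average precision).
    Higher scores mean more confident positives; ties keep input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            tp += 1
            ap += tp / rank  # precision at each new recall level
    return ap / n_pos


def pairwise_auprc(names: list[str], dist: list[list[float]],
                   classes: dict[str, str]) -> float:
    """HYPOTHETICAL evaluation: each unordered pair is a prediction scored by
    -distance and labeled positive if both members share a reference class."""
    scores, labels = [], []
    for i, j in combinations(range(len(names)), 2):
        scores.append(-dist[i][j])
        labels.append(1 if classes[names[i]] == classes[names[j]] else 0)
    return average_precision(scores, labels)
```

With this setup, a method whose distances place every same-class pair closer than every cross-class pair reaches an AUPRC of 1.0.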

Conclusion: The normalized mutual information rates, combined with the two distance maps, are efficient for protein evolutionary classification and outperform some classical methods.

Other: The results are compared with traditional protein evolutionary classification methods such as the natural vector and the protein map, and AUPRC is computed for both the new method and the traditional methods to assess classification accuracy.

Keywords:

Protein evolutionary classification, mutual information rate, protein sequence.

Affiliation:

Department of Mathematics, Beijing University of Chemical Technology, North Third Ring Road 15, Chaoyang District, Beijing 100029; The Fourth Center of PLA General Hospital


