Tomasz Smolarczyk, Irena Roterman-Konieczna and Katarzyna Stapor* Pages 1-18 (18)
Background: Over the last few decades, the search for a theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, open questions and challenges remain, such as understanding the rules by which the amino acid sequence determines protein secondary structure.
Objective: In this review, we depict the progress of prediction methods over the years and identify the sources of improvement.
Methods: The protein secondary structure prediction problem is described, followed by a discussion of its theoretical limitations, a description of the commonly used data sets and features, and a review of three generations of methods, with a focus on the most recent advances.
Results: State-of-the-art methods currently reach almost 88% accuracy for 3-class prediction and 76.5% for 8-class prediction.
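To make the 3-class and 8-class figures concrete, the following is a minimal sketch of how a per-residue accuracy could be computed for both labelings. It assumes the 8 classes are the DSSP states and that they are reduced to 3 classes with one common mapping (H, G, I to helix; E, B to strand; everything else to coil); other reductions exist, and the review may use a different one. The function names and the example strings are hypothetical and serve illustration only.

```python
# Assumed reduction of the 8 DSSP states to 3 classes (helix H, strand E, coil C).
# This is one common convention, not necessarily the one used by every method.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, and other
}


def reduce_to_3(ss8: str) -> str:
    """Map an 8-state secondary structure string to 3 states."""
    return "".join(DSSP8_TO_3[state] for state in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Made-up example strings, not real data.
observed8 = "-HHHHGGT-EEEB-"
predicted8 = "-HHHHHHT-EEEE-"

q8 = per_residue_accuracy(predicted8, observed8)
q3 = per_residue_accuracy(reduce_to_3(predicted8), reduce_to_3(observed8))
print(f"8-class accuracy: {q8:.3f}")
print(f"3-class accuracy: {q3:.3f}")
```

In this toy case the three 8-class errors (G predicted as H twice, B predicted as E) disappear after the reduction, which illustrates why 3-class scores are typically higher than 8-class scores for the same predictor.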
Conclusion: This review summarizes recent advances and outlines further research directions.
Keywords: protein secondary structure prediction, multiple sequence alignment, PSSM, HHblits, deep neural networks, machine learning, protein early-stage structure
Institute of Informatics, Silesian University of Technology, Gliwice, Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Kraków, Institute of Informatics, Silesian University of Technology, Gliwice