Article Details


ESREEM: Efficient Short Reads Error Estimation Computational Model for Next-Generation Genome Sequencing

Author(s):

Muhammad Tahir, Muhammad Sardaraz*, Zahid Mehmood and Muhammad Saud Khan. Pages 1-11 (11)

Abstract:


Aims: To assess the error profile of NGS data generated by high-throughput sequencing machines.

Background: Short-read sequencing data from Next Generation Sequencing (NGS) are currently being generated by a number of research projects. Characterizing the errors produced by NGS platforms and calling genetic variation accurately from reads are two interdependent tasks. Both are highly significant for analyses such as genome sequence assembly, SNP calling, evolutionary studies, and haplotype inference. Each sequencing platform, e.g. Illumina sequencing, Pacific Biosciences, 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Ion Torrent sequencing, and Oxford Nanopore sequencing, shows its own incidence profile of systematic and random errors. Advances in NGS deliver vast amounts of data, but the data contain errors. Some fraction of these errors may mimic genuine biological signals, such as mutations, and may subsequently distort the results. Various independent applications have been proposed to correct sequencing errors. A systematic analysis of these algorithms shows that state-of-the-art models are still lacking.

Objective: In this paper, we propose an efficient error estimation computational model called ESREEM to assess error rates in NGS data.

Methods: The proposed model builds on the observation that there is a linear regression relationship between the number of reads containing errors and the number of reads sequenced. The model is based on a probabilistic error model integrated with a Hidden Markov Model (HMM).
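The linear relationship described in the Methods can be illustrated with a small sketch. This is not the ESREEM implementation, and it does not include the probabilistic error model or the HMM; it only shows an ordinary least-squares fit of the number of reads containing errors against the number of reads sequenced. The function names `fit_line` and `predict_error_reads` and the sample counts are hypothetical and chosen purely for illustration.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_error_reads(slope: float, intercept: float, n_reads: float) -> float:
    """Estimated number of error-containing reads for a given sequencing depth."""
    return slope * n_reads + intercept


if __name__ == "__main__":
    # Hypothetical toy counts, NOT taken from the paper.
    reads_sequenced = [1000, 2000, 3000, 4000]
    reads_with_errors = [52, 98, 151, 199]
    slope, intercept = fit_line(reads_sequenced, reads_with_errors)
    print("estimated fraction of reads with errors (slope):", slope)
    print("predicted error reads at 5000 reads:",
          predict_error_reads(slope, intercept, 5000))
```

Under this simplification, the fitted slope can be read as an estimate of the fraction of sequenced reads that contain at least one error.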

Results: The proposed model is evaluated on several benchmark datasets, and the results are compared with state-of-the-art algorithms.

Conclusions: Analysis of the experimental results shows that the proposed model estimates errors efficiently and runs in less time than the other algorithms.

Keywords:

NGS, Genome, Sequencing, Error Analysis, Algorithms.

Affiliation:

Department of Computer Science, COMSATS University Islamabad, Attock Campus; Department of Computer Science, COMSATS University Islamabad, Attock Campus; Department of Software Engineering, University of Engineering and Technology, Taxila; Department of Computer Science, COMSATS University Islamabad, Attock Campus


