
Article Details


Hardware Performance Evaluation of de novo Transcriptome Assembly Software in Amazon Elastic Compute Cloud

Author(s):

Fernando Mora-Márquez, José Luis Vázquez-Poletti, Víctor Chano, Carmen Collada, Álvaro Soto and Unai López de Heredia*

Pages: 1-11 (11)

Abstract:


Background: Bioinformatic software for RNA-seq analysis has high computational requirements in terms of CPU number, RAM size, and processor characteristics. Specifically, de novo transcriptome assembly demands a large computational infrastructure because of the massive data size and the complexity of the algorithms employed. Comparative studies of the quality of the transcriptomes yielded by de novo assemblers have been published previously, but they lack a hardware efficiency-oriented approach that would help select the assembly hardware platform in a cost-efficient way.

Objective: We tested the performance of two popular de novo transcriptome assemblers, Trinity and SOAPdenovo-Trans (SDNT), in terms of cost-efficiency and quality, to assess their limitations and to provide troubleshooting advice and guidelines for running transcriptome assemblies efficiently.

Method: We built virtual machines with different hardware characteristics (CPU number, RAM size) in the Amazon Elastic Compute Cloud of Amazon Web Services. Using simulated and real data sets, we measured the elapsed time, cost, CPU percentage, and output size of small and large data set assemblies.
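As a rough illustration of how the measured quantities relate to one another, the sketch below derives a CPU percentage and a run cost from timing figures. It is not the authors' measurement code: the names `RunMetrics`, `cpu_percentage`, and `run_cost` are hypothetical, the CPU percentage is assumed to be (user + system CPU time) / elapsed time, and the cost is assumed to be proportional to elapsed time at a caller-supplied hourly price.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Timing figures for one assembly run (hypothetical container)."""
    elapsed_s: float  # wall-clock time, in seconds
    user_s: float     # CPU time spent in user mode, in seconds
    sys_s: float      # CPU time spent in kernel mode, in seconds


def cpu_percentage(m: RunMetrics) -> float:
    """Assumed definition: (user + sys) / elapsed * 100.

    Values above 100 indicate that more than one core was busy on average.
    """
    return 100.0 * (m.user_s + m.sys_s) / m.elapsed_s


def run_cost(m: RunMetrics, hourly_price_usd: float) -> float:
    """Assumed billing model: cost proportional to elapsed time.

    hourly_price_usd is supplied by the caller; no real instance price is implied.
    """
    return m.elapsed_s / 3600.0 * hourly_price_usd


if __name__ == "__main__":
    # Made-up figures for illustration only, not results from the study.
    run = RunMetrics(elapsed_s=7200.0, user_s=20000.0, sys_s=1600.0)
    print(cpu_percentage(run))
    print(run_cost(run, hourly_price_usd=0.5))
```

Under these assumptions, a machine with more CPUs shortens the elapsed time only if the assembler keeps the extra cores busy, so comparing CPU percentage against the CPU number gives a quick view of whether a larger and more expensive instance is paying off.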

Results: For small data sets, SDNT outperformed Trinity by an order of magnitude, significantly reducing the time and cost of the assembly. For large data sets, Trinity performed better than SDNT. Both assemblers provide good-quality transcriptomes.

Conclusion: The selection of the optimal transcriptome assembler and the provision of computational resources depend on the combined effect of the size and complexity of RNA-seq experiments.

Keywords:

Cloud Computing, Cost-Efficiency, Quality, RNA-Seq, Transcriptome

Affiliation:

GI Sistemas Naturales e Historia Forestal, Dpto. Sistemas y Recursos Naturales, ETSI Montes, Forestal y del Medio Natural, Universidad Politécnica de Madrid, Ciudad Universitaria, 28040 Madrid

GI Arquitectura de Sistemas Distribuidos, Dpto. Arquitectura de Computadores y Automática, Facultad de Informática, Universidad Complutense de Madrid, Ciudad Universitaria, 28040 Madrid


