Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

[Vol. 14, Issue 1]

Author(s):

Lin Zhang, Yanling He, Huaizhi Wang, Hui Liu*, Yufei Huang, Xuesong Wang and Jia Meng*

Pages 11-23 (13)

Abstract:


Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data have unique features, such as low read coverage, which call for novel clustering approaches.

Objective: Clustering analysis of count-based RNA methylation sequencing data must cope with low read coverage and, at the same time, preserve the integer nature of the measurements rather than reduce them to continuous estimates.
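
To illustrate why the integer counts matter, the following minimal sketch compares two sites with the same methylation ratio but very different coverage. It is an illustration only, not code from the DPBBM package: the function name `beta_posterior_summary`, the uniform Beta(1, 1) prior, and the example counts are all assumptions made for this example.

```python
def beta_posterior_summary(k, n, alpha=1.0, beta=1.0):
    """Posterior mean and standard deviation of the methylation level
    after observing k methylated reads out of n total reads, under an
    assumed Beta(alpha, beta) prior (hypothetical helper)."""
    a = alpha + k
    b = beta + (n - k)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var ** 0.5

# Two hypothetical sites with the same ratio (0.5) but different coverage.
low_cov = beta_posterior_summary(1, 2)
high_cov = beta_posterior_summary(50, 100)

print("low coverage  (1/2):    mean=%.3f sd=%.3f" % low_cov)
print("high coverage (50/100): mean=%.3f sd=%.3f" % high_cov)
```

Both sites have the same estimated level, yet the low-coverage site carries far more uncertainty. A method that clusters on the ratio alone cannot see this difference, whereas a model on the counts can.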

Method: We propose a nonparametric generative model, together with its Gibbs sampling solution, for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level using the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, thereby avoiding the common model selection problem in clustering analysis.
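
The two ingredients named above, a beta-binomial likelihood on counts and a Dirichlet process prior over cluster membership, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' sampler or the DPBBM package API: all function names are hypothetical, each cluster is summarized by a fixed (alpha, beta) pair, and a new cluster is scored with a fixed `base` pair instead of integrating over a base measure as a full Gibbs sampler would.

```python
import math

def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k methylated reads out of n total reads
    under a beta-binomial distribution with parameters (alpha, beta)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return (log_choose + log_beta_fn(k + alpha, n - k + beta)
            - log_beta_fn(alpha, beta))

def crp_prior_weights(cluster_sizes, concentration):
    """Chinese restaurant process prior: weight of joining each existing
    cluster, followed by the weight of opening a new one (last entry)."""
    total = sum(cluster_sizes) + concentration
    return [s / total for s in cluster_sizes] + [concentration / total]

def assignment_probabilities(k, n, cluster_params, cluster_sizes,
                             concentration, base=(1.0, 1.0)):
    """Normalized probability of assigning one site to each existing
    cluster or to a new cluster (last entry). Scoring the new cluster
    with the fixed `base` pair is a simplifying assumption."""
    prior = crp_prior_weights(cluster_sizes, concentration)
    params = list(cluster_params) + [base]
    log_w = [math.log(p) + beta_binomial_logpmf(k, n, a, b)
             for p, (a, b) in zip(prior, params)]
    peak = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(x - peak) for x in log_w]
    norm = sum(w)
    return [x / norm for x in w]

# Hypothetical example: one lowly and one highly methylated cluster.
probs = assignment_probabilities(
    k=9, n=10,
    cluster_params=[(1.0, 9.0), (9.0, 1.0)],
    cluster_sizes=[5, 5],
    concentration=1.0,
)
print(probs)
```

In a Gibbs sampler, a cluster label would be drawn from these probabilities for each site in turn. Because the last entry always has nonzero weight, the number of clusters can grow or shrink with the data instead of being fixed in advance.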

Results: When tested on simulated data, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex.

Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites with very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.

Availability: The source code and documentation of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.

Keywords:

RNA methylation, m6A-seq, beta-binomial mixture, Dirichlet process, clustering, epitranscriptome.

Affiliation:

School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116; Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio TX 78229; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123
