Liwei Liu, Tomoya Mori, Yang Zhao, Morihiro Hayashida* and Tatsuya Akutsu* Pages 25 - 33 ( 9 )
Background: Data compression is essential for efficient large-scale data processing, so that a number of studies have been done. Grammar-based compression is to find a small grammar that generates input data, and it has been used not only for data compression but also for analysis of biological data since it is useful for pattern extraction.
Objective: Recently, for rooted ordered trees, a special kind of network structures, elementary ordered tree grammar (EOTG) has been defined by extending context-free grammar (CFG) and an integer-programming (IP) method which finds the smallest EOTG for input data has also been proposed and applied to extract common pattern of RNA secondary structures. However, the method is not so efficient for large input trees. Therefore, development of an efficient method is important.
Methods: We propose an Euler string-based compression approach that finds the smallest CFG for the Euler string corresponding to an input rooted ordered tree.
Results: From a theoretical viewpoint, we show that there exists a gap of compression ratios between the tree grammar-based approach and Euler string-based approach. From a practical viewpoint, we show the efficiency and effectiveness of our proposed approach by applying it to comparison of RNA secondary structures.
Conclusion: The experimental results indicate that the Euler string-based approach can efficiently compress tree-structured data retaining some structural information of them.
Context-free grammar, data compression, Euler string, integer programming, RNA secondary structure, tree grammar.
College of Science, Dalian Jiaotong University, Dalian 116028, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011