Efficient alignment of RNA secondary structures using sparse dynamic programming
Cuncong Zhong, Department of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, cczhong at eecs dot ucf dot edu
Shaojie Zhang*, Department of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, shzhang at eecs dot ucf dot edu
*To whom correspondence should be addressed.
Abstract
Background:
Recent advances in next-generation sequencing technology have revealed a large number of
unannotated RNA transcripts. Comparative study of the RNA structurome is an important approach to assessing their
biological functions. Because these transcripts are large and abundant, an efficient and accurate RNA
structure-structure alignment algorithm is urgently needed to facilitate such comparative studies. Despite the importance
of the RNA secondary structure alignment problem, no available computational tool provides both high
computational efficiency and high accuracy. Designing and implementing an efficient and accurate RNA
secondary structure alignment algorithm is therefore highly desirable.
Results:
In this work, by incorporating the sparse dynamic programming technique, we implemented an
algorithm with O(n^3) expected time complexity, where n is the average number of base pairs in the RNA
structures. This complexity, which can be shown assuming the polymer-zeta property, is confirmed by our
experiments. The resulting RNA secondary structure alignment tool is called ERA. Benchmark results indicate
that ERA can significantly speed up RNA structure-structure alignments compared to other state-of-the-art RNA alignment tools, while maintaining high alignment accuracy.
Conclusions:
Using the sparse dynamic programming technique, we have developed a new RNA secondary
structure alignment tool that is both efficient and accurate. We anticipate that the new alignment algorithm,
ERA, will significantly advance comparative RNA structure studies. The program, ERA, is freely available from this website.
Data set used for benchmarking:
BRAliBase II
Individual structures from Rfam
Download ERA:
ERA_v1.0