RNAMotifScan: Automatic Comparing and Searching for RNA Tertiary Motifs using Secondary Structural Alignment
Cuncong Zhong, School of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, cczhong at cs dot ucf dot edu
Haixu Tang, School of Informatics, Indiana University, Bloomington, IN 47408-3912 USA, htang at indiana dot edu
Shaojie Zhang*, School of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, shzhang at cs dot ucf dot edu
*To whom correspondence should be addressed.
Recent studies have shown that RNA tertiary motifs play essential roles in determining RNA folding and
functionality. Computational identification and analysis of RNA tertiary motifs remain a challenging task.
Existing motif identification methods based on 3D structure may not properly handle motifs with structural
variations. Other tertiary motif identification methods consider nested canonical base-pairing structures
and cannot be used to identify complex RNA tertiary motifs which often consist of various non-canonical
base-pairs due to uncommon hydrogen bond interactions. In this paper, we present a new RNA structure
alignment method for RNA tertiary motif identification, which takes into consideration the isosteric (both
canonical and non-canonical) base-pairs and multi-pairings in RNA tertiary motifs. Our new method, named
RNAMotifScan, aims to find the maximum common isosteric base-pairs between two RNA structures. We
tested RNAMotifScan by searching for some previously known RNA structural motifs within the RNA
three-dimensional structures from the Protein Data Bank (PDB). The results show that RNAMotifScan
performs better in terms of both speed and accuracy than current competing methods.
Please cite the following paper:
Cuncong Zhong, Haixu Tang, and Shaojie Zhang, "RNAMotifScan: automatic identification of RNA structural motifs using secondary structural alignment", Nucleic Acids Research, Aug. 8, 2010. (PubMed, Full text)
Predicted RNA motifs by searching the whole PDB using RNAMotifScan:
Results in 1S72 are based on an FPR cutoff of 0.1, instead of the p-value cutoff shown in the paper.
Kink-turn Motif (non-redundant)
Kink-turn Motif (in 1S72, FPR 0.1)
C_loop (in 1S72, FPR 0.1)
Sarcin-ricin Motif (non-redundant)
Sarcin-ricin Motif (in 1S72, FPR 0.1)
Reverse Kink-turn Motif
Reverse Kink-turn (non-redundant)
Reverse Kink-turn (in 1S72, FPR 0.1)
E-loop Motif (non-redundant)
E-loop Motif (in 1S72, FPR 0.1)
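The 1S72 results above are filtered by a false positive rate (FPR) cutoff of 0.1. This page does not describe how the cutoff is computed, so the following is only a generic sketch of what such a cutoff means, assuming the FPR is estimated as the fraction of background (for example, simulated) scores that reach a threshold. The function name `threshold_for_fpr` and the example scores are hypothetical and are not part of the RNAMotifScan package.

```python
def threshold_for_fpr(background_scores, max_fpr):
    """Return the smallest observed score t such that the fraction of
    background scores >= t is at most max_fpr, or None if no such t exists.

    Generic illustration only; not the procedure used by RNAMotifScan.
    """
    n = len(background_scores)
    if n == 0:
        raise ValueError("background_scores must not be empty")
    # Try candidate thresholds from the lowest observed score upwards.
    for t in sorted(set(background_scores)):
        fpr = sum(s >= t for s in background_scores) / n
        if fpr <= max_fpr:
            return t
    return None


# Invented background scores, for illustration only.
background = list(range(1, 11))
print(threshold_for_fpr(background, 0.1))
```

Under this reading, a hit would be reported only if its alignment score is at least the returned threshold.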
Notations in the alignment results:
The line labeled 'iso' indicates base-pair isostericity. A
star '*' means that the matched location in the sequence is part of a
matched isosteric base-pair. A plus symbol '+' means that the matched
location is part of a matched non-isosteric base-pair.
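As a minimal sketch of how the 'iso' line could be read programmatically, the following collects the column positions marked '*' and '+'. The function name `summarize_iso_line` and the example line are invented for illustration; the exact column layout of real output files is an assumption here.

```python
def summarize_iso_line(iso_line):
    """Return (isosteric_columns, non_isosteric_columns) for an 'iso' line.

    '*' marks a column in a matched isosteric base-pair,
    '+' marks a column in a matched non-isosteric base-pair.
    """
    isosteric = [i for i, ch in enumerate(iso_line) if ch == "*"]
    non_isosteric = [i for i, ch in enumerate(iso_line) if ch == "+"]
    return isosteric, non_isosteric


# Hypothetical 'iso' line, not taken from real RNAMotifScan output.
iso_cols, non_iso_cols = summarize_iso_line("**+ +**")
print(iso_cols, non_iso_cols)
```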
The line labeled 'edge' shows the interacting edges. 'W' stands for
Watson-Crick, 'H' for Hoogsteen, and 'S' for sugar edge. Upper
case letters denote 'cis' orientation base-pairs, while lower case
letters denote 'trans' orientation base-pairs.
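The letter convention of the 'edge' line can be decoded with a small lookup. This is a sketch under the assumption that each column holds a single character; `decode_edge` is a hypothetical helper, not a function shipped with the package.

```python
EDGE_NAMES = {"W": "Watson-Crick", "H": "Hoogsteen", "S": "sugar edge"}


def decode_edge(ch):
    """Return (orientation, edge_name) for one 'edge' character.

    Upper case means 'cis', lower case means 'trans'.
    Returns None for blanks or any character that is not W, H, or S.
    """
    name = EDGE_NAMES.get(ch.upper())
    if name is None:
        return None
    orientation = "cis" if ch.isupper() else "trans"
    return orientation, name


print(decode_edge("W"))
print(decode_edge("h"))
```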
The line labeled 'struc' shows the base-pair locations. A pair
of '(' and ')' represents a base-pair that was identified in the first
stage, while '<' and '>' represent a base-pair that was identified in
the second stage.
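The bracket notation of the 'struc' line can be turned into explicit column pairs with one stack per bracket type. This sketch assumes that brackets of the same type are properly nested within a line, which this page does not state explicitly; `pair_brackets` and the example line are hypothetical.

```python
def pair_brackets(struc_line):
    """Map a 'struc' line to base-pair column pairs, grouped by stage.

    '(' ... ')' pairs are reported under "first",
    '<' ... '>' pairs are reported under "second".
    Assumes each bracket type is properly nested on its own.
    """
    stacks = {"(": [], "<": []}
    closers = {")": ("(", "first"), ">": ("<", "second")}
    pairs = {"first": [], "second": []}
    for i, ch in enumerate(struc_line):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in closers:
            opener, stage = closers[ch]
            if not stacks[opener]:
                raise ValueError(f"unmatched {ch!r} at column {i}")
            pairs[stage].append((stacks[opener].pop(), i))
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {opener!r} at column {stack[-1]}")
    return {stage: sorted(found) for stage, found in pairs.items()}


# Hypothetical 'struc' line, not taken from real RNAMotifScan output.
print(pair_brackets("(( <> ))"))
```

Keeping a separate stack for each bracket type lets second-stage pairs cross first-stage pairs without confusing the matching.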
A dot in the sequence marks a break in the
original sequence, which arises when the query
motif consists of multiple strands. A '-' symbol represents a gap in the sequence.
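To recover the individual strands from an aligned sequence line, one can split at the dots and drop the gap symbols. This is a minimal sketch; `split_strands` and the example sequence are invented for illustration.

```python
def split_strands(aligned_seq):
    """Split an aligned sequence at '.' strand breaks and remove '-' gaps."""
    return [strand.replace("-", "") for strand in aligned_seq.split(".")]


# Hypothetical aligned sequence with one strand break and one gap.
print(split_strands("GGA-C.GUCC"))
```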
Comparison of Score Distribution between Real Segments in PDB and Simulated Segments:
Comparison of E-loop Superimpositions
Superimposing Model E-loop (blue) and newly found instance (red)
Superimposing Model E-loop (blue) and regular A-form helix (green)
Please modify the 6th line in file 'RNAMotifScan.pl' and the 8th line in file 'RNAMotifScan.pm' to specify the directory where the package is located on your workstation. Please also recompile the executables 'RNAMotifAlign' and 'RNAMotifAlign_align' using the commands 'make new' and 'make' before running the program. Sorry for the inconvenience.