De novo discovery of structural motifs in RNA 3D structures through clustering
Ping Ge, Department of Computer Science, University of Central Florida, Orlando, FL 32816 USA, pge at cs dot ucf dot edu
Shahidul Islam, Department of Computer Science, University of Central Florida, Orlando, FL 32816 USA, shahidul at knights dot ucf dot edu
Cuncong Zhong, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA, cczhong at ku dot edu
Shaojie Zhang*, School of EECS, University of Central Florida, Orlando, FL 32816 USA, shzhang at cs dot ucf dot edu
*To whom correspondence should be addressed.
Abstract
As functional components in the three-dimensional
conformation of an RNA, RNA structural motifs provide
an easy way to associate molecular architectures
with their biological mechanisms. In recent years, many
computational tools have been developed to search for motif
instances using existing knowledge of well-studied
families. Recently, with the rapidly increasing number of
resolved RNA 3D structures, there is an urgent need to
discover novel motifs from the newly available information.
In this work, we classify all the loops in non-redundant
RNA 3D structures to detect plausible RNA structural
motif families by using a clustering pipeline. Compared with
other clustering approaches, our method has two benefits:
first, the underlying alignment algorithm is tolerant of
variations in 3D structures; second, sophisticated
downstream analysis has been performed to ensure that the
clusters are valid and easily applied to further research.
The final clustering results contain many interesting new
variants of known motif families, such as the GNAA tetraloop,
kink-turn, sarcin-ricin, and T-loop. We have also discovered
potential novel functional motifs conserved in ribosomal
RNAs, sgRNAs, SRP RNAs, riboswitches, and ribozymes.
Please cite the following paper:
Ping Ge, Shahidul Islam, Cuncong Zhong, Shaojie Zhang, "De novo discovery of structural motifs in RNA 3D structures through clustering", bioRxiv 155580, June 27, 2017.
Hairpin Loop
HL1_1 (GNAA)
HL2_1
HL2_2
HL2_3
HL2_4
HL2_5
HL2_6
HL2_7
HL2_8
HL2_9
HL3_1 (GNGA)
HL4_1 (t-loop)
HL5_1 (t-loop)
HL6_1 (sarcin-ricin)
HL6_3
HL7_1
HL8_1 (t-loop)
HL9_1
HL9_2
HL10_1
HL12_1
HL13_1
HL14_1 (t-loop)
HL15_1
HL16_1 (GNAA)
HL16_2
HL17_1
HL19_1
HL20_1
HL21_1
HL22_1 (GNGA)
HL23_1
HL25_1
HL26_1
HL27_1
HL28_1 (t-loop)
HL29_1
HL30_1
HL31_1
HL32_1 (HL31_0)
HL33_1
HL34_1
HL35_1 (sarcin-ricin)
HL36_1
HL38_1
HL39_1
HL40_1 (GNGA)
HL41_1
HL42_1 (HL42_1)
HL43_1
HL44_1
HL45_1
HL46_1 (HL46_1)
HL48_1
HL49_1
HL50_1
HL51_1
HL52_1
HL53_1
HL55_1 (HL42_1)
HL56_1 (HL32_1)
HL58_1
HL60_1
HL61_1 (GNGA)
HL64_1 (HL46_1)
HL65_1
HL66_1
HL69_1
Internal Loop
IL1_1
IL1_2 (kink-turn)
IL1_3
IL1_4 (reverse kink-turn)
IL2_1 (reverse kink-turn)
IL3_1 (sarcin-ricin)
IL3_2 (C-loop)
IL4_1 (tetraloop receptor)
IL4_2
IL4_3
IL5_1 (kink-turn)
IL5_2
IL5_3 (E-loop)
IL5_4
IL6_1 (Hook-turn)
IL8_1 (C-loop)
IL8_2
IL9_1 (E-loop)
IL9_2
IL10_1 (kink-turn)
IL11_1 (rope sling)
IL12_1 (rope sling)
IL13_1 (tandem shear)
IL13_2 (kink-turn)
IL13_3
IL14_1
IL16_1
IL16_2
IL16_3
IL17_1
IL17_2
IL18_1 (L1 complex)
IL18_2 (kink-turn)
IL19_1
IL19_2
IL21_1 (reverse kink-turn)
IL22_1 (tetraloop receptor)
IL22_2
IL23_1 (kink-turn)
IL24_1 (kink-turn)
IL25_1 (sarcin-ricin)
IL25_2 (kink-turn)
IL26_1 (T-loop)
IL27_1
IL28_1
IL28_2 (E-loop)
IL28_3
IL28_4 (sarcin-ricin)
IL29_1
IL29_2
IL30_1
IL30_2
IL30_3
IL31_1
IL32_1 (kink-turn)
IL33_1
IL33_2
IL35_1
IL37_1 (kink-turn)
IL38_1
IL38_2 (sarcin-ricin)
IL38_3
IL39_1
IL40_1
IL40_2
IL41_1
IL42_1
IL43_1
IL44_1 (tandem shear)
IL45_1 (kink-turn)
IL47_1
IL49_1
IL51_1
IL51_2
IL52_1
IL52_2 (sarcin-ricin)
IL55_1
IL55_2
IL56_1 (reverse kink-turn)
IL56_2 (sarcin-ricin)
IL59_1
IL60_1
IL61_1 (IL61_1)
IL62_1 (sarcin-ricin)
IL68_1 (tandem shear)
IL70_1
IL71_1
IL72_1
IL73_1
IL77_1
IL80_1 (IL61_1)
Multi Loop
ML1_1
ML1_2
ML2_1
ML3_1
ML4_1
ML4_2
ML5_1
ML5_2
ML6_1
ML7_1
ML8_1
ML9_1
ML10_1
ML11_1
ML12_1
ML13_1
ML15_1
ML16_1
ML17_1
ML17_2
ML20_1
ML23_1
ML25_1
ML27_1
ML28_1
ML29_1
ML30_1
ML18_1
ML19_1
ML24_1
ML26_1
ML31_1
ML32_1
ML33_1
ML34_1
ML35_1
ML36_1
ML37_1
ML38_1
ML39_1
ML40_1
ML41_1
ML42_1
ML44_1
ML45_1
ML46_1
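
The cluster labels listed above follow a regular pattern, so they can be read programmatically. The sketch below is illustrative only and is not part of the published pipeline. It assumes that each label consists of a loop-type prefix (HL, IL, or ML), a group number, an underscore, and a member number, optionally followed by an annotation in parentheses; this reading is inferred from the lists on this page, not from the paper. The names `parse_label` and `count_annotations` are hypothetical.

```python
import re
from collections import Counter

# Assumed label layout (inferred from the lists on this page):
#   <loop type><group number>_<member number> [ (annotation) ]
# e.g. "IL1_2 (kink-turn)" or "ML46_1".
LABEL_RE = re.compile(r"^(HL|IL|ML)(\d+)_(\d+)(?:\s+\((.+)\))?$")


def parse_label(line):
    """Split one cluster label into its parts (hypothetical helper)."""
    match = LABEL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized cluster label: {line!r}")
    loop_type, group, member, annotation = match.groups()
    return {
        "loop_type": loop_type,
        "group": int(group),
        "member": int(member),
        "annotation": annotation,  # None when the label has no parentheses
    }


def count_annotations(lines):
    """Count annotated labels per (loop type, lower-cased annotation)."""
    counts = Counter()
    for line in lines:
        entry = parse_label(line)
        if entry["annotation"] is not None:
            counts[(entry["loop_type"], entry["annotation"].lower())] += 1
    return counts


if __name__ == "__main__":
    sample = ["HL4_1 (t-loop)", "HL5_1 (t-loop)", "IL26_1 (T-loop)", "ML46_1"]
    for key, n in sorted(count_annotations(sample).items()):
        print(key, n)
```

Annotations are lower-cased before counting so that "t-loop" and "T-loop" are tallied under one name; drop the `.lower()` call if the capitalization is meaningful.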