De novo discovery of structural motifs in RNA 3D structures through clustering

  • Ping Ge, Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA, pge at cs dot ucf dot edu
  • Shahidul Islam, Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA, shahidul at knights dot ucf dot edu
  • Cuncong Zhong, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS 66045, USA, cczhong at ku dot edu
  • Shaojie Zhang*, School of EECS, University of Central Florida, Orlando, FL 32816 USA, shzhang at cs dot ucf dot edu
  • *To whom correspondence should be addressed.

    Abstract

    As functional components of the three-dimensional conformation of an RNA, RNA structural motifs provide an easy way to associate molecular architectures with their biological mechanisms. In past years, many computational tools have been developed to search for motif instances using existing knowledge of well-studied families. Recently, with the rapidly increasing number of resolved RNA 3D structures, there is an urgent need to discover novel motifs from the newly available information. In this work, we classify all the loops in non-redundant RNA 3D structures to detect plausible RNA structural motif families using a clustering pipeline. Compared with other clustering approaches, our method has two benefits: first, the underlying alignment algorithm is tolerant of variations in 3D structures; second, sophisticated downstream analysis has been performed to ensure that the clusters are valid and can be easily applied in further research. The final clustering results contain many interesting new variants of known motif families, such as the GNAA tetraloop, kink-turn, sarcin-ricin, and T-loop. We have also discovered potential novel functional motifs conserved in ribosomal RNA, sgRNA, SRP RNA, riboswitches, and ribozymes.

    Please cite the following paper:

    Ping Ge, Shahidul Islam, Cuncong Zhong, Shaojie Zhang, "De novo discovery of structural motifs in RNA 3D structures through clustering", bioRxiv 155580, June 27, 2017. Full text
  • RNA motifs clustering result: (Source Spreadsheet)

    Hairpin Loop

  • HL1_1 (GNAA)
  • HL2_1
  • HL2_2
  • HL2_3
  • HL2_4
  • HL2_5
  • HL2_6
  • HL2_7
  • HL2_8
  • HL2_9
  • HL3_1 (GNGA)
  • HL4_1 (T-loop)
  • HL5_1 (T-loop)
  • HL6_1 (sarcin-ricin)
  • HL6_3
  • HL7_1
  • HL8_1 (T-loop)
  • HL9_1
  • HL9_2
  • HL10_1
  • HL12_1
  • HL13_1
  • HL14_1 (T-loop)
  • HL15_1
  • HL16_1 (GNAA)
  • HL16_2
  • HL17_1
  • HL19_1
  • HL20_1
  • HL21_1
  • HL22_1 (GNGA)
  • HL23_1
  • HL25_1
  • HL26_1
  • HL27_1
  • HL28_1 (T-loop)
  • HL29_1
  • HL30_1
  • HL31_1
  • HL32_1 (HL31_0)
  • HL33_1
  • HL34_1
  • HL35_1 (sarcin-ricin)
  • HL36_1
  • HL38_1
  • HL39_1
  • HL40_1 (GNGA)
  • HL41_1
  • HL42_1 (HL42_1)
  • HL43_1
  • HL44_1
  • HL45_1
  • HL46_1 (HL46_1)
  • HL48_1
  • HL49_1
  • HL50_1
  • HL51_1
  • HL52_1
  • HL53_1
  • HL55_1 (HL42_1)
  • HL56_1 (HL32_1)
  • HL58_1
  • HL60_1
  • HL61_1 (GNGA)
  • HL64_1 (HL46_1)
  • HL65_1
  • HL66_1
  • HL69_1

    Internal Loop

  • IL1_1
  • IL1_2 (kink-turn)
  • IL1_3
  • IL1_4 (reverse kink-turn)
  • IL2_1 (reverse kink-turn)
  • IL3_1 (sarcin-ricin)
  • IL3_2 (C-loop)
  • IL4_1 (tetraloop receptor)
  • IL4_2
  • IL4_3
  • IL5_1 (kink-turn)
  • IL5_2
  • IL5_3 (E-loop)
  • IL5_4
  • IL6_1 (hook-turn)
  • IL8_1 (C-loop)
  • IL8_2
  • IL9_1 (E-loop)
  • IL9_2
  • IL10_1 (kink-turn)
  • IL11_1 (rope sling)
  • IL12_1 (rope sling)
  • IL13_1 (tandem shear)
  • IL13_2 (kink-turn)
  • IL13_3
  • IL14_1
  • IL16_1
  • IL16_2
  • IL16_3
  • IL17_1
  • IL17_2
  • IL18_1 (L1 complex)
  • IL18_2 (kink-turn)
  • IL19_1
  • IL19_2
  • IL21_1 (reverse kink-turn)
  • IL22_1 (tetraloop receptor)
  • IL22_2
  • IL23_1 (kink-turn)
  • IL24_1 (kink-turn)
  • IL25_1 (sarcin-ricin)
  • IL25_2 (kink-turn)
  • IL26_1 (T-loop)
  • IL27_1
  • IL28_1
  • IL28_2 (E-loop)
  • IL28_3
  • IL28_4 (sarcin-ricin)
  • IL29_1
  • IL29_2
  • IL30_1
  • IL30_2
  • IL30_3
  • IL31_1
  • IL32_1 (kink-turn)
  • IL33_1
  • IL33_2
  • IL35_1
  • IL37_1 (kink-turn)
  • IL38_1
  • IL38_2 (sarcin-ricin)
  • IL38_3
  • IL39_1
  • IL40_1
  • IL40_2
  • IL41_1
  • IL42_1
  • IL43_1
  • IL44_1 (tandem shear)
  • IL45_1 (kink-turn)
  • IL47_1
  • IL49_1
  • IL51_1
  • IL51_2
  • IL52_1
  • IL52_2 (sarcin-ricin)
  • IL55_1
  • IL55_2
  • IL56_1 (reverse kink-turn)
  • IL56_2 (sarcin-ricin)
  • IL59_1
  • IL60_1
  • IL61_1 (IL61_1)
  • IL62_1 (sarcin-ricin)
  • IL68_1 (tandem shear)
  • IL70_1
  • IL71_1
  • IL72_1
  • IL73_1
  • IL77_1
  • IL80_1 (IL61_1)

    Multi Loop

  • ML1_1
  • ML1_2
  • ML2_1
  • ML3_1
  • ML4_1
  • ML4_2
  • ML5_1
  • ML5_2
  • ML6_1
  • ML7_1
  • ML8_1
  • ML9_1
  • ML10_1
  • ML11_1
  • ML12_1
  • ML13_1
  • ML15_1
  • ML16_1
  • ML17_1
  • ML17_2
  • ML20_1
  • ML23_1
  • ML25_1
  • ML27_1
  • ML28_1
  • ML29_1
  • ML30_1
  • ML18_1
  • ML19_1
  • ML24_1
  • ML26_1
  • ML31_1
  • ML32_1
  • ML33_1
  • ML34_1
  • ML35_1
  • ML36_1
  • ML37_1
  • ML38_1
  • ML39_1
  • ML40_1
  • ML41_1
  • ML42_1
  • ML44_1
  • ML45_1
  • ML46_1
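
    Working with the cluster labels

    The entries above share one label pattern: a loop-type prefix (HL, IL, or ML, matching the three headings), two numbers joined by an underscore, and an optional annotation in parentheses. The following is a minimal sketch, not part of the published pipeline, of how such labels might be parsed and tallied in Python. The names `parse_label`, `LOOP_TYPES`, and `LABEL_RE` are hypothetical, and the sketch makes no assumption about what the two numbers mean beyond their position in the label.

```python
import re
from collections import Counter

# Loop-type prefixes, taken from the three list headings on this page.
LOOP_TYPES = {"HL": "Hairpin Loop", "IL": "Internal Loop", "ML": "Multi Loop"}

# Matches entries such as "• IL1_2 (kink-turn)" or "ML4_2"; the bullet is optional.
LABEL_RE = re.compile(r"^\s*(?:•\s*)?(HL|IL|ML)(\d+)_(\d+)(?:\s*\((.+)\))?\s*$")


def parse_label(line):
    """Split one list entry into (loop type, first number, second number, annotation)."""
    match = LABEL_RE.match(line)
    if match is None:
        raise ValueError(f"not a cluster label: {line!r}")
    prefix, first, second, note = match.groups()
    # The annotation is None when the entry has no parenthesised text.
    return LOOP_TYPES[prefix], int(first), int(second), note


# A few entries copied from the lists above, used here only as sample input.
entries = ["• HL1_1 (GNAA)", "• IL1_2 (kink-turn)", "• IL5_1 (kink-turn)", "• ML4_2"]
parsed = [parse_label(entry) for entry in entries]

# Count how many sample entries carry each annotation.
counts = Counter(note for _, _, _, note in parsed if note)
```

    Applied to the full lists, the same tally would show how many clusters carry each annotation, for example how many are marked kink-turn.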