Discovering Non-coding RNA Elements in Drosophila 3’ Untranslated Regions
Cuncong Zhong, School of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, cczhong at cs dot ucf dot edu
Justen Andrews, Department of Biology, Indiana University, Bloomington, Indiana 47405, USA, jusandre at indiana dot edu
Shaojie Zhang*, School of EECS, University of Central Florida, Orlando, FL 32816-2362 USA, shzhang at cs dot ucf dot edu
*To whom correspondence should be addressed.
Abstract
The non-coding RNA (ncRNA) elements in the 3’ untranslated regions (3’-UTRs) are known to participate in
the post-transcriptional regulation of their genes, affecting mRNA stability, translation efficiency, and subcellular localization.
Clustering these 3’-UTR ncRNA elements to infer co-expression patterns of their genes can provide valuable
knowledge for further studies of the genes’ functions and interactions in specific physiological processes. In this
work, we propose an improved RNA structural clustering pipeline that takes into account the length-dependent
distribution of the structural similarity measure. Benchmarking on Rfam data shows that the proposed pipeline
achieves a performance gain of over 10% compared to a traditional hierarchical clustering pipeline. By
applying the proposed pipeline to the 3’-UTRs of Drosophila melanogaster, we identified
184 ncRNA clusters, of which 91.3% appear to be true RNA structural elements according to RNAz predictions.
Among these clusters, we rediscovered the well-known histone ncRNA family, as well as a number of other
families whose potential functions may be inferred from existing studies. One such family contains
genes that are preferentially expressed in male Drosophila; in situ hybridization further reveals their characteristic
‘cup’ or ‘comet’ localization patterns in the Drosophila testis. The complete clustering results are available in the
supplementary materials.
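The abstract does not spell out how the length dependence of the similarity measure is handled. Purely as an illustration, and not the authors' CLCL implementation, the sketch below assumes that raw structural similarity scores between unrelated sequences drift with sequence length. Under that assumption, each score is standardized against unrelated pairs in the same length bin, and sequences whose standardized score passes a threshold are linked. The bin width, the threshold, the notion of a single "pair length", and the functions `fit_background`, `z_score`, and `cluster` are all hypothetical. The final grouping uses simple connected components, not the clique clustering or hierarchical clustering discussed in the paper.

```python
import statistics
from collections import defaultdict


def bin_of(length, width=20):
    """Hypothetical binning: group pair lengths into windows of `width` nt."""
    return length // width


def fit_background(background, width=20):
    """Estimate mean and standard deviation of scores in each length bin.

    `background` is a list of (pair_length, score) tuples taken from pairs
    assumed to be unrelated (e.g. sequences from different families).
    Bins with fewer than two scores are skipped.
    """
    by_bin = defaultdict(list)
    for length, score in background:
        by_bin[bin_of(length, width)].append(score)
    params = {}
    for b, scores in by_bin.items():
        if len(scores) >= 2:
            params[b] = (statistics.mean(scores), statistics.stdev(scores))
    return params


def z_score(score, pair_length, params, width=20):
    """Express a raw score in units of its length-matched background spread.

    Raises KeyError if no background was fitted for this length bin.
    """
    mu, sd = params[bin_of(pair_length, width)]
    return (score - mu) / sd if sd > 0 else 0.0


def cluster(n, pairs, params, threshold=3.0, width=20):
    """Group sequences 0..n-1 by linking pairs whose z-score >= threshold.

    `pairs` holds (i, j, pair_length, raw_score) tuples. Linked sequences
    are merged with a union-find structure (connected components).
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, length, score in pairs:
        if z_score(score, length, params, width) >= threshold:
            parent[find(i)] = find(j)

    groups = defaultdict(list)
    for k in range(n):
        groups[find(k)].append(k)
    return sorted(groups.values())


if __name__ == "__main__":
    params = fit_background(
        [(5, 1), (10, 2), (15, 3), (100, 10), (105, 20), (115, 30)]
    )
    # The same raw score (15) is striking for short pairs, unremarkable for long ones.
    print(z_score(15, 10, params), z_score(15, 110, params))
    print(cluster(4, [(0, 1, 10, 15), (2, 3, 110, 15), (1, 2, 110, 50)], params))
```

In this toy run, a raw score of 15 is 13 standard deviations above the short-sequence background but below the mean of the long-sequence background. This is the kind of length effect that a single global cutoff on raw scores, as in a naive hierarchical clustering, would ignore.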
Supplementary information
Performance of clustering on all families in the Rfam data set
Additional file 1 (68k)
Clustering results of RNA elements in Drosophila 3’-UTR
Additional file 2 (65k)
Differential expression of each cluster
Additional file 3 (59k)
Other companion information
The entire Rfam data set and raw clustering file
Rfam data set (159k)
Original FlyAtlas T-test results used to generate Additional file 3
FlyAtlas T-test original results (1.7M)
Detailed information on the clustering of fly 3'-UTR RNA elements (genomic sequences, gene annotation links to the UCSC Genome Browser and FlyBase, consensus structures, and GO analysis with Ontologizer)
Detailed table
Software download
CLCL (CLique CLustering)