Motivation
To improve the overall quality of research on RNA structures, access to highly accurate data must be ensured. Although the amount of available RNA structure data has grown tremendously over the past few years, maintaining its quality and integrity has become the greater challenge. Because the structural data in PDB come from many independent research efforts, a newly submitted structure may be highly similar to previously submitted data. To remove RNA chain redundancy in PDB, we have introduced a non-redundant dataset of RNA chains, known as RNA-NRD. Here, a pair of RNA chains from the same organism is considered redundant if the chains have a sequence identity ≥ 80%, an RMSD of less than 4 Å, and an Alignment Ratio ≥ 80%. Because the definition of redundant RNA structures can vary depending on the application, we have also generated a variation of the RNA-NRD dataset in which the RNA chains are not divided by source organism. We refer to this dataset as RNA-NRD-without-Organism-Division.
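The redundancy criterion stated above can be sketched as a simple predicate. This is only a minimal illustration, not the pipeline used to build RNA-NRD: the names `PairComparison` and `is_redundant` and the `divide_by_organism` flag are hypothetical, and the sketch assumes that sequence identity, RMSD, and Alignment Ratio have already been computed for the pair by some alignment tool.

```python
from dataclasses import dataclass

# Thresholds taken from the definition above.
SEQ_IDENTITY_MIN = 80.0  # percent, inclusive
RMSD_MAX = 4.0           # angstroms, exclusive ("less than 4 Å")
ALIGN_RATIO_MIN = 80.0   # percent, inclusive


@dataclass(frozen=True)
class PairComparison:
    """Precomputed comparison of two RNA chains (hypothetical container)."""
    organism_a: str
    organism_b: str
    sequence_identity: float  # percent
    rmsd: float               # angstroms
    alignment_ratio: float    # percent


def is_redundant(pair: PairComparison, divide_by_organism: bool = True) -> bool:
    """Return True if the pair meets all redundancy thresholds.

    With divide_by_organism=True the chains must also share a source
    organism (RNA-NRD); with False the organism is ignored
    (RNA-NRD-without-Organism-Division).
    """
    if divide_by_organism and pair.organism_a != pair.organism_b:
        return False
    return (
        pair.sequence_identity >= SEQ_IDENTITY_MIN
        and pair.rmsd < RMSD_MAX
        and pair.alignment_ratio >= ALIGN_RATIO_MIN
    )
```

Under these assumptions, a pair of chains from different organisms that passes all three thresholds would count as redundant only in the variant without organism division.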
RNA-NRD Dataset Features
The dataset is updated every three months. It contains the following features:
- Cluster ID
- Representative
- Redundant Cluster
- Organism
- Macromolecule Name
- Rfam Family Name
Versions of RNA-NRD Dataset
Versions of RNA-NRD-without-Organism-Division Dataset
About Us
We are a research group within the Department of Computer Science at the University of Central Florida. The group was founded in 2007.
Contact
Contact Person: Nabila Shahnaz Khan
E-mail: nabilakhan@knights.ucf.edu