TY - JOUR
T1 - Candidates for novel RNA topologies
AU - Kim, Namhee
AU - Shiffeldrim, Nahum
AU - Gan, Hin Hark
AU - Schlick, Tamar
N1 - Funding Information:
We thank Uri Laserson for the suggestion on Eulerian tour, Samuela Pasquali for discussions on graph-growing algorithms, and Daniela Fera for valuable assistance. This work was supported by Human Frontier Science Program (HFSP) and by a Joint NSF/NIGMS Initiative in Mathematical Biology (DMS-0201160).
PY - 2004/8/27
Y1 - 2004/8/27
N2 - Because the functional repertoire of RNA molecules, like proteins, is closely linked to the diversity of their shapes, uncovering RNA's structural repertoire is vital for identifying novel RNAs, especially in genomic sequences. To help expand the limited number of known RNA families, we use graphical representation and clustering analysis of RNA secondary structures to predict novel RNA topologies and their abundance as a function of size. Representing the essential topological properties of RNA secondary structures as graphs enables enumeration, generation, and prediction of novel RNA motifs. We apply a probabilistic graph-growing method to construct the RNA structure space encompassing the topologies of existing and hypothetical RNAs and cluster all RNA topologies into two groups using topological descriptors and a standard clustering algorithm. Significantly, we find that nearly all existing RNAs fall into one group, which we refer to as "RNA-like"; we consider the other group "non-RNA-like". Our method predicts many candidates for novel RNA secondary topologies, some of which are remarkably similar to existing structures; interestingly, the centroid of the RNA-like group is the tmRNA fold, a pseudoknot having both tRNA-like and mRNA-like functions. Additionally, our approach allows estimation of the relative abundance of pseudoknot and other (e.g. tree) motifs using the "edge-cut" property of RNA graphs. This analysis suggests that pseudoknots dominate the RNA structure universe, representing more than 90% when the sequence length exceeds 120 nt; the predicted trend for <100 nt agrees with data for existing RNAs. Together with our predictions for novel "RNA-like" topologies, our analysis can help direct the design of functional RNAs and identification of novel RNA folds in genomes through an efficient topology-directed search, which grows much more slowly in complexity with RNA size compared to the traditional sequence-based search.
AB - Because the functional repertoire of RNA molecules, like proteins, is closely linked to the diversity of their shapes, uncovering RNA's structural repertoire is vital for identifying novel RNAs, especially in genomic sequences. To help expand the limited number of known RNA families, we use graphical representation and clustering analysis of RNA secondary structures to predict novel RNA topologies and their abundance as a function of size. Representing the essential topological properties of RNA secondary structures as graphs enables enumeration, generation, and prediction of novel RNA motifs. We apply a probabilistic graph-growing method to construct the RNA structure space encompassing the topologies of existing and hypothetical RNAs and cluster all RNA topologies into two groups using topological descriptors and a standard clustering algorithm. Significantly, we find that nearly all existing RNAs fall into one group, which we refer to as "RNA-like"; we consider the other group "non-RNA-like". Our method predicts many candidates for novel RNA secondary topologies, some of which are remarkably similar to existing structures; interestingly, the centroid of the RNA-like group is the tmRNA fold, a pseudoknot having both tRNA-like and mRNA-like functions. Additionally, our approach allows estimation of the relative abundance of pseudoknot and other (e.g. tree) motifs using the "edge-cut" property of RNA graphs. This analysis suggests that pseudoknots dominate the RNA structure universe, representing more than 90% when the sequence length exceeds 120 nt; the predicted trend for <100 nt agrees with data for existing RNAs. Together with our predictions for novel "RNA-like" topologies, our analysis can help direct the design of functional RNAs and identification of novel RNA folds in genomes through an efficient topology-directed search, which grows much more slowly in complexity with RNA size compared to the traditional sequence-based search.
KW - RNA secondary structure
KW - clustering algorithm
KW - graph theory
KW - novel RNA
KW - pseudoknot
UR - http://www.scopus.com/inward/record.url?scp=4143102873&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4143102873&partnerID=8YFLogxK
U2 - 10.1016/j.jmb.2004.06.054
DO - 10.1016/j.jmb.2004.06.054
M3 - Article
C2 - 15321711
AN - SCOPUS:4143102873
SN - 0022-2836
VL - 341
SP - 1129
EP - 1144
JO - Journal of Molecular Biology
JF - Journal of Molecular Biology
IS - 5
ER -