TY - JOUR
T1 - A Fiedler Vector Scoring Approach for Novel RNA Motif Selection
AU - Zhu, Qiyao
AU - Schlick, Tamar
N1 - Publisher Copyright: © 2021 American Chemical Society.
PY - 2021/2/4
Y1 - 2021/2/4
AB - Novel RNA motif design is of great practical importance for technology and medicine. Increasingly, computational design plays an important role in such efforts. Our coarse-grained RAG (RNA-As-Graphs) framework offers strategies for enumerating the universe of RNA 2D folds, selecting "RNA-like" candidates for design, and determining sequences that fold onto these candidates. In RAG, RNA secondary structures are represented as tree or dual graphs. Graphs with known RNA structures are called "existing", and the others are labeled "hypothetical". By using simplified features for RNA graphs, we have clustered the hypothetical graphs into "RNA-like" and "non-RNA-like" groups and proposed RNA-like graphs as candidates for design. Here, we propose a new way of designing graph features by using Fiedler vectors. The new features reflect graph shapes better, and they lead to a more clustered organization of existing graphs. We show significant increases in K-means clustering accuracy by using the new features (e.g., up to 95% and 98% accuracy for tree and dual graphs, respectively). In addition, we propose a scoring model for top graph candidate selection. This scoring model allows users to set a threshold for candidates, and it incorporates weighing of existing graphs based on their corresponding number of known RNAs. We include a list of top scored RNA-like candidates, which we hope will stimulate future novel RNA design.
UR - http://www.scopus.com/inward/record.url?scp=85100270533&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100270533&partnerID=8YFLogxK
U2 - 10.1021/acs.jpcb.0c10685
DO - 10.1021/acs.jpcb.0c10685
M3 - Article
C2 - 33471540
AN - SCOPUS:85100270533
SN - 1520-6106
VL - 125
SP - 1144
EP - 1155
JO - Journal of Physical Chemistry B
JF - Journal of Physical Chemistry B
IS - 4
ER -