Identification of novel RNA design candidates by clustering the extended RNA-As-Graphs library

Swati Jain, Qiyao Zhu, Amiel S.P. Paz, Tamar Schlick

Research output: Contribution to journal › Article › peer-review

Abstract

Background: We re-evaluate our RNA-As-Graphs clustering approach, using our expanded graph library and new RNA structures, to identify potential RNA-like topologies for design. Our coarse-grained approach represents RNA secondary structures as tree and dual graphs, with vertices and edges corresponding to RNA helices and loops. The graph theoretical framework facilitates graph enumeration, partitioning, and clustering approaches to study RNA structure and its applications.

Methods: Clustering graph topologies based on features derived from graph Laplacian matrices and known RNA structures allows us to classify topologies into ‘existing’ or hypothetical, and the latter into ‘RNA-like’ or ‘non RNA-like’ topologies. Here we update our list of existing tree graph topologies and RAG-3D database of atomic fragments to include newly determined RNA structures. We then use linear and quadratic regression, optionally with dimensionality reduction, to derive graph features and apply several clustering algorithms on our tree-graph library and recently expanded dual-graph library to classify them into the three groups.

Results: The unsupervised PAM and K-means clustering approaches correctly classify 72–77% of all existing graph topologies and 75–82% of newly added ones as RNA-like. For supervised k-NN clustering, the cross-validation accuracy ranges from 57 to 81%.

Conclusions: Using linear regression with unsupervised clustering, or quadratic regression with supervised clustering, provides better accuracies than supervised/linear clustering. All accuracies are better than random, especially for newly added existing topologies, thus lending credibility to our approach.

General significance: Our updated RAG-3D database and motif classification by clustering present new RNA substructures and RNA-like motifs as novel design candidates.
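As a rough illustration of what "features derived from graph Laplacian matrices" can mean, the sketch below computes the Laplacian eigenvalues of two small four-vertex tree graphs. The abstract does not specify the exact features or the regression step, so the function name `laplacian_spectrum`, the choice of the raw eigenvalue spectrum as the feature, and the two example topologies are assumptions made for illustration only, not the authors' pipeline.

```python
import numpy as np

def laplacian_spectrum(n_vertices, edges):
    """Return the ascending eigenvalues of L = D - A for an undirected graph.

    Hypothetical helper: vertices are numbered 0..n_vertices-1 and edges are
    (u, v) pairs with u != v. Self-loops, which can occur in dual graphs,
    are not handled in this sketch.
    """
    adjacency = np.zeros((n_vertices, n_vertices))
    for u, v in edges:
        adjacency[u, v] += 1
        adjacency[v, u] += 1
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    # eigvalsh is appropriate because the Laplacian is real and symmetric.
    return np.linalg.eigvalsh(laplacian)

# Two distinct tree topologies on four vertices (illustrative only).
path = laplacian_spectrum(4, [(0, 1), (1, 2), (2, 3)])  # linear chain
star = laplacian_spectrum(4, [(0, 1), (0, 2), (0, 3)])  # one central vertex

print("path:", np.round(path, 3))
print("star:", np.round(star, 3))
```

The two topologies have the same number of vertices and edges but different spectra, which is the property that makes Laplacian-based features usable as input to a clustering or nearest-neighbor method.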

Original language: English (US)
Article number: 129534
Journal: Biochimica et Biophysica Acta - General Subjects
Volume: 1864
Issue number: 6
State: Published - Jun 2020

Keywords

  • Graph clustering
  • RAG-3D database
  • RNA design
  • RNA-like motifs
  • Tree and dual graph topologies

ASJC Scopus subject areas

  • Biophysics
  • Biochemistry
  • Molecular Biology
