Bipartite networks represent causality better than simple networks: evidence, algorithms, and applications

Bingran Shen, Gloria Curozzi, Dennis Shasha

Research output: Contribution to journal › Article › peer-review

Abstract

A network whose nodes are genes and whose directed edges represent positive or negative influences of a regulatory gene on its targets is often used as a representation of causality. To infer such a network, researchers often develop a machine learning model and then evaluate it by how well it matches experimentally verified “gold standard” edges. The desired result is a network that may extend the gold standard edges. Since networks are a form of visual representation, one can compare their utility with that of architectural or machine blueprints, which are clearly useful because they give builders precise guidance during construction. If the primary role of gene regulatory networks is to characterize causality, then such networks should be good tools for prediction, because prediction is the actionable benefit of knowing causality. But are they? In this paper, we compare the prediction quality of models based on “gold standard” regulatory edges from previous experimental work with that of non-linear models inferred from time series data across four different species. We show that non-linear machine learning models inferred from the time series data have better predictive performance than the same models based on the gold standard edges, reducing the root mean square error (RMSE) by 5.3% to 25.3%. Having established that networks fail to characterize causality properly, we suggest that causality research should focus on four goals: (i) predictive accuracy; (ii) a parsimonious enumeration of predictive regulatory genes for each target gene g; (iii) the identification of disjoint sets of predictive regulatory genes for each target g, where the sets yield roughly equal accuracy; and (iv) the construction of a bipartite network representation of causality, whose node types are genes and models. We provide algorithms for all goals.
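The abstract names two concrete ingredients: the percent reduction in RMSE used to compare models, and a bipartite representation whose node types are genes and models. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The names `rmse_reduction_pct`, `Model`, `BipartiteCausalNetwork`, and `disjoint_predictor_sets`, the relative `tolerance` standing in for "roughly equal accuracy", and the greedy choice of disjoint regulator sets are all hypothetical. Model fitting (for example, random forests trained on time series expression data) is out of scope: each model node is represented only by its regulator genes, its target gene, and a measured RMSE.

```python
import math
from dataclasses import dataclass, field


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error between observed and predicted expression values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmse_reduction_pct(rmse_gold: float, rmse_inferred: float) -> float:
    """Percent reduction in RMSE of an inferred-regulator model relative to
    the same model type restricted to gold standard regulators."""
    return 100.0 * (rmse_gold - rmse_inferred) / rmse_gold


@dataclass(frozen=True)
class Model:
    """Hypothetical model node: predicts `target` from a set of regulator genes."""
    model_id: str
    regulators: frozenset[str]
    target: str
    rmse: float


@dataclass
class BipartiteCausalNetwork:
    """Hypothetical container with two node types, genes and models.
    Edges only ever join a gene to a model, never a gene to a gene."""
    models: dict[str, Model] = field(default_factory=dict)

    def add_model(self, model: Model) -> None:
        if model.target in model.regulators:
            raise ValueError("in this sketch a target may not regulate itself")
        self.models[model.model_id] = model

    def genes(self) -> set[str]:
        """All gene nodes: every regulator and every target of any model."""
        out: set[str] = set()
        for m in self.models.values():
            out |= m.regulators
            out.add(m.target)
        return out

    def edges(self) -> list[tuple[str, str]]:
        """Regulator -> model edges plus one model -> target edge per model."""
        result: list[tuple[str, str]] = []
        for m in self.models.values():
            result.extend((g, m.model_id) for g in sorted(m.regulators))
            result.append((m.model_id, m.target))
        return result

    def disjoint_predictor_sets(self, target: str, tolerance: float = 0.1) -> list[Model]:
        """Greedy (assumed, not the paper's) selection of models for `target`
        whose regulator sets are pairwise disjoint and whose RMSE lies within
        a relative `tolerance` of the best model for that target."""
        candidates = sorted(
            (m for m in self.models.values() if m.target == target),
            key=lambda m: m.rmse,
        )
        if not candidates:
            return []
        best = candidates[0].rmse
        chosen: list[Model] = []
        used: set[str] = set()
        for m in candidates:
            if m.rmse > best * (1 + tolerance):
                break  # remaining candidates are sorted, so all are too inaccurate
            if used.isdisjoint(m.regulators):
                chosen.append(m)
                used |= m.regulators
        return chosen


if __name__ == "__main__":
    net = BipartiteCausalNetwork()
    net.add_model(Model("m1", frozenset({"A", "B"}), "G", 0.50))
    net.add_model(Model("m2", frozenset({"C", "D"}), "G", 0.52))
    net.add_model(Model("m3", frozenset({"A", "C"}), "G", 0.51))
    print([m.model_id for m in net.disjoint_predictor_sets("G")])  # ['m1', 'm2']
```

In the demo, `m3` is skipped because it shares regulator `A` with the more accurate `m1`, leaving two disjoint regulator sets for target `G` of roughly equal accuracy. Keeping models as explicit nodes is what lets one target gene carry several alternative, equally predictive explanations, which a gene-to-gene edge list cannot express directly.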

Original language: English (US)
Article number: 1371607
Journal: Frontiers in Genetics
Volume: 15
State: Published - May 2024

Keywords

  • RNA sequencing
  • bipartite network
  • causal inference
  • gene regulatory network
  • random forest

ASJC Scopus subject areas

  • Molecular Medicine
  • Genetics
  • Genetics (clinical)
