TY - JOUR
T1 - Multi-study inference of regulatory networks for more accurate models of gene regulation
AU - Castro, Dayanne M.
AU - de Veaux, Nicholas R.
AU - Miraldi, Emily R.
AU - Bonneau, Richard
N1 - Funding Information:
This work was funded by the Simons Foundation and the U.S. National Institute of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2019 Castro et al.
PY - 2019
Y1 - 2019
N2 - Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
AB - Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
UR - http://www.scopus.com/inward/record.url?scp=85061162782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061162782&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006591
DO - 10.1371/journal.pcbi.1006591
M3 - Article
C2 - 30677040
AN - SCOPUS:85061162782
VL - 15
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 1
M1 - e1006591
ER -