TY - JOUR
T1 - Mocap
T2 - Large-scale inference of transcription factor binding sites from chromatin accessibility
AU - Chen, Xi
AU - Yu, Bowen
AU - Carriero, Nicholas
AU - Silva, Claudio
AU - Bonneau, Richard
N1 - Publisher Copyright:
© The Author(s) 2017.
PY - 2017/5/5
Y1 - 2017/5/5
N2 - Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques such as ChIP-seq can reveal genome-wide patterns of TF binding, but typically require laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands, improves TFBS prediction in some TFCT. Further, we describe a method for mapping new TFCT, for which no ChIP-seq data exists, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.
AB - Differential binding of transcription factors (TFs) at cis-regulatory loci drives the differentiation and function of diverse cellular lineages. Understanding the regulatory interactions that underlie cell fate decisions requires characterizing TF binding sites (TFBS) across multiple cell types and conditions. Techniques such as ChIP-seq can reveal genome-wide patterns of TF binding, but typically require laborious and costly experiments for each TF-cell-type (TFCT) condition of interest. Chromosomal accessibility assays can connect accessible chromatin in one cell type to many TFs through sequence motif mapping. Such methods, however, rarely take into account that the genomic context preferred by each factor differs from TF to TF, and from cell type to cell type. To address the differences in TF behaviors, we developed Mocap, a method that integrates chromatin accessibility, motif scores, TF footprints, CpG/GC content, evolutionary conservation and other factors in an ensemble of TFCT-specific classifiers. We show that integration of genomic features, such as CpG islands, improves TFBS prediction in some TFCT. Further, we describe a method for mapping new TFCT, for which no ChIP-seq data exists, onto our ensemble of classifiers and show that our cross-sample TFBS prediction method outperforms several previously described methods.
UR - http://www.scopus.com/inward/record.url?scp=85020166402&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020166402&partnerID=8YFLogxK
U2 - 10.1093/nar/gkx174
DO - 10.1093/nar/gkx174
M3 - Article
C2 - 28334916
AN - SCOPUS:85020166402
SN - 0305-1048
VL - 45
SP - 4315
EP - 4329
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 8
ER -