BMC Genomics
2021 Oct 19;22(1):751. doi: 10.1186/s12864-021-07936-0.
Identification and prediction of developmental enhancers in sea urchin embryos.
Arenas-Mena C, Miljovska S, Rice EJ, Gurges J, Shashikant T, Wang Z, Ercan S, Danko CG.
Abstract
BACKGROUND: The transcription of developmental regulatory genes is often controlled by multiple cis-regulatory elements. The identification and functional characterization of distal regulatory elements remain challenging, even in tractable model organisms like sea urchins.
RESULTS: We evaluate the use of chromatin accessibility, transcription and RNA Polymerase II for their ability to predict enhancer activity of genomic regions in sea urchin embryos. ATAC-seq, PRO-seq, and Pol II ChIP-seq from early and late blastula embryos are manually contrasted with experimental cis-regulatory analyses available in sea urchin embryos, with particular attention to common developmental regulatory elements known to have enhancer and silencer functions differentially deployed among embryonic territories. Using the three functional genomic data types, machine learning models are trained and tested to classify and quantitatively predict the enhancer activity of several hundred genomic regions previously validated with reporter constructs in vivo.
CONCLUSIONS: Overall, chromatin accessibility and transcription have substantial power for predicting enhancer activity. For promoter-overlapping cis-regulatory elements in particular, the distribution of Pol II is the best predictor of enhancer activity in blastula embryos. Furthermore, the predictive value of ATAC- and PRO-seq is stage dependent for the promoter-overlapping subset. This suggests that the sequence of regulatory mechanisms leading to transcriptional activation has distinct relevance at different levels of the developmental gene regulatory hierarchy deployed during embryogenesis.
Fig. 1. ATAC-seq, PRO-seq and Pol II ChIP-seq are used for the identification of TREs. A Experimental outlines of the 3 genomic profiles used. B IGV browser snapshot of replicate genomic profiles at the locus of H2A.Z, a highly expressed gene [36] (left); the region also includes a gene expressed at lower levels (right). The number of 3′ end reads per million of PRO-seq run-on transcripts is shown for the plus and minus strands. PRO-seq peaks mark transcriptional pause sites. MACS peak and dREG TRE predictions for the combined data sets are shown below each genomic profile. The CRM panel marks a genomic region with enhancer activity tested by deletion in large reporter constructs [25]. PRO-seq and ATAC-seq profiles are set to the same scale between the 12 and 20 h stages, with the range indicated in brackets at the beginning of each track.
Fig. 2. Genome-wide PRO-, ATAC- and ChIP-seq analysis. A Distribution of signal intensity and reproducibility estimation between distinct biological replicates for the different data sets in 12 h embryos. Overlap of points indicated by the color gradient. B Histograms of the number of reads per peak call for the different data sets in A. C Distribution of signal and reproducibility in 20 h embryos. D Histograms of the number of reads per peak call for the different data sets in C. E Venn diagrams of the overlap between ATAC and Pol II ChIP peak calls, and dREG predicted TREs in 20 h embryos. F Venn diagrams of the overlap of ATAC and dREG peak calls between stages
Fig. 3. ATAC-, Pol II ChIP- and PRO-seq profiles of sea urchin embryos at the SpHox11/13b locus. For ATAC- and PRO-seq, the scale in reads per million at the start of each track is kept at the same range between stages and equal between plus and minus strands. The whole region was scanned for enhancer activity with overlapping 3–5 kb reporter constructs [24]; only active CRMs are indicated, in green those active in both stages, and in gray those inactive or with unknown activity in these stages, as indicated in the text.
Fig. 4. Modeling of CRM reporter activity from ATAC-, Pol II ChIP- and PRO-seq. A Violin/box-plot of the ATAC, Pol II ChIP peak call and dREG TRE prediction sizes, and the 389 CRMs. The inset plots the size distributions of active and inactive CRMs, which are not significantly different. B and C, ranked CRM expression plot in 12 and 24 h embryos, respectively. The blue line at 1 marks the CRM expression level when it equals that of the basal-promoter reporter. The red line by the curve “elbow” marks the 2-fold above control chosen as the expression threshold. D Violin/box-plots of PRO-, ATAC-, and Pol II ChIP-seq signals that differ significantly between active and inactive CRMs in 12 and 20 h embryos. E, top, 12 h embryo Receiver Operating Characteristics (ROC) and, bottom, Precision-Recall Curves (PRC) of the logistic regression models trained and tested by 5-fold cross-validation repeated 200 times. Area Under the ROC (AUROC) and AUPRC as indicated for each model. Dotted lines mark random guess prediction performance, a mid-diagonal for ROC and a horizontal line at the fraction of active CRMs for PRC. The absolute AUPRC is indicated in bold and the difference from random guess in parentheses. F ROCs and PRCs in 20 h embryos. G, top, PRCs evaluating the enhancer activity predictions for the CRM promoter-overlapping data set of models trained with the entire 20 h CRM data set. Bottom, model predictions for the complementary, non-promoter overlapping data set. H Violin/box-plot of the AUPRC after cross-validation with different predictors, as indicated; All, includes the sum and max of the 3 genomic profiles allowing second order interactions among predictors; dREG-max, signifies the sum of the maximum values at dREG peaks.
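The evaluation metrics named in the Fig. 4 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy numbers, and the choice to treat a CRM as active at or above 2-fold over the basal-promoter control are assumptions made for illustration. The sketch computes AUROC by pairwise comparison, a step-wise AUPRC (average precision), and the random-guess PRC baseline, which equals the fraction of active CRMs.

```python
def label_active(fold_changes, threshold=2.0):
    """Label a CRM active (1) when its reporter expression is at least
    `threshold`-fold over the control (assumed reading of the 2-fold cutoff)."""
    return [1 if fc >= threshold else 0 for fc in fold_changes]


def auroc(labels, scores):
    """AUROC as the probability that a random active CRM outscores a
    random inactive one; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive CRM")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPRC estimate: mean precision at each rank where an
    active CRM is retrieved. Tied scores keep their input order."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one active CRM")
    return total / hits


def prc_baseline(labels):
    """Random-guess precision: the fraction of active CRMs."""
    return sum(labels) / len(labels)


# Toy data (hypothetical): reporter fold change and a model score per CRM.
fold_changes = [0.8, 3.1, 1.2, 5.0, 2.4, 0.5]
scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.2]
labels = label_active(fold_changes)
print("AUROC:", round(auroc(labels, scores), 3))
print("AUPRC:", round(average_precision(labels, scores), 3))
print("AUPRC above random:", round(average_precision(labels, scores) - prc_baseline(labels), 3))
```

Reporting the AUPRC together with its difference from the baseline, as the caption does, matters because the baseline shifts with the proportion of active CRMs, whereas the AUROC baseline stays at 0.5.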
Fig. 5. Quantitative prediction of enhancer activity from PRO-seq data. Plot of the hold-out predicted against the actual reporter expression of linear regression models using ATAC and PRO-seq signal at dREG predictions tested by five-fold cross-validation. Violin/Box-plot of R² values, with the average indicated underneath.
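The hold-out evaluation described in the Fig. 5 caption can be sketched in Python as follows. This is a simplified illustration, not the authors' model: it uses a single hypothetical predictor instead of the combined ATAC and PRO-seq signals, assigns folds deterministically without shuffling, and all function names are assumptions. Each CRM is predicted by a line fitted without it, and R² is then computed from those hold-out predictions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def holdout_predictions(xs, ys, k=5):
    """k-fold cross-validation: every point is predicted by a model
    trained on the other folds. Fold i holds indices i, i+k, i+2k, ..."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            preds[i] = slope * xs[i] + intercept
    return preds


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): one genomic signal and a reporter expression value per CRM.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
expression = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2, 18.8]
preds = holdout_predictions(signal, expression, k=5)
print("hold-out R^2:", round(r_squared(expression, preds), 3))
```

Computing R² on hold-out predictions instead of on the fitted training values guards against overstating predictive power, and repeating the procedure over different fold assignments yields a distribution of R² values of the kind a violin plot can summarize.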