ENCODE/modENCODE Chromatin Supplemental Datasets
This collection contains the data described in Ho et al., "Comparative analysis of metazoan chromatin organization", Nature, 2014.
ENCODE-X Browser: We have developed a web application for these chromatin datasets. Its main advantage is that it lets one quickly see which chromatin-related data are available, using faceted browsing, and view the data in the IGV browser for all three organisms. The chromatin state maps generated in Ho et al., 2014 are loaded automatically in the ENCODE-X Browser.
Antibody Validation Database: Antibodies used in the project were rigorously tested, and this database contains the validation data. Please see Egelhofer et al., "An assessment of histone-modification antibody quality", Nature Structural & Molecular Biology, 2011.
modENCODE data portal: This website also allows one to use faceted browsing to select datasets of interest (fly and worm only).
modMine: This warehouse by the modENCODE Data Coordinating Center provides a flexible query interface with access to extensive intermediate data and metadata (fly and worm only).
ENCODE data portal: This contains human and mouse ENCODE data.
Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA): Raw data are available from these two sites. Links to specific datasets are available from the above sites.
This table contains detailed metadata for all chromatin datasets, including links to the source files.
To enable the cross-species comparisons described in this paper, we reprocessed all data using MACS. (Because of slight differences in the peak-calling and input normalization steps, there may be slight discrepancies between the fly profiles analyzed here and the profiles available at the modENCODE data portal or modMine.)
For every pair of aligned ChIP and matching input-DNA data, we used MACS version 2 to generate fold enrichment signal tracks for every position in a genome:
macs2 callpeak -t ChIP.bam -c Input.bam -B --nomodel --shiftsize 73 --SPMR -g hs -n ChIP
macs2 bdgcmp -t ChIP_treat_pileup.bdg -c ChIP_control_lambda.bdg -o ChIP_FE.bedgraph -m FE
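As a rough illustration of what the resulting FE track represents, the sketch below computes a per-position ratio of ChIP pileup to control lambda in plain Python. It is not MACS2 code: the function name `fold_enrichment`, the list-based inputs, and the `pseudocount` parameter are assumptions made for illustration, whereas `macs2 bdgcmp` itself operates on bedGraph intervals.

```python
def fold_enrichment(treat_pileup, control_lambda, pseudocount=0.0):
    """Per-position ratio of ChIP pileup to control lambda (illustrative only)."""
    if len(treat_pileup) != len(control_lambda):
        raise ValueError("tracks must cover the same positions")
    fe = []
    for t, c in zip(treat_pileup, control_lambda):
        denom = c + pseudocount
        # Leave positions without control signal undefined rather than divide by zero.
        fe.append((t + pseudocount) / denom if denom > 0 else float("nan"))
    return fe


# Hypothetical toy values, not taken from any dataset in this collection.
chip = [2.0, 6.0, 12.0, 3.0]
ctrl = [2.0, 2.0, 3.0, 1.5]
print(fold_enrichment(chip, ctrl))
```

A value of 1 means the ChIP signal matches the expected background at that position; larger values indicate enrichment over input.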
For the fly data, genomic DNA Tiling Arrays v2.0 (Affymetrix) were used to hybridize ChIP
and input DNA. We obtained the log-intensity ratio values (M-values) for all perfect match
(PM) probes: M = log2(ChIP intensity) - log2(input intensity), and performed a whole-genome
baseline shift so that the mean of M in each microarray is equal to 0. The smoothed log intensity
ratios were calculated using LOWESS with a smoothing span corresponding to 500 bp,
combining normalized data from two replicate experiments. For the worm data, a custom
Nimblegen two-channel whole genome microarray platform was used to hybridize both ChIP
and input DNA. MA2C was used to preprocess the data to obtain a normalized and median
centered log2 ratio for each probe. All data are publicly accessible through the modENCODE
data portal or modMine.
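The M-value computation and baseline shift described for the fly arrays can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `centered_m_values` and the toy intensities are hypothetical, and the LOWESS smoothing and replicate-combination steps are not included.

```python
import math


def centered_m_values(chip_intensity, input_intensity):
    """M = log2(ChIP) - log2(input) per probe, shifted so that mean(M) == 0."""
    m = [math.log2(c) - math.log2(i)
         for c, i in zip(chip_intensity, input_intensity)]
    baseline = sum(m) / len(m)          # whole-array mean of M
    return [v - baseline for v in m]    # baseline shift


# Hypothetical probe intensities for one array.
chip = [1024.0, 256.0, 512.0, 128.0]
inp = [256.0, 256.0, 256.0, 256.0]
print(centered_m_values(chip, inp))
```

After the shift, positive values mark probes with above-average ChIP-to-input ratios within that array, which makes arrays with different overall intensities comparable.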
The input-normalized profiles are available at the ENCODE-X Browser.
Aligned DNase-seq data were downloaded from the modENCODE data portal and the ENCODE UCSC download page. Additional Drosophila embryo DNase-seq data were downloaded. After confirming consistency, reads from biological replicates were merged. We calculated minimally-smoothed signals (by a Gaussian kernel smoother with bandwidth of 10 bp in fly and 50 bp in human) along the genome in 10 bp (fly) or 50 bp (human) non-overlapping bins.
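A minimal sketch of binning followed by Gaussian smoothing is given below. Several details are assumptions, since the text does not specify them: reads are reduced to single 0-based positions, counts are binned before smoothing, the bandwidth is treated as the Gaussian standard deviation (one bin, matching 10 bp bandwidth with 10 bp bins), and the kernel is truncated at three standard deviations and renormalized at the track edges. The function names are hypothetical.

```python
import math


def bin_counts(positions, genome_length, bin_size):
    """Count read positions (0-based) in non-overlapping bins."""
    n_bins = (genome_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for p in positions:
        counts[p // bin_size] += 1
    return counts


def gaussian_smooth(values, bandwidth_bins, cutoff=3.0):
    """Smooth a binned track with a truncated Gaussian kernel."""
    half = max(1, int(math.ceil(cutoff * bandwidth_bins)))
    offsets = range(-half, half + 1)
    weights = [math.exp(-0.5 * (k / bandwidth_bins) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        num = den = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < len(values):   # renormalize over bins that exist
                num += w * values[j]
                den += w
        smoothed.append(num / den)
    return smoothed


# Hypothetical read positions on a 40 bp toy sequence, 10 bp bins.
counts = bin_counts([3, 7, 12, 25, 26, 27], genome_length=40, bin_size=10)
print(counts)
print(gaussian_smooth(counts, bandwidth_bins=1.0))
```

With a bandwidth equal to the bin size, the smoothing only blends each bin with its immediate neighbours, which is consistent with the "minimally-smoothed" description.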
The MNase-seq data were analyzed as described previously (ref. 38). In brief, tags were mapped to the corresponding reference genome assemblies. Positions at which the number of mapped tags had a Z-score > 7 were considered anomalous due to potential amplification bias, and the tags mapped to such positions were discarded. To compute profiles of nucleosomal frequency around TSSs, the centers of the fragments were used in the case of paired-end data. In the case of single-end data, tag positions were shifted by half of the estimated fragment size (estimated using cross-correlation analysis, ref. 39) toward the fragment 3’-ends, and tags mapping to the positive and negative DNA strands were combined. Loess smoothing in an 11-bp window, which does not affect the positions of the major minima and maxima on the plots, was applied to reduce the high-frequency noise in the profiles.
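The tag filtering and single-end shifting steps can be sketched roughly as below. The details are assumptions rather than the published procedure: Z-scores are computed over per-(position, strand) tag counts at occupied positions only, the population standard deviation is used, and the names `remove_anomalous_positions` and `shift_to_fragment_center` are hypothetical.

```python
from collections import Counter
import statistics


def remove_anomalous_positions(tags, z_cutoff=7.0):
    """Drop all tags at positions whose tag count has a Z-score above z_cutoff.

    tags: list of (position, strand) tuples, strand being "+" or "-".
    """
    counts = Counter(tags)
    values = list(counts.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return list(tags)               # all positions equally covered
    bad = {key for key, n in counts.items() if (n - mean) / sd > z_cutoff}
    return [t for t in tags if t not in bad]


def shift_to_fragment_center(tags, fragment_size):
    """Shift single-end tags by half the fragment size toward the 3' end."""
    half = fragment_size // 2
    return [pos + half if strand == "+" else pos - half
            for pos, strand in tags]


# Hypothetical tags: 100 singly covered positions plus one pile of 50 tags.
tags = [(i * 10, "+") for i in range(100)] + [(5000, "+")] * 50
kept = remove_anomalous_positions(tags)
print(len(tags), len(kept))
print(shift_to_fragment_center([(100, "+"), (246, "-")], fragment_size=146))
```

In the last call, the two tags from opposite strands land on the same coordinate after shifting, which is why tags from both strands can be combined into one profile.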
We downloaded the 5bp GC% data from the UCSC genome browser annotation download page (http://hgdownload.cse.ucsc.edu/downloads.html) for
PhastCons scores were then binned into 10 bp (fly and worm) or 50 bp (human) non-overlapping bins.
We generated empirical genomic sequence mappability tracks using input-DNA sequencing data. After merging input reads up to 100M, reads were extended to 149 bp, which corresponds to a shift of 74 bp in the signal tracks. The union set of empirically mapped regions was then obtained. They are available here:
We downloaded the "Gap" table from the UCSC genome browser download page (http://hgdownload.cse.ucsc.edu/downloads.html):
The data were downloaded from published papers XXX and YYY. Here are the genomic coordinates used in our study.
The code and instructions for running hiHMM can be accessed here.
The chromatin state definition can be accessed via the ENCODE-X Browser.
Gene expression data can be accessed from the modENCODE/ENCODE transcription page.