MetaCGH - Description

This website provides array comparative genomic hybridization (array CGH)-based copy number profiles of ~8,000 human cancer genomes. The profiles come from high-resolution platforms (Agilent 244K and Affymetrix 100K, 250K, 500K, and SNP6.0) and were obtained from the Gene Expression Omnibus (GEO), a public repository of microarray datasets. For a description of the data processing and segmentation methods, please refer to the article "Functional genomic analysis of chromosomal aberrations in a compendium of cancer genomes" (Kim et al., Dr. Park's lab; in preparation).

Segmentation files are provided for 8,227 cancer genomes, both for visual inspection of copy number changes and for subsequent functional analyses (e.g., identification of recurrent alterations or functional enrichment analyses). The files follow the CBS output style: a header line followed by six columns (metaData ID, chromosome, start, end, num.mark, and seg.mean). Each row corresponds to one genomic segment, with chromosome, start, and end giving its genomic coordinates. Seg.mean is the average log₂ ratio of the probes in the segment and represents the extent of the copy number change; positive values indicate copy gains and negative values indicate copy losses. Note that the data contain only autosomal segments. Four segmentation files are provided, one for each combination of segmentation method (CBS (1) or GLAD (2)) and genome build (hg18/Build 36 or hg19/Build 37).
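As a rough illustration of how a file in this six-column layout might be read and interpreted, the Python sketch below parses a small in-memory example and labels each segment as a gain, loss, or neutral. Several things here are assumptions rather than facts about the MetaCGH files: the tab delimiter, the header names in the example, the invented sample rows, the ±0.3 cutoff, and the conversion from log₂ ratio to copy number (which assumes a diploid reference and ignores tumor purity).

```python
import csv
import io

# Invented example rows in the six-column layout; not real MetaCGH data.
# The tab delimiter and the header names are assumptions.
EXAMPLE = (
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "S0001\t1\t51598\t30000000\t1500\t0.02\n"
    "S0001\t1\t30000001\t45000000\t800\t0.61\n"
    "S0001\t2\t10000\t90000000\t4200\t-0.48\n"
)


def read_segments(handle):
    """Parse a delimited segmentation file that starts with one header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    segments = []
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        sample, chrom, start, end, num_mark, seg_mean = row
        segments.append({
            "sample": sample,
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "num_mark": int(num_mark),
            "seg_mean": float(seg_mean),
        })
    return segments


def call_segment(seg_mean, threshold=0.3):
    """Label a segment by its mean log2 ratio; the cutoff is illustrative only."""
    if seg_mean > threshold:
        return "gain"
    if seg_mean < -threshold:
        return "loss"
    return "neutral"


def approx_copy_number(seg_mean, reference_copies=2):
    """Convert a log2 ratio to an approximate copy number (diploid reference assumed)."""
    return reference_copies * 2 ** seg_mean


segments = read_segments(io.StringIO(EXAMPLE))
calls = [call_segment(s["seg_mean"]) for s in segments]
for seg, call in zip(segments, calls):
    print(seg["sample"], seg["chrom"], seg["start"], seg["end"], call)
```

To use the sketch on a downloaded file, pass an open file handle to `read_segments` in place of the `io.StringIO` wrapper and adjust the delimiter if the file is not tab-separated.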

Metadata for the 8,227 samples is available here (.txt file). The columns, in file order, are: the MetaCGH ID; three GEO accession IDs for the corresponding sample (sample/GSM, study/GSE, and platform/GPL identifiers); tumor type; tumor subtype (if available); GEO description; and sample origin (primary/cell lines). For the Affymetrix 100K and 500K platforms, the two matched samples of each pair are both given in the GEO columns. The Broad Institute's Integrative Genomics Viewer (IGV) can read this file to sort or filter samples by category (e.g., tumor (sub)type or primary/cell line).
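Outside IGV, the same kind of filtering can be sketched in a few lines of Python. The example below reads a metadata table with the eight columns in the order described above and groups MetaCGH IDs by tumor type, optionally restricted to one sample origin. The field names, the tab delimiter, the presence of a header line, the "primary" and "cell line" labels, and all example rows (including the accession-like strings) are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical field names, assigned by column position in the order described above.
FIELDS = [
    "metacgh_id", "gsm", "gse", "gpl",
    "tumor_type", "tumor_subtype", "geo_description", "origin",
]

# Invented rows; the identifiers are placeholders, not real GEO accessions.
EXAMPLE_METADATA = (
    "id\tgsm\tgse\tgpl\ttype\tsubtype\tdescription\torigin\n"
    "S0001\tGSM_A\tGSE_X\tGPL_P\tglioma\tglioblastoma\texample one\tprimary\n"
    "S0002\tGSM_B\tGSE_X\tGPL_P\tglioma\t\texample two\tcell line\n"
    "S0003\tGSM_C\tGSE_Y\tGPL_Q\tbreast\t\texample three\tprimary\n"
)


def read_metadata(handle):
    """Read the metadata table, assuming a tab delimiter and one header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    return [dict(zip(FIELDS, row)) for row in reader if row]


def samples_by_tumor_type(records, origin=None):
    """Group MetaCGH IDs by tumor type, optionally keeping one sample origin."""
    groups = defaultdict(list)
    for rec in records:
        if origin is None or rec["origin"] == origin:
            groups[rec["tumor_type"]].append(rec["metacgh_id"])
    return dict(groups)


records = read_metadata(io.StringIO(EXAMPLE_METADATA))
primary_only = samples_by_tumor_type(records, origin="primary")
print(primary_only)
```

The resulting IDs could then be used to select the matching rows from a segmentation file, since the first column of both files identifies the sample.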

This website also provides the analysis files related to the article by Kim et al.

Reference list

1. Olshen, A.B., Venkatraman, E.S., Lucito, R. and Wigler, M. (2004) Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5, 557-572.
2. Hupe, P., Stransky, N., Thiery, J.P., Radvanyi, F. and Barillot, E. (2004) Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics, 20, 3413-3422.
3. Beroukhim, R., Getz, G., Nghiemphu, L., Barretina, J., Hsueh, T., Linhart, D., Vivanco, I., Lee, J.C., Huang, J.H., Alexander, S. et al. (2007) Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl. Acad. Sci. U. S. A., 104, 20007-20012.
4. Mermel, C.H., Schumacher, S.E., Hill, B., Meyerson, M.L., Beroukhim, R. and Getz, G. (2011) GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol., 12, R41.
5. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W. et al. (2006) Global variation in copy number in the human genome. Nature, 444, 444-454.
6. McCarroll, S.A. (2008) Integrated detection and population-genetic analysis of SNPs and copy number variation.
7. Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rahman, N. and Stratton, M.R. (2004) A census of human cancer genes. Nat. Rev. Cancer, 4, 177-183.
8. Bignell, G.R., Greenman, C.D., Davies, H., Butler, A.P., Edkins, S., Andrews, J.M., Buck, G., Chen, L., Beare, D., Latimer, C. et al. (2010) Signatures of mutation and selection in the cancer genome. Nature, 463, 893-898.
9. Stephens, P.J., Greenman, C.D., Fu, B., Yang, F., Bignell, G.R., Mudie, L.J., Pleasance, E.D., Lau, K.W., Beare, D., Stebbings, L.A. et al. (2011) Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell, 144, 27-40.
