We are interested in computational analysis of genomic data to answer important questions in epigenetics and cancer.

[updated 12/1/14]

Algorithm development for analysis of whole-genome sequencing (WGS) data - We specialize in detection of genomic alterations in whole-genome data. WGS for an individual generates about 150 GB of data (twice that for a cancer patient). Accurate identification of sequence variations (single-nucleotide variants and indels) and structural variations (e.g., copy number changes, translocations, transposable element insertions, microsatellite instability, viral insertions) requires sophisticated algorithms.

Large-scale analysis of cancer genomics data - We have been involved in The Cancer Genome Atlas (TCGA), which has performed comprehensive genomic characterization of nearly 10,000 cancer samples (~500 samples each for 20 tumor types). As part of this project, the Genome Characterization Center at Harvard Medical School has generated low-coverage WGS data for >1,200 cases. The Park lab has contributed structural variation analysis to several TCGA marker papers.

Integrative analysis of epigenetic data - We have been part of the ENCODE project to profile a large number of histone modifications and chromatin-associated proteins to understand the relationship between chromatin structure and gene function. We led the cross-species (human-fly-worm) chromatin analysis for the consortium. A supplementary website accompanies the final paper, and we have developed a data repository for the project.

Genomic applications to the study of eye disorders - In this project, we aim to apply the algorithms that we and others have developed in cancer genomics to vision research, both to identify disease-causing variants and to support data sharing.

Clinical applications of high-throughput sequencing - We are involved in the data coordinating center for the Undiagnosed Diseases Network, which aims to use genomic characterization and other state-of-the-art diagnostic methods to better understand rare, hard-to-diagnose diseases. We are also involved in the "BabySeq" project, in which babies from the neonatal ICU at Boston Children's Hospital and healthy newborns at Brigham and Women's Hospital will be sequenced to determine whether genome sequencing will be helpful in clinical medicine.

Development of an analytical and visualization platform - For the Harvard Stem Cell Institute, we are developing a platform called Refinery, a data management, analysis, and visualization system for bioinformatics and computational biology applications. The platform consists of three major components: a data repository with rich metadata capabilities, a workflow engine based on the popular Galaxy system, and visualization tools to support data exploration.

Active collaborations: We work with a large number of experimental collaborators around Harvard and beyond.