Meerkat

This site hosts the Meerkat software, developed in Peter Park's lab at the Center for Biomedical Informatics (CBMI), Harvard Medical School. The Meerkat methodology is documented in:

Diverse mechanisms of somatic structural variations in human cancer genomes

Lixing Yang¹, Lovelace J. Luquette¹, Nils Gehlenborg¹,³, Ruibin Xi¹,⁴, Psalm S. Haseley¹,⁵, Chih-Heng Hsieh⁶, Chengsheng Zhang⁶, Xiaojia Ren⁵, Alexei Protopopov⁷, Lynda Chin⁷, Raju Kucherlapati²,⁵, Charles Lee⁶,⁸, Peter J. Park¹,⁵,⁹,*

¹Center for Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
²Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
³Cancer Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
⁴School of Mathematical Sciences and Center for Statistical Science, Peking University, Beijing 100871, China
⁵Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA
⁶Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA
⁷Department of Genomic Medicine and Institute for Applied Cancer Science, MD Anderson Cancer Center, Houston, TX 77054, USA
⁸Department of Medicine, Seoul National University College of Medicine, Seoul 110-799, South Korea
⁹Informatics Program, Children's Hospital, Boston, MA 02115, USA
* To whom correspondence should be addressed

Cell. 153: 919-929.

Download Meerkat

Meerkat can be downloaded through the links below. For further instructions on installing and configuring Meerkat as well as required software for running the pipeline, please see the manual included in the source distribution.

Version 0.189 `.tar.gz` (556KB) md5sum: `1a31a5e3b946e70d3bc592e1202ab051`

Demo dataset

The following tarball contains a small dataset for demonstration and to test your Meerkat installation.

Example dataset `.tar.gz` (156MB) md5sum: `b274ea762038a6d9c9cea2d8999be066`
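The md5sum values listed above can be used to confirm that a download arrived intact. The following is a minimal sketch, assuming GNU `md5sum` is installed; the helper name `verify_md5` is hypothetical and not part of Meerkat, and the demonstration runs on a throwaway file rather than on the real tarballs.

```shell
# Hypothetical helper (not part of Meerkat): compare a file's MD5 digest
# with an expected value. Assumes GNU md5sum is on the PATH.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual, expected $expected)" >&2
        return 1
    fi
}

# Self-contained demonstration on a temporary file.
demo=$(mktemp)
printf 'hello\n' > "$demo"
demo_sum=$(md5sum "$demo" | awk '{print $1}')
verify_md5 "$demo" "$demo_sum"
```

For the real downloads, the same function would be called with the path to the downloaded tarball and the checksum shown next to its link, for example `verify_md5 path/to/downloaded.tar.gz 1a31a5e3b946e70d3bc592e1202ab051`, where the path is a placeholder.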

The Meerkat Workflow

The following diagram outlines the steps taken by Meerkat to identify structural variations. Please refer to the publication for a more detailed treatment of the process.

Pink and light blue lines denote two segments of the genome connected together in the formation of an SV. In Step 3, purple lines denote repetitive segments. Arrows connected by dashed lines represent paired-end reads. In Step 4, the dark blue line on the reference genome represents a deleted segment in the donor genome; the dark red line represents an inserted segment. Light and dark green arrows connected by dashed lines represent two clusters of discordant read pairs. In Step 5, the beginning and end of soft-clipped and unmapped reads identified in Step 1 are used as split reads.

Frequently Asked Questions

0. I have a problem. Is there documentation available?

Yes! A manual is distributed with Meerkat. Please read it before contacting us for help.

1. I can't compile the binaries.

If you encounter an error like this:

/usr/bin/ld: gzstream.o: undefined reference to symbol 'gzclose'
//lib/x86_64-linux-gnu/libz.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make: *** [bamreader] Error 1

Please modify the Makefile of each binary as follows:

from: ... -lbamtools -lbamtools-utils
to: ... -lbamtools -lbamtools-utils -lz
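When several Makefiles need the same change, the edit can be scripted. The following is a minimal sketch, assuming a `sed` that supports in-place editing with a backup suffix (`-i.bak`); the helper name `add_lz` and the Makefile contents in the demonstration are hypothetical stand-ins, not the actual Meerkat Makefiles.

```shell
# Hypothetical helper (not part of Meerkat): append -lz after -lbamtools-utils
# on lines that do not already mention -lz. A .bak copy of the file is kept.
add_lz() {
    sed -i.bak '/-lz/!s/-lbamtools-utils/& -lz/' "$1"
}

# Self-contained demonstration on a made-up Makefile.
demo_dir=$(mktemp -d)
printf 'bamreader:\n\t$(CXX) -o bamreader bamreader.o -lbamtools -lbamtools-utils\n' \
    > "$demo_dir/Makefile"
add_lz "$demo_dir/Makefile"
cat "$demo_dir/Makefile"
```

Because lines that already contain `-lz` are skipped, running the helper again leaves the link line unchanged, although the `.bak` copy is then overwritten with the already-edited file.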

2. What if I can't get Meerkat to work on my server?

Please use the latest version of Meerkat. Meerkat requires a number of programs, modules, and references; make sure that all of them are in place and that the parameters are given properly. The most likely cause of an error is an incorrectly specified parameter. If you have difficulty installing any programs, modules, or references, please contact your system administrator first. If you need further assistance, please contact us at ylixing@gmail.com.

When you contact us, please provide the following:
Whether you are able to run ./bin/bamreader from the command line.
Whether you are able to run Meerkat on example.bam.
The output of "ls -l" for the run folder.
The pre.log file, if it was generated.
The isinfo file, if it was generated.
The dre.log file, if it was generated.
The error message, if there is any.
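Part of this information can be gathered into a single file before writing to us. The following is a rough sketch only: the helper name `collect_report` is hypothetical, and it assumes that the log files in the run folder have names ending in `pre.log`, `isinfo`, and `dre.log`. The demonstration uses a temporary folder with a made-up log file.

```shell
# Hypothetical helper (not part of Meerkat): write the directory listing and
# the tail of each log file found in a run folder to one report file.
# Assumption: log file names end in pre.log, isinfo, or dre.log.
collect_report() {
    run_dir=$1
    report=$2
    {
        echo "== ls -l $run_dir =="
        ls -l "$run_dir"
        for f in "$run_dir"/*pre.log "$run_dir"/*isinfo "$run_dir"/*dre.log; do
            [ -f "$f" ] || continue      # skip patterns that matched nothing
            echo "== last 50 lines of $f =="
            tail -n 50 "$f"
        done
    } > "$report" 2>&1
}

# Self-contained demonstration with a fake run folder.
run_dir=$(mktemp -d)
report=$(mktemp)
echo 'example log line' > "$run_dir/demo.pre.log"
collect_report "$run_dir" "$report"
cat "$report"
```

The answers to the two questions about bamreader and example.bam, and any error message printed to the terminal, still need to be added by hand.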

3. The breakpoints are not properly annotated by the fusions.pl script.

The refGene.txt file downloaded from UCSC needs to be sorted by chromosome and coordinate with the following command:

sort -k 3,3 -k 5,5n refGene.txt > refGene_sorted.txt
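To see what the sort keys do, the command can be tried on a tiny made-up file. This is a minimal sketch, assuming the chromosome is in column 3 and the start coordinate in column 5; the three rows below are invented for illustration and contain only the first five columns.

```shell
# Build a small, deliberately unsorted stand-in for refGene.txt.
# Assumed columns: bin, name, chrom, strand, txStart (rows are made up).
demo=$(mktemp -d)
printf '1\tNM_B\tchr2\t+\t500\n'   > "$demo/refGene.txt"
printf '1\tNM_C\tchr1\t-\t2000\n' >> "$demo/refGene.txt"
printf '1\tNM_A\tchr1\t+\t100\n'  >> "$demo/refGene.txt"

# Same keys as the command above: column 3 as text, then column 5 as a number.
sort -k 3,3 -k 5,5n "$demo/refGene.txt" > "$demo/refGene_sorted.txt"
cat "$demo/refGene_sorted.txt"

# sort -c exits non-zero if the file is not in the requested order.
sort -c -k 3,3 -k 5,5n "$demo/refGene_sorted.txt" && echo "file is sorted"
```

The `sort -c` check can also be run on an existing refGene_sorted.txt to confirm that it is in the expected order before running fusions.pl.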