
Modified: December 5, 2005

Supplementary Material

Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, & Park PJ, Proc Natl Acad Sci USA, 2005


R package: sigPathway

Implemented by Weil R. Lai

The latest version is available from the Bioconductor website: http://www.bioconductor.org/packages/bioc/html/sigPathway.html


| Version | Package | Source code | Help | Comment | Date |
|---------|---------|-------------|------|---------|------|
| 1.1-0 | sigPathway_1.1-0.zip | sigPathway_1.1-0.tar.gz | sigPathway.pdf | | Sep 2005 |
| 1.1-1 | sigPathway_1.1-1.zip | sigPathway_1.1-1.tar.gz | | a bug for the continuous phenotype case is fixed | Dec 2005 |
| 1.1-2 | sigPathway_1.1-2.zip | sigPathway_1.1-2.tar.gz | | a memory allocation routine was modified | Feb 2006 |
| 1.1-3 | sigPathway_1.1-3.zip | sigPathway_1.1-3.tar.gz | manual.pdf | a minor bug for the weights w_{ki} option fixed | Mar 2006 |
| 1.1-4 | sigPathway_1.1-4.zip | sigPathway_1.1-4.tar.gz | | minor bugs for permutations fixed; for small samples, all possible permutations are now performed | Apr 2006 |


A "Package" is a Windows binary file. To install on non-Windows platforms, build from the source code with "R CMD INSTALL sigPathway_1.1-4.tar.gz".
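The same source install can also be started from inside an R session. The following is a minimal sketch, assuming the tarball has already been downloaded and that a working compiler toolchain is available; the helper name installFromTarball is hypothetical and is not part of sigPathway.

```r
## Hypothetical helper (not part of sigPathway): install a source tarball
## from within R. The 'tarball' argument is the path where the downloaded
## file was saved, which is an assumption of this sketch.
installFromTarball <- function(tarball) {
  if (!file.exists(tarball))
    stop("File not found: ", tarball)
  ## repos = NULL tells install.packages() to treat 'tarball' as a local file
  install.packages(tarball, repos = NULL, type = "source")
}

## Usage, after downloading the file to the working directory:
## installFromTarball("sigPathway_1.1-4.tar.gz")
```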

Expression Data

Muscle Data (49 samples)

Gene sets (R object)

Right-click a link and choose 'Save link as' to download the file.


For other array types, Entrez Gene IDs can be used as identifiers: Genesets_EntrezGeneIDs (all species, 3MB)

How to create your own gene sets

########################################################################
## create_gene_sets_from_Bioconductor_annotations.R
## Weil Lai
## wlai@alum.mit.edu
## August 13, 2007
##
## This R script lets users convert human, mouse, or rat annotations
## from Bioconductor annotation packages to a "G" list, which is
## required for analyzing gene sets in sigPathway. Although the script
## looks a bit convoluted, it takes less than one minute to run
## on a Pentium M 1.86 GHz computer.
##
## The values for egidDir, biocAnnot, and perhaps other variables will
## need to be changed so that the script will run properly. Please
## make sure the Bioconductor annotation package of interest has already
## been installed in R before running this script.
##
## I have tested this script for hgu133a, hgu95av2, hs25kresogen, rat2302,
## and mgu74av2. It should work for other human, mouse, and rat
## Bioconductor annotations. More details regarding the "G" list format
## can be found in the sigPathway vignette.
########################################################################

## "Genesets_EntrezGeneIDs.RData" is an R workspace containing Entrez Gene
## IDs corresponding to gene sets from GO, KEGG, and other pathway
## databases listed in the sigPathway vignette PDF. It can be found at
## http://www.chip.org/~ppark/Supplements/PNAS05/Genesets_EntrezGeneIDs.RData
##
## egidDir stands for the directory name where Genesets_EntrezGeneIDs.RData
## is located
egidDir <- "W:/sigPathway development/pathway lists/egids"
load(file.path(egidDir, "Genesets_EntrezGeneIDs.RData"))

## Give the Bioconductor annotation package name here
biocAnnot <- "rat2302"

## get Entrez Gene IDs for each "probe set" (Affymetrix terminology)
## on the array
library(biocAnnot, character.only = TRUE)
xx <- as.list(get(paste(biocAnnot, "ENTREZID", sep = "")))
xx <- xx[!is.na(xx)]
xx <- unlist(xx)

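## Group probe set IDs by Entrez Gene ID: yy[[i]] holds every probe set
## that maps to xxUnique[i]
xxUnique <- unique(xx)

yy <- vector("list", length(xxUnique))

for(i in seq_along(yy))
  yy[[i]] <- names(xx)[xx == xxUnique[i]]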

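## Match probe sets (by Entrez Gene IDs) to master gene list
zz <- vector("list", length(G.EGIDs))

for(i in seq_along(zz)) {
  ## Entrez Gene IDs absent from the array give NA here and are
  ## dropped by unlist()
  m <- match(G.EGIDs[[i]]$probes, xxUnique)
  zz[[i]] <- unlist(yy[m])

  ## Report progress every 1000 gene sets
  if(i %% 1000 == 0)
    cat("i = ", i, "\n", sep = "")
}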

## Disregard gene sets that did not get represented on the array
idx <- which(sapply(zz, length) > 0)
G.biocAnnot <- G.EGIDs[idx]

## Replace the Entrez Gene IDs in each retained gene set with probe set IDs
for(i in seq_along(idx))
  G.biocAnnot[[i]]$probes <- zz[[idx[i]]]

## Save mapped gene set to the directory specified in egidDir
save(G.biocAnnot, file = file.path(egidDir, "G_biocAnnot.RData"))
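
The mapping step in the script can be tried on toy data without installing any annotation package. The sketch below is a variant, not the author's code: it groups probe sets with split() instead of an explicit loop, and it uses intersect(), so duplicated Entrez Gene IDs within a gene set are collapsed. All probe set IDs, Entrez Gene IDs, and gene set names are made up, and the src and title fields are shown only for illustration; the script itself relies only on the probes field.

```r
## Toy data (hypothetical): probe set ID -> Entrez Gene ID, in the same
## shape as 'xx' after unlist() in the script
xx <- c(p1_at = "101", p2_at = "102", p3_at = "101", p4_at = "103")

## Toy master gene-set list; each element's 'probes' field holds
## Entrez Gene IDs, as in G.EGIDs
G.EGIDs <- list(
  list(src = "toyA", title = "set A", probes = c("101", "999")),
  list(src = "toyB", title = "set B", probes = c("555")),
  list(src = "toyC", title = "set C", probes = c("103", "102"))
)

## Group probe set IDs by Entrez Gene ID in one step
probesByGene <- split(names(xx), xx)

## Map one gene set's Entrez Gene IDs to the probe sets on the array;
## IDs that are not on the array are ignored
mapSet <- function(g) {
  hits <- probesByGene[intersect(g$probes, names(probesByGene))]
  unlist(hits, use.names = FALSE)
}
zz <- lapply(G.EGIDs, mapSet)

## Keep only gene sets with at least one probe set on the array, and
## store the probe set IDs in place of the Entrez Gene IDs
keep <- lengths(zz) > 0
G.toy <- Map(function(g, p) { g$probes <- p; g }, G.EGIDs[keep], zz[keep])

str(G.toy)
```

In this toy run, set B has no gene on the array and is dropped, which mirrors the "Disregard gene sets" step in the script.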


Please direct your questions to Peter J. Park.