GenePattern News

April 2007, Number 7
http://www.broad.mit.edu/genepattern/

Now Available:
GenePattern 3.0 with SNP Analysis

We are pleased to announce a new release of GenePattern, the SNP Analysis Suite of modules, new modules for the analysis of proteomic and gene expression data, and updated GenePattern workshops.

GenePattern 3.0
New and Updated Modules
GenePattern Workshops
GenePattern Case Study
Talk to Us

1. GenePattern 3.0

GenePattern 3.0 is now available. New features in this release include:

Web Client: The Web Client has been completely redesigned to improve usability and to include features previously available only in the Java Desktop Client.

Pipelines: You can now build pipelines that are more complex and easier to use. You can build pipelines that include other pipelines. You can provide your own names and descriptions for the parameters passed to a pipeline.

Module execution: You can now define a command line prefix for all modules or for individual modules. This allows you, for example, to use cluster job-scheduling software (such as SGE or LSF) to send different modules to different job queues.

Security: GenePattern now provides individual user accounts, optional password protection, and a more granular security model.

User settings: Individual user accounts allow users to customize GenePattern by determining how many jobs to display in the "recent jobs" list and how much memory to assign to GenePattern visualizers.

Server Administration: A greater number of GenePattern server settings are now customizable from the Web Client. Problems that you may have encountered using proxy settings to install modules from the Broad repository have been corrected.

GenePattern 3.0 is available at http://www.genepattern.org/download/.

GenePattern 3.0 release notes are available at http://www.genepattern.org/doc/relnotes/3.0/.

We welcome your feedback and encourage you to send questions and comments to gp-help@broad.mit.edu.

2. New and Updated Modules

A number of modules have been added to the GenePattern module repository since our last newsletter:

SNP Analysis modules provide support for the analysis of Affymetrix high-density SNP arrays:
- SNPFileCreator creates a GenePattern .snp file from a collection of Affymetrix CEL files, determining the probe intensity of each SNP by summarizing probe intensities across probe sets.
- CopyNumberDivideByNormals determines SNP copy numbers by dividing the intensity value of the target SNP by the intensity value of the normal SNP.
- GLAD invokes the R package GLAD (Gain and Loss Analysis of DNA), which detects the altered regions in the genomic pattern and assigns a status (normal, gained or lost) to each chromosomal region.
- LOHPaired detects loss of heterozygosity (LOH).
- SNPFileSorter sorts SNPs by chromosome and physical location. This is a prerequisite for some modules, such as SNPViewer.
- SNPViewer provides a powerful visual representation of SNP copy number and LOH data.
- XChromosomeCorrect doubles the intensity value of each SNP on the X chromosome for each sample from a male donor.
Optionally, create your own SNP Analysis pipeline by using GenePattern pipelines to combine the SNP Analysis modules into a single customized workflow.
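
As a rough illustration of the arithmetic described for XChromosomeCorrect and CopyNumberDivideByNormals above, here is a minimal Python sketch. It is not the modules' actual code: the function names, the list-based data layout, and the sample values are all assumptions made for the example. It returns only the target-to-normal intensity ratio, and any further scaling the real module may apply is not shown.

```python
def x_chromosome_correct(intensities, chromosomes, is_male):
    """Double X-chromosome intensities when the sample is from a male donor.

    Hypothetical helper: one intensity per SNP, with the SNP's chromosome
    given at the same position in `chromosomes`.
    """
    if not is_male:
        return list(intensities)
    return [v * 2 if c == "X" else v for v, c in zip(intensities, chromosomes)]


def divide_by_normals(target, normal):
    """Divide each target SNP intensity by the matching normal SNP intensity."""
    if len(target) != len(normal):
        raise ValueError("target and normal must list the same SNPs")
    return [t / n for t, n in zip(target, normal)]


# Made-up intensities for four SNPs, two of them on the X chromosome.
chromosomes = ["1", "1", "X", "X"]
target = x_chromosome_correct([100.0, 300.0, 50.0, 60.0], chromosomes, is_male=True)
normal = x_chromosome_correct([100.0, 200.0, 100.0, 80.0], chromosomes, is_male=False)

ratios = divide_by_normals(target, normal)
print(ratios)
```

In this sketch the male sample's X-chromosome values are doubled before the division, so they can be compared with a reference from a female donor on the same scale.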

LandmarkMatch and PeakMatch provide peak and landmark matching for advanced analysis of LC-MS data. They are based on work published by Jaffe, Mani, et al. in "PEPPeR, a Platform for Experimental Proteomic Pattern Recognition" (Molecular & Cellular Proteomics 5:1927-1941, 2006).

Multiplot modules allow you to create 2-parameter scatter plots from microarray data. The plots, which are customizable and interactive, display each probe (gene) as an individual dot whose identity and characteristics can be queried. Use the MultiplotPreprocess module to prepare your expression data for plotting, Multiplot to view the interactive plots, and MultiplotExtractor to create expression datasets based on the multiplot data.

CART and CARTXValidation provide class prediction by building classification and regression trees, which predict continuous dependent variables (regression) or categorical dependent variables (classification) (Breiman, et al., 1984).

GSEALeadingEdgeViewer runs the Leading Edge Analysis, which helps you visualize the overlap among the top gene sets returned by the Gene Set Enrichment Analysis (GSEA).

HierarchicalClusteringImage creates an image of the dendrogram generated from HierarchicalClustering, including support for the coloring of dendrogram nodes.

KMeansClustering clusters samples or features based on a randomly selected set of k cluster centers. Data points are assigned to the nearest cluster center and each cluster center is recalculated to be the mean value of its members. KMeansClustering repeats this process until the cluster centers stabilize.

MergeColumns and MergeRows create new datasets by merging existing datasets.
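
The iteration described for KMeansClustering above can be sketched in a few lines of Python. This is an illustrative sketch only, not the module's implementation: the function name, the use of squared Euclidean distance, the `max_iter` safeguard, and the sample points are assumptions made for the example.

```python
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (tuples of floats) around k centers.

    Starts from k randomly selected points, assigns every point to its
    nearest center, recomputes each center as the mean of its members,
    and repeats until the centers stop changing.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # centers have stabilized
        centers = new_centers
    return centers, clusters


# Made-up data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, clusters = k_means(points, k=2)
print(sorted(centers))
```

Here each tuple stands for one sample or feature. Because the starting centers are chosen at random, different seeds can give different results on less clearly separated data.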

In addition, the following modules have been updated:

ConsensusClustering clusters samples or features by building consensus clusters across multiple runs of a selected clustering algorithm. KMeansClustering has been added as one of the supported clustering algorithms.

ExpressionFileCreator creates an expression dataset from Affymetrix CEL files. When you use the MAS5 conversion algorithm, expression data is now normalized using the method that you select.

GEOImporter can now be used to download GEO Datasets. The URL for downloading GEO files has been updated.

GSEA now uses GSEA v2.0.1, the latest version of the Gene Set Enrichment Analysis software.

HierarchicalClusteringViewer now displays expression profiles for selected samples and features. It also saves images in eps format, as well as bmp, jpeg, png, and tiff formats.

HeatMapViewer now displays expression profiles for selected samples and features. It also saves images in eps format, as well as bmp, jpeg, png, and tiff formats.

SelectFeaturesColumns and SelectFeaturesRows now work with SNP files.

SVM no longer requires optional parameters. In addition, its R libraries are updated for compatibility with GenePattern 3.0. SVM versions 1 and 2 cannot be run on GenePattern 3.0.

To install new and updated modules, open the GenePattern Web Client and click Modules>Install from Repository. For comprehensive documentation on the modules in the repository, see our module page.

3. GenePattern Workshops

We've updated our popular GenePattern workshop to introduce participants to the features of GenePattern 3.0, including:

- intuitive web and application interfaces for users at all levels of computational sophistication
- a comprehensive repository of analysis and visualization modules for analyzing gene expression data, proteomic data, and high-density SNP array data
- pipelines that allow users to chain modules together to create and share methodologies
- easy module creation that allows rapid, code-free integration of new tools
- a programming environment that allows you to access GenePattern modules from the Java, MATLAB, and R programming languages

This one-day workshop is being offered on the following dates:

Thursday, May 17 (Broad employees only)
Monday, May 21
Tuesday, May 22

All workshops will be held from 9am to 5pm at MIT's Digital Instruction Resource Center (14N-132) in Cambridge, Massachusetts. Registration is free for attendees from academic or other nonprofit organizations and $600 for attendees from for-profit organizations.

Register now at http://www.broad.mit.edu/genepattern/workshop/. Or, if these dates are inconvenient, use the registration form to request that we notify you of future workshops.

4. GenePattern Case Study

GenePattern at Harvard-Partners Center for Genetics and Genomics

The Gateway for Integrated Genomics-Proteomics Applications and Data (GIGPAD) is a software platform that allows investigators and clinicians at the Harvard-Partners Center for Genetics and Genomics (HPCGG) to share data and analysis results without compromising patient confidentiality. GIGPAD relies on GenePattern to provide the computational analysis framework for HPCGG laboratories. Eugene Clark, HPCGG senior software architect, explains why his team chose to incorporate GenePattern rather than build an in-house system: "Using GenePattern allows us to decouple bioinformatics from our main application infrastructure, thereby providing our biologists greater freedom to innovate without being constrained by formal software development practices."

Prior to GIGPAD, each HPCGG lab maintained unique processes that required manual intervention, custom scripts, and IT support. Data and analysis results were difficult to share, manual processes were time-consuming, and parallel IT support was expensive. Today the labs are fully automated. GIGPAD receives raw data files from the lab machines and sends the files to GenePattern for processing. GenePattern runs selected computational analysis pipelines and forwards the results to GIGPAD. HPCGG associates can review the raw data and analysis results without compromising patient confidentiality. The GIGPAD-GenePattern integration centralizes data access, reduces processing time, and simplifies maintenance.

"We wanted the labs to retain their independence, but enable collaboration by having a central location for data and analysis results," explains Clark. The computational analysis framework was a critical component of the laboratory infrastructure. The framework had to be flexible enough to allow each lab to adapt its own methodologies, rigorous enough to enable reproducible research, extensible, and maintainable. Clark chose GenePattern based on the benefits it offered:

- Easy definition and maintenance of analysis pipelines
- Server-based architecture that supports clusters for efficient processing
- Multiple platform technology that supports Windows, Mac, and UNIX
- Comprehensive technical support and training
- Freely available

Integrating GenePattern with GIGPAD makes it easy for the IT team to build, deploy, and maintain customized computational analysis pipelines for individual HPCGG laboratories.

5. Talk to Us

Please let us know how you're using GenePattern.

Publications

If you've published a paper that makes use of GenePattern, we'd love to hear about it: email the GenePattern team. Even if you're just using GenePattern in a novel way, let us know!

User Survey

If you use GenePattern, we would like to know how your experience has been. Our user survey is a brief online form that lets you give us feedback about the software and other aspects of using GenePattern. Your responses are greatly appreciated - they help us to understand how GenePattern is being used and how to make it a more valuable tool.

Early Adopters

If you'd like early access to new GenePattern releases to help us test new GenePattern features, join the early adopters mailing list.
