GenePattern provides access to a broad array of computational methods used to analyze genomic data. Its extendable architecture makes it easy for computational biologists to add analysis and visualization modules, which ensures that GenePattern users have access to new methods on a regular basis.
This Concepts guide provides a brief introduction to GenePattern. All other GenePattern documentation assumes that you are familiar with the concepts covered here.
Analysis and visualization modules are at the heart of GenePattern:
Each module includes its own documentation, which is supplied by the module developer. The Modules page of the GenePattern web site lists the modules available from the Broad Institute with links to their documentation.
Pipelines combine analysis modules, visualization modules, and other pipelines into a single, reusable workflow. Pipelines can be defined to analyze a particular dataset; for example, you might create a pipeline to reproduce published analysis results. Or they can be parameterized, which allows the person running the pipeline to provide datasets and other analysis variables. Often a pipeline runs a progressive series of analyses, where the output from one analysis is used as input for the next.
When you create a pipeline, you select the modules (and pipelines) to be executed by the pipeline. Most modules require one or more parameters. You can specify the parameter values when you create the pipeline, have the pipeline use the output file from one module as the input parameter value for a subsequent module, or prompt the user for parameter values when the pipeline is run.
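The three ways of supplying a parameter value described above can be illustrated with a small model. This is only a sketch, not GenePattern's implementation: the classes `Fixed`, `FromStep`, `Prompt`, and `Step`, the function `run_pipeline`, and the two stand-in "modules" are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class Fixed:
    """Value chosen when the pipeline is created."""
    value: str


@dataclass(frozen=True)
class FromStep:
    """Use the output of an earlier step (0-based index) as the value."""
    index: int


@dataclass(frozen=True)
class Prompt:
    """Ask the person running the pipeline for the value."""
    name: str


Source = Union[Fixed, FromStep, Prompt]


@dataclass
class Step:
    module: Callable[..., str]   # stand-in for an analysis module
    params: Dict[str, Source]


def run_pipeline(steps: List[Step], answers: Dict[str, str]) -> List[str]:
    """Run each step in order, resolving every parameter from its source."""
    outputs: List[str] = []
    for step in steps:
        resolved = {}
        for name, source in step.params.items():
            if isinstance(source, Fixed):
                resolved[name] = source.value
            elif isinstance(source, FromStep):
                resolved[name] = outputs[source.index]
            else:  # Prompt: value supplied at run time
                resolved[name] = answers[source.name]
        outputs.append(step.module(**resolved))
    return outputs


# Hypothetical stand-ins that only describe what they would produce.
def preprocess(input_file: str, threshold: str) -> str:
    return f"preprocessed({input_file}, threshold={threshold})"


def cluster(input_file: str) -> str:
    return f"clustered({input_file})"


steps = [
    Step(preprocess, {"input_file": Prompt("dataset"), "threshold": Fixed("20")}),
    Step(cluster, {"input_file": FromStep(0)}),
]
results = run_pipeline(steps, {"dataset": "my_data.gct"})
print(results[-1])
```

The second step never names a file directly; it only refers to the first step's output, which is what lets the same pipeline be rerun on a different dataset.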
Pipelines can be used to share analysis methods or to document research. By providing a way to create and distribute an entire computational analysis methodology in a single executable script, pipelines enable a form of in silico reproducible research. Colleagues with access to the same GenePattern server can easily share pipelines. Alternatively, a pipeline can be exported from one GenePattern server and imported into another.
The repository maintained by the Broad Institute includes a number of pipelines that document analysis methodologies published by Broad researchers. The Modules page of the GenePattern web site lists the pipelines available from the Broad Institute with links to their documentation.
Suites group modules and pipelines into convenient packages. For example, if you tend to analyze copy number data, you might find it helpful to create a suite that includes the SNPFileCreator, GISTIC, and other related modules. Suites provide easy access to frequently accessed modules. They also provide a convenient way of collecting a set of modules and pipelines to be shared with other GenePattern users. Colleagues with access to the same GenePattern server can easily share suites. Alternatively, a suite can be exported from one GenePattern server and imported into another.
The repository maintained by the Broad Institute includes a number of suites. The Suites page of the GenePattern web site lists them.
To use GenePattern, you open a web browser and enter a URL. The URL that you enter is the address of a GenePattern server. The web browser provides the user interface. The server runs the analyses and stores the results.
You can use the GenePattern server hosted on AWS or the server hosted at the University of California, San Diego (UCSD); if you need to administer your own server, you can download and install the GenePattern software. The server hosted on AWS is usually called the public server or the Cloud-hosted server. The server at UCSD is often called GP@UCSD or GenePattern @ UCSD. GenePattern servers that were downloaded and installed by local administrators are known as local servers.
The GenePattern team hosts a publicly available GenePattern server at http://cloud.genepattern.org/gp/ via AWS. You can use the Cloud-hosted GenePattern server without installing any software.
Using the Cloud-hosted server has several benefits:
When you download GenePattern, you install a local GenePattern server. You can install a local server on a standalone machine for your personal use or on a networked machine for use by several people or an entire organization. A local GenePattern server shared by several users is sometimes called a networked GenePattern server. Instructions for installing a local GenePattern server are provided on the Download GenePattern page.
Using a local server has several benefits:
When you run a module or pipeline in GenePattern, the web browser sends your request to the GenePattern server. The server starts a job to run the analysis. Job results (analysis result files and execution logs) are stored on the GenePattern server for a period of time (by default, one week) and then deleted. The GenePattern home page displays your most recent jobs and the Job Result Summary page displays all of your jobs.
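As a rough illustration of the retention period, the sketch below computes the earliest time at which a job's results could be deleted. It assumes the retention clock starts when the job completes and that deletion happens no earlier than one retention period later; the actual purge schedule is a server setting, and the function name `earliest_purge_time` is hypothetical.

```python
from datetime import datetime, timedelta

# Default retention described above: one week.
DEFAULT_RETENTION = timedelta(weeks=1)


def earliest_purge_time(completed_at: datetime,
                        retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Return the earliest moment the job results could be deleted.

    Assumption: retention is measured from job completion.
    """
    return completed_at + retention


finished = datetime(2024, 1, 1, 12, 0)
print(earliest_purge_time(finished))
```

If you need results for longer than the retention period, download them before this time.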
Every job run on the GenePattern server is owned by the person who submitted the job. Owners are identified by their GenePattern usernames. Every job is persistent, which means:
GenePattern provides a flexible architecture that allows a user with server administrator privilege to control access to the server in several ways:
GenePattern servers are generally configured to distinguish between users and administrators. The following table shows the permissions used on the Cloud-hosted server and the default permissions for a local server. GenePattern adjusts its user interface based on the permissions assigned to the person logged in; for example, only administrators see the Administration menu. The GenePattern documentation describes all of the GenePattern features. Your permissions determine whether a particular feature is visible.
Server | User Permissions | Administrator Permissions |
---|---|---|
Cloud-hosted server | Run public modules/pipelines; create and run your own pipelines; edit/delete your jobs and pipelines | GenePattern team has all permissions; GenePattern team can view/delete all jobs, modules, and pipelines |
Local server, standalone | Same as administrator permissions | All users have all permissions; all users can view/delete all jobs, modules, and pipelines |
Local server, shared* | Run public modules/pipelines; create and run your own pipelines; create and run your own modules; create public pipelines; edit/delete your jobs, modules, and pipelines | All permissions; view/delete all jobs, modules, and pipelines |
* When several users share a local server, the system administrator typically secures the server by assigning only a few users to the Administrators group. When a local server has designated administrators, users and administrators have the default permissions shown here.
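The ownership and administrator rules in the table can be summarized as a simple check. The sketch below is illustrative only: the `User` and `Job` classes and both functions are hypothetical and do not reflect how a GenePattern server actually stores or enforces permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    is_admin: bool = False  # on a standalone local server, every user is an admin


@dataclass(frozen=True)
class Job:
    job_id: int
    owner: str  # GenePattern username of the person who submitted the job


def can_delete_job(user: User, job: Job) -> bool:
    """Owners may delete their own jobs; administrators may delete any job."""
    return user.is_admin or job.owner == user.username


def sees_administration_menu(user: User) -> bool:
    """Only administrators see the Administration menu."""
    return user.is_admin


alice = User("alice")
admin = User("root", is_admin=True)
job = Job(42, owner="bob")
print(can_delete_job(alice, job), can_delete_job(admin, job))
```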
For more information about security and permissions, see Securing the Server in the Administrators Guide.
GenePattern uses version numbers to uniquely identify objects, such as modules and pipelines. When you create an object, GenePattern automatically assigns it a version number of one (1). When you update the object, GenePattern automatically updates the version number. By carefully versioning each object, GenePattern ensures you can accurately reproduce analysis results.
For example, you might create a pipeline that runs two modules: PreprocessDataset version 4 and HierarchicalClustering version 5. If the HierarchicalClustering module is updated (creating HierarchicalClustering version 6), version 1 of your pipeline still runs HierarchicalClustering version 5, which ensures that the pipeline produces the same results each time it is run. However, depending on why you are using the pipeline, you might prefer to have the pipeline run the latest version of an analysis module rather than a specific version. You make that choice when you create or edit the pipeline. For example, you might update the pipeline (creating version 2) to have the pipeline always use the latest version of the HierarchicalClustering module. Now, when you run version 2 of the pipeline, it uses HierarchicalClustering version 6. When you run version 1 of the pipeline, it uses HierarchicalClustering version 5.
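The difference between pinning a module version and always using the latest one can be sketched as follows. The data layout and the function `resolve_version` are hypothetical; only the module names and version numbers come from the example above.

```python
from typing import Dict, List, Optional

# Module versions installed on the server after HierarchicalClustering is updated.
installed: Dict[str, List[int]] = {
    "PreprocessDataset": [4],
    "HierarchicalClustering": [5, 6],
}

# A pipeline maps each module to a pinned version, or None for "latest".
pipeline_v1: Dict[str, Optional[int]] = {
    "PreprocessDataset": 4,
    "HierarchicalClustering": 5,     # pinned
}
pipeline_v2: Dict[str, Optional[int]] = {
    "PreprocessDataset": 4,
    "HierarchicalClustering": None,  # always use the latest version
}


def resolve_version(module: str, requested: Optional[int]) -> int:
    """Return the module version a pipeline step would actually run."""
    versions = installed[module]
    if requested is None:
        return max(versions)
    if requested not in versions:
        raise LookupError(f"{module} version {requested} is not installed")
    return requested


for name, pipeline in (("v1", pipeline_v1), ("v2", pipeline_v2)):
    resolved = {m: resolve_version(m, v) for m, v in pipeline.items()}
    print(name, resolved)
```

Pinning favors reproducibility; choosing the latest version favors picking up fixes and improvements automatically.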
When you view and edit modules or pipelines, GenePattern shows you their version numbers. Typically, you update the latest version of an object, which increments its version number. For example, editing version 1 creates version 2. At times, you may need to edit an older version, which creates a point version. For example, if you have versions 1 and 2, editing version 1 creates version 1.1.
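The numbering rule can be expressed as a short function. This is a sketch under stated assumptions, not GenePattern's own code: it assumes that editing a version takes the next number at the same level when that number is free, and otherwise creates the first free point version beneath the edited one. The behavior when a point version such as 1.1 already exists is an assumption; the text above only covers the two basic cases.

```python
from typing import Iterable, Tuple

Version = Tuple[int, ...]


def parse(text: str) -> Version:
    """'1.1' -> (1, 1)"""
    return tuple(int(part) for part in text.split("."))


def fmt(version: Version) -> str:
    """(1, 1) -> '1.1'"""
    return ".".join(str(part) for part in version)


def next_version(existing: Iterable[str], edited: str) -> str:
    """Return the version number created by editing `edited`."""
    taken = {parse(v) for v in existing}
    base = parse(edited)
    if base not in taken:
        raise ValueError(f"version {edited} does not exist")

    # Editing the latest version at its level: increment the last number.
    sibling = base[:-1] + (base[-1] + 1,)
    if sibling not in taken:
        return fmt(sibling)

    # Editing an older version: create the first free point version.
    child = base + (1,)
    while child in taken:
        child = child[:-1] + (child[-1] + 1,)
    return fmt(child)


print(next_version(["1"], "1"))        # editing the only version
print(next_version(["1", "2"], "1"))   # editing an older version
```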
GenePattern implements version numbers using Life Science Identifiers (LSIDs). Thus, object identifiers in GenePattern are sometimes called LSIDs. You can double-check which version you have called by looking at the LSID in the page URL. In the example below, the final portion of the LSID (00230:8.7) identifies the module, and the number after the last colon (8.7) is the release version.
http://genepattern.broadinstitute.org/gp/pages/index.jsf?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00230:8.7
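To check a version programmatically rather than by eye, you can pull the LSID out of the URL and split it into its parts. The sketch below uses only the Python standard library and the general LSID layout `urn:lsid:authority:namespace:object:revision`. The names `Lsid`, `parse_lsid`, and `lsid_from_url` are hypothetical, and the code assumes the LSID includes a version.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, urlparse


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: str


def parse_lsid(text: str) -> Lsid:
    """Split 'urn:lsid:authority:namespace:object:version' into its fields."""
    parts = text.split(":")
    if len(parts) != 6 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not a versioned LSID: {text!r}")
    return Lsid(*parts[2:])


def lsid_from_url(url: str) -> Lsid:
    """Read the 'lsid' query parameter from a GenePattern page URL."""
    values = parse_qs(urlparse(url).query).get("lsid")
    if not values:
        raise ValueError("URL has no lsid parameter")
    return parse_lsid(values[0])


url = ("http://genepattern.broadinstitute.org/gp/pages/index.jsf"
       "?lsid=urn:lsid:broad.mit.edu:cancer.software.genepattern.module.analysis:00230:8.7")
lsid = lsid_from_url(url)
print(lsid.object_id, lsid.version)
```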
GenePattern records comments on version updates at the very bottom of the module documentation, in a section titled Version Comments. Updates may include bug fixes or the replacement of a module's algorithm with a newer version released by its authors. For example, TopHat algorithm v1 and v2 were written by different computational biologists and offer different features, such as the version of Bowtie used to align reads. These algorithm versions correspond to GenePattern module versions 1–5 and 6–9, respectively. Because algorithms can differ greatly from one version to the next, you may choose to call an earlier algorithm version. In that case, be sure to use the latest GenePattern module version that still wraps that algorithm version.
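The advice to pick the latest module version for a given algorithm version amounts to a lookup followed by a maximum. The sketch below hard-codes the TopHat mapping from the paragraph above; the dictionary and function names are hypothetical, and in practice you would read this information from the Version Comments section.

```python
# Module versions that wrap each TopHat algorithm version (from the text above).
TOPHAT_MODULE_VERSIONS = {
    "v1": range(1, 6),   # module versions 1 through 5
    "v2": range(6, 10),  # module versions 6 through 9
}


def latest_module_version(algorithm_version: str) -> int:
    """Return the newest module version that wraps the given algorithm version."""
    try:
        return max(TOPHAT_MODULE_VERSIONS[algorithm_version])
    except KeyError:
        raise ValueError(f"unknown TopHat algorithm version: {algorithm_version}") from None


print(latest_module_version("v1"))
```

Choosing the highest module version within the range keeps the algorithm you want while still picking up any later bug fixes to the module itself.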
When you choose an earlier module version from the module dropdown menu, the Documentation link automatically updates to point to the documentation for that version. For earlier module versions, the documentation is a downloadable PDF document; for later module versions, it opens in the web browser. If multiple versions offer web-based documentation, the version is noted next to the module name in the documentation. A documentation URL may or may not specify a version number, as shown in the examples below for TopHat. If no version is specified, as in the first example, the documentation for the latest version of the module is shown. The latest available version may be a beta version, meaning it is still undergoing testing; this is marked on the page. If available, documentation for a beta module may or may not have been updated from the prior version.
URL | Documentation shown |
---|---|
http://www.broadinstitute.org/modules/docs/TopHat | Calls the documentation for the latest version of the module |
http://www.broadinstitute.org/modules/docs/TopHat/9 | Calls the documentation for v9 of the module |
http://www.broadinstitute.org/modules/docs/TopHat/8 | Calls the documentation for v8 of the module |
A programmatic interface makes it easy for software programmers to call GenePattern modules from the Java, MATLAB, or R programming environments. For information about the programmatic interface, see the Programmers Guide.