GenePattern - CummeRbund.SelectedConditionsReport (v1) BETA

This module is currently in beta release. The module and/or documentation may be incomplete.

Cuffdiff visualization package providing plots based on a user-provided list of conditions.

Author: Loyal Goff, MIT Computer Science and Artificial Intelligence Lab; David Eby, Broad Institute

Contact:

gp-help@broadinstitute.org

Algorithm Version:

Summary

CummeRbund is a visualization package designed to help you navigate through the many inter-related files produced from a Cuffdiff RNA-Seq differential expression analysis and visualize the relevant results. CummeRbund helps promote rapid analysis of RNA-Seq data by aggregating, indexing, and allowing you to easily visualize and create publication-ready figures of your RNA-Seq data.

CummeRbund works with the output of the Cuffdiff module, processing its output files into a database to be used for reporting and plotting. The results are indexed to speed up access to specific feature data (genes, isoforms, transcript start sites, coding sequences, etc.), and preserve the various relationships between these features. Creation of this database means that the expression values and other results are stored in a rapidly accessible form, quickly searchable for future use in the other CummeRbund reporting modules or downloadable for direct use in CummeRbund with R for custom reports. For more details about CummeRbund, see the website and manual.

There are four CummeRbund modules available, each allowing you to examine your Cuffdiff results from a different perspective. All of the modules allow reporting at either the aggregate or the replicate level and can present quantification metrics at the level of genes, isoforms, transcription start sites, or coding sequences. The CummeRbund.QcReport provides high-level visualizations allowing for comparisons across all conditions and all genes - for example, to look at the distribution of expression values across conditions - to spot similarities and differences and to see the relationship between conditions.

The other modules allow you to focus on specific conditions and/or genes. The CummeRbund.SelectedConditionsReport provides visualizations across all genes, but limited to a specific set of conditions so that you can compare individual condition pairs. The CummeRbund.GeneSetReport allows you to focus on a specific list of genes to be visualized, while the CummeRbund.SelectedGeneReport is focused on a single user-chosen gene. Both the GeneSetReport and the SelectedGeneReport can be further constrained to a selected set of conditions. The plots provided by each module differ based on the slice of data to be examined; the available visualizations vary for reasons of both performance and practicality of visual presentation.

CummeRbund is a collaborative effort between the Computational Biology group led by Manolis Kellis at MIT's Computer Science and Artificial Intelligence Laboratory, and the Rinn Lab at the Harvard University department of Stem Cells and Regenerative Medicine (see http://compbio.mit.edu/cummeRbund/). This document is adapted from the CummeRbund manual for release 2.0.0.

Usage

Unlike most modules in GenePattern, the CummeRbund reporting modules require the entire output of a Cuffdiff job as they work with not just one or two files but rather with all of the Cuffdiff output files. Simply drag the top-level Cuffdiff job folder into the 'cuffdiff.input' parameter from the 'Jobs' tab ('Recent Jobs' in versions of GenePattern before 3.8.0) or from the Job Results page. The CummeRbund modules can also be directly accessed from the context menu of jobs in either of these locations. Remember, you are submitting the entire job folder and not just a single file.

Alternatively, once a given job has been run through any one of the CummeRbund reporting modules, a reusable database file named cuffData.db will be produced that can be submitted in place of the Cuffdiff job for other CummeRbund reports. You can use this file for job submission via all of the usual GenePattern mechanisms, or you can submit the entire CummeRbund job folder for a subsequent CummeRbund job in the same way as described above for Cuffdiff jobs. You are highly encouraged to reuse these database files wherever possible, because jobs that reuse them run much faster and use less storage space than jobs that start from scratch with a Cuffdiff job.
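The Output Files section describes cuffData.db as an RSQLite database, so a downloaded copy can in principle be inspected with any SQLite client. The following is a minimal Python sketch under that assumption; the function name list_tables is hypothetical, and no particular table names are assumed because this document does not list the database schema.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so the reusable database is never modified by accident.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master is SQLite's built-in catalog of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        # Quote each name so unusual table names are still handled.
        return {name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
                for name in names}
    finally:
        conn.close()


# Example use (the path is an assumption about where you saved the file):
#   for table, rows in list_tables("cuffData.db").items():
#       print(table, rows)
```

Opening the file read-only is a deliberate choice here: the same cuffData.db may be resubmitted to other CummeRbund jobs, so an inspection script should not be able to alter it.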

CummeRbund.SelectedConditionsReport will produce a variety of result files in the form of both plots and text tables; these are described further in the Output Files section below. You can use the feature.level parameter to control whether these should be generated at the level of genes, isoforms, transcript start sites (TSS), or coding sequences (CDS).

The report.as.aggregate parameter controls whether results will be reported with replicates split out separately or together in aggregated samples. Note that some result files are always generated for aggregate samples regardless of this setting; see the Output Files section for details.

For more information on using RNA-seq modules in GenePattern, see the RNA-seq Analysis page.

References

Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.

Links

CummeRbund website and manual.

Parameters

Name	Description
cuffdiff input *	A Cuffdiff job, a previous CummeRbund job, or a cuffData.db file from a previous CummeRbund job.
selected conditions	Specifies the conditions (samples) to be used in the plots. This should be a comma-separated list of conditions, using the same names as in the upstream Cuffdiff job; leave this blank to use all conditions. If blank, no pairwise comparison plots will be generated.
output format *	The output file format.
feature level *	Feature level for the report.
report as aggregate *	Controls whether reporting should be done for individual replicates or aggregate condition/sample values. The default is to use aggregate sample values. Note that the Volcano plot always shows aggregate samples.
log transform *	Whether or not to log transform the FPKM values. Note that the FPKM values are always log transformed for the Volcano plots and that the Scatter plot will use a log2 rather than log10 transformation.

* - required
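As an illustration of the selected conditions parameter, the sketch below splits a comma-separated value into condition names and enumerates the pairwise comparisons it implies. It is not the module's own code: the function names and the example condition names are hypothetical, and trimming whitespace around each name is an assumption about how the value is interpreted.

```python
from itertools import permutations


def parse_selected_conditions(raw):
    """Split a comma-separated 'selected conditions' value into clean names."""
    # Assumption: surrounding whitespace is not part of a condition name.
    names = [part.strip() for part in raw.split(",")]
    # Drop empty entries, e.g. from a blank value or a trailing comma.
    return [name for name in names if name]


def pairwise_comparisons(conditions):
    """Ordered (x, y) pairs; each pair appears twice with the axes exchanged."""
    return list(permutations(conditions, 2))


# Hypothetical condition names; use the names from your own Cuffdiff job.
conditions = parse_selected_conditions("control, treated_6h, treated_24h")
pairs = pairwise_comparisons(conditions)
# A blank value yields no conditions and therefore no pairwise comparisons.
no_pairs = pairwise_comparisons(parse_selected_conditions(""))
```

With n selected conditions this gives n * (n - 1) ordered pairs, which matches the Output Files description of two plots per pairwise comparison, one for each axis assignment.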

Input Files

<cuffdiff.input> (required)
A Cuffdiff job, a previous CummeRbund job, or a cuffData.db file from a previous CummeRbund job. Unlike most modules in GenePattern, the CummeRbund reporting modules require the entire output of a Cuffdiff job as they work with not just one or two files but rather with all of the Cuffdiff output files. Simply drag the top-level Cuffdiff job folder into the 'cuffdiff.input' parameter from the 'Jobs' tab ('Recent Jobs' in versions of GenePattern before 3.8.0) or from the Job Results page. The CummeRbund modules can also be directly accessed from the context menu of jobs in either of these locations. Remember, unless you use a cuffData.db file, you are submitting the entire job folder and not just a single file.

Output Files

cuffData.db
The RSQLite database created from the original Cuffdiff job. This file can be used in other CummeRbund jobs to avoid the need for extra computation and storage, in which case the new job will instead hold a link back to the file from the original job.
SelectedConditions.MAplot
This set of files plots log ratio vs. average intensity (M vs. A, or MA, plots) across all FPKM values for each pair of samples. The full set of files shows all the pairwise comparisons, for each sample against all other samples. For each pairwise comparison two plots will be made, with the samples in the pair exchanging the X-axis and the Y-axis positions so that you may choose the comparison you prefer in each case. The Wikipedia entry on MA plots gives some background.
SelectedConditions.Scatter
These files hold a set of scatter plots with smooth-fit regression lines, showing the pairwise comparisons of FPKM values for each sample against all other samples. For each pairwise comparison two plots will be made, with the samples in the pair exchanging the X-axis and the Y-axis positions so that you may choose the comparison you prefer in each case.
SelectedConditions.Volcano
These files hold a set of volcano plots showing the pairwise comparisons for each sample against all other samples. The volcano plot is a useful visualization for comparing fold change between any two conditions against significance (log fold change in expression vs. -log P-values). Points found on either the far top-left or top-right sides of the plot represent values that display both large-magnitude fold changes and high statistical significance.
stdout.txt (and stderr.txt)
A log of output (and errors) produced during the database creation and plotting process. In case of an error, check both of these files for more details. The module has been designed to skip those plots where it encounters a problem along the way, continuing on to the next; if a given plot is missing, it should be noted in one of these files along with a reason if one could be determined.
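
To make the axes of the MA and volcano plots described above concrete, here is a small Python sketch of how one point on each plot could be computed from a pair of FPKM values. It uses the textbook definitions and is not CummeRbund's implementation: the function names, the pseudocount of 1.0 (added so that zero FPKM values can be log transformed), the base-10 logarithm for P-values, and the floor applied to P-values of zero are all assumptions.

```python
import math


def ma_point(fpkm_x, fpkm_y, pseudocount=1.0):
    """Return (A, M): mean log2 intensity and log2 ratio of x over y."""
    log_x = math.log2(fpkm_x + pseudocount)
    log_y = math.log2(fpkm_y + pseudocount)
    a_value = (log_x + log_y) / 2.0  # A: average intensity
    m_value = log_x - log_y          # M: log ratio
    return a_value, m_value


def volcano_point(fpkm_x, fpkm_y, p_value, pseudocount=1.0):
    """Return (log2 fold change of y over x, -log10 of the P-value)."""
    log2_fc = math.log2((fpkm_y + pseudocount) / (fpkm_x + pseudocount))
    # Assumption: floor the P-value so that p = 0 does not raise an error.
    return log2_fc, -math.log10(max(p_value, 1e-300))


a_value, m_value = ma_point(3.0, 1.0)
swapped_a, swapped_m = ma_point(1.0, 3.0)
```

Exchanging the two samples leaves A unchanged and flips the sign of M, which is why the two plots produced for each pair are mirror images of one another about the horizontal axis.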

Example Data

There is an example reusable database file available on our FTP site. This was generated using the example data and workflow from the Differential analysis of gene regulation at transcript resolution with RNA-seq article referenced above, by Trapnell, et al.

Requirements

CummeRbund.SelectedConditionsReport requires R 2.15. When installing this module, GenePattern will automatically check for the presence of this exact version of R and will not proceed without it. See the section of our Administrator's Guide on the R Installer plug-in for details. Installing this module also requires a number of supporting R packages from CRAN and Bioconductor; the installer will check for their presence and install any that are missing. These packages will be installed in a separate area specific to GenePattern and will not affect any other R library on the machine.

Please install R 2.15.3 rather than R 2.15.2 before installing the module. The GenePattern team has confirmed test data reproducibility for this module using R 2.15.3 compared to R 2.15.2 and can provide only limited support for other versions. R 2.15.3 fixes significant bugs in R 2.15.2; it must be installed and configured independently, as discussed in Using Different Versions of R and Using the R Installer Plug-in. These sections also describe patch-level fixes that are necessary when additional installations of R are made, and considerations for those who use R outside of GenePattern.
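
Before installing, an administrator may want to confirm which R version a given executable reports. The Python sketch below parses the version banner printed by R --version and compares it with the recommended 2.15.3. It is an illustration only and is independent of GenePattern's own installer check; the names REQUIRED_R, parse_r_version, and is_recommended_r are hypothetical.

```python
import re

# The version recommended in this section.
REQUIRED_R = (2, 15, 3)


def parse_r_version(banner):
    """Extract (major, minor, patch) from the text printed by `R --version`."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no R version string found")
    return tuple(int(part) for part in match.groups())


def is_recommended_r(banner):
    """True only when the banner reports exactly the recommended version."""
    return parse_r_version(banner) == REQUIRED_R


# One way to obtain the banner, assuming the R of interest is on your PATH:
#   import subprocess
#   banner = subprocess.run(["R", "--version"],
#                           capture_output=True, text=True).stdout
#   print(is_recommended_r(banner))
```

Comparing parsed integer tuples rather than raw strings avoids false matches such as treating 2.15.30 as 2.15.3.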

Platform Dependencies

Task Type:
RNA-seq

CPU Type:

Operating System:
any

Language:

Version Comments

Version	Release Date	Description
0.11	2015-10-13	Updated to make use of the R package installer.