This module is currently in beta release. The module and/or documentation may be incomplete.
Cuffdiff visualization package providing plots based on a single user-specified gene.
Author: Loyal Goff, MIT Computer Science and Artificial Intelligence Lab
Contact:
gp-help@broadinstitute.org
Algorithm Version: 2.0.0
Summary
CummeRbund is a visualization package designed to help you navigate the many interrelated files produced by a Cuffdiff RNA-Seq differential expression analysis and visualize the relevant results. CummeRbund promotes rapid analysis of RNA-Seq data by aggregating and indexing those files, and by letting you easily visualize your RNA-Seq data and create publication-ready figures from it.
CummeRbund is a collaborative effort between the Computational Biology group led by Manolis Kellis at MIT's Computer Science and Artificial Intelligence Laboratory, and the Rinn Lab at the Harvard University department of Stem Cells and Regenerative Medicine. This document is adapted from the CummeRbund manual for release 2.0.0.
Usage
Unlike most modules in GenePattern, the CummeRbund reporting modules require the entire output of a Cuffdiff job, because they read all of the Cuffdiff output files rather than just one or two. To provide it, drag the top-level Cuffdiff job folder into the 'cuffdiff.input' parameter from the 'Jobs' tab ('Recent Jobs' in GenePattern versions before 3.8.0) or from the Job Results page. The CummeRbund modules can also be launched directly from the context menu of a job in either of these locations. Remember that you are submitting the entire job folder, not a single file.
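If you want to confirm that a folder looks like a complete Cuffdiff job before submitting it, a short script can check for some of the files Cuffdiff normally writes. This is a hypothetical sketch, not part of GenePattern or CummeRbund: the helper name `missing_cuffdiff_files` is made up, and the file list is only an assumed subset of Cuffdiff's output (the exact set depends on the Cuffdiff version and options).

```python
from pathlib import Path

# Assumed subset of the files a Cuffdiff run normally writes; adjust it to
# match your Cuffdiff version and options.
EXPECTED_CUFFDIFF_FILES = (
    "genes.fpkm_tracking",
    "isoforms.fpkm_tracking",
    "gene_exp.diff",
    "isoform_exp.diff",
)


def missing_cuffdiff_files(job_dir):
    """Return the expected Cuffdiff output files that are absent from job_dir."""
    root = Path(job_dir)
    return [name for name in EXPECTED_CUFFDIFF_FILES if not (root / name).is_file()]
```

For example, `missing_cuffdiff_files("/path/to/cuffdiff_job")` (a placeholder path) returns an empty list when every listed file is present.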
Alternatively, once a job has been run through any of the CummeRbund reporting modules, a reusable database file named cuffData.db is produced; it can be submitted in place of the Cuffdiff job for other CummeRbund reports. You can submit this file through any of the usual GenePattern mechanisms, or you can submit the entire CummeRbund job folder to a subsequent CummeRbund job in the same way as described above for Cuffdiff jobs. Reuse these database files wherever possible: your jobs will run much faster and use less storage space than jobs that start from a Cuffdiff job.
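Because cuffData.db is an SQLite database (created through the RSQLite package), any SQLite client can open it, which can be handy for checking a file before reusing it. The Python sketch below lists the tables in such a file using only the standard `sqlite3` module. It makes no assumption about which tables CummeRbund creates, and the function name `list_tables` is illustrative only.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path):
    """Return the sorted table names stored in an SQLite file such as cuffData.db."""
    # Open through a read-only URI so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Opening the file read-only means inspecting a database this way never modifies the copy you intend to resubmit.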
CummeRbund.SelectedGeneReport will produce a variety of result files in the form of both plots and text tables; these are described further in the Output Files section below. You can use the feature.level parameter to control whether these should be generated at the level of genes, isoforms, transcript start sites (TSS), or coding sequences (CDS), although the Similarity plots will always be generated at the "genes" level.
The report.as.aggregate parameter controls whether reporting is done for individual replicates or for aggregate condition/sample values. The default is to use aggregate sample values. As with feature.level, however, the Similarity plots are always generated for aggregate samples.
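The interaction between these two parameters and the Similarity plots can be summarized in a few lines. This Python sketch only models the rules stated above; it is not the module's code, and the feature-level labels and the names `PlotSettings` and `resolve_plot_settings` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for the four feature levels; the module's actual
# parameter values may be spelled differently.
FEATURE_LEVELS = ("genes", "isoforms", "TSS", "CDS")


@dataclass(frozen=True)
class PlotSettings:
    feature_level: str
    aggregate: bool


def resolve_plot_settings(feature_level, report_as_aggregate=True):
    """Return (main plot settings, Similarity plot settings)."""
    if feature_level not in FEATURE_LEVELS:
        raise ValueError(f"unknown feature level: {feature_level!r}")
    main = PlotSettings(feature_level, report_as_aggregate)
    # The Similarity plots ignore both choices: always genes, always aggregate.
    similarity = PlotSettings("genes", True)
    return main, similarity
```

For example, `resolve_plot_settings("isoforms", report_as_aggregate=False)` returns `(PlotSettings(feature_level='isoforms', aggregate=False), PlotSettings(feature_level='genes', aggregate=True))`.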
For more information on using RNA-seq modules in GenePattern, see the RNA-seq Analysis page.
References
Trapnell C, Hendrickson DG, Sauvageau M, Goff L, Rinn JL, Pachter L. Differential analysis of gene regulation at transcript resolution with RNA-seq. Nature Biotechnology. 2013;31:46-53.
Parameters
Name | Description |
---|---|
cuffdiff input * | A Cuffdiff job, a previous CummeRbund job, or a cuffData.db file from a previous CummeRbund job |
feature id * | The gene or feature of interest. This can be a gene symbol (short name), gene ID, isoform_id, tss_group_id, or cds_id. |
selected.conditions | Specifies the conditions (samples) to be used in the plots. This should be a comma-separated list of conditions, using the same names as in the upstream Cuffdiff job; leave this blank to use all conditions. The Similarity plots will always operate across all conditions. |
find.similar | Optionally, find and plot the top genes (up to this count) whose expression profiles are most similar to that of the given gene of interest. If blank, this step is skipped. |
output format * | The output file format. |
feature level * | Feature level for the report. Note that the Similarity plots will always be generated at the "genes" level. |
report as aggregate * | Controls whether reporting should be done for individual replicates or aggregate condition/sample values. The default is to use aggregate sample values. Note that the Similarity plots always show aggregate samples. |
log transform * | Whether to log-transform the FPKM values. If enabled, the y-axis is drawn on a log10 scale. |
* - required
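The selected.conditions and find.similar parameters can be illustrated with a small sketch. It parses a selected.conditions value (blank means all conditions) and ranks genes by how similar their FPKM profiles are to the gene of interest. The similarity measure here, Jensen-Shannon distance between profiles normalized to sum to 1, is an assumption chosen for illustration; to our understanding CummeRbund's own similarity search is also JS-distance based, but this is a simplified stand-in, not the module's implementation. All function names are hypothetical. Because the Similarity plots always operate across all conditions, `find_similar` compares full profiles regardless of selected.conditions.

```python
import math


def parse_selected_conditions(value, all_conditions):
    """Blank means every condition; otherwise a comma-separated list of names."""
    if value is None or not value.strip():
        return list(all_conditions)
    chosen = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in chosen if name not in all_conditions]
    if unknown:
        raise ValueError(f"not conditions in the Cuffdiff job: {unknown}")
    return chosen


def js_distance(p, q):
    """Jensen-Shannon distance (base 2, range 0..1) between two FPKM profiles."""
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must contain some non-zero FPKM")
    # Normalize each profile so that it sums to 1.
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; zero entries of a contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def find_similar(target, profiles, n):
    """Return up to n gene IDs whose FPKM profiles (all conditions) are closest."""
    reference = profiles[target]
    scored = sorted(
        (js_distance(reference, profile), gene)
        for gene, profile in profiles.items()
        # Skip the target itself and genes with no expression anywhere.
        if gene != target and sum(profile) > 0
    )
    return [gene for _, gene in scored[:n]]
```

With profiles `{"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 2, 1]}`, `find_similar("A", profiles, 1)` returns `["B"]`, because B has the same shape as A at twice the level; normalizing first makes the comparison depend on the profile shape rather than on absolute expression.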
Input Files
- <cuffdiff.input> (required)
A Cuffdiff job, a previous CummeRbund job, or a cuffData.db file from a previous CummeRbund job. Unlike most modules in GenePattern, the CummeRbund reporting modules require the entire output of a Cuffdiff job, because they read all of the Cuffdiff output files rather than just one or two. To provide it, drag the top-level Cuffdiff job folder into the 'cuffdiff.input' parameter from the 'Jobs' tab ('Recent Jobs' in GenePattern versions before 3.8.0) or from the Job Results page. The CummeRbund modules can also be launched directly from the context menu of a job in either of these locations. Remember that unless you use a cuffData.db file, you are submitting the entire job folder, not a single file.
Output Files
- cuffData.db
The RSQLite database created from the original Cuffdiff job. This file can be used in other CummeRbund jobs to avoid extra computation and storage; in that case, the new job holds a link back to the file from the original job.
- SelectedGene.ExpressionBarplot
A barplot of the FPKM values (Fragments Per Kilobase of transcript per Million mapped reads), with confidence intervals, for the selected gene across all samples (or replicates) in the Cuffdiff dataset. Each replicate's value is shown as a black dot along the confidence interval, while the top of the bar represents the value for the aggregate sample. See the FPKM explanation on the Cufflinks website and the corresponding entry in the Cufflinks FAQ for more information about FPKM values.
- SelectedGene.ExpressionPlot
A line plot of the FPKM values, with confidence intervals, for the selected gene across all samples (or replicates) in the Cuffdiff dataset. Each replicate's value is shown as a dot of matching color along the confidence interval, while the aggregate sample value is shown as a black dot within the confidence interval. At this time, the line plots can display only tracking IDs, not gene symbols.
- SelectedGene.SimilarityExpressionBarplot
An FPKM barplot of the genes (up to the find.similar count) whose expression profiles are most similar to that of the selected gene. The Similarity plots always use aggregate sample values.
- SelectedGene.SimilarityExpressionPlot
An FPKM line plot of the genes (up to the find.similar count) whose expression profiles are most similar to that of the selected gene. The Similarity plots always use aggregate sample values. At this time, the line plots can display only tracking IDs, not gene symbols.
- stdout.txt (and stderr.txt)
A log of the output (and errors) produced during database creation and plotting. If an error occurs, check both files for details. The module is designed to skip any plot for which it encounters a problem and continue with the next one; if a plot is missing, one of these files should note it, along with the reason if one could be determined.
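As a worked example of the FPKM unit used in these plots: FPKM divides a fragment count by the transcript length in kilobases and by the total number of mapped fragments in millions. The Python sketch below applies that naive formula to made-up numbers. Cufflinks itself uses an effective transcript length and assigns fragments to isoforms through its own statistical estimation, so its reported values will differ; this only illustrates what the unit means. The function name `fpkm` is hypothetical.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions_mapped = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions_mapped)


# Made-up example: 500 fragments on a 2,000 bp transcript in a library of
# 10 million mapped fragments gives 500 / (2 kb * 10 M) = 25 FPKM.
print(fpkm(500, 2_000, 10_000_000))  # 25.0
```

Normalizing by both length and library size is what lets the barplots compare a gene's expression across samples sequenced to different depths.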
Example Data
An example reusable database file is available on our FTP site. It was generated using the example data and workflow from the article by Trapnell et al. referenced above, Differential analysis of gene regulation at transcript resolution with RNA-seq.
Requirements
CummeRbund.SelectedGeneReport requires R 2.15. When installing this module, GenePattern will automatically check for the presence of this exact version of R and will not proceed without it. See the section of our Administrator's Guide on the R Installer plug-in for details. Installing this module requires a number of supporting R packages from CRAN and Bioconductor; it will also check for their presence and install any that are missing in the process. These packages will be installed in a separate area specific to GenePattern and will not affect any other R library on the machine.
Please install R 2.15.3 rather than R 2.15.2 before installing the module. The GenePattern team has confirmed that this module reproduces its test data results with R 2.15.3 as compared to R 2.15.2, and can provide only limited support for other versions. The GenePattern team recommends R 2.15.3, which fixes significant bugs in R 2.15.2 and must be installed and configured independently, as discussed in Using Different Versions of R and Using the R Installer Plug-in. These sections also describe the patch-level fixes needed when additional R installations are made, as well as considerations for those who use R outside of GenePattern.
Platform Dependencies
Task Type:
RNA-seq
CPU Type:
any
Operating System:
any
Language:
R
Version Comments
Version | Release Date | Description |
---|---|---|
0.17 | 2015-10-13 | Updated to make use of the R package installer. |