Generates a QC report on raw sequence data.
Author: Babraham Institute
Contact:
Marc-Danielle Nazaire, gp-help@broadinstitute.org
Algorithm Version: 0.10.1
Introduction
The FastQC module runs the FastQC quality control tool developed at the Babraham Institute. FastQC takes as input raw sequencing data (short reads in a FASTQ, BAM, or SAM file) produced by a next-generation sequencing (NGS) platform and produces a quality control report that can identify problems originating either in the sequencer or during library preparation. The analysis is performed by a series of analysis modules. The report gives a quick overview with a status (normal, slightly abnormal, or very unusual) for each analysis module, together with a graph or table presenting the corresponding quality statistics.
Parameters
Name | Description |
---|---|
input file * | A raw sequence file (.fastq, .sam, or .bam). |
input format | Bypasses the normal sequence file format detection and forces the program to use the specified format. Valid formats are bam, sam, bam_mapped, sam_mapped, and fastq. |
contaminant file | Specifies a non-default file which contains the list of contaminants to screen overrepresented sequences against. The file must contain sets of named contaminants in the form name[tab]sequence. Lines prefixed with a hash will be ignored. |
kmer size * | Specifies the length of Kmer to look for in the Kmer content module. Specified Kmer length must be between 2 and 10. Default length is 5. |
extract output | Whether to output an uncompressed version of the report. Set this to yes to view the report directly from within GenePattern. |
* - required
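As a rough illustration of how these parameters constrain a run, the sketch below validates them and assembles an argument list for the FastQC command-line tool. It assumes the module wraps the `fastqc` executable and that the parameters map onto FastQC's `--format`, `--contaminants`, `--kmers`, and `--extract`/`--noextract` options; the function name `build_fastqc_args` and this mapping are hypothetical and are not taken from the module's actual wrapper.

```python
from typing import List, Optional

# Formats listed in the parameter table above.
VALID_FORMATS = {"bam", "sam", "bam_mapped", "sam_mapped", "fastq"}


def build_fastqc_args(
    input_file: str,
    kmer_size: int = 5,
    input_format: Optional[str] = None,
    contaminant_file: Optional[str] = None,
    extract_output: bool = False,
) -> List[str]:
    """Validate module-style parameters and return a FastQC argument list.

    Hypothetical helper: the real GenePattern wrapper may build its
    command line differently.
    """
    if not 2 <= kmer_size <= 10:
        raise ValueError("kmer size must be between 2 and 10")
    if input_format is not None and input_format not in VALID_FORMATS:
        raise ValueError(f"unsupported input format: {input_format}")

    args = ["fastqc", "--kmers", str(kmer_size)]
    if input_format is not None:
        # Skip automatic format detection and force the given format.
        args += ["--format", input_format]
    if contaminant_file is not None:
        # Use a non-default contaminant list.
        args += ["--contaminants", contaminant_file]
    # "extract output = yes" corresponds to an uncompressed report.
    args.append("--extract" if extract_output else "--noextract")
    args.append(input_file)
    return args
```

The returned list could be passed to `subprocess.run`, provided a `fastqc` executable is on the path. Validating the k-mer length and format up front surfaces a bad parameter before the tool is launched.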
Input Files
- input file
A raw sequence file in FASTQ, SAM, or BAM format.
- contaminant file
A tab-delimited file in the format name[tab]sequence. Lines starting with "#" are ignored.
# This is an example contaminant file
Illumina Single End Adapter 1	GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
Illumina Single End Adapter 2	CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
Illumina Single End PCR Primer 1	AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
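To check a custom contaminant file before submitting it, a small parser for the name[tab]sequence format can help. The sketch below is not FastQC's own reader; `parse_contaminants` is a hypothetical helper. It skips blank lines and lines starting with "#", and it treats a run of consecutive tabs as a single separator, which is an assumption made here for tolerance.

```python
from typing import Dict, Iterable


def parse_contaminants(lines: Iterable[str]) -> Dict[str, str]:
    """Parse name[tab]sequence lines into a {name: sequence} mapping."""
    contaminants: Dict[str, str] = {}
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Ignore blank lines and comment lines prefixed with a hash.
        if not line.strip() or line.startswith("#"):
            continue
        # Assumption: consecutive tabs count as one separator.
        fields = [field.strip() for field in line.split("\t") if field.strip()]
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected name[tab]sequence")
        name, sequence = fields
        contaminants[name] = sequence
    return contaminants


example = [
    "# This is an example contaminant file\n",
    "Illumina Single End Adapter 1\tGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG\n",
    "Illumina Single End Adapter 2\t\tCAAGCAGAAGACGGCATACGAGCTCTTCCGATCT\n",
]
parsed = parse_contaminants(example)
```

A line that contains no tab, for example one where the name and sequence are separated by spaces, raises a `ValueError` with the offending line number.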
Output Files
- <input.file_basename>_fastqc.zip
A zip file containing an HTML report.
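If the report is downloaded rather than viewed inside GenePattern, the HTML can be pulled out of the zip file with a few lines of code. The sketch below uses only the Python standard library; the helper names are hypothetical, and the code makes no assumption about the archive's internal layout beyond the report being stored as an `.html` member.

```python
import zipfile
from typing import List


def list_html_reports(zip_file) -> List[str]:
    """Return the names of all HTML members in a report archive.

    `zip_file` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".html")]


def read_report(zip_file, member: str) -> str:
    """Return the text of one HTML member without unpacking the archive."""
    with zipfile.ZipFile(zip_file) as archive:
        return archive.read(member).decode("utf-8", errors="replace")
```

Calling `list_html_reports` on the downloaded zip file and passing one of the returned names to `read_report` yields the report markup, which can then be saved or opened in a browser.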
Example Data
An example of a report from a good Illumina dataset can be found here.
An example of a report from a bad Illumina dataset can be found here.
Platform Dependencies
Task Type:
RNA-seq
CPU Type:
any
Operating System:
any
Language:
Java
Version Comments
Version | Release Date | Description |
---|---|---|
1 | 2017-03-17 | Production release |
.5 | 2014-05-13 | Beta release |