Computing SNP Copy Number and Loss of Heterozygosity

Posted on Saturday, September 15, 2012 at 12:29PM by The GenePattern Team


In cancer genomics, copy number change is one of the hallmarks of the genetic instability common to most human cancers, and loss of heterozygosity (LOH) of tumor suppressor genes is a crucial step in the development of sporadic and hereditary cancer (Monti, 2005). Using modules available in GenePattern, you can compute SNP copy number and LOH from Affymetrix SNP chip data for paired target/normal samples and then view the results in the Integrative Genomics Viewer (IGV). The computation uses the following modules, with IGV as the final step for viewing the results:

  • SNPFileCreator
  • XChromosomeCorrect
  • CopyNumberDivideByNormals
  • LOHPaired
  • IGV


SNPFileCreator converts the .CEL files from an Affymetrix array into a GenePattern .SNP file. Raw data for the probes in each SNP probe set are converted to a single intensity value per SNP using one of four modeling algorithms: Average Difference, PM/MM Difference Model (dChip, the default), Median Probe, or Trimmed Mean. Note that processing times for this module can average upwards of 30 minutes, depending on the speed of the server, the size of the dataset, and available memory. At least 2 GB of memory is needed to run most SNPFileCreator jobs.

SNPFileCreator Inputs, Parameters, and Considerations

  • CEL files from the Affymetrix 500K Array Chip Set (250K Sty, 250K Nsp) or 100K Array Chip Set (50K Xba, 50K Hind) in a ZIP archive.
  • Optionally, for each CEL file, a TXT file containing the genotype calls for the SNP array. These files are contained in the ZIP archive with the CEL files. (More information on file formats can be found here.)
  • Each chip set uses two unique high density arrays to genotype over 500,000 and 100,000 SNPs in one experiment, respectively. The module converts the CEL files for one array into a .SNP file. To create a .SNP file for a chip set, use the MergeRows module to combine the .SNP files for the two arrays.
  • SNPFileCreator uses the Human Genome of March 2004 (hg17) to include Chromosome and Physical Location columns in the .SNP file. By default, it sorts the SNPs by chromosome and physical location. (SNPFileCreator version 2, currently in beta, has been updated to use hg18.)
  • SNPFileCreator creates a .SNP file in one of two formats: Non Allele-Specific (default) or Allele-Specific. For each sample, the Non Allele-Specific format contains an intensity value and a genotype call; the Allele-Specific format contains an intensity value for allele A, intensity value for allele B, and genotype call. All GenePattern modules accept the Non Allele-Specific format; many do not yet accept the Allele-Specific format.
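
As a rough illustration of the input packaging described above, the following Python sketch bundles the CEL files for one array, plus any matching genotype TXT files, into a single ZIP archive. The function name, the flat directory layout, and the idea that a genotype file shares its CEL file's base name are assumptions made for this example; they are not taken from the SNPFileCreator documentation.

```python
import zipfile
from pathlib import Path


def zip_cel_files(cel_dir, archive_path):
    """Bundle .CEL files (and same-named .TXT genotype files, if present) into a ZIP.

    Hypothetical helper: pairing files by base name is an assumption of this sketch.
    Returns the sorted list of archive member names.
    """
    cel_dir = Path(cel_dir)
    cel_files = sorted(p for p in cel_dir.iterdir() if p.suffix.lower() == ".cel")
    if not cel_files:
        raise ValueError(f"no CEL files found in {cel_dir}")

    members = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for cel in cel_files:
            archive.write(cel, arcname=cel.name)
            members.append(cel.name)
            # Optional genotype calls: look for <base name>.txt next to the CEL file.
            for candidate in (cel.with_suffix(".txt"), cel.with_suffix(".TXT")):
                if candidate.exists():
                    archive.write(candidate, arcname=candidate.name)
                    members.append(candidate.name)
                    break
    return sorted(members)
```

Run it once per array (for example, once for the Sty CEL files and once for the Nsp CEL files), since the module converts the CEL files for one array at a time.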

For more information about SNPFileCreator, please see the SNPFileCreator documentation.


When donor gender is known for your samples, run the XChromosomeCorrect module on the output of SNPFileCreator to correct intensity values for SNPs on the X chromosome. Because male donors carry a single X chromosome, the module doubles the intensity value for SNPs on the X chromosome in each sample from a male donor.
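
The correction itself is simple arithmetic. The sketch below shows the idea on in-memory data; it is not the module's code, and the data layout (one list of intensities per sample, one chromosome label per SNP) and the function name are assumptions chosen for illustration.

```python
def correct_x_chromosome(intensities, chromosomes, genders):
    """Return a copy of `intensities` with X-chromosome values doubled for male samples.

    intensities: {sample_name: [intensity per SNP]}
    chromosomes: [chromosome label per SNP], e.g. "1" or "X" (assumed labels)
    genders:     {sample_name: "M" or "F"}
    """
    corrected = {}
    for sample, values in intensities.items():
        if len(values) != len(chromosomes):
            raise ValueError(f"{sample}: expected {len(chromosomes)} values")
        is_male = genders[sample].strip().upper() == "M"
        corrected[sample] = [
            # Double only X-chromosome SNPs, and only for male samples.
            value * 2 if is_male and chrom.upper() in ("X", "CHRX") else value
            for value, chrom in zip(values, chromosomes)
        ]
    return corrected
```

Samples from female donors, and SNPs on all other chromosomes, pass through unchanged.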

XChromosomeCorrect Inputs, Parameters, and Considerations

XChromosomeCorrect requires a sample information file that describes the SNP array. The file must be tab-delimited and must include a column labeled Gender that contains a value of M or F for each sample. It must also include the target/normal paired samples used for copy number and LOH determination. (More information on file formats can be found here.)
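
Before submitting a job, it can help to check the sample information file locally. The following sketch parses tab-delimited text and verifies the Gender column; the function name is hypothetical, and the Array and Paired columns in the usage example are borrowed from the LOHPaired section of this post rather than from the XChromosomeCorrect documentation.

```python
import csv
import io


def load_sample_info(text):
    """Parse tab-delimited sample information and check the Gender column.

    Returns the data rows as dictionaries keyed by column label.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames is None or "Gender" not in reader.fieldnames:
        raise ValueError("sample information file needs a column labeled Gender")
    rows = []
    # Data rows start on line 2; line 1 holds the column labels.
    for line_number, row in enumerate(reader, start=2):
        gender = (row["Gender"] or "").strip()
        if gender not in ("M", "F"):
            raise ValueError(f"line {line_number}: Gender must be M or F, got {gender!r}")
        rows.append(row)
    return rows
```

For example, `load_sample_info("Array\tGender\tPaired\nN1\tF\tYes\nT1\tF\tN1\n")` returns two rows, while a Gender value such as `male` raises an error that names the offending line.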

For more information about XChromosomeCorrect, please see the XChromosomeCorrect documentation.


CopyNumberDivideByNormals computes the raw copy number of each target SNP by dividing its intensity value by the mean intensity value of all normal SNPs. This calculation is referred to as copy number normalization or normalization with respect to normals.
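
To make the normalization concrete, here is a minimal sketch of the division step. It reads "mean intensity value of all normal SNPs" as the mean, for each SNP, across the normal samples; that reading, the function name, and the list-based data layout are assumptions. The sketch returns the plain ratio and applies no further scaling, because this post does not describe any.

```python
from statistics import mean


def raw_copy_number(target_intensities, normal_intensities):
    """Divide each target SNP intensity by that SNP's mean intensity across normals.

    target_intensities: [intensity per SNP] for one target sample
    normal_intensities: [[intensity per SNP] for each normal sample]
    """
    if not normal_intensities:
        raise ValueError("at least one normal sample is required")
    ratios = []
    for index, target_value in enumerate(target_intensities):
        # Per-SNP mean over all normal samples (an interpretation, see above).
        normal_mean = mean(sample[index] for sample in normal_intensities)
        ratios.append(target_value / normal_mean)
    return ratios
```

With two normal samples whose intensities for a SNP are 90 and 110, a target intensity of 100 yields a ratio of 1.0, and a target intensity of 200 yields 2.0.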

CopyNumberDivideByNormals Inputs, Parameters, and Considerations

  • The input file is a .SNP file output from either SNPFileCreator or XChromosomeCorrect (if there were gender-specific samples). The file must contain both normal and target samples so that CopyNumberDivideByNormals can determine the raw copy number of a target SNP with respect to normals.
  • CopyNumberDivideByNormals creates one of two files:
    • .CN (default) does not include genotype calls.
    • .XCN includes genotype calls.

For more information about CopyNumberDivideByNormals, please see the CopyNumberDivideByNormals documentation.


The LOHPaired module detects loss of heterozygosity (LOH). It takes as input a GenePattern .SNP file that contains paired normal-target samples with genotype calls. (LOHPaired accepts only non-allele-specific .SNP files, that is, .SNP files that contain one intensity value per probe.) It returns as output a GenePattern .LOH file that contains, for each probe, the LOH calls for each array pair.

LOH call values are as follows.

Call    Value
L       LOH: AB in normal and A or B in tumor
R       Retention: AB in both normal and tumor, or No Call in normal and AB in tumor
C       Conflict: A or B in normal and AB in tumor
N       Non-informative call: A or B in normal
        No call: No Call in normal or tumor
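
The call rules above can be expressed as a small decision function. This is a sketch of the table, not LOHPaired's implementation. The genotype spellings (`A`, `B`, `AA`, `BB`, `AB`, `NoCall`), the use of `None` for the no-call case, and the choice to report no call when the tumor genotype is missing are all assumptions of this example.

```python
HOMOZYGOUS = {"A", "B", "AA", "BB"}   # assumed spellings for "A or B"
HETEROZYGOUS = "AB"
NO_CALL = "NoCall"                    # assumed spelling for "No Call"


def loh_call(normal, tumor):
    """Return "L", "R", "C", "N", or None (no call) for one SNP in one pair."""
    if tumor == HETEROZYGOUS:
        if normal in (HETEROZYGOUS, NO_CALL):
            return "R"  # retention
        if normal in HOMOZYGOUS:
            return "C"  # conflict
    if normal == NO_CALL or tumor == NO_CALL:
        return None     # no call
    if normal == HETEROZYGOUS and tumor in HOMOZYGOUS:
        return "L"      # loss of heterozygosity
    if normal in HOMOZYGOUS:
        return "N"      # non-informative
    raise ValueError(f"unrecognized genotypes: normal={normal!r}, tumor={tumor!r}")
```

Note that the retention rule is checked first, so a No Call in the normal sample paired with AB in the tumor is reported as R rather than as no call, matching the table.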

LOHPaired Inputs, Parameters, and Considerations

  • LOHPaired takes as input a GenePattern .SNP file that contains paired normal-target samples and genotype calls. Use the output from SNPFileCreator (or XChromosomeCorrect) as described above.
    Note: LOHPaired accepts only non-allele-specific .SNP files (.SNP files that contain one intensity value per probe).
  • A sample info file is also required. This is a tab-delimited file where:
    • The first row contains labels identifying the content of each column.
    • Each remaining row describes one sample.
  • LOH detection requires columns with the following labels; all other columns are ignored:
    • Paired: indicates the normal/target pairs. For the normal sample, Paired is Yes; for the target sample, Paired is the array name of the normal sample.
    • Array: contains the array name.
  • It is probably easiest to modify the file previously used in XChromosomeCorrect.
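
To check that the Paired and Array columns line up before running LOHPaired, a sketch like the following can list the normal/target pairs found in a sample info file. The function name is hypothetical, and two behaviors are assumptions of this example: `Yes` is matched case-insensitively, and rows with an empty Paired value are skipped.

```python
import csv
import io


def normal_target_pairs(sample_info_text):
    """Return [(normal_array, target_array)] from tab-delimited sample info text."""
    reader = csv.DictReader(io.StringIO(sample_info_text), delimiter="\t")
    rows = list(reader)
    for column in ("Array", "Paired"):
        if column not in (reader.fieldnames or []):
            raise ValueError(f"missing required column: {column}")

    # Normal samples are the rows whose Paired value is Yes.
    normals = {
        row["Array"] for row in rows
        if (row["Paired"] or "").strip().lower() == "yes"
    }
    pairs = []
    for row in rows:
        paired = (row["Paired"] or "").strip()
        if not paired or paired.lower() == "yes":
            continue  # a normal sample, or an unpaired row (skipped by assumption)
        if paired not in normals:
            raise ValueError(f"{row['Array']}: paired normal {paired!r} not found")
        pairs.append((paired, row["Array"]))
    return pairs
```

Because other columns are ignored, the same file can also carry the Gender column used by XChromosomeCorrect.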


The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated datasets. It supports a wide variety of data types and provides easy access to genomes and datasets hosted by the Broad Institute.

Adding a track line to view LOH data

  • An LOH file (.LOH) is a copy number file that contains "loss of heterozygosity" values. The format is identical to the .CN format, but the numbers have the following meanings:
    • -1: Conflict (homozygous in the normal and heterozygous in the tumor)
    • 0: Retained
    • 1: Loss of heterozygosity
  • Numbers that fall between these values represent the probability of LOH. IGV treats the values as a continuum and colors them according to the heatmap scale set for the LOH track.
  • In order to modify IGV's default display settings for the LOH data (.LOH file, output from LOHPaired, see above), a track line must be included in the file.
  • When IGV loads a data file, it uses the file extension to determine the file format, the file format to determine the data type, and the data type to determine the default display options (see Default Display). Adding a track line to a data file modifies IGV's default display options.
  • IGV track lines are based on WIG track lines. See the UCSC site for the track line syntax. The table below describes the track line specifiers that IGV supports. IGV includes a few options that are not part of the UCSC specification.
Specifier          Value                     Description
name               track label               Track name (ignored when used in the IGV file format)
description        center label              Currently ignored
visibility         full | dense | hide       Currently ignored
color              RRR,GGG,BBB               Color for positive values in all tracks
altColor           RRR,GGG,BBB               Color for negative values in all tracks
priority           N                         Currently ignored
autoScale          on | off                  Currently ignored; all tracks autoscale unless an explicit data range is defined (e.g., by including the viewLimits specifier)
gridDefault        on | off                  Currently ignored
maxHeightPixels    max:default:min           Default and min are supported; max is currently ignored
graphType          bar | points | heatmap    Graph style for the track, for example scatter plot or heatmap. IGV only: The heatmap value is an IGV addition to the WIG specification.
midRange           x:y                       Defines the neutral range for a three-color heatmap. Values in this range are rendered with the midColor value, which is white by default. Example: midRange=20:80. IGV only: This specifier is an IGV addition to the WIG specification.
midColor           RRR,GGG,BBB               Color to use in the "mid range" of a heatmap. Example: midColor=0,0,150. IGV only: This specifier is an IGV addition to the WIG specification.
viewLimits         lower:upper               Defines the data range
yLineMark          real-value                Currently ignored
yLineOnOff         on | off                  Currently ignored
windowingFunction  maximum | minimum | mean  Function that summarizes the values in a window of data represented by one pixel
smoothingWindow    off | 2-16                Currently ignored
coords             0 | 1                     Indicates whether the file uses 0-based or 1-based coordinates. The UCSC specification uses 1-based coordinates for WIG files and 0-based coordinates for BED files. If data looks off by one, check for a possible 0-based vs. 1-based coordinate issue. IGV only: This specifier is an IGV addition to the WIG specification.
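
As an illustration of how a track line might be assembled and placed at the top of an .LOH file, consider the sketch below. The helper names, the sample column layout in the usage note, and the default `#track` prefix are assumptions; check the IGV documentation for the exact prefix your file format expects and pass it explicitly if it differs.

```python
def build_track_line(prefix="#track", **specifiers):
    """Join specifier=value pairs into one track line.

    `prefix` is an assumption of this sketch. Values that contain
    spaces are wrapped in double quotes.
    """
    parts = [prefix]
    for key, value in specifiers.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def add_track_line(data_text, track_line):
    """Return data_text with track_line as its first line.

    An existing leading track line, if any, is replaced rather than duplicated.
    """
    lines = data_text.splitlines()
    if lines and lines[0].lstrip("#").split(" ")[0] == "track":
        lines = lines[1:]
    return "\n".join([track_line] + lines) + "\n"
```

For example, `build_track_line(graphType="heatmap", viewLimits="-1:1", color="255,0,0")` produces `#track graphType=heatmap viewLimits=-1:1 color=255,0,0`, which `add_track_line` then places above the column header row of the data file.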

Launching IGV and Viewing your data

To launch IGV and view your Copy Number and/or LOH data:

  • Go to the IGV downloads page, register or log in as directed, then launch IGV.
  • Once IGV is launched, go to the File menu and choose to load your data either from a local directory or from a URL.
  • By default, IGV displays the chromosome and genome that you were viewing when you last exited IGV. If you have never used IGV before, the whole-genome view for hg18 is displayed.
  • To change to a different genome, click the genome drop-down list in the toolbar and select the genome on which your data was processed.
  • To change the chromosome, click the chromosome drop-down box and select another chromosome, or click the chromosome number in the top panel.
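
If you repeat these steps often, the same sequence can be written as an IGV batch script instead of clicking through the menus. The sketch below only generates the script text, using the `new`, `genome`, `load`, and `goto` batch commands. Whether and how your IGV version runs batch scripts is something to confirm in the IGV User Guide, and the function name and file names here are hypothetical.

```python
def igv_batch_script(data_files, genome="hg18", locus=None):
    """Build IGV batch script text that selects a genome and loads data files.

    data_files: paths or URLs of the files to load (for example .CN or .LOH files)
    genome:     genome identifier; should match the genome the data was processed on
    locus:      optional chromosome or region to jump to, e.g. "chr17"
    """
    commands = ["new", f"genome {genome}"]
    commands += [f"load {path}" for path in data_files]
    if locus:
        commands.append(f"goto {locus}")
    return "\n".join(commands) + "\n"
```

Calling `igv_batch_script(["tumor.cn", "tumor.loh"], locus="chr17")` yields a five-line script: `new`, `genome hg18`, two `load` lines, and `goto chr17`.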

For more information on navigating or displaying data in IGV please see the IGV User Guide.
