Flow Cytometry Data Preprocessing


There is significant demand for several preprocessing steps in the analysis of flow cytometry data. These include previewing and transforming data, converting between FCS files and spreadsheets (e.g., CSV files), editing keywords in FCS data files, and merging and sub-sampling data. Tools for these and other tasks are included in the Flow Cytometry Data Preprocessing suite.

FCS data preview

  • PreviewFCS provides structural metadata, parameters, and descriptive statistics about an FCS data file. It can be used to get a quick overview of the contents of an FCS data file, providing a description that includes the number of events, the number of parameters, and, for each parameter, the name, full name, minimum, maximum, median, mean, and 1st and 3rd quartiles of the distribution. The output can be either an HTML report or an XML document.
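The per-parameter summary that PreviewFCS reports can be sketched as follows. This is a minimal illustration, not the module's code: it assumes the events have already been decoded into a list of rows (one row per event, one column per parameter), and the function name `preview_fcs` and the output dictionary layout are hypothetical. In a real file the short and full names would normally come from the `$PnN` and `$PnS` keywords. Quartile conventions differ between tools, so values may not match the module's report exactly.

```python
import statistics


def preview_fcs(names, events, full_names=None):
    """Summarize an event matrix: one row per event, one column per parameter."""
    full_names = full_names or [""] * len(names)
    report = {"events": len(events), "parameters": len(names), "columns": []}
    for j, (name, full) in enumerate(zip(names, full_names)):
        column = [event[j] for event in events]
        # "inclusive" treats the events as the whole population; other tools
        # may use a different quartile rule.
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        report["columns"].append({
            "name": name,
            "full_name": full,
            "min": min(column),
            "max": max(column),
            "median": statistics.median(column),
            "mean": statistics.fmean(column),
            "q1": q1,
            "q3": q3,
        })
    return report
```

Rendering the resulting dictionary as HTML or XML is then a separate formatting step.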

Data format conversions

  • CsvToFcs converts a Comma Separated Values (CSV) file to a Flow Cytometry Standard (FCS) file.
  • FcsToCsv converts a Flow Cytometry Standard (FCS) file to a Comma Separated Values (CSV) file.
  • ImmPortCSV2TXT converts a Comma Separated Values (CSV) file to a TXT file.
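The CSV/FCS conversions above can be sketched with the standard library alone. This is a minimal round-trip under strong assumptions, not the modules' implementation: it writes and reads only FCS 3.0 list-mode files with 32-bit float data (`$DATATYPE` F), keeps all offsets small enough to fit in the 8-character header fields, and uses a hypothetical `$PnR` choice (maximum value rounded up). It has not been validated against the full standard.

```python
import csv
import io
import struct

HEADER_LEN = 58  # "FCS3.0", 4 spaces, then six 8-character ASCII offsets


def _escape(value, delim):
    # A delimiter inside a keyword or value is written twice.
    return str(value).replace(delim, delim * 2)


def _text_segment(keywords, delim="/"):
    body = "".join(_escape(k, delim) + delim + _escape(v, delim) + delim
                   for k, v in keywords.items())
    return (delim + body).encode("ascii")


def _parse_text(raw):
    delim, tokens, current, i = raw[0], [], [], 1
    while i < len(raw):
        if raw[i] == delim and raw[i + 1:i + 2] == delim:  # escaped delimiter
            current.append(delim)
            i += 2
            continue
        if raw[i] == delim:  # end of a keyword or value
            tokens.append("".join(current))
            current = []
        else:
            current.append(raw[i])
        i += 1
    return dict(zip(tokens[0::2], tokens[1::2]))


def csv_to_fcs(csv_text):
    """Build a minimal FCS 3.0 file: list mode, 32-bit little-endian floats."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    names, events = rows[0], [[float(v) for v in row] for row in rows[1:]]
    data = b"".join(struct.pack("<%df" % len(names), *event) for event in events)
    keywords = {"$BYTEORD": "1,2,3,4", "$DATATYPE": "F", "$MODE": "L",
                "$NEXTDATA": 0, "$PAR": len(names), "$TOT": len(events),
                "$BEGINANALYSIS": 0, "$ENDANALYSIS": 0,
                "$BEGINSTEXT": 0, "$ENDSTEXT": 0}
    for i, name in enumerate(names, start=1):
        top = max((event[i - 1] for event in events), default=0.0)
        keywords["$P%dN" % i] = name
        keywords["$P%dB" % i] = 32
        keywords["$P%dE" % i] = "0,0"
        keywords["$P%dR" % i] = max(1, int(top) + 1)  # hypothetical range choice
    # $BEGINDATA depends on the TEXT length, which depends on $BEGINDATA:
    # repeat until the offset stops changing.
    begin = 0
    while True:
        keywords["$BEGINDATA"] = begin
        keywords["$ENDDATA"] = begin + len(data) - 1
        text = _text_segment(keywords)
        if HEADER_LEN + len(text) == begin:
            break
        begin = HEADER_LEN + len(text)
    offsets = (HEADER_LEN, begin - 1, begin, begin + len(data) - 1, 0, 0)
    header = b"FCS3.0    " + b"".join(b"%8d" % n for n in offsets)
    return header + text + data


def fcs_to_csv(blob):
    """Read a list-mode, float-typed FCS 3.0 file and return CSV text."""
    text_start = int(blob[10:18].decode("ascii"))
    text_end = int(blob[18:26].decode("ascii"))
    kw = _parse_text(blob[text_start:text_end + 1].decode("ascii"))
    if kw["$DATATYPE"] != "F" or kw["$MODE"] != "L":
        raise ValueError("this sketch only handles list-mode float data")
    begin, end = int(kw["$BEGINDATA"]), int(kw["$ENDDATA"])
    n_par, n_tot = int(kw["$PAR"]), int(kw["$TOT"])
    order = "<" if kw["$BYTEORD"] == "1,2,3,4" else ">"  # 1,2,3,4 = little-endian
    values = struct.unpack("%s%df" % (order, n_par * n_tot), blob[begin:end + 1])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([kw["$P%dN" % i] for i in range(1, n_par + 1)])
    for k in range(n_tot):
        writer.writerow(values[k * n_par:(k + 1) * n_par])
    return out.getvalue()
```

The fixed-point loop is needed because the data offset is stored inside the TEXT segment whose own length it changes; the loop stops as soon as adding digits no longer moves the offset.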

Data transformations

  • CompensateFCS compensates an FCS data file. Before analyzing your flow cytometry data, you usually need to make sure that they are properly compensated. This module can help you compensate your uncompensated data if you have a spillover matrix. The spillover matrix may already be stored in your FCS file, or it may be provided in an external text file.
  • FCSNormalization performs per-channel normalization of FCS data files. Between-sample variation in high-throughput flow cytometry data poses a significant challenge for the analysis of large-scale data sets, such as those derived from multi-center clinical trials. It is often hard to match biologically relevant cell populations across samples due to technical variation in sample acquisition and instrumentation differences. Thus, normalization of data is a critical step prior to analysis, particularly in large-scale data sets from clinical trials, where group-specific differences may be subtle and patient-to-patient variation common.
  • ImmPortFCSConvLogicleTrans (FCSTrans) converts binary FCS files into numeric data matrices saved as TXT files. This module is a GenePattern implementation of the FCSTrans method from the Immunology Database and Analysis Portal (ImmPort). The FCSTrans method automatically identifies transformation methods and parameters that generate results consistent with commercial software, so the user is not required to select methods or transformation parameters. The module aims to remove the preprocessing obstacle of FCS file conversion and data transformation, and to provide a starting point for independent flow cytometry (FCM) data analysts, statisticians, and software developers.
  • LogicleTransformFCS performs a Logicle transformation of (selected) parameters in a list mode FCS data file. In most flow cytometry applications, fluorescence signals of interest can range down to essentially zero. After fluorescence compensation, some cell populations will have low means and include events with negative data values. For data that include populations with low-to-zero means (including distributions resulting from fluorescence compensation procedures), the Logicle display method provides more complete, appropriate, and readily interpretable representations than either logarithmic or linear displays can.
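Compensation and the Logicle transform are sketched together here, since compensation is what produces the negative values Logicle is designed to display. This is an illustration under stated assumptions, not the modules' code. Compensation assumes each event row holds exactly the fluorescence channels of the spillover matrix, in the same order, with rows of the matrix giving each dye's spillover into every detector; observed values are then true values times the matrix, so the sketch multiplies by its inverse. The Logicle part follows the biexponential form from the original Logicle description, with the shape constant `p` tied to the linearization width `W`; the default top of scale `T = 262144`, `M = 4.5` decades and `W = 0.5` decades are assumed typical values, and later implementations refine the function near the top of scale, so results can differ slightly.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (pure Python, small matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compensate(events, spillover):
    """Observed = true x S (row vectors), so true = observed x S^-1."""
    inv = invert(spillover)
    n = len(inv)
    return [[sum(event[k] * inv[k][j] for k in range(n)) for j in range(n)]
            for event in events]


def _w_of_p(p):
    return 2 * p * math.log10(p) / (p + 1)


def _p_from_w(W):
    """Solve W = 2 p log10(p) / (p + 1) for p >= 1 by bisection."""
    lo, hi = 1.0, 2.0
    while _w_of_p(hi) < W:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if _w_of_p(mid) < W else (lo, mid)
    return (lo + hi) / 2


def biexponential(y, T, M, W, p):
    """Data value at display position y (in decades); zero data sits at y = W."""
    if y < W:
        return -biexponential(2 * W - y, T, M, W, p)  # odd symmetry about W
    d = y - W
    return T * 10 ** (W - M) * (10 ** d - p * p * 10 ** (-d / p) + p * p - 1)


def logicle(x, T=262144.0, M=4.5, W=0.5):
    """Display position y such that biexponential(y) == x, found by bisection."""
    p = _p_from_w(W)
    lo, hi = W - 1.0, W + 1.0
    while biexponential(lo, T, M, W, p) > x:  # widen until the root is bracketed
        lo -= hi - lo
    while biexponential(hi, T, M, W, p) < x:
        hi += hi - lo
    for _ in range(200):
        mid = (lo + hi) / 2
        if biexponential(mid, T, M, W, p) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Because the biexponential function is odd about `W` and strictly increasing, bisection always converges, and negative compensated values land smoothly below the zero point instead of being clipped as they would be on a log scale.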

FCS keywords manipulation

  • DeIdentifyFCS de-identifies FCS data files by removing the values of specified keywords. An FCS file may contain information about the sample (source) in the text segment of the file. In a clinical environment, before sharing FCS files, you may want to remove this information. This module lets you do so while leaving the rest of the file intact.
  • ExtractFCSKeywords extracts keywords and their values from an FCS TEXT segment. Specific metadata (data about data) is usually stored in FCS data files as part of the TEXT segment in the form of keyword/value pairs. For example, this may include the date and time of data acquisition, a description of the measured cells, comments, the type and/or serial number of the cytometer used, the name of the operator, tube or plate identification, excitation wavelength and power for measured channels, types and voltages of detectors used, etc. This module can be used to extract and save the information in all keyword/value pairs either in a CSV file (default) or as an FCS text segment chunk in a text file.
  • SetFCSKeywords sets keyword value(s) in an FCS TEXT segment. It can be used to add new keyword/value pairs or to replace the information stored in existing ones.
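The keyword operations above all work on the TEXT segment, whose first character is the delimiter and in which a doubled delimiter stands for a literal one. The sketch below parses and rebuilds such a segment and shows extract, set, and de-identify steps on the resulting dictionary. All function names and the CSV header are hypothetical, and de-identification here drops the whole keyword/value pair, which is an assumption; the module may instead blank or replace the value. Keyword matching is case-insensitive, as FCS 3.0 keywords are.

```python
import csv
import io


def parse_text_segment(raw):
    """Split a TEXT segment into an ordered keyword -> value dict."""
    delim, tokens, current, i = raw[0], [], [], 1
    while i < len(raw):
        if raw[i] == delim and raw[i + 1:i + 2] == delim:  # literal delimiter
            current.append(delim)
            i += 2
            continue
        if raw[i] == delim:  # end of a keyword or value
            tokens.append("".join(current))
            current = []
        else:
            current.append(raw[i])
        i += 1
    return dict(zip(tokens[0::2], tokens[1::2]))


def build_text_segment(keywords, delim="/"):
    """Serialize keyword/value pairs, doubling any embedded delimiter."""
    def esc(s):
        return str(s).replace(delim, delim * 2)
    return delim + "".join(esc(k) + delim + esc(v) + delim
                           for k, v in keywords.items())


def extract_keywords_csv(keywords):
    """Export one keyword,value row per pair (hypothetical CSV layout)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["keyword", "value"])
    writer.writerows(keywords.items())
    return out.getvalue()


def set_keywords(keywords, updates):
    """Add new pairs or overwrite existing ones, leaving the input unchanged."""
    merged = dict(keywords)
    merged.update(updates)
    return merged


def deidentify(keywords, sensitive):
    """Drop the listed keywords (assumption: the pair is removed entirely)."""
    drop = {name.upper() for name in sensitive}
    return {k: v for k, v in keywords.items() if k.upper() not in drop}
```

When a modified segment is written back into a file, its length usually changes, so `$BEGINDATA`, `$ENDDATA`, and the header offsets must be recomputed; this sketch only handles the TEXT content itself.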

FCS dataset manipulation

  • AddFCSEventIndex adds an index to events in an FCS data file. The index is added as a new parameter and it may be used to identify events across different analytical steps later on.
  • AddFCSParameter adds one or more parameters (measurement types) and their values to an FCS data file. The additional parameter values are provided in a CSV spreadsheet.
  • AddNoiseToFCS adds noise to specified (or all) parameters in an FCS data file. Additionally, it can remove saturated events, i.e., events with parameter values very close to the maximum range of particular scales. This is a useful preprocessing step before clustering, since several clustering algorithms examine the variance among events along different dimensions. Saturated events, as well as other "aligned" events, may form a group (cluster) with zero variance in some dimension, which causes problems for many model-based clustering algorithms. This module helps prevent these issues by removing saturated events and by adding a tiny amount of noise; the noise is not biologically relevant, but it minimizes the chance of having a group of perfectly aligned events.
  • ExtractFCSDataset extracts one or more FCS datasets from an FCS data file. Typically, there is only a single dataset per FCS data file. However, in certain cases, there may be more than one dataset in an FCS data file. Since most software does not support more than one dataset per FCS data file, this module offers a workaround for analyzing files with multiple datasets using common tools.
  • ExtractFCSParameters extracts specified parameters (measurement types) from an FCS data file.
  • ImmPortColSelection extracts specified columns from a TXT file generated from an FCS data file to exclude unused fluorescence/scatter parameters/markers from later FCM data analysis.
  • MergeFCSDataFiles merges multiple FCS data files into a single FCS dataset. It allows you to combine (add) events from multiple FCS data files and save these in a single file (as long as events in all the source files contain the same parameters). This may be useful in certain cases; for example, in using clustering to identify all the possible subpopulations based on samples from several patients. Sub-sampling options are included.
  • RemoveSaturatedFCSEvents removes saturated events from an FCS data file. Events are considered saturated if their parameter values are on (or very close to, i.e., within 0.1% of) the maximum range of particular scales. Saturated events are usually created when the voltage on the instrument is set too high, so that event values fall outside of the recordable scale. These events are commonly recorded with the maximum allowed value, which causes problems for subsequent analysis.
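Several of the dataset operations above reduce to simple row manipulations once events are decoded into a list of rows. The sketch below illustrates event indexing, saturated-event removal using the 0.1% rule stated above, noise jittering, and merging with per-file sub-sampling. It is not the modules' code: the function names are hypothetical, each parameter's scale maximum is assumed to be supplied in `ranges` (for example from `$PnR`), an event is treated as saturated if any of its parameters is at or within 0.1% of that maximum, the noise amplitude is an arbitrary small default that should be tuned to the data's scale, and the merge only checks that all files have the same number of parameters, not the same parameter names.

```python
import random


def add_event_index(events, start=1):
    """Append a running event index as an extra parameter."""
    return [list(event) + [float(start + i)] for i, event in enumerate(events)]


def remove_saturated(events, ranges, tolerance=0.001):
    """Drop events with any value on, or within 0.1% of, its scale maximum."""
    limits = [r * (1.0 - tolerance) for r in ranges]
    return [e for e in events if all(v < lim for v, lim in zip(e, limits))]


def add_noise(events, amplitude=1e-3, seed=None):
    """Jitter every value by uniform noise in [-amplitude, amplitude]."""
    rng = random.Random(seed)
    return [[v + rng.uniform(-amplitude, amplitude) for v in e] for e in events]


def merge_and_subsample(datasets, per_file=None, seed=None):
    """Concatenate events from several files, optionally sampling per_file from each."""
    rng = random.Random(seed)
    width = len(datasets[0][0])
    merged = []
    for events in datasets:
        if any(len(e) != width for e in events):
            raise ValueError("all files must share the same parameters")
        if per_file is not None and len(events) > per_file:
            events = rng.sample(events, per_file)  # without replacement
        merged.extend(events)
    return merged
```

Sampling a fixed number of events per file keeps one large sample from dominating a merged dataset used for clustering.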