The Purdue Proteome Discovery Pipeline

Posted 16 Jun, 2009 in Series

Contributor(s) Ann Christine Catlin
Rosen Center for Advanced Computing

George Howlett
Purdue University
Abstract

The Proteome Discovery Pipeline (PDP) is a data analysis system for mass spectrometry-based proteomics, developed through collaborative research by the Bindley Biosciences Center, the Cyber Center and the eEnterprise Center at Purdue University. The PDP provides an analysis pipeline for LC-MS data, with components for spectrum visualization, deconvolution, alignment, normalization, statistical significance testing and pattern recognition. All PDP components have been integrated into the cceHUB as tools that run in the browser, using the cceHUB interface for data input and results visualization.

Mass spectrometry data for pipeline analysis can be accessed from the cceHUB repository, which stores a number of OMIC studies, both public and private. Users can also submit their own OMIC datasets for analysis with the cceHUB pipeline tools.

Support for Instrument-generated Mass Spectrometry Data Formats

The first tool in the cceHUB pipeline performs spectrum deconvolution on instrument-generated liquid chromatography-mass spectrometry (LC-MS) datasets. Although mzXML is widely accepted as a standard format for representing LC-MS data, not all mass spectrometer raw data can be converted to mzXML. The Deconvolution Tool therefore accepts a number of other LC-MS data formats, including NetCDF, AgilentCsv, mzData and ThermalTxt. Examples of instruments, their native formats, the format converters used, and the resulting input formats supported by the Deconvolution Tool are:

  • mzXML format files from the XCT Plus Ion Trap, converted from the generated .D format to mzXML by Compass eXport software.
  • CDF format files from the Micromass Q-TOF, converted from the generated .RAW format to CDF by Databridge software.
  • mzData format files from the LC/MSD TOF, converted from the generated .d format to mzData by Qualitative Analysis Mass Hunter software.
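
mzXML is one of the input formats listed above. As a rough illustration of what such a file holds, the sketch below decodes the base64-encoded peak list stored in an mzXML <peaks> element into (m/z, intensity) pairs. It is not the Deconvolution Tool's reader. It assumes network (big-endian) byte order and m/z-intensity pair order, the usual mzXML encoding, and the function name decode_mzxml_peaks is hypothetical.

```python
import base64
import struct
import zlib


def decode_mzxml_peaks(encoded, precision=32, zlib_compressed=False):
    """Decode the base64 text of an mzXML <peaks> element into (m/z, intensity) pairs.

    Hypothetical helper. Assumes byteOrder="network" (big-endian) and
    pairOrder="m/z-int". Set zlib_compressed=True when the element carries
    compressionType="zlib".
    """
    if precision not in (32, 64):
        raise ValueError("mzXML precision must be 32 or 64")
    raw = base64.b64decode(encoded)
    if zlib_compressed:
        raw = zlib.decompress(raw)
    code = "f" if precision == 32 else "d"  # float32 or float64
    count = len(raw) // struct.calcsize(">" + code)
    values = struct.unpack(f">{count}{code}", raw)  # ">" = big-endian
    # Values alternate m/z, intensity, m/z, intensity, ...
    return list(zip(values[0::2], values[1::2]))


# Usage: build a two-peak payload the way an mzXML writer would, then decode it.
payload = base64.b64encode(struct.pack(">4f", 500.5, 1024.0, 501.0, 256.0)).decode("ascii")
print(decode_mzxml_peaks(payload))  # [(500.5, 1024.0), (501.0, 256.0)]
```

Real files may also use 64-bit precision; the precision argument should follow the element's precision attribute.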


Using the Proteome Discovery Pipeline Tools in cceHUB

Spectrum Deconvolution Tool. Spectral deconvolution differentiates analyte signals from contaminants or instrumental noise, and reduces data dimensionality to benefit downstream statistical analysis. The tool takes the following input:
-- LC-MS dataset: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or load an LC-MS dataset from their own collection.
-- Deconvolution parameters: Users are prompted for parameters that control deconvolution processing.

The Deconvolution Tool generates plots of "peak intensity vs. retention time", "mass-to-charge ratio vs. retention time" and other plots related to deconvolution processing. Each run also produces a "Deconvolution Results" dataset, which can be viewed and downloaded as a file. These files, saved with a ".dlt" extension, are the input to the Peak Alignment Tool, the next step in the cceHUB pipeline. Users who intend to align a collection of deconvoluted datasets should therefore save each "Deconvolution Results" dataset as a ".dlt" file in a location accessible to the alignment tool, such as a folder in their cceHUB home directory.

See the cceHUB Deconvolution Tool resource page to launch the tool or investigate this pipeline resource. Tool information includes a Getting Started document with case examples.
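
The specific deconvolution algorithm and parameters used by the tool are not described here. To make the general idea concrete, the sketch below shows isotope-cluster grouping, one common form of spectral deconvolution: centroided peaks from a single scan are grouped into isotope clusters, each cluster is assigned a charge state and a neutral mass, and peaks that fit no cluster are discarded as noise. This is an assumption-laden illustration, not the PDP method; deconvolve_scan, Feature and the default tolerances are hypothetical.

```python
from dataclasses import dataclass

ISOTOPE_SPACING = 1.00336  # approx. 13C - 12C mass difference, in Da
PROTON_MASS = 1.00728      # approx. proton mass, in Da


@dataclass
class Feature:
    mono_mz: float       # m/z of the lightest (monoisotopic) peak in the cluster
    charge: int
    neutral_mass: float  # (m/z - proton mass) * charge
    intensity: float     # summed intensity of the isotope cluster


def deconvolve_scan(peaks, max_charge=4, tol=0.01, min_isotopes=2):
    """Group centroided (m/z, intensity) peaks from one scan into isotope clusters.

    Hypothetical helper. For each unassigned peak, every charge state z is
    tried and the one explaining the longest run of peaks spaced
    ISOTOPE_SPACING / z apart wins (ties go to the lower charge).
    Peaks that fit no cluster are treated as noise and dropped.
    """
    peaks = sorted(peaks)
    used = set()
    features = []
    for i, (mz, _) in enumerate(peaks):
        if i in used:
            continue
        best_z, best_cluster = None, [i]
        for z in range(1, max_charge + 1):
            cluster, target = [i], mz + ISOTOPE_SPACING / z
            for j in range(i + 1, len(peaks)):
                if j in used:
                    continue
                if abs(peaks[j][0] - target) <= tol:
                    cluster.append(j)
                    target += ISOTOPE_SPACING / z
                elif peaks[j][0] > target + tol:
                    break  # peaks are sorted, so no later peak can match
            if len(cluster) >= min_isotopes and len(cluster) > len(best_cluster):
                best_z, best_cluster = z, cluster
        if best_z is None:
            continue
        used.update(best_cluster)
        total = sum(peaks[k][1] for k in best_cluster)
        features.append(Feature(mz, best_z, (mz - PROTON_MASS) * best_z, total))
    return features


# Usage: a 2+ cluster, an isolated noise peak, and a 1+ cluster.
scan = [(500.0, 1000.0), (500.50168, 600.0), (501.00336, 200.0),
        (612.3, 50.0), (800.0, 900.0), (801.00336, 450.0)]
for feature in deconvolve_scan(scan):
    print(feature)
```

Six peaks collapse to two features, which is the dimensionality reduction the text refers to. Real LC-MS deconvolution also groups features across neighboring scans in retention time, which this single-scan sketch omits.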


Peak Alignment Tool. Peak alignment addresses retention time shift by recognizing and aligning significant peaks; it then uses discrete deconvolution to align overlapped peaks. The tool takes the following input:
-- A collection of LC-MS datasets, each output by the Deconvolution Tool: Users can select a collection of LC-MS datasets from studies stored in the cceHUB repository, or load their own collection. Every input dataset must be the result of Deconvolution Tool processing.
-- Alignment parameters: Users are prompted for parameters that control alignment processing.

The Peak Alignment Tool generates a graph of the aligned "peak intensities vs. retention time". All aligned datasets appear on the graph, and users can view and compare subsets of the collection. The tool also generates several datasets that can be viewed and downloaded, including an alignment table (the ".org" file), peak quality (.qcd) and mass-to-charge variation (.vmz). The alignment table is the input to the Normalization Tool, the next step in the cceHUB pipeline, so users who intend to normalize their data should save it as an ".org" file in a location accessible to the normalization tool, such as a folder in their cceHUB home directory.

See the cceHUB Alignment Tool resource page to launch the tool or investigate this pipeline resource.
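
The first half of the alignment idea, correcting retention time shift from significant peaks, can be sketched as follows under a simple assumption: each sample's drift is linear in retention time, so landmark peaks matched by m/z give a least-squares fit that maps the sample onto a reference time axis. This is not the PDP algorithm and does not perform the discrete deconvolution of overlapped peaks; match_landmarks, fit_linear_shift and align_sample are hypothetical names.

```python
def match_landmarks(reference, sample, mz_tol=0.01):
    """Pair each reference peak with the most intense sample peak of similar m/z.

    Peaks are (mz, rt, intensity) tuples. Returns (sample_rt, reference_rt) pairs.
    """
    pairs = []
    for ref_mz, ref_rt, _ in reference:
        candidates = [p for p in sample if abs(p[0] - ref_mz) <= mz_tol]
        if candidates:
            best = max(candidates, key=lambda p: p[2])
            pairs.append((best[1], ref_rt))
    return pairs


def fit_linear_shift(pairs):
    """Least-squares fit of reference_rt = a * sample_rt + b."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two landmark peaks")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("landmark retention times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def align_sample(reference, sample, mz_tol=0.01):
    """Map every sample peak's retention time onto the reference time axis."""
    a, b = fit_linear_shift(match_landmarks(reference, sample, mz_tol))
    return [(mz, a * rt + b, intensity) for mz, rt, intensity in sample]


# Usage: the sample eluted 0.5 time units late across the whole run.
reference = [(400.0, 10.0, 500.0), (520.3, 20.0, 800.0), (610.7, 30.0, 300.0)]
sample = [(400.0, 10.5, 480.0), (520.3, 20.5, 820.0), (610.7, 30.5, 310.0)]
print(align_sample(reference, sample))  # retention times become 10.0, 20.0, 30.0
```

A linear model is the simplest choice; drift that varies nonlinearly over a run would need a piecewise or smoothed correction instead.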


Normalization Tool. Normalization attempts to quantitatively filter out overall peak intensity variations caused by experimental errors, such as systematic variations in the injection volumes loaded onto the LC-MS system. The tool takes the following input:
-- LC-MS dataset output by the Peak Alignment Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or load their own LC-MS dataset. The input dataset must be the result of Peak Alignment Tool processing.
-- Normalization parameters: Users are prompted for parameters that control normalization processing.

The Normalization Tool generates several images and datasets as output, all of which can be viewed and downloaded. In particular, the "Normalized Data" output (a .txt file) is the input to both the Significance Test Tool and the Pattern Recognition Tool, the next steps in the cceHUB pipeline. Users who intend to run either tool should save the normalized data as a ".txt" file in a location accessible to those tools, such as a folder in their cceHUB home directory.

See the cceHUB Normalization Tool resource page to launch the tool or investigate this pipeline resource.
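
An injection-volume error scales every peak intensity in a run by roughly the same factor. Total-intensity normalization is one simple way to remove such a factor, and the sketch below applies it to an aligned table. It is an illustration under that assumption only; the normalization methods the tool's parameters actually select are not described here, and normalize_total_intensity is a hypothetical name.

```python
from statistics import median


def normalize_total_intensity(table):
    """Rescale each sample so its summed peak intensity equals the median total.

    Hypothetical helper. `table` maps a sample name to its list of intensities,
    one per aligned peak, in the same peak order for every sample.
    """
    totals = {name: sum(values) for name, values in table.items()}
    if any(total <= 0 for total in totals.values()):
        raise ValueError("every sample needs a positive total intensity")
    target = median(totals.values())
    return {
        name: [v * target / totals[name] for v in values]
        for name, values in table.items()
    }


# Usage: run2 was loaded with twice the volume of run1, run3 with half.
table = {"run1": [10, 20, 70], "run2": [20, 40, 140], "run3": [5, 10, 35]}
print(normalize_total_intensity(table))  # every run becomes [10.0, 20.0, 70.0]
```

Scaling to the median total rather than to a fixed constant keeps normalized intensities on the same order of magnitude as the raw data.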


Significance Test Tool. Several statistical significance tests are used to identify peptide or metabolite peaks that either contribute significantly to the molecular profile of a sample or distinguish one group of samples from others. The tool takes the following input:
-- LC-MS dataset output by the Normalization Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or load their own LC-MS dataset. The input dataset must be the result of Normalization Tool processing.
-- Significance test parameters: Users are prompted for parameters that control significance test processing.

The Significance Test Tool generates several images and datasets as output for significance processing. All datasets and images can be viewed and downloaded.

See the cceHUB Significance Test Tool resource page to launch the tool or investigate this pipeline resource.
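
The tests the tool employs are not named here. As one illustrative option for deciding whether a peak distinguishes two groups of samples, the sketch below runs an exact two-sided permutation test on the difference in mean normalized intensity, peak by peak. It is not necessarily one of the tool's tests; permutation_p_value and peak_p_values are hypothetical names.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in mean intensity.

    Every way of relabeling the pooled values into groups of the original
    sizes is enumerated; the p-value is the fraction of relabelings whose
    absolute mean difference is at least the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
        total += 1
    return extreme / total


def peak_p_values(peaks_a, peaks_b):
    """Return one p-value per aligned peak; peaks_x[k] holds peak k's intensities in group x."""
    return [permutation_p_value(a, b) for a, b in zip(peaks_a, peaks_b)]


# Usage: peak 0 separates the groups cleanly, peak 1 does not differ at all.
print(peak_p_values([[10, 11, 12, 13], [5, 5, 5, 5]],
                    [[20, 21, 22, 23], [5, 5, 5, 5]]))
```

Exact enumeration grows combinatorially with group size, so larger studies would sample random relabelings instead. When many peaks are tested at once, the resulting p-values should also be corrected for multiple comparisons.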



Pattern Recognition Tool. The Pattern Recognition Tool provides principal component analysis (PCA), linear discriminant analysis (LDA), and canonical discriminant analysis (CDA) for data clustering. The tool takes the following input:
-- LC-MS dataset output by the Normalization Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or load their own LC-MS dataset. The input dataset must be the result of Normalization Tool processing.
-- Pattern recognition parameters: Users are prompted for parameters that control pattern recognition processing.

The Pattern Recognition Tool generates several images as output. All images can be viewed and downloaded.

See the cceHUB Pattern Recognition Tool resource page to launch the tool or investigate this pipeline resource.
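
Of the three methods, PCA is the simplest to sketch because it needs no class labels. The example below computes only the first principal component of a normalized intensity table by power iteration on the covariance matrix, so it runs with the standard library alone; in practice a library routine such as numpy.linalg.svd or scikit-learn's PCA would be used. It is not the tool's implementation, LDA and CDA are not shown, and first_principal_component is a hypothetical name.

```python
import math
import random


def first_principal_component(samples, iterations=500, seed=0):
    """Return (loading vector, per-sample scores) for the first principal component.

    Hypothetical helper. `samples` is a list of equal-length intensity vectors
    (one per sample, one entry per normalized peak); at least two samples.
    """
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - means[j] for j in range(d)] for s in samples]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    # Random positive start; unlikely to be orthogonal to the leading eigenvector.
    v = [rng.uniform(0.1, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has no variance")
        v = [x / norm for x in w]
    # Principal components are sign-ambiguous; make the largest loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    scores = [sum(x * l for x, l in zip(row, v)) for row in centered]
    return v, scores


# Usage: two peaks whose intensities rise together across four samples.
loading, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(loading, scores)
```

Plotting the scores of the first two or three components is what typically reveals clusters of samples.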



Data Input to cceHUB Tools

The cceHUB shared data repository contains a large number of instrument-generated mass spectrometry datasets from various studies carried out at Purdue University. Users can access LC-MS data stored in the repository through the cceHUB pipeline tools, either loading repository datasets to run the analysis tools or browsing existing results. Some LC-MS datasets in the repository are private and can be accessed only by members of the corresponding private group. Users can also load their own dataset collections through the cceHUB tool interfaces.

Credits

The Purdue Proteome Discovery Pipeline was created by the Bindley Biosciences Center under the direction of Charles Buck with Xiang Zhang (now at the Department of Chemistry, Center for Regulatory and Environmental Analytical Metabolomics, University of Kentucky), Catherine P. Riley, Erik S. Gough and Jing He (Bindley Biosciences Center), Shrinivas S. Jandhyala, Brad Kennedy, and Mourad Ouzzani (Cyber Center), and Seza Orcus (eEnterprise Center).

The integration of the Purdue Proteome Discovery Pipeline models as cceHUB tools and the contribution of test datasets to the cceHUB repository are part of a collaborative effort with Jiri Adamec, Amber Jannasch and Catherine P. Riley of the Bindley Biosciences Center and George Howlett of the Rosen Center for Advanced Computing.

References

Gough, E.; Oh, C.; He, J.; Riley, C.; Buck, C.; and Zhang, X. Proteome discovery pipeline for mass spectrometry-based proteomics.

Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; and Regnier, F. E. An automated method for the analysis of stable isotope labeling data for proteomics. J. Am. Soc. Mass Spectrom. 2005, 16, 1181-1191.

Zhang, X.; Asara, J.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. Data pre-processing in liquid chromatography–mass spectrometry-based proteomics. Bioinformatics 2005, 21, 4054.

Cite this work

If you reference this work in a publication, please cite as follows:

  • Ann Christine Catlin; George Howlett (2009), "The Purdue Proteome Discovery Pipeline," http://ccehub.org/resources/263.


Tags
  1. alignment
  2. deconvolution
  3. mass spectrometry
  4. normalization
  5. proteome discovery pipeline
  6. proteomics

In This Series

  1. Pattern Recognition for Normalized LC-MS Data

    01 Jul. 2009 | Tools | Contributor(s): Ann Christine Catlin, George Howlett

    This tool provides principal component analysis (PCA), linear discriminant analysis (LDA), and canonical discriminant analysis (CDA) for data clustering on aligned, normalized LC-MS datasets.

  2. Peak Alignment of LC-MS Data

    24 Jun. 2009 | Tools | Contributor(s): Ann Christine Catlin, George Howlett

    Peak alignment addresses retention time shift by recognizing and aligning significant peaks; it then uses discrete deconvolution to align overlapped peaks.

  3. Significance Testing of Normalized LC-MS Data

    23 Jun. 2009 | Tools | Contributor(s): Ann Christine Catlin

    Several statistical significance tests are employed to identify peptide or metabolite peaks that either make significant contributions to the molecular profile of a sample or distinguish a group of samples from others.

  4. Normalization of Aligned LC-MS Data

    18 Jun. 2009 | Tools | Contributor(s): Ann Christine Catlin, George Howlett

    Normalization attempts to quantitatively filter out overall peak intensity variations caused by experimental errors, such as systematic variations in the injection volumes loaded onto the LC-MS system.

  5. Spectrum Deconvolution of LC-MS Data

    03 Dec. 2008 | Tools | Contributor(s): Ann Christine Catlin, George Howlett

    Spectral deconvolution differentiates analyte signals from contaminants or instrumental noise, and reduces data dimensionality to benefit downstream statistical analysis.