The Purdue Proteome Discovery Pipeline
Posted 16 Jun, 2009 in Series
| Contributor(s) | Ann Christine Catlin, Rosen Center for Advanced Computing; George Howlett, Purdue University |
|---|---|
| Abstract | The Proteome Discovery Pipeline (PDP) is a data analysis system for mass spectrometry-based proteomics which was developed through the collaborative research efforts of the Bindley Biosciences Center, the Cyber Center and the eEnterprise Center at Purdue University. The PDP offers a data analysis pipeline, with components for spectrum visualization, deconvolution, alignment, normalization, statistical significance tests and pattern recognition for LC-MS data. All of the PDP components have been integrated into the cceHUB as tools, and can be run in the browser using the cceHUB interface and results visualization.
Using the Proteome Discovery Pipeline Tools in cceHUB. Spectrum Deconvolution Tool. Spectral deconvolution differentiates analyte signals from contaminants or instrumental noise, and reduces data dimensionality to benefit downstream statistical analysis. Input to this tool is the following: -- LC-MS dataset: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or they can load an LC-MS dataset from their own collection. -- Deconvolution parameters: Users are prompted for parameters which control deconvolution processing. The Deconvolution Tool generates output graphs for "peak intensity vs. retention time", "mass charge vs. retention time" and other plots related to deconvolution processing. The tool also generates a "Deconvolution Results" dataset for each run; this data can be viewed and then downloaded as a file. Users who intend to run a collection of deconvoluted datasets through the Peak Alignment Tool (the next step in the cceHUB pipeline) should save each "Deconvolution Results" dataset as a ".dlt" file (i.e., a file with a dlt extension) to a location where it will be accessible for alignment processing, such as a folder in their cceHUB home directory. The ".dlt" files saved from Deconvolution Tool runs are the input to the Peak Alignment Tool. See the cceHUB Deconvolution Tool resource page to launch the tool or investigate this pipeline resource. Tool information includes a Getting Started document with case examples. Peak Alignment Tool. Peak alignment addresses retention time shift by recognizing and aligning significant peaks; it then uses discrete deconvolution to align overlapped peaks. Input to this tool is the following: -- Collection of LC-MS datasets, each of which was output by the Deconvolution Tool: Users can select a collection of LC-MS datasets from studies stored in the cceHUB repository, or they can load their own collection of LC-MS datasets. 
Each of the input LC-MS datasets to the alignment tool must be the result of Deconvolution Tool processing. -- Alignment parameters: Users are prompted for parameters which control alignment processing. The Peak Alignment Tool generates an output graph of the aligned "peak intensities vs. retention time". All aligned datasets appear on the graph, and users can view and compare subsets of the collection of aligned datasets. The tool also generates several datasets, including an alignment table (the ".org" file), peak quality (.qcd), and mass charge variation (.vmz), all of which can be viewed and then downloaded. Users who intend to run the Normalization Tool (the next step in the cceHUB pipeline) should save the alignment table resulting from alignment processing as an ".org" file (i.e., a file with an org extension) to a location where it will be accessible for normalization processing, such as a folder in their cceHUB home directory. The ".org" file saved from the Peak Alignment Tool run is the input to the Normalization Tool. See the cceHUB Alignment Tool resource page to launch the tool or investigate this pipeline resource. Normalization Tool. Normalization attempts to quantitatively filter overall peak intensity variations due to experimental errors, such as systematic variation in the injection volumes loaded onto the LC-MS instrument. Input to this tool is the following: -- LC-MS dataset which was output by the Peak Alignment Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or they can load their own LC-MS datasets. The input LC-MS dataset must be the result of Peak Alignment Tool processing. -- Normalization parameters: Users are prompted for parameters which control normalization processing. The Normalization Tool generates several images and datasets as output for normalization processing. 
All datasets and images can be viewed and downloaded, including the "Normalized Data" output (.txt file). Users who intend to run the Significance Test Tool (the next step in the cceHUB pipeline) should save the normalized data resulting from normalization processing as a ".txt" file (i.e., a file with a txt extension) to a location where it will be accessible for significance test processing, such as a folder in their cceHUB home directory. The normalized data ".txt" file saved from the Normalization Tool run is the input to both the Significance Test Tool and the Pattern Recognition Tool. See the cceHUB Normalization Tool resource page to launch the tool or investigate this pipeline resource. Significance Test Tool. Several statistical significance tests are employed to identify peptide or metabolite peaks that either make significant contributions to the molecular profile of a sample or distinguish a group of samples from others. Input to this tool is the following: -- LC-MS dataset which was output by the Normalization Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or they can load their own LC-MS datasets. The input LC-MS dataset must be the result of Normalization Tool processing. -- Significance Test parameters: Users are prompted for parameters which control significance test processing. The Significance Test Tool generates several images and datasets as output for significance processing. All datasets and images can be viewed and downloaded. See the cceHUB Significance Test Tool resource page to launch the tool or investigate this pipeline resource. Pattern Recognition Tool. The Pattern Recognition Tool provides principal component analysis (PCA), linear discriminant analysis (LDA), and canonical discriminant analysis (CDA) for data clustering. 
Input to this tool is the following: -- LC-MS dataset which was output by the Normalization Tool: Users can select an LC-MS dataset from studies stored in the cceHUB repository, or they can load their own LC-MS datasets. The input LC-MS dataset must be the result of Normalization Tool processing. -- Pattern Recognition parameters: Users are prompted for parameters which control pattern recognition processing. The Pattern Recognition Tool generates several images as output. All images can be viewed and downloaded. See the cceHUB Pattern Recognition Tool resource page to launch the tool or investigate this pipeline resource. Data Input to cceHUB Tools. The cceHUB shared data repository contains a large number of instrument-generated mass spectrometry datasets from various studies carried out at Purdue University. Users can access LC-MS data stored in the repository through the cceHUB pipeline tools: they can load repository datasets and run the analysis tools, or browse the results. Some LC-MS datasets stored in the repository are private and can be accessed only by members of the corresponding private group. Users can also load their own dataset collections through the cceHUB tool interfaces. |
| credits | The Purdue Discovery Pipeline was created by the Bindley Biosciences Center under the direction of Charles Buck with Xiang Zhang (now at the Department of Chemistry, Center for Regulatory and Environmental Analytical Metabolomics, University of Kentucky), Catherine P. Riley, Erik S. Gough and Jing He (Bindley Biosciences Center), Shrinivas S. Jandhyala, Brad Kennedy, and Mourad Ouzzani (Cyber Center), and Seza Orcus (eEnterprise Center). The integration of Purdue Discovery Pipeline models as cceHUB tools and the contribution of test datasets to the cceHUB repository are part of a collaborative effort with Jiri Adamec, Amber Jannasch and Catherine P. Riley of the Bindley Biosciences Center and George Howlett of the Rosen Center for Advanced Computing. |
| references | Gough, E.; Oh, C.; He, J.; Riley, C.; Buck, C.; and Zhang, X. Proteome discovery pipeline for mass spectrometry-based proteomics. Zhang, X.; Hines, W.; Adamec, J.; Asara, J.; Naylor, S.; and Regnier, F. E. An automated method for the analysis of stable isotope labeling data for proteomics. J. Am. Soc. Mass Spectrom. 2005, 16, 1181-1191. Zhang, X.; Asara, J.; Adamec, J.; Ouzzani, M.; and Elmagarmid, A. Data pre-processing in liquid chromatography–mass spectrometry-based proteomics. Bioinformatics 2005, 21 (21), 4054. |
| Cite this work | If you reference this work in a publication, please cite as follows: |

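The file hand-offs between the tools described in the abstract can be summarized in a short Python sketch. The tool names and the ".dlt", ".org" and ".txt" extensions come from the text above; the `PIPELINE` table, the `select_inputs` and `consumers` helpers, and the example file names are hypothetical illustrations and are not part of cceHUB. The text does not state file extensions for the raw LC-MS input or for the outputs of the last two tools, so those are left as `None`.

```python
from pathlib import PurePath

# (tool name, extension of the saved file it consumes, extension of the file it hands on).
# Extensions are taken from the page text; None means the text names no extension.
PIPELINE = [
    ("Spectrum Deconvolution Tool", None, ".dlt"),
    ("Peak Alignment Tool", ".dlt", ".org"),
    ("Normalization Tool", ".org", ".txt"),
    ("Significance Test Tool", ".txt", None),
    ("Pattern Recognition Tool", ".txt", None),
]


def select_inputs(tool, filenames):
    """Return the saved files (sorted) whose extension matches what `tool` consumes."""
    for name, in_ext, _out_ext in PIPELINE:
        if name == tool:
            if in_ext is None:
                return []  # no extension is stated for this tool's input
            return sorted(f for f in filenames
                          if PurePath(f).suffix.lower() == in_ext)
    raise KeyError(f"unknown tool: {tool}")


def consumers(extension):
    """Return the tools that take a file with the given extension as input."""
    return [name for name, in_ext, _out_ext in PIPELINE if in_ext == extension]


# Hypothetical contents of a folder in a user's home directory.
saved = ["run1.dlt", "run2.dlt", "study.org", "notes.pdf"]
print(select_inputs("Peak Alignment Tool", saved))  # ['run1.dlt', 'run2.dlt']
print(consumers(".txt"))  # ['Significance Test Tool', 'Pattern Recognition Tool']
```

The sketch only models which saved file feeds which tool; the analysis itself runs inside the cceHUB tools in the browser.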