TY - JOUR
T1 - Fast elastic peak detection for mass spectrometry data mining
AU - Zhang, Xin
AU - Shasha, Dennis E.
AU - Song, Yang
AU - Wang, Jason T. L.
N1 - Funding Information: This work was supported in part by NIH Grant 2R01GM032877-25A1, and US NSF Grants DBI-0445666, MCB-0929339, and IOS-0922738. The authors thank the anonymous reviewers for their constructive suggestions, which helped improve the presentation and content of this paper. They also thank Dr. Tong Liu of the Center for Advanced Proteomics Research at UMDNJ-New Jersey Medical School for useful conversations and assistance on LC-MS data collection and analysis.
PY - 2012
Y1 - 2012
N2 - We study a data mining problem concerning the elastic peak detection in 2D liquid chromatography-mass spectrometry (LC-MS) data. These data can be modeled as time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a sliding time window exceeds a user-determined threshold. The elastic peak detection problem is to locate all peaks across multiple window sizes of interest in the data set. We propose a new data structure, called a Shifted Aggregation Tree or AggTree for short, and use the data structure to find the different peaks. Our method, called PeakID, solves the elastic peak detection problem in 2D LC-MS data yielding neither false positives nor false negatives. The method works by first constructing an AggTree in a bottom-up manner from the given data set, and then searching the AggTree for the peaks in a top-down manner. We describe a state-space algorithm for finding the topology and structure of an efficient AggTree to be used by PeakID. Our experimental results demonstrate the superiority of the proposed method over other methods on both synthetic and real-world data.
KW - Knowledge discovery from LC-MS data
KW - algorithms and data structures
KW - bioinformatics
KW - computational proteomics
KW - time series data mining
UR - http://www.scopus.com/inward/record.url?scp=84863259465&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863259465&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2010.238
DO - 10.1109/TKDE.2010.238
M3 - Article
AN - SCOPUS:84863259465
SN - 1041-4347
VL - 24
SP - 634
EP - 648
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 4
M1 - 5645627
ER -