TY - GEN
T1 - Lattice based clustering of temporal gene-expression matrices
AU - Huang, Yang
AU - Farach-Colton, Martin
PY - 2007
Y1 - 2007
N2 - Individuals show different cell classes when they are in the different stages of a disease, have different disease subtypes, or have different response to a treatment or environmental stress. It is important to identify the individuals' cell classes, for example, to decide which disease subtype they have or how they will respond to a certain drug. In a temporal gene-expression matrix (TGEM) each row represents a time series of expression values of a gene. TGEMs of the same cell class should show similar gene-expression patterns. However, given a set of TGEMs, it can be difficult to classify matrices by cell classes. In this paper, we develop a tool called LABSTER (LAttice Based cluSTERing) to cluster gene-expression matrices by cell classes. Rather than treating each row or column as a vector, we create a Galois lattice for each matrix, which yields a natural distance function between gene expression matrices. Finally, we cluster based on these distances. A key advantage of our method is that it effectively handles missing values, which is a problem in gene expression data. We evaluated LABSTER on both simulation data and clinical data. The results show that LABSTER has better clustering performance than several widely used vector-based clustering methods. A bootstrapping procedure is also proposed to further improve the performance of LABSTER. LABSTER has the potential to be used on matrices containing data other than gene expression.
AB - Individuals show different cell classes when they are in the different stages of a disease, have different disease subtypes, or have different response to a treatment or environmental stress. It is important to identify the individuals' cell classes, for example, to decide which disease subtype they have or how they will respond to a certain drug. In a temporal gene-expression matrix (TGEM) each row represents a time series of expression values of a gene. TGEMs of the same cell class should show similar gene-expression patterns. However, given a set of TGEMs, it can be difficult to classify matrices by cell classes. In this paper, we develop a tool called LABSTER (LAttice Based cluSTERing) to cluster gene-expression matrices by cell classes. Rather than treating each row or column as a vector, we create a Galois lattice for each matrix, which yields a natural distance function between gene expression matrices. Finally, we cluster based on these distances. A key advantage of our method is that it effectively handles missing values, which is a problem in gene expression data. We evaluated LABSTER on both simulation data and clinical data. The results show that LABSTER has better clustering performance than several widely used vector-based clustering methods. A bootstrapping procedure is also proposed to further improve the performance of LABSTER. LABSTER has the potential to be used on matrices containing data other than gene expression.
KW - Clustering
KW - Galois lattice
KW - Gene expression
KW - Matrix distance
UR - http://www.scopus.com/inward/record.url?scp=66349138902&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=66349138902&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972771.36
DO - 10.1137/1.9781611972771.36
M3 - Conference contribution
AN - SCOPUS:66349138902
SN - 9780898716306
T3 - Proceedings of the 7th SIAM International Conference on Data Mining
SP - 398
EP - 409
BT - Proceedings of the 7th SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics Publications
T2 - 7th SIAM International Conference on Data Mining
Y2 - 26 April 2007 through 28 April 2007
ER -