Feature-specific penalized latent class analysis for genomic data

E. Andrés Houseman, Brent A. Coull, Rebecca A. Betensky

Research output: Contribution to journal › Article › peer-review


Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.
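
The model described in the abstract can be sketched in symbols as follows. This is a minimal illustration and not the authors' exact formulation: the notation ($y_{ij}$, $\pi_c$, $p_{cj}$, $x_j$, $\beta_c$, $O_i$, $\lambda$, $\gamma$), the logit link, and the bridge-type penalty are assumptions chosen to match the description (k classes, conditional independence within a class, a regression of dimension q, and a penalty family that contains the ridge and LASSO).

```latex
% Assumed notation (not taken from the paper):
%   y_{ij} \in \{0,1\}  dichotomous indicator (e.g. LOH) for subject i at marker j
%   O_i                 set of markers actually observed for subject i
%   \pi_c               prevalence of latent class c
%   x_j \in R^q         low-dimensional covariate vector describing marker j
\begin{align*}
P(Y_i = y_i) &= \sum_{c=1}^{k} \pi_c \prod_{j \in O_i}
    p_{cj}^{\,y_{ij}} \, (1 - p_{cj})^{1 - y_{ij}},
    \qquad \sum_{c=1}^{k} \pi_c = 1, \\
\operatorname{logit}(p_{cj}) &= x_j^{\top} \beta_c,
    \qquad \beta_c \in \mathbb{R}^{q}, \\
\ell_{\lambda}(\pi, \beta) &= \sum_{i=1}^{n} \log P(Y_i = y_i)
    \;-\; \lambda \sum_{c=1}^{k} \sum_{r=1}^{q} \lvert \beta_{cr} \rvert^{\gamma}.
\end{align*}
```

In this sketch, markers missing for subject i simply drop out of the product over $O_i$, which is one common way to handle missing values and is assumed here rather than taken from the abstract. Setting $\gamma = 2$ gives a ridge-type penalty and $\gamma = 1$ a LASSO-type penalty; $\lambda \ge 0$ controls the amount of shrinkage.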

Original language: English (US)
Pages (from-to): 1062-1070
Number of pages: 9
Issue number: 4
State: Published - Dec 2006


Keywords

  • Constrained estimation
  • Loss of heterozygosity
  • Mixture models
  • Penalized likelihood
  • Ridge regression

ASJC Scopus subject areas

  • Statistics and Probability
  • General Biochemistry, Genetics and Molecular Biology
  • General Immunology and Microbiology
  • General Agricultural and Biological Sciences
  • Applied Mathematics

