Abstract
Background: In human pedigree data, age at disease occurrence is frequently missing and is imputed using various methods. However, little is known about the performance of these methods when applied to families. In particular, there is little information about the level of agreement between imputed and actual values of temporal data, or about the effect of imputation on inferences.

Methods: We performed two evaluations of five imputation methods used to generate complete data for repositories to be shared by many investigators. Two of the methods are mean substitution methods, two are regression methods, and one is a multiple imputation method based on one of the regression methods. To evaluate the methods, we randomly deleted the years of disease diagnosis of some men in a sample of pedigrees ascertained as part of a prostate cancer study. In the first evaluation, we used the five methods to impute the missing diagnosis years and evaluated agreement between imputed and actual values. In the second evaluation, we evaluated agreement between regression coefficients estimated using imputed diagnosis years and those estimated using the actual years.

Results/Conclusions: For both evaluations, we found optimal or near-optimal performance from a regression method that imputes a man's diagnosis year based on the year of birth and year of last observation of all affected men with complete data. The multiple imputation analogue of this method also performed well.
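The first evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the regression method is an ordinary least-squares fit of diagnosis year on year of birth and year of last observation among men with complete data, and it uses synthetic records rather than the prostate cancer pedigrees. The function names (`fit_diagnosis_year`, `impute_diagnosis_year`), the 30% deletion rate, and the data-generating values are all hypothetical.

```python
import random
from statistics import mean

def fit_diagnosis_year(birth, last_obs, dx):
    """Least-squares fit of dx ~ intercept + birth + last_obs (hypothetical helper).

    Predictors are centred before solving the 2x2 normal equations so that
    calendar years near 1900-2000 do not cause numerical trouble.
    """
    mb, ml, md = mean(birth), mean(last_obs), mean(dx)
    b = [x - mb for x in birth]
    l = [x - ml for x in last_obs]
    d = [x - md for x in dx]
    sbb = sum(x * x for x in b)
    sll = sum(x * x for x in l)
    sbl = sum(x * y for x, y in zip(b, l))
    sbd = sum(x * y for x, y in zip(b, d))
    sld = sum(x * y for x, y in zip(l, d))
    det = sbb * sll - sbl * sbl          # zero only if the predictors are collinear
    beta_b = (sbd * sll - sbl * sld) / det
    beta_l = (sbb * sld - sbl * sbd) / det
    intercept = md - beta_b * mb - beta_l * ml
    return intercept, beta_b, beta_l

def impute_diagnosis_year(coef, birth, last_obs):
    """Predict a diagnosis year for each man from the fitted coefficients."""
    intercept, beta_b, beta_l = coef
    return [intercept + beta_b * b + beta_l * l for b, l in zip(birth, last_obs)]

# Synthetic affected men (illustrative values only, not study data).
random.seed(0)
n = 200
birth = [float(random.randint(1900, 1950)) for _ in range(n)]
dx = [b + random.gauss(65, 8) for b in birth]          # assumed diagnosis years
last_obs = [d + random.uniform(0, 15) for d in dx]     # observed after diagnosis

# Randomly delete some diagnosis years, fit on complete cases, then impute.
missing = [random.random() < 0.3 for _ in range(n)]
keep = [i for i in range(n) if not missing[i]]
drop = [i for i in range(n) if missing[i]]
coef = fit_diagnosis_year([birth[i] for i in keep],
                          [last_obs[i] for i in keep],
                          [dx[i] for i in keep])
imputed = impute_diagnosis_year(coef,
                                [birth[i] for i in drop],
                                [last_obs[i] for i in drop])

# One simple agreement summary: mean absolute difference in years.
mean_abs_diff = mean(abs(p - dx[i]) for p, i in zip(imputed, drop))
print(f"imputed {len(imputed)} diagnosis years; mean |imputed - actual| = {mean_abs_diff:.2f}")
```

Mean absolute difference is used here only as one convenient agreement summary; the abstract does not state which agreement measure the study used.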
Original language | English (US) |
---|---|
Pages (from-to) | 168-174 |
Number of pages | 7 |
Journal | Human Heredity |
Volume | 63 |
Issue number | 3-4 |
State | Published - Mar 2007 |
Keywords
- Cancer
- Disease onset
- Imputation methods
- Missing data
ASJC Scopus subject areas
- Genetics
- Genetics (clinical)