Human leucocyte antigen (HLA) genes play a central role in the response to pathogens and in autoimmunity. Research into the effects of HLA genes on health has been limited because HLA genotyping protocols are labour intensive and expensive. Algorithms that impute HLA genotypes from genome-wide association study (GWAS) data have recently been published; however, the imputation accuracy of most of these algorithms has been assessed primarily with training data sets of individuals of European ancestry. We evaluated the performance of two HLA-dedicated imputation algorithms, SNP2HLA and HIBAG, in a multiracial population of n = 1587 women whose HLA genotypes had been determined by gold-standard methods. We first compared the accuracy of HLA-B and HLA-C imputation by SNP2HLA and HIBAG, defining accuracy as the percentage of correctly predicted alleles and splitting the data set into an 80% training group and a 20% testing group. Accuracy estimates for HIBAG were the same as or better than those for SNP2HLA. We then tested HIBAG imputation accuracy more thoroughly using five independent 10-fold cross-validation procedures, with ancestry groups delineated using ancestry informative markers. Overall accuracy for HIBAG was 89%. Accuracy by HLA gene was 93% for HLA-A, 84% for HLA-B, 94% for HLA-C, 83% for HLA-DQA1, 91% for HLA-DQB1 and 88% for HLA-DRB1. Accuracy was highest in the African ancestry group (the largest group) and lowest in the Hispanic group (the smallest group). Despite suboptimal imputation accuracy for some HLA gene/ancestry group combinations, the HIBAG algorithm has the advantage of providing posterior estimates of accuracy, which enable the investigator to analyse subsets of the population with high predicted imputation accuracy (e.g. >95%).
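The allele-level accuracy metric, the posterior-based subsetting and the repeated cross-validation described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the HIBAG or SNP2HLA implementation: each genotype is assumed to be an unordered pair of allele names, correct alleles are counted by matching the two pairs regardless of order, each imputed genotype is assumed to carry one posterior probability, and the fold scheme (random shuffle, then round-robin assignment, no stratification by ancestry) is hypothetical because the abstract does not say how folds were formed. All function names are illustrative.

```python
import random
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation: a genotype is a pair of allele names (order ignored).
Genotype = Tuple[str, str]


def alleles_correct(truth: Genotype, imputed: Genotype) -> int:
    """Count imputed alleles (0, 1 or 2) that match the gold-standard genotype,
    ignoring allele order; the multiset intersection handles homozygotes."""
    return sum((Counter(truth) & Counter(imputed)).values())


def allele_accuracy(truths: Sequence[Genotype], imputed: Sequence[Genotype]) -> float:
    """Percentage of correctly predicted alleles over all individuals (2 per person)."""
    if len(truths) != len(imputed) or not truths:
        raise ValueError("need equal-length, non-empty genotype lists")
    correct = sum(alleles_correct(t, p) for t, p in zip(truths, imputed))
    return 100.0 * correct / (2 * len(truths))


def high_confidence_subset(
    truths: Sequence[Genotype],
    imputed: Sequence[Genotype],
    posteriors: Sequence[float],
    threshold: float = 0.95,
) -> Tuple[List[Genotype], List[Genotype]]:
    """Keep only individuals whose posterior probability exceeds the threshold."""
    keep = [i for i, pp in enumerate(posteriors) if pp > threshold]
    return [truths[i] for i in keep], [imputed[i] for i in keep]


def repeated_kfold(
    n: int, k: int = 10, repeats: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` independent k-fold partitions.
    Assumed scheme: shuffle the indices, then assign them to folds round-robin."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f, test in enumerate(folds):
            train = [j for g, fold in enumerate(folds) if g != f for j in fold]
            yield sorted(train), sorted(test)


if __name__ == "__main__":
    truths = [("A*01:01", "A*02:01"), ("A*03:01", "A*03:01"), ("A*24:02", "A*11:01")]
    imputed = [("A*02:01", "A*01:01"), ("A*03:01", "A*01:01"), ("A*24:02", "A*11:01")]
    posteriors = [0.99, 0.60, 0.97]  # made-up values for illustration only
    print(f"{allele_accuracy(truths, imputed):.1f}")  # 83.3 (5 of 6 alleles)
    t_hi, p_hi = high_confidence_subset(truths, imputed, posteriors)
    print(f"{allele_accuracy(t_hi, p_hi):.1f}")  # 100.0 (4 of 4 alleles)
    print(sum(1 for _ in repeated_kfold(30)))  # 50 train/test splits
```

Counting matches with a multiset intersection means that a homozygous gold-standard genotype such as (A*03:01, A*03:01) scores only one correct allele against an imputed (A*03:01, A*01:01), which is what a per-allele percentage requires. The example data are invented and do not reproduce any figure from the study.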