Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption

Esha Sarkar, Eduardo Chielle, Gamze Gursoy, Oleg Mazonka, Mark Gerstein, Michail Maniatakos

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and disease. However, genotyping whole genomes from a large cohort of individuals is still cost-prohibitive. Imputation methods that predict the genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Because the problem is data- and compute-intensive, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, the Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation, with the aim of converting them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that the performance of our privacy-preserving genotype imputation method is equivalent to that of state-of-the-art plaintext solutions, achieving a micro area under the curve score of up to 99%, even on real-world large-scale datasets with up to 80,000 targets.
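The pairing of linear models with Paillier encryption described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the key size, the genotype encoding (0/1/2), the integer weights, and all function names below are assumptions made for illustration, and the toy primes are far too small to be secure. The sketch only shows why a linear score is a natural fit: Paillier lets a server add ciphertexts and multiply them by plaintext constants, which is all that a weighted sum needs.

```python
import math
import secrets

# Toy key material (assumption): two known Mersenne primes.
# Real deployments use moduli of 2048 bits or more.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                      # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM mod N (generator g = N + 1)


def encrypt(m: int) -> int:
    """Encrypt an integer under the public key; negatives are stored mod N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key and map the result back to a signed integer."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def encrypted_linear_score(enc_x, weights, bias):
    """Server side: compute Enc(bias + sum(w_i * x_i)) without seeing any x_i.

    Multiplying ciphertexts adds the plaintexts; raising a ciphertext to a
    plaintext power multiplies its plaintext by that constant.
    """
    acc = encrypt(bias)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w % N, N2)) % N2
    return acc


# Hypothetical example: genotypes at four tag variants and integer
# (for instance, fixed-point scaled) model weights for one target variant.
genotypes = [0, 1, 2, 1]
weights = [3, -2, 5, 1]
bias = -4

enc_genotypes = [encrypt(g) for g in genotypes]             # client encrypts
enc_score = encrypted_linear_score(enc_genotypes, weights, bias)  # server computes
score = decrypt(enc_score)                                  # client decrypts

plain_score = bias + sum(w * g for w, g in zip(weights, genotypes))
assert score == plain_score
```

In this sketch the server never decrypts anything; only the holder of `LAM` and `MU` can read the score. A real system would also need to scale real-valued weights to integers and choose a class from several such scores per target, details the abstract does not specify.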

Original language: English (US)
Article number: 9466098
Pages (from-to): 93097-93110
Number of pages: 14
Journal: IEEE Access
Volume: 9
State: Published - 2021

Keywords

  • Genotype imputation
  • machine learning
  • privacy-preserving computation

ASJC Scopus subject areas

  • Computer Science (all)
  • Materials Science (all)
  • Engineering (all)
