A conditional gradient approach for nonparametric estimation of mixing distributions

Srikanth Jagabathula, Lakshminarayanan Subramanian, Ashwin Venkataraman

Research output: Contribution to journal › Article › peer-review


Mixture models are versatile tools that are used extensively in many fields, including operations, marketing, and econometrics. The main challenge in estimating mixture models is that the mixing distribution is often unknown, and imposing a priori parametric assumptions can lead to model misspecification issues. In this paper, we propose a new methodology for nonparametric estimation of the mixing distribution of a mixture of logit models. We formulate the likelihood-based estimation problem as a constrained convex program and apply the conditional gradient (also known as Frank-Wolfe) algorithm to solve this convex program. We show that our method iteratively generates the support of the mixing distribution and the mixing proportions. Theoretically, we establish the sublinear convergence rate of our estimator and characterize the structure of the recovered mixing distribution. Empirically, we test our approach on real-world datasets. We show that it outperforms the standard expectation-maximization (EM) benchmark on speed (16 times faster), in-sample fit (up to 24% reduction in the log-likelihood loss), and predictive (average 28% reduction in standard error metrics) and decision accuracies (extracts around 23% more revenue). On synthetic data, we show that our estimator is robust to different ground-truth mixing distributions and can also account for endogeneity.
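The conditional gradient idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a finite, user-supplied grid of candidate logit parameter vectors (`candidates`), whereas the paper generates the support iteratively, and it uses a crude grid line search for the step size. The function names `logit_choice_probs` and `frank_wolfe_mixture` are hypothetical.

```python
import numpy as np

def logit_choice_probs(features, beta):
    """Multinomial logit probabilities for a single parameter vector.

    features: array of shape (n_obs, n_items, n_feats)
    beta:     array of shape (n_feats,)
    returns:  array of shape (n_obs, n_items)
    """
    utilities = features @ beta
    utilities -= utilities.max(axis=1, keepdims=True)  # numerical stability
    exp_u = np.exp(utilities)
    return exp_u / exp_u.sum(axis=1, keepdims=True)

def frank_wolfe_mixture(features, choices, candidates, n_iters=50):
    """Estimate mixing weights over a fixed candidate grid (an assumption).

    Minimizes the average negative log-likelihood over the probability
    simplex using Frank-Wolfe steps toward one candidate type at a time.
    """
    n_obs = len(choices)
    rows = np.arange(n_obs)
    # lik[i, j]: likelihood of observation i under candidate type j
    lik = np.stack(
        [logit_choice_probs(features, b)[rows, choices] for b in candidates],
        axis=1,
    )

    def loss(mix):
        return -np.mean(np.log(mix))

    # Start from the single candidate type with the best fit.
    start = int(np.argmin([loss(lik[:, j]) for j in range(lik.shape[1])]))
    weights = np.zeros(lik.shape[1])
    weights[start] = 1.0
    mix = lik[:, start].copy()  # current mixture likelihood per observation

    step_grid = np.linspace(0.0, 1.0, 101)
    for _ in range(n_iters):
        # Gradient of the loss with respect to each mixing weight.
        grad = -np.mean(lik / mix[:, None], axis=0)
        j = int(np.argmin(grad))  # linear minimization over the simplex
        # Crude line search along the segment toward vertex j.
        losses = [loss((1 - g) * mix + g * lik[:, j]) for g in step_grid]
        g = step_grid[int(np.argmin(losses))]
        if g == 0.0:
            break  # no step on the grid improves the fit
        weights *= 1 - g
        weights[j] += g
        mix = (1 - g) * mix + g * lik[:, j]
    return weights, loss(mix)
```

Each iteration adds mass to at most one candidate type, which mirrors the abstract's description of a method that builds up the support and the mixing proportions step by step. Because the line search grid includes a step of zero, the loss in this sketch never increases between iterations.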

Original language: English (US)
Pages (from-to): 3635-3656
Number of pages: 22
Journal: Management Science
Issue number: 8
State: Published - Aug 2020


Keywords

  • Consideration sets
  • Convex optimization
  • Mixture of logit
  • Nonparametric estimation

ASJC Scopus subject areas

  • Strategy and Management
  • Management Science and Operations Research

