LARGE-SCALE INFERENCE OF MULTIVARIATE REGRESSION FOR HEAVY-TAILED AND ASYMMETRIC DATA

Youngseok Song, Wen Zhou, Wen Xin Zhou

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Large-scale multivariate regression is a fundamental statistical tool with a wide range of applications. This study considers the problem of simultaneously testing a large number of general linear hypotheses, encompassing covariate-effect analysis, analysis of variance, and model comparisons. The challenge that accompanies a large number of tests is the ubiquitous presence of heavy-tailed and/or highly skewed measurement noise, which is the main reason for the failure of conventional least squares-based methods. For large-scale multivariate regression, we develop a set of robust inference methods to explore data features such as heavy tailedness and skewness, which are not visible to least squares methods. The new testing procedure is based on the data-adaptive Huber regression and a new covariance estimator of regression estimates. Under mild conditions, we show that our methods produce consistent estimates of the false discovery proportion. Extensive numerical experiments and an empirical study on quantitative linguistics demonstrate the advantage of the proposed method over many state-of-the-art methods when the data are generated from heavy-tailed and/or skewed distributions.
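As a rough illustration of why a Huber-type loss helps when the noise is heavy-tailed, the sketch below fits a single linear regression by least squares and by Huber regression on data with a few gross outliers. It is not the authors' procedure: the fixed threshold `tau`, the iteratively reweighted least squares solver, and the function names `huber_loss` and `huber_regression` are assumptions made for this example, and the paper's data-adaptive tuning, covariance estimator, and false discovery proportion estimate are not reproduced here.

```python
import numpy as np

def huber_loss(r, tau):
    # Quadratic for |r| <= tau, linear beyond it, so large residuals
    # are penalized less heavily than under squared error.
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= tau, 0.5 * r**2, tau * np.abs(r) - 0.5 * tau**2)

def huber_regression(X, y, tau, n_iter=100):
    # Illustrative solver: iteratively reweighted least squares,
    # started from the ordinary least squares fit.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        # Weight 1 for small residuals, tau/|r| for large ones.
        w = np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Toy data (hypothetical): y = 1 + 2x, with five responses shifted by +50.
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[-5:] += 50.0

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_huber = huber_regression(X, y, tau=1.0)

print("OLS   intercept, slope:", beta_ols)
print("Huber intercept, slope:", beta_huber)
```

On this toy data the Huber slope stays much closer to the true value of 2 than the least squares slope does, because each outlier's influence on the fit is capped at `tau`.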

Original language: English (US)
Pages (from-to): 1831-1852
Number of pages: 22
Journal: Statistica Sinica
Volume: 33
Issue number: 3
State: Published, Jul 2023

Keywords

  • General linear hypotheses
  • heavy-tailed and/or skewed regression errors
  • Huber loss
  • large-scale multiple testing
  • multivariate regression
  • quantitative linguistics

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
