A new approach to the estimation of inter-variable correlation

Marc Sobel, Bud Mishra

Research output: Contribution to journal › Article › peer-review


The use of different measures of similarity between observed vectors for the purposes of classifying or clustering them has been expanding dramatically in recent years. One result of this expansion has been the use of many new similarity measures, designed for the purpose of satisfying various criteria. A noteworthy application involves estimating the relationships between genes using microarray experimental data. We consider the class of 'correlation-type' similarity measures. The use of these new measures of similarity suggests that the whole problem needs to be formulated in statistical terms to clarify their relative benefits. Pursuant to this need, we define, for each given observed vector, a baseline representing the 'true' value common to each of the component observations. These 'true' values are taken to be parameters. We define the 'true correlation' between each pair of observed vectors as the average (over the distribution of the observations for given baseline parameters) of Pearson's correlation with sample means replaced by the corresponding baseline parameters. Estimators of this true correlation are assessed using their mean squared error (MSE). Proper Bayes estimators of this true correlation, being based on the predictive posterior distribution of the data, are both difficult to calculate and analyze and highly non-robust. By contrast, empirical Bayes estimators are: (i) close to their Bayesian counterparts; (ii) easy to analyze; and (iii) strongly robust. For these reasons, we employ empirical Bayes estimators of correlation in place of their Bayesian counterparts. We show how to construct two different kinds of simultaneous Bayes correlation estimators: the first assumes no a priori correlation between baseline parameters; the second assumes a common unknown correlation between them. Estimators of the latter type frequently have significantly smaller MSE than those of the former type which, in turn, frequently have significantly smaller MSE than their Pearson estimator counterparts. To illustrate our results, we examine the problem of inferring the relationships between gene expression level vectors from microarray experimental data.
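
To make the central definition concrete, the following is a minimal sketch of a Pearson-type correlation in which the sample means are replaced by supplied baseline values. It is not the authors' estimator: the abstract does not give the exact formula, the averaging over the data distribution is omitted, and the function names `baseline_correlation` and `pearson` and the example baselines are assumptions made for illustration only.

```python
import math


def baseline_correlation(x, y, base_x, base_y):
    """Pearson-type correlation centered at given baselines.

    Hypothetical helper: deviations are taken from base_x and base_y
    rather than from the sample means of x and y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    dx = [xi - base_x for xi in x]
    dy = [yi - base_y for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0.0:
        raise ValueError("all deviations from a baseline are zero")
    return numerator / denominator


def pearson(x, y):
    """Ordinary Pearson correlation: the baselines are the sample means."""
    return baseline_correlation(x, y, sum(x) / len(x), sum(y) / len(y))


# Illustrative data only (not from the paper).
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
r_pearson = pearson(x, y)                          # centered at sample means
r_baseline = baseline_correlation(x, y, 0.0, 0.0)  # assumed baselines of 0
```

With these illustrative vectors the ordinary Pearson correlation is -1, while centering at assumed baselines of 0 gives 10/14, roughly 0.71. The choice of centering value can therefore change both the size and the sign of the measured similarity, which is why the abstract treats the baselines as parameters to be estimated.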

Original language: English (US)
Pages (from-to): 2315-2330
Number of pages: 16
Journal: Communications in Statistics - Theory and Methods
Issue number: 15
State: Published - Sep 2008


  • Admissibility
  • Bayes estimation
  • Bioinformatics
  • Correlation
  • Empirical Bayes

ASJC Scopus subject areas

  • Statistics and Probability

