How to Borrow Information From Unlinked Data? A Relative Density Approach for Predicting Unobserved Distributions

Siwei Cheng

    Research output: Contribution to journal › Article › peer-review

    Abstract

    One of the most important developments in the current era of social sciences is the growing availability and diversity of data, big and small. Social scientists increasingly combine information from multiple data sets in their research. While conducting statistical analyses with linked data is relatively straightforward, borrowing information across unlinked data can be much more challenging because there are no unit-to-unit linkages. This article proposes a new methodological approach for borrowing information across unlinked surveys to predict unobserved distributions. The core idea is to use the relative density between the observed and unobserved distributions in the reference data to characterize how the two distributions differ, and then carry that information over to the base data. Relying on the assumption that the relative density between the observed and unobserved distributions is similar across data sets, the proposed relative density approach has the key advantage of allowing the researcher to borrow information about the shape of the distribution, rather than only a few summary statistics. The approach also comes with a method for incorporating and quantifying the uncertainty in its output. We present the formulation of this approach, demonstrate it with simulation examples, and finally apply it to the problem of employment selection in wage inequality research.
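    The borrowing step described in the abstract can be illustrated with a small sketch. It follows the standard relative-distribution construction rather than any code from the article. Suppose that in the reference data F is the observed distribution and G is the unobserved one. The relative ranks p = F(y) of the unobserved values then have density r(p) = g(F^{-1}(p)) / f(F^{-1}(p)). Assuming the same r holds in the base data, the predicted unobserved distribution is obtained by drawing p from r and mapping it through the base data's observed quantile function. The function names `relative_density` and `predict_unobserved`, the histogram estimator, the bin count, and the simulated numbers are all illustrative assumptions. The sketch also leaves out the article's uncertainty quantification.

    ```python
    import numpy as np


    def relative_density(ref_observed, ref_unobserved, n_bins=20):
        """Histogram estimate (an illustrative choice) of the relative density of the
        unobserved distribution G with respect to the observed distribution F, both
        taken from the reference data. Returns bin edges on [0, 1] and bin densities."""
        ref_observed = np.sort(np.asarray(ref_observed, dtype=float))
        # Relative ranks p = F(y): share of observed values at or below each
        # unobserved value. If F and G were identical, these would be ~uniform.
        ranks = np.searchsorted(ref_observed, ref_unobserved, side="right") / ref_observed.size
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        density, _ = np.histogram(ranks, bins=edges, density=True)
        return edges, density


    def predict_unobserved(base_observed, edges, density, n_draws=10_000, seed=0):
        """Draw from the predicted unobserved distribution in the base data,
        ASSUMING the reference-data relative density also holds there."""
        rng = np.random.default_rng(seed)
        # Probability mass of each relative-rank bin.
        mass = density * np.diff(edges)
        mass = mass / mass.sum()
        # Sample relative ranks from the piecewise-constant relative density.
        bins = rng.choice(mass.size, size=n_draws, p=mass)
        ranks = rng.uniform(edges[bins], edges[bins + 1])
        # Map ranks to the outcome scale with the base data's observed
        # quantile function: y = F_base^{-1}(p).
        return np.quantile(np.asarray(base_observed, dtype=float), ranks)


    if __name__ == "__main__":
        # Simulated illustration with made-up numbers, not data from the article.
        rng = np.random.default_rng(1)
        ref_obs = rng.normal(3.0, 1.0, 5_000)    # reference data, observed group
        ref_unobs = rng.normal(2.5, 1.0, 5_000)  # reference data, normally unobserved group
        base_obs = rng.normal(3.5, 1.0, 5_000)   # base data, observed group only

        edges, density = relative_density(ref_obs, ref_unobs)
        predicted = predict_unobserved(base_obs, edges, density)
        print(f"predicted mean of unobserved base distribution: {predicted.mean():.2f}")
    ```

    In this simulation the unobserved reference group sits half a standard deviation below the observed one. The predicted base-data distribution should therefore land near a mean of 3.0, up to sampling and binning error. The histogram estimator keeps the sketch short; a smoother estimate of r, such as a kernel-based one, would be a natural substitute.
    
    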

    Original language: English (US)
    Journal: Sociological Methods and Research
    State: Accepted/In press, 2020

    Keywords

    • borrowing strength
    • relative distribution methods
    • sample selection bias
    • wage inequality

    ASJC Scopus subject areas

    • Social Sciences (miscellaneous)
    • Sociology and Political Science

