How Many Communities Are There?

D. Franco Saldaña, Yi Yu, Yang Feng

Research output: Contribution to journal › Article › peer-review

Abstract

Stochastic blockmodels and their variants are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models, and it raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the assumption that different edges are conditionally independent given the communities of their endpoints is usually violated in practice. We therefore propose composite likelihood BIC (CL-BIC) to select the number of communities, and we show that it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.
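
To make the model selection problem concrete, the following is a minimal sketch of the ordinary BIC baseline that the abstract contrasts with CL-BIC. It is not the authors' code and does not implement CL-BIC. It assumes a plain Bernoulli stochastic blockmodel with conditionally independent edges, treats the community labels as known rather than estimated, counts only the K(K+1)/2 block probabilities as parameters, and uses the number of node pairs as the sample size. The function names `sample_sbm` and `sbm_bic`, and all numeric settings, are hypothetical choices made for illustration.

```python
import numpy as np


def sample_sbm(labels, block_probs, rng):
    """Draw a symmetric 0/1 adjacency matrix from a Bernoulli SBM."""
    n = len(labels)
    # Edge probability for every node pair, looked up by community labels.
    probs = block_probs[np.ix_(labels, labels)]
    # Sample the strict upper triangle, then mirror it (no self-loops).
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)


def sbm_bic(adj, labels):
    """Ordinary BIC for a Bernoulli SBM with the given labels (assumed known)."""
    labels = np.asarray(labels)
    k = int(labels.max()) + 1
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)
    edges = adj[rows, cols]
    # Unordered pair of community labels for each node pair.
    lo = np.minimum(labels[rows], labels[cols])
    hi = np.maximum(labels[rows], labels[cols])

    loglik = 0.0
    for r in range(k):
        for s in range(r, k):
            mask = (lo == r) & (hi == s)
            n_pairs_rs = int(mask.sum())
            if n_pairs_rs == 0:
                continue
            n_edges_rs = int(edges[mask].sum())
            p_hat = n_edges_rs / n_pairs_rs  # plug-in block probability
            # Blocks with p_hat in {0, 1} contribute zero log-likelihood.
            if 0.0 < p_hat < 1.0:
                loglik += n_edges_rs * np.log(p_hat)
                loglik += (n_pairs_rs - n_edges_rs) * np.log(1.0 - p_hat)

    n_pairs = n * (n - 1) // 2   # assumed sample size: number of node pairs
    n_params = k * (k + 1) // 2  # assumed parameter count: block probabilities
    return -2.0 * loglik + n_params * np.log(n_pairs)


rng = np.random.default_rng(0)
true_labels = np.repeat([0, 1], 30)  # two communities of 30 nodes each
block_probs = np.array([[0.50, 0.05],
                        [0.05, 0.50]])
adj = sample_sbm(true_labels, block_probs, rng)

bic_one = sbm_bic(adj, np.zeros(60, dtype=int))  # single-community fit
bic_two = sbm_bic(adj, true_labels)              # two-community fit
print("BIC, K=1:", bic_one)
print("BIC, K=2:", bic_two)
```

Under these simulated settings the two-community fit should attain the lower BIC. The log-likelihood in this sketch is a sum over node pairs, which is exactly the conditional independence assumption that the abstract says is usually violated in practice.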

Original language: English (US)
Pages (from-to): 171-181
Number of pages: 11
Journal: Journal of Computational and Graphical Statistics
Volume: 26
Issue number: 1
State: Published - Jan 2 2017

Keywords

  • Community detection
  • Composite likelihood
  • Degree-corrected stochastic blockmodel
  • Model selection
  • Spectral clustering
  • Stochastic blockmodel

ASJC Scopus subject areas

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty
