GraphClust: A method for clustering database of graphs

Diego Reforgiato, Rodrigo Gutierrez, Dennis Shasha

Research output: Contribution to journal › Article › peer-review

Abstract

Any application that represents data as sets of graphs may benefit from the discovery of relationships among those graphs. To do this in an unsupervised fashion requires the ability to find graphs that are similar to one another. That is the purpose of GraphClust. The GraphClust algorithm proceeds in three phases, often building on other tools: (1) it finds highly connected substructures in each graph; (2) it uses those substructures to represent each graph as a feature vector; and (3) it clusters these feature vectors using a standard distance measure. We validate the cluster quality by using the Silhouette method. In addition to clustering graphs, GraphClust uses singular value decomposition (SVD) to find frequently co-occurring connected substructures. The main novelty of GraphClust compared to previous methods is that it is application-independent and scalable to many large graphs.
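
The three phases and the Silhouette check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every function name below is hypothetical, phase 1 is replaced by a deliberately simple stand-in (each labelled edge counts as one "substructure") instead of a real search for highly connected substructures, the clustering step is a plain average-linkage agglomerative procedure on Euclidean distance chosen only for brevity, and the SVD step is omitted.

```python
from collections import Counter
from itertools import combinations
from math import dist

def substructures(edges):
    """Phase 1 stand-in (assumption): treat each labelled edge as a substructure."""
    return Counter(tuple(sorted(e)) for e in edges)

def feature_vectors(graphs):
    """Phase 2: one count vector per graph over the shared substructure vocabulary."""
    counts = [substructures(g) for g in graphs]
    vocab = sorted(set().union(*counts))
    return [tuple(c[s] for s in vocab) for c in counts], vocab

def cluster(vectors, k):
    """Phase 3: average-linkage agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the two clusters with the smallest average pairwise distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def silhouette(vectors, labels):
    """Mean silhouette width; assumes at least two clusters."""
    scores = []
    for i, v in enumerate(vectors):
        def mean_dist(label):
            ds = [dist(v, w) for j, w in enumerate(vectors)
                  if labels[j] == label and j != i]
            return sum(ds) / len(ds) if ds else None

        a = mean_dist(labels[i])          # cohesion: mean distance within own cluster
        if a is None:                     # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_dist(l) for l in set(labels) if l != labels[i])  # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy database: each graph is a list of edges between labelled nodes.
graphs = [
    [("C", "C"), ("C", "O"), ("C", "C")],
    [("C", "C"), ("O", "C"), ("C", "C"), ("C", "C")],
    [("N", "H"), ("N", "H"), ("N", "N")],
    [("H", "N"), ("N", "N"), ("N", "H")],
]
vectors, vocab = feature_vectors(graphs)
labels = cluster(vectors, k=2)
score = silhouette(vectors, labels)
print(labels, round(score, 3))
```

On this toy input the two carbon-based graphs and the two nitrogen-based graphs end up in separate clusters, and a silhouette value close to 1 indicates well-separated clusters. A realistic pipeline would swap in a proper substructure finder for phase 1 and could apply SVD to the graph-by-substructure count matrix to look for substructures that co-occur.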

Original language: English (US)
Pages (from-to): 231-241
Number of pages: 11
Journal: Journal of Information and Knowledge Management
Volume: 7
Issue number: 4
State: Published - 2008

Keywords

  • Text clustering
  • document vectors
  • graph clustering
  • graph substructure matching

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Library and Information Sciences
