Open-source code repository attributes predict impact of computer science research

Prajjwal Bhattarai, Mohammed Ghassemi, Tuka Alhanai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the growing importance of transparency and reproducibility in computer science research, it has become common to publicly release open-source repositories that contain the code, data, and documentation alongside a publication. We study the relationship between the transparency of a publication (as represented by the attributes of its open-source repository) and its scientific impact (as represented by paper citations). Using the Mann-Whitney test and Cliff's delta, we observed a statistically significant difference in citations between papers with and without an associated open-source repository. We also observed a statistically significant correlation (p < 0.01) between citations and several repository interaction features: Stars, Forks, Subscribers, and Issues. Finally, using time-series features of repository growth (Stars), we trained a classifier to predict whether a paper would be highly cited (top 10%), achieving a cross-validated AUROC of 0.8 and AUPRC of 0.65. Our results provide evidence that researchers who make sustained efforts to make their work transparent also tend to have a higher scientific impact.

Original language: English (US)
Title of host publication: JCDL 2022 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781450393454
State: Published - Jun 20 2022
Event: 22nd ACM/IEEE Joint Conference on Digital Libraries, JCDL 2022 - Virtual, Online, Germany
Duration: Jun 20 2022 → Jun 24 2022

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Conference

Conference: 22nd ACM/IEEE Joint Conference on Digital Libraries, JCDL 2022
Country/Territory: Germany
City: Virtual, Online
Period: 6/20/22 → 6/24/22

Keywords

  • Academic transparency
  • Citations
  • Open-source repositories
  • Reproducibility
  • Scientific impact
  • Time-series analysis

ASJC Scopus subject areas

  • General Engineering
