TY - GEN
T1 - Open-source code repository attributes predict impact of computer science research
AU - Bhattarai, Prajjwal
AU - Ghassemi, Mohammed
AU - Alhanai, Tuka
N1 - Publisher Copyright: © 2022 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
PY - 2022/6/20
Y1 - 2022/6/20
N2 - With the increased importance of transparency and reproducibility in computer science research, it has become common to publicly release open-source repositories that contain the code, data, and documentation alongside a publication. We study the relationship between transparency of a publication (as represented by the attributes of its open-source repository) and its scientific impact (as represented by paper citations). Using the Mann-Whitney test and Cliff's delta, we observed a statistically significant difference in citations between papers with and without an associated open-source repository. We also observed a statistically significant correlation (p < 0.01) between citations and several repository interaction features: Stars, Forks, Subscribers and Issues. Finally, using time-series features of repository growth (Stars), we trained a classifier to predict whether a paper would be highly cited (top 10%) with a cross-validated AUROC of 0.8 and AUPRC of 0.65. Our results provide evidence that those who make sustained efforts to make their work transparent also tend to have a higher scientific impact.
KW - Academic transparency
KW - Citations
KW - Open-source repositories
KW - Reproducibility
KW - Scientific impact
KW - Time-series analysis
UR - http://www.scopus.com/inward/record.url?scp=85133258155&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133258155&partnerID=8YFLogxK
U2 - 10.1145/3529372.3530927
DO - 10.1145/3529372.3530927
M3 - Conference contribution
AN - SCOPUS:85133258155
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
BT - JCDL 2022 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd ACM/IEEE Joint Conference on Digital Libraries, JCDL 2022
Y2 - 20 June 2022 through 24 June 2022
ER -