TY - GEN
T1 - A large-scale study about quality and reproducibility of Jupyter notebooks
AU - Pimentel, Joao Felipe
AU - Murta, Leonardo
AU - Braganholo, Vanessa
AU - Freire, Juliana
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.
AB - Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.
KW - GitHub
KW - Jupyter notebook
KW - Reproducibility
UR - http://www.scopus.com/inward/record.url?scp=85072330312&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072330312&partnerID=8YFLogxK
U2 - 10.1109/MSR.2019.00077
DO - 10.1109/MSR.2019.00077
M3 - Conference contribution
AN - SCOPUS:85072330312
T3 - IEEE International Working Conference on Mining Software Repositories
SP - 507
EP - 517
BT - Proceedings - 2019 IEEE/ACM 16th International Conference on Mining Software Repositories, MSR 2019
PB - IEEE Computer Society
T2 - 16th IEEE/ACM International Conference on Mining Software Repositories, MSR 2019
Y2 - 26 May 2019 through 27 May 2019
ER -