Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects

Balreet Grewal, Wentao Lu, Sarah Nadi, Cor Paul Bezemer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT's generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project's code, and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 157-161
Number of pages: 5
ISBN (Electronic): 9798400705878
State: Published - 2024
Event: 21st IEEE/ACM International Conference on Mining Software Repositories, MSR 2024 - Lisbon, Portugal
Duration: Apr 15 2024 → Apr 16 2024

Publication series

Name: Proceedings - 2024 IEEE/ACM 21st International Conference on Mining Software Repositories, MSR 2024

Conference

Conference: 21st IEEE/ACM International Conference on Mining Software Repositories, MSR 2024
Country/Territory: Portugal
City: Lisbon
Period: 4/15/24 → 4/16/24

Keywords

  • Code Reuse
  • LLM
  • SE4AI

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Safety, Risk, Reliability and Quality
