Evaluating the Effectiveness of LLMs in Fixing Maintainability Issues in Real-World Projects

Henrique Nunes, Eduardo Figueiredo, Larissa Rocha, Sarah Nadi, Fischer Ferreira, Geanderson Esteves

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Large Language Models (LLMs) have gained attention for addressing coding problems, but their effectiveness in fixing code maintainability issues remains unclear. This study evaluates LLMs' capability to resolve 127 maintainability issues from 10 GitHub repositories. We use zero-shot prompting for Copilot Chat and Llama 3.1, and few-shot prompting for Llama only. The LLM-generated solutions are assessed for compilation errors, test failures, and new maintainability problems. Llama with few-shot prompting successfully fixed 44.9% of the methods, while Copilot Chat and Llama with zero-shot prompting fixed 32.29% and 30%, respectively. However, most solutions introduced errors or new maintainability issues. We also conducted a human study with 45 participants to evaluate the readability of 51 LLM-generated solutions. The human study showed that 68.63% of participants observed improved readability. Overall, while LLMs show potential for fixing maintainability issues, the errors they introduce highlight their current limitations.
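The abstract contrasts zero-shot prompting (an instruction with no worked examples) with few-shot prompting (the same instruction preceded by example fixes). The sketch below illustrates only that general distinction; the instruction wording, the `Example` type, and the `build_prompt` helper are hypothetical and are not the prompts used in the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """Hypothetical demonstration: a method with an issue and its fixed version."""
    issue: str
    before: str
    after: str


# Hypothetical instruction text; the study's actual prompt wording is not given here.
INSTRUCTION = "Fix the following maintainability issue without changing the method's behavior."


def build_prompt(issue: str, method: str, examples: Optional[List[Example]] = None) -> str:
    """Return a zero-shot prompt when no examples are given, otherwise a few-shot prompt."""
    parts = [INSTRUCTION]
    # Few-shot: prepend each demonstration (issue, original code, fixed code).
    for ex in examples or []:
        parts.append(f"Issue: {ex.issue}\nCode:\n{ex.before}\nFixed code:\n{ex.after}")
    # The final query is identical in both modes; the model completes "Fixed code:".
    parts.append(f"Issue: {issue}\nCode:\n{method}\nFixed code:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = Example(
        issue="Unused local variable",
        before="int f() { int x = 1; return 2; }",
        after="int f() { return 2; }",
    )
    target = "int g(int a) { int unused = a; return a + 1; }"
    print(build_prompt("Unused local variable", target))          # zero-shot
    print("-" * 40)
    print(build_prompt("Unused local variable", target, [demo]))  # few-shot
```

Because the few-shot variant only prepends demonstrations and leaves the final query unchanged, any difference in output between the two modes can be attributed to the examples rather than to a change in the task description.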

Original language: English (US)
Title of host publication: Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 669-680
Number of pages: 12
ISBN (Electronic): 9798331535100
State: Published - 2025
Event: 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025 - Montreal, Canada
Duration: Mar 4 2025 to Mar 7 2025

Publication series

Name: Proceedings - 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025

Conference

Conference: 32nd IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2025
Country/Territory: Canada
City: Montreal
Period: 3/4/25 to 3/7/25

Keywords

  • large language models
  • maintainability
  • refactoring

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Safety, Risk, Reliability and Quality
