Evaluating the evaluations of code recommender systems: A reality check

Sebastian Proksch, Sven Amann, Sarah Nadi, Mira Mezini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

While researchers develop many new exciting code recommender systems, such as method-call completion, code-snippet completion, or code search, an accurate evaluation of such systems is always a challenge. We analyzed the current literature and found that most of the current evaluations rely on artificial queries extracted from released code, which begs the question: Do such evaluations reflect real-life usages? To answer this question, we capture 6,189 fine-grained development histories from real IDE interactions. We use them as a ground truth and extract 7,157 real queries for a specific method-call recommender system. We compare the results of such real queries with different artificial evaluation strategies and check several assumptions that are repeatedly used in research, but never empirically evaluated. We find that an evolving context that is often observed in practice has a major effect on the prediction quality of recommender systems, but is not commonly reflected in artificial evaluations.
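To illustrate the contrast the abstract draws, the sketch below shows the typical "artificial query" evaluation: a call is hidden from finished code and the recommender must predict it from the remaining context. This is a hypothetical illustration under assumed names (`evaluate_artificial`, `toy_recommend`), not the authors' implementation; the paper's real queries instead replay incomplete code states captured during development.

```python
# Illustrative sketch of artificial-query evaluation (hypothetical names,
# not the paper's tooling): hide the last call of each finished method
# body and check whether the recommender ranks it in the top k.
from typing import Callable, List, Sequence


def evaluate_artificial(method_bodies: List[List[str]],
                        recommend: Callable[[Sequence[str]], List[str]],
                        k: int = 3) -> float:
    """Top-k accuracy when the last call of each finished method is hidden."""
    if not method_bodies:
        return 0.0
    hits = 0
    for calls in method_bodies:
        context, expected = calls[:-1], calls[-1]  # the artificial query
        if expected in recommend(context)[:k]:
            hits += 1
    return hits / len(method_bodies)


# Toy recommender for demonstration: ignores context and always
# suggests the same fixed ranking (purely for illustration).
def toy_recommend(context: Sequence[str]) -> List[str]:
    return ["close", "flush", "write"]


score = evaluate_artificial([["open", "write", "close"],
                             ["open", "read", "parse"]], toy_recommend)
print(score)  # "close" is hit, "parse" is missed: 0.5
```

The paper's point is that this setup always queries with a mature, fully evolved context, whereas real in-IDE queries arrive while the surrounding code is still changing.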

Original language: English (US)
Title of host publication: ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering
Editors: Sarfraz Khurshid, David Lo, Sven Apel
Publisher: Association for Computing Machinery, Inc
Pages: 111-121
Number of pages: 11
ISBN (Electronic): 9781450338455
DOIs
State: Published - Aug 25 2016
Event: 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016 - Singapore, Singapore
Duration: Sep 3 2016 - Sep 7 2016

Publication series

Name: ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering

Conference

Conference: 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016
Country/Territory: Singapore
City: Singapore
Period: 9/3/16 - 9/7/16

Keywords

  • Artificial Evaluation
  • Empirical Study
  • IDE Interaction Data

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Human-Computer Interaction
