LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection

Mervat Abassy, Kareem Elozeiri, Alexander Aziz, Minh Ngoc Ta, Raj Vardhan Tomar, Bimarsha Adhikari, Saad El Dine Ahmed, Yuxia Wang, Osama Mohammed Afzal, Zhuohan Xie, Jonibek Mansurov, Ekaterina Artemova, Vladislav Mikhailov, Rui Xing, Jiahui Geng, Hasan Iqbal, Zain Muhammad Mujahid, Tarek Mahmoud, Akim Tsvigun, Alham Fikri AjiArtem Shelmanov, Nizar Habash, Iryna Gurevych, Preslav Nakov

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

The ease of access to large language models (LLMs) has enabled a widespread of machine-generated texts, and now it is often hard to tell whether a piece of text was human-written or machine-generated. This raises concerns about potential misuse, particularly within educational and academic domains. Thus, it is important to develop practical systems that can automate the process. Here, we present one such system, LLM-DetectAIve, designed for fine-grained detection. Unlike most previous work on machine-generated text detection, which focused on binary classification, LLM-DetectAIve supports four categories: (i) human-written, (ii) machine-generated, (iii) machine-written, then machine-humanized, and (iv) human-written, then machine-polished. Category (iii) aims to detect attempts to obfuscate the fact that a text was machine-generated, while category (iv) looks for cases where the LLM was used to polish a human-written text, which is typically acceptable in academic writing, but not in education. Our experiments show that LLM-DetectAIve can effectively identify the above four categories, which makes it a potentially useful tool in education, academia, and other domains. LLM-DetectAIve is publicly accessible at https://github.com/mbzuai-nlp/LLM-DetectAIve. The video describing our system is available at https://youtu.be/E8eT_bE7k8c.

Original languageEnglish (US)
Title of host publicationEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of System Demonstrations
EditorsDelia Irazu Hernandez Farias, Tom Hope, Manling Li
PublisherAssociation for Computational Linguistics (ACL)
Pages336-343
Number of pages8
ISBN (Electronic)9798891761674
DOIs
StatePublished - 2024
Event2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2024 - Hybrid, Miami, United States
Duration: Nov 12 2024Nov 16 2024

Publication series

NameEMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of System Demonstrations

Conference

Conference2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2024
Country/TerritoryUnited States
CityHybrid, Miami
Period11/12/2411/16/24

ASJC Scopus subject areas

  • Linguistics and Language
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Fingerprint

Dive into the research topics of 'LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection'. Together they form a unique fingerprint.

Cite this