Be Sure to Use the Same Writing Style: Applying Authorship Verification on Large-Language-Model-Generated Texts

Janith Weerasinghe, Ovendra Seepersaud, Genesis Smothers, Julia Jose, Rachel Greenstadt

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Recently, there have been significant advances in, and wide-scale use of, generative AI for natural language generation. Models such as OpenAI’s GPT3 and Meta’s LLaMA are widely used in chatbots, for summarizing documents, and for generating creative content. These advances raise concerns about abuse of these models, especially in social media settings, such as large-scale generation of disinformation, manipulation campaigns that use AI-generated content, and personalized scams. We used stylometry (the analysis of style in natural language text) to analyze the style of AI-generated text. Specifically, we applied an existing authorship verification (AV) model, which predicts whether two documents were written by the same author, to texts generated by GPT2, GPT3, ChatGPT, and LLaMA. Our AV model was trained only on human-written text and has been used effectively in social media settings to analyze cases of abuse. We generated texts by providing the language models with fanfiction snippets and prompting them to complete the rest of each story in the same writing style as the original snippet. We then applied the AV model across the model-generated and human-written texts to measure how similar their writing styles were. We found that texts generated by GPT2 had the highest similarity to the human texts. Texts generated by GPT3 and ChatGPT were very different from the human-written snippets and were similar to each other. LLaMA-generated texts had some similarity to the original snippets but also had similarities with other LLaMA-generated texts and with texts from other models. We then conducted a feature analysis to identify the features that drive these similarity scores. This analysis helped us answer questions such as which features distinguish the language style of language models from that of humans, which features differ across models, and how these linguistic features change across language model versions.
The dataset and the source code used in this analysis have been made public to allow for further analysis of new language models.
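The cross-text comparison described in the abstract can be pictured as a source-by-source similarity matrix. The sketch below is an illustrative assumption, not the authors' AV model or their released code: it substitutes a simple character 3-gram cosine similarity, a classic stylometric baseline, for the trained AV model. The function names (`char_ngrams`, `cosine`, `style_similarity_matrix`) and the source labels are hypothetical. It only shows the shape of the analysis: score every pair of texts, then average those scores for each pair of sources (for example, human vs. GPT2).

```python
import math
from collections import Counter
from itertools import combinations, product


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping, lowercased character n-grams in a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty.

    Hypothetical stand-in for an AV model's same-author score (assumption).
    """
    dot = sum(count * b[gram] for gram, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity_matrix(groups: dict[str, list[str]], n: int = 3) -> dict[tuple[str, str], float]:
    """Mean pairwise style similarity between every pair of text sources.

    Within one source, each text is compared with every *other* text (no
    self-pairs); across sources, every text of one source is compared with
    every text of the other. A source with a single text gets NaN on the
    diagonal because it has no within-source pairs.
    """
    profiles = {name: [char_ngrams(t, n) for t in texts] for name, texts in groups.items()}
    matrix = {}
    for src, dst in product(profiles, repeat=2):
        if src == dst:
            pairs = combinations(profiles[src], 2)
        else:
            pairs = product(profiles[src], profiles[dst])
        scores = [cosine(a, b) for a, b in pairs]
        matrix[(src, dst)] = sum(scores) / len(scores) if scores else float("nan")
    return matrix


if __name__ == "__main__":
    # Hypothetical toy texts; a real run would load human snippets and model completions.
    groups = {
        "human": ["She ran through the rain, laughing.", "He never did like the rain much."],
        "model_x": ["The rain fell softly as she ran.", "Rain poured down on the quiet town."],
    }
    for (src, dst), score in sorted(style_similarity_matrix(groups).items()):
        print(f"{src:>8} vs {dst:<8} {score:.3f}")
```

Averaging over all text pairs for each source pair is only one simple aggregation choice; a trained AV model would replace `cosine` with a learned same-author score, and comparing each completion only against its own original snippet would be an equally plausible design.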

    Original language: English (US)
    Article number: 2467
    Journal: Applied Sciences (Switzerland)
    Volume: 15
    Issue number: 5
    DOIs
    State: Published - Mar 2025

    Keywords

    • authorship verification
    • language style
    • large language models
    • natural language processing
    • stylometry

    ASJC Scopus subject areas

    • General Materials Science
    • Instrumentation
    • General Engineering
    • Process Chemistry and Technology
    • Computer Science Applications
    • Fluid Flow and Transfer Processes
