Here are common words in AI-generated content

Detecting AI-generated text has long been a challenge for researchers and developers. With the rapid advancement of large language models (LLMs), such as Google’s Gemini Advanced and OpenAI’s GPT-4o, the ability to produce human-like text has become increasingly sophisticated.

However, a new study from researchers at the University of Tübingen and Northwestern University offers a breakthrough in identifying AI-crafted content.

By focusing on sudden surges in specific vocabulary in scientific writing, they have developed a method to estimate how widely LLMs are being used. This technique, inspired by pandemic studies that measured excess deaths, reveals how changes in word usage can signal the presence of AI-generated text.


What are the words that give AI content away?

To measure these changes, the team tracked how often each word appeared in each year’s abstracts. They compared the expected frequency, based on pre-2023 trends, with actual usage in 2023 and 2024, and found dramatic increases for certain terms. For example, “delves” appeared about 25 times as often in 2024 abstracts as expected. Similarly, “showcasing” and “underscores” appeared about nine times as often.
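To make the counting step concrete, here is a minimal Python sketch that turns a corpus into a yearly frequency for one word. Frequency is taken here as the share of abstracts that contain the word at least once. The function name, the (year, text) input format and the exact-word matching are assumptions for illustration, not the researchers’ code. In particular, matching “delves” this way will not count “delve” or “delving”.

```python
import re
from collections import defaultdict


def yearly_frequency(abstracts, word):
    """Return {year: share of that year's abstracts containing `word`}.

    Hypothetical helper. `abstracts` is an iterable of (year, text) pairs.
    Each abstract counts at most once; matching is whole-word, any case.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    totals = defaultdict(int)  # number of abstracts per year
    hits = defaultdict(int)    # abstracts per year that contain the word
    for year, text in abstracts:
        totals[year] += 1
        if pattern.search(text):
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Toy corpus of made-up sentences (not PubMed data)
corpus = [
    (2022, "We measured enzyme activity in mice."),
    (2022, "This review delves into prior trials."),
    (2022, "Results were consistent across sites."),
    (2022, "We report a new assay."),
    (2024, "This study delves into gut microbes."),
    (2024, "The paper delves into sleep patterns."),
    (2024, "Our analysis delves into costs."),
    (2024, "We describe a cohort study."),
]

print(yearly_frequency(corpus, "delves"))  # → {2022: 0.25, 2024: 0.75}
```

Counting documents rather than raw occurrences keeps a single verbose abstract from inflating a word’s frequency.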

Here are the marker words whose use rose most sharply in 2024 abstracts, with the reported size of each increase. For the first three, the figure is the ratio of observed to expected frequency. For the next three, it is the gap between observed and expected frequency, in percentage points:

Delves – 25 times the expected frequency
Showcasing – 9 times the expected frequency
Underscores – 9 times the expected frequency
Potential – 4.1 percentage points above expected
Findings – 2.7 percentage points above expected
Crucial – 2.6 percentage points above expected

Other marker words that rose significantly (exact rates not specified): across, additionally, comprehensive, enhancing, exhibited, insights, notably, particularly, within.
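The list mixes two measures, and a short sketch shows how both come from the same comparison. The ratio (observed divided by expected) produces figures like “25 times”. The gap (observed minus expected) produces figures in percentage points. The expected frequency below is a straight-line extrapolation from two pre-LLM years. That choice, the function names and the toy numbers are all assumptions standing in for the study’s “pre-2023 trends”, not its exact procedure.

```python
def expected_frequency(freq_by_year, target_year, base_years=(2021, 2022)):
    """Hypothetical helper: extrapolate a word's frequency to `target_year`
    along the line through two pre-LLM years (an assumed stand-in for
    "pre-2023 trends")."""
    y0, y1 = base_years
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    # A frequency cannot be negative, so clip a falling trend at zero
    return max(freq_by_year[y1] + slope * (target_year - y1), 0.0)


def excess_usage(freq_by_year, target_year, base_years=(2021, 2022)):
    """Return (ratio, gap) of observed versus expected frequency."""
    observed = freq_by_year[target_year]
    expected = expected_frequency(freq_by_year, target_year, base_years)
    ratio = observed / expected if expected > 0 else float("inf")
    gap = observed - expected  # multiply by 100 for percentage points
    return ratio, gap


# Made-up frequencies (share of abstracts containing each word)
rare = {2021: 0.0004, 2022: 0.0004, 2024: 0.01}  # a "delves"-like word
common = {2021: 0.20, 2022: 0.20, 2024: 0.24}    # a "potential"-like word

for name, freqs in [("rare", rare), ("common", common)]:
    ratio, gap = excess_usage(freqs, 2024)
    print(f"{name}: {ratio:.1f}x expected, +{gap * 100:.1f} percentage points")
```

With these toy numbers, the rare word shows a 25-fold ratio but a gap of only about one percentage point. The common word barely moves in ratio terms (about 1.2x) yet shows a gap of about four points. This illustrates why a rare word like “delves” stands out on the ratio, while a common word like “potential” stands out on the gap.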

These words have become telltale signs of AI involvement, showing up far more frequently than expected. While language evolves naturally, such abrupt changes are unusual and often tied to major global events.

In this case, the widespread use of LLMs has led to a noticeable shift in the vocabulary of scientific literature.

Inspiration from pandemic analysis

The researchers’ approach draws heavily from techniques used during the COVID-19 pandemic. Just as excess deaths were calculated by comparing observed fatalities to historical data, this study compares current word usage against historical trends to identify anomalies. They analyzed over 14 million scientific abstracts published on PubMed from 2010 to 2024, identifying a significant uptick in certain words starting in late 2022, coinciding with the broader adoption of LLMs.

The researchers noted that the rise in specific words, termed “marker words,” is a clear indicator of LLM usage. This phenomenon differs from past vocabulary shifts linked to events like the COVID-19 pandemic, which saw an increase in noun-heavy language.

In contrast, the post-LLM period has seen a spike in verbs, adjectives, and adverbs. This shift highlights how AI-generated text subtly changes the texture and style of writing.

By identifying these marker words, the researchers estimate that at least 10% of scientific abstracts published in 2024 were either generated or significantly assisted by LLMs. This estimate is likely conservative, as not all AI-assisted texts will contain these specific markers. Nonetheless, the excess use of these words provides a useful lower-bound measure of AI influence across academic writing as a whole, even though no single word proves that a particular abstract was written with AI.
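The reasoning behind a lower bound can be spelled out. Suppose a share s of abstracts is LLM-processed. Other abstracts contain a marker word at the expected rate q, and LLM-processed ones contain it at some rate p, which is at most 1. The observed frequency is then (1 - s)q + sp, so the excess is s(p - q), which can never exceed s. The excess for a set of marker words therefore bounds the LLM share from below, provided non-LLM abstracts really do follow the expected rate. Counting abstracts that contain any word from a larger set raises that bound. Below is a minimal Python sketch; the function names, the abstracts and the expected share are all made up for illustration.

```python
import re


def share_with_any(texts, words):
    """Hypothetical helper: share of texts containing at least one of
    `words` (whole-word match, any case)."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    texts = list(texts)
    return sum(1 for t in texts if pattern.search(t)) / len(texts)


def llm_share_lower_bound(observed_share, expected_share):
    """Excess share of abstracts with a marker word. This is a floor on the
    LLM-processed share only if non-LLM abstracts use the markers at the
    expected rate."""
    return max(observed_share - expected_share, 0.0)


markers = ["delves", "showcasing", "underscores", "notably"]
abstracts_2024 = [  # made-up abstracts
    "This study delves into sleep and memory.",
    "We measured blood pressure in 200 adults.",
    "Notably, the effect persisted after one year.",
    "Our results replicate earlier cohort findings.",
    "The trial enrolled patients at three sites.",
]

observed = share_with_any(abstracts_2024, markers)  # 2 of 5 abstracts
expected = 0.10  # assumed counterfactual share for this marker set
print(f"At least {llm_share_lower_bound(observed, expected):.0%} LLM-processed")
```

In practice the expected share would come from pre-2023 data rather than a fixed number.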

Geographical trends in LLM usage

The study also uncovered geographical variations in the adoption of LLMs. Countries like China, South Korea, and Taiwan showed a higher frequency of marker words in scientific papers, suggesting that LLMs are particularly valuable for non-native English speakers. These tools help refine and enhance their writing, making it more polished and publication-ready.

Conversely, native English speakers may be more skilled at recognizing and eliminating these markers, thereby concealing their use of AI. This difference suggests that while LLMs are widely used across the globe, their impact is more pronounced in regions where English is not the primary language.


Avoid these words at all costs if you don’t want to get caught using AI

Research suggests at least 10% of scientific abstracts published in 2024 were either generated or significantly assisted by LLMs. Here are the words that most often give them away.

Emre Çıtak
