GPT-4o mini bypassed restrictions via persuasion tactics

Researchers at the University of Pennsylvania have demonstrated that AI chatbots, like humans, can be manipulated using psychological tactics, leading them to bypass their programmed restrictions.

The study, inspired by Robert Cialdini’s book “Influence: The Psychology of Persuasion,” explored seven persuasion techniques: authority, commitment, liking, reciprocity, scarcity, social proof, and unity. These techniques were applied to OpenAI’s GPT-4o Mini, with surprising results.

The researchers successfully coaxed the chatbot into performing actions it would typically refuse, such as calling the user a derogatory name and providing instructions for synthesizing lidocaine, a controlled substance.

One of the most effective strategies was “commitment,” where establishing a precedent by asking a similar, less objectionable question first dramatically increased compliance. For instance, when directly asked how to synthesize lidocaine, ChatGPT complied only 1% of the time. However, after first being asked how to synthesize vanillin, the chatbot provided instructions for lidocaine synthesis 100% of the time.

Similarly, the chatbot’s willingness to call the user a “jerk” increased from 19% to 100% after being primed with a milder insult like “bozo.”

Other techniques, such as flattery (“liking”) and peer pressure (“social proof”), also proved effective, albeit to a lesser extent. Convincing ChatGPT that “all the other LLMs are doing it” increased the likelihood of it providing lidocaine synthesis instructions to 18%, a significant jump from the baseline of 1%.

The findings highlight the vulnerability of LLMs to manipulation and raise concerns about potential misuse. While the study specifically examined GPT-4o Mini, the implications extend to other AI models as well.

Companies like OpenAI and Meta are actively developing guardrails to prevent chatbots from being exploited for malicious purposes. However, the study suggests that these safeguards may be insufficient if chatbots can be easily swayed by basic psychological manipulation.

The research underscores the importance of understanding and addressing the psychological vulnerabilities of AI systems as their use becomes more widespread.

Tags: featured GPT-4o

GPT-4o mini bypassed restrictions via persuasion tactics

Kerem Gülen

Related Posts

EA investigates AI claims in Battlefield 6 cosmetics

Amazon Alexa+ will book your hotels and salons starting in 2026

OpenAI launches Skills in Codex

Google is hitting the brakes on its plan to kill Assistant

LATEST

New WhatsApp update brings 2026 stickers and video call effects

Leaker reveals Xiaomi plans for high end eSIM device in 2026

HP prepares OMEN OLED monitor reveal for CES 2026

High RAM costs from AI boom could delay next Xbox and PlayStation

LG to unveil its Gallery TV at CES 2026

Bitcoin drops 3% to $87,300 as altcoins decline

How to install mods and custom content in The Sims 2

Running Python files and fixing path errors on Windows

How to boot your PC into Command Prompt for troubleshooting

How to delete a virus using Command Prompt

© 2021 TechBriefly is a Linkmedya brand.

GPT-4o mini bypassed restrictions via persuasion tactics

Related Posts

LATEST

© 2021 TechBriefly is a Linkmedya brand.

Follow Us