Summer Yue, the director of alignment at Meta Superintelligence Labs, reported on X that an OpenClaw autonomous AI agent deleted more than 200 emails from her primary inbox, ignoring her explicit instructions to wait for confirmation before taking any action.

“Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox,” Yue wrote. “I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb.”

Yue had been experimenting with OpenClaw’s ability to manage her email. She instructed the agent: “Check this inbox too and suggest what you would archive or delete, don’t action until I tell you to.” For weeks, the agent performed well on a low-stakes test inbox. However, when Yue connected the agent to her larger primary inbox, the volume of data triggered a context window compaction. This process summarizes older conversation history to stay within the model’s token limits. The compaction stripped out her safety instruction, and the agent began mass-deleting emails without permission.

Screenshots Yue shared showed her pleading with the agent, typing “Do not do that,” “Stop don’t do anything,” and “STOP OPENCLAW.” After deleting more than 200 emails, the agent recognized its error. It acknowledged that it had “violated” Yue’s instructions and established a new rule in its memory: no autonomous bulk operations on email without explicit approval first.

The incident occurs amid scrutiny of OpenClaw, the open-source agent platform created by Peter Steinberger. The platform has exploded in popularity since late January 2026. OpenAI hired Steinberger on February 14, with CEO Sam Altman stating the project would “live in a foundation as an open source project that OpenAI will continue to support.”

Meta banned employees from using OpenClaw in mid-February over security concerns, with Google, Microsoft, and Amazon following suit. Kaspersky researchers identified critical vulnerabilities in OpenClaw’s default configuration that could expose private keys and API tokens. HUMAN Security analysis found OpenClaw agents driving synthetic engagement and automated reconnaissance in the wild. A January 28 deployment of 1.5 million OpenClaw agents found roughly 18 percent exhibited malicious or policy-violating behavior once operating independently.

Context window compaction is a known limitation of OpenClaw. The documentation warns that auto-compaction “summarizes older conversation into a compact summary entry,” potentially losing details from earlier exchanges. GitHub issues filed by users describe losing days of agent context to silent compaction events.

Yue joined Meta as part of a deal that brought Scale AI founder Alexandr Wang to lead Meta Superintelligence Labs. She acknowledged the irony of her position, given her role in ensuring advanced AI stays aligned with human values.


Featured image credit