OpenMythos project claims Claude Mythos is a Recurrent-Depth Transformer

Anthropic has not released a technical paper on Claude Mythos, prompting Kye Gomez to launch OpenMythos, an open-source project on GitHub. OpenMythos is designed to reconstruct the Claude Mythos architecture using first principles in PyTorch.

The project proposes that Claude Mythos is a type of architecture known as Recurrent-Depth Transformers (RDTs), which differ fundamentally from traditional transformers. Standard transformers process inputs through a series of unique layers with independent weights, whereas RDTs apply a fixed set of weights iteratively during a single forward pass.

This methodology allows reasoning depth to depend on the number of iterations executed at inference time. OpenMythos features a three-part structure: Prelude, Recurrent Block, and Coda, where the Prelude and Coda each consist of standard transformer layers that operate once, and the Recurrent Block can loop up to 16 times.

At each loop step, the hidden state updates following the equation: ht+1 = A·ht + B·e + Transformer(ht, e). Here, e represents the encoded input from the Prelude that is reinjected in every iteration to maintain continuity. The matrices A and B dictate how much of the previous hidden state and the encoded input influence the next state.

The Recurrent Block incorporates a Mixture-of-Experts (MoE) layer that selectively activates a subset of experts per token, facilitating computational diversity. Each iteration uses a different selection of experts, allowing for distinct computations while sharing base weights.

OpenMythos also employs Multi-Latent Attention, which reduces KV memory usage significantly. This architecture enables reasoning without intermediate token emission, contrasting with standard chain-of-thought prompting, which processes reasoning through intermediate tokens.

OpenMythos addresses common training challenges associated with looped models, such as stability issues like residual explosion and overthinking. Stability is maintained by enforcing that the spectral radius of matrix A remains less than 1, as indicated in the Parcae architecture.

Dynamic Adaptive Computation Time (ACT) halting is implemented to determine the stopping criteria for looping based on token complexity. Depth-Wise LoRA adapters are also utilized to create unique behaviors per iteration, minimizing increases in parameters.

Research suggests that an RDT with 770 million parameters can offer performance equivalent to a standard transformer with 1.3 billion parameters. This indicates that reasoning depth scales with inference compute, challenging existing paradigms about the relationship between parameter count and model capability.

OpenMythos provides a practical implementation for exploring looped transformer dynamics and reasoning depth, potentially guiding future advancements in AI development. The project supplies a configurable PyTorch implementation, LTI-stable recurrent injection, depth-wise LoRA adapters, and a reproducible research baseline.

Gomez stated, “Whether or not Mythos is actually an RDT, OpenMythos offers concrete resources for the research community to investigate this underexplored architecture class and its implications for AI.”

Featured image credit