OpenAI has unveiled Jalapeño, its first custom-built inference processor, developed in collaboration with Broadcom. The chip was designed specifically for OpenAI’s inference systems, and the company said its own AI models assisted in the design.
Jalapeño is still in testing, but early results indicate significantly better performance per watt than current state-of-the-art alternatives. OpenAI officially announced the Broadcom partnership in October, amid ongoing speculation that the effort aims to reduce the company’s reliance on Nvidia GPUs.
Google and Amazon have likewise built custom chips, known as “AI accelerators,” to speed up machine learning workloads. OpenAI President Greg Brockman discussed the company’s chip strategy on a podcast shortly after the Broadcom partnership was announced.
“We have a deep understanding of the workload,” Brockman said. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”
Jalapeño is optimized for inference: running already-trained AI models to respond to user requests. OpenAI highlighted the chip’s low operating costs when serving real-time coding models. More compute-intensive tasks, such as pre-training, will likely continue to run on Nvidia hardware, but lower inference costs could still meaningfully improve OpenAI’s finances.
Efficient inference may be essential to the economic viability of AI going forward. OpenAI is building a range of agentic products, including Codex, along with the data centers needed to deploy them. Moving to custom chips lets the company optimize that infrastructure further.
“OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience,” the company stated. “Because OpenAI operates across the stack, each layer can be optimized around the same goal: making its models faster, more reliable, and more affordable for users.”