OpenAI launches advanced real-time voice AI models

OpenAI has announced new voice intelligence features for its API to help developers build interactive applications that can converse, transcribe, and translate in real time. The newly launched GPT-Realtime-2 model, built on GPT-5 class reasoning, is intended to handle more complex user requests than its predecessor, GPT-Realtime-1.5.

OpenAI also introduced GPT-Realtime-Translate, which translates speech in real time from more than 70 input languages into 13 output languages. The model is designed to keep pace with users during conversations.

The third addition is GPT-Realtime-Whisper, which provides live speech-to-text transcription for real-time interactions. “Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds,” OpenAI stated.

According to OpenAI, the updates target several industries, including customer service, education, media, and events. The company acknowledged that the new features could also be misused, for example to generate spam or commit fraud. To mitigate this risk, OpenAI has implemented guardrails designed to halt conversations that violate its harmful-content guidelines.

All of the new voice models are part of OpenAI’s Realtime API. Billing differs by model: GPT-Realtime-Translate and GPT-Realtime-Whisper are billed by the minute, while GPT-Realtime-2 is billed by token usage.
