OpenAI has launched real-time video capabilities for ChatGPT, integrating vision into its Advanced Voice Mode. The announcement came during a livestream event on December 12, 2024. The updated mode can recognize objects in real time through the user’s device camera, process visual information via screen sharing, and engage in human-like conversation. Available to ChatGPT Plus, Team, and Pro subscribers, the update will roll out over the next week, though some users in the EU may face delays.
With this enhancement, ChatGPT can respond to what it sees, such as explaining the settings on a device or suggesting how to solve a math problem. To use these features, users tap the voice icon in the ChatGPT app and then activate video or screen sharing. The rollout prioritizes Plus subscribers, while ChatGPT Enterprise and Edu users will gain access in January 2025.
OpenAI showcases advanced features in live demonstration
During the live demonstration, OpenAI President Greg Brockman highlighted the capabilities of Advanced Voice Mode with vision. Brockman quizzed CNN’s Anderson Cooper on anatomy while showing how ChatGPT could understand and comment on drawings Cooper made on a blackboard. The interaction illustrated the potential for real-time, interactive learning and feedback, though ChatGPT did make some errors, such as miscalculating a geometry problem, underscoring ongoing challenges with model accuracy.
The feature faced delays in development, with OpenAI repeatedly pushing back launch windows it had previously hinted at. First demonstrated in May 2024, Advanced Voice Mode with vision had been expected to arrive sooner. The announcement marks the culmination of efforts to refine the capability, reflecting a strategic focus on enhancing the chatbot’s user experience.
In addition to the new visual functionalities, OpenAI also unveiled a “Santa Mode” for the holiday season, letting users chat with ChatGPT in a festive voice. To access the feature, users tap the snowflake icon in the app. Usage limits are also reset for the first conversation with Santa, giving users extra time to engage with the seasonal option.
Rivals such as Google and Meta are developing similar capabilities; Google’s Project Astra, for example, was recently made available to select testers on Android. These efforts reflect a broader trend of major tech companies integrating interactive video functionality into their AI models to boost user engagement and application versatility.
OpenAI has also announced further initiatives, such as expanding its Reinforcement Fine-Tuning Research Program, designed to optimize model performance on specialized tasks. The program is aimed at research institutions and enterprises, allowing them to customize OpenAI models for complex applications.
Featured image credit: OpenAI