Google Gemini now supports audio file uploads

Google’s Gemini AI assistant now supports audio file uploads, enabling users to transcribe, summarize, and extract key information from recordings. The feature can turn up to 10 minutes of audio, such as voice memos, meetings, lectures, and interviews, into searchable text.

The audio upload capability is available on the web and in the mobile apps through the standard file-upload interface. According to Josh Woodward, Google’s VP of Gemini, audio file upload was the feature users requested most.

The feature is distinct from Gemini Live, which focuses on real-time voice commands; the new capability processes audio files after they have been uploaded. During testing, Gemini accurately transcribed sketches from comedy albums and phone conversations, with only minor errors in name recognition. The AI also identified key points and items suitable for to-do lists.

The addition of audio processing aligns with recent Gemini improvements, including app integration, a card-based visual interface, and expanded personalization options. This feature allows users to convert saved audio logs and memos into searchable content, streamlining a process that previously required external transcription software.

Other AI assistants, such as ChatGPT (using Whisper), Anthropic’s Claude, and Perplexity, also offer audio processing, but Gemini’s implementation is geared towards everyday use cases. Users can ask Gemini to simplify language, isolate comments from a specific speaker, generate questions, and create study guides from audio content.

However, the 10-minute audio limit and daily usage caps for free-tier users may limit how often the feature can be used. Google has not yet released formal pricing for high-volume audio processing, which currently counts against the regular Gemini quota. Users planning to process long or numerous recordings should plan around those limits.
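
As a rough illustration of what planning around the limit could look like, the Python sketch below works out how a longer recording might be divided into pieces of at most 10 minutes each. It assumes the 10-minute limit applies to each uploaded file, which the announcement does not spell out. The function name plan_segments is hypothetical, the code does not call any Gemini interface, and it only does the arithmetic; cutting the audio itself would require a separate tool.

```python
def plan_segments(total_seconds, limit_seconds=600):
    """Return (start, end) offsets in seconds, each span at most limit_seconds long.

    Assumption: the 10-minute (600-second) limit applies per uploaded file.
    """
    if total_seconds < 0 or limit_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and limit_seconds must be > 0")

    segments = []
    start = 0
    while start < total_seconds:
        # Each segment ends at the limit or at the end of the recording.
        end = min(start + limit_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    # A 25-minute lecture (1500 seconds) needs three uploads under this assumption.
    for start, end in plan_segments(25 * 60):
        print(f"{start // 60:02d}:{start % 60:02d} to {end // 60:02d}:{end % 60:02d}")
```

Each piece would still count against the daily usage cap, so a long recording on the free tier may consume several uploads.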

Overall, Gemini’s audio feature gives users a direct way to extract information from recordings, making it useful for both personal and professional work.