Google launched Gemini 3.1 Flash-Lite, its fastest and most affordable Gemini 3 model yet, priced at $0.25 per million input tokens and $1.50 per million output tokens.
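
To put those rates in concrete terms, the following Python sketch estimates the bill for a batch workload. Only the two per-million-token prices come from the announcement; the function name and the example request and token counts are illustrative assumptions.

```python
# Prices quoted in the announcement, in USD per one million tokens.
INPUT_PRICE_PER_MILLION = 0.25
OUTPUT_PRICE_PER_MILLION = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workload: 10,000 requests, each with
# 2,000 input tokens and 500 output tokens.
requests = 10_000
total = estimate_cost_usd(requests * 2_000, requests * 500)
print(f"Estimated cost: ${total:.2f}")  # Estimated cost: $12.50
```

In this made-up workload, output tokens account for $7.50 of the $12.50 total even though there are four times as many input tokens, because output is priced six times higher.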

The model targets high-volume developer workloads, data processing, and translation tasks. It is available in preview via the Gemini API in Google AI Studio and Vertex AI but is not included in the Gemini consumer app.

Compared to Gemini 2.5 Flash-Lite, the new version is more expensive but significantly more capable. It generally outperforms Gemini 2.5 Flash at a lower price point.

The model outperforms competitors including GPT-5 mini and Claude Haiku 4.5. Grok 4.1 Fast is cheaper, but Gemini 3.1 Flash-Lite is faster, promising up to 363 tokens per second.
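
As a rough guide to what that throughput means, the sketch below converts the quoted peak rate into a best-case generation time. The function name and the 1,000-token example are assumptions for illustration. Because "up to" describes a peak, the result is a lower bound that ignores network latency and time to first token.

```python
# Peak output speed quoted for the model ("up to" figure).
PEAK_TOKENS_PER_SECOND = 363


def min_generation_seconds(output_tokens: int,
                           tokens_per_second: float = PEAK_TOKENS_PER_SECOND) -> float:
    """Best-case time in seconds to generate the given number of output tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical response length of 1,000 tokens.
seconds = min_generation_seconds(1_000)
print(f"At least {seconds:.2f} s for 1,000 tokens")  # At least 2.75 s for 1,000 tokens
```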

On multimodal benchmarks, the model scored 1432 Elo points on the Arena.ai Leaderboard. That score puts it in the same range as open-weight models and last-generation commercial offerings.

Google did not publish agent benchmarks for the release. The company stated the model is intended for high-volume tasks and data processing, not for managing fleets of agents.

Developers can use the API to tune the model’s reasoning time for cost control. Lower reasoning settings produce fewer tokens, which reduces the bill on high-volume workloads.
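
The effect on cost can be sketched with simple arithmetic. The example below assumes that reasoning tokens are billed at the output rate; the setting names and the per-request token counts are hypothetical values chosen for illustration, not figures published by Google. It does not call the Gemini API.

```python
# Output price from the announcement, in USD per one million tokens.
OUTPUT_PRICE_PER_MILLION = 1.50

# Hypothetical reasoning-token counts per request for each setting.
# These names and numbers are illustrative assumptions only.
REASONING_TOKENS = {"low": 200, "medium": 1_000, "high": 4_000}


def output_cost_usd(answer_tokens: int, reasoning_tokens: int, requests: int) -> float:
    """Output-side cost in USD, assuming reasoning tokens are billed as output."""
    total_tokens = (answer_tokens + reasoning_tokens) * requests
    return total_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION


# Hypothetical workload: one million requests with 300-token answers.
for setting, reasoning in REASONING_TOKENS.items():
    cost = output_cost_usd(300, reasoning, 1_000_000)
    print(f"{setting:>6}: ${cost:,.2f}")
```

Under these assumptions the reasoning budget, not the answer length, dominates the output bill at the higher settings, which is why the setting matters most for bulk jobs.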

This is the first Flash-Lite version for Gemini 3.1. Google traditionally launches more capable Flash versions first or skips Flash-Lite entirely, as it did with Gemini 3.

Google launched Gemini 3.1 Pro two weeks earlier. The company describes Flash-Lite as built for high-volume developer workloads.

