Google AI has released EmbeddingGemma, a new on-device embedding model with 308 million parameters. According to Google, its compact size allows it to run effectively on mobile devices and in offline settings. Google reports inference latency under 15 ms for 256 input tokens on EdgeTPU, which would make the model suitable for real-time applications.
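To put the parameter count in perspective, here is a back-of-envelope sketch of how much memory the weights alone would occupy at different numeric precisions. The precision list and the helper name `weight_megabytes` are illustrative assumptions, not details from Google's announcement, and the figures ignore activations, tokenizer data, and runtime overhead.

```python
# Rough weight-only memory estimate for a 308M-parameter model.
# The precisions below are illustrative assumptions; actual on-device
# builds may mix precisions and add runtime overhead.
PARAMS = 308_000_000

BYTES_PER_PARAM = {
    "float32": 4.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_megabytes(params: int, bytes_per_param: float) -> float:
    """Return weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bytes_per_param / 1_000_000


estimates = {
    name: weight_megabytes(PARAMS, size)
    for name, size in BYTES_PER_PARAM.items()
}

for name, megabytes in estimates.items():
    print(f"{name}: {megabytes:.0f} MB")
```

Under these assumptions the weights range from roughly 1.2 GB at 32-bit precision down to about 154 MB at 4-bit precision, which illustrates why lower-precision formats matter for phones and other memory-constrained devices.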
Trained on data spanning more than 100 languages, EmbeddingGemma ranks highest on the Massive Text Embedding Benchmark (MTEB) among open multilingual text embedding models with fewer than 500 million parameters, according to Google. Google also reports that its performance rivals or surpasses that of embedding models almost twice its size, especially in cross-lingual retrieval and semantic search tasks.
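For readers unfamiliar with how an embedding model is used for semantic search, the sketch below ranks documents by cosine similarity to a query vector. It is a generic illustration, not EmbeddingGemma's API: the three-dimensional vectors are toy placeholders for real model output, and the names `cosine_similarity`, `rank_documents`, and the document ids are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat as unrelated
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings (assumption: a real model
# would produce much higher-dimensional vectors for each text).
toy_docs = {
    "en_weather": [0.9, 0.1, 0.0],
    "es_weather": [0.8, 0.2, 0.1],
    "en_finance": [0.0, 0.1, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

ranking = rank_documents(toy_query, toy_docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

In this toy setup the two weather documents score close together despite being labeled as different languages, which is the behavior cross-lingual retrieval relies on: texts with similar meaning should land near each other in the embedding space regardless of language.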
More information is available via the provided links to a full analysis, the model on Hugging Face, and technical details.





