Microsoft has announced MAI-Image-1, its first image generation model developed entirely in-house. The company stated the model will be available on Copilot and Bing Image Creator “very soon” and is currently available for testing on LMArena, a platform where users compare the outputs of two anonymous models and vote for the better one.
On LMArena’s text-to-image leaderboard, MAI-Image-1 ranked ninth with a score of 1,096 points. For comparison, Google’s Gemini-2.5-Flash, also known as Nano-Banana, ranks second with 1,154 points, while OpenAI’s model ranks seventh with 1,123 points. The leaderboard is led by Hunyuan-image-3.0, a model developed by the Chinese tech company Tencent.
Microsoft stated that its development team focused on avoiding repetitive or generically stylized outputs with MAI-Image-1. “For example, we prioritised rigorous data selection and nuanced evaluation focused on tasks that closely mirror real-world creative use cases,” the company explained, adding that it incorporated feedback from professionals in creative industries.
The model is reported to excel at generating landscapes and photorealistic imagery, accurately capturing details such as lighting, shadows, and reflections, particularly when compared with “many larger, slower models.”
In addition to MAI-Image-1, Microsoft has developed other internal models, including MAI-Voice-1 for natural speech generation and the Phi series of small language models designed for efficient reasoning tasks. This internal development occurs alongside the company’s continued financial and infrastructural support for OpenAI.
AI image generation is a highly active field at the moment. OpenAI’s model recently gained viral attention for its ability to imitate the Studio Ghibli art style, while Google’s “Nano-Banana” was recognized for its advanced editing capabilities.
Using LMArena, AIM conducted a comparison of Microsoft’s MAI-Image-1, Google’s Gemini-2.5-Flash, and OpenAI’s GPT-image-1. The models were tested with a prompt depicting two people in a café by a window during the late afternoon. The evaluation focused on how each model handled mixed lighting, reflections, and the realism of shadows. Users can visit LMArena to test these models with similar prompts.