Technology giant Apple has broken its silence on artificial intelligence by introducing MM1, its new family of multimodal large language models (LLMs).
MM1, which handles complex tasks such as image captioning, visual question answering, and natural language inference, is seen as an important development in the world of artificial intelligence.
What is MM1?
As I mentioned above, MM1 is a multimodal large language model designed to caption images, answer visual questions, and perform natural language inference. It aims to handle complex tasks by combining text and visual data. Apple researchers report that MM1 delivers markedly better results than other published pre-training results.
Technical specifications of MM1
Scaling up to 30 billion parameters, MM1 stands out as a model family that can process image and text data together. Trained on a mixture of data types, including image-caption pairs, interleaved image-text documents, and text-only data, MM1 has a more comprehensive information-processing capability.
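Apple has not released MM1’s code or weights, so the following is only a minimal sketch of the general pattern such multimodal models follow: a vision encoder turns an image into patch embeddings, which are interleaved with text token embeddings and fed through a single transformer backbone. Every module name and size below is an illustrative assumption, not Apple’s actual design.

```python
import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    """Illustrative interleaved image-text model (NOT Apple's MM1)."""

    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        # Stand-in vision encoder: projects flattened 14x14 RGB patches
        # into the same embedding space as the text tokens.
        self.vision_proj = nn.Linear(3 * 14 * 14, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        # Transformer stack standing in for the LLM backbone
        # (no causal mask here, for brevity).
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image_patches, token_ids):
        # image_patches: (batch, n_patches, 3*14*14); token_ids: (batch, seq)
        img_emb = self.vision_proj(image_patches)   # (batch, n_patches, d_model)
        txt_emb = self.token_emb(token_ids)         # (batch, seq, d_model)
        # "Interleaving" here is simply image-then-text; real interleaved
        # corpora mix multiple images between spans of text.
        seq = torch.cat([img_emb, txt_emb], dim=1)
        return self.lm_head(self.backbone(seq))

model = ToyMultimodalLM()
patches = torch.randn(1, 16, 3 * 14 * 14)          # fake image patches
tokens = torch.randint(0, 32000, (1, 12))          # fake caption tokens
print(model(patches, tokens).shape)                # torch.Size([1, 28, 32000])
```

The point of training on mixed data, as the paper describes, is that the same backbone sees captions, interleaved documents, and plain text, which is what lets one model cover captioning, visual question answering, and text-only inference.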
The development of MM1 also signals the importance Apple attaches to artificial intelligence. Apple, which is working on an LLM framework codenamed “Ajax” and has made moves such as acquiring the startup DarwinAI, sees artificial intelligence and machine learning as core technologies. The company plans to share the details of its work in this area in 2024 and to make an AI-focused presentation at the WWDC developer conference in June.
Apple’s MM1 is considered an important step forward in the field of multimodal LLMs and a clear sign that the company is ending its silence on AI. MM1’s development will contribute to further advances in areas such as visual data processing and natural language understanding.
Featured image credit: Sumudu Mohottige / Unsplash