All About Apple’s New GenAI Model MM1

MM1 consists of dense models and mixture-of-experts (MoE) variants.

Apple has reportedly developed MM1, a family of multimodal large language models with up to 30 billion parameters, designed to achieve state-of-the-art performance in pre-training and competitive results after supervised fine-tuning.

The family includes both dense models and mixture-of-experts (MoE) variants, evaluated across a range of established multimodal benchmarks.

The researchers emphasize the importance of using a diverse dataset comprising both text and image data, which enables models to excel in tasks such as image captioning, visual question answering, and natural language inference.

Notably, the largest MM1 model, with 30 billion parameters, demonstrated impressive in-context learning capabilities, enabling it to conduct multi-step reasoning across multiple input images using few-shot "chain-of-thought" prompting. This suggests the potential of large multimodal models to tackle complex, open-ended problems requiring grounded language understanding and generation.
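To make "few-shot chain-of-thought prompting across multiple images" concrete, here is an illustrative sketch of how such a prompt is typically assembled. The `<image>` placeholder, the example content, and the function name are all assumptions for illustration, not Apple's actual prompt format or API:

```python
# Illustrative few-shot chain-of-thought prompt builder for a multimodal model.
# "<image>" stands in for an image token; real systems substitute encoded image
# features at these positions. All names and examples here are hypothetical.

def build_prompt(examples, question):
    """Interleave worked (image, question, reasoning) examples before a new query."""
    parts = []
    for image_slot, q, worked_answer in examples:
        parts.append(f"{image_slot}\n{q}\n{worked_answer}")
    # The final query ends mid-reasoning so the model continues the chain of thought.
    parts.append(f"<image>\n{question}\nReasoning:")
    return "\n\n".join(parts)

few_shot_examples = [
    ("<image>",
     "Q: How many apples are on the table?",
     "Reasoning: I count three apples on the left and two on the right. A: 5"),
]

print(build_prompt(few_shot_examples, "Q: How many oranges are in the bowl?"))
```

The key idea is that the worked example demonstrates the reasoning style, and the prompt ends with an open "Reasoning:" cue so the model produces intermediate steps before its final answer.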

The research highlights that the choice of image encoder, image resolution, and number of image tokens strongly influence model performance. While the design of the vision-language connector was found to be less critical, scaling and refining visual components remain crucial for further advancements in multimodal models.
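The link between image resolution and number of image tokens can be sketched for a typical ViT-style encoder, which splits an image into fixed-size patches, one token per patch. The patch size and resolutions below are illustrative, not the specific values used in MM1:

```python
# A ViT-style encoder turns a square image into (resolution / patch_size)^2 tokens,
# so doubling the resolution quadruples the token count (and attention cost grows
# with it). Patch size 14 is a common choice; values here are illustrative only.

def num_image_tokens(resolution: int, patch_size: int = 14) -> int:
    """Number of patch tokens produced for a square image."""
    side = resolution // patch_size
    return side * side

print(num_image_tokens(224))  # 256 tokens
print(num_image_tokens(448))  # 1024 tokens
```

This quadratic growth is why resolution and token count are a real design trade-off: higher resolution gives the model finer visual detail but sharply increases sequence length and compute.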

While the company actively experiments with generative AI, it is reportedly in talks to integrate Google’s Gemini into the iPhone. Additionally, the tech giant has reportedly engaged in talks with OpenAI regarding the use of its model. Recently, it also acquired Canadian AI startup DarwinAI.

CDO Magazine
www.cdomagazine.tech