AI News Bureau

Google Unveils Gemini 1.5 Pro with Improved Long-context Understanding

Gemini 1.5 boasts groundbreaking long-context understanding, allowing it to process up to 1 million tokens, the largest context window among large-scale foundation models.

Written by: CDO Magazine Bureau

Updated 9:59 AM UTC, Tue February 20, 2024

Google has unveiled Gemini 1.5, an updated version of its generative AI model that delivers improved performance and a breakthrough in long-context understanding, with the ability to consistently process up to 1 million tokens.

Gemini 1.5 Pro, the first model released for early testing, is a mid-size multimodal model that achieves quality comparable to Gemini 1.0 Ultra, the largest model in the 1.0 family, while using less computing power.

“This new generation also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet,” said Google and Alphabet CEO Sundar Pichai in an official blog.

“Longer context windows show us the promise of what is possible. They will enable entirely new capabilities and help developers build much more useful models and applications. We’re excited to offer a limited preview of this experimental feature to developers and enterprise customers,” Pichai added.

Primary features of Gemini 1.5 Pro

Gemini 1.5 boasts long-context understanding, allowing it to process up to 1 million tokens, the largest context window among large-scale foundation models. This expanded context window enables the model to handle vast amounts of information, including processing one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words.

The increased context window enhances the model’s capabilities for analyzing, classifying, and summarizing large datasets, as demonstrated with a 402-page transcript from Apollo 11’s mission.

Gemini 1.5 Pro also incorporates a Mixture-of-Experts (MoE) architecture, in which the model is divided into smaller expert neural networks. Depending on the input, the model selectively activates only the most relevant expert pathways, which improves efficiency.
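The routing idea behind MoE can be sketched in a few lines of Python. This is a toy illustration of top-k expert gating in general, not Google's implementation, whose details are not described here. The function names, the gate vectors, the scalar "experts", and the choice of top_k=2 are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical expert: a trivial function standing in for a sub-network."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, gate_vectors, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their outputs."""
    # Router: one score per expert (dot product of x with that expert's gate vector).
    scores = [sum(g * v for g, v in zip(gate, x)) for gate in gate_vectors]
    # Only the top_k experts are selected; the others are never evaluated.
    ranked = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for weight, index in zip(weights, chosen):
        expert_out = experts[index](x)
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, chosen


# Assumed toy setup: four experts, two-dimensional input.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]]
output, chosen = moe_forward([1.0, 0.0], gate_vectors, experts)
print(chosen)  # indices of the experts that were actually run
```

In this sketch only two of the four experts run for any given input, which is the source of the efficiency gain: compute per input scales with top_k rather than with the total number of experts.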

The model’s proficiency is further demonstrated by its ability to perform complex reasoning tasks, such as analyzing a 44-minute silent Buster Keaton movie or problem-solving across more than 100,000 lines of code.

In terms of performance, Gemini 1.5 Pro outperforms Gemini 1.0 Pro on 87% of the benchmarks Google uses for developing its large language models. Even when compared to the larger Gemini 1.0 Ultra, it maintains a similar level of performance. Notably, in the ‘Needle In A Haystack’ evaluation, in which a small piece of text is deliberately placed inside a long block of data, 1.5 Pro successfully located the embedded text 99% of the time in blocks of data as long as 1 million tokens.
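For readers unfamiliar with this kind of test, the following Python sketch shows the general shape of a needle-in-a-haystack check: a known sentence is inserted at varying depths of a long filler text, and the score is the fraction of runs in which the model's answer contains it. This is an illustrative assumption about how such an evaluation can be built, not Google's actual harness. The `ask_model` callable, the filler sentences, and the secret code are hypothetical, and `toy_model` is a plain string search standing in for a real model.

```python
import random


def build_haystack(filler_sentences, needle, depth, n_sentences):
    """Return filler text with `needle` inserted at a relative depth (0.0 to 1.0)."""
    body = [random.choice(filler_sentences) for _ in range(n_sentences)]
    body.insert(int(depth * n_sentences), needle)
    return " ".join(body)


def retrieval_rate(ask_model, filler_sentences, secret, depths, n_sentences=1000):
    """Fraction of depths at which the model's answer contains the secret."""
    needle = f"The secret code is {secret}."
    question = "What is the secret code?"
    hits = 0
    for depth in depths:
        haystack = build_haystack(filler_sentences, needle, depth, n_sentences)
        answer = ask_model(haystack, question)
        if secret in answer:
            hits += 1
    return hits / len(depths)


def toy_model(context, question):
    """Stand-in for a real model: finds the needle by string search."""
    marker = "The secret code is "
    start = context.find(marker)
    if start == -1:
        return ""
    end = context.find(".", start + len(marker))
    return context[start + len(marker):end]


filler = ["The sky is blue.", "Grass is green.", "Rivers flow downhill."]
rate = retrieval_rate(toy_model, filler, "7421", depths=[0.0, 0.25, 0.5, 0.75, 1.0])
print(f"retrieval rate: {rate:.0%}")
```

To test a real model, `toy_model` would be replaced by a function that sends the haystack and question to that model and returns its reply.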

Emphasizing its commitment to ethics and safety, Google says it conducts extensive testing of its models in these areas. The responsible deployment of Gemini 1.5 involves evaluations on content safety and representational harms, in line with Google’s AI Principles.

Google plans to introduce pricing tiers for Gemini 1.5, starting with a standard 128,000-token context window and scaling up to 1 million tokens, giving users flexibility based on their needs.
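As a rough way to see which window a document might need, the Python sketch below uses only the figures quoted in this article: about 700,000 words per 1 million tokens, a 128,000-token standard window, and a 1 million-token maximum. The ratio is an approximation derived from those figures, not an actual tokenizer, and the function and tier names are hypothetical.

```python
STANDARD_WINDOW = 128_000    # tokens, standard context window cited above
EXTENDED_WINDOW = 1_000_000  # tokens, maximum context window cited above


def estimate_tokens(text):
    """Estimate tokens from a word count, assuming 10 tokens per 7 words.

    The 10:7 ratio comes from the article's figure of roughly 700,000 words
    per 1 million tokens. Real token counts depend on the tokenizer and text.
    """
    words = len(text.split())
    return (words * 10 + 6) // 7  # integer ceiling of words * 10 / 7


def pick_window(text):
    """Return a hypothetical tier label for the smallest window that fits."""
    tokens = estimate_tokens(text)
    if tokens <= STANDARD_WINDOW:
        return "standard"
    if tokens <= EXTENDED_WINDOW:
        return "extended"
    return "too large"


sample = "word " * 70_000
print(estimate_tokens(sample), pick_window(sample))
```

For an exact count, the estimate would need to be replaced with the token count reported by the model's own tokenizer.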

To let developers and enterprise customers explore and experiment with Gemini models, Google is offering a limited preview of Gemini 1.5 Pro through AI Studio and Vertex AI. The 1 million token context window is free during the initial testing period, but users should expect higher latency, with improvements anticipated in the near future.