
Days after launching Deep Research powered by Gemini 2.5 Pro Experimental, Google is back at it again with a new model, DolphinGemma. This large language model is meant to help scientists "study how dolphins communicate" and "hopefully find out what they're saying, too."
The company is working with researchers at Georgia Tech and the Wild Dolphin Project (WDP), led by its founder, Dr. Denise Herzing. The WDP's primary mission, as you can probably guess, is to observe, document, and report on the natural behaviors, social structures, communication patterns, and habitats of wild dolphins, specifically the Atlantic spotted dolphin (Stenella frontalis), through "non-invasive, long-term field research."
Over the years, the WDP has collected data that allows it to correlate certain dolphin sounds with behaviors. For example:
- Signature whistles (unique names) that can be used by mothers and calves to reunite
- Burst-pulse "squawks" often heard during fights
- Click "buzzes" often used during courtship or when chasing sharks
According to Google, "analyzing dolphins' natural, complex communication is a monumental task, and WDP's vast, labeled dataset provides a unique opportunity for cutting-edge AI."
That's where DolphinGemma comes in. Simply put, it's an AI model developed by Google and trained on the WDP's dataset. It uses Google's own SoundStream tokenizer to break dolphin vocalizations down into more manageable audio units.
These units are then run through a specialized model architecture designed to make sense of complex sequences. The whole setup sits at around 400 million parameters, making it lightweight enough to run natively on the Pixel phones that WDP researchers carry with them out in the field.
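To make the tokenization step concrete, here is a minimal sketch of what a neural-codec tokenizer like SoundStream does conceptually: slice the waveform into frames and snap each frame to its nearest entry in a learned codebook, yielding a sequence of discrete token IDs. The frame size, codebook size, and the codebook itself are all stand-ins here; the real tokenizer uses a trained encoder with residual vector quantization, not a random table.

```python
import numpy as np

FRAME = 320           # hypothetical: 20 ms frames at 16 kHz
CODEBOOK_SIZE = 1024  # hypothetical number of distinct audio tokens

rng = np.random.default_rng(0)
# Stand-in for a learned codebook; a real codec learns these vectors in training.
codebook = rng.standard_normal((CODEBOOK_SIZE, FRAME))

def tokenize(waveform: np.ndarray) -> np.ndarray:
    """Map a mono waveform to one discrete token ID per frame."""
    n_frames = len(waveform) // FRAME
    frames = waveform[: n_frames * FRAME].reshape(n_frames, FRAME)
    # Nearest-neighbor lookup: each frame becomes the ID of its closest code.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

# A one-second synthetic rising whistle becomes a 50-token sequence.
t = np.linspace(0, 1, 16_000)
whistle = np.sin(2 * np.pi * (2_000 + 1_000 * t) * t)
print(tokenize(whistle).shape, tokenize(whistle)[:8])
```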

Now, unlike traditional machine learning models, DolphinGemma doesn’t deal in words or images; it’s strictly audio-in, audio-out. It takes in sequences of natural dolphin vocalizations, processes them using an approach inspired by how large language models understand human speech, and predicts the most likely next sound in a sequence.
Dr. Herzing compares it to autocomplete, but for dolphin whistles, burst pulses, and click trains. It’s trained to identify patterns, structure, and progression in these sounds, much like a text-based model predicting the next word in a sentence from context.
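The autocomplete analogy maps directly onto next-token prediction. As a toy illustration (not DolphinGemma itself), the sketch below swaps the actual 400-million-parameter model for a simple bigram count table over invented token sequences and predicts the most likely next token from the last one heard.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count, for every token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, history):
    """Return the most frequent follower of the last token in the history."""
    followers = counts.get(history[-1])
    return followers.most_common(1)[0][0] if followers else None

# Invented token sequences standing in for a recurring whistle motif.
corpus = [[7, 3, 3, 9, 2], [7, 3, 9, 2, 2], [5, 7, 3, 3, 9]]
model = train_bigram(corpus)
print(predict_next(model, [5, 7, 3]))  # -> 9, the most common token after 3
```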
Before Google came along with DolphinGemma, the team of researchers at the WDP had been using CHAT (Cetacean Hearing Augmentation Telemetry) to explore the possibility of two-way communication with dolphins. The goal with CHAT wasn’t to crack the full complexity of dolphin language but to build a simpler, shared vocabulary for interaction.
The system works by associating novel synthetic whistles, generated by CHAT, with specific objects the dolphins seem to enjoy: think sargassum, seagrass, or even the scarves the researchers use.
The hope was that, by repeatedly pairing these synthetic whistles with objects, the dolphins would start mimicking the sounds to "ask" for those items.
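At its core, that association loop is a lookup table from objects to whistle templates plus a matcher for incoming sound. The sketch below is a rough stand-in: the whistle shapes are invented, and a naive correlation score replaces CHAT's real-time spectral matching.

```python
import numpy as np

SR = 16_000  # hypothetical sample rate

def synth_whistle(f_start: float, f_end: float, dur: float = 0.5) -> np.ndarray:
    """Generate a linear frequency sweep as a stand-in synthetic whistle."""
    t = np.linspace(0, dur, int(SR * dur))
    return np.sin(2 * np.pi * (f_start + (f_end - f_start) * t / (2 * dur)) * t)

# One invented whistle per object the dolphins like to play with.
vocabulary = {
    "sargassum": synth_whistle(3_000, 5_000),
    "seagrass": synth_whistle(5_000, 3_000),
    "scarf": synth_whistle(4_000, 4_000),
}

def identify(heard: np.ndarray) -> str:
    """Return the object whose template whistle best matches the input."""
    scores = {
        obj: abs(np.dot(heard, tmpl)) / (np.linalg.norm(heard) * np.linalg.norm(tmpl))
        for obj, tmpl in vocabulary.items()
    }
    return max(scores, key=scores.get)

# A noisy mimic of the "scarf" whistle should still map back to "scarf".
mimic = vocabulary["scarf"] + 0.3 * np.random.default_rng(1).standard_normal(SR // 2)
print(identify(mimic))  # -> scarf
```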
CHAT ran off a Google Pixel 6, which handled high-quality audio analysis in real time. Using off-the-shelf phones meant the team didn’t need custom gear, making the whole setup smaller, cheaper, more efficient, and easier to maintain in the open ocean.
For the upcoming season, they’re upgrading to the Pixel 9, which adds better speaker and microphone capabilities and has enough power to run deep learning models and pattern matching at the same time.

Just like other Gemma models, Google says it is releasing DolphinGemma as an open model this summer, hoping to "give researchers worldwide the tools to mine their own acoustic datasets, accelerate the search for patterns and collectively deepen our understanding of these intelligent marine mammals."
Gemma is a family of lightweight large language models developed by Google. The latest addition to the family is Gemma 3, available in four sizes: 1 billion, 4 billion, 12 billion, and 27 billion parameters.