Three images side by side of a crow, whales, and an elephant.
Crows, whales and elephants have all been the subject of AI-assisted studies. Credit: Discovery Access/Getty; Amanda Cotton/Project CETI; Allstar Picture Library/Getty
Listening to sperm whales has taught Shane Gero the importance of seeing the animals he studies as individuals, each with a unique history.
He and his fellow scientists give the whales names — Pinchy, Quasimodo, Scar, Mysterio and Mysterio’s son Enigma. Often these names are based on some identifying physical feature. Fingers, for instance, is named after a pair of marks on her right fluke that look like she’s flashing a peace sign.
[Nature Outlook: Robotics and artificial intelligence](https://www.nature.com/collections/dgebebiadj)
The scientists name the animals to remind themselves that the whales are not interchangeable. Gero’s children are learning this, too. “My kids know all of the animals by name,” he says. “I jokingly call them my ‘human family’, as opposed to the time I spend with my whale families.”
Gero, a whale biologist at Carleton University in Ottawa, Canada, has spent 20 years trying to understand how whales communicate. In that time, he has learnt that whales make specific sounds that identify them as members of a family group, and that sperm whales (Physeter macrocephalus) in different regions of the ocean have dialects, just as people from various parts of the world might speak English differently.
The chirps and whistles of dolphins, the rumblings of elephants and the trills and tweets of birdsong all have patterns and structure that convey information to other members of the animal’s species. For a person, the subtleties of these patterns can be difficult to identify and understand, but finding patterns is a task at which artificial intelligence (AI) excels. The hope of a growing number of biologists and computer scientists is that applying AI to animal sounds might reveal what these creatures are saying to each other.
Over the past year, AI-assisted studies have found that both African savannah elephants (Loxodonta africana)1 and common marmoset monkeys (Callithrix jacchus)2 bestow names on their companions. Researchers are also using machine-learning tools to map the vocalizations of crows. As the capability of these computer models improves, they might be able to shed light on how animals communicate, and enable scientists to investigate animals’ self-awareness — and perhaps spur people to make greater efforts to protect threatened species.
That’s not to say that there will be an animal version of Google Translate any time soon. The great progress seen in AI systems’ understanding, translation and generation of human language is mainly due to the vast quantity of examples available, the meanings of which are already known, says David Gruber, a marine biologist who founded the scientific and conservation project the Cetacean Translation Initiative (CETI). “I think it’s a big assumption to assume that we could take all that technology and just turn it towards another species and have it somehow learn and start translating,” he says.
Cracking the code
The CETI project focuses on sperm whales and has become a sponsor of Gero’s research. But even before CETI, Gero had spent thousands of hours in the waters of the Caribbean leading the Dominica Sperm Whale Project, in which he and his colleagues collected data on more than 30 whale families that live near the island.
An underwater photo showing a gathering of male and female sperm whales in Dominica.
A gathering of male and female sperm whales (Physeter macrocephalus) in Dominica. Credit: Amanda Cotton/Project CETI
The whales spend most of their time seeking food deep in the ocean, as far as 2 kilometres below the surface. Sunlight doesn’t penetrate to those depths, so the whales make clicking sounds to find their prey by echolocation. They also use sequences of clicks called codas, which are each 3–40 clicks long, to stay in touch with other whales. At the surface, where echolocation isn’t needed, whales use codas while socializing.
Gero and other researchers have learnt that whales group together in what they have dubbed clans, each with a distinctive diet, social behaviour and use of their habitat. These clans, which can contain thousands of individuals in families led by female whales, communicate in their own dialects, which are distinguishable from those of others by the tempos of their codas. For instance, two clans will use the same pattern of five clicks in succession, but with different tempos and pauses. Those dialects, Gero says, mark “cultural boundaries” between clans.
To understand the rhythm and tempo of the codas, the team manually created graphic representations of whale sound recordings, known as spectrograms. These provide a way to visualize sound, depicting characteristics such as volume and frequency. For human speech, spectrograms can be used to identify individual units of sound known as phonemes.
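As a concrete illustration, here is a minimal sketch of how a spectrogram can be computed with standard Python tools; the file name and settings are placeholders, not the team’s actual pipeline.

```python
# Minimal sketch: compute and plot a spectrogram from a mono recording.
# "coda_recording.wav" is a hypothetical file, not project data.
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("coda_recording.wav")
freqs, times, power = spectrogram(samples, fs=rate, nperseg=1024)

plt.pcolormesh(times, freqs, 10 * np.log10(power + 1e-12))  # power in dB
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.show()
```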
This manual process was time-consuming; every minute of recording took a team member roughly 10 minutes to separate out all the individual clicks. Turning that task over to a machine-learning algorithm vastly sped up the work, Gero says, and also helped to attribute each sound to the animal that made it.
But AI also allowed the researchers to go further. Manually, they had essentially been cataloguing individual words, but AI allowed them to look at the codas in the equivalent of whale sentences and even entire conversations. New structures began to emerge. “Machine learning is really good at seeing patterns that are hard to pick up through standard statistical approaches,” Gero says.
The work3 uncovered fine modulations of the intervals between clicks, which the scientists labelled ‘rubato’, borrowing a musical term for slight changes in tempo that make a piece more expressive. They also discovered the occasional addition of a click, which they named ‘ornamentation’ after the musical practice of adding notes atop the melody.
The significance of these features is not yet clear. But by using rhythm, tempo, rubato and ornamentation in different combinations, the whales can produce a huge set of different codas. The researchers collected a data set of 8,719 codas and uncovered what they call a sperm whale phonetic alphabet, which they think the whales might use as building blocks for sharing complex information.
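As a rough sketch of the kind of features involved, the fragment below computes the inter-click intervals of an invented coda, treating the coda’s overall duration as its tempo and the normalized spacing of clicks as its rhythm; this is a simplification of the study’s analysis, with made-up numbers.

```python
# Sketch: describing a coda by its inter-click intervals (ICIs).
# Click times are invented; real codas are 3-40 clicks long.
import numpy as np

clicks = np.array([0.00, 0.18, 0.37, 0.58, 0.81])  # click onsets (seconds)
icis = np.diff(clicks)           # intervals between successive clicks
tempo = clicks[-1] - clicks[0]   # overall duration of the coda
rhythm = icis / icis.sum()       # normalized spacing pattern

print("ICIs:", np.round(icis, 2))
print(f"tempo: {tempo:.2f} s, rhythm: {np.round(rhythm, 2)}")
```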
As AI uncovers these features of whale vocalizations, researchers are asking what meaning they might carry. Does rubato increase before a dive, or decrease when a mother communicates with her calf, for instance? “If you don’t know rubato exists, then you can’t start asking ‘when is rubato important?’” Gero says. His team is analysing these features.
Call me by my name
Sperm whales aren’t the only creatures that use specific vocalizations to identify themselves. Behavioural ecologist Mickey Pardo, then at Colorado State University in Fort Collins, and his colleagues used machine learning to discover that wild African elephants have what seem to be names. That is, they address other elephants with vocalizations specific to the individual1. (Pardo now works from Colorado for Cornell University in New York.)
An elephant family standing together
Credit: Mickey Pardo
Researchers already knew that the animals make low rumbling sounds that differ depending on whether they’re out of sight of one another or in close contact, as well as whether a mother is interacting with her calf. Pardo and his colleagues saw that elephants would react to some calls while ignoring others. To see whether those calls that received a response were unique, they trained a machine-learning model with vocalizations that researchers had labelled as evoking a reaction. The algorithm learnt the acoustic features of those calls, and was then tasked with spotting those features in new calls and predicting the intended recipient. The computer correctly matched the calls to the recipient 27.5% of the time, “which might not seem like that much, but you have to remember that we wouldn’t expect elephants to use names in every call,” Pardo says. By contrast, a model trained with random features was right only 8% of the time. The team verified that these calls were meaningful to the elephants by playing back recordings of them and checking which animal responded.
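The logic of that test can be sketched in a few lines of Python: train a classifier to predict a call’s intended recipient from acoustic features, then compare its cross-validated accuracy with a chance baseline built from shuffled labels. Everything below, from the feature matrix to the choice of model, is an invented stand-in rather than the study’s actual code.

```python
# Sketch: does a classifier beat chance at matching calls to recipients?
# Random placeholder data stand in for the elephant recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))      # 300 calls x 20 acoustic features
y = rng.integers(0, 10, size=300)   # intended recipient (10 individuals)

model = RandomForestClassifier(n_estimators=500, random_state=0)
real_acc = cross_val_score(model, X, y, cv=5).mean()

y_shuffled = rng.permutation(y)     # break the call-recipient link
chance_acc = cross_val_score(model, X, y_shuffled, cv=5).mean()

print(f"real labels: {real_acc:.3f}, shuffled labels: {chance_acc:.3f}")
```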
David Omer, a neuroscientist at the Hebrew University of Jerusalem, Israel, did something similar with marmoset monkeys. He and his team trained a computer on the calls of the marmosets, and found that members of the same family used calls with similar acoustic features to label other marmosets2.

Pardo would like to use the same techniques to see whether he can decode any other elephant vocabulary, such as terms for locations. Elephants make certain calls when they’re trying to get their cohort moving. If some of those sounds identify movement towards a particular place, researchers might be able to pick those out with AI. To verify their findings, the researchers could then play back that call and watch where the elephants go.
In a separate study, Pardo and his co-authors recorded calls from elephants in two populations in Kenya4. The researchers then used machine learning to show that there were clear vocal differences between the two populations, as well as subtle differences between elephants in different social groups within the two populations.
That could be important information for conservationists who might be trying to keep endangered populations healthy by introducing elephants into an existing group, Pardo says. If the animals can’t understand each other, that could cause problems for the newcomer. “Understanding whether different populations of elephants could communicate with each other actually has some important practical implications,” he says.
Elephant specialists know that calls contain information about the individual making the sound, including their sex, age and physiological condition. If scientists could learn to tease out that information, they could then use passive acoustic monitoring — microphones placed around an area — to learn about a particular set of elephants. “We could potentially use that to figure out the sex ratio and age structure of an elephant population in an environment where it’s really hard to observe the elephants directly, like in a forested environment,” Pardo says. And that, in turn, would help conservationists to work out how a particular group is doing, such as whether its numbers are growing or declining.
AI can be an important tool for this type of research, says Caroline Casey, an animal behavioural ecologist at the University of California, Santa Cruz. She worries, however, that both the public and some scientists might put too much faith in the technology’s abilities. “I just don’t think it’s the magic wand that’s going to be able to illuminate the mysteries of animal communication in the way that maybe the media has projected,” she says.
Casey spent five years on her PhD thesis, in which she demonstrated that elephant seals (Mirounga spp.) give themselves names5. During fasting periods of around one month, the seals conserve energy and males avoid fighting each other for dominance. Instead, they make vocalizations that identify them as the winners of previous fights, to discourage other seals from attacking. Casey learnt this by studying spectrograms, observing the animals’ behaviour and playing back calls to the animals. She is clear that AI will not remove the need for this kind of high-quality fieldwork. However, she worries that this might be missed by funders drawn in by the allure of AI.
Casey also thinks that the value of human intuition might be overlooked. Using an AI-based classifier to interpret animals’ calls could diminish human bias in the research — a good thing, Casey acknowledges. But at the same time, she thinks that machines’ lack of understanding of the world might hinder their ability to make sense of the patterns that they uncover. “The human mind is able to integrate our understanding of our own world and the way that we operate, and use that to actually aid in the interpretation of animal behaviour,” she says. “I think it’s an advantage.”
Starting in the trees
Much of the excitement about AI over the past decade has come from the achievements of neural networks — systems built on a loose analogy of how the human brain processes information through collections of neurons. Deep learning, in which data pass through many layers of a neural network, is what allowed DeepMind’s AlphaGo program to defeat a world champion at the board game Go, and led to the creation of the chatbot ChatGPT. The sperm whale, elephant and marmoset studies, however, used earlier forms of AI known as decision trees and random forests.
A photo of Mickey Pardo smiling at the camera while recording elephant vocalizations in Kenya
Mickey Pardo listens to wild African elephants communicate. Credit: Mickey Pardo
A decision tree is a classification algorithm that works like a flow chart. It might ask, for example, whether the sound it has been given has a frequency above a certain value. If yes, it might then ask whether the call lasts longer than a certain duration, and so on, until it has decided whether the call matches the acoustic features that human-labelled data sets taught it to look for. A random forest is a collection of many decision trees, each constructed from a randomly chosen subset of the data.
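A toy example makes the flow chart concrete. The sketch below invents two features, peak frequency and duration, fits a shallow decision tree to labelled examples and prints the threshold questions it learnt; none of this is the elephant project’s actual code.

```python
# Sketch: a decision tree is a learned flow chart of threshold tests.
# Features and labels are invented for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# columns: peak frequency (Hz), call duration (s)
X = np.column_stack([rng.uniform(50, 400, 200), rng.uniform(0.1, 3.0, 200)])
y = (X[:, 0] > 200) & (X[:, 1] > 1.0)  # toy rule the tree should recover

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["peak_freq_hz", "duration_s"]))
```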
Kurt Fristrup, an evolutionary biologist at Colorado State University who wrote the random-forest algorithm for the elephant project, says that tree-based algorithms have several advantages for this kind of work. For one, they can work with less information than is required to train a neural network, and even thousands of hours of recordings of animal calls is still a relatively small data set. Furthermore, because of the way that tree-based algorithms break down the variables, they’re not likely to be thrown off by mislabelled or unlabelled data.
The random forest also provides a way to verify that similar calls match: different calls that show the same features should each end up in the same ‘leaf’ of an individual tree. “Since there were on the order of a thousand of these trees, you get a fairly fine-grained measure of how similar two calls are by how often they landed in the same leaf,” Fristrup says.
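That leaf-counting measure is easy to sketch: scikit-learn’s `apply` method reports which leaf each call reaches in every tree, so the proximity of two calls is simply the fraction of trees in which they land in the same leaf. The data below are random placeholders.

```python
# Sketch: random-forest 'proximity' between two calls, measured as the
# fraction of trees that send both calls to the same leaf. Placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))           # 100 calls x 20 acoustic features
y = rng.integers(0, 4, size=100)         # 4 hypothetical call categories

forest = RandomForestClassifier(n_estimators=1000, random_state=0).fit(X, y)
leaves = forest.apply(X)                 # shape: (n_calls, n_trees)

def proximity(i: int, j: int) -> float:
    """Fraction of trees in which calls i and j land in the same leaf."""
    return float(np.mean(leaves[i] == leaves[j]))

print(f"proximity of calls 0 and 1: {proximity(0, 1):.3f}")
```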
It is also easier to see how a random-forest algorithm came to a particular conclusion than it is with deep learning, which can produce answers that leave scientists scratching their heads about how the model reached its decision. “Deep-learning models make it possible or even easy to get all kinds of results that we can’t really get any other way,” Fristrup says. But if scientists don’t understand the reasoning behind it, they might not learn “what we would have learnt had we got into it by the older, less efficient, and less computationally intense path” of a random forest, he says.
Despite this, the ability of a neural network to generalize from a relatively small, labelled data set and discover patterns by examining large amounts of unlabelled data is appealing to many researchers.
Vittorio Baglione is seen climbing a tree to reach a crow’s nest
Behavioural ecologist Vittorio Baglione climbs to a crow’s nest. Credit: Daniela Canestrari/University of Leon
Machine-learning specialist Olivier Pietquin is the AI research director at the Earth Species Project, an international team headquartered in Berkeley, California, that is using AI to decode the communications of animal species. He wants to take advantage of neural networks’ ability to generalize from one data set to another by training models using not only a large range of sounds from different animals, but also other acoustic data, including human speech and music.
The hope is that the computer might derive some basic underlying features of sound before building on that understanding to recognize features in animal vocalizations specifically. This mirrors the way an image-recognition algorithm trained on pictures of human faces learns basic pixel patterns that describe first an oval and then an eye. The algorithm can then take those basics and recognize the face of a cat, even if human faces make up most of its training data.
“We could imagine using speech data and hope that it will transfer to any other animal that has a vocal tract and vocal cords,” Pietquin says. The whistle made by a flute, for example, might be similar enough to a bird whistle that the computer could make inferences from it.
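Schematically, that transfer looks like the PyTorch sketch below: an encoder is assumed to have been pretrained on abundant audio such as speech and music, then frozen while a small classification head is fitted to scarce animal-call labels. The architecture, tensor shapes and call types are all invented for illustration.

```python
# Schematic of transfer learning for bioacoustics. Not the Earth Species
# Project's models; shapes and layers are invented stand-ins.
import torch
import torch.nn as nn

encoder = nn.Sequential(                 # shared acoustic feature extractor
    nn.Conv1d(1, 16, kernel_size=9, stride=4), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=9, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
# Stage 1 (pretraining on speech, music and other sound) would train the
# encoder on a large corpus; here we assume that has already happened.

# Stage 2: freeze the encoder and fit a small head on animal-call labels.
for p in encoder.parameters():
    p.requires_grad = False
head = nn.Linear(32, 5)                  # 5 hypothetical call types

waveforms = torch.randn(8, 1, 16000)     # batch of 1-second clips at 16 kHz
labels = torch.randint(0, 5, (8,))
loss = nn.functional.cross_entropy(head(encoder(waveforms)), labels)
loss.backward()                          # gradients flow only to the head
```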
A model trained in this way could be useful for identifying which sounds convey information and which are just noise. Working out what the calls might mean, however, still requires a person to observe the animal’s behaviour and add labels to what the computer has identified. Detecting communicative sounds, which is what researchers are currently trying to achieve, is just a first step towards comprehending them. “Understanding is really a tough step,” Pietquin says.
Researchers at the Earth Species Project have already created one neural network, called Voxaboxen, that they are applying to the study of crow communication.
A population of carrion crows (Corvus corone) in northern Spain, unlike their counterparts elsewhere in Europe, share the responsibility of caring for their young. A group of crows will take turns defending a nest, cleaning it and caring for the chicks. “We think they must coordinate with vocal communication to do these tasks. And this was the reason why we started studying communication in crows,” says Daniela Canestrari, a behavioural ecologist at the University of Leon in Spain.
She and fellow behavioural ecologist Vittorio Baglione attach tags to the crows’ tail feathers. Each tag carries a tiny microphone, plus an accelerometer and a magnetometer that measure a bird’s movements alongside its calls. The tag collects about six days’ worth of data before falling to the ground and emitting a signal that allows researchers to retrieve it.
One of the first things the researchers discovered with their tags was that, as well as the loud ‘caws’ that can be picked up by microphones at a distance, the crows also make softer sounds that are only audible up close. That could prove to be a rich source of information, but it also increases the volume of data that the researchers have to deal with. “We couldn’t really process all this information without artificial intelligence,” Canestrari says.
Voxaboxen was trained using annotated databases of sounds — three from a variety of bird species and one from meerkats (Suricata suricatta). The neural network’s task is to tease apart all the sounds in the recordings: to find which of the sounds captured over the six days came from crows, and whether they came from the tagged crow or from a different one. Once the calls have been detected, the next step is to try to classify them into categories; this process is roughly equivalent to making lists of words. “It’s a very difficult task because the differences are sometimes very subtle,” Baglione says.
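One generic way to attempt that categorization step, assuming the calls have already been detected and cut into clips, is to summarize each call with standard acoustic features and cluster them into candidate call types. The sketch below does this with MFCC features and k-means on synthetic audio; it is an illustration, not the Voxaboxen pipeline.

```python
# Sketch: cluster detected calls into candidate 'call types'.
# Synthetic noise stands in for real crow-call clips.
import numpy as np
import librosa
from sklearn.cluster import KMeans

sr = 22050
rng = np.random.default_rng(3)
calls = [rng.normal(size=sr // 2).astype(np.float32) for _ in range(40)]

# One fixed-length feature vector per call: its mean MFCC profile.
features = np.array([
    librosa.feature.mfcc(y=call, sr=sr, n_mfcc=13).mean(axis=1)
    for call in calls
])

types = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
print("calls per candidate type:", np.bincount(types))
```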
To work out what the calls mean, the researchers use AI to match up accelerometer data with video recordings of the same birds, and look for correlations between the observed behaviour and the sounds.
Paging Doctor Dolittle
Researchers are cautious about suggesting that AI models will eventually give us the ability to talk to the animals. Pardo says that his main goal is not so much to be able to talk to wildlife and pets, but rather to learn something about their minds and how they perceive themselves and the world. The fact that some animals seem to have names, for instance, implies that they’re capable of conceiving of other individuals as entities and coming up with labels, which he says suggests that they have a sophisticated level of abstract thinking.
Project CETI's Core Whale Listening Station is seen in the water
Project CETI’s core whale-listening station in Dominica. Credit: Project CETI
It’s still an open question whether animals are even capable of more than a rudimentary level of communication, and Pardo says that there’s no agreed definition of what constitutes a language. “To call something language, I would want it to be a system that can basically be used to communicate about almost any thought, including abstract concepts,” Pardo says. “I don’t think we have evidence for that in any non-human species.” If scientists could show that animals do have that language, then they could try to work out how to communicate with them.
Even if direct communication remains out of reach, many scientists involved in this research see the improvement of conservation efforts as a major goal. Showing that animals have minds of their own can increase empathy for them. Gruber points to the work of US biologist Roger Payne, whose description in the 1960s of the complexity of humpback whale (Megaptera novaeangliae) song galvanized the movement to ban whale hunting.
Saving whales is an important goal for Gero. During his research off the coast of Dominica, he watched as a social unit of sperm whales he’d named ‘the group of seven’ shrank to only three as their companions were struck by ships or tangled in nets. If humans could communicate with whales, perhaps they could inform the giant mammals that they were in a boat lane and at risk of a collision. Or, as Gero would prefer, he could ask the whales why they need to be in a particular area, and route the boats away from it.
“I don’t want to make a whale alarm for all the bad things that humans are doing because we’re bad neighbours,” he says. “I would rather learn from the animals about how to be a good neighbour, and then do that.”
Infant mortality among sperm whales around Dominica has become so bad — around one-third of calves die, sometimes because they lose their parents — that researchers there have delayed naming the babies until they’re at least two or three years old. But they do continue to name the whales, as a reminder of the roles of each individual in its group, and the part each group plays in the overall survival of the species.
A diver swims towards a sperm whale trapped in fishing net.
Credit: Dario Romeo/Getty
Conservationists cannot, for instance, relocate whales 4,000 kilometres from the Azores to Dominica to replenish the Caribbean population and expect them to handle their new environment, Gero says. “Our science addresses these animals under the assumption that they’re uniquely important to the network of life that they are in,” he says. “So calling one ‘Fingers’ is a short form of saying we recognize that she’s different from Pinchy, and if Fingers dies, we can’t just replace her with Pinchy.”
Like Gero, Pardo is more interested in what animals have to say to us than what we might say to them. If he could talk to the elephants, he says, he would want to ask them how they feel about the way that humans treat them. “If it were possible for humans to hear from other animals in their own words, ‘Hey, stop fucking killing us’, maybe people would actually do that.”