
Meta’s new learning algorithm can teach AI to multi-task

If you can recognize a dog by sight, then you can probably recognize a dog when it is described to you in words. Not so for today’s artificial intelligence. Deep neural networks have become very good at identifying objects in photos and conversing in natural language, but not at the same time: there are AI models that excel at one or the other, but not both. 

Part of the problem is that these models learn different skills using different techniques. This is a major obstacle to the development of more general-purpose AI: machines that can multi-task and adapt. It also means that advances in deep learning for one skill often do not transfer to others.

A team at Meta AI (previously Facebook AI Research) wants to change that. The researchers have developed a single algorithm that can be used to train a neural network to recognize images, text, or speech. The algorithm, called Data2vec, not only unifies the learning process but performs at least as well as existing techniques in all three skills. “We hope it will change the way people think about doing this type of work,” says Michael Auli, a researcher at Meta AI.

The research builds on an approach known as self-supervised learning, in which neural networks learn to spot patterns in data sets by themselves, without being guided by labeled examples. This is how large language models like GPT-3 learn from vast bodies of unlabeled text scraped from the internet, and it has driven many of the recent advances in deep learning.
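
To make the idea concrete, here is a minimal PyTorch sketch of that self-supervised objective: hide one token of a toy "sentence" and train a network to predict it from the surrounding context, so the label comes from the data itself rather than from a human annotator. The vocabulary, model, and sizes are hypothetical stand-ins, not the architecture of any real language model.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
MASK_ID = 0  # reserved token id standing in for the hidden word

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),                         # (batch, 5, 32) -> (batch, 160)
    nn.Linear(5 * embed_dim, vocab_size), # score every word in the vocabulary
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

sentence = torch.tensor([[12, 47, 3, 58, 9]])  # a 5-token "sentence"
masked = sentence.clone()
masked[0, 2] = MASK_ID                         # hide the middle token

logits = model(masked)
loss = loss_fn(logits, sentence[:, 2])  # the label is the hidden token itself
loss.backward()
optimizer.step()
```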

Auli and his colleagues at Meta AI had been working on self-supervised learning for speech recognition. But when they looked at what other researchers were doing with self-supervised learning for images and text, they realized that they were all using different techniques to chase the same goals.

Data2vec uses two neural networks, a student and a teacher. First, the teacher network is trained on images, text, or speech in the usual way, learning an internal representation of this data that allows it to predict what it is seeing when shown new examples. When it is shown a photo of a dog, it recognizes it as a dog.

The twist is that the student network is then trained to predict the internal representations of the teacher. In other words, it is trained not to guess that it is looking at a photo of a dog when shown a dog, but to guess what the teacher sees when shown that image.

Because the student does not try to guess the actual image or sentence but, rather, the teacher’s representation of that image or sentence, the algorithm does not need to be tailored to a particular type of input.
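
Here is a minimal PyTorch sketch of that student-teacher setup. The encoder, sizes, and masking scheme are hypothetical placeholders; note also that in the published data2vec method the teacher is not separately pretrained but tracks the student as an exponential moving average of its weights, which the last lines illustrate.

```python
import torch
import torch.nn as nn

def encoder(dim_in: int = 64, dim_out: int = 32) -> nn.Module:
    # A tiny stand-in for a real encoder (Transformer in practice).
    return nn.Sequential(nn.Linear(dim_in, dim_out), nn.ReLU(),
                         nn.Linear(dim_out, dim_out))

student = encoder()
teacher = encoder()
teacher.load_state_dict(student.state_dict())  # start from the same weights
for p in teacher.parameters():
    p.requires_grad = False                    # the teacher only provides targets

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(8, 64)              # a batch of inputs: image patches,
                                    # text tokens, or speech frames alike
mask = torch.rand_like(x) < 0.15    # hide roughly 15% of each input
with torch.no_grad():
    target = teacher(x)             # representation of the *full* input

pred = student(x.masked_fill(mask, 0.0))     # the student sees masked input
loss = nn.functional.mse_loss(pred, target)  # match representations,
loss.backward()                              # not raw pixels or words
optimizer.step()

# In data2vec the teacher then tracks the student via an exponential
# moving average of the student's weights:
tau = 0.999
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)
```

Because the loss is computed in representation space, the same training loop applies whether `x` holds pixels, tokens, or audio frames, which is the point of the method.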

Data2vec is part of a big trend in AI toward models that can learn to understand the world in more than one way. “It’s a clever idea,” says Ani Kembhavi at the Allen Institute for AI in Seattle, who works on vision and language. “It’s a promising advance when it comes to generalized systems for learning.”

An important caveat is that although the same learning algorithm can be used for different skills, it can only learn one skill at a time. Once it has learned to recognize images, it must start from scratch to learn to recognize speech. Giving an AI multiple skills at once is hard, but that’s something the Meta AI team wants to look at next.  

The researchers were surprised to find that their approach actually performed better than existing techniques at recognizing images and speech, and performed as well as leading language models on text understanding.

Mark Zuckerberg is already dreaming up potential metaverse applications. “This will all eventually get built into AR glasses with an AI assistant,” he posted to Facebook today. “It could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”

For Auli, the main takeaway is that researchers should step out of their silos. “Hey, you don’t need to focus on one thing,” he says. “If you have a good idea, it might actually help across the board.”


AI Analysis of Bird Songs Helping Scientists Study Bird Populations and Movements 

By AI Trends Staff  

A study of bird songs conducted in the Sierra Nevada mountain range in California generated a million hours of audio. AI researchers are working to decode that audio to learn how birds responded to wildfires in the region and which measures helped the birds rebound more quickly.

Scientists can also use the soundscape to help track shifts in migration timing and population ranges, according to a recent account in Scientific American. More audio data is coming in from other research as well, with sound-based projects underway to count insects and to study the effects of light and noise pollution on bird communities.


“Audio data is a real treasure trove because it contains vast amounts of information,” stated ecologist Connor Wood, a Cornell University postdoctoral researcher who is leading the Sierra Nevada project. “We just need to think creatively about how to share and access that information.” AI is helping: the latest generation of machine-learning systems can identify animal species from their calls and process thousands of hours of data in less than a day.
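
A hypothetical sketch of how such a system chews through long recordings, in PyTorch: slice the audio into short windows, turn each window into a spectrogram, and run a classifier over every window. The untrained linear classifier here is a stand-in for a trained model; the sample rate, window length, and species count are assumptions for illustration.

```python
import torch
import torch.nn as nn

SAMPLE_RATE = 22_050                    # samples per second (an assumption)
WINDOW = 3 * SAMPLE_RATE                # classify three-second chunks
N_SPECIES = 50                          # hypothetical label set

classifier = nn.Linear(257, N_SPECIES)  # untrained stand-in for a real model

def detect(audio: torch.Tensor) -> list:
    """Return one predicted species id per three-second window."""
    predictions = []
    for start in range(0, len(audio) - WINDOW + 1, WINDOW):
        chunk = audio[start:start + WINDOW]
        # Short-time Fourier transform -> magnitude spectrogram, then
        # average over time for a simple fixed-size feature vector.
        spec = torch.stft(chunk, n_fft=512,
                          window=torch.hann_window(512),
                          return_complex=True).abs()
        feature = spec.mean(dim=1)                # shape: (257,)
        predictions.append(classifier(feature).argmax().item())
    return predictions

audio = torch.randn(60 * SAMPLE_RATE)   # placeholder one-minute recording
print(len(detect(audio)), "windows classified")
```

Because each window is scored independently, the work parallelizes trivially across machines, which is what makes "thousands of hours in less than a day" plausible.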

Laurel Symes, assistant director of the Cornell Lab of Ornithology’s Center for Conservation Bioacoustics, is studying acoustic communication in animals, including crickets, frogs, bats, and birds. She has compiled many months of recordings of katydids (famously vocal long-horned grasshoppers that are an essential part of the food web) in the rain forests of central Panama. Patterns of breeding activity and seasonal population variation are hidden in this audio, but analyzing it is enormously time-consuming.  


“Machine learning has been the big game changer for us,” Symes stated to Scientific American.  

It took Symes and three of her colleagues 600 hours of work to classify various katydid species from just 10 recorded hours of sound. But a machine-learning algorithm her team is developing, called KatydID, performed the same task while its human creators “went out for a beer,” Symes stated.  

BirdNET, a popular avian-sound-recognition system available today, will be used by Wood’s team to analyze the Sierra Nevada recordings. BirdNET was built by Stefan Kahl, a machine learning scientist at Cornell’s Center for Conservation Bioacoustics and Chemnitz University of Technology in Germany. Other researchers are using BirdNET to document the effects of light and noise pollution on bird songs at dawn in France’s Brière Regional Natural Park.  

Bird calls are complex and varied. “You need much more than just signatures to identify the species,” Kahl stated. Many birds have more than one song, and many have regional “dialects”: a white-crowned sparrow from Washington State can sound very different from its Californian cousin. Machine-learning systems can pick out these differences. “Let’s say there’s an as yet unreleased Beatles song that is put out today. You’ve never heard the melody or the lyrics before, but you know it’s a Beatles song because that’s what they sound like,” Kahl stated. “That’s what these programs learn to do, too.”

BirdVox Combines Study of Bird Songs and Music  

Music recognition research is now crossing over into bird song research, with BirdVox, a collaboration between the Cornell Lab of Ornithology and NYU’s Music and Audio Research Laboratory. BirdVox aims to investigate machine listening techniques for the automatic detection and classification of free-flying bird species from their vocalizations, according to a blog post at NYU.  

The researchers behind BirdVox hope to deploy a network of acoustic sensing devices for real-time monitoring of seasonal bird migratory patterns, in particular to determine the precise timing of passage for each species.

Current bird migration monitoring tools rely on information from weather surveillance radar, which provides insight into the density, direction, and speed of bird movements, but not into the species migrating. Crowdsourced human observations are made almost exclusively during daytime hours; they are of limited use for studying nocturnal migratory flights, the researchers indicated.   

Automatic bioacoustic analysis is seen as a scalable complement to these methods, one able to produce species-specific information. Such techniques have wide-ranging implications in the field of ecology for understanding biodiversity and monitoring migrating species in areas with buildings, planes, communication towers, and wind turbines, the researchers observed.

Duke University Researchers Using Drones to Monitor Seabird Colonies  

Elsewhere in bird research, a team from Duke University and the Wildlife Conservation Society (WCS) is using drones and a deep learning algorithm to monitor large colonies of seabirds. The team is analyzing more than 10,000 drone images of mixed colonies of seabirds in the Falkland Islands off Argentina’s coast, according to a press release from Duke University.  

The Falklands, also known as the Malvinas, are home to the world’s largest colonies of black-browed albatrosses (Thalassarche melanophris) and second-largest colonies of southern rockhopper penguins (Eudyptes c. chrysocome). Hundreds of thousands of birds breed on the islands in densely interspersed groups. 

The deep-learning algorithm correctly identified and counted the albatrosses with 97% accuracy and the penguins with 87% accuracy, the team reported. Overall, the automated counts were within 5% of human counts about 90% of the time.

“Using drone surveys and deep learning gives us an alternative that is remarkably accurate, less disruptive, and significantly easier. One person, or a small team, can do it, and the equipment you need to do it isn’t all that costly or complicated,” stated Madeline C. Hayes, a remote sensing analyst at the Duke University Marine Lab, who led the study. 

Before this new method was available, teams of scientists monitoring the colonies, located on two rocky, uninhabited outer islands, would count the number of each species they could observe on a portion of the island and extrapolate those numbers to estimate the population of the whole colony. Counts often had to be repeated to improve accuracy, a laborious process, and the presence of scientists could disrupt the birds’ breeding and parenting behavior.

WCS scientists used an off-the-shelf consumer drone to collect more than 10,000 individual photos, which Hayes converted into a large-scale composite image using image-processing software. She then analyzed the image with a convolutional neural network (CNN), a type of AI that employs a deep-learning algorithm to differentiate and count the objects it “sees”, in this case two species of birds: penguins and albatrosses. The data was used to create comprehensive estimates of the total number of birds in the colonies.


“A CNN is loosely modeled on the human neural network, in that it learns from experience,” stated David W. Johnston, director of the Duke Marine Robotics and Remote Sensing Lab. “You train the computer to pick up on different visual patterns, like those made by black-browed albatrosses or southern rockhopper penguins in sample images, and over time it learns how to identify the objects forming those patterns in other images such as our composite photo.” 
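
As a rough illustration of that idea, here is a hypothetical PyTorch sketch that slides a tiny untrained CNN over a composite image tile by tile and tallies the per-class predictions. Real pipelines would train the network on labeled sample images and typically use object detection or density estimation rather than simple tile classification, but the pattern-then-count logic is the same in spirit. The class list and tile size are assumptions.

```python
import torch
import torch.nn as nn

CLASSES = ["background", "penguin", "albatross"]
TILE = 64                                   # pixels per square tile (assumed)

cnn = nn.Sequential(                        # tiny stand-in for the real CNN
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * (TILE // 4) ** 2, len(CLASSES)),
)

def count_birds(composite: torch.Tensor) -> dict:
    """Slide over the composite image tile by tile and tally predictions."""
    counts = {name: 0 for name in CLASSES}
    _, h, w = composite.shape
    for top in range(0, h - TILE + 1, TILE):
        for left in range(0, w - TILE + 1, TILE):
            tile = composite[:, top:top + TILE, left:left + TILE]
            label = cnn(tile.unsqueeze(0)).argmax().item()
            counts[CLASSES[label]] += 1
    counts.pop("background")                # only the birds are of interest
    return counts

composite = torch.rand(3, 512, 512)         # placeholder for the drone mosaic
print(count_birds(composite))
```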

Johnston, who is also associate professor of the practice of marine conservation ecology at Duke’s Nicholas School of the Environment, said the emerging drone- and CNN-enabled approach is widely applicable “and greatly increases our ability to monitor the size and health of seabird colonies worldwide, and the health of the marine ecosystems they inhabit.” 

Read the source articles and information in Scientific American, in a blog post at NYU, and in a press release from Duke University.