Categories
Orion News

Cryptius Corporation Merges with Orion Innovations

We are incredibly pleased to announce that Orion Innovations has acquired the Cryptius Corporation in an all-stock deal. The deal comes on the heels of strong performance by Cryptius on a variety of Orion-led projects in the commercial and defense spaces. Both Orion and Cryptius specialize in building end-to-end Artificial Intelligence, Machine Learning, and Data Science applications across a variety of verticals. The deal was agreed to in principle by Marc Asselin, CEO of Orion Innovations, and Stephen Plainte, CEO and Data Scientist at Cryptius, and will close in the coming weeks.

“There is an incredible synergy that exists between our two companies. We realized very early on that it just made more sense to tackle large, complex projects together to deliver even more value for our commercial and government clients,” said Mr. Plainte. He continued, “Additionally, Cryptius has a number of customers that would benefit greatly from this synergy, and we couldn’t be more excited to introduce them to the new team.”

Mr. Asselin shared a similar take on the merger, saying, “From the beginning, the Cryptius team has been an incredible asset and resource to us. Their corporate culture matches incredibly well with ours, and their technical skills really complement our own. We are really excited and blessed at the opportunity to integrate the Cryptius team with Orion.”

Orion was founded in 2008 by Mr. Asselin after decades of success as CTO for companies in many different verticals. Mr. Plainte founded Cryptius in May of 2021 with the goal of providing jobs to highly talented technologists that come from non-traditional backgrounds, and to serve the traditionally underserved SMB segment with advanced technology from the AI industry.

The new Orion executive team is rounded out by Mr. Asselin as CEO, Mr. Plainte as CTO, Maria Morales as COO, Patrick Mills as Chief Compliance Officer, John Riley III as VP of Government Services, and Mike Phillips as VP of Commercial Services.

For press inquiries:

hi@goorion.com

561-900-3712

Categories
Artificial Intelligence

What the history of AI tells us about its future

On May 11, 1997, Garry Kasparov fidgeted in his plush leather chair in the Equitable Center in Manhattan, anxiously running his hands through his hair. It was the final game of his match against IBM’s Deep Blue supercomputer—a crucial tiebreaker in the showdown between human and silicon—and things were not going well. Aquiver with self-recrimination after making a deep blunder early in the game, Kasparov was boxed into a corner.

A high-level chess game usually takes at least four hours, but Kasparov realized he was doomed before an hour was up. He announced he was resigning—and leaned over the chessboard to stiffly shake the hand of Joseph Hoane, an IBM engineer who helped develop Deep Blue and had been moving the computer’s pieces around the board.

Then Kasparov lurched out of his chair to walk toward the audience. He shrugged haplessly. At its finest moment, he later said, the machine “played like a god.”

For anyone interested in artificial intelligence, the grand master’s defeat rang like a bell. Newsweek called the match “The Brain’s Last Stand”; another headline dubbed Kasparov “the defender of humanity.” If AI could beat the world’s sharpest chess mind, it seemed that computers would soon trounce humans at everything—with IBM leading the way.

That isn’t what happened, of course. Indeed, when we look back now, 25 years later, we can see that Deep Blue’s victory wasn’t so much a triumph of AI but a kind of death knell. It was a high-water mark for old-school computer intelligence, the laborious handcrafting of endless lines of code, which would soon be eclipsed by a rival form of AI: the neural net—in particular, the technique known as “deep learning.” For all the weight it threw around, Deep Blue was the lumbering dinosaur about to be killed by an asteroid; neural nets were the little mammals that would survive and transform the planet. Yet even today, deep into a world chock-full of everyday AI, computer scientists are still arguing whether machines will ever truly “think.” And when it comes to answering that question, Deep Blue may get the last laugh.

When IBM began work to create Deep Blue in 1989, AI was in a funk. The field had been through multiple roller-coaster cycles of giddy hype and humiliating collapse. The pioneers of the ’50s had claimed that AI would soon see huge advances; mathematician Claude Shannon predicted that “within a matter of ten or fifteen years, something will emerge from the laboratories which is not too far from the robot of science fiction.” This didn’t happen. And each time inventors failed to deliver, investors felt burned and stopped funding new projects, creating an “AI winter” in the ’70s and again in the ’80s.

The reason they failed—we now know—is that AI creators were trying to handle the messiness of everyday life using pure logic. That’s how they imagined humans did it. And so engineers would patiently write out a rule for every decision their AI needed to make.

The problem is, the real world is far too fuzzy and nuanced to be managed this way. Engineers carefully crafted their clockwork masterpieces—or “expert systems,” as they were called—and they’d work reasonably well until reality threw them a curveball. A credit card company, say, might make a system to automatically approve credit applications, only to discover they’d issued cards to dogs or 13-year-olds. The programmers never imagined that minors or pets would apply for a card, so they’d never written rules to accommodate those edge cases.  Such systems couldn’t learn a new rule on their own.

AI built via handcrafted rules was “brittle”: when it encountered a weird situation, it broke. By the early ’90s, troubles with expert systems had brought on another AI winter.

“A lot of the conversation around AI was like, ‘Come on. This is just hype,’” says Oren Etzioni, CEO of the Allen Institute for AI in Seattle, who back then was a young professor of computer science beginning a career in AI.

In that landscape of cynicism, Deep Blue arrived like a weirdly ambitious moonshot.

The project grew out of work on Deep Thought, a chess-playing computer built at Carnegie Mellon by Murray Campbell, Feng-hsiung Hsu, and others. Deep Thought was awfully good; in 1988, it became the first chess AI to beat a grand master, Bent Larsen. The Carnegie Mellon team had figured out better algorithms for assessing chess moves, and they’d also created custom hardware that speedily crunched through them. (The name “Deep Thought” came from the laughably delphic AI in The Hitchhiker’s Guide to the Galaxy—which, when asked the meaning of life, arrived at the answer “42.”)

IBM got wind of Deep Thought and decided it would mount a “grand challenge,” building a computer so good it could beat any human. In 1989 it hired Hsu and Campbell, and tasked them with besting the world’s top grand master. Chess had long been, in AI circles, symbolically potent—two opponents facing each other on the astral plane of pure thought. It’d certainly generate headlines if they could trounce Kasparov.

To build Deep Blue, Campbell and his team had to craft new chips for calculating chess positions even more rapidly, and hire grand masters to help improve algorithms for assessing the next moves. Efficiency mattered: there are more possible chess games than atoms in the universe, and even a supercomputer couldn’t ponder all of them in a reasonable amount of time. To play chess, Deep Blue would peer a move ahead, calculate possible moves from there, “prune” ones that seemed unpromising, go deeper along the promising paths, and repeat the process several times. 
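
The article doesn’t spell out Deep Blue’s internals, but the strategy it describes here (look ahead, prune unpromising lines, search deeper along the promising ones) is essentially depth-limited minimax search with alpha-beta pruning. Below is a minimal Python sketch of that general technique; the evaluation function and move generator are toy placeholders, not Deep Blue’s actual chess logic or hardware.

```python
# Minimal sketch of depth-limited minimax search with alpha-beta pruning,
# the general technique behind the "prune and go deeper" strategy described
# above. `evaluate` and `legal_moves` are toy placeholders, not Deep Blue's
# actual evaluation function or move generator.

def evaluate(position):
    """Heuristic score of a position from the maximizing player's view."""
    return position.get("score", 0)  # placeholder evaluation

def legal_moves(position):
    """Return (move, resulting_position) pairs; placeholder move generator."""
    return position.get("children", [])

def alphabeta(position, depth, alpha=float("-inf"), beta=float("inf"),
              maximizing=True):
    moves = legal_moves(position)
    if depth == 0 or not moves:          # search horizon or no moves left
        return evaluate(position)

    if maximizing:
        best = float("-inf")
        for _, child in moves:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:            # prune: opponent never allows this line
                break
        return best
    else:
        best = float("inf")
        for _, child in moves:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:            # prune
                break
        return best

# Toy usage: a root position with two candidate moves scored at the leaves.
root = {"children": [("Nf3", {"score": 3}), ("e4", {"score": 5})]}
print(alphabeta(root, depth=1))          # -> 5, the better of the two lines
```

The pruning is what keeps the astronomical number of possible games tractable; by 1997, Deep Blue was running a search of this kind over some 200 million positions per second on its custom chips.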

“We thought it would take five years—it actually took a little more than six,” Campbell says. By 1996, IBM decided it was finally ready to face Kasparov, and it set a match for February. Campbell and his team were still frantically rushing to finish Deep Blue: “The system had only been working for a few weeks before we actually got on the stage,” he says. 

It showed. Although Deep Blue won one game, Kasparov won three and took the match. IBM asked for a rematch, and Campbell’s team spent the next year building even faster hardware. By the time they’d completed their improvements, Deep Blue was made of 30 PowerPC processors and 480 custom chess chips; they’d also hired more grand masters—four or five at any given point in time—to help craft better algorithms for parsing chess positions. When Kasparov and Deep Blue met again, in May 1997, the computer was twice as speedy, assessing 200 million chess moves per second. 

Even so, IBM still wasn’t confident of victory, Campbell remembers: “We expected a draw.”

The reality was considerably more dramatic. Kasparov dominated in the first game. But on its 36th move of the second game, Deep Blue did something Kasparov did not expect.

He was accustomed to the way computers traditionally played chess, a style born from machines’ sheer brute-force abilities. They were better than humans at short-term tactics; Deep Blue could easily deduce the best choice a few moves out.

But what computers were bad at, traditionally, was strategy—the ability to ponder the shape of a game many, many moves in the future. That’s where humans still had the edge. 

Or so Kasparov thought, until Deep Blue’s move in game 2 rattled him. It seemed so sophisticated that Kasparov began worrying: maybe the machine was far better than he’d thought! Convinced he had no way to win, he resigned the second game.

But he shouldn’t have. Deep Blue, it turns out, wasn’t actually that good. Kasparov had failed to spot a move that would have let the game end in a draw. He was psyching himself out: worried that the machine might be far more powerful than it really was, he had begun to see human-like reasoning where none existed. 

Knocked off his rhythm, Kasparov kept playing worse and worse. He psyched himself out over and over again. Early in the sixth, winner-takes-all game, he made a move so lousy that chess observers cried out in shock. “I was not in the mood of playing at all,” he later said at a press conference.

IBM benefited from its moonshot. In the press frenzy that followed Deep Blue’s success, the company’s market cap rose $11.4 billion in a single week. Even more significant, though, was that IBM’s triumph felt like a thaw in the long AI winter. If chess could be conquered, what was next? The public’s mind reeled.

“That,” Campbell tells me, “is what got people paying attention.”

The truth is, it wasn’t surprising that a computer beat Kasparov. Most people who’d been paying attention to AI—and to chess—expected it to happen eventually.

Chess may seem like the acme of human thought, but it’s not. Indeed, it’s a mental task that’s quite amenable to brute-force computation: the rules are clear, there’s no hidden information, and a computer doesn’t even need to keep track of what happened in previous moves. It just assesses the position of the pieces right now.

Everyone knew that once computers got fast enough, they’d overwhelm a human. It was just a question of when. By the mid-’90s, “the writing was already on the wall, in a sense,” says Demis Hassabis, head of the AI company DeepMind, part of Alphabet.

Deep Blue’s victory was the moment that showed just how limited hand-coded systems could be. IBM had spent years and millions of dollars developing a computer to play chess. But it couldn’t do anything else. 

“It didn’t lead to the breakthroughs that allowed the [Deep Blue] AI to have a huge impact on the world,” Campbell says. They didn’t really discover any principles of intelligence, because the real world doesn’t resemble chess. “There are very few problems out there where, as with chess, you have all the information you could possibly need to make the right decision,” Campbell adds. “Most of the time there are unknowns. There’s randomness.”

But even as Deep Blue was mopping the floor with Kasparov, a handful of scrappy upstarts were tinkering with a radically more promising form of AI: the neural net. 

With neural nets, the idea was not, as with expert systems, to patiently write rules for each decision an AI will make. Instead, training and reinforcement strengthen internal connections in rough emulation (as the theory goes) of how the human brain learns. 

1997: After Garry Kasparov beat Deep Blue in 1996, IBM asked the world chess champion for a rematch, which was held in New York City with an upgraded machine.

The idea had existed since the ’50s. But training a usefully large neural net required lightning-fast computers, tons of memory, and lots of data. None of that was readily available then. Even into the ’90s, neural nets were considered a waste of time.

“Back then, most people in AI thought neural nets were just rubbish,” says Geoff Hinton, an emeritus computer science professor at the University of Toronto, and a pioneer in the field. “I was called a ‘true believer’”—not a compliment. 

But by the 2000s, the computer industry was evolving to make neural nets viable. Video-game players’ lust for ever-better graphics created a huge industry in ultrafast graphic-processing units, which turned out to be perfectly suited for neural-net math. Meanwhile, the internet was exploding, producing a torrent of pictures and text that could be used to train the systems.

By the early 2010s, these technical leaps were allowing Hinton and his crew of true believers to take neural nets to new heights. They could now create networks with many layers of neurons (which is what the “deep” in “deep learning” means). In 2012, his team handily won the annual ImageNet competition, in which AIs compete to recognize elements in pictures. It stunned the world of computer science: self-learning machines were finally viable.

Ten years into the deep-learning revolution, neural nets and their pattern-recognizing abilities have colonized every nook of daily life. They help Gmail autocomplete your sentences, help banks detect fraud, let photo apps automatically recognize faces, and—in the case of OpenAI’s GPT-3 and DeepMind’s Gopher—write long, human-sounding essays and summarize texts. They’re even changing how science is done; in 2020, DeepMind debuted AlphaFold2, an AI that can predict how proteins will fold—a superhuman skill that can help guide researchers to develop new drugs and treatments.

Meanwhile Deep Blue vanished, leaving no useful inventions in its wake. Chess playing, it turns out, wasn’t a computer skill that was needed in everyday life. “What Deep Blue in the end showed was the shortcomings of trying to handcraft everything,” says DeepMind founder Hassabis.

IBM tried to remedy the situation with Watson, another specialized system, this one designed to tackle a more practical problem: getting a machine to answer questions. It used statistical analysis of massive amounts of text to achieve language comprehension that was, for its time, cutting-edge. It was more than a simple if-then system. But Watson faced unlucky timing: it was eclipsed only a few years later by the revolution in deep learning, which brought in a generation of language-crunching models far more nuanced than Watson’s statistical techniques.

Deep learning has run roughshod over old-school AI precisely because “pattern recognition is incredibly powerful,” says Daphne Koller, a former Stanford professor who founded and runs Insitro, which uses neural nets and other forms of machine learning to investigate novel drug treatments. The flexibility of neural nets—the wide variety of ways pattern recognition can be used—is the reason there hasn’t yet been another AI winter. “Machine learning has actually delivered value,” she says, which is something the “previous waves of exuberance” in AI never did.

The inverted fortunes of Deep Blue and neural nets show how bad we were, for so long, at judging what’s hard—and what’s valuable—in AI. 

For decades, people assumed mastering chess would be important because, well, chess is hard for humans to play at a high level. But chess turned out to be fairly easy for computers to master, because it’s so logical.

What was far harder for computers to learn was the casual, unconscious mental work that humans do—like conducting a lively conversation, piloting a car through traffic, or reading the emotional state of a friend. We do these things so effortlessly that we rarely realize how tricky they are, and how much fuzzy, grayscale judgment they require. Deep learning’s great utility has come from being able to capture small bits of this subtle, unheralded human intelligence.

Still, there’s no final victory in artificial intelligence. Deep learning may be riding high now—but it’s amassing sharp critiques, too.

“For a very long time, there was this techno-chauvinist enthusiasm that okay, AI is going to solve every problem!” says Meredith Broussard, a programmer turned journalism professor at New York University and author of Artificial Unintelligence. But as she and other critics have pointed out, deep-learning systems are often trained on biased data—and absorb those biases. The computer scientists Joy Buolamwini and Timnit Gebru discovered that three commercially available visual AI systems were terrible at analyzing the faces of darker-skinned women. Amazon trained an AI to vet résumés, only to find it downranked women.

Though computer scientists and many AI engineers are now aware of these bias problems, they’re not always sure how to deal with them. On top of that, neural nets are also “massive black boxes,” says Daniela Rus, a veteran of AI who currently runs MIT’s Computer Science and Artificial Intelligence Laboratory. Once a neural net is trained, its mechanics are not easily understood even by its creator. It is not clear how it comes to its conclusions—or how it will fail.

It may not be a problem, Rus figures, to rely on a black box for a task that isn’t “safety critical.” But what about a higher-stakes job, like autonomous driving? “It’s actually quite remarkable that we could put so much trust and faith in them,” she says. 

This is where Deep Blue had an advantage. The old-school style of handcrafted rules may have been brittle, but it was comprehensible. The machine was complex—but it wasn’t a mystery.

Ironically, that old style of programming might stage something of a comeback as engineers and computer scientists grapple with the limits of pattern matching.  

Language generators, like OpenAI’s GPT-3 or DeepMind’s Gopher, can take a few sentences you’ve written and keep on going, writing pages and pages of plausible-­sounding prose. But despite some impressive mimicry, Gopher “still doesn’t really understand what it’s saying,” Hassabis says. “Not in a true sense.”

Similarly, visual AI can make terrible mistakes when it encounters an edge case. Self-driving cars have slammed into fire trucks parked on highways, because in all the millions of hours of video they’d been trained on, they’d never encountered that situation. Neural nets have, in their own way, a version of the “brittleness” problem. 

What AI really needs in order to move forward, as many computer scientists now suspect, is the ability to know facts about the world—and to reason about them. A self-driving car cannot rely only on pattern matching. It also has to have common sense—to know what a fire truck is, and why seeing one parked on a highway would signify danger. 

The problem is, no one knows quite how to build neural nets that can reason or use common sense. Gary Marcus, a cognitive scientist and coauthor of Rebooting AI, suspects that the future of AI will require a “hybrid” approach—neural nets to learn patterns, but guided by some old-fashioned, hand-coded logic. This would, in a sense, merge the benefits of Deep Blue with the benefits of deep learning.

Hard-core aficionados of deep learning disagree. Hinton believes neural networks should, in the long run, be perfectly capable of reasoning. After all, humans do it, “and the brain’s a neural network.” Using hand-coded logic strikes him as bonkers; it’d run into the problem of all expert systems, which is that you can never anticipate all the common sense you’d want to give to a machine. The way forward, Hinton says, is to keep innovating on neural nets—to explore new architectures and new learning algorithms that more accurately mimic how the human brain itself works.

Computer scientists are dabbling in a variety of approaches. At IBM, Deep Blue developer Campbell is working on “neuro-symbolic” AI that works a bit the way Marcus proposes. Etzioni’s lab is attempting to build common-sense modules for AI that include both trained neural nets and traditional computer logic; as yet, though, it’s early days. The future may look less like an absolute victory for either Deep Blue or neural nets, and more like a Frankensteinian approach—the two stitched together.

Given that AI is likely here to stay, how will we humans live with it? Will we ultimately be defeated, like Kasparov with Deep Blue, by AIs so much better at “thinking work” that we can’t compete?

Kasparov himself doesn’t think so. Not long after his loss to Deep Blue, he decided that fighting against an AI made no sense. The machine “thought” in a fundamentally inhuman fashion, using brute-force math. It would always have better tactical, short-term power. 

So why compete? Instead, why not collaborate? 

After the Deep Blue match, Kasparov invented “advanced chess,” where humans and silicon work together. A human plays against another human—but each also wields a laptop running chess software, to help war-game possible moves. 

When Kasparov began running advanced chess matches in 1998, he quickly discovered fascinating differences in the game. Interestingly, amateurs punched above their weight. In one human-with-laptop match in 2005, a pair of them won the top prize—beating out several grand masters. 

How could they best superior chess minds? Because the amateurs better understood how to collaborate with the machine. They knew how to rapidly explore ideas, when to accept a machine suggestion and when to ignore it. (Some leagues still hold advanced chess tournaments today.)

This, Kasparov argues, is precisely how we ought to approach the emerging world of neural nets. 

“The future,” he told me in an email, lies in “finding ways to combine human and machine intelligences to reach new heights, and to do things neither could do alone.” 

Neural nets behave differently from chess engines, of course. But many luminaries agree strongly with Kasparov’s vision of human-AI collaboration. DeepMind’s Hassabis sees AI as a way forward for science, one that will guide humans toward new breakthroughs. 

“I think we’re going to see a huge flourishing,” he says, “where we will start seeing Nobel Prize–winning-level challenges in science being knocked down one after the other.” Koller’s firm Insitro is similarly using AI as a collaborative tool for researchers. “We’re playing a hybrid human-machine game,” she says.

Will there come a time when we can build AI so human-like in its reasoning that humans really do have less to offer—and AI takes over all thinking? Possibly. But even these scientists, on the cutting edge, can’t predict when that will happen, if ever.

So consider this Deep Blue’s final gift, 25 years after its famous match. In his defeat, Kasparov spied the real endgame for AI and humans. “We will increasingly become managers of algorithms,” he told me, “and use them to boost our creative output—our adventuresome souls.”

Clive Thompson is a science and technology journalist based in New York City and author of Coders: The Making of a New Tribe and the Remaking of the World.

Categories
Artificial Intelligence OpenAI

DeepMind says its new language model can beat others 25 times its size

In the two years since OpenAI released its language model GPT-3, most big-name AI labs have developed language mimics of their own. Google, Facebook, and Microsoft—as well as a handful of Chinese firms—have all built AIs that can generate convincing text, chat with humans, answer questions, and more. 

Known as large language models because of the massive size of the neural networks underpinning them, they have become a dominant trend in AI, showcasing both its strengths—the remarkable ability of machines to use language—and its weaknesses, particularly AI’s inherent biases and the unsustainable amount of computing power it can consume.

Until now, DeepMind has been conspicuous by its absence. But this week the UK-based company, which has been behind some of the most impressive achievements in AI, including AlphaZero and AlphaFold, is entering the discussion with three large studies on language models. DeepMind’s main result is an AI with a twist: it’s enhanced with an external memory in the form of a vast database containing passages of text, which it uses as a kind of cheat sheet when generating new sentences.

Called RETRO (for “Retrieval-Enhanced Transformer”), the AI matches the performance of neural networks 25 times its size, cutting the time and cost needed to train very large models. The researchers also claim that the database makes it easier to analyze what the AI has learned, which could help with filtering out bias and toxic language.  

“Being able to look things up on the fly instead of having to memorize everything can often be useful, in the same way as it is for humans,” says Jack Rae at DeepMind, who leads the firm’s research in large language models.

Language models generate text by predicting what words come next in a sentence or conversation. The larger a model, the more information about the world it can learn during training, which makes its predictions better. GPT-3 has 175 billion parameters—the values in a neural network that store data and get adjusted as the model learns. Microsoft’s language model Megatron has 530 billion parameters. But large models also take vast amounts of computing power to train, putting them out of reach of all but the richest organizations.

With RETRO, DeepMind has tried to cut the cost of training without reducing the amount the AI learns. The researchers trained the model on a vast data set of news articles, Wikipedia pages, books, and text from GitHub, an online code repository. The data set contains text in 10 languages, including English, Spanish, German, French, Russian, Chinese, Swahili, and Urdu.

RETRO’s neural network has only 7 billion parameters. But the system makes up for this with a database containing around 2 trillion passages of text. Both the database and the neural network are trained at the same time.

When RETRO generates text, it uses the database to look up and compare passages similar to the one it is writing, which makes its predictions more accurate. Outsourcing some of the neural network’s memory to the database lets RETRO do more with less.
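
DeepMind’s code isn’t reproduced here, so the following is only an illustrative sketch of the retrieval step described above: embed the passage being written, find the most similar passages in an external database, and hand them to the model as extra context. The hash-based embedding and tiny in-memory database are toy stand-ins for RETRO’s actual retriever and its two-trillion-passage store.

```python
# Illustrative sketch of retrieval-augmented generation: embed the text being
# generated, find the most similar passages in an external database, and use
# them as extra context. This is a toy nearest-neighbour lookup, not
# DeepMind's RETRO implementation.

import numpy as np

def embed(text, dim=64):
    """Placeholder embedding: hash words into a fixed-size vector."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        rng = np.random.default_rng(abs(hash(word)) % (2**32))
        vec += rng.normal(size=dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class RetrievalDatabase:
    def __init__(self, passages):
        self.passages = passages
        self.vectors = np.stack([embed(p) for p in passages])

    def nearest(self, query, k=2):
        scores = self.vectors @ embed(query)     # cosine-style similarity
        top = np.argsort(scores)[::-1][:k]
        return [self.passages[i] for i in top]

db = RetrievalDatabase([
    "Emma Raducanu won the 2021 US Open women's singles title.",
    "The Falkland Islands host large colonies of albatrosses.",
    "GPT-3 has 175 billion parameters.",
])

context = "Who won the US Open in 2021?"
retrieved = db.nearest(context, k=1)
# A real system feeds the retrieved passages into the model's attention
# layers; here we simply show the augmented context a generator could use.
augmented_prompt = "\n".join(retrieved) + "\n" + context
print(augmented_prompt)
```

Because the passages live outside the neural network, they can be inspected, filtered, or updated without retraining, which is the property the researchers point to below.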

The idea isn’t new, but this is the first time a look-up system has been developed for a large language model, and the first time the results from this approach have been shown to rival the performance of the best language AIs around.

Bigger isn’t always better

RETRO draws from two other studies released by DeepMind this week, one looking at how the size of a model affects its performance and one looking at the potential harms caused by these AIs.

To study size, DeepMind built a large language model called Gopher, with 280 billion parameters. It beat state-of-the-art models on 82% of the more than 150 common language challenges they used for testing. The researchers then pitted it against RETRO and found that the 7-billion-parameter model matched Gopher’s performance on most tasks.

The ethics study is a comprehensive survey of well-known problems inherent in large language models. These models pick up biases, misinformation, and toxic language such as hate speech from the articles and books they are trained on. As a result, they sometimes spit out harmful statements, mindlessly mirroring what they have encountered in the training text without knowing what it means. “Even a model that perfectly mimicked the data would be biased,” says Rae.

According to DeepMind, RETRO could help address this issue because it is easier to see what the AI has learned by examining the database than by studying the neural network. In theory, this could allow examples of harmful language to be filtered out or balanced with non-harmful examples. But DeepMind has not yet tested this claim. “It’s not a fully resolved problem, and work is ongoing to address these challenges,” says Laura Weidinger, a research scientist at DeepMind.

The database can also be updated without retraining the neural network. This means that new information, such as who won the US Open, can be added quickly—and out-of-date or false information removed.

Systems like RETRO are more transparent than black-box models like GPT-3, says Devendra Sachan, a PhD student at McGill University in Canada. “But this is not a guarantee that it will prevent toxicity and bias.” Sachan developed a forerunner of RETRO in a previous collaboration with DeepMind, but he was not involved in this latest work.

For Sachan, fixing the harmful behavior of language models requires thoughtful curation of the training data before training begins. Still, systems like RETRO may help: “It’s easier to adopt these guidelines when a model makes use of external data for its predictions,” he says.

DeepMind may be late to the debate. But rather than leapfrogging existing AIs, it is matching them with an alternative approach. “This is the future of large language models,” says Sachan.

Categories
Artificial Intelligence Blockchain GPT-3 NFT VR & AR

What’s ahead for AI, VR, NFTs, and more?

Every year starts with a round of predictions for the new year, most of which end up being wrong. But why fight against tradition? Here are my predictions for 2022.

The safest predictions are all around AI.

We’ll see more “AI as a service” (AIaaS) products. This trend started with the gigantic language model GPT-3. It’s so large that it really can’t be run without Azure-scale computing facilities, so Microsoft has made it available as a service, accessed via a web API. This may encourage the creation of more large-scale models; it might also drive a wedge between academic and industrial researchers. What does “reproducibility” mean if the model is so large that it’s impossible to reproduce experimental results?
Prompt engineering, a field dedicated to developing prompts for language generation systems, will become a new specialization. Prompt engineers answer questions like “What do you have to say to get a model like GPT-3 to produce the output you want?” (A concrete example follows this list of predictions.)
AI-assisted programming (for example, GitHub Copilot) has a long way to go, but it will make quick progress and soon become just another tool in the programmer’s toolbox. And it will change the way programmers think too: they’ll need to focus less on learning programming languages and syntax and more on understanding precisely the problem they have to solve.
GPT-3 clearly is not the end of the line. There are already language models bigger than GPT-3 (one in Chinese), and we’ll certainly see large models in other areas. We will also see research on smaller models that offer better performance, like DeepMind’s RETRO.
Supply chains and business logistics will remain under stress. We’ll see new tools and platforms for dealing with supply chain and logistics issues, and they’ll likely make use of machine learning. We’ll also come to realize that, from the start, Amazon’s core competency has been logistics and supply chain management.
Just as we saw new professions and job classifications when the web appeared in the ’90s, we’ll see new professions and services appear as a result of AI—specifically, as a result of natural language processing. We don’t yet know what these new professions will look like or what new skills they’ll require. But they’ll almost certainly involve collaboration between humans and intelligent machines.
CIOs and CTOs will realize that any realistic cloud strategy is inherently a multi- or hybrid cloud strategy. Cloud adoption moves from the grassroots up, so by the time executives are discussing a “cloud strategy,” most organizations are already using two or more clouds. The important strategic question isn’t which cloud provider to pick; it’s how to use multiple providers effectively.
Biology is becoming like software. Inexpensive and fast genetic sequencing, together with computational techniques including AI, enabled Pfizer/BioNTech, Moderna, and others to develop effective mRNA vaccines for COVID-19 in astonishingly little time. In addition to creating vaccines that target new COVID variants, these technologies will enable developers to target diseases for which we don’t have vaccines, like AIDS.
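
To make the AI-as-a-service and prompt-engineering items above concrete, here is a hedged sketch of calling a hosted text-generation model over HTTP and of the difference a carefully engineered prompt can make. The endpoint URL, authentication scheme, and response format are hypothetical placeholders, not any particular vendor’s API.

```python
# Hypothetical example of prompt engineering against a hosted language model.
# The endpoint, API key handling, and response schema below are placeholders,
# not a real provider's API.

import os
import requests

API_URL = "https://example-llm-provider.test/v1/generate"   # placeholder
API_KEY = os.environ.get("LLM_API_KEY", "demo-key")

def generate(prompt, max_tokens=100):
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"prompt": prompt, "max_tokens": max_tokens},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]            # assumed response schema

# Two prompts asking for the same thing; the structured one usually yields
# far more usable output. Tuning that gap is the prompt engineer's job.
vague_prompt = "Tell me about supply chains."
engineered_prompt = (
    "You are a logistics analyst. In three bullet points, summarize the main "
    "risks to retail supply chains in 2022, one sentence each."
)

# print(generate(engineered_prompt))   # requires a real endpoint and API key
```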

Now for some slightly less safe predictions, involving the future of social media and cybersecurity.

Augmented and virtual reality aren’t new, but Mark Zuckerberg lit a fire under them by talking about the “metaverse,” changing Facebook’s name to Meta, and releasing a pair of smart glasses in collaboration with Ray-Ban. The key question is whether these companies can make AR glasses that work and don’t make you look like an alien. I don’t think they’ll succeed, but Apple is also working on VR/AR products. It’s much harder to bet against Apple’s ability to turn geeky technology into a fashion statement.
There’s also been talk from Meta, Microsoft, and others, about using virtual reality to help people who are working from home, which typically involves making meetings better. But they’re solving the wrong problem. Workers, whether at home or not, don’t want better meetings; they want fewer. If Microsoft can figure out how to use the metaverse to make meetings unnecessary, it’ll be onto something.
Will 2022 be the year that security finally gets the attention it deserves? Or will it be another year in which Russia uses the cybercrime industry to improve its foreign trade balance? Right now, things are looking better for the security industry: salaries are up, and employers are hiring. But time will tell.

And I’ll end with a very unsafe prediction.

NFTs are currently all the rage, but they don’t fundamentally change anything. They really only provide a way for cryptocurrency millionaires to show off—conspicuous consumption at its most conspicuous. But they’re also programmable, and people haven’t yet taken advantage of this. Is it possible that there’s something fundamentally new on the horizon that can be built with NFTs? I haven’t seen it yet, but it could appear in 2022. And then we’ll all say, “Oh, that’s what NFTs were all about.”

Or it might not. The discussion of Web 2.0 versus Web3 misses a crucial point. Web 2.0 wasn’t about the creation of new applications; it was what was left after the dot-com bubble burst. All bubbles burst eventually. So what will be left after the cryptocurrency bubble bursts? Will there be new kinds of value, or just hot air? We don’t know, but we may find out in the coming year.

Categories
Applied AI Artificial Intelligence Digital Transformation Neural Networks

Meta’s new learning algorithm can teach AI to multi-task

If you can recognize a dog by sight, then you can probably recognize a dog when it is described to you in words. Not so for today’s artificial intelligence. Deep neural networks have become very good at identifying objects in photos and conversing in natural language, but not at the same time: there are AI models that excel at one or the other, but not both. 

Part of the problem is that these models learn different skills using different techniques. This is a major obstacle for the development of more general-purpose AI, machines that can multi-task and adapt. It also means that advances in deep learning for one skill often do not transfer to others.

A team at Meta AI (previously Facebook AI Research) wants to change that. The researchers have developed a single algorithm that can be used to train a neural network to recognize images, text, or speech. The algorithm, called Data2vec, not only unifies the learning process but performs at least as well as existing techniques in all three skills. “We hope it will change the way people think about doing this type of work,” says Michael Auli, a researcher at Meta AI.

The research builds on an approach known as self-supervised learning, in which neural networks learn to spot patterns in data sets by themselves, without being guided by labeled examples. This is how large language models like GPT-3 learn from vast bodies of unlabeled text scraped from the internet, and it has driven many of the recent advances in deep learning.

Auli and his colleagues at Meta AI had been working on self-supervised learning for speech recognition. But when they looked at what other researchers were doing with self-supervised learning for images and text, they realized that they were all using different techniques to chase the same goals.

Data2vec uses two neural networks, a student and a teacher. First, the teacher network is trained on images, text, or speech in the usual way, learning an internal representation of this data that allows it to predict what it is seeing when shown new examples. When it is shown a photo of a dog, it recognizes it as a dog.

The twist is that the student network is then trained to predict the internal representations of the teacher. In other words, it is trained not to guess that it is looking at a photo of a dog when shown a dog, but to guess what the teacher sees when shown that image.

Because the student does not try to guess the actual image or sentence but, rather, the teacher’s representation of that image or sentence, the algorithm does not need to be tailored to a particular type of input.
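
As a rough illustration of that student-teacher setup, here is a toy PyTorch sketch in which a small MLP student learns to predict a frozen teacher’s representations of the same data. The real Data2vec uses Transformer encoders and masks spans of the student’s input rather than adding noise, but the objective has the same shape: regress the teacher’s internal representation rather than a label.

```python
# Toy sketch of the student-teacher idea described above, in PyTorch.
# A small MLP and a plain regression loss stand in for Data2vec's actual
# Transformer encoders and masked-input training scheme.

import torch
import torch.nn as nn

def make_encoder(in_dim=128, hidden=256, out_dim=64):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.GELU(), nn.Linear(hidden, out_dim)
    )

teacher = make_encoder()
student = make_encoder()
teacher.requires_grad_(False)        # the teacher only provides targets

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    x = torch.randn(32, 128)         # stand-in for image/text/speech features

    with torch.no_grad():
        target = teacher(x)          # what the teacher "sees"

    # Corrupt the student's view so it must infer the teacher's representation
    # rather than copy the input (Data2vec masks spans of the input instead).
    noisy_x = x + 0.5 * torch.randn_like(x)
    prediction = student(noisy_x)

    loss = loss_fn(prediction, target)   # predict representations, not labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```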

Data2vec is part of a big trend in AI toward models that can learn to understand the world in more than one way. “It’s a clever idea,” says Ani Kembhavi at the Allen Institute for AI in Seattle, who works on vision and language. “It’s a promising advance when it comes to generalized systems for learning.”

An important caveat is that although the same learning algorithm can be used for different skills, it can only learn one skill at a time. Once it has learned to recognize images, it must start from scratch to learn to recognize speech. Giving an AI multiple skills at once is hard, but that’s something the Meta AI team wants to look at next.  

The researchers were surprised to find that their approach actually performed better than existing techniques at recognizing images and speech, and performed as well as leading language models on text understanding.

Mark Zuckerberg is already dreaming up potential metaverse applications. “This will all eventually get built into AR glasses with an AI assistant,” he posted to Facebook today. “It could help you cook dinner, noticing if you miss an ingredient, prompting you to turn down the heat, or more complex tasks.”

For Auli, the main takeaway is that researchers should step out of their silos. “Hey, you don’t need to focus on one thing,” he says. “If you have a good idea, it might actually help across the board.”

Categories
Artificial Intelligence Deep Learning Machine Learning Neural Networks NLP

AI Analysis of Bird Songs Helping Scientists Study Bird Populations and Movements 

By AI Trends Staff  

A study of bird songs conducted in the Sierra Nevada mountain range in California generated a million hours of audio, which AI researchers are working to decode to gain insights into how birds responded to wildfires in the region, and to learn which measures helped the birds to rebound more quickly. 

Scientists can also use the soundscape to help track shifts in migration timing and population ranges, according to a recent account in Scientific American. More audio data is coming in from other research as well, with sound-based projects to count insects and study the effects of light and noise pollution on bird communities underway.  

“Audio data is a real treasure trove because it contains vast amounts of information,” stated ecologist Connor Wood, a Cornell University postdoctoral researcher, who is leading the Sierra Nevada project. “We just need to think creatively about how to share and access that information.” AI is helping: the latest generation of machine-learning systems can identify animal species from their calls and can process thousands of hours of data in less than a day.

Laurel Symes, assistant director of the Cornell Lab of Ornithology’s Center for Conservation Bioacoustics, is studying acoustic communication in animals, including crickets, frogs, bats, and birds. She has compiled many months of recordings of katydids (famously vocal long-horned grasshoppers that are an essential part of the food web) in the rain forests of central Panama. Patterns of breeding activity and seasonal population variation are hidden in this audio, but analyzing it is enormously time-consuming.  

“Machine learning has been the big game changer for us,” Symes stated to Scientific American.  

It took Symes and three of her colleagues 600 hours of work to classify various katydid species from just 10 recorded hours of sound. But a machine-learning algorithm her team is developing, called KatydID, performed the same task while its human creators “went out for a beer,” Symes stated.  

BirdNET, a popular avian-sound-recognition system available today, will be used by Wood’s team to analyze the Sierra Nevada recordings. BirdNET was built by Stefan Kahl, a machine learning scientist at Cornell’s Center for Conservation Bioacoustics and Chemnitz University of Technology in Germany. Other researchers are using BirdNET to document the effects of light and noise pollution on bird songs at dawn in France’s Brière Regional Natural Park.  

Bird calls are complex and varied. “You need much more than just signatures to identify the species,” Kahl stated. Many birds have more than one song, and many have regional “dialects”—a white-crowned sparrow from Washington State can sound very different from its Californian cousin—but machine-learning systems can pick out the differences. “Let’s say there’s an as yet unreleased Beatles song that is put out today. You’ve never heard the melody or the lyrics before, but you know it’s a Beatles song because that’s what they sound like,” Kahl stated. “That’s what these programs learn to do, too.”

BirdVox Combines Study of Bird Songs and Music  

Music recognition research is now crossing over into bird song research, with BirdVox, a collaboration between the Cornell Lab of Ornithology and NYU’s Music and Audio Research Laboratory. BirdVox aims to investigate machine listening techniques for the automatic detection and classification of free-flying bird species from their vocalizations, according to a blog post at NYU.  

The researchers behind BirdVox hope to deploy a network of acoustic sensing devices for real-time monitoring of seasonal bird migratory patterns, in particular, the determination of the precise timing of passage for each species.  

Current bird migration monitoring tools rely on information from weather surveillance radar, which provides insight into the density, direction, and speed of bird movements, but not into the species migrating. Crowdsourced human observations are made almost exclusively during daytime hours; they are of limited use for studying nocturnal migratory flights, the researchers indicated.   

Automatic bioacoustic analysis is seen as a complement to these methods, one that is scalable and able to produce species-specific information. Such techniques have wide-ranging implications in the field of ecology for understanding biodiversity and monitoring migrating species in areas with buildings, planes, communication towers, and wind turbines, the researchers observed.

Duke University Researchers Using Drones to Monitor Seabird Colonies  

Elsewhere in bird research, a team from Duke University and the Wildlife Conservation Society (WCS) is using drones and a deep learning algorithm to monitor large colonies of seabirds. The team is analyzing more than 10,000 drone images of mixed colonies of seabirds in the Falkland Islands off Argentina’s coast, according to a press release from Duke University.  

The Falklands, also known as the Malvinas, are home to the world’s largest colonies of black-browed albatrosses (Thalassarche melanophris) and second-largest colonies of southern rockhopper penguins (Eudyptes c. chrysocome). Hundreds of thousands of birds breed on the islands in densely interspersed groups. 

The deep-learning algorithm correctly identified and counted the albatrosses with 97% accuracy and the penguins with 87% accuracy, the team reported. Overall, the automated counts were within five percent of human counts about 90% of the time. 

“Using drone surveys and deep learning gives us an alternative that is remarkably accurate, less disruptive, and significantly easier. One person, or a small team, can do it, and the equipment you need to do it isn’t all that costly or complicated,” stated Madeline C. Hayes, a remote sensing analyst at the Duke University Marine Lab, who led the study. 

Before this new method was available, to monitor the colonies located on two rocky, uninhabited outer islands, teams of scientists would count the number of each species they could observe on a portion of the island and extrapolate those numbers to get a population estimate for the whole colony. Counts often needed to be repeated for better accuracy, a laborious process, and the presence of scientists could disrupt the breeding and parenting behavior of the birds.

WCS scientists used an off-the-shelf consumer drone to collect more than 10,000 individual photos, which Hayes converted into a large-scale composite image using image-processing software. She then analyzed the image using a convolutional neural network (CNN), a type of AI that employs a deep-learning algorithm to analyze an image and differentiate and count the objects it “sees”—in this case, two different species of birds, penguins and albatrosses. The data was used to create comprehensive estimates of the total number of birds in the colonies.
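
The Duke/WCS model itself is not described in detail here, so the following is only an illustrative sketch of the general approach: split the composite image into tiles, classify each tile with a small CNN, and tally per-species detections. The network architecture, tile size, and class list are invented for the example.

```python
# Illustrative sketch of the counting approach described above: split a large
# composite image into tiles, classify each tile, and tally detections per
# species. The CNN below is a tiny stand-in, not the Duke/WCS model.

import torch
import torch.nn as nn

CLASSES = ["background", "albatross", "penguin"]

tile_classifier = nn.Sequential(          # toy CNN over 64x64 RGB tiles
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, len(CLASSES)),
)

def count_birds(composite, tile=64):
    """Tally predicted species over non-overlapping tiles of the composite."""
    counts = {name: 0 for name in CLASSES[1:]}
    _, h, w = composite.shape
    with torch.no_grad():
        for y in range(0, h - tile + 1, tile):
            for x in range(0, w - tile + 1, tile):
                patch = composite[:, y:y + tile, x:x + tile].unsqueeze(0)
                label = CLASSES[tile_classifier(patch).argmax(dim=1).item()]
                if label != "background":
                    counts[label] += 1
    return counts

# A randomly initialized network on random pixels, purely to show the flow;
# a real pipeline would train on labeled tiles from the drone survey.
fake_composite = torch.rand(3, 512, 512)
print(count_birds(fake_composite))
```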

 

“A CNN is loosely modeled on the human neural network, in that it learns from experience,” stated David W. Johnston, director of the Duke Marine Robotics and Remote Sensing Lab. “You train the computer to pick up on different visual patterns, like those made by black-browed albatrosses or southern rockhopper penguins in sample images, and over time it learns how to identify the objects forming those patterns in other images such as our composite photo.” 

Johnston, who is also associate professor of the practice of marine conservation ecology at Duke’s Nicholas School of the Environment, said the emerging drone- and CNN-enabled approach is widely applicable “and greatly increases our ability to monitor the size and health of seabird colonies worldwide, and the health of the marine ecosystems they inhabit.” 

Read the source articles and information in Scientific American, on a blog post at NYU and in a press release from Duke University. 

Categories
Artificial Intelligence Ethical AI NLP Sentiment Analysis

AI Could Solve Partisan Gerrymandering, if Humans Can Agree on What’s Fair 

By John P. Desmond, AI Trends Editor 

With the 2020 US Census results having been delivered to the states, the process now begins of using the population data to draw new Congressional districts. Gerrymandering, a practice intended to establish a political advantage by manipulating the boundaries of electoral districts, is expected to be practiced on a wide scale, with Democrats holding a slight margin of seats in the House of Representatives and Republicans seeking to close the gap in states where they hold a majority in the legislature.

Today, more powerful redistricting software incorporating AI and machine learning is available, and it represents a double-edged sword.  

The pessimistic view is that the gerrymandering software will enable legislators to gerrymander with more precision than ever before, to ensure maximum advantage. This was called “political laser surgery” by David Thornburgh, president of the Committee of Seventy, an anti-corruption organization that considers the 2010 redistricting one of the worst in the country’s history, according to an account in the Columbia Political Review.

Supreme Court Justice Elena Kagan issued a warning in her dissent in the Rucho v. Common Cause case, in which the court majority ruled that gerrymandering claims lie outside the jurisdiction of federal courts.  

“Gerrymanders will only get worse (or depending on your perspective, better) as time goes on — as data becomes ever more fine-grained and data analysis techniques continue to improve,” Justice Kagan wrote in her dissent. “What was possible with paper and pen — or even with Windows 95 — doesn’t hold a candle to what will become possible with developments like machine learning. And someplace along this road, ‘we the people’ become sovereign no longer.”

The optimistic view is that the tough work can be handed over to the machines, with humans further removed from the equation. A state simply needs to establish objective criteria in a bipartisan manner, then turn the process over to computers. But it turns out it is difficult to arrive at criteria for what constitutes a “fair” district.

Brian Olson of Carnegie Mellon University is working on it, with a proposal to have computers prioritize districts that are compact and equally populated, using a tool called ‘Bdistricting.’ However, the author of the Columbia Political Review account reported this has not been successful in creating districts that would have competitive elections.
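
The account does not say exactly how Bdistricting scores districts, but a standard way to quantify the compactness such tools prioritize is the Polsby-Popper score, which compares a district’s area to the area of a circle with the same perimeter. A minimal example:

```python
# A common compactness measure used in redistricting analysis (not necessarily
# the one Bdistricting uses): the Polsby-Popper score, 4*pi*A / P^2, which is
# 1.0 for a perfect circle and approaches 0 for long, contorted shapes.

import math

def polsby_popper(area, perimeter):
    """Compactness of a district with the given area and perimeter."""
    return 4 * math.pi * area / (perimeter ** 2)

# A square district vs. a snake-like district of equal area.
print(round(polsby_popper(area=100.0, perimeter=40.0), 3))   # 10x10 square: ~0.785
print(round(polsby_popper(area=100.0, perimeter=202.0), 3))  # 100x1 strip: ~0.031
```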

One reason is the political geography of the country includes dense, urban Democratic centers surrounded by sparsely-populated rural Republican areas. Attempts to take these geographic considerations into account have added so many variables and complexities that the solution becomes impractical.  

“Technology cannot, then, be trusted to handle the process of redistricting alone. But it can play an important role in its reform,” stated the author, Shruti Verma, a student at Columbia’s School of Engineering and Applied Sciences, studying computer science and political science.   

However, more tools are becoming available to provide transparency into the redistricting process to a degree not possible in the past. “This software weakens the ability of our state lawmakers to obfuscate,” she stated. “In this way, the very developments in technology that empowered gerrymandering can now serve to hobble it.”  

Tools are available from the Princeton Gerrymandering Project and the Committee of Seventy.  

University of Illinois Researcher Urges Transparency in Redistricting 

Transparency in the process of redistricting is also emphasized by researchers Wendy Tam Cho and Bruce Cain, who suggest in the September 2020 issue of Science that AI can help in the process. Cho, who teaches at the University of Illinois at Urbana-Champaign, has worked on computational redistricting for many years. Last year, she was an expert witness in a lawsuit by the ACLU that ended in a finding that gerrymandered districts in Ohio were unconstitutional, according to a report in TechCrunch. Bruce Cain is a professor of political science at Stanford University with expertise in democratic representation and state politics.

In an essay explaining their work, the two stated, “The way forward is for people to work collaboratively with machines to produce results not otherwise possible. To do this, we must capitalize on the strengths and minimize the weaknesses of both artificial intelligence (AI) and human intelligence.”  

And, “Machines enhance and inform intelligent decision-making by helping us navigate the unfathomably large and complex informational landscape. Left to their own devices, humans have shown themselves to be unable to resist the temptation to chart biased paths through that terrain.”  

In an interview with TechCrunch, Cho stated that while automation has potential benefits for states in redistricting, “transparency within that process is essential for developing and maintaining public trust and minimizing the possibilities and perceptions of bias.” 

Also, while the AI models for redistricting may be complex, the public is interested mostly in the results. “The details of these models are intricate and require a fair amount of knowledge in statistics, mathematics, and computer science but also an equally deep understanding of how our political institutions and the law work,” Cho stated. “At the same time, while understanding all the details is daunting, I am not sure this level of understanding by the general public or politicians is necessary.”

Harvard, BU Researchers Recommend a Game Approach 

Researchers at Harvard University and Boston University have proposed a software tool to help with redistricting using a game metaphor. Called Define-Combine, the tool lets each party take a turn in shaping the districts, using sophisticated mapping algorithms to ensure the approach is fair, according to an account in Fast Company.

Early experience shows the Define-Combine procedure resulted in the majority party having a much smaller advantage, so in the end, the process produced more moderate maps.  

Whether this is the desired outcome of the party with the advantage today remains to be seen. Gerrymandering factors heavily in politics, according to a recent account in Data Science Central. After a redistricting in 2011, Wisconsin’s district maps produced an outcome where, if the Republican party receives 48% of the vote in the state, it ends up with 62% of the legislative seats.

Read the source articles and information in Columbia Political Review, in Science, in TechCrunch, in Fast Company, and in Data Science Central.

Categories
Artificial Intelligence Digital Transformation

Building architectures that can handle the world’s data

Perceiver IO, a more general version of the Perceiver architecture, can produce a wide variety of outputs from many different inputs.

Categories
Artificial Intelligence

We tested AI interview tools. Here’s what we found.

After more than a year of the covid-19 pandemic, millions of people are searching for employment in the United States. AI-powered interview software claims to help employers sift through applications to find the best people for the job. Companies specializing in this technology reported a surge in business during the pandemic.

But as the demand for these technologies increases, so do questions about their accuracy and reliability. In the latest episode of MIT Technology Review’s podcast “In Machines We Trust,” we tested software from two firms specializing in AI job interviews, MyInterview and Curious Thing. And we found variations in the predictions and job-matching scores that raise concerns about what exactly these algorithms are evaluating.

Getting to know you

MyInterview measures traits considered in the Big Five Personality Test, a psychometric evaluation often used in the hiring process. These traits include openness, conscientiousness, extroversion, agreeableness, and emotional stability. Curious Thing also measures personality-related traits, but instead of the Big Five, candidates are evaluated on other metrics, like humility and resilience.

This screenshot shows our candidate’s match score and personality analysis on MyInterview after answering all interview questions in German instead of English.

The algorithms analyze candidates’ responses to determine personality traits. MyInterview also compiles scores indicating how closely a candidate matches the characteristics identified by hiring managers as ideal for the position.
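
MyInterview does not disclose how its match score is computed, so the sketch below is purely illustrative of the general idea of comparing a candidate’s inferred trait scores against a hiring manager’s weighted profile; every trait, weight, and number is invented, and none of it reflects the vendor’s actual method.

```python
# Purely illustrative: a generic weighted match between inferred personality
# traits and a hiring manager's "ideal" profile. MyInterview does not publish
# its scoring method; all traits, weights, and numbers here are invented.

IDEAL = {                      # manager-chosen targets (0-1) and importances
    "attention_to_detail": (0.9, 3.0),
    "extroversion":        (0.6, 1.0),
    "conscientiousness":   (0.8, 2.0),
}

def match_score(candidate):
    """Weighted closeness of candidate trait scores to the ideal profile."""
    total_weight = sum(weight for _, weight in IDEAL.values())
    score = 0.0
    for trait, (target, weight) in IDEAL.items():
        closeness = 1.0 - abs(candidate.get(trait, 0.0) - target)
        score += weight * closeness
    return 100.0 * score / total_weight

candidate = {"attention_to_detail": 0.7, "extroversion": 0.8,
             "conscientiousness": 0.75}
print(f"{match_score(candidate):.0f}% match")   # ~85% for this toy input
```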

To complete our tests, we first set up the software. We uploaded a fake job posting for an office administrator/researcher on both MyInterview and Curious Thing. Then we constructed our ideal candidate by choosing personality-related traits when prompted by the system.

On MyInterview, we selected characteristics like attention to detail and ranked them by level of importance. We also selected interview questions, which are displayed on the screen while the candidate records video responses. On Curious Thing, we selected characteristics like humility, adaptability, and resilience.

One of us, Hilke, then applied for the position and completed interviews for the role on both MyInterview and Curious Thing.

Our candidate completed a phone interview with Curious Thing. She first did a regular job interview and received an 8.5 out of 9 for English competency. In a second try, the automated interviewer asked the same questions, and she responded to each by reading the Wikipedia entry for psychometrics in German.

Yet Curious Thing awarded her a 6 out of 9 for English competency. She completed the interview again and received the same score.

A screenshot shows our candidate’s English competency score in Curious Thing’s software after she answered all questions in German.

HILKE SCHELLMANN

Our candidate turned to MyInterview and repeated the experiment. She read the same Wikipedia entry aloud in German. The algorithm not only returned a personality assessment, but it also predicted our candidate to be a 73% match for the fake job, putting her in the top half of all the applicants we had asked to apply.

MyInterview provides hiring managers with a transcript of their interviews. When we inspected our candidate’s transcript, we found that the system interpreted her German words as English words. But the transcript didn’t make any sense. The first few lines, which correspond to the answer provided above, read:

So humidity is desk a beat-up. Sociology, does it iron? Mined material nematode adapt. Secure location, mesons the first half gamma their Fortunes in for IMD and fact long on for pass along to Eurasia and Z this particular location mesons.

Mismatched

Instead of scoring our candidate on the content of her answers, the algorithm pulled personality traits from her voice, says Clayton Donnelly, an industrial and organizational psychologist working with MyInterview.

But intonation isn’t a reliable indicator of personality traits, says Fred Oswald, a professor of industrial organizational psychology at Rice University. “We really can’t use intonation as data for hiring,” he says. “That just doesn’t seem fair or reliable or valid.”

Using open-ended questions to determine personality traits also poses significant challenges, even when—or perhaps especially when—that process is automated. That’s why many personality tests, such as the Big Five, give people options from which to choose.

“The bottom-line point is that personality is hard to ferret out in this open-ended sense,” Oswald says. “There are opportunities for AI or algorithms and the way the questions are asked to be more structured and standardized. But I don’t think we’re necessarily there in terms of the data, in terms of the designs that give us the data.”

The cofounder and chief technology officer of Curious Thing, Han Xu, responded to our findings in an email, saying: “This is the very first time that our system is being tested in German, therefore an extremely valuable data point for us to research into and see if it unveils anything in our system.”

The bias paradox

Performance on AI-powered interviews is often not the only metric prospective employers use to evaluate a candidate. And these systems may actually reduce bias and find better candidates than human interviewers do. But many of these tools aren’t independently tested, and the companies that built them are reluctant to share details of how they work, making it difficult for either candidates or employers to know whether the algorithms are accurate or what influence they should have on hiring decisions.

Mark Gray, who works at a Danish property management platform called Proper, started using AI video interviews during his previous human resources role at the electronics company Airtame. He says he originally incorporated the software, produced by a German company called Retorio, into interviews to help reduce the human bias that often develops as hiring managers make small talk with candidates.

While Gray doesn’t base hiring decisions solely on Retorio’s evaluation, which also draws on the Big Five traits, he does take it into account as one of many data points when choosing candidates. “I don’t think it’s a silver bullet for figuring out how to hire the right person,” he says.

Gray’s usual hiring process includes a screening call and a Retorio interview, which he invites most candidates to participate in regardless of the impression they made in the screening. Successful candidates will then advance to a job skills test, followed by a live interview with other members of the team.

“In time, products like Retorio, and Retorio itself—every company should be using it because it just gives you so much insight,” Gray says. “While there are some question marks and controversies in the AI sphere in general, I think the bigger question is, are we a better or worse judge of character?”

Gray acknowledges the criticism surrounding AI interviewing tools. An investigation published in February by Bavarian Public Broadcasting found that Retorio’s algorithm assessed candidates differently when they used different video backgrounds and accessories, like glasses, during the interview.

Retorio’s co-founder and managing director, Christoph Hohenberger, says that while he’s not aware of the specifics behind the journalists’ testing methods, the company doesn’t intend for its software to be the deciding factor when hiring candidates. “We are an assisting tool, and it’s being used in practice also together with human people on the other side. It’s not an automatic filter,” he says.

Still, the stakes are so high for job-seekers attempting to navigate these tools that surely more caution is warranted. For most, after all, securing employment isn’t just about a new challenge or environment—finding a job is crucial to their economic survival.

Categories
Artificial Intelligence NLP

AI voice actors sound more human than ever—and they’re ready to hire

The company blog post drips with the enthusiasm of a ’90s US infomercial. WellSaid Labs describes what clients can expect from its “eight new digital voice actors!” Tobin is “energetic and insightful.” Paige is “poised and expressive.” Ava is “polished, self-assured, and professional.”

Each one is based on a real voice actor, whose likeness (with consent) has been preserved using AI. Companies can now license these voices to say whatever they need. They simply feed some text into the voice engine, and out will spool a crisp audio clip of a natural-sounding performance.

WellSaid Labs, a Seattle-based startup that spun out of the research nonprofit Allen Institute for Artificial Intelligence, is the latest firm offering AI voices to clients. For now, it specializes in voices for corporate e-learning videos. Other startups make voices for digital assistants, call center operators, and even video-game characters.

Audio: A WellSaid AI voice actor in a promotional style

Not too long ago, such deepfake voices had something of a lousy reputation for their use in scam calls and internet trickery. But their improving quality has since piqued the interest of a growing number of companies. Recent breakthroughs in deep learning have made it possible to replicate many of the subtleties of human speech. These voices pause and breathe in all the right places. They can change their style or emotion. You can spot the trick if they speak for too long, but in short audio clips, some have become indistinguishable from humans.

AI voices are also cheap, scalable, and easy to work with. Unlike a recording of a human voice actor, synthetic voices can also update their script in real time, opening up new opportunities to personalize advertising.

But the rise of hyperrealistic fake voices isn’t consequence-free. Human voice actors, in particular, have been left to wonder what this means for their livelihoods.

How to fake a voice

Synthetic voices have been around for a while. But the old ones, including the voices of the original Siri and Alexa, simply glued together words and sounds to achieve a clunky, robotic effect. Getting them to sound any more natural was a laborious manual task.

Deep learning changed that. Voice developers no longer needed to dictate the exact pacing, pronunciation, or intonation of the generated speech. Instead, they could feed a few hours of audio into an algorithm and have the algorithm learn those patterns on its own.

Over the years, researchers have used this basic idea to build voice engines that are more and more sophisticated. The one WellSaid Labs constructed, for example, uses two primary deep-learning models. The first predicts, from a passage of text, the broad strokes of what a speaker will sound like—including accent, pitch, and timbre. The second fills in the details, including breaths and the way the voice resonates in its environment.
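
As a rough structural sketch of that two-stage idea, the Python below wires a stand-in acoustic model to a stand-in vocoder. The class names, array shapes, and dummy outputs are assumptions for illustration, not WellSaid Labs' code; in a real system both stages would be trained neural networks, typically a sequence-to-sequence spectrogram predictor followed by a neural vocoder in the WaveNet or HiFi-GAN family.

import numpy as np

# Structural sketch of a two-stage text-to-speech pipeline (illustrative
# stand-ins, not any vendor's actual models).
# Stage 1 predicts a coarse acoustic representation (pitch, timbre, pacing)
# from text; stage 2 renders those features into an audible waveform.

class AcousticModel:
    """Stage 1: text -> mel-spectrogram-like features (toy stand-in)."""
    def __init__(self, n_mels: int = 80, frames_per_char: int = 5):
        self.n_mels = n_mels
        self.frames_per_char = frames_per_char

    def predict(self, text: str) -> np.ndarray:
        # A trained sequence-to-sequence model would learn this mapping;
        # here we fabricate deterministic features so the pipeline runs.
        rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
        n_frames = len(text) * self.frames_per_char
        return rng.normal(size=(n_frames, self.n_mels))

class Vocoder:
    """Stage 2: acoustic features -> waveform samples (toy stand-in)."""
    def __init__(self, sample_rate: int = 22050, hop: int = 256):
        self.sample_rate = sample_rate
        self.hop = hop

    def synthesize(self, mels: np.ndarray) -> np.ndarray:
        # A neural vocoder would generate realistic audio; here we simply
        # upsample the first mel band into a dummy signal of the right length.
        return np.repeat(mels[:, 0], self.hop).astype(np.float32)

text = "Tobin is energetic and insightful."
mels = AcousticModel().predict(text)
audio = Vocoder().synthesize(mels)
print(f"{len(text)} characters -> {mels.shape} features -> {audio.shape[0]} samples")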

Making a convincing synthetic voice takes more than just pressing a button, however. Part of what makes a human voice so human is its inconsistency, expressiveness, and ability to deliver the same lines in completely different styles, depending on the context.

Capturing these nuances involves finding the right voice actors to supply the appropriate training data and fine-tune the deep-learning models. WellSaid says the process requires at least an hour or two of audio and a few weeks of labor to develop a realistic-sounding synthetic replica.

Audio: A Resemble.ai customer service agent
Audio: A Resemble.ai voice actor in conversational style

AI voices have grown particularly popular among brands looking to maintain a consistent sound in millions of interactions with customers. With the ubiquity of smart speakers today, and the rise of automated customer service agents as well as digital assistants embedded in cars and smart devices, brands may need to produce upwards of a hundred hours of audio a month. But they also no longer want to use the generic voices offered by traditional text-to-speech technology—a trend that accelerated during the pandemic as more and more customers skipped in-store interactions to engage with companies virtually.

“If I’m Pizza Hut, I certainly can’t sound like Domino’s, and I certainly can’t sound like Papa John’s,” says Rupal Patel, a professor at Northeastern University and the founder and CEO of VocaliD, which promises to build custom voices that match a company’s brand identity. “These brands have thought about their colors. They’ve thought about their fonts. Now they’ve got to start thinking about the way their voice sounds as well.”

Audio: A VocaliD ad sample with a male voice
Audio: A VocaliD ad sample with a female voice

Whereas companies used to have to hire different voice actors for different markets—the Northeast versus Southern US, or France versus Mexico—some voice AI firms can manipulate the accent or switch the language of a single voice in different ways. This opens up the possibility of adapting ads on streaming platforms depending on who is listening, changing not just the characteristics of the voice but also the words being spoken. A beer ad could tell a listener to stop by a different pub depending on whether it’s playing in New York or Toronto, for example. Resemble.ai, which designs voices for ads and smart assistants, says it’s already working with clients to launch such personalized audio ads on Spotify and Pandora.

The gaming and entertainment industries are also seeing the benefits. Sonantic, a firm that specializes in emotive voices that can laugh and cry or whisper and shout, works with video-game makers and animation studios to supply the voice-overs for their characters. Many of its clients use the synthesized voices only in pre-production and switch to real voice actors for the final production. But Sonantic says a few have started using them throughout the process, perhaps for characters with fewer lines. Resemble.ai and others have also worked with film and TV shows to patch up actors’ performances when words get garbled or mispronounced.

But there are limitations to how far AI can go. It’s still difficult to maintain the realism of a voice over the long stretches of time that might be required for an audiobook or podcast. And there’s little ability to control an AI voice’s performance in the same way a director can guide a human performer. “We’re still in the early days of synthetic speech,” says Zohaib Ahmed, the founder and CEO of Resemble.ai, comparing it to the days when CGI technology was used primarily for touch-ups rather than to create entirely new worlds from green screens.

A human touch

In other words, human voice actors aren’t going away just yet. Expressive, creative, and long-form projects are still best done by humans. And for every synthetic voice made by these companies, a voice actor also needs to supply the original training data.

But some actors have grown increasingly worried about their livelihoods, says a spokesperson at SAG-AFTRA, the union representing voice actors in the US. If they’re not afraid of being automated away by AI, they’re worried about being compensated unfairly or losing control over their voices, which constitute their brand and reputation.

This is now the subject of a lawsuit against TikTok brought by the Canadian voice actor Bev Standing, who alleges that the app’s built-in voice-over feature uses a synthetic copy of her voice without her permission. Standing’s experience also echoes that of Susan Bennett, the original voice of American Siri, who was paid for her initial recordings but not for the continued use of her vocal likeness on millions of Apple devices.

Some companies are looking to be more accountable in how they engage with the voice-acting industry. The best ones, says SAG-AFTRA’s rep, have approached the union to figure out the best way to compensate and respect voice actors for their work.

Several now use a profit-sharing model to pay actors every time a client licenses their specific synthetic voice, which has opened up a new stream of passive income. Others involve the actors in the process of designing their AI likeness and give them veto power over the projects it will be used in. SAG-AFTRA is also pushing for legislation to protect actors from illegitimate replicas of their voice.

But for VocaliD’s Patel, the point of AI voices is ultimately not to replicate human performance or to automate away existing voice-over work. Instead, the promise is that they could open up entirely new possibilities. What if in the future, she says, synthetic voices could be used to rapidly adapt online educational materials to different audiences? “If you’re trying to reach, let’s say, an inner-city group of kids, wouldn’t it be great if that voice actually sounded like it was from their community?”