Like a turkey dozing off when talk turns to Christmas, I confess to tuning out when talk turns to AI. Or rather I used to, until a few weeks ago. Before then, AI seemed vital and foreboding, yet somehow also remote and incomprehensible. But now my attention is hooked. The difference lies in the no-longer-unique sound of the human voice.
Deepfake vocal clones are here. The technology behind them isn’t new, but rapid advances in accuracy and availability have made AI-generated voice copying go viral this year. Microsoft claims its Vall-E software can mimic a person’s voice based on just three seconds of audio. Although Vall-E hasn’t yet been released to the public, other tools with similarly powerful capabilities are easily obtained.
A flashpoint came in January when tech start-up ElevenLabs released a powerful online vocal generator. Faked voices of celebrities immediately flooded social media. Swifties on TikTok concocted imaginary inspirational messages from Taylor Swift (“Hey it’s Taylor, if you’re having a bad day just know that you are loved”). At the other end of the spectrum, 4chan trolls created fake audio clips of celebrities saying hateful things.