When human testing cannot determine whether a voice is a synthetic imitation of some person's voice or an actual recording of that person's real voice, it is a digital sound-alike.

Living people can defend¹ themselves against a digital sound-alike by denying what it says, provided the recording is presented to them, but dead people cannot. Digital sound-alikes offer criminals new disinformation attack vectors and wreak havoc on provability.

A spectrogram of a male voice saying 'nineteenth century'

Timeline of digital sound-alikes

In 2016, Adobe Inc.'s Voco, an unreleased prototype, was publicly demonstrated. (View and listen to the Adobe MAX 2016 presentation of Voco)

In 2016, WaveNet, developed by DeepMind (owned by Google), also demonstrated the ability to steal people's voices.

At the 2018 Conference on Neural Information Processing Systems, the work 'Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis' (available at arXiv.org) was presented. Its pre-trained model is able to steal a voice from a sample of only 5 seconds, with almost convincing results.
- Listen to 'Audio samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis"'
- View a video summary of the work on YouTube: 'This AI Clones Your Voice After Listening for 5 Seconds'

As of 2019, Symantec researchers knew of 3 cases in which digital sound-alike technology had been used for crimes.^[1]

Examples of speech synthesis software not quite able to fool a human yet

There are other contenders for creating digital sound-alikes, though as of 2019 their speech synthesis does not yet fool a human in most use scenarios, because the results contain telltale signs that give them away as the output of a speech synthesizer.

Lyrebird.ai (listen)
CandyVoice.com (test with your choice of text)
Merlin, a neural-network-based speech synthesis system by the Centre for Speech Technology Research at the University of Edinburgh

Documented digital sound-alike attacks

'An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft', a 2019 Washington Post article

Example of a hypothetical digital sound-alike attack

A very simple example of a digital sound-alike attack is as follows:

Someone uses a digital sound-alike to call somebody's voicemail from an unknown number and to speak, for example, illegal threats. In this example there are at least two victims:

Victim #1 - The person whose voice has been stolen into a covert model, from which a digital sound-alike has been made to frame them for crimes
Victim #2 - The person to whom the illegal threat is presented in recorded form by a digital sound-alike that deceptively sounds like victim #1
Victim #3 - It could also be argued that victim #3 is our law enforcement systems, as they are made to chase after and interrogate the innocent victim #1
Victim #4 - Our judiciary, which prosecutes and possibly convicts the innocent victim #1

Thus it is high time to act and to criminalize the covert modeling of human appearance and voice!


[1] 'An artificial-intelligence first: Voice-mimicking software reportedly used in a major theft', Washington Post, 2019. https://www.washingtonpost.com/technology/2019/09/04/an-artificial-intelligence-first-voice-mimicking-software-reportedly-used-major-theft/
