Short of audio to train their artificial intelligence (AI) models, French companies and voice technology (“Voicetech”) laboratories will launch a campaign asking French speakers to donate a little of their voice for free, Karel Bourgois, head of the Voice Lab, told AFP.
About thirty players in the field have joined forces in this association to pool their datasets: thousands of hours of recorded voices, which are needed to train and improve voice AI models. “Together, we have collected 9,000 hours. But we are start-ups and SMEs up against giants like Microsoft or Google, which has millions of hours on YouTube.”
In France, datasets are few in number and are often not licensed for commercial use, which makes training AIs difficult. “Recently, a young researcher spent two years compiling his data,” laments the founder of the start-up Voxist. On a shared voice platform in French, anyone can read a text aloud and record it.
In September, he will also launch a campaign for a new version of the tool, which will “collect more natural voices by having volunteers answer questions”. Another avenue being explored by the lab is the “Listening to Talk” project: a truck touring France to record voices directly, rather than relying on voices taken from radio or television.
The Voice Lab has held talks with Radio France, France Télévisions and INA, but is running up against legal uncertainty over whether their content may be used to train AI. In 2021, the Voice Lab won a public call for projects and received 4.7 million euros over five years to compile voice data, develop shared models, and demonstrate services to its members, whether for research or for commercial use.
“Voicetech”, a growing field transformed by AI, covers speech recognition and synthesis, emotion analysis, speaker identification, speech-to-text transcription, accent removal, and voice imitation and modification, including in real time.
These techniques interest both the general public and large corporations that want to use the voice as an identifier or to automate call centers. In January, Microsoft presented VALL-E, an AI model that can imitate a voice from a three-second recording.