Objective: Develop and deploy, on the runpod.io platform, a voice cloning AI that reproduces an existing voice from an audio sample submitted to the model.
The idea is to submit an MP3 or WAV file through an API (whichever the model supports; please explain which is best) and generate (train) a new voice model with a unique ID. The API should return this voice model ID.
Then we need a second API that takes the voice model ID, the text, and the language of the text, and returns a new MP3 (text to speech) of that text spoken in the cloned voice.
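A minimal sketch of the two-call contract described above, assuming a zero-shot model (XTTS, for example, clones from a short reference clip) where "creating" a voice can simply mean storing the reference audio under a new ID instead of training. `VoiceStore`, `create_voice`, `synthesize`, and the `synth` callback are hypothetical names; a real deployment would persist the audio to durable storage and call the actual model inside `synth`.

```python
import uuid
from collections.abc import Callable
from dataclasses import dataclass, field

# Formats accepted for the reference sample (per the brief: MP3 or WAV).
SUPPORTED_FORMATS = {"mp3", "wav"}

# Hypothetical model hook: (reference_audio, text, language) -> audio bytes.
Synth = Callable[[bytes, str, str], bytes]


@dataclass
class VoiceStore:
    """Hypothetical in-memory registry mapping voice IDs to reference audio."""
    voices: dict[str, bytes] = field(default_factory=dict)

    def create_voice(self, audio: bytes, filename: str) -> str:
        """First API call: store the sample and return a new unique voice ID."""
        ext = filename.rsplit(".", 1)[-1].lower()
        if ext not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported audio format: {ext!r}")
        voice_id = uuid.uuid4().hex  # 32-character hex string
        self.voices[voice_id] = audio
        return voice_id

    def synthesize(self, voice_id: str, text: str, language: str, synth: Synth) -> bytes:
        """Second API call: speak `text` in `language` using the stored voice."""
        if voice_id not in self.voices:
            raise KeyError(f"unknown voice ID: {voice_id}")
        return synth(self.voices[voice_id], text, language)
```

With this split, the first call returns almost immediately, which fits the "no intensive training" requirement; all model work happens in the second call.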
Please note that we will use this API to offer voice cloning to our clients. It must create a new voice from a short audio sample of theirs (maximum 5 minutes), and it must do so quickly, without intensive training.
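The 5-minute limit can be enforced before any model work starts. The sketch below uses only Python's standard `wave` module, so it assumes an uncompressed PCM WAV upload; measuring MP3 duration would need a third-party decoder. `check_sample` and `MAX_SAMPLE_SECONDS` are hypothetical names.

```python
import io
import wave

MAX_SAMPLE_SECONDS = 5 * 60  # 5-minute limit from the brief


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration of a PCM WAV file given as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(data: bytes) -> float:
    """Reject samples longer than the limit; return the duration otherwise."""
    duration = wav_duration_seconds(data)
    if duration > MAX_SAMPLE_SECONDS:
        raise ValueError(f"sample is {duration:.1f}s, limit is {MAX_SAMPLE_SECONDS}s")
    return duration
```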
The same cloned voice must work in multiple languages (at least the top 5), such as:
English, French, Spanish, German, Portuguese. The more languages supported, the better.
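When comparing candidate models, a quick way to check the language requirement is to map the required languages to ISO 639-1 codes and test them against each model's supported list. The `XTTS_V2_LANGUAGES` set below is taken from the language list Coqui published for XTTS v2, but it is an assumption here and should be verified against the current model card; `missing_languages` is a hypothetical helper.

```python
# Required languages from the brief, mapped to ISO 639-1 codes.
REQUIRED = {
    "english": "en",
    "french": "fr",
    "spanish": "es",
    "german": "de",
    "portuguese": "pt",
}

# Assumption: XTTS v2 language codes as listed in Coqui's docs; verify before relying on it.
XTTS_V2_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def missing_languages(required: dict[str, str], supported: set[str]) -> list[str]:
    """Return the required language names whose codes the model does not support."""
    return sorted(name for name, code in required.items() if code not in supported)
```

An empty result means the model covers the minimum set; the size of the supported set indicates how far beyond the top 5 it goes.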
From the research I have done, the options I currently see are Coqui XTTS and Tortoise TTS.
If you have other suggestions, please let me know; I am not familiar with all the available libraries or which one is best.
Important: Please check attached PDF for full job description.
This job is already closed and no longer accepting applicants, sorry.