Yandex SpeechKit Brand Voice Adaptive
Beta
Yandex supports speech synthesis with variables via Yandex SpeechKit Brand Voice Adaptive.
Yandex SpeechKit Brand Voice Adaptive integration via JAICP is currently in beta. Send an email to client@just-ai.com for more details.
How to use
The following steps are needed to use this technology in JAICP projects.
- Input data set preparation. Prepare a data set containing the templates and audio recordings for training the voice model. Be sure to follow all audio and text requirements. The data set will be passed on to Yandex.
- 
Voice model preparation. Model training is done on Yandex’s side. One training cycle takes approximately one month. 
- 
Voice model deployment. Yandex deploys the trained model to Yandex.Cloud and provides the model ID. This ID can be used in external projects. 
- 
TTS configuration in JAICP. In the phone channel settings, you will need to configure Yandex TTS settings. Paste the obtained model ID instead of the voice name: 

You will now be able to use speech synthesis with variables in the script to generate bot replies.
Refer to the $reactions.ttsWithVariables documentation to find out how.
We also have several of our own pretrained voice models and would be happy to share and retrain them using your data. This way the preparation stage will go much faster.
Restrictions
Answer length limit
The total length of speech synthesized with variables should not exceed:
- 24 seconds of voiced text;
- 250 characters, including whitespace and punctuation signs.
Otherwise the provider will return an error.
Bot script restrictions
When TTS with variables is used, the a tag and the $reactions.answer method do not work in the script.
Along with $reactions.ttsWithVariables, only audio playback via audio or $reactions.audio is allowed.
Model retraining
During speech synthesis, you can try using templates which are not part of the original data set. However, in this case the quality of the output is not guaranteed.