beneadie97 8 months ago

A while back I built a Text-To-Speech API for a start-up idea I had for a podcasting app. I've since dropped that idea and want to share the core API so that people don't need to rely on paid APIs from big companies. It has a number of voices built in. Quality is a step below premium ElevenLabs but still pretty high. On my local 8-core AMD Ryzen 4000 series computer it generates around 10 seconds of audio per second of run-time. A selection of voices I liked is included, but you can browse the Piper TTS library for more models. The API creates WAV files and sends them directly to an AWS S3 bucket. Unlike the original version of Piper, it accepts input of unlimited length and lets you insert pauses with line skips. It runs fast on CPU but can also run on GPU. Piper models are in ONNX format.
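
For anyone wondering how unlimited input and line-skip pauses could work in principle, here is a minimal sketch. It is not the project's actual code. The names `plan_segments`, `render_wav`, and the `synthesize` callback are hypothetical, the 22050 Hz sample rate is an assumption (check the voice's config), and the half-second pause per blank line is an arbitrary choice. The idea is to break the text into short chunks, synthesize each one separately, and write silence wherever a blank line appears.

```python
import io
import re
import wave
from typing import Callable, List, Tuple, Union

SAMPLE_RATE = 22050  # assumption: depends on the voice model in use
SAMPLE_WIDTH = 2     # bytes per sample (16-bit mono PCM)

Segment = Tuple[str, Union[str, int]]


def plan_segments(text: str, max_chars: int = 300) -> List[Segment]:
    """Turn text into ("speak", chunk) and ("pause", blank_line_count) items."""
    segments: List[Segment] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line means a pause; consecutive blank lines add up.
            if segments and segments[-1][0] == "pause":
                segments[-1] = ("pause", segments[-1][1] + 1)
            elif segments:  # ignore blank lines before any speech
                segments.append(("pause", 1))
            continue
        # Pack whole sentences into chunks of at most max_chars, so no
        # single synthesis call grows with the total input length.
        chunk = ""
        for sentence in re.split(r"(?<=[.!?])\s+", line.strip()):
            if chunk and len(chunk) + 1 + len(sentence) > max_chars:
                segments.append(("speak", chunk))
                chunk = sentence
            else:
                chunk = f"{chunk} {sentence}".strip()
        if chunk:
            segments.append(("speak", chunk))
    return segments


def render_wav(
    segments: List[Segment],
    synthesize: Callable[[str], bytes],  # hypothetical: text -> raw 16-bit PCM
    pause_seconds: float = 0.5,
) -> bytes:
    """Concatenate synthesized chunks and silence into one in-memory WAV."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for kind, value in segments:
            if kind == "pause":
                frames = int(SAMPLE_RATE * pause_seconds * value)
                wav.writeframes(b"\x00" * (frames * SAMPLE_WIDTH))
            else:
                wav.writeframes(synthesize(value))
    return buf.getvalue()
```

In a real setup `synthesize` would wrap the TTS engine, and the returned bytes could then be uploaded to S3, for example with boto3's `put_object`.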

Hope you guys like it. I'm hoping to write some nicer documentation soon.

Piper TTS engine: https://github.com/rhasspy/piper

Piper TTS voices: https://rhasspy.github.io/piper-samples/

  • popalchemist 8 months ago

    This is awesome. I will check it out. Does your code include scripts for fine-tuning new Piper voices?