Post · bonfire.cafe

@codingcoyote@floss.social · 3 months ago

Update on that #rust #tts #grpc service. TTS is far more complicated than I imagined, even using #ai (machine learning) models. I assumed I'd have to process the text for the model, but it turns out I need more processing than expected:

1. Split it up into sentences

2. Pass it through a phonemizer (phonetic/sound versions of the text)

3. Process the phonemes for the model

4. Run the model to actually generate the speech
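A rough sketch of how those four steps could chain together in Rust. Everything here is a stand-in for illustration: the function names, the tiny lookup-table `lexicon`, the `vocab` list, and the `synthesize` stub (which only returns silence) are all hypothetical, not the actual service code or any real phonemizer or model API.

```rust
use std::collections::HashMap;

/// Step 1: naive sentence splitter on '.', '!' and '?'.
/// (Real text needs more care: abbreviations, numbers, quotes...)
fn split_sentences(text: &str) -> Vec<String> {
    text.split_terminator(|c: char| matches!(c, '.' | '!' | '?'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Step 2: toy phonemizer backed by a word -> phonemes lookup table.
/// Unknown words become "<unk>"; a real phonemizer would apply rules instead.
fn phonemize(sentence: &str, lexicon: &HashMap<&str, &str>) -> Vec<String> {
    let mut phonemes = Vec::new();
    for word in sentence.split_whitespace() {
        // Strip surrounding punctuation and normalise case before lookup.
        let key = word
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if key.is_empty() {
            continue;
        }
        match lexicon.get(key.as_str()) {
            Some(entry) => phonemes.extend(entry.split_whitespace().map(str::to_owned)),
            None => phonemes.push("<unk>".to_owned()),
        }
    }
    phonemes
}

/// Step 3: map each phoneme to its index in the vocabulary.
/// Index 0 is assumed to be reserved for unknown symbols.
fn encode(phonemes: &[String], vocab: &[&str]) -> Vec<u32> {
    phonemes
        .iter()
        .map(|p| vocab.iter().position(|v| *v == p.as_str()).unwrap_or(0) as u32)
        .collect()
}

/// Step 4: placeholder for the model. It returns silence of a fixed length
/// per phoneme; a real implementation would run inference here.
fn synthesize(ids: &[u32], samples_per_phoneme: usize) -> Vec<f32> {
    vec![0.0; ids.len() * samples_per_phoneme]
}

fn main() {
    let lexicon = HashMap::from([("hi", "h aɪ"), ("go", "g oʊ")]);
    let vocab = ["<unk>", "h", "aɪ", "g", "oʊ"];

    for sentence in split_sentences("Hi there. Go!") {
        let phonemes = phonemize(&sentence, &lexicon);
        let ids = encode(&phonemes, &vocab);
        let audio = synthesize(&ids, 4);
        println!("{sentence:?} -> {phonemes:?} -> {ids:?} -> {} samples", audio.len());
    }
}
```

Keeping each stage as its own function makes it easy to swap the toy parts for a real phonemizer and model later without touching the rest of the pipeline.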

I'm gonna have to write a blog post about this when I'm done.

@alcinnz@floss.social replied · 3 months ago

@codingcoyote Yeah, I found this getting into browser-dev too... I've learned not to underestimate the complexity of text!

Interesting that the ML speech-synthesis models focus solely on reading the phonemes, but I guess that was the part which needed to be improved.
