Discussion
Curtis Carter
@codingcoyote@floss.social  ·  3 weeks ago

Update on that #rust #tts #grpc service. TTS is far more complicated than I imagined, even using #ai (machine learning) models. I assumed I'd have to process the text for the model, but it turns out I need more processing than expected:

1. Split it up into sentences

2. Pass it through a phonemizer (phonetic/sound versions of the text)

3. Process the phonemes for the model

4. Run the model to actually generate the speech

I'm gonna have to write a blog post about this when I get done.
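
A rough sketch of how those four stages might chain together in Rust. Everything here is a stand-in for illustration and not the actual service code: the function names (`split_sentences`, `phonemize`, `encode`, `synthesize`, `text_to_speech`) are hypothetical, the "phonemizer" just lowercases letters instead of producing real phonemes, the token vocabulary is made up, and the "model" returns silence. Only the shape of the pipeline is the point.

```rust
/// Stage 1: split text into sentences on '.', '!' and '?'.
/// Deliberately naive: abbreviations such as "Dr." would be split wrongly.
fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for ch in text.chars() {
        current.push(ch);
        if matches!(ch, '.' | '!' | '?') {
            let trimmed = current.trim();
            if !trimmed.is_empty() {
                sentences.push(trimmed.to_string());
            }
            current.clear();
        }
    }
    // Keep any trailing text that has no terminating punctuation.
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        sentences.push(trimmed.to_string());
    }
    sentences
}

/// Stage 2: placeholder phonemizer (assumption). A real one would emit
/// phonetic symbols; this keeps lowercase letters and whitespace only.
fn phonemize(sentence: &str) -> Vec<char> {
    sentence
        .chars()
        .filter(|c| c.is_alphabetic() || c.is_whitespace())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Stage 3: map each symbol to a token ID using a made-up vocabulary:
/// 0 = unknown, 1 = space, 2..=27 = 'a'..='z'.
fn encode(phonemes: &[char]) -> Vec<i64> {
    phonemes
        .iter()
        .map(|&p| match p {
            ' ' => 1,
            'a'..='z' => (p as u32 - 'a' as u32) as i64 + 2,
            _ => 0,
        })
        .collect()
}

/// Stage 4: placeholder for model inference (assumption). Returns a fixed
/// number of silent samples per token instead of real audio.
fn synthesize(token_ids: &[i64]) -> Vec<f32> {
    const SAMPLES_PER_TOKEN: usize = 4;
    vec![0.0; token_ids.len() * SAMPLES_PER_TOKEN]
}

/// Run all four stages and concatenate the audio for each sentence.
fn text_to_speech(text: &str) -> Vec<f32> {
    let mut audio = Vec::new();
    for sentence in split_sentences(text) {
        let phonemes = phonemize(&sentence);
        let token_ids = encode(&phonemes);
        audio.extend(synthesize(&token_ids));
    }
    audio
}

fn main() {
    let audio = text_to_speech("Hi there. How are you?");
    println!("generated {} samples", audio.len());
}
```

Keeping each stage as its own function means the placeholder bodies can be swapped for a real phonemizer and a real inference call without touching the rest of the pipeline.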

alcinnz
@alcinnz@floss.social replied  ·  3 weeks ago

@codingcoyote Yeah, I found this getting into browser-dev too... I've learned not to underestimate the complexity of text!

Interesting that the ML speech-synthesis models focus solely on reading the phonemes, but I guess that was the part which needed to be improved.

bonfire.cafe

A space for Bonfire maintainers and contributors to communicate
