TTS#73

Open
donnie58744 wants to merge 13 commits into zayfod:dev from donnie58744:TTS

@donnie58744 donnie58744 commented Feb 19, 2026


TTS

What this PR does

  • Adds basic text-to-speech (TTS) support for Cozmo using espeak-ng

Why?

Features

  • Generates .wav files from text using espeak-ng
  • Generates PCM audio samples from .wav files
  • Plays PCM audio samples through Cozmo's speaker with play_audio()
  • Replicates Cozmo's iconic voice
    • Uses the chatterbox-tts AI model to clone Cozmo's voice

Basic AI Model Voice Cloning

python pycozmo/cozmo_voice_model/voice_clone.py --sample pycozmo/cozmo_voice_model/cozmo_voice_sample.wav --text "Hello Ive been cloned!" --output result.wav

Params:
--exaggeration  0.25–2.0
--cfg-weight    0.0–1.0
--temperature   0.05–5.0
--seed          integer
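
As a rough illustration of how the ranges above could be enforced on the command line, here is a minimal argparse sketch. It is not the actual contents of voice_clone.py: the helper `bounded_float` and the function `build_parser` are hypothetical names, and the choice to leave every tuning option unset by default (so the model's own defaults apply) is an assumption.

```python
import argparse


def bounded_float(lo, hi):
    """Return an argparse type that accepts floats in [lo, hi] (hypothetical helper)."""
    def parse(raw):
        value = float(raw)  # a ValueError here is reported by argparse as a usage error
        if not lo <= value <= hi:
            raise argparse.ArgumentTypeError(f"{value} is outside [{lo}, {hi}]")
        return value
    return parse


def build_parser():
    """Build a parser mirroring the flags listed in this PR (sketch only)."""
    parser = argparse.ArgumentParser(description="Voice cloning test script (sketch)")
    parser.add_argument("--sample", required=True, help="reference voice .wav file")
    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default="result.wav", help="output .wav path")
    # Assumption: None means "let the model pick its own default".
    parser.add_argument("--exaggeration", type=bounded_float(0.25, 2.0), default=None)
    parser.add_argument("--cfg-weight", type=bounded_float(0.0, 1.0), default=None)
    parser.add_argument("--temperature", type=bounded_float(0.05, 5.0), default=None)
    parser.add_argument("--seed", type=int, default=None)
    return parser
```

With a parser like this, an out-of-range value such as `--temperature 9` is rejected with a usage error before any model is loaded, which matters here because loading the model is slow.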

Requires

torch
torchaudio
chatterbox-tts

Changes

  • Added examples/tts.py
  • Added pycozmo/cozmo_voice_model/voice_clone.py for basic Voice Cloning testing
    • Added cozmo_voice_sample.wav
  • Fixed a clamping issue in audio.py
  • Added say_text() to client.py
    • When the cozmo_voice parameter is set to True, it uses the chatterbox-tts AI model.
    • Otherwise, TTS uses espeak-ng.
  • Added tools/pycozmo_load_voice_model.py for downloading the chatterbox-tts AI model
  • Updated requirements.txt, setup.py and NOTICE to reflect the changes above

Dependencies

Test

  • python = [3.8,3.10]
    python examples/tts.py

Basic TTS that can be played through Cozmo's speaker.
Converts text to WAV using espeak-ng, then WAV to PCM, then sends the packets to Cozmo.

TODO: Does not have Cozmo's iconic voice yet
Basic example of how to use the TTS function
espeak-ng is now a requirement

espeak-ng now creates a wave file -> saves it -> then generates pcm packets with pkts = audio.load_wav(wav) -> then pycozmo.anim_controller.play_audio(pkts)
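
The first two stages of that flow (text to WAV, WAV to PCM samples) can be sketched with the standard library alone. This is a sketch under assumptions, not the code in this PR: it shells out to the espeak-ng command-line tool with its `-w` option instead of going through py-espeak-ng, and the function names `synthesize_wav`, `read_pcm16` and `text_to_samples` are made up for illustration. Packet generation and playback stay with pycozmo's `audio.load_wav()` and `play_audio()` as described above.

```python
import os
import struct
import subprocess
import tempfile
import wave


def synthesize_wav(text, wav_path):
    """Ask the espeak-ng CLI to write speech to wav_path (espeak-ng must be installed)."""
    subprocess.run(["espeak-ng", "-w", wav_path, text], check=True)


def read_pcm16(wav_path):
    """Return (sample_rate, channels, samples) from a 16-bit PCM WAV file.

    For multi-channel files the samples are interleaved, as stored in the file.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as little-endian signed integers.
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return rate, channels, samples


def text_to_samples(text):
    """Synthesize text into a temporary WAV file and return its PCM samples."""
    with tempfile.TemporaryDirectory() as tmp:
        wav_path = os.path.join(tmp, "speech.wav")
        synthesize_wav(text, wav_path)
        return read_pcm16(wav_path)
```

Going through a temporary file mirrors the "creates a wave file -> saves it" step, and the directory is cleaned up once the samples are in memory.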
@donnie58744 donnie58744 mentioned this pull request Feb 20, 2026
17 tasks
@donnie58744

Author

I would love some help on this. I tried synthesizing his voice myself without AI models, and it is just too hard to get right. The AI model approach gets pretty darn close, but it takes some time to process the TTS because the models are heavyweight.

python pycozmo/cozmo_voice_model/voice_clone.py --sample pycozmo/cozmo_voice_model/cozmo_voice_sample.wav --text "Hello Ive been cloned fuck you anki" --output result.wav

Params:
--exaggeration  0.25–2.0
--cfg-weight    0.0–1.0
--temperature   0.05–5.0
--seed          integer
@donnie58744 donnie58744 mentioned this pull request Feb 22, 2026
- `voice_synth.py` now uses sox to synthesize a wav file.
- Changed `requirements.txt` to include sox
Added more sample audio for better training data
…ice_model`

Added a new example in `tts.py` to use the AI Cozmo voice model.
Fixed a bug in `audio.py`: values are now clamped to the valid byte range, guaranteeing they stay within 0-255
Added `cozmo_voice_model` to `client.py`
Created a new library, `cozmo_voice_model`
Removed sox as a requirement for now, until someone can get the synthesis working correctly
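
The clamp fix mentioned above can be illustrated in a few lines. The actual change in `audio.py` is not shown in this thread, so this is only a sketch of the idea; `clamp_to_bytes` is a hypothetical name, and rounding to the nearest integer before clamping is an assumption.

```python
def clamp_to_bytes(values):
    """Round each value and clamp it into 0..255 so it always fits in one byte."""
    # bytes() raises ValueError for anything outside range(256),
    # so out-of-range values must be clamped before packing.
    return bytes(min(255, max(0, int(round(v)))) for v in values)
```

Without the clamp, a single encoded value that overshoots the range would make `bytes()` raise instead of producing an audio packet.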

In `setup.py`, constrained the numpy requirement to versions between 1.24 and 1.26 for chatterbox-tts

Also in `setup.py`, added to install_requires: "py-espeak-ng>=0.1.8", "torch>=2.6.0", "torchaudio>=2.6.0", "chatterbox-tts>=0.1.6"

Also in `setup.py`, added a new script, `pycozmo_load_voice_model.py`, to make downloading the chatterbox-tts AI model easier
@donnie58744

Author

This is ready to be reviewed and merged!
