TTS by donnie58744 · Pull Request #73 · zayfod/pycozmo

donnie58744 · 2026-02-19T23:41:47Z

TTS

What this PR does

Adds basic text-to-speech (TTS) for Cozmo using espeak-ng.

Why?

Resolves Text-to-speech integration #36 and Please add TTS #70.

Features

Generate .wav files from text using espeak-ng
Generate audio PCM samples from .wav files
Play audio PCM samples through Cozmo's speaker with play_audio()
Replicates Cozmo's iconic voice
- Uses the chatterbox-tts AI model to replicate Cozmo's voice
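
The first two features (text to .wav, then .wav to PCM samples) can be sketched with the espeak-ng command line and Python's standard `wave` module. This is a minimal illustration, not the code in this PR: the helper names `espeak_command`, `text_to_wav` and `read_pcm16` are hypothetical, and calling the espeak-ng binary through `subprocess` is an assumption about how the synthesis step could be done.

```python
import shutil
import struct
import subprocess
import wave


def espeak_command(text, wav_path, voice="en"):
    """Build the espeak-ng command line that writes `text` to `wav_path`."""
    # -v selects the voice, -w writes a WAV file instead of playing audio.
    return ["espeak-ng", "-v", voice, "-w", wav_path, text]


def text_to_wav(text, wav_path, voice="en"):
    """Run espeak-ng (hypothetical helper; needs the binary on PATH)."""
    if shutil.which("espeak-ng") is None:
        raise RuntimeError("espeak-ng is not installed or not on PATH")
    subprocess.run(espeak_command(text, wav_path, voice), check=True)


def read_pcm16(wav_path):
    """Return (sample_rate, samples) for a mono 16-bit PCM WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV stores 16-bit samples as signed little-endian integers.
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    return rate, list(samples)
```

The last step, playing the samples on the robot, is left out here because it needs a connected Cozmo.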

Basic AI Model Voice Cloning

python pycozmo/cozmo_voice_model/voice_clone.py --sample pycozmo/cozmo_voice_model/cozmo_voice_sample.wav --text "Hello Ive been cloned!" --output result.wav

Params:
--exaggeration  0.25–2.0
--cfg-weight    0.0–1.0
--temperature   0.05–5.0
--seed          integer
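
One way a script could enforce the ranges listed above is with `argparse` type callables. The sketch below is an assumption about how such validation might look, not a copy of `voice_clone.py`: the flag names come from the command above, while `bounded_float`, `build_parser` and all default values are illustrative.

```python
import argparse


def bounded_float(low, high):
    """Return an argparse type that accepts floats in [low, high]."""
    def parse(value):
        number = float(value)
        if not low <= number <= high:
            raise argparse.ArgumentTypeError(
                "%r is outside [%s, %s]" % (value, low, high))
        return number
    return parse


def build_parser():
    parser = argparse.ArgumentParser(description="Voice cloning test (sketch)")
    parser.add_argument("--sample", required=True, help="reference voice WAV")
    parser.add_argument("--text", required=True, help="text to speak")
    parser.add_argument("--output", default="result.wav", help="output WAV")
    # The defaults below are illustrative guesses, not taken from the PR.
    parser.add_argument("--exaggeration", type=bounded_float(0.25, 2.0), default=0.5)
    parser.add_argument("--cfg-weight", type=bounded_float(0.0, 1.0), default=0.5)
    parser.add_argument("--temperature", type=bounded_float(0.05, 5.0), default=0.8)
    parser.add_argument("--seed", type=int, default=None)
    return parser
```

With this parser, an out-of-range value such as `--temperature 9` makes `parse_args` exit with a usage error instead of reaching the model.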

Requires

torch
torchaudio
chatterbox-tts

Changes

Added examples/tts.py
Added pycozmo/cozmo_voice_model/voice_clone.py for basic voice cloning tests
- Added cozmo_voice_sample.wav
Fixed a clamp issue in audio.py
Added say_text() to client.py
- Takes a cozmo_voice parameter; when set to True, it uses the chatterbox-tts AI model.
- Otherwise, TTS uses espeak-ng.
Added tools/pycozmo_load_voice_model.py for downloading the chatterbox-tts AI model
Changed requirements.txt, setup.py and NOTICE to reflect the changes above
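
The backend choice that `cozmo_voice` controls can be pictured as a small dispatcher. This is only a sketch of the idea under stated assumptions: `synthesize`, the `Synth` type and the two backend callables are hypothetical names, and the real `say_text()` in `client.py` may be structured differently.

```python
from typing import Callable

# A backend takes (text, wav_path) and writes a WAV file to wav_path.
Synth = Callable[[str, str], None]


def synthesize(text: str, wav_path: str, cozmo_voice: bool,
               espeak: Synth, clone: Synth) -> str:
    """Run exactly one backend and return the name of the one used."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if cozmo_voice:
        # Slow path: the voice-cloning model is heavy, so only run it on request.
        clone(text, wav_path)
        return "chatterbox-tts"
    # Fast path: plain espeak-ng synthesis.
    espeak(text, wav_path)
    return "espeak-ng"
```

Passing the backends in as callables keeps the heavy model dependencies out of the espeak-ng path, so users who never enable `cozmo_voice` would not need to load them.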

Dependencies

Test

python = [3.8,3.10]
python examples/tts.py

donnie58744 · 2026-02-21T20:52:35Z

I would love some help on this. I tried synthesizing his voice myself without AI models, and it is just too hard to get right. The AI model approach gets pretty close, but it takes some time to process the TTS because the models are heavyweight.

…ice_model`
- Added a new example in `tts.py` that uses the AI Cozmo voice model.
- Fixed a bug in `audio.py`: "clamp to valid byte range. Guarantee valid 0-255 range".
- Added `cozmo_voice_model` to `client.py`.
- Made a new lib, `cozmo_voice_model`.
- Removed sox as a requirement for now, until someone can synthesize the voice correctly.
- In `setup.py`, restricted the numpy requirement to versions between 1.24 and 1.26 for chatterbox-tts.
- Also in `setup.py`, install_requires -> "py-espeak-ng>=0.1.8", "torch>=2.6.0", "torchaudio>=2.6.0", "chatterbox-tts>=0.1.6".
- Also in `setup.py`, added the new script `pycozmo_load_voice_model.py` to make downloading the chatterbox-tts AI model easier.
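
To illustrate what "clamp to valid byte range" means, here is a minimal pure-Python sketch. It is not the actual change made to `audio.py`, and the helper names `clamp_to_byte` and `to_byte_buffer` are hypothetical. The point is that values must be forced into 0-255 before they are packed as bytes, since `bytes()` rejects out-of-range integers and fixed-width integer casts can wrap around.

```python
def clamp_to_byte(value):
    """Round a number and force it into the valid unsigned byte range 0-255."""
    return max(0, min(255, int(round(value))))


def to_byte_buffer(values):
    """Pack numbers into a bytes object, clamping each one first."""
    # Without clamping, bytes() raises ValueError for anything outside 0-255.
    return bytes(clamp_to_byte(v) for v in values)
```

For example, `to_byte_buffer([-3, 300])` yields the two bytes 0 and 255 instead of failing.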

donnie58744 · 2026-02-26T00:12:14Z

This is ready to be reviewed and merged!

donnie58744 added 6 commits February 19, 2026 13:02

Base

6517c54

Basic TTS that can be played through Cozmo's speaker. Converts text to WAV using espeak-ng, then WAV to PCM, then sends the packets to Cozmo. TODO: does not have Cozmo's iconic voice yet.

Rearranged Functions

a075166

Example TTS

a5d3e13

Basic example on how to use the TTS function

Update NOTICE

853e30e

Performance Improvements

10f7a6a

espeak-ng is now a requirement. It now creates a wave file -> saves it -> generates PCM packets with pkts = audio.load_wav(wav) -> plays them with pycozmo.anim_controller.play_audio(pkts)

Update tts.py

516f056

donnie58744 mentioned this pull request Feb 20, 2026

Remote control cozmo #72

Open

17 tasks

Added Tester Files

722db3a

python pycozmo/cozmo_voice_model/voice_clone.py --sample pycozmo/cozmo_voice_model/cozmo_voice_sample.wav --text "Hello Ive been cloned fuck you anki" --output result.wav
Params: --exaggeration 0.25–2.0, --cfg-weight 0.0–1.0, --temperature 0.05–5.0, --seed integer

donnie58744 mentioned this pull request Feb 22, 2026

Please add TTS #70

Closed

donnie58744 added 6 commits February 23, 2026 17:58

Changed voice_synth method

3e48948

- `voice_synth.py` now uses sox to synthesize a wav file.
- Changed `requirements.txt` to include sox.

Update cozmo_voice_sample.wav

d4bbf3f

Added more sample audio for better training data

Format

c0b5df8

Delete voice_synth.py

45f9826

Update requirements.txt

d1dc0b6

TTS #73
donnie58744 wants to merge 13 commits into zayfod:dev from donnie58744:TTS
