TTS support with dataset bug fixes #19

Open
pcsid wants to merge 2 commits into main from feat/tts

Conversation

@pcsid
Collaborator

@pcsid commented Nov 10, 2025

…mosv2 for mean opinion score estimation. Small dataset pathway bug adjustments.

📌 Description

This feature was added to support text-to-speech (TTS) evaluations in AU-Harness. Cartesia, Deepgram, and ElevenLabs clients are supported.

Also fixed some bugs in the dataset paths used by the ASR task.

🛠️ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality including new tasks)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / Code cleanup
  • Maintenance / Chore / Task
  • Other (please describe):

✅ How Has This Been Tested?

  • Unit tests
  • Integration tests
  • Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

📋 Checklist

  • Code follows project style guidelines
  • Tests have been added/updated (if applicable)
  • Documentation has been updated (if applicable)
  • Linked relevant issue(s)
  • Self-reviewed my code

🙌 Additional Notes

@pcsid changed the title from "text-to-speech pathway for Cartesia, ElevenLabs, and Deepgram with ut…" to "TTS support with bug fixes" on Nov 10, 2025
@pcsid changed the title from "TTS support with bug fixes" to "TTS support with dataset bug fixes" on Nov 10, 2025
@nhhoang96 requested a review from jonggunp on April 22, 2026 20:06
Comment thread utils/util.py

# URL is only required for non-TTS inference types
inference_type = info.get('inference_type')
if inference_type not in ['cartesia_tts', 'elevenlabs_tts', 'deepgram_tts']:
Collaborator


Not a big deal, but should we be using constants instead?
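A minimal sketch of the constants approach, assuming a module-level frozenset in utils/util.py. The names `TTS_INFERENCE_TYPES` and `url_required` are hypothetical and not taken from the PR; the three type strings are the ones checked in the snippet above.

```python
# Hypothetical constant: inference types served by TTS clients, which do not
# need a 'url' entry in the model config.
TTS_INFERENCE_TYPES = frozenset({"cartesia_tts", "elevenlabs_tts", "deepgram_tts"})


def url_required(info: dict) -> bool:
    """Return True if this model config must provide a 'url' (non-TTS types)."""
    return info.get("inference_type") not in TTS_INFERENCE_TYPES
```

A frozenset gives constant-time membership checks and cannot be mutated by accident, and the same constant can be reused anywhere else the TTS types are listed inline.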

Comment thread utils/util.py
raise ValueError(f"Model {index}: '{field}' must be a non-empty string")

# Require voice_id for TTS inference types
if inference_type in ['cartesia_tts', 'elevenlabs_tts']:
Collaborator


Do we need Deepgram here?
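The answer depends on how the Deepgram client selects a voice. Keeping the voice_id requirement in its own constant means that whichever way this is decided, the fix is a one-line change. Below is a minimal sketch, assuming a hypothetical `VOICE_ID_REQUIRED_TYPES` constant and `validate_voice_id` helper, neither of which appears in the PR. Its error message mirrors the style of the existing "must be a non-empty string" check.

```python
# Hypothetical constant: inference types whose config must include 'voice_id'.
# Whether "deepgram_tts" belongs here is the open question in this thread.
VOICE_ID_REQUIRED_TYPES = frozenset({"cartesia_tts", "elevenlabs_tts"})


def validate_voice_id(info: dict, index: int) -> None:
    """Raise ValueError if a config whose type needs a voice lacks 'voice_id'."""
    inference_type = info.get("inference_type")
    if inference_type not in VOICE_ID_REQUIRED_TYPES:
        return  # Types outside the set are not checked for 'voice_id'.
    voice_id = info.get("voice_id")
    # Reject a missing, non-string, or whitespace-only voice_id.
    if not isinstance(voice_id, str) or not voice_id.strip():
        raise ValueError(
            f"Model {index}: 'voice_id' must be a non-empty string "
            f"for inference_type '{inference_type}'"
        )
```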

Comment thread metrics/utmos.py
for i, audio_file in enumerate(batch_files):
temp_name = f"audio_{i:06d}.wav"
temp_path = os.path.join(temp_dir, temp_name)
shutil.copy(audio_file, temp_path)
Collaborator


audio_paths in tts_postprocessor.py can contain empty strings, which will make audio_file an empty string here.

You may need to change line 109 so that it only calls shutil.copy when audio_file exists.
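Below is a minimal sketch of the guarded copy. It assumes that batch_files is a list of path strings in which failed generations appear as empty strings, and that temp_dir already exists. The `stage_batch` wrapper and its returned list of skipped indices are hypothetical additions for illustration.

```python
import os
import shutil


def stage_batch(batch_files: list, temp_dir: str) -> list:
    """Copy existing audio files into temp_dir and return the skipped indices.

    Assumes temp_dir already exists (e.g. created earlier by the caller).
    """
    skipped = []
    for i, audio_file in enumerate(batch_files):
        # Skip empty paths (e.g. a failed TTS generation) and missing files,
        # so shutil.copy is only called on real files.
        if not audio_file or not os.path.isfile(audio_file):
            skipped.append(i)
            continue
        temp_name = f"audio_{i:06d}.wav"
        temp_path = os.path.join(temp_dir, temp_name)
        shutil.copy(audio_file, temp_path)
    return skipped
```

The temp filename keeps the original index `i`, so the caller can still map each staged file back to its batch position and can tell which positions were skipped.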

Comment thread models/model.py

Args:
message: Input message containing ground_truth_text
run_params: Runtime parameters for the inference request
Collaborator


run_params is never used in this method.


Labels

None yet

Projects

None yet


2 participants