
Add zen-dub-newsroom whitepaper: license-clean real-time multi-speaker dubbing#1

Merged
hanzo-dev merged 1 commit into main from paper/zen-dub-newsroom on Jun 16, 2026


@hanzo-dev


zen-dub-newsroom — license-clean, real-time, multi-speaker cross-lingual video dubbing

New whitepaper documenting the Zen Live-Dub system. Every reported number is measured, and negative results are reported alongside the positive ones.

Contributions

  • Real-time lip-sync at 57 fps: the bottleneck was localized to per-frame PNG I/O, not the diffusion UNet (an A/B comparison on an identical model path isolates the effect of the I/O fix).
  • FP4 convolution analysis: eager im2col + _scaled_mm is counterproductive (0.054× relative to f16); a fused implicit-GEMM kernel offers a ~4× ceiling; the cuBLAS to_blocked scale swizzle is required for correct numerics.
  • Code-vs-weights license audit: IndexTTS-2 (non-commercial weights) and F5-TTS (CC-BY-NC, inherited from the Emilia training data) were rejected and Qwen3-TTS-open (Apache-2.0) adopted; on the visual path, BiSeNet (CelebAMask-HQ) was replaced by FAN-landmark (BSD) (LSE-C 6.924, 47.85 dB).
  • Governed multi-speaker pipeline: diarization, a voice registry, a consent ledger, AudioSeal watermarking (100% detection across codec chains), and C2PA provenance; the governed end-to-end run passes 20/20.
  • Distributed two-node live dubbing: offloading TTS to a second machine over direct Ethernet breaks the single-GPU throughput wall, reaching P50 4.04 s / P95 6.14 s with 0 drops and no new hardware (GB10 + Strix Halo).
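The FP4 convolution result is easier to read with the lowering in view. Below is a minimal sketch, assuming a single image, stride 1, no padding, and plain Python floats (no FP4, no GPU); conv2d_direct, im2col, matmul, and conv2d_im2col are illustrative names, not code from the whitepaper. Eager im2col unfolds every kh×kw patch into one row of an explicit (OH·OW) × (C·kh·kw) matrix and then runs a single GEMM (the step where the eager FP4 path would call _scaled_mm on quantized operands). That buffer repeats each input pixel up to kh·kw times, which is one plausible source of the slowdown; a fused implicit-GEMM kernel computes the same patch addresses on the fly instead of materializing them.

```python
"""Sketch: a 2-D convolution lowered to one GEMM via im2col.

Assumptions (not from the whitepaper): one image, stride 1, no padding,
plain Python floats. No FP4 quantization and no GPU are involved.
"""
from typing import List

Tensor3 = List[List[List[float]]]        # [channel][row][col]
Tensor4 = List[List[List[List[float]]]]  # [out_ch][in_ch][k_row][k_col]
Matrix = List[List[float]]


def conv2d_direct(x: Tensor3, w: Tensor4) -> Tensor3:
    """Reference cross-correlation (what deep-learning 'conv2d' computes)."""
    c_in, kh, kw = len(w[0]), len(w[0][0]), len(w[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    return [[[sum(x[c][i + di][j + dj] * w[o][c][di][dj]
                  for c in range(c_in) for di in range(kh) for dj in range(kw))
              for j in range(ow)] for i in range(oh)] for o in range(len(w))]


def im2col(x: Tensor3, kh: int, kw: int) -> Matrix:
    """Unfold each kh x kw patch into one row: shape (oh*ow, c_in*kh*kw)."""
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    return [[x[c][i + di][j + dj]
             for c in range(len(x)) for di in range(kh) for dj in range(kw)]
            for i in range(oh) for j in range(ow)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain GEMM: (m, k) @ (k, n) -> (m, n)."""
    return [[sum(row[p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for row in a]


def conv2d_im2col(x: Tensor3, w: Tensor4) -> Tensor3:
    c_out, c_in, kh, kw = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    cols = im2col(x, kh, kw)  # (oh*ow, K): the materialized patch buffer
    # Flatten weights in the same (c, di, dj) order, laid out as (K, c_out).
    w_mat = [[w[o][c][di][dj] for o in range(c_out)]
             for c in range(c_in) for di in range(kh) for dj in range(kw)]
    y = matmul(cols, w_mat)  # (oh*ow, c_out); a quantized scaled GEMM would go here
    # Fold the GEMM rows back into [out_ch][row][col].
    return [[[y[i * ow + j][o] for j in range(ow)] for i in range(oh)]
            for o in range(c_out)]


# Small integer-valued example: both paths must agree exactly.
x = [[[float(16 * c + 4 * i + j) for j in range(4)] for i in range(4)] for c in range(2)]
w = [[[[float((o + c + di - dj) % 3) for dj in range(2)] for di in range(2)]
      for c in range(2)] for o in range(3)]
assert conv2d_im2col(x, w) == conv2d_direct(x, w)
```

The explicit cols buffer is the part an implicit-GEMM kernel avoids: it indexes x[c][i + di][j + dj] directly inside the GEMM loop rather than copying patches out first.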

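The to_blocked swizzle can be shown as a pure index mapping. This sketch assumes the layout used by the to_blocked helper in PyTorch's quantization test utilities and the cuBLAS 128×4 block-scaling-factor layout, as we understand them: the per-block scale matrix is zero-padded to multiples of 128 rows and 4 columns, and each 128×4 tile is stored as 32 rows of 16 entries, with row b interleaving tile rows b, b+32, b+64, and b+96. It is illustrative, not the whitepaper's code. Handing the GEMM plain row-major scales still runs but pairs FP4 blocks with the wrong scale factors, which is presumably why the swizzle matters for numerics.

```python
"""Sketch: the 128x4 scale-tile swizzle ("to_blocked") as a pure index mapping.

Assumption: mirrors our reading of PyTorch's to_blocked test helper and the
cuBLAS block-scaling-factor layout; illustrative, not the whitepaper's code.
"""
from typing import List


def to_blocked(scales: List[List[int]]) -> List[int]:
    """Flatten a (rows, cols) per-block scale matrix into the tiled layout.

    Zero-pads to multiples of 128 rows and 4 columns. Each 128x4 tile is
    emitted as 32 rows of 16 entries; row b holds tile rows b, b+32, b+64,
    and b+96, four scale columns each.
    """
    rows, cols = len(scales), len(scales[0])
    n_row_tiles = -(-rows // 128)  # ceiling division
    n_col_tiles = -(-cols // 4)

    def at(r: int, c: int) -> int:
        return scales[r][c] if r < rows and c < cols else 0  # zero padding

    out: List[int] = []
    for rt in range(n_row_tiles):
        for ct in range(n_col_tiles):
            for b in range(32):          # output row within the tile
                for a in range(4):       # which 32-row slice of the tile
                    for c in range(4):   # scale column within the tile
                        out.append(at(rt * 128 + a * 32 + b, ct * 4 + c))
    return out


# Tag each scale with its row-major position so the permutation is visible.
scales = [[4 * r + c for c in range(4)] for r in range(128)]
blocked = to_blocked(scales)
print(blocked[:8])  # [0, 1, 2, 3, 128, 129, 130, 131]
```

A row-major flatten of the same matrix would start [0, 1, 2, 3, 4, 5, 6, 7]; the fifth entry already disagrees, so every scale after the first four would be applied to the wrong block.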
Files

  • zen-dub-newsroom.tex (source) · pdfs/zen-dub-newsroom.pdf (6 pages)
  • Compiles cleanly on the repo's pdflatex toolchain; matches the zen-* house style.

🤖 Generated with Claude Code


Empirical systems report on a permissively licensed cross-lingual video dubbing pipeline: 57 fps lip-sync (bottleneck localized to per-frame PNG I/O, not the diffusion UNet); FP4 conv analysis (eager im2col 0.054x vs f16, fused implicit-GEMM ~4x ceiling, to_blocked swizzle); code-vs-weights license audit (IndexTTS-2/F5-TTS rejected, Qwen3-TTS-open adopted, BiSeNet -> FAN-landmark for the visual path); governed multi-speaker pipeline (diarization, registry, consent ledger, AudioSeal, C2PA); and a distributed two-node design reaching <=8 s P95 live latency (P50 4.04 s, 0 drops) on GB10 + Strix Halo over direct Ethernet. 6-page PDF; compiles cleanly on the repo toolchain.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
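The live-latency figures in the commit message are percentile summaries. A minimal sketch of such a summary follows, assuming one latency sample per dubbed segment, None for a dropped segment, nearest-rank percentiles, and an 8 s P95 budget taken from the <=8 s figure; the whitepaper's exact percentile method is not stated here, and percentile and summarize are hypothetical helpers. The sample values in the usage line are made up for illustration, not measurements.

```python
"""Sketch: summarize live-dubbing latencies as P50/P95 plus a drop count.

Assumptions (not from the whitepaper): one latency sample per dubbed
segment, None for a dropped segment, nearest-rank percentiles, and an
8 s P95 budget.
"""
from typing import Dict, List, Optional, Sequence, Union


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile for integer q in 1..100."""
    if not samples or not 1 <= q <= 100:
        raise ValueError("need at least one sample and 1 <= q <= 100")
    ordered = sorted(samples)
    rank = (q * len(ordered) + 99) // 100  # ceil(q/100 * n) in integer arithmetic
    return ordered[rank - 1]


def summarize(latencies_s: Sequence[Optional[float]],
              p95_budget_s: float = 8.0) -> Dict[str, Union[float, int, bool]]:
    delivered: List[float] = [x for x in latencies_s if x is not None]
    p95 = percentile(delivered, 95)
    return {
        "p50": percentile(delivered, 50),
        "p95": p95,
        "drops": len(latencies_s) - len(delivered),  # None entries count as drops
        "meets_p95_budget": p95 <= p95_budget_s,
    }


# Hypothetical samples in seconds; the fifth segment was dropped.
print(summarize([3.0, 4.0, 5.0, 6.0, None]))
# {'p50': 4.0, 'p95': 6.0, 'drops': 1, 'meets_p95_budget': True}
```

Percentiles are taken over delivered segments only, so drops are reported as a separate count rather than folded into the tail; a run can meet the P95 budget and still fail a zero-drop requirement.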
hanzo-dev merged commit e36a17f into main on Jun 16, 2026
1 check passed
hanzo-dev deleted the paper/zen-dub-newsroom branch on June 16, 2026 at 16:46
