DeepSpeak-v1

Urdu Text-to-Speech · Rectified Flow Diffusion Transformer

Early Checkpoint · 18k / 100k steps
Urdu Text
Reference Audio (required)

Upload a 3–15 second clip of the target voice. The model clones the voice from this clip.
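
The 3–15 second window can be checked locally before uploading. The following is a minimal sketch, assuming the clip is an uncompressed PCM WAV file. The function names and the silent test clip are hypothetical and are not part of DeepSpeak-v1; the page does not say how the app itself validates clips.

```python
import io
import wave

# Bounds taken from the upload hint above (3 to 15 seconds).
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def clip_duration_seconds(wav_file):
    """Return the duration of a PCM WAV file (path or binary file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(wav_file):
    """True if the clip length falls inside the 3-15 second window."""
    return MIN_SECONDS <= clip_duration_seconds(wav_file) <= MAX_SECONDS


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for seconds in (1, 5, 20):
        print(seconds, is_valid_reference(make_silent_wav(seconds)))
```

For a real recording, pass the file path instead of the in-memory buffer, for example is_valid_reference("my_voice.wav"). Compressed formats such as MP3 would need a different reader.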

Settings
Four sliders with ranges 3–30, 10–80, 1–8, and 1–10 (slider labels not preserved).
How to use
  1. Type or paste Urdu text
  2. Upload or record a reference voice clip
  3. Adjust settings if needed
  4. Click Generate Speech
Model
  Architecture: Rectified Flow DiT
  Text Backbone: Qwen3.5-0.8B
  Parameters: ~400M
  Steps: 18,000 / 100,000
  Val Loss: 0.7959
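
For readers unfamiliar with rectified flow, the general sampling idea is to start from noise and integrate a learned velocity field from t = 0 to t = 1. The following toy sketch illustrates that idea with fixed Euler steps. It is not the DeepSpeak-v1 code: the velocity function here is a hypothetical closed-form stand-in for the trained transformer, and the page does not describe the model's conditioning, step count, or solver.

```python
import random


def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained network.

    For a single target point, the straight-line flow from the current
    position to the target has velocity (target - x) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


# Start from Gaussian noise and flow toward a fixed toy target.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
target = [0.5, -1.0, 2.0, 0.0]
sample = euler_sample(make_toy_velocity(target), noise, num_steps=10)

if __name__ == "__main__":
    print(sample)
```

Because the toy field follows straight lines, the number of steps does not change where the sample ends up. A trained model only approximates such a field, so more steps generally trade speed for accuracy.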