DeepSpeak-v1
Urdu Text-to-Speech · Rectified Flow Diffusion Transformer
Early Checkpoint · 18k / 100k steps
Urdu Text
Reference Audio * required
Upload a 3–15 second clip of the target voice. The model clones the voice from this clip.
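The 3–15 second limit above can be checked locally before uploading. The following is a minimal sketch, assuming the reference clip is an uncompressed WAV file; the function names `clip_duration_seconds` and `is_valid_reference` are hypothetical helpers and not part of DeepSpeak-v1.

```python
import wave

# Bounds taken from the upload hint: 3-15 seconds of reference audio.
MIN_SECONDS = 3.0
MAX_SECONDS = 15.0


def clip_duration_seconds(path):
    """Return the duration of a WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference(path, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Check that the clip length falls inside the accepted range."""
    return min_s <= clip_duration_seconds(path) <= max_s
```

Calling `is_valid_reference("voice.wav")` returns `True` only for clips inside the range. Other formats such as MP3 would need a different reader, since the standard-library `wave` module handles WAV only.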
Settings
Quick examples
Output
How to use
- Type or paste Urdu text
- Upload or record a reference voice clip
- Adjust settings if needed
- Click Generate Speech
Model
| Property | Value |
| --- | --- |
| Architecture | Rectified Flow DiT |
| Text Backbone | Qwen3.5-0.8B |
| Parameters | ~400M |
| Training Steps | 18,000 / 100,000 |
| Validation Loss | 0.7959 |
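For readers unfamiliar with the "Rectified Flow" part of the architecture, sampling in such models is commonly done by integrating a learned velocity field from noise to data with a few fixed Euler steps. The sketch below illustrates that general idea only. It assumes the convention that noise sits at t = 0 and data at t = 1, and it replaces the DiT network with a hypothetical toy velocity (`make_toy_velocity`) that points straight at a known target. It is not DeepSpeak-v1's actual sampler.

```python
def euler_sample(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # t stays strictly below 1, so 1 - t is never zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Toy stand-in for the trained network (an assumption for illustration).

    On a straight path x_t = (1 - t) * x0 + t * target, the velocity is
    target - x0, which can be written as (target - x_t) / (1 - t).
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


if __name__ == "__main__":
    noise = [0.3, -1.2, 0.8]    # stands in for a Gaussian noise sample
    target = [1.0, 2.0, -0.5]   # stands in for a clean audio latent
    print(euler_sample(make_toy_velocity(target), noise, num_steps=8))
```

Because the toy path is a straight line, Euler integration lands on the target for any step count. A trained model's paths are only approximately straight, which is why the number of sampling steps typically trades speed against quality.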