Innovate Futures @ Benji

ComfyUI With Spark-TTS And Voice Clone - An Efficient LLM-Based Text-to-Speech Model

Added 2025-03-31 13:00:17 +0000 UTC

In this video, we explore Spark TTS, a lightweight yet powerful text-to-speech model developed by researchers at the Hong Kong University of Science and Technology and partner institutions, now available in ComfyUI! Discover how to generate natural-sounding voice clones, create dynamic audio for videos, and fine-tune speech output with settings such as pitch and speed. Whether you're a content creator, AI enthusiast, or developer, this tutorial covers everything from setup to advanced voice cloning techniques, all running locally on your PC. Learn how to integrate Spark TTS into your workflows for AI-generated videos, virtual assistants, or talking avatars with minimal computational overhead.

Who is this content suitable for?

This video is perfect for content creators, AI developers, video producers, and ComfyUI users who want to add high-quality, customizable voiceovers to their projects. It’s also ideal for educators and entrepreneurs looking to leverage AI for automated narration or voice cloning.

Why does it matter?

Spark TTS offers:

- Local, lightweight operation (0.5B parameters) with zero-shot voice cloning.

- Seamless integration into ComfyUI for audio-video workflows.

- Customizable outputs (pitch, speed, tone) for natural-sounding speech.

- Support for English and Chinese, ideal for multilingual projects.

https://github.com/SparkAudio/Spark-TTS

https://github.com/1038lab/ComfyUI-SparkTTS
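
For readers who want to try zero-shot voice cloning outside ComfyUI, here is a minimal sketch of how a command-line call to the Spark-TTS repository might be assembled in Python. The helper name `build_clone_command` is hypothetical. The module name and flags (`cli.inference`, `--text`, `--device`, `--save_dir`, `--model_dir`, `--prompt_text`, `--prompt_speech_path`) are assumptions based on the repository's README and should be checked against the current version. The file paths are placeholders. The code only builds and prints the command; it does not run the model.

```python
import shlex


def build_clone_command(text, prompt_audio, prompt_text,
                        model_dir="pretrained_models/Spark-TTS-0.5B",
                        save_dir="example/results", device=0):
    """Assemble an argument list for a Spark-TTS voice-cloning run.

    Flag names are assumed from the Spark-TTS README; verify them
    against the repository before relying on this.
    """
    return [
        "python", "-m", "cli.inference",
        "--text", text,                        # sentence to synthesize
        "--device", str(device),               # GPU index
        "--save_dir", save_dir,                # where the output audio is written
        "--model_dir", model_dir,              # downloaded 0.5B checkpoint
        "--prompt_text", prompt_text,          # transcript of the reference clip
        "--prompt_speech_path", prompt_audio,  # reference voice to clone
    ]


cmd = build_clone_command(
    text="Welcome to the channel.",
    prompt_audio="reference_voice.wav",        # placeholder path
    prompt_text="Transcript of the reference clip.",
)

# Print a shell-safe version of the command for copy and paste.
print(shlex.join(cmd))
```

Run the printed command from the root of a Spark-TTS checkout, or pass `cmd` to `subprocess.run` with `cwd` set to that checkout, once the model weights are in place.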