How much training data does RVC need to clone a voice?

RVC can train on as little as 1 minute of clean audio, but 10 to 30 minutes yields significantly better results. Audio quality matters more than quantity, so a short studio recording outperforms hours of noisy phone calls.

Can RVC run on a CPU without a GPU?

RVC inference can run on CPU, though it is roughly 10 to 20 times slower than GPU. Training, however, requires a CUDA-capable NVIDIA GPU with at least 4GB VRAM, and CPU training is impractical at 50x or more slower.

What is the difference between RVC v1 and v2?

RVC v2 upgrades the content encoder from 9-layer HuBERT with 256-dimensional features to 12-layer HuBERT with 768-dimensional features and adds 3 period discriminators for better audio quality. The v2 pretrained models are not backward compatible with v1, and all new projects should use v2.

How do I reduce timbre leakage from the source speaker in RVC?

Adjust the index_rate parameter, which controls how heavily inference relies on the Faiss retrieval index versus the source audio. Start at 0.75 and raise it toward 1.0 to pull more features from the trained voice, or lower it to 0.3 to 0.5 if the output sounds artificial.

Can RVC convert text to speech, or only audio to audio?

RVC only performs audio-to-audio voice conversion and cannot generate speech from text. For a full text-to-speech cloning pipeline, combine it with a TTS engine such as GPT-SoVITS, Coqui TTS, or Edge-TTS.

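## Introduction

You need a voice conversion pipeline that trains in under 10 minutes, runs on a single GPU, and produces broadcast-quality output. The open-source ecosystem has produced dozens of voice cloning tools, but most require hours of training, large datasets, or cloud APIs billed by the minute. RVC (Retrieval-based Voice Conversion) is a VITS-based framework with more than 35,700 GitHub stars that brings training time below 10 minutes while needing only 10 minutes of clean audio. This guide covers a production-ready RVC setup: Docker deployment, the training pipeline, API integration, and the hardening steps needed before shipping to users.

## What is RVC?

RVC is an open-source voice conversion framework that transforms one person's voice into another's while preserving the spoken content, intonation, and rhythm. Built on VITS with a retrieval-based feature matching module, it trains in under 10 minutes on a consumer GPU and supports real-time inference with latency as low as 90 ms.

## How RVC works

RVC's architecture combines four core modules:

- **Content feature extraction:** ContentVec, a disentangled variant of HuBERT, extracts speaker-invariant phonetic and linguistic features from the source audio. ContentVec strips speaker identity while preserving content information, which makes it well suited to voice conversion.
- **Pitch extraction:** RMVPE (Robust Model for Vocal Pitch Estimation), presented at Interspeech 2023, extracts the fundamental frequency (F0). RMVPE handles polyphonic audio and stays accurate even when source separation is imperfect.
- **Acoustic modeling:** The acoustic model is built on VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech), a conditional VAE enhanced with normalizing flows. VITS produces high-fidelity audio through adversarial training between a generator and multi-period discriminators.
- **Retrieval module:** This is RVC's signature innovation. During training, content features are indexed in a Faiss vector database. During inference, source features are replaced with their top-K nearest neighbors from the training set (K=8 by default), which significantly reduces timbre leakage from the source speaker. The `index_rate` parameter (α, typically 0.3) controls the blend between retrieved features and source features.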
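The retrieval step described above can be illustrated with a small sketch. This is not RVC's actual code: it uses a brute-force search over plain Python lists in place of a Faiss index, the function name `blend_with_retrieval` is hypothetical, and the inverse-distance weighting of the neighbors is an assumption made for illustration. Only `index_rate` and the default K=8 come from the text.

```python
import heapq


def blend_with_retrieval(source_feat, train_feats, index_rate=0.3, k=8):
    """Blend one source feature vector with its k nearest training vectors."""
    # Squared L2 distance from the source frame to every training frame.
    # Brute force here; a real pipeline would query a Faiss index instead.
    dists = [
        (sum((s - t) ** 2 for s, t in zip(source_feat, feat)), i)
        for i, feat in enumerate(train_feats)
    ]
    nearest = heapq.nsmallest(min(k, len(dists)), dists)

    # Inverse-distance weights normalized to sum to 1 (assumed scheme).
    eps = 1e-8
    raw = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Weighted average of the retrieved neighbors, one dimension at a time.
    retrieved = [
        sum(w * train_feats[i][j] for w, (_, i) in zip(weights, nearest))
        for j in range(len(source_feat))
    ]

    # index_rate = 1.0 uses only retrieved features; 0.0 keeps the source.
    return [
        index_rate * r + (1.0 - index_rate) * s
        for r, s in zip(retrieved, source_feat)
    ]
```

With `index_rate=0.0` the source features pass through unchanged, and with `index_rate=1.0` they are fully replaced by the retrieved average, which is why higher values pull the output toward the trained voice.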

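## Installation and setup

### Prerequisites

RVC runs on Linux, macOS, and Windows. Training requires an NVIDIA GPU with at least 4 GB of VRAM (8 GB or more recommended). For inference only, a CPU gives acceptable latency.

Minimum hardware:

- GPU: NVIDIA GTX 1660 6GB / RTX 2060 8GB (training); 4 GB VRAM (inference only)
- CPU: 4-core Intel/AMD processor
- RAM: 8 GB minimum, 16 GB recommended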
- Storage: 10 GB of free space for models and dependencies

### Method 1: Docker deployment (recommended for production)

The official Dockerfile uses CUDA 11.6.2 on Ubuntu 20.04 with Python 3.9:

```bash
# Clone the repository
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
cd Retrieval-based-Voice-Conversion-WebUI

# Build the Docker image
docker build -t rvc-webui:latest .

# Run with GPU support and volume mounts
docker run -d --name rvc \
  --gpus all \
  -p 7865:7865 \
  -v $(pwd)/weights:/app/weights \
  -v $(pwd)/opt:/app/opt \
  rvc-webui:latest
```

For docker-compose users:

```yaml
version: '3.8'
services:
  rvc:
    build: .
    container_name: rvc-webui
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    ports:
      - "7865:7865"
    volumes:
      - ./weights:/app/weights
      - ./opt:/app/opt
      - ./assets:/app/assets
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
```

```bash
# Start with docker-compose
docker-compose up -d

# Check logs
docker-compose logs -f rvc
```

### Method 2: Local Python setup

```bash
# Clone the repository
git clone https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI.git
cd Retrieval-based-Voice-Conversion-WebUI

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Download the pretrained models
python tools/download_models.py

# Or download them manually from HuggingFace
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/D40k.pth -P assets/pretrained_v2/
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/G40k.pth -P assets/pretrained_v2/
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0D40k.pth -P assets/pretrained_v2/
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/pretrained_v2/f0G40k.pth -P assets/pretrained_v2/
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt -P assets/hubert/
wget https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/rmvpe.pt -P assets/rmvpe/
```

### Method 3: AMD GPU setup (ROCm)

```bash
# Install ROCm dependencies (Ubuntu/Debian)
sudo apt install rocm-hip-sdk rocm-opencl-sdk

# Set environment variables
export ROCM_PATH=/opt/rocm
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Add the user to the render and video groups
sudo usermod -aG render $USER
sudo usermod -aG video $USER

# Install AMD-specific requirements
pip install -r requirements-amd.txt
```

### Launch the WebUI

```bash
# Launch the Gradio web interface
python infer-web.py

# The WebUI will be available at http://localhost:7865
```
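When the WebUI is started from a script or a container, it can help to wait until the port accepts connections before sending any requests. The following is a minimal sketch using only the Python standard library, assuming the default port 7865 mentioned above. The helper name `wait_for_port` is hypothetical and not part of RVC.

```python
import socket
import time


def wait_for_port(host="localhost", port=7865, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: give up after the deadline, else retry.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

An open port only shows that the process is listening. It does not prove that the models have finished loading, so a first request may still be slow.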

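### Step 1: Prepare the dataset

- **Format:** WAV, 16-bit or 24-bit, 22050 Hz or 40000 Hz sample rate
- **Content:** single speaker, minimal background noise, no music or reverb
- **Silence:** remove long silent segments (> 3 seconds)

Use UVR5 (bundled) for source separation:

```bash
# Separate vocals from background music
python tools/uvr5/uvr5_cli.py \
  --input_path ./raw_audio/song_with_music.wav \
  --output_path ./dataset/ \
  --model_name "HP2-人声vocals+非人声instrumentals"
```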
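The format requirements listed above are easy to check before training. The sketch below uses Python's standard `wave` module to flag files with an unexpected sample rate or bit depth and to report the total duration of a folder. The function names and the `ALLOWED_*` constants are hypothetical, and `wave` only reads uncompressed PCM WAV files, so other encodings would need a different reader.

```python
import wave
from pathlib import Path

ALLOWED_RATES = {22050, 40000}  # sample rates listed in the requirements
ALLOWED_WIDTHS = {2, 3}         # bytes per sample: 16-bit or 24-bit


def check_wav(path):
    """Return (duration_seconds, list_of_problems) for one WAV file."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        duration = wav.getnframes() / rate
    if rate not in ALLOWED_RATES:
        problems.append(f"sample rate {rate} Hz")
    if width not in ALLOWED_WIDTHS:
        problems.append(f"{width * 8}-bit samples")
    return duration, problems


def summarize(dataset_dir):
    """Print per-file problems and return the total duration in minutes."""
    total = 0.0
    for path in sorted(Path(dataset_dir).glob("*.wav")):
        duration, problems = check_wav(path)
        total += duration
        if problems:
            print(f"{path.name}: " + ", ".join(problems))
    return total / 60.0
```

Calling `summarize("./dataset/my_voice")` gives a quick way to confirm that the folder holds enough usable audio before starting a training run.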
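### Step 2: Preprocess and extract features

In the WebUI **Train** tab:

1. Set the **Experiment name** (for example, `my_voice_v2`)
2. Set the **Target sample rate** to 40kHz (recommended)
3. Set the **RVC version** to v2
4. Set the **Model architecture** to `rmvpe_gpu` (the RMVPE pitch extractor running on the GPU)
5. Set the **Dataset path** to your audio folder
6. Click **One-click training**

Or via the command line: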
```bash
# Step 1: Preprocess (resample, slice, remove silence)
python trainset_preprocess_pipeline_print.py \
  ./dataset/my_voice \
  40000 \
  8  # number of CPU threads

# Step 2: Extract features with ContentVec
python extract_feature_print.py \
  --model_name my_voice_v2 \
  --sample_rate 40000 \
  --pitch_extractor rmvpe \
  --gpu 0

# Step 3: Train the model
python train_nsf_sim_cache_sid_load_pretrain.py \
  --model_name my_voice_v2 \
  --sample_rate 40000 \
  --batch_size 8 \
  --total_epoch 200 \
  --save_every_epoch 5 \
  --pretrained_G assets/pretrained_v2/f0G40k.pth \
  --pretrained_D assets/pretrained_v2/f0D40k.pth \
  --gpu 0
```
### Step 3: Build the feature index

```bash
# Generate the Faiss index used for retrieval
python tools/infer/train_index.py \
  --model_name my_voice_v2 \
  --sample_rate 40000
```
Training output location:

```
logs/
└── my_voice_v2/
    ├── added_IVF512_Flat_nprobe_1.index  # Faiss retrieval index
    ├── G_*.pth                           # generator checkpoints
    ├── D_*.pth                           # discriminator checkpoints
    └── config.json                       # model configuration
```
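Deployment scripts often need the most recent checkpoint from this folder. The sketch below picks the `G_*.pth` or `D_*.pth` file with the highest number in its name. It assumes the part after the underscore is a numeric step or epoch counter, which the listing above does not confirm, and the helper name `latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path


def latest_checkpoint(exp_dir, prefix="G"):
    """Return the checkpoint with the highest numeric suffix, or None."""
    # Matches names such as G_2000.pth; non-numeric suffixes are skipped.
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+)\.pth$")
    best = None
    for path in Path(exp_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            step = int(match.group(1))
            if best is None or step > best[0]:
                best = (step, path)
    return best[1] if best else None
```

For example, `latest_checkpoint("logs/my_voice_v2")` would return the newest generator checkpoint, and passing `prefix="D"` selects the discriminator instead.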
![RVC WebUI Training Tab](https://raw.githubusercontent.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/main/docs/en/training_tab.png)

### Training benchmarks

| Hardware | Dataset size | Epochs | Training time | Output quality |
| --- | --- | --- | --- | --- |

RVC: Deploy AI Voice Conversion with 35K+ Stars, a 10-Minute Training Setup for 2026
