---
lang: vi
slug: ultimate-vocal-remover
title: 'Ultimate Vocal Remover: 24.7K+ Stars - Complete Setup Guide 2026'
description: 'Ultimate Vocal Remover (UVR) is a GUI application that removes vocals using deep neural networks. Compatible with demucs, RVC, and GPT-SoVITS. Covers Windows, macOS, and Linux installation, model selection, batch processing, and production hardening.'
tags: ["guide", "open-source", "reference", "tutorial"]
date: 2026-05-19 00:00:00+08:00
lastmod: 2026-05-19 00:00:00+08:00
tech_stack: []
application_domain: Ai Tools
source_version: ''
licensing_model: Open Source
license_type: MIT
file_size: ''
file_md5: ''
download_url: ''
backup_url: ''
github_repo: 'https://github.com/Anjok07/ultimatevocalremovergui'
last_maintained: '2026-05-19'
draft: false
categories: ['ai-tools']
aliases:
  - /posts/ultimate-vocal-remover/
featureImage: /images/articles/ultimate-vocal-remover-247k-sao-hng-dn-thit-lp-hon-chnh-2026.png
faq:
  - q: 'Can Ultimate Vocal Remover run without a GPU?'
    a: 'Yes. UVR automatically falls back to CPU processing, although it runs roughly 5-10x slower. A modern 8-core CPU processes a 4-minute track in about 3 minutes with the MDX-Net model; set the segment size to 64 or lower to fit CPU memory constraints.'
  - q: 'Which UVR model produces the cleanest instrumental track?'
    a: 'MDX23C scores highest on SDR benchmarks, reaching 9.42 vocal SDR on the MUSDB18 dataset, which makes it the best fit for complex or orchestral mixes. For most pop/rock songs, MDX-Net Main offers the best balance of quality and speed.'
  - q: 'Why does UVR install only to the C:\ drive on Windows?'
    a: 'The Windows installer bundles Python, PyTorch, and FFmpeg into a fixed-path directory, and moving the installation breaks hardcoded relative paths between the runtime and model directories. Installing to a secondary drive causes runtime instability.'
  - q: 'Is Ultimate Vocal Remover good for separating speech or two people talking?'
    a: 'No. UVR models are trained on music datasets such as MUSDB18, so separating overlapping speech produces poor results. For speech separation, use dedicated tools such as pyannote.audio or SpeechBrain instead.'
  - q: 'Can I use UVR output in a commercial release?'
    a: 'The UVR software and its models are MIT-licensed, which permits commercial use of the tool itself. Copyright law still applies to the source material, however, so removing vocals from a copyrighted song does not grant you the right to distribute the resulting instrumental.'
---


Separating vocals from instrumental tracks used to require expensive DAW plugins, manual EQ carving, or outsourcing to audio engineers. In 2026, open-source deep learning models handle the task in under 60 seconds on recent consumer GPUs. Ultimate Vocal Remover (UVR) leads this space with more than 24,700 GitHub stars, a Tkinter-based GUI, and support for several state-of-the-art architectures, including VR-Net, MDX-Net, MDX23C, and Demucs. This guide walks through UVR setup on all three major platforms, model selection strategy, batch processing workflows, AI audio separation configuration, and integration with tools such as RVC and GPT-SoVITS. Whether you are comparing UVR with demucs or looking for a complete UVR tutorial, this article covers a production-ready deployment from start to finish.

## What Is Ultimate Vocal Remover?

Ultimate Vocal Remover (UVR) is an open-source GUI application that uses deep neural networks to separate vocals from instrumental audio. Built primarily in Python with PyTorch, it wraps complex source separation models in a desktop interface that non-programmers can use. The project is maintained by Anjok07 and aufr33, and the core development team trained most of the bundled models.

UVR supports multiple AI architectures:

- VR Architecture: spectrogram-based separation developed by tsurumeso
- MDX-Net: multi-band deep neural network from Kuielab
- MDX23C: extended MDX-Net with a larger context window
- Demucs v3/v4: hybrid spectrogram/waveform model from Facebook Research

The application outputs separate WAV files for vocals and instrumentals, with additional stems for drums, bass, and "other" when a 4-stem model is used.

## How UVR Works: Architecture Overview

UVR does not implement a single monolithic model. Instead, it acts as a model orchestration layer that loads and runs different PyTorch-based separation engines behind a unified interface.

```
Input Audio (MP3/WAV/FLAC)
        |
        v
[FFmpeg Decoder] → WAV PCM
        |
        v
[Model Selection]
   |-- VR-Net  → Spectrogram masking
   |-- MDX-Net → Multi-band estimation
   |-- MDX23C  → Extended context model
   |-- Demucs  → Hybrid waveform+spec
        |
        v
[Post-Processing] → WAV Output
   |-- Vocals.wav
   |-- Instrumental.wav
```

Each model processes audio differently:

**VR Architecture** converts audio to a Short-Time Fourier Transform (STFT) spectrogram, applies a learned mask to separate vocal frequencies, and reconstructs the waveform via inverse STFT. This approach is fast but can leave vocal artifacts in the instrumental track.

**MDX-Net** splits the spectrogram into multiple frequency bands and processes each band through separate neural network branches. The multi-band design captures harmonic structures in vocals that single-band masks miss.

**Demucs** operates on both the raw waveform and spectrogram representations simultaneously. The hybrid approach preserves phase information better than spectrogram-only methods, producing cleaner separations at the cost of higher compute requirements.

All models run through **ONNX Runtime** or **PyTorch** with optional GPU acceleration via CUDA (Nvidia), MPS (Apple Silicon), or DirectML (AMD/Intel).

## Installation and Setup

### Windows Installation (Recommended)

UVR v5.6 provides a standalone installer for Windows 10 and above. No Python or dependency installation is required.

**Step 1: Download the installer**

```powershell
# Download UVR v5.6 from the official release page
# 64-bit Windows (CUDA-enabled for Nvidia GPUs)
# https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_v5.6.0_setup.exe

# For AMD Radeon / Intel Arc GPUs, use the DirectML build:
# https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/UVR_1_15_25_22_30_BETA_full.exe
```

**Step 2: Run the installer**

```powershell
# IMPORTANT: Install to the C:\ drive only.
# Installing to a secondary drive causes runtime instability.
# Run the installer as Administrator
.\UVR_v5.6.0_setup.exe
```

… typical, depending on which models you select. Store models on an SSD; model load times are a bottleneck on HDDs.

**System Requirements (Windows):**

```yaml
OS: Windows 10 64-bit or higher
CPU: Intel/AMD 64-bit (Pentium/Celeron not supported)
RAM: 8GB minimum, 16GB recommended
GPU: Nvidia GTX 1060 6GB minimum, RTX 3060 8GB+ recommended
Storage: 15GB free space (SSD strongly recommended)
Note: Intel Pentium and Celeron CPUs are not supported
```

### macOS Installation

```bash
# Step 1: Download the DMG for your architecture
# Apple Silicon (M1/M2/M3):
# https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_MacOS_arm64.dmg

# Intel Macs:
# https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.6/Ultimate_Vocal_Remover_v5_6_MacOS_x86_64.dmg

# Step 2: Mount the DMG and drag UVR to Applications

# Step 3: Bypass Gatekeeper (first launch only)
sudo spctl --master-disable
sudo xattr -rd com.apple.quarantine "/Applications/Ultimate Vocal Remover.app"

# Step 4: Re-enable Gatekeeper after UVR opens successfully
sudo spctl --master-enable
```

```bash
# For developers who prefer running from source
brew install python@3.10 ffmpeg
pip3 install -r requirements.txt

# Apple Silicon only: fix soundfile library
cp /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/_soundfile_data/libsndfile_arm64.dylib \
   /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/_soundfile_data/libsndfile.dylib

# Download FFmpeg binary and place in application directory
# Download Rubber Band for time-stretch/pitch-shift features

python3 UVR.py
```

### Linux Installation

**Debian-based systems (Debian, Ubuntu):**

```bash
# Step 1: Install system dependencies
sudo apt update && sudo apt upgrade -y
sudo apt-get install -y ffmpeg python3-pip python3-tk python3-venv

# Step 2: Clone the repository
git clone https://github.com/Anjok07/ultimatevocalremovergui.git
cd ultimatevocalremovergui

# Step 3: Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Step 4: Install Python dependencies
pip install -r requirements.txt

# Step 5: Run UVR
python UVR.py
```

**Arch-based systems (EndeavourOS, Manjaro):**
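
```bash
sudo pacman -Syu
sudo pacman -S ffmpeg python-pip tk python-virtualenv

git clone https://github.com/Anjok07/ultimatevocalremovergui.git
cd ultimatevocalremovergui
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python UVR.py
```

**Headless / Server Deployment (Docker):**

```dockerfile
# Dockerfile for UVR headless processing
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# Avoid interactive prompts (for example tzdata) during the build
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y \
    python3.10 python3-pip python3-venv ffmpeg \
    git wget && rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN git clone https://github.com/Anjok07/ultimatevocalremovergui.git .
RUN python3 -m venv venv
RUN . venv/bin/activate && pip install -r requirements.txt

# Create the models directory up front; models auto-download on first use
RUN mkdir -p models

# NOTE: assumes separate.py can be invoked as a command-line entry point;
# verify this against your checkout before relying on the image
ENTRYPOINT ["venv/bin/python", "separate.py"]
```
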
```bash
# Build and run
docker build -t uvr-gpu .
docker run --gpus all -v "$(pwd)/input:/input" -v "$(pwd)/output:/output" uvr-gpu \
    --input /input/song.mp3 --output /output --model MDX-Net
```

### requirements.txt Key Dependencies

```text
altgraph==0.17.3
audioread==3.0.0
einops==0.6.0
julius==0.2.7
librosa==0.9.2
matchering==2.0.6
omegaconf==2.2.3
opencv-python==4.6.0.66
psutil==5.9.4
pydub==0.25.1
pyrubberband==0.3.0
pytorch_lightning==2.0.0
resampy==0.4.2
scipy==1.9.3
torch
onnxruntime
onnxruntime-gpu
numpy==1.23.5
```

## Model Selection and Configuration

UVR ships with dozens of pre-trained models. Selecting the right model depends on your input audio and desired output quality.

### Built-in Models

| Model | Architecture | Best For | Speed | VRAM |
|-------|--------------|----------|-------|------|
| `MDX-Net Main` | MDX-Net | General vocal rem… | … | … |

```text
# Decision flow for model selection
Is the track a standard pop/rock song?
  Yes → MDX-Net Main (best balance of speed and quality)
  No → Is it a complex orchestral mix?
          Yes → MDX23C (higher quality, slower)
          No → Is it a live recording with crowd noise?
                  Yes → VR-DeEcho (noise suppression built-in)
                  No → Demucs v4 (full 4-stem separation)
```

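The same decision flow can be written as a small function, which is handy when a script has to pick a model per track. This is only a sketch: `choose_model` and its boolean parameters are hypothetical names introduced here, and the returned strings mirror the labels in the flow above rather than any identifier UVR itself exposes.

```python
def choose_model(is_pop_rock: bool, is_orchestral: bool = False,
                 is_live_with_crowd: bool = False) -> str:
    """Walk the decision flow top to bottom and return the suggested model label."""
    if is_pop_rock:
        return "MDX-Net Main"   # best balance of speed and quality
    if is_orchestral:
        return "MDX23C"         # higher quality, slower
    if is_live_with_crowd:
        return "VR-DeEcho"      # noise suppression built in
    return "Demucs v4"          # fallback: full 4-stem separation


if __name__ == "__main__":
    print(choose_model(is_pop_rock=False, is_orchestral=True))
```

Because the checks run in the same order as the flow, a track flagged as both pop/rock and live still resolves to the first matching branch.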
### Recommended Settings for Maximum Quality

```text
# UVR Settings → "Choose MDX-Net Model"
# Process Method: "MDX-Net"
# Segment Size: 256 (higher = more VRAM, potentially better quality)
# Overlap: 0.75 (higher = smoother transitions, slower)
# Denoise: Enabled
# Post-Process: Enabled

# For GPU with 8GB+ VRAM:
Segment Size: 256
Overlap: 0.85
Batch Size: 4

# For GPU with 6GB VRAM:
Segment Size: 128
Overlap: 0.50
Batch Size: 1

# For CPU-only:
Segment Size: 64
Overlap: 0.25
Batch Size: 1
# Expect 5-10x slower processing
```

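For scripted setups, the three tiers above can be looked up from the amount of VRAM available. The sketch below is illustrative: `settings_for` and the dictionary keys are hypothetical names, and treating cards with less than 6 GB like the CPU-only tier is an assumption made here, not guidance from UVR.

```python
def settings_for(vram_gb=None):
    """Return the preset matching the tiers listed above.

    vram_gb=None means CPU-only. Cards below 6 GB fall back to the CPU-only
    preset, which is an assumption of this sketch.
    """
    if vram_gb is not None and vram_gb >= 8:
        return {"segment_size": 256, "overlap": 0.85, "batch_size": 4}
    if vram_gb is not None and vram_gb >= 6:
        return {"segment_size": 128, "overlap": 0.50, "batch_size": 1}
    return {"segment_size": 64, "overlap": 0.25, "batch_size": 1}


if __name__ == "__main__":
    print(settings_for(12))
```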
### Batch Processing Configuration

```text
# For processing entire folders via the GUI:
# 1. Click "Input" → Select Folder
# 2. Enable "Batch Processing" checkbox
# 3. Set output folder
# 4. Choose "Same as input" or custom directory
# 5. Select model and click "Start Processing"

# Output file structure:
input/
  track1.mp3
  track2.mp3
tracks/
  track1/Instrumental_track1.wav
  track1/Vocals_track1.wav
  track2/Instrumental_track2.wav
  track2/Vocals_track2.wav
```

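When a downstream step needs to find the stems for a given input, the layout shown above can be computed rather than hard-coded. A minimal sketch, assuming the `tracks/<name>/Instrumental_<name>.wav` and `Vocals_<name>.wav` naming from the listing holds for your UVR version; `expected_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def expected_outputs(input_file, output_root="tracks"):
    """Map an input file to the stem paths shown in the batch output listing."""
    name = Path(input_file).stem              # "input/track1.mp3" -> "track1"
    folder = Path(output_root) / name
    return {
        "instrumental": folder / f"Instrumental_{name}.wav",
        "vocals": folder / f"Vocals_{name}.wav",
    }


if __name__ == "__main__":
    for kind, path in expected_outputs("input/track1.mp3").items():
        print(kind, path)
```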
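### Integration with RVC

```bash
# Pipeline: Original Song → UVR → Vocals Only → RVC → AI Voice Cover
#           Original Song → UVR → Instrumental → Final Mix

# Step 1: Extract clean vocals with UVR
# Model: MDX-Net Main
# Settings: Segment 256, Overlap 0.75, Denoise ON
# Output: Vocals.wav (clean, isolated vocals)

# Step 2: Process through RVC
# (illustrative invocation: flag names vary by RVC release, check your copy first)
python infer-web.py --input Vocals.wav --model weights/MyVoice.pth --pitch 0

# Step 3: Mix converted vocals back with UVR instrumental output
ffmpeg -i RVC_Converted_Vocals.wav -i UVR_Instrumental.wav \
       -filter_complex "[0:a][1:a]amix=inputs=2:duration=longest" \
       -ac 2 -ar 44100 Final_Cover.wav
```
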
### Integration with GPT-SoVITS

```bash
# GPT-SoVITS requires clean vocal input for voice cloning
# Use UVR to preprocess training data

# Step 1: Batch extract vocals from training samples
# UVR Settings:
#   Model: UVR-MDX-NET Inst Main (extracts vocals as byproduct)
#   Or: MDX-Net Main → keep Vocals output

# Step 2: Feed clean vocals to GPT-SoVITS slicing
# (illustrative flags: confirm the script arguments in your GPT-SoVITS checkout)
python slice_audio.py --input UVR_Vocals/ --output slices/ --threshold -34

# Step 3: Use slices for SoVITS training
python webui.py --voice_slices slices/
```

### Integration with demucs CLI

UVR uses Demucs internally, but you can also chain the CLI version:

```bash
# Use demucs directly for a 2-stem split (vocals / no_vocals), written as MP3
demucs --mp3 --two-stems=vocals input.mp3

# Then use UVR for additional vocal cleanup
# UVR can process demucs output for finer vocal/instrumental splits
# (assumes a CLI wrapper that accepts these flags)
python separate.py --input demucs_vocals.wav --model VR-DeEcho --output cleaned/
```

```bash
# Convert UVR output to multiple formats
for file in UVR_Output/*.wav; do
    base=$(basename "$file" .wav)

    # High-quality MP3
    ffmpeg -i "$file" -codec:a libmp3lame -b:a 320k "${base}.mp3"

    # FLAC for archival
    ffmpeg -i "$file" -codec:a flac "${base}.flac"

    # OGG for streaming
    ffmpeg -i "$file" -codec:a libvorbis -q:a 6 "${base}.ogg"
done
```

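If the conversion step lives inside a Python pipeline rather than a shell loop, the same three FFmpeg invocations can be built as argument lists and passed to `subprocess.run`. The sketch below only constructs the commands and does not execute them; `CODEC_ARGS` and `conversion_commands` are hypothetical names, and the codec options are copied from the loop above.

```python
from pathlib import Path

# Per-format encoder options, copied from the shell loop above
CODEC_ARGS = {
    ".mp3": ["-codec:a", "libmp3lame", "-b:a", "320k"],   # high-quality MP3
    ".flac": ["-codec:a", "flac"],                        # archival
    ".ogg": ["-codec:a", "libvorbis", "-q:a", "6"],       # streaming
}


def conversion_commands(wav_path):
    """Return one ffmpeg argument list per target format for a single WAV file."""
    src = Path(wav_path)
    commands = []
    for suffix, codec_args in CODEC_ARGS.items():
        # Like the shell loop, write the result into the current directory
        target = src.stem + suffix
        commands.append(["ffmpeg", "-i", str(src), *codec_args, target])
    return commands


if __name__ == "__main__":
    for cmd in conversion_commands("UVR_Output/song.wav"):
        print(" ".join(cmd))
```

Passing lists instead of a shell string avoids quoting problems with file names that contain spaces.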
## Benchmarks and Real-World Performance

### Processing Speed Comparison

All tests performed on a 4-minute 44.1kHz stereo WAV file:

| Hardware | MDX-Net | MDX23C | Demucs v4 | VR-DeEcho |
|----------|---------|--------|-----------|-----------|
| RTX 4090 (24GB) | 18s | 42s | 55s | 12s |
| RTX 3060 (12GB) | 35s | 85s | 110s | 22s |
| GTX 1060 (6GB) | 72s | 180s | 240s | 45s |
| Apple M3 Pro | 28s | 68s | 90s | 18s |
| Ryzen 9 7950X (CPU) | 180s | 420s | 540s | 110s |

### Separation Quality (SDR: Signal-to-Distortion Ratio)

Higher SDR = better separation quality, tested on MUSDB18 benchmark:

| Model | Vocals SDR | Instrumental SDR | Ar…

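For readers unfamiliar with the metric, SDR in its plain form is the ratio of reference-signal energy to error energy, expressed in decibels: 10·log10(Σs² / Σ(s − ŝ)²). The sketch below implements only that textbook definition on lists of samples; published MUSDB18 scores are typically produced with dedicated evaluation toolkits that add framing and aggregation, so treat `sdr_db` (a hypothetical helper) as an illustration, not a reproduction of the benchmark.

```python
import math


def sdr_db(reference, estimate):
    """Plain signal-to-distortion ratio in dB for two equal-length sample lists."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if signal_energy == 0:
        raise ValueError("reference is silent; SDR is undefined")
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)


if __name__ == "__main__":
    # An estimate that is 10% too quiet everywhere
    print(round(sdr_db([1.0, 0.0, -1.0, 0.0], [0.9, 0.0, -0.9, 0.0]), 2))
```

In this toy case the error energy is one hundredth of the signal energy, which corresponds to 20 dB.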
2. … process 1,000+ samples for RVC/GPT-SoVITS training. The VR-DeEcho model removes background bleed in voice recordings.
3. **Remix production**: Extract stems from old recordings that lack multi-track masters. MDX23C produces the cleanest instrumental tracks for sampling.
4. **Podcast editing**: Separate co-host voices when only a mixed recording exists. Note: UVR is not designed for speech separation; see Limitations.

## Advanced Usage and Production Hardening

```bash
# If you encounter "CUDA out of memory" errors:

# Option 1: Reduce segment size in GUI
# Settings → Segment Size → Drop from 256 to 128 or 64

# Option 2: Enable "Use CPU for secondary model"
# This offloads post-processing to CPU, saving VRAM

# Option 3: Process in chunks via command line
# (assumes a CLI wrapper that accepts these flags)
python separate.py \
    --input long_track.wav \
    --model MDX-Net \
    --segment 64 \
    --overlap 0.25 \
    --output chunks/

# Option 4: Close other GPU applications
# Browsers, games, and other CUDA applications compete with UVR for VRAM
```

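In an automated pipeline, Option 1 can be applied as a retry ladder: try the largest segment size first and step down when memory runs out. The sketch below is deliberately generic. `run_with_fallback` is a hypothetical helper, `separate` stands for whatever callable launches a separation job in your setup, and Python's built-in `MemoryError` is used as a stand-in for the real out-of-memory exception your backend raises.

```python
def run_with_fallback(separate, segment_sizes=(256, 128, 64)):
    """Call separate(size) with decreasing segment sizes until one succeeds.

    Returns (size_used, result). Raises RuntimeError if every size fails.
    """
    last_error = None
    for size in segment_sizes:
        try:
            return size, separate(size)
        except MemoryError as exc:   # stand-in for the backend's OOM error
            last_error = exc
    raise RuntimeError("all segment sizes ran out of memory") from last_error


if __name__ == "__main__":
    def fake_separate(size):
        # Pretend this machine can only handle segment sizes up to 128
        if size > 128:
            raise MemoryError(f"segment size {size} does not fit")
        return f"separated with segment size {size}"

    print(run_with_fallback(fake_separate))
```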
### Model Management and Storage

```bash
# UVR stores models in the application directory
# Windows: C:\Users\<User>\AppData\Local\Programs\Ultimate Vocal Remover\models\
# macOS: /Applications/Ultimate Vocal Remover.app/Contents/models/
# Linux: ./models/

# To migrate models between machines:
# Copy the entire models/ directory
rsync -avz --progress models/ user@new-server:/opt/uvr/models/

# Models range from 50MB to 500MB each
# Full model set: ~8GB download, ~12GB on disk
```

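Before pulling the full model set onto a new machine, it can be worth checking that the target volume has room for it. A minimal sketch using the standard library; `has_room_for_models` is a hypothetical helper, and the 12 GB default simply reuses the on-disk estimate quoted above.

```python
import shutil


def has_room_for_models(path=".", required_gb=12.0):
    """Return (enough_space, free_gb) for the volume that contains path."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb, free_gb


if __name__ == "__main__":
    ok, free_gb = has_room_for_models(".")
    status = "enough" if ok else "not enough"
    print(f"{free_gb:.1f} GB free: {status} for the full model set")
```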
```python
#!/usr/bin/env python3
"""Batch UVR processing script for production workflows."""

import json
import logging
import subprocess
import sys
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("uvr-batch")

UVR_PATH = "/path/to/UVR.py"
MODEL = "MDX-Net Main"
INPUT_DIR = Path("./input")
OUTPUT_DIR = Path("./output")
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".m4a"}


def process_file(input_path: Path, output_dir: Path) -> dict:
    """Process a single audio file through UVR."""
    # NOTE: these flags assume a command-line wrapper around UVR; confirm that
    # your checkout accepts them before relying on this script.
    cmd = [
        sys.executable, UVR_PATH,   # reuse the interpreter of the active venv
        "--input", str(input_path),
        "--output", str(output_dir),
        "--model", MODEL,
        "--segment", "256",
        "--overlap", "0.75",
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)

    return {
        "input": str(input_path),
        "success": result.returncode == 0,
        "stderr": result.stderr if result.returncode != 0 else None,
    }


def main() -> None:
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    results = []

    # Sort for a stable processing order across runs
    for file in sorted(INPUT_DIR.glob("*")):
        if file.suffix.lower() in AUDIO_SUFFIXES:
            logger.info("Processing: %s", file.name)
            results.append(process_file(file, OUTPUT_DIR))

    # Save batch report
    with open(OUTPUT_DIR / "batch_report.json", "w") as f:
        json.dump(results, f, indent=2)

    success_count = sum(1 for r in results if r["success"])
    logger.info("Complete: %d/%d files processed", success_count, len(results))


if __name__ == "__main__":
    main()
```

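The `batch_report.json` file written by the script is a list of `{"input", "success", "stderr"}` records, so a follow-up step can pick out the failures for a retry. A small sketch under that assumption; `summarize_report` and `load_report` are hypothetical helper names.

```python
import json


def load_report(path):
    """Read a batch report written by the batch script."""
    with open(path) as f:
        return json.load(f)


def summarize_report(results):
    """Count successes and collect the inputs that failed."""
    failed = [r["input"] for r in results if not r["success"]]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed": failed,
    }


if __name__ == "__main__":
    sample = [
        {"input": "input/a.mp3", "success": True, "stderr": None},
        {"input": "input/b.wav", "success": False, "stderr": "CUDA out of memory"},
    ]
    print(summarize_report(sample))
```

The `failed` list can be fed straight back into the batch loop, for example with a smaller segment size.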
### Monitoring and Logging

```python
# UVR writes processing logs accessible via the GUI:
# Settings Button → Error Log → View Details

# For headless deployments, wrap with logging:
import logging
import sys
from datetime import datetime

log_file = f"uvr_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    handlers=[
        logging.FileHandler(log_file),      # persistent log file per run
        logging.StreamHandler(sys.stdout),  # mirror to the console
    ],
)

# Monitor GPU utilization during processing (run in a separate shell):
#   watch -n 1 nvidia-smi
```

## Comparison with Alternatives

| Feature | Ultimate Vocal Remover | … | … | … |
|---------|------------------------|---|---|---|
| … | … | … | Limited | Limited |
| **Windows Installer** | Standalone .exe | pip/conda only | pip/conda only | pip only |
| **Vocal-only separation** | Yes (specialized models) | 2-stem mode | 2-stem mode | 4-stem only |
| **Denoise processing** | Built-in (VR-DeEcho) | No | No | No |
| **Batch processing** | GUI + CLI | CLI only | CLI only | CLI only |
| **Time-stretch/Pitch-shift** | Built-in (Rubber Band) | No | No | No |
| **Active maintenance** | Yes (2025 releases) | Archived (Jan 2025) | Limited | Minimal |
| **License** | MIT | MIT | MIT | MIT |

**Key distinction:** UVR is the only tool …

## Limitations / Honest Assessment

UVR is purpose-built for **music vocal separation**. It is not the right tool for every audio task:

1. **Speech separation**: UVR models are trained on music datasets (MUSDB18, internal datasets). Separating two people talking over each other produces poor results. For speech separation, use dedicated tools such as pyannote.audio or SpeechBrain.
3. … source audio should be at least 256kbps MP3 or lossless WAV/FLAC.
4. **Extreme genre outliers**: Death metal growling, throat singing, and heavily autotuned vocals sometimes leak into the instrumental track because these timbres were rare in the training data.
5. **AMD GPU limitations**: DirectML support exists on a separate branch but is less mature than CUDA. AMD users should expect occasional crashes or slower performance compared to equivalent Nvidia cards.
6. **No VST/AU plugin format**: UVR runs as a standalone application. It cannot be loaded as a plugin inside Ableton, Logic, or FL Studio. Use external audio routing or process stems beforehand.

## Frequently Asked Questions

**Q: Can I run UVR without a GPU?**
Yes. UVR falls back to CPU processing automatically. Expect 5-10x slower speeds. A modern 8-core CPU processes a 4-minute track in approximately 3 minutes with the MDX-Net model. Set segment size to 64 or lower to fit CPU memory constraints.

**Q: Why does UVR install to C:\ drive only on Windows?**
The installer bundles Python, PyTorch, and FFmpeg into a fixed-path directory structure. Moving the installation breaks hardcoded relative paths between the runtime and model directories. The development team is aware of this limitation.

**Q: Which model produces the cleanest instrumental track?**
MDX23C consistently scores highest on SDR benchmarks (9.42 vocal SDR on MUSDB18). For most pop/rock tracks, MDX-Net Main provides the best balance of quality and speed. Test multiple models on a 30-second clip before processing full albums.

**Q: How do I process FLAC, M4A, or OGG files?**
Install FFmpeg and ensure it is available in your system PATH. UVR uses FFmpeg as a backend decoder for all non-WAV formats. On Linux, run `sudo apt install ffmpeg`. On macOS, run `brew install ffmpeg`. The Windows installer bundles FFmpeg automatically.

**Q: Can I use UVR output for commercial releases?**
The UVR software and its models are MIT-licensed, which permits commercial use of the tool itself. Copyright law still applies to the source material, however, so removing vocals from a copyrighted song does not grant you the right to distribute the resulting instrumental.

… against the release page, or build from source if preferred.

**Q: How do I update models without reinstalling UVR?**
Open UVR and click the "Download Center" button. New models appear here as they are released by the development team. Click the download icon next to each model. Models are stored independently of the application binary.

## Conclusion

Ultimate Vocal Remover fills a gap that CLI-only libraries cannot: accessible, high-quality vocal separation with a visual interface and curated model selection. For producers building AI voice pipelines, karaoke operators processing hundreds of tracks, or devel…