What is the minimum VRAM needed to run ComfyUI with SDXL?

The minimum is 8GB VRAM (such as an RTX 4060 or the 8GB variant of the RTX 3060) to run SDXL at modest quality. A comfortable setup is 16GB+ VRAM (RTX 4080 or 4090), which also handles Flux models and LoRA stacking.

How do you install ComfyUI?

Clone the repo with git clone, create a Python virtual environment, run pip install -r requirements.txt, then launch with python main.py. The web interface is then served at http://localhost:8188.

Which AI models should you download first for ComfyUI in 2026?

Three priorities: SDXL base + refiner (~13GB, the versatile workhorse), Flux.1 Schnell (~24GB, fast prototyping), and Stable Diffusion 3.5 Large (~17GB, best photorealism). Add 2-3 LoRAs for your style and skip the dozens of Civitai personas until you have a reason.

How do you keep a character's face consistent across images in ComfyUI?

Combine a character-trained LoRA with IPAdapter using a reference image. Train your own LoRA on 15-20 source images of the character; IPAdapter then keeps the face consistent across scenes. Expect about 12-20 seconds per image.

Can ComfyUI generate video, and which models does it use?

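Yes. Using an image-to-video workflow, ComfyUI animates a still into a short motion clip, taking 60-90 seconds on an RTX 4090. The template targets a 5-second clip; its 24 frames at 8fps play back in 3 seconds, so reaching 5 seconds takes about 40 frames at that rate or a lower playback frame rate. The recommended 2026 models are Stable Video Diffusion XT and LTX Video.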

ComfyUI Workflow 2026

ComfyUI became the default tool for serious AI image generation in 2026. Node-based, reproducible, automatable. This guide gets you from zero to 5 working production workflows in an afternoon.

⚡ TL;DR #

Why ComfyUI: workflow reproducibility, automation, complex pipelines. 106K GitHub stars in 2026.

Hardware: 8GB VRAM minimum, 16GB+ comfortable.

Setup time: 1 hour to first generation.

5 templates below: text-to-image, inpaint, upscale chain, video, character consistency.

Setup (1 Hour) #

Step 1: Install (15 min) #

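# Clone the repository and enter it
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
# Create and activate a virtual environment (Linux/macOS; on Windows run venv\Scripts\activate)
python -m venv venv && source venv/bin/activate
# Install dependencies, then start the server
pip install -r requirements.txt
python main.py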

Once the server is running, open http://localhost:8188 in your browser.
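If the page does not load, it helps to confirm that something is actually answering on that port. The following is a minimal sketch using only the Python standard library; the function name `is_server_up` is made up for illustration, and it only checks that an HTTP server responds, not that it is ComfyUI.

```python
import http.client
import urllib.error
import urllib.request


def is_server_up(url: str = "http://localhost:8188", timeout: float = 2.0) -> bool:
    """Return True if anything answers HTTP at `url`, False otherwise."""
    # Bypass any system proxy: the server is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a non-HTTP reply.
        return False


if __name__ == "__main__":
    print("Server reachable:", is_server_up())
```

A False result right after launch usually just means the server is still loading; retry after a few seconds.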

Step 2: Download models (30 min) #

Drop into ComfyUI/models/checkpoints/:

SDXL base + refiner (most versatile, ~13GB total)
Flux.1 Schnell (fast prototyping, ~24GB)
SD 3.5 Large (best photorealism, ~17GB)

Optional but useful (LoRAs go in ComfyUI/models/loras/, ControlNet models in ComfyUI/models/controlnet/):

2-3 LoRAs for your style (Civitai, search “2026 SDXL trending”)
ControlNet models (OpenPose, Depth, Canny — each ~1.5GB)
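Those downloads add up quickly, so it is worth checking disk space before starting. Here is a rough sketch that sums the approximate sizes listed above and compares them with the free space on the current drive. The sizes are the article's estimates, and the helper names are illustrative.

```python
import shutil

# Approximate download sizes in GB, taken from the lists above.
MODEL_SIZES_GB = {
    "SDXL base + refiner": 13.0,
    "Flux.1 Schnell": 24.0,
    "SD 3.5 Large": 17.0,
    "ControlNet OpenPose": 1.5,
    "ControlNet Depth": 1.5,
    "ControlNet Canny": 1.5,
}


def total_size_gb(sizes: dict) -> float:
    """Sum the listed model sizes."""
    return sum(sizes.values())


def free_space_gb(path: str = ".") -> float:
    """Free space on the drive holding `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    needed = total_size_gb(MODEL_SIZES_GB)
    free = free_space_gb()
    print(f"Need about {needed:.1f} GB, {free:.1f} GB free")
    if free < needed:
        print("Not enough room for everything; start with SDXL only.")
```

LoRAs are not included because their sizes vary too much to estimate.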

Step 3: First generation (15 min) #

Drag the default workflow from ComfyUI/workflows/
Load SDXL checkpoint
Type prompt
Queue prompt

Done. You’re now generating. The hard part comes next: building reusable workflows.
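The same "Queue prompt" action can also be scripted, which is where the automation angle comes in. The sketch below only builds the HTTP request and does not send it. It assumes the `/prompt` endpoint and the `{"prompt": ...}` payload shape used by the example scripts in the ComfyUI repository, so verify both against your installed version. The workflow dict is a hypothetical stub; a real one comes from exporting your workflow in API format.

```python
import json
import urllib.request


def build_queue_request(
    workflow: dict, server: str = "http://localhost:8188"
) -> urllib.request.Request:
    """Build (but do not send) a POST that queues `workflow` on the server."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical stub: "ExampleNode" is not a real node type.
example_workflow = {"1": {"class_type": "ExampleNode", "inputs": {}}}
request = build_queue_request(example_workflow)

# To actually queue it once the server is running:
# urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before anything hits the GPU.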

5 Production-Ready Templates #

Template 1: Text-to-Image (SDXL base + refiner) #

Use: standard generation, daily workhorse.
Nodes: Load Checkpoint → CLIP Text Encode (prompt) → KSampler (base 70%) → KSampler (refiner 30%) → VAE Decode → Save Image.
Time per image: 8-15 sec on RTX 4090.
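The 70/30 split comes down to choosing the step at which the base model hands off to the refiner. A small sketch of that arithmetic follows. The keys `start_at_step` and `end_at_step` mirror the inputs of the KSampler (Advanced) node, on the assumption that the template uses that node; the function itself is illustrative.

```python
def split_steps(total_steps: int, base_fraction: float = 0.7):
    """Return (base, refiner) step ranges for a two-stage sampling pass."""
    if total_steps < 2:
        raise ValueError("need at least 2 steps to split")
    if not 0.0 < base_fraction < 1.0:
        raise ValueError("base_fraction must be between 0 and 1")
    handoff = round(total_steps * base_fraction)
    # Guarantee each stage gets at least one step.
    handoff = min(max(handoff, 1), total_steps - 1)
    base = {"start_at_step": 0, "end_at_step": handoff}
    refiner = {"start_at_step": handoff, "end_at_step": total_steps}
    return base, refiner


base, refiner = split_steps(30)
print(base)     # base model runs steps 0 to 21
print(refiner)  # refiner runs steps 21 to 30
```

With 30 total steps the handoff lands at step 21, so the refiner finishes the last 9.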

Template 2: Inpaint Mask (selective editing) #

Use: change a specific area without regenerating the whole image.
Nodes: Load Image → MaskEditor → CLIP Text Encode (new content) → InpaintModelConditioning → KSampler → Composite back.
Time per edit: 5-10 sec.
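The final "Composite back" step is conceptually a per-pixel blend: where the mask is 1 the new content wins, where it is 0 the original pixel is kept. This is a plain-Python sketch of that idea on tiny grayscale grids, not ComfyUI's actual node implementation.

```python
def composite(original, generated, mask):
    """Blend per pixel: mask 1.0 takes the generated value, 0.0 keeps the original."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


original = [[10.0, 10.0], [10.0, 10.0]]
generated = [[200.0, 200.0], [200.0, 200.0]]
# Right column is edited; bottom-right is a soft (feathered) edge.
mask = [[0.0, 1.0], [0.0, 0.5]]

result = composite(original, generated, mask)
print(result)  # [[10.0, 200.0], [10.0, 105.0]]
```

Fractional mask values are what make feathered edges blend in instead of leaving a visible seam.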

Template 3: 4x Upscale Chain (4K output) #

Use: take a 1024×1024 generation to a 4096×4096 production-ready output.
Nodes: Generate at 1024 → Upscale Latent 2x → KSampler refine pass → Upscale Latent 2x again → Final refine.
Time: 30-45 sec per image at 4K.
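Two 2x passes give 4x per side, which means 16x the pixels. That explains why this template is the slowest of the still-image workflows. A short sketch of the arithmetic, with an illustrative function name:

```python
def upscale_chain(start: int = 1024, factor: int = 2, passes: int = 2):
    """Side length after each upscale pass, starting from `start` pixels."""
    sizes = [start]
    for _ in range(passes):
        sizes.append(sizes[-1] * factor)
    return sizes


sizes = upscale_chain()
pixel_growth = (sizes[-1] / sizes[0]) ** 2

print(sizes)         # [1024, 2048, 4096]
print(pixel_growth)  # 16.0
```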

Template 4: Image-to-Video (5 sec clip) #

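Use: animate a still into a 5-second motion clip.
Nodes: Load SVD model → Load Image → Image to Video (24 frames @ 8fps) → VAE Decode → Save Video.
Time: 60-90 sec on RTX 4090.
2026 models: Stable Video Diffusion XT or LTX Video.
Note: 24 frames at 8fps play back in 3 seconds. For a full 5 seconds, generate about 40 frames at 8fps or lower the playback frame rate.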
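Frame count, frame rate, and clip length are tied together by simple division, and it is worth running the numbers before queueing a long render. The helpers below are illustrative names, not ComfyUI functions.

```python
import math


def clip_seconds(frames: int, fps: float) -> float:
    """Playback length of a clip."""
    return frames / fps


def frames_needed(seconds: float, fps: float) -> int:
    """Frames required to fill `seconds` at `fps`."""
    return math.ceil(seconds * fps)


def fps_for(frames: int, seconds: float) -> float:
    """Playback rate that stretches `frames` over `seconds`."""
    return frames / seconds


print(clip_seconds(24, 8))  # 3.0
print(frames_needed(5, 8))  # 40
print(fps_for(24, 5))       # 4.8
```

So 24 frames at 8fps give a 3-second clip. Reaching 5 seconds takes either 40 frames or a playback rate of about 4.8fps, which will look choppier.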

Template 5: Character Consistency (LoRA + IPAdapter) #

Use: generate the same character across many scenes with a consistent face.
Nodes: Load LoRA (character-trained) + IPAdapter (reference image) → CLIP Text Encode → KSampler → output.
Time per image: 12-20 sec.
Trick: train your own LoRA on 15-20 source images of one character, and let IPAdapter handle the rest.
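Before starting a LoRA training run, a quick sanity check on the dataset folder can save a wasted session. This sketch counts image files and compares the total against the 15-20 range suggested above. The suffix list and function names are assumptions for illustration.

```python
from pathlib import Path

# File types treated as training images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def count_training_images(folder) -> int:
    """Count image files directly inside `folder` (no recursion)."""
    return sum(
        1
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def dataset_status(n: int, low: int = 15, high: int = 20) -> str:
    """Compare an image count against the suggested range."""
    if n < low:
        return "too few"
    if n > high:
        return "more than suggested"
    return "ok"
```

For example, `dataset_status(count_training_images("my_character/"))` gives a one-word verdict, where `my_character/` stands in for your own folder.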

Workflow Sharing #

All five templates can be saved as .json files. Drag one onto the ComfyUI canvas to load it. Share with your team via git or Discord.
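If you share workflows through git, normalizing the JSON before committing keeps diffs readable. The following is a minimal sketch using the standard library. It assumes that reordering object keys does not change how ComfyUI loads the file, which you should verify on a copy first.

```python
import json


def normalize_workflow(text: str) -> str:
    """Re-serialize workflow JSON with sorted keys and stable indentation."""
    data = json.loads(text)
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


raw = '{"b": 1, "a": {"d": 2, "c": 3}}'
print(normalize_workflow(raw))
```

The function is idempotent, so running it on an already normalized file changes nothing. That makes it safe to use in a pre-commit step.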

The community publishes thousands of workflows at:

ComfyUI subreddit
OpenArt.ai workflow library
Civitai (look for “ComfyUI workflow” filter)

Bring 2-3 community workflows in and customize for your style. That’s how most production artists work — not building from scratch.

Recommended Infrastructure #

For serious ComfyUI work:

DigitalOcean — $200 credit, GPU droplets (H100/L40S/A100)
HTStack — Hong Kong VPS for low-latency Asia generation

Affiliate links — same price, supports dibi8.com.

Conclusion #

ComfyUI’s learning curve is real, but so is the payoff. Once you have 5 reusable workflows, you’re shipping faster than with any single-shot tool. The 2026 ecosystem (Flux, SD 3.5, IPAdapter, ControlNet improvements) is the most capable image-generation stack ever assembled, and ComfyUI is the only tool that orchestrates it cleanly.

Start with the 5 templates above. Customize. Share. The compound returns of reusable workflows show up after week 2 — when you realize you’re combining nodes faster than you’d write code.
