What is Pixelle-Video?

Pixelle-Video is an open-source AI-powered automatic short video generation engine. Simply input a topic, and it automatically completes the entire video production pipeline:

  • ✍️ AI Script Writing — Generates video narration based on your topic
  • 🎨 AI Image/Video Generation — Creates matching visuals for every scene
  • 🗣️ AI Voice Synthesis — Converts script to natural speech using TTS
  • 🎵 Background Music — Adds BGM to enhance atmosphere
  • 🎬 One-Click Video Assembly — Renders final video automatically

Zero barrier to entry and zero editing experience required: video creation becomes as simple as typing one sentence!

🔗 GitHub: https://github.com/AIDC-AI/Pixelle-Video


Key Features

  • Fully Automatic: Input a topic → get a complete video
  • AI Smart Script: AI writes the narration, no manual scripting needed
  • AI Image Generation: Every sentence gets a matching AI illustration
  • AI Video Generation: Supports WAN 2.1 and other video models for dynamic content
  • Multi-TTS Support: Edge-TTS, Index-TTS, and more voice synthesis options
  • Background Music: Built-in BGM support for better atmosphere
  • Visual Templates: Multiple templates for unique video styles
  • Flexible Sizes: Portrait, landscape, and custom video dimensions
  • Multiple AI Models: GPT, Tongyi Qianwen, DeepSeek, Ollama support
  • ComfyUI Architecture: Modular design, customizable workflows

Video Generation Pipeline

Pixelle-Video uses a modular design with a clear workflow:

Text Input → Script Generation → Image Planning → Frame Processing → Video Synthesis

Each stage supports flexible customization: choose different AI models, audio engines, and visual styles to suit your creative needs.
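The stages above can be sketched as composable functions. This is an illustrative outline only: the function names and return shapes are assumptions for the sketch, not Pixelle-Video's real module API, and each stage is a stand-in you could swap for a different model or service.

```python
# Illustrative pipeline sketch: names and data shapes are assumptions,
# not Pixelle-Video's actual internals.

def generate_script(topic: str) -> list[str]:
    # Stand-in for the LLM stage: one narration sentence per scene.
    return [f"Scene about {topic}, part {i}" for i in range(1, 4)]

def plan_images(script: list[str]) -> list[str]:
    # Stand-in for image planning: one image prompt per sentence.
    return [f"illustration: {line}" for line in script]

def process_frames(prompts: list[str]) -> list[dict]:
    # Stand-in for frame processing: pair each prompt with timing info.
    return [{"prompt": p, "duration_s": 3.0} for p in prompts]

def synthesize_video(frames: list[dict]) -> dict:
    # Stand-in for final assembly: report what would be rendered.
    return {"scenes": len(frames),
            "length_s": sum(f["duration_s"] for f in frames)}

def run_pipeline(topic: str) -> dict:
    return synthesize_video(process_frames(plan_images(generate_script(topic))))
```

Because every stage takes the previous stage's output and nothing else, replacing one stage (say, a different image model) does not touch the rest of the chain, which is the point of the modular design.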


Extended Modules

Beyond basic video generation, Pixelle-Video offers powerful extension modules:

👤 Digital Human Avatar

Upload a photo and generate a talking-head video with lip-sync. Supports multiple languages including Korean, Chinese, and English.

🖼️ Image-to-Video

Transform static images into dynamic videos using AI video generation models.

💃 Motion Transfer

Upload a reference video and image to transfer motions — like making a photo dance following video movements.


Supported AI Models

LLM (Script Generation)

  • OpenAI GPT-4o / GPT-4o-mini
  • Alibaba Tongyi Qianwen
  • DeepSeek V3 / R1
  • Ollama (local deployment)
  • Custom API endpoints
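All of these providers (Ollama included) expose an OpenAI-compatible chat API, so a single request builder covers them. The payload below is a hedged sketch; the prompt wording is a placeholder, not the prompt Pixelle-Video actually uses.

```python
def build_script_request(topic: str, model: str = "gpt-4o") -> dict:
    """Build an OpenAI-compatible /chat/completions payload for narration.

    The same payload works against OpenAI, DeepSeek, or a local Ollama
    server: only the base URL and model name differ.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You write concise narration for short videos."},
            {"role": "user",
             "content": f"Write a 60-second narration about: {topic}"},
        ],
    }
```

Switching providers is then just a configuration change, e.g. pointing the base URL at DeepSeek with `model="deepseek-chat"`, or at a local Ollama endpoint.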

Image Generation

  • FLUX (via ComfyUI)
  • Stable Diffusion
  • Qwen Image Generation
  • RunningHub cloud service
  • Nano Banana model

TTS (Voice Synthesis)

  • Edge-TTS (free, multi-language)
  • Index-TTS (voice cloning)
  • ChatTTS
  • Custom ComfyUI TTS workflows

Quick Start

1. Clone Repository

git clone https://github.com/AIDC-AI/Pixelle-Video.git
cd Pixelle-Video

2. Install Dependencies

pip install -r requirements.txt

3. Configure API Keys

Edit config.json with your API keys:

{
  "llm": {
    "api_key": "your-api-key",
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o"
  },
  "image": {
    "comfyui_url": "http://127.0.0.1:8188"
  }
}

4. Launch Web UI

python webui.py

Open http://localhost:7860 in your browser.

5. Generate Your First Video

  1. Enter a topic like “Why reading habits matter”
  2. Select your preferred TTS voice
  3. Choose a visual template
  4. Click “Generate Video”
  5. Wait 2-5 minutes for the complete video

Use Cases

  • Knowledge Sharing: “10 Python tricks beginners should know”
  • Product Review: “iPhone 16 vs Samsung S24 comparison”
  • Storytelling: “The journey of a startup founder”
  • Educational Content: “How does blockchain work?”
  • News Commentary: “AI trends in 2026”
  • Book/Movie Review: “Lessons from ‘Atomic Habits’”

Video Style Examples

Pixelle-Video supports multiple video styles:

  • 🌄 Documentary Style — Travel, nature, human stories
  • 🔍 Cultural Analysis — Deep dives into trends and phenomena
  • 🔭 Science & Philosophy — Complex concepts made simple
  • 🌱 Personal Growth — Self-improvement, productivity
  • 🧠 Deep Thinking — Psychology, philosophy, reflection
  • 🏯 History & Culture — Ancient wisdom, historical events
  • ☀️ Emotional — Heartwarming stories, inspiration
  • 📜 Fiction Commentary — Novel reviews, character analysis
  • 🧬 Health & Wellness — Medical tips, wellness advice

Technical Architecture

Pixelle-Video is built on ComfyUI architecture:

  • Modular Workflows — Each component (LLM, TTS, image gen) is a separate node
  • Customizable Pipeline — Swap any model or service easily
  • API-First Design — All capabilities exposed via REST API
  • Web UI — Gradio-based interface for easy use
  • Batch Processing — Generate multiple videos simultaneously
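As a sketch of what calling such a REST API could look like: the `/api/generate` route and the payload fields below are assumptions for illustration only, not documented endpoints, so check the project's API docs for the real routes.

```python
import json
import urllib.request

def make_generate_request(base_url: str, topic: str,
                          template: str = "documentary") -> urllib.request.Request:
    # Hypothetical route and fields: verify against the project's actual
    # REST API before relying on this.
    payload = json.dumps({"topic": topic, "template": template}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending the request would then be:
# with urllib.request.urlopen(make_generate_request(
#         "http://localhost:7860", "AI trends in 2026")) as resp:
#     print(resp.read())
```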

Performance & Cost

  • Local Deployment: free (GPU required), fast, high quality
  • RunningHub Cloud: pay-per-use, instant, high quality
  • Mixed Mode: flexible cost, balanced speed, high quality

Recommended setup for beginners:

  • LLM: DeepSeek API (cheap, good quality)
  • Image: RunningHub (no local GPU needed)
  • TTS: Edge-TTS (free, multi-language)
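For that beginner setup, only the `llm` block of config.json changes. The values below follow DeepSeek's public OpenAI-compatible API; treating them as valid keys for this project's config is an assumption, so verify against the project's own config documentation:

```json
{
  "llm": {
    "api_key": "your-deepseek-key",
    "base_url": "https://api.deepseek.com/v1",
    "model": "deepseek-chat"
  }
}
```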

Comparison with Other Tools

Compared with hosted tools such as HeyGen, Synthesia, and Pictory, Pixelle-Video stands out on:

  • Open source (the hosted tools are proprietary)
  • Free to use, versus limited free tiers
  • Local deployment
  • Custom models
  • ComfyUI integration
  • Voice cloning
  • Digital human
  • Motion transfer

Tips for Best Results

  1. Topic Specificity — More specific topics yield better scripts
  2. Template Selection — Match template to content style
  3. Prompt Prefix — Use English prompt prefixes for consistent image style
  4. Voice Preview — Always preview TTS before generating full video
  5. Batch Generation — Generate 3-5 variants and pick the best
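The prompt-prefix tip (point 3) can be illustrated with a tiny helper. The prefix string here is only an example style, not something built into Pixelle-Video:

```python
# Example only: a fixed English style prefix keeps every generated image
# in a consistent look; the prefix text itself is an arbitrary sample.
STYLE_PREFIX = "cinematic photo, soft natural light, 35mm film, "

def image_prompt(scene_text: str) -> str:
    """Prepend the shared style prefix to a per-scene prompt."""
    return STYLE_PREFIX + scene_text
```

Because every scene's prompt starts with the same prefix, the image model receives a consistent style signal across the whole video instead of drifting scene by scene.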


Conclusion

Pixelle-Video democratizes video creation by combining LLM, image generation, TTS, and video editing into a single automated pipeline. Whether you’re a content creator, educator, marketer, or developer, this tool can save hours of video production time.

The ComfyUI-based architecture means it’s not just a black-box tool — you can customize every component, swap models, and build your own video generation workflows.

Best for: Content creators, educators, marketers, and developers who need quick video production

GitHub: https://github.com/AIDC-AI/Pixelle-Video


Last updated: 2026-05-06