AI Application Tools

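MinerU: 70.6K Stars, Converts Any Document into LLM-Ready Markdown

MinerU (70,600+ GitHub stars) converts PDF, DOCX, PPTX, XLSX, images, and web pages into structured Markdown and JSON for LLM, RAG, and agent workflows. It supports OCR in 109 languages, formula-to-LaTeX and table-to-HTML conversion, and runs on CPU or GPU.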

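WorldMonitor: A Real-Time Global Intelligence Dashboard for Geopolitical Monitoring

A real-time, AI-powered global intelligence dashboard that aggregates news, geopolitical events, and infrastructure tracking. 59K stars. An open-source alternative to Palantir Gotham.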

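VoiceBox: An Open-Source AI Voice Studio for Cloning, Dictation, and Generation

A full-stack, open-source AI voice studio that lets you clone any voice, generate speech, and dictate into any application. 33K stars. Runs locally on your machine with CUDA or Apple Silicon support.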

MoneyPrinterTurbo: Automatic Video Generation with AI

OpenMontage Review: The World's First Open-Source Agentic Video Production System (52 Tools, 12 Pipelines, 500+ Skills)

OpenMontage (8.3K+ GitHub stars) is the world's first open-source, agentic video production system. 12 production pipelines, 52 tools, 500+ agent skills. Turn any AI coding assistant into a full video studio — from animated explainers to cinematic trailers to real-footage documentaries. Zero API keys needed for basic output.

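ChatTTS 2026: 39.3K-Star Open-Source Conversational TTS with Laughter, Pauses, and Fine-Grained Prosody Control

ChatTTS is an open-source TTS model designed for dialogue rather than read-aloud narration. 39.3K GitHub stars, a 4GB VRAM minimum, an RTF of 0.3 on an RTX 4090, and fine-grained prosody control over laughter and pauses. A complete 2026 installation and production setup guide.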

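Understand-Anything: Interactive Knowledge Graphs for Codebases, 60K+ Stars (2026)

Understand-Anything turns any codebase into an interactive knowledge graph that you can explore, search, and query. It supports Claude Code, Codex, Cursor, Copilot, and Gemini CLI, and has 60,339 stars on GitHub.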

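MarkItDown: A Universal File-to-Markdown Converter, Microsoft's Open-Source Tool for LLM Pipelines (2026)

MarkItDown, from Microsoft's AutoGen team, converts more than 20 file types into Markdown for LLM use. Install it with pip install markitdown[all]. It offers a Python API, LangChain integration, RAG pipelines, and batch processing.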

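Knowledge Work Plugins: Anthropic's Plugin Ecosystem for AI Productivity (2026)

Knowledge Work Plugins (20,728 stars), built by Anthropic, extends Claude with powerful tools for document editing, code analysis, web browsing, and file operations. Build custom plugins for your own workflow.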

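Academic Research Skills: Automate Literature Reviews with AI, a 31K-Star Framework (2026)

Academic Research Skills (31,628 stars) automates the research pipeline: searching for papers, extracting insights, synthesizing findings, and writing literature reviews. Built for Claude Code with a modular skill architecture.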

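Taste Skill: Ending Mediocre AI Output, an Agent Skill Framework (2026)

Taste Skill is a portable agent skill framework that improves AI-generated front-end interfaces through stronger layout, typography, motion, and spacing design. Compatible with Codex, Cursor, Claude Code, and ChatGPT Images.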

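AI Engineering From Scratch: Building Production-Grade LLM Systems, the Complete 2026 Guide

AI Engineering From Scratch (32,771 stars) is a comprehensive curriculum covering LLM fine-tuning, RAG, agent frameworks, and production deployment. Learn to build, ship, and scale AI systems.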

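Oh My Pi: Turn Any Raspberry Pi into a Smart Device, a 12K-Star Project (2026)

Oh My Pi (12,554 stars) turns Raspberry Pi devices into smart home hubs, media centers, and development workstations through one-click setup and automated configuration.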

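Apple's Container: A Docker-Like Experience on the Mac with 37K Stars

Apple has released container, a Swift-based tool that runs Linux containers on the Mac using lightweight virtual machines. It has 37K stars, is OCI-compatible, and requires macOS 26.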

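NVIDIA Cosmos: Open-Source World Models for Physical AI (10K+ Stars)

NVIDIA Cosmos is an open-source world model platform, with datasets and tools, for building physical AI: robots, autonomous vehicles, and smart infrastructure. Cosmos 3 uses a hybrid Transformer architecture with unified support for language, image, video, audio, and action generation. It is available in 16B and 64B model sizes.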

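Impeccable: The Programming Language That Makes AI-Generated UI Actually Look Good, 2026 Review

Impeccable (37K stars) is a programming language designed for AI coding agents, with 23 commands, 41 detection rules, and live in-browser iteration. It fixes the rough edges of AI-generated UI through deterministic design-quality checks. Compatible with Claude Code, Cursor, and Codex.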

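ByteDance UI-TARS Desktop: A Vision-Language AI Agent That Sees and Controls Your Computer, Complete Setup Guide

Learn how to deploy ByteDance's UI-TARS Desktop, a vision-language AI agent that watches your screen and controls applications through natural language. Includes step-by-step installation, real-world benchmarks, and a comparison with alternatives.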

RuView: WiFi Spatial Intelligence for Smart Buildings, with a Python CLI, Real-Time Position Tracking, and Mesh Networking

Learn how to use RuView, the Python-based WiFi spatial intelligence platform that tracks real-time positions, maps building layouts, and optimizes WiFi mesh networks. Step-by-step pip install guide, real-time tracking, and mesh network configuration.

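Odysseus: A Self-Hosted AI Workstation with 10+ Built-In Tools, 65K Stars, Complete Installation Guide 2026

Odysseus (69,110 GitHub stars) is a self-hosted AI workstation that combines chat, agent automation, deep research, document editing, email triage, a calendar, and more. It supports vLLM, llama.cpp, Ollama, OpenRouter, OpenAI, and GitHub Copilot, with Docker and native Linux/macOS installation options.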

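TurboVec: A High-Performance Rust-Powered Vector Index, AI Search Guide 2026

TurboVec (RyanCodrai/turbovec) is a vector index based on Google Research's TurboQuant algorithm, written in Rust with Python bindings. It works as a drop-in vector store for LangChain, LlamaIndex, Haystack, and Agno, and uses 4-bit quantization and SIMD acceleration for high-performance vector search. Covers Python integration, benchmarks, and production deployment.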

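Odysseus: The Self-Hosted AI Workbench That Gained 63K GitHub Stars in 9 Days, Complete 2026 Installation Guide

Odysseus is an open-source, privacy-first AI workbench (63K stars in 9 days, MIT license). A single Docker command gives you chat, AI agents, deep research, automatic email triage, a calendar, notes, and a model Cookbook, all running on your own hardware. This article walks through the installation steps, core features, and a comparison with ChatGPT Plus.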

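nanochat: Karpathy's $100 ChatGPT, Build Your Own AI Chat App on a Single GPU

nanochat (54,800 GitHub stars) is Andrej Karpathy's open-source ChatGPT clone project, which can run on a single $100 GPU. Train from scratch with SGLang or serve pretrained models through vLLM. Includes a setup guide, training benchmarks, and deployment examples.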

ComfyUI Workflows 2026: Beginner Setup + 5 Production-Ready Templates

ComfyUI hit 106K GitHub stars in 2026. Beginner-friendly setup guide, model recommendations for 2026, and 5 production-ready workflow templates (text-to-image, inpaint, upscale, video, character consistency).

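Supertonic Review: A 99M-Parameter Local TTS Model with 31 Languages, Running on CPU via ONNX (2026)

Supertonic (9.9K+ GitHub stars), from Korean voice AI company Supertone Inc., is the most compelling local multilingual TTS model of 2026. It offers 115.51M parameters, 31 languages (including Korean, Japanese, Vietnamese, and Chinese), 44.1kHz studio-grade audio quality, and 10 expression tags, and it runs on CPU through ONNX Runtime with no cloud, no API, and no GPU. SDKs cover Python, Node.js, the browser (WebGPU/WASM), iOS, Android, Rust, and Flutter. A full feature breakdown, installation, code examples, and a 2026 side-by-side comparison of local TTS options.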

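Stable Diffusion WebUI 2026 (AUTOMATIC1111): Complete Guide to the 163K-Star Self-Hosted Image Generator

AUTOMATIC1111's stable-diffusion-webui is the 163K-star de facto standard UI for self-hosted SD/SDXL image generation. A complete 2026 installation and production guide: txt2img, img2img, inpainting, extensions, LoRA, ControlNet, hardware requirements, and alternatives (Forge, SD.Next).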

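ComfyUI 2026: Complete Guide to the 114K-Star Node-Based AI Image/Video/Audio Workflow Engine

ComfyUI is a 114K-star node-based visual workflow engine that supports SD, SDXL, Flux, Wan, Hunyuan, and more, across image, video, audio, and 3D generation. A complete 2026 installation guide: node basics, workflow JSON import, ComfyUI Manager, and when ComfyUI beats AUTOMATIC1111.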

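Mistral AI 2026: Deploying a Production-Grade Local LLM with the 8x7B MoE Architecture, Complete Setup Guide

Deploy production-grade Mistral models locally: the 8x7B MoE architecture, mistral-inference, high-throughput serving with vLLM, GGUF quantization, function calling, and fine-tuning. A complete setup guide.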

Ultimate Vocal Remover: 24.7K+ Stars, Complete 2026 Installation Guide

Ultimate Vocal Remover (UVR) is a GUI application for vocal removal using deep neural networks. Compatible with demucs, RVC, GPT-SoVITS. Covers Windows, macOS, Linux installation, model selection, batch processing, and production hardening.

HunyuanVideo: 12.1K+ Stars, Production Deployment Guide 2026

HunyuanVideo (HYV) is an open-source video generation framework by Tencent with 13B parameters. Supports ComfyUI, Diffusers, Gradio API. Covers Docker setup, FP8 quantization, multi-GPU inference, and production hardening.

WhisperX: 22K+ Stars, Production ASR Setup Guide 2026

WhisperX is an open-source ASR toolkit with word-level timestamps and speaker diarization. Compatible with faster-whisper, pyannote.audio, and OpenAI Whisper models. Covers Docker deployment, Python API, benchmarks, and production hardening.

Wan 2.1: 16.1K+ Stars, a 2026 Deep Dive into Open Video Generation vs HunyuanVideo and CogVideo

Wan 2.1 is an open suite of video foundation models by Alibaba with SOTA performance. Supports ComfyUI, Diffusers, and Gradio. Covers T2V, I2V, video editing, and text generation with 1.3B and 14B parameter variants.

VoiceCraft: 8.5K+ Stars, Zero-Shot Speech Editing in 2026 vs GPT-SoVITS and XTTS

VoiceCraft is a token infilling neural codec language model for zero-shot speech editing and TTS. Compatible with GPT-SoVITS, Coqui TTS, and RVC. Covers setup, benchmarks, Docker deployment, and comparison tables.

VideoReTalking: 7.2K+ Stars, AI Lip-Sync Video Editing Setup Guide 2026

VideoReTalking (VRT) is an audio-based lip synchronization system for talking head video editing. Compatible with RVC, GPT-SoVITS, and Coqui TTS. Covers installation, inference, Gradio WebUI, production deployment, and benchmarks vs Wav2Lip and SadTalker.

RVC: Deploy AI Voice Conversion with 35K+ Stars, 10-Minute Training Setup for 2026

RVC (Retrieval-based Voice Conversion) is a VITS-based voice conversion framework compatible with GPT-SoVITS, Coqui TTS, and demucs. This tutorial covers Docker deployment, training pipelines, API integration, and production hardening.

OpenAI Whisper: 99.8K+ Stars

OpenAI Whisper is an ASR model for robust speech recognition via large-scale weak supervision. Compatible with WhisperX, faster-whisper, and LibreTranslate. Covers a Whisper tutorial, Whisper vs WhisperX, speech recognition setup, Python usage, and Docker deployment.

Open-Sora: 29K+ Stars, Open-Source Video Generation Setup Guide 2026

Open-Sora is an open-source video generation framework with 29K+ GitHub stars. Covers Docker setup, ComfyUI integration, Stable Diffusion compatibility, production deployment, benchmarks vs HunyuanVideo, CogVideo, and Wan.

MeloTTS: 7.4K+ Stars, 2026 Multilingual TTS Benchmarks vs Coqui TTS, ChatTTS, and Bark

MeloTTS is a high-quality multi-lingual text-to-speech library with 7.4K+ stars. Compare benchmarks with Coqui TTS, ChatTTS, and Bark. Covers Python setup, Docker deployment, real-time inference, and production hardening.

Lobe Chat: An Open-Source ChatGPT UI Alternative with 20+ LLM Providers and a Plugin System, 2026 Setup

Deploy Lobe Chat as your self-hosted ChatGPT alternative. Supports 20+ LLM providers, plugin system, PWA, multi-language UI. Complete Docker setup guide with benchmarks and comparisons.

LibreTranslate: A Self-Hosted Translation API with 14.4K+ Stars, Production Deployment Guide 2026

LibreTranslate (LT) is a free, open-source machine translation API powered by Argos Translate. Supports Docker, CUDA GPU, 30+ languages, and offline deployment. Covers setup, benchmarks, monitoring, and integration with OpenAI Whisper, Coqui TTS, and Argos Translate.

InvokeAI: 27.2K+ Stars, Complete 2026 Installation Guide

InvokeAI (Invoke) is the leading creative engine for Stable Diffusion models with an industry-leading WebUI. Compatible with SD 1.5, SDXL, FLUX, and ControlNet. Covers Docker install, workflow setup, benchmarks vs AUTOMATIC1111 and ComfyUI, and production hardening.

GPT-SoVITS: 57.5K+ Stars, Production Setup Guide for Deploying AI Voice Cloning (2026)

GPT-SoVITS (GSV) is a few-shot voice cloning and TTS tool with zero-shot capabilities. Supports ComfyUI, RVC, and MeloTTS integration. Covers Docker deployment, voice training, API setup, and production hardening.

faster-whisper: A 23K-Star Speech-to-Text Tool 4x Faster Than the Original, Benchmarked Against WhisperX and whisper.cpp...

faster-whisper (SYSTRAN) reimplements OpenAI Whisper via CTranslate2 for 4x speedup. Covers faster whisper tutorial, benchmark data, Docker setup, Python API, VAD filter, batch processing, and production hardening with WhisperX and whisper.cpp integration.

Demucs: Music Source Separation with 10K+ Stars, a 2026 Comparison with UVR and Spleeter

Demucs is a hybrid spectrogram and waveform source separation model by Meta AI. Compatible with Ultimate Vocal Remover, RVC, GPT-SoVITS. Covers demucs tutorial, demucs vs uvr, demucs docker setup, and production benchmarks.

Coqui TTS: 45.3K+ Stars, 2026 Deep Learning TTS Toolkit Benchmarks vs ChatTTS, MeloTTS, and Bark

Coqui TTS is an open-source deep learning toolkit for Text-to-Speech. Supports 1100+ languages, XTTS v2 voice cloning, VITS end-to-end synthesis. Benchmarks against ChatTTS, MeloTTS, Bark with real RTF numbers, Docker deployment, and production configs.

CogVideo: 12.7K Stars, Complete Text-to-Video Setup Guide 2026

CogVideo (CogVideoX) is a text and image-to-video generation model from Zhipu AI. Supports ComfyUI, Diffusers, SAT, and Wan/HunyuanVideo/Open-Sora integration. Covers installation, Docker, inference, fine-tuning, and benchmarks.

Baetyl: A Cloud-Native Edge AI Computing Platform for Deploying Models to IoT Devices, 2026 Installation Guide

Deploy Baetyl v2.4 to bring Kubernetes-native edge computing to IoT devices. AI model inference, MQTT/BACnet support, OTA updates, K3s runtime, and cloud-edge synchronization.

AI Image Generation Tools: A Complete Guide to Midjourney, DALL-E, Stable Diffusion, and More

Complete guide to AI image generation tools in 2025. Compare Midjourney v7, DALL-E 3, Stable Diffusion 3.5, Adobe Firefly, FLUX, and Leonardo.ai with features and pricing.

AI Search Tools Compared: Perplexity vs Google Gemini vs ChatGPT Search 2025

Compare the top AI search engines of 2025 — Perplexity, Google Gemini, ChatGPT Search, Copilot, and more. See accuracy, speed, and source coverage side by side.

Best AI Data Analysis Tools of 2025: ChatGPT, Julius, Tableau AI, and More

Discover the best AI data analysis tools of 2025 — ChatGPT Advanced Data Analysis, Julius AI, Tableau Einstein, Copilot in Excel, and more. Compare features, pricing, and use cases.

Best AI Developer Tools and IDE Plugins of 2025: Beyond Code Generation

Discover the best AI developer tools and IDE plugins of 2025 — GitHub Copilot, Cursor, Sourcegraph Cody, Tabnine, Codeium, and more. Compare features, pricing, and IDE support.

Best AI Translation Tools of 2025: Google Translate vs DeepL vs ChatGPT Compared

Compare the best AI translation tools of 2025 — Google Translate, DeepL, ChatGPT, Microsoft Translator, Smartcat, and Reverso. See quality, pricing, and language coverage side by side.

Best AI Voice Tools of 2025: Text-to-Speech and Speech-to-Text Compared

Compare the best AI voice tools of 2025 for text-to-speech and transcription. ElevenLabs, Murf.ai, Whisper, Otter.ai, and more with pricing, accuracy, and use cases.

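Best AI Writing Assistants of 2025 Compared: A Full Review of Jasper, Copy.ai, Writesonic, and ChatGPT

A comprehensive comparative review of 2025's AI writing assistants, with in-depth analysis of the features, pricing, and best use cases of Jasper, Copy.ai, Writesonic, ChatGPT, Claude, Notion AI, and other tools.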

Best AI Meeting Assistant Tools of 2025: Otter.ai, Fireflies, Fathom, and More Compared

Compare the best AI meeting assistant tools of 2025. In-depth reviews of Otter.ai, Fireflies.ai, Fathom, Notion AI, Microsoft Copilot for Teams, and Avoma with transcription accuracy, integrations, and pricing.

Best AI Customer Service Chatbot Tools of 2025: Intercom, Zendesk AI, and More

Compare the top AI customer service chatbot platforms in 2025 — Intercom Fin, Zendesk AI, Freshworks Freddy, ChatGPT Enterprise, Drift, and Tidio Lyro. See pricing, features, and ROI data.

Best AI Code Generators of 2025: GitHub Copilot, Cursor, and Tabnine Compared

Compare the best AI code generators of 2025: GitHub Copilot, Cursor, Tabnine, Amazon CodeWhisperer, and more. Features, pricing, and use cases explained.

Reading PostgreSQL EXPLAIN ANALYZE Output Without Getting Lost

PostgreSQL EXPLAIN ANALYZE tutorial. Learn query plan interpretation, bottleneck detection, and database performance optimization.

Why Did Roop, the Classic Face-Swap Tool, Die Out?

FaceFusion replaces Roop: modular ONNX pipeline, multi-threaded rendering, cross-platform GPU acceleration — the open-source AI video face-swap engine.

WiFi-Forge: A Safe, Legal Sandbox for Learning WiFi Hacking

WiFi Forge: safe WiFi hacking lab for security research. Learn penetration testing, wireless security and ethical hacking in a controlled environment.

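Toprank: An All-in-One SEO + GEO + Ads Growth Engine Powered by Claude Code

Toprank is an open-source SEO/GEO/Ads growth tool built on Claude Code. It automates keyword research, content generation, rank monitoring, and ad campaign optimization to help double a website's traffic.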

Terax AI: The Lightweight AI Terminal Emulator That Understands You

Discover Terax AI, a 7 MB AI-native terminal emulator built on Tauri 2 + Rust. Features natural language commands, inline AI assistance, smart autocomplete, and cross-shell support for bash, zsh, fish, and PowerShell.

Python Context Managers: The Three Cases You Actually Need

Python context managers: the three cases you actually need. Master with statements, contextlib and custom context managers for better resource management.

Pixelle-Video Review: An Automatic AI Short-Video Generator, from a Single Topic to a Finished Video

Pixelle-Video is an AI-powered automatic short video engine. Input a topic and get a complete video with script, AI images, voiceover, and BGM.

The ML Systems Book: An MIT Press Textbook on Machine Learning Systems Engineering

The ML Systems Book is an MIT Press textbook covering distributed training, model serving, hardware acceleration, and ML infrastructure. Essential reading for ML engineers.

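The Ultimate Free Midjourney Alternative (2026): Why Are Professional Teams Switching to ComfyUI?

An in-depth ComfyUI vs Midjourney comparison (2026): node-based workflows versus a black-box subscription, broken down across four dimensions (image quality, speed, cost, and controllability) to help you choose the AI image tool that fits professional use.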

JustHireMe: A Job-Search AI That Automates Everything from Application to Offer

A review of JustHireMe, the open-source AI job-search workbench. A local-first job-intelligence system that auto-scrapes listings, scores AI match quality, and generates tailored resumes and cover letters.

DocuSeal Review: Cut Document Signing Costs by 90% with This Open-Source DocuSign Alternative

DocuSeal is a 15.7k-star open-source platform that replaces DocuSign with self-hosted digital document signing, PDF form building, and white-label eSignature workflows.

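Caveman: Cut Claude Code Token Usage by 65%, Saving Money and Time

Caveman is a Claude Code skill that intelligently compresses the AI's output, reducing token consumption by 65% on average and speeding up responses about 3x, with technical accuracy 100% unchanged. It supports 30+ AI coding assistants, including Claude Code, Cursor, Gemini CLI, and Codex.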

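Bitcoin-Classic (BTCC): A Bitcoin Remake That Lets Ordinary People Mine with a CPU

Bitcoin-Classic (BTCC) is a decentralized digital currency rebuilt on Bitcoin Core v28.1. It supports CPU mining and ships with a graphical interface and a built-in miner, letting ordinary people experience the fun of early Bitcoin mining.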