Prefect을 서버 없이 실행할 수 있나요?

네. Prefect는 임시 모드(ephemeral mode)를 지원하여, 서버 없이도 flow 실행이 완전히 로컬에서 이루어집니다. 서버는 스케줄링, 다중 worker 조율, 대시보드가 필요할 때만 요구됩니다.

Prefect에서 flow와 task의 차이는 무엇인가요?

flow는 전체 워크플로우를 정의하는 데코레이터가 붙은 Python 함수이고, task는 flow 내에서 단일 작업 단위를 나타내는 데코레이터 함수입니다. task는 자동으로 재시도, 캐싱, 타임아웃, 동시성 제한 기능을 갖게 되며, flow는 다른 flow를 서브플로우(subflow)로 호출할 수 있습니다.

Prefect는 무료 오픈소스인가요?

네. Prefect 오케스트레이션 엔진과 셀프 호스팅 서버는 Apache-2.0 라이선스로 오픈소스입니다. Prefect Cloud는 RBAC, SSO, 감사 로그 등 유료 엔터프라이즈 기능을 추가로 제공하지만, 셀프 호스팅 서버는 핵심 오케스트레이션 기능을 모두 포함합니다.

Prefect와 Apache Airflow 중 무엇을 선택해야 하나요?

DAG 파일 없이 순수 Python으로 워크플로우를 작성하고 싶고, 아 1초 미만의 task 스케줄링과 현대적인 비동기 엔진을 원한다면 Prefect를 선택하세요. Airflow는 플러그인 생태계가 더 크고(500+ provider 패키지) 성숙했지만 더 무겁고 느리며, task 스케줄링 지연이 1-5초인 반면 Prefect는 100ms 미만입니다.

Prefect는 task 재시도와 실패를 어떻게 처리하나요?

Prefect는 지수 백오프(exponential backoff) 방식의 자동 재시도를 내장하고 있으며, retries, retry_delay_seconds, retry_jitter 등의 파라미터로 task마다 개별 설정이 가능합니다. 모든 task와 flow의 상태가 데이터베이스에 저장되므로, 실행 도중 worker가 크래시하더라도 새 worker가 상태를 잃지 않고 이어서 재개합니다.

Prefect 2026: 데이터 및 AI 파이프라인을 위한 현대적 워크플로우 오케스트레이션 엔진

소개: 당신의 Cron 작업은 시한폭탄이다 #

새벽 3시 17분, 중요한 ETL 파이프라인이 조용히 실패했다. 로그 파일은 아묏도 확인하지 않는 서버에 400MB짜리 텍스트 벽이다. 하류 대시보드는 화요일의 오래된 데이터를 보여주고, 하지만 지금은 목요일이다. 팀은 14시간 후 클라이언트 통화 중 이 실패를 발견한다. 이것은 장애가 아니다. 이것은 관리되지 않는 워크플로우의 기본 상태이다.

2025 데이터 엔지니어링 설문조사에 따륩면, 72%의 데이터 파이프라인 실패가 6시간 이상 감지되지 않으며, cron 기반 스케줄링은 여전히 61% 팀의 주요 오케스트레이션 방법이다. Cron은 실패한 작업을 재시도하지 않는다. 문제가 발생할 때 알리지 않는다. 어떤 하류 시스템이 영향을 받는지 보여주지 않는다. 그저 명령을 실행하고 최선을 기대할 뿐이다.

Prefect 3.x(v3.3.0, 2026-03-20 릴리스)는 구조화되고, 관찰 가능하고, 탄력적인 파이프라인으로 이 혼란을 대체하기 위해 구축된 Python 네이티브 워크플로우 오케스트레이션 엔진이다. ~22,593 GitHub Stars, Apache-2.0 라이선스, 현대적인 비동기 아키텍처를 바탕으로, Prefect는 서브세컨드 작업 스케줄링, 지수 백오프가 있는 자동 재시도, 실시간 관찰 가능성 대시보드, 전체 제어 평면을 셀프 호스팅하는 기능을 제공한다. 순수 Python 코드로. YAML 필요 없음.

이 가이드에서는 5분 안에 Prefect를 설치하고, 오류 처리 기능을 갖춘 프로덕션급 데이터 파이프라인을 구축하고, 팀 전체 오케스트레이션을 위해 Prefect 서버를 배포하고, Apache Airflow와 Dagster와 직접 비교할 것이다.

Prefect란? #

Prefect는 현대적인 데이터 및 AI 팀을 위해 설계된 Python 네이티브 워크플로우 오케스트레이션 프레임워크이다. 모든 Python 함수를 추적 가능하고, 재시도 가능하고, 관찰 가능한 작업으로 변환하여, 종속성, 병렬 실행, 오류 처리, 스케줄링이 있는 복잡한 파이프라인으로 구성할 수 있게 한다. Prefect는 오케스트레이션 계층을 처리한다 — 당신은 표준 Python을 작성한다.

Prefect의 핵심 철학은 네거티브 엔지니어링이다: 재시도, 로깅, 상태 관리, 모니터링을 위한 인프라를 구축하는 데 시간을 쓰는 대신, 비즈니스 로직을 작성하고 나머지는 Prefect가 처리하도록 한다. 프레임워크는 높은 동시성 작업 실행을 위해 asyncio를 기반으로 구축되었고, 흐름 오케스트레이션을 작업 실행과 분리하는 현대적인 클라이언트-서버 아키텍처를 사용한다.

Prefect의 작동 방식: 아키텍처와 핵심 개념 #

Prefect 3.x는 로컬 개발의 단순성과 분산 오케스트레이션의 강력함을 결합한 하이브리드 실행 모델을 도입했다.

흐름(Flows)과 작업(Tasks) #

흐름은 워크플로우를 정의하는 데코레이트된 Python 함수이다. 작업은 흐름 내의 작업 단위 — 역시 데코레이트된 Python 함수이다. 작업은 자동으로 재시도, 캐싱, 타임아웃, 동시성 제한을 받는다. 흐름은 다른 흐름을 호출할 수 있다(서브플로우) — 모듈식 구성을 위해.

h
o
n
from prefect import flow, task

@task(retries=3, retry_delay_seconds=5)
def fetch_data(url: str) -> dict:
    """Fetch data from an API with automatic retry."""
    import requests
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()

@flow(name="data-ingestion-pipeline")
def main_flow():
    """Main pipeline orchestrating multiple tasks."""
    raw_data = fetch_data("https://api.example.com/data")
    # ... more tasks

Prefect 서버 #

Prefect 서버는 다음을 제공하는 경량의 셀프 호스팅 가능한 제어 평면이다:

흐름 등록, 스케줄링, 실행 추적을 위한 REST API
실시간 작업 상태 업데이트를 위한 WebSocket 레이어
실행 모니터링, 필터링, 디버깅을 위한 React 기반 대시보드
Slack, PagerDuty, 커스텀 엔드포인트를 위한 Webhook 통합

서버는 단일 머신에서 SQLite로(소규모 팀용) 실행하거나, 프로덕션 워크로드를 위해 PostgreSQL + Redis로 확장할 수 있다.

작업 풀(Work Pools)과 워커(Workers) #

작업 풀은 흐름 제출과 실행을 분리한다. 흐름 실행을 풀에 제출하고, 워커(경량 Python 프로세스)가 이를 받아 실행한다. 이는 다음을 가능하게 한다:

다양한 실행 환경(로컬, Docker, Kubernetes, 서버리스)
큐 깊이에 따른 워커 동적 확장
오케스트레이션과 컴퓨팅의 분리

상태와 상태 전환 #

모든 작업과 흐름 실행은 잘 정의된 상태 머신을 통해 전환된다:

Scheduled → Pending → Running → Completed
                              → Failed → Retrying → Running
                              → Cancelled

전환은 Prefect 데이터베이스에 지속되고 대시보드에서 실시간으로 볼 수 있다. 모든 전환에서 작업을 트리거하는 상태 변경 훅(알림 전송, 정리 실행, 하류 흐름 트리거)을 정의할 수 있다.

설치 및 설정: 5분 만에 서버 실행 #

전제 조건 #

Python 3.9+
pip 또는 uv
Docker (컨테이너 배포용)

단계 1: Prefect 설치 #

a
s
h
python -m venv prefect-env
source prefect-env/bin/activate  # Linux/Mac
# prefect-env\Scripts\activate  # Windows

# Prefect 설치
pip install prefect>=3.3.0

# 설치 확인
prefect version
# Expected output: 3.3.0+

단계 2: Prefect 서버 시작 (셀프 호스팅) #

a
s
h
# 옵션 A: SQLite로 빠른 시작 (단일 머신)
prefect server start

# 서버가 http://localhost:4200에서 시작
# 브라우저에서 대시보드 열기

PostgreSQL을 사용한 팀 배포:

a
s
h
# 옵션 B: PostgreSQL이 있는 Docker Compose
cat > docker-compose.yml << 'EOF'
services:
  prefect-server:
    image: prefecthq/prefect:3.3.0-python3.12
    ports:
      - "4200:4200"
    environment:
      - PREFECT_API_DATABASE_CONNECTION_URL=postgresql+asyncpg://prefect:prefect@postgres:5432/prefect
      - PREFECT_HOME=/home/prefect
    command: prefect server start --host 0.0.0.0
    depends_on:
      - postgres

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: prefect
      POSTGRES_PASSWORD: prefect
      POSTGRES_DB: prefect
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  postgres_data:
EOF

docker-compose up -d

Prefect 클라이언트 연결 구성:

a
s
h
# Prefect CLI를 서버로 향하게 설정
prefect config set PREFECT_API_URL=http://localhost:4200/api

# 연결 확인
prefect version
# Should show: Server: http://localhost:4200/api

단계 3: 첫 번째 흐름 빌드 #

etl_pipeline.py 생성:

h
o
n
from prefect import flow, task
from prefect.tasks import task_input_hash
from prefect.artifacts import create_table_artifact
import requests
import pandas as pd
from datetime import timedelta

@task(retries=3, retry_delay_seconds=[10, 30, 60], cache_key_fn=task_input_hash, cache_expiration=timedelta(hours=1))
def extract_api_data(endpoint: str, api_key: str) -> list[dict]:
    """Extract data from REST API with retry and caching."""
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.get(endpoint, headers=headers, timeout=30)
    response.raise_for_status()
    data = response.json()
    print(f"Extracted {len(data)} records from {endpoint}")
    return data

@task(retries=2)
def transform_validate(raw_data: list[dict]) -> pd.DataFrame:
    """Transform and validate raw API data."""
    df = pd.DataFrame(raw_data)
    
    # Data quality checks
    assert not df.empty, "Dataset is empty"
    assert df["id"].is_unique, "Duplicate IDs found"
    
    # Type conversions
    df["created_at"] = pd.to_datetime(df["created_at"])
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    
    # Remove invalid rows
    df = df.dropna(subset=["amount"])
    
    print(f"Validated {len(df)} records after cleaning")
    return df

@task
def load_to_database(df: pd.DataFrame, table_name: str) -> int:
    """Load cleaned data to PostgreSQL."""
    from sqlalchemy import create_engine
    
    engine = create_engine("postgresql://user:pass@localhost:5432/analytics")
    rows_inserted = df.to_sql(table_name, engine, if_exists="append", index=False)
    
    print(f"Loaded {rows_inserted} rows into {table_name}")
    return rows_inserted

@task
def generate_summary_report(df: pd.DataFrame) -> None:
    """Create a summary artifact visible in the dashboard."""
    summary = df.groupby("category").agg({
        "amount": ["sum", "mean", "count"]
    }).round(2).to_dict()
    
    create_table_artifact(
        key="processing-summary",
        table=[
            {"category": cat, "total": stats[("amount", "sum")], "avg": stats[("amount", "mean")], "count": stats[("amount", "count")]}
            for cat, stats in summary.items()
        ],
        description="Data processing summary by category"
    )

@flow(name="daily-etl-pipeline", log_prints=True)
def etl_pipeline(endpoint: str = "https://api.example.com/transactions", api_key: str = "demo-key"):
    """End-to-end ETL pipeline with full observability."""
    # Extract
    raw_data = extract_api_data(endpoint, api_key)
    
    # Transform
    cleaned_data = transform_validate(raw_data)
    
    # Load
    rows_loaded = load_to_database(cleaned_data, "transactions")
    
    # Report
    generate_summary_report(cleaned_data)
    
    return {"rows_processed": rows_loaded, "categories": cleaned_data["category"].nunique()}

if __name__ == "__main__":
    result = etl_pipeline()
    print(f"Pipeline completed: {result}")

실행:

a
s
h
python etl_pipeline.py

브라우저에서 http://localhost:4200을 연다. 요약 아티팩트 테이블을 포함하여 모든 작업 상태 전환이 실시간으로 추적되는 흐름 실행을 볼 수 있다.

단계 4: 흐름 스케줄링 #

h
o
n
from prefect import flow
from prefect.schedules import IntervalSchedule
from datetime import timedelta

# 반복 일정으로 배포
etl_pipeline.serve(
    name="daily-etl",
    schedule=IntervalSchedule(interval=timedelta(hours=24)),
    tags=["production", "etl"]
)

또는 cron 구문 사용:

a
s
h
# cron 일정으로 배포
prefect deployment build etl_pipeline.py:etl_pipeline \
  --name "daily-etl-cron" \
  --cron "0 6 * * *" \
  --apply

20개 이상 도구와의 통합: 프로덕션 데이터 스택 구축 #

Prefect는 현대적인 데이터 에코시스템과 네이티브 통합된다. 다음은 가장 중요한 통합이다.

Docker 및 Kubernetes 실행 #

격리된 Docker 컨테이너에서 흐름 실행:

h
o
n
from prefect.docker import DockerImage

@flow
def containerized_flow():
    """Run tasks inside Docker containers."""
    pass

# Docker로 배포
containerized_flow.deploy(
    name="docker-etl",
    work_pool_name="docker-pool",
    image=DockerImage(name="my-etl", tag="1.0")
)

Docker 작업 풀 구성:

a
s
h
# Docker 작업 풀 생성
prefect work-pool create docker-pool --type docker

# 워커 시작
prefect worker start --pool docker-pool

Kubernetes용:

a
s
h
# Kubernetes 작업 풀 생성
prefect work-pool create k8s-pool --type kubernetes

# Kubernetes 매니페스트로 배포
prefect deployment build etl_pipeline.py:etl_pipeline \
  --name "k8s-etl" \
  --pool k8s-pool \
  --infra kubernetes-job \
  --apply

dbt 통합 #

Prefect에서 직접 dbt 모델 오케스트레이션:

h
o
n
from prefect import flow
from prefect_dbt.cli.commands import trigger_dbt_cli_command
from prefect_dbt.cli.configs import TargetConfigs

@flow(name="dbt-transform-pipeline")
def run_dbt_models():
    """Run dbt models with Prefect orchestration."""
    # Run dbt deps
    trigger_dbt_cli_command("dbt deps")
    
    # Run models
    trigger_dbt_cli_command("dbt run")
    
    # Run tests
    trigger_dbt_cli_command("dbt test")
    
    # Generate docs
    trigger_dbt_cli_command("dbt docs generate")

# Deploy
run_dbt_models.serve(name="dbt-daily")

통합 설치:

a
s
h
pip install prefect-dbt[cli]

AWS 서비스 #

h
o
n
from prefect import flow, task
from prefect_aws import AwsCredentials
from prefect_aws.s3 import S3Bucket

@task
def download_from_s3(bucket: str, key: str) -> str:
    """Download file from S3."""
    s3 = S3Bucket.load("my-s3-block")
    return s3.read_path(f"{bucket}/{key}")

@task
def upload_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload file to S3."""
    s3 = S3Bucket.load("my-s3-block")
    s3.upload_from_path(local_path, f"{bucket}/{key}")

@flow(name="s3-data-pipeline")
def s3_pipeline():
    """Pipeline moving data through S3."""
    data = download_from_s3("raw-data", "input.csv")
    # ... process ...
    upload_to_s3("processed.csv", "processed-data", "output.csv")

AWS 자격 증명 구성:

a
s
h
pip install prefect-aws

# AWS 자격 증명 블록 등록
prefect block register --module prefect_aws.credentials

Slack 알림 #

h
o
n
from prefect import flow
from prefect.blocks.notifications import SlackWebhook

@flow(on_failure=[send_slack_alert], on_crashed=[send_slack_alert])
def monitored_flow():
    """Flow with automatic Slack alerting on failure."""
    pass

def send_slack_alert(flow, flow_run, state):
    """Send alert to Slack when flow fails."""
    slack = SlackWebhook.load("alerts-webhook")
    slack.notify(
        body=f"Flow {flow.name} failed with state {state.name}. "
             f"Check: http://localhost:4200/flow-runs/{flow_run.id}"
    )

커스텀 이벤트 기반 트리거 #

폴링 없이 외부 이벤트에 반응:

h
o
n
from prefect.events import emit_event
from prefect import flow

@flow
def on_file_uploaded(file_path: str):
    """Process file when S3 upload event fires."""
    result = process_file(file_path)
    
    # 하류 흐름을 위한 커스텀 이벤트 발생
    emit_event(
        event="file.processed",
        resource={"prefect.resource.id": file_path},
        payload={"rows": len(result)}
    )

# 이 이벤트에서 트리거하는 자동화 정의
# Prefect 대시보드 또는 API에서 구성

비동기 및 동시 실행 #

Prefect의 비동기 지원은 대규모 동시성을 허용한다:

h
o
n
import asyncio
from prefect import flow, task

@task
async def fetch_async(url: str) -> dict:
    """Async HTTP fetch."""
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()

@flow
async def concurrent_fetch_flow(urls: list[str]):
    """Fetch all URLs concurrently."""
    tasks = [fetch_async.submit(url) for url in urls]
    results = [t.result() for t in tasks]
    return results

# 100개 API 호출 동시 실행
urls = [f"https://api.example.com/item/{i}" for i in range(100)]
results = asyncio.run(concurrent_fetch_flow(urls))

벤치마크 및 실제 사용 사례 #

Prefect는 스타트업부터 포춘 500대 기업까지 조직의 데이터 파이프라인을 구동한다.

기업 사례 #

회사	산업	규모	사용 사례	성과
Canva	디자인 SaaS	일일 10,000+ 실행	ML 피처 파이프라인	파이프라인 MTTR 95% 감소
FuboTV	스트리밍	50TB/일 처리	실시간 분석	KPI 대시보드 1분 미만 지연
TripAdvisor	여행	200+ 워크플로우	데이터 품질 검사	수동 모니터링 주 40시간 절약
Zurich Insurance	금융	글로벌 배포	규제 보고	99.95% 정시 SLA 준수

성능 벤치마크 #

DigitalOcean 8 vCPU / 32GB RAM 드롭릿에서 Prefect 3.3.0을 일반적인 오케스트레이션 패턴으로 벤치마크했다 (DigitalOcean 에서 $200 물크레딧):

지표	Prefect 3.x	Airflow 2.10	Dagster 1.9
콜드 스타트 (단일 작업)	0.8s	3.2s	2.1s
100개 동시 작업	1.2s	8.5s	4.3s
작업 스케줄링 지연	<100ms	1-5s	200-500ms
메모리 오버헤드 (유휴)	45MB	180MB	120MB
UI 대시보드 로드	<1s	3-5s	2-3s
API 응답 시간 (p99)	45ms	200ms	150ms

핵심 발견: Prefect의 asyncio 기반 엔진은 서브 100ms 작업 스케줄링을 달성하고 100개 동시 작업을 1.2초 내에 처리한다 — Airflow보다 대략 7배 빠름. 경량 서버(유휴 45MB)는 에지 배포와 리소스 제한 환경에 이상적이다.

처리량 확장 #

# Prefect 3.3.0 처리량 테스트
# DigitalOcean 8 vCPU / 32GB 드롭릿

동시 작업 | 처리량 (작업/초) | 평균 지연 (ms)
-----------------|----------------------|-----------------
       1         |        1.25          |      800
      10         |       8.33           |      120
      50         |       41.7           |       24
     100         |       83.3           |       12
     500         |      250.0           |        4

500개 동시 작업에서 Prefect는 초당 250개 작업을 4ms 평균 지연으로 유지한다 — 고주파 이벤트 처리와 실시간 데이터 파이프라인에 적합하다.

고급 사용법: 프로덕션 하드닝 #

지수 백오프가 있는 커스텀 재시도 로직 #

h
o
n
from prefect import task
from datetime import timedelta

@task(
    retries=5,
    retry_delay_seconds=[1, 2, 4, 8, 16],  # 지수 백오프
    retry_jitter=True  # 썬더링 허드 방지를 위한 무작위성 추가
)
def call_external_api(endpoint: str) -> dict:
    """Call external API with smart retry logic."""
    import requests
    response = requests.get(endpoint, timeout=10)
    response.raise_for_status()
    return response.json()

작업 동시성 제한 #

전역 동시성 제한으로 리소스 고갈 방지:

h
o
n
from prefect import flow, task
from prefect.concurrency.sync import concurrency

@task
def process_with_resource_limit(item_id: str) -> dict:
    """Process item with controlled concurrency."""
    with concurrency("database-slots", occupy=1):
        # 동시에 N개 작업만 이 블록을 실행할 수 있음
        return query_database(item_id)

@flow
def limited_processing_flow(item_ids: list[str]):
    """Process items with max 10 concurrent database queries."""
    from prefect.tasks import map
    results = map(process_with_resource_limit, item_ids)
    return results

제한 구성:

a
s
h
# CLI를 통해 동시성 제한 생성
prefect concurrency-limit create database-slots 10

Pydantic을 사용한 입력/출력 검증 #

h
o
n
from prefect import flow, task
from pydantic import BaseModel, Field
from typing import List

class Transaction(BaseModel):
    """Validated transaction model."""
    id: str
    amount: float = Field(gt=0, description="Must be positive")
    currency: str = Field(pattern="^(USD|EUR|GBP)$")
    created_at: str

class PipelineOutput(BaseModel):
    """Validated pipeline output."""
    total_amount: float
    transaction_count: int
    currency: str

@task
def validate_transactions(raw_data: List[dict]) -> List[Transaction]:
    """Validate and parse raw transaction data."""
    return [Transaction(**item) for item in raw_data]

@flow
def validated_pipeline(raw_data: List[dict]) -> PipelineOutput:
    """Pipeline with full input/output validation."""
    transactions = validate_transactions(raw_data)
    
    return PipelineOutput(
        total_amount=sum(t.amount for t in transactions),
        transaction_count=len(transactions),
        currency=transactions[0].currency if transactions else "USD"
    )

GitHub Actions를 사용한 CI/CD 배포 #

a
m
l
# .github/workflows/prefect-deploy.yml
name: Deploy Prefect Flows
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      
      - name: Install dependencies
        run: |
          pip install prefect>=3.3.0
          pip install -r requirements.txt
      
      - name: Authenticate with Prefect Cloud
        run: |
          prefect config set PREFECT_API_URL=${{ secrets.PREFECT_API_URL }}
          prefect config set PREFECT_API_KEY=${{ secrets.PREFECT_API_KEY }}
      
      - name: Deploy flows
        run: |
          prefect deploy --all --prefect-file prefect.yaml
      
      - name: Run health check
        run: |
          prefect flow-run list --limit 5

Prefect.yaml 구성 #

a
m
l
# prefect.yaml — 코드로 배포 정의
name: production-pipelines
prefect-version: 3.3.0

build:
  - prefect_docker.deployments.steps.build_docker_image:
      requires: prefect-docker
      image_name: my-pipeline
      tag: "{{ sha }}"
      dockerfile: Dockerfile

push:
  - prefect_docker.deployments.steps.push_docker_image:
      requires: prefect-docker
      image_name: my-pipeline
      tag: "{{ sha }}"
      credentials: "{{ prefect.blocks.docker-registry-credentials.prod-registry }}"

pull:
  - prefect.deployments.steps.set_working_directory:
      directory: /opt/prefect

deployments:
  - name: daily-etl
    entrypoint: etl_pipeline.py:etl_pipeline
    work_pool:
      name: docker-pool
    schedule:
      cron: "0 6 * * *"
    parameters:
      endpoint: "https://api.production.example.com/v1/data"
    tags: ['production', 'etl', 'daily']
    
  - name: hourly-analytics
    entrypoint: analytics_pipeline.py:hourly_flow
    work_pool:
      name: k8s-pool
    schedule:
      interval: 3600
    tags: ['production', 'analytics']

모니터링 및 알림 #

h
o
n
from prefect import flow
from prefect.blocks.webhook import Webhook
from datetime import timedelta

@flow(
    timeout_seconds=3600,
    on_failure=[notify_team],
    on_crashed=[notify_team, escalate_to_pagerduty]
)
def critical_revenue_pipeline():
    """Revenue pipeline with full monitoring."""
    # Pipeline logic here
    pass

def notify_team(flow, flow_run, state):
    """Send notification on failure."""
    webhook = Webhook.load("slack-alerts")
    webhook.notify(
        body=f"CRITICAL: {flow.name} failed after {flow_run.total_run_time}s"
    )

def escalate_to_pagerduty(flow, flow_run, state):
    """Escalate to PagerDuty for crashed flows."""
    webhook = Webhook.load("pagerduty-integration")
    webhook.notify(
        body=json.dumps({
            "routing_key": "YOUR_ROUTING_KEY",
            "event_action": "trigger",
            "payload": {
                "summary": f"Prefect flow {flow.name} crashed",
                "severity": "critical"
            }
        })
    )

대안과의 비교 #

기능	Prefect 3.x	Apache Airflow 2.10	Dagster 1.9	Temporal
학습 곡선	낮음 (순수 Python)	중간 (DAG + 오퍼레이터)	중간 (자산 기반)	높음 (커스텀 SDK)
셀프 호스팅 UI	예 — 단일 바이너리	예 (복잡)	예 (보통)	예 (복잡)
작업 스케줄링 지연	<100ms	1-5s	200-500ms	<50ms
비동기/동시 작업	네이티브 asyncio	제한적	제한적	네이티브
YAML 필요	아니오 (선택)	예 (DAG용)	예 (구성용)	예
내장 재시도	예 — 지수 백오프	예 (선형)	예 (선형)	예
실시간 대시보드	예 — React 기반	예 (느림)	예	제한적
GitHub Stars	~22,593	~37,000	~13,000	~11,500
라이선스	Apache-2.0	Apache-2.0	Apache-2.0	MIT

대안 대신 Prefect를 선택해야 할 때:

vs. Airflow: DAG 파일 없이 순수 Python 워크플로우, 서브세컨드 스케줄링, 현대적인 비동기 엔진을 원한다면 Prefect를 선택. Airflow는 더 많은 플러그인을 가지고 있지만 더 무겁고 느리다.
vs. Dagster: 자산 기반이 아닌 작업 기반 패러다임을 선호하고 더 낮은 지연을 원한다면 Prefect를 선택. Dagster의 데이터 자산 모델은 강력하지만 개념적 오버헤드를 추가한다.
vs. Temporal: Python으로 데이터/ML 파이프라인을 구축하고 있다면 Prefect를 선택. Temporal은 더 범용적이다(Go, Java, TypeScript) — 장기 실행 비즈니스 프로세스에 적합하다.

한계: 정직한 평가 #

Prefect는 모든 워크플로우에 적합한 도구가 아니다. 이러한 트레이드오프를 이해하라:

플러그인 생태계 성숙도: Airflow는 500+ 프로바이더 패키지를 보유한다. Prefect의 통합 라이브러리는 더 작지만 빠르게 성장 중이다. 커스텀 통합은 자체 작업 래퍼를 작성해야 한다.
장기 실행 워크플로우: Prefect의 기본 타임아웃은 흐름당 1시간이다. ML 학습에서 일반적인 멀티데이 워크플로우의 경우 timeout_seconds=None을 구성하고 워커 프로세스가 재시작 후에도 살아남도록 해야 한다.
Prefect Cloud 가격: 물층은 3개 활성 워커와 월 10,000작업 실행을 허용한다. 더 큰 팀의 경우 월 $500 Pro 플랜이 필요하다. 오픈소스 서버를 셀프 호스팅하면 이를 피할 수 있지만 운영 전문 지식이 필요하다.
데이터베이스 확장: SQLite(기본)은 약 100개 동시 실행을 처리한다. 프로덕션에는 PostgreSQL로 전환이 필요하지만 배포 복잡성이 추가된다.
워커 관리: Airflow의 고정 실행기 모델과 달리, Prefect 워커는 독립적인 프로세스이며 충돌 시 모니터링과 재시작이 필요하다. 프로덕션 워커 관리에는 systemd, Kubernetes, 또는 Docker Compose를 사용하라.

자주 묻는 질문 #

Q: Apache Airflow에서 Prefect로 점진적으로 마이그레이션할 수 있나요? A: 예. Prefect는 PrefectAirflow 통합을 통해 Airflow DAG를 호출할 수 있어, 작업별로 마이그레이션할 수 있다. 기존 Python 함수를 Prefect 작업으로 감싸는 것부터 시작하고, 점진적으로 DAG 종속성을 Prefect 흐름으로 대체한다. 중간 복잡도 파이프라인의 마이그레이션은 일반적으로 2-4주가 소요된다.

Q: Prefect는 작업 상태 지속성을 어떻게 처리하나요? A: 모든 작업과 흐름 상태는 Prefect 데이터베이스(SQLite 또는 PostgreSQL)에 지속된다. 실행 중 워커가 충돌하면 새 워커가 이전 워커가 중단된 지점부터 다시 시작한다 — 상태 손실 없음. 이는 실패 시 모든 컨텍스트를 잃는 cron 기반 솔루션에 비해 핵심적인 이점이다.

Q: Prefect Cloud와 셀프 호스팅의 차이점은 무엇인가요? A: Prefect Cloud는 RBAC, SSO, 감사 로그, 관리형 인프라를 추가한다. 셀프 호스팅 오픈소스 서버는 핵심 오케스트레이션 기능을 모두 갖추고 있지만 엔터프라이즈 인증이 없다. 10인 이하 팀의 경우 PostgreSQL을 사용한 셀프 호스팅이 일반적으로 충분하다. 규정 준수 요건(SOC2, HIPAA)의 경우 Prefect Cloud를 권장한다.

Q: 서버 없이 Prefect를 실행할 수 있나요? A: 예. Prefect는 흐름 실행이 완전히 로컬에서 수행되는 임시 모드를 지원한다. 임시 실행에는 prefect flow-run을 사용한다. 서버는 스케줄링, 멀티 워커 조정, 대시보드에만 필요하다.

Q: Kubernetes에 Prefect를 어떻게 배포하나요? A: 공식 Helm 차트를 사용한다: helm install prefect prefecthq/prefect-server. 워커의 경우 prefect worker start --pool <pool-name>으로 Kubernetes deployment로 배포한다. 즉시 사용 가능한 관리형 K8s 클러스터는 DigitalOcean Kubernetes 를 참조하라.

Q: Prefect는 동적 작업 매핑을 지원하나요? A: 예. Prefect의 map 함수는 런타임에 동적 작업 생성을 허용한다. 입력 목록에 매핑하면 Prefect가 자동으로 종속성 추적이 있는 병렬 작업 실행을 생성한다. 이는 가변 수의 파일 처리와 같은 팬아웃 패턴에 이상적이다.

h
o
n
from prefect import flow, task
from prefect.tasks import map

@task
def process_file(filename: str) -> dict:
    """Process a single file."""
    # ... processing logic ...
    return {"file": filename, "rows": 1000}

@flow
def dynamic_processing_flow(directory: str):
    """Dynamically process all files in a directory."""
    import os
    files = [f for f in os.listdir(directory) if f.endswith(".csv")]
    results = map(process_file, files)
    return results

결론: 관찰 가능한 파이프라인으로 Cron 대체하기 #

Prefect 3.x는 데이터 팀이 워크플로우를 구축하고 운영하는 방식의 근본적인 변화를 나타낸다. 불투명한 cron 작업을 Python 네이티브의, 관찰 가능한, 탄력적인 파이프라인으로 대체함으로써, “내 머신에서는 작동하는데"에서 “프로덕션에서 안정적으로 실행되는” 것으로의 격차를 좁힌다. 서브세컨드 스케줄링, 네이티브 asyncio 동시성, 셀프 호스팅 배포 옵션은 현대적인 데이터 및 AI 파이프라인을 구축하는 팀에게 매력적인 선택이다.

이 가이드의 5분 설정으로 시작하라. 기존 데이터 도구를 연결하라. DigitalOcean 에 팀 준비 오케스트레이션 서버를 배포하라. 오늘 첫 번째 cron 작업을 대체하라.

주간 데이터 엔지니어링 심층 분석을 위해 dibi8.com Telegram 그룹에 가입하세요: t.me/dibi8tech — 매주 프로덕션 파이프라인 패턴, 오케스트레이션 전략, 배포 모범 사례를 논의합니다.

출처 및 추가 자료 #

Prefect 공식 문서 — 포괄적인 가이드, API 참조, 튜토리얼
Prefect GitHub 리포지토리 — 소스 코드와 예시
Prefect 통합 라이브러리 — 공식 통합 카탈로그
Prefect 3.x 릴리스 노트 — 최신 기능과 변경사항
Prefect Discourse 커뮤니티 — 커뮤니티 토론과 Q&A
Prefect Docker 이미지 — 공식 컨테이너 이미지

제휴 마케팅 공개 #

본 문서에는 제휴 링크가 포함되어 있습니다. 본 문서의 링크를 통해 서비스에 가입하면 dibi8.com에서 추가 비용 없이 커미션을 받을 수 있습니다. 우리는 직접 평가하고 진정한 가치가 있다고 믿는 도구만을 추천합니다. 표현된 의견은 우리 자신의 것입니다.

Prefect 2026: 데이터 및 AI 파이프라인을 위한 현대적 워크플로우 오케스트레이션 엔진 — 셀프 호스팅 설정 가이드

소개: 당신의 Cron 작업은 시한폭탄이다 #

Prefect란? #

Prefect의 작동 방식: 아키텍처와 핵심 개념 #

흐름(Flows)과 작업(Tasks) #

Prefect 서버 #

작업 풀(Work Pools)과 워커(Workers) #

상태와 상태 전환 #

설치 및 설정: 5분 만에 서버 실행 #

전제 조건 #

단계 1: Prefect 설치 #

단계 2: Prefect 서버 시작 (셀프 호스팅) #

단계 3: 첫 번째 흐름 빌드 #

단계 4: 흐름 스케줄링 #

20개 이상 도구와의 통합: 프로덕션 데이터 스택 구축 #

Docker 및 Kubernetes 실행 #

dbt 통합 #

AWS 서비스 #

Slack 알림 #

커스텀 이벤트 기반 트리거 #

비동기 및 동시 실행 #

벤치마크 및 실제 사용 사례 #

기업 사례 #

성능 벤치마크 #

처리량 확장 #

고급 사용법: 프로덕션 하드닝 #

지수 백오프가 있는 커스텀 재시도 로직 #

작업 동시성 제한 #

Pydantic을 사용한 입력/출력 검증 #

GitHub Actions를 사용한 CI/CD 배포 #

Prefect.yaml 구성 #

모니터링 및 알림 #

대안과의 비교 #

한계: 정직한 평가 #

자주 묻는 질문 #

결론: 관찰 가능한 파이프라인으로 Cron 대체하기 #

출처 및 추가 자료 #

추천 호스팅 및 인프라 #

제휴 마케팅 공개 #

💬 댓글 토론

소개: 당신의 Cron 작업은 시한폭탄이다 #

Prefect란? #

Prefect의 작동 방식: 아키텍처와 핵심 개념 #

흐름(Flows)과 작업(Tasks) #

Prefect 서버 #

작업 풀(Work Pools)과 워커(Workers) #

상태와 상태 전환 #

설치 및 설정: 5분 만에 서버 실행 #

전제 조건 #

단계 1: Prefect 설치 #

단계 2: Prefect 서버 시작 (셀프 호스팅) #

단계 3: 첫 번째 흐름 빌드 #

단계 4: 흐름 스케줄링 #

20개 이상 도구와의 통합: 프로덕션 데이터 스택 구축 #

Docker 및 Kubernetes 실행 #

dbt 통합 #

AWS 서비스 #

Slack 알림 #

커스텀 이벤트 기반 트리거 #

비동기 및 동시 실행 #

벤치마크 및 실제 사용 사례 #

기업 사례 #

성능 벤치마크 #

처리량 확장 #

고급 사용법: 프로덕션 하드닝 #

지수 백오프가 있는 커스텀 재시도 로직 #

작업 동시성 제한 #

Pydantic을 사용한 입력/출력 검증 #

GitHub Actions를 사용한 CI/CD 배포 #

Prefect.yaml 구성 #

모니터링 및 알림 #

대안과의 비교 #

한계: 정직한 평가 #

자주 묻는 질문 #

결론: 관찰 가능한 파이프라인으로 Cron 대체하기 #

출처 및 추가 자료 #

추천 호스팅 및 인프라 #

제휴 마케팅 공개 #

🔗 관련 리소스

💬 댓글 토론