Netdata: Giám Sát Thở Gian Thự 78K+ Star

Netdata (ND) là agent giám sát thở gian thực hiệu suất cao với metrics từng giây và khả năng trực quan hóa. Tương thích với Docker, Kubernetes, Prometheus và Grafana. Bao gồm hướng dẫn netdata, cài đặt netdata, giám sát thở gian thực, netdata vs prometheus, và tinh chỉnh hiệu suất netdata.

  • GPL-3.0
  • Cập nhật 2026-05-19

lang: vi slug: netdata title: ‘Netdata: Real-Time Monitoring with 78K+ Stars’ description: ‘Netdata (ND) is a high-performance real-time monitoring agent with per-second metrics and visualization. Compatible with Docker, Kubernetes, Prometheus, and Grafana. Covers netdata tutorial, netdata setup, real time monitoring, netdata vs prometheus, and netdata performance tuning.’ tags: [‘devops’, ‘guide’, ‘monitoring’, ‘observability’, ‘open-source’, ‘reference’, ’tutorial’] date: 2026-05-19 00:00:00+08:00 lastmod: 2026-05-19 00:00:00+08:00 tech_stack: [] application_domain: Dev Utils source_version: ’’ licensing_model: Open Source license_type: GPL-3.0 file_size: ’’ file_md5: ’’ download_url: ’’ backup_url: ’’ github_repo: ‘https://github.com/netdata/netdata' last_maintained: ‘2026-05-19’ draft: false categories: [‘dev-utils’] aliases:

  • /posts/netdata/ faqs:
    • q: ‘How much RAM and CPU does Netdata use per node?’ a: ‘In its default configuration Netdata uses under 5% CPU and roughly 100-150 MB RAM per node. A child node streaming to a parent in RAM mode uses about 50 MB, while a parent storing 7 days of 1-second data for 10 nodes needs 2.5-3.5 GB RAM with a 1.4 GB page cache.’
    • q: ‘Can Netdata run without Netdata Cloud?’ a: ‘Yes. The open-source agent is fully functional with no cloud connection. You access dashboards directly on any agent at port 19999 and use parent-child streaming for centralized aggregation. Netdata Cloud only adds unified multi-host dashboards, mobile apps, and advanced alert routing.’
    • q: ‘How do I install Netdata with one command on Linux?’ a: ‘Run curl -Ss https://get.netdata.cloud/kickstart.sh | sudo bash, which installs Netdata with all defaults in under 60 seconds. Verify it with sudo systemctl status netdata and open the dashboard at http://localhost:19999.’
    • q: ‘How does Netdata compare to Prometheus for monitoring?’ a: ‘Netdata collects metrics at 1-second granularity with zero configuration and a built-in real-time dashboard, while Prometheus polls every 15-60 seconds and requires PromQL with external visualization like Grafana. Many teams run both, using Netdata for per-node real-time troubleshooting and Prometheus for cluster-level aggregation; Netdata can export metrics to Prometheus via OpenMetrics format.’
    • q: ‘Does Netdata support custom application metrics?’ a: ‘Yes. You can send custom metrics through the built-in StatsD server on port 8125, expose them via the OpenMetrics endpoint, or write a custom collector in Python or Go using the go.d.plugin framework.’

{{< resource-info >}}

Introduction #

Most monitoring tools show you what happened 30 seconds ago. By then, the microburst that killed your database connection pool is already gone — leaving only a cryptic log entry and an angry pager. Netdata closes that gap with per-second metrics, sub-2-second visualization latency, and a single binary that installs in under 60 seconds. With 78,874 GitHub stars and a GPL-3.0 license, it is one of the most popular open-source monitoring agents in production today. This guide walks through a complete Netdata setup — from a single-node Docker deployment to a streaming Kubernetes cluster — with real performance-tuning configurations you can apply immediately.

What Is Netdata? #

Netdata is a distributed real-time monitoring agent that collects, stores, and visualizes system and application metrics with per-second granularity. Unlike traditional polling-based systems that sample every 15–60 seconds, Netdata runs directly on each node, capturing thousands of metrics locally and streaming them to central parents when needed. Written in C (54.2%), Go (29%), and Rust (8.5%), it is designed for minimal resource overhead: under 5% CPU and ~150 MB RAM per node in default configuration.

How Netdata Works #

Netdata’s architecture follows an edge-first, distributed model. Each node runs an independent agent that auto-discovers 800+ integrations without configuration files.

Netdata Architecture

Netdata Dashboard Overview

Core components:

  • Data Collection Layer: Built-in collectors for CPU, memory, disk, network, containers, databases, and applications. No external dependencies like Telegraf or Exporters required.
  • Database Engine (dbengine): A custom high-performance time-series store using ~0.5 bytes per sample with tiered storage for long-term retention.
  • ML Anomaly Detection: 18 unsupervised ML models per metric run at the edge, achieving consensus-based anomaly detection with 99% false-positive reduction.
  • Streaming & Parent-Child: Any agent can act as a “parent” to aggregate metrics from child nodes, enabling horizontal scaling without central bottlenecks.
  • Web Dashboard: Embedded static-threaded web server serves real-time charts directly from the agent on port 19999.

Installation & Setup #

One-Line Install (Linux) #

The fastest way to get Netdata running:

a
s
h
# Install Netdata with all defaultscurl -Ss https://get.netdata.cloud/kickstart.sh | sudo bash

Verify the installation:

a
s
h
sudo systemctl status netdata
# Active: active (running) since ...

Access the local dashboard at http://localhost:19999.

Docker Deploy #

For containerized environments:

a
s
h
docker run -d --name=netdata \
  -p 19999:19999 \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata:latest

Docker Compose (Production-Ready) #

a
m
l
version: '3.8'
services:
  netdata:
    image: netdata/netdata:v2.5.0
    container_name: netdata
    hostname: "netdata-${HOSTNAME}"
    ports:
      - "19999:19999"
    restart: unless-stopped
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - netdata-config:/etc/netdata
      - netdata-lib:/var/lib/netdata
      - netdata-cache:/var/cache/netdata
    environment:
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN}
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS=${NETDATA_CLAIM_ROOMS}
volumes:
  netdata-config:
  netdata-lib:
  netdata-cache:

Netdata System Monitoring

Kubernetes Helm Install #

a
s
h
# Add the Netdata Helm repository
helm repo add netdata https://netdata.github.io/helmchart/
helm repo update

# Install with default values
helm install netdata netdata/netdata \
  --namespace monitoring \
  --create-namespace

Verify the pods:

a
s
h
kubectl get pods -n monitoring
# NAME                    READY   STATUS
# netdata-parent-0        1/1     Running
# netdata-child-xxx       1/1     Running

Generate Current Config #

Download the running configuration to customize:

a
s
h
# Download current effective config
curl -o /etc/netdata/netdata.conf http://localhost:19999/netdata.conf
# Or use the edit-config script
sudo /etc/netdata/edit-config netdata.conf

Prometheus Remote Write #

Export Netdata metrics to Prometheus for long-term storage and PromQL queries:

a
s
h
sudo /etc/netdata/edit-config exporting.conf
o
n
f
[prometheus:remote_write]
    enabled = yes
    destination = prometheus:9090
    remote write URL path = /api/v1/write
    data source = average
    prefix = netdata
    send charts matching = *
    send hosts matching = *

Restart Netdata:

a
s
h
sudo systemctl restart netdata

Grafana Dashboard #

While Netdata has a built-in dashboard, many teams prefer Grafana for centralized visualization. Add Netdata as a Prometheus data source in Grafana:

a
m
l
# datasource.yaml in Grafana
apiVersion: 1
datasources:
  - name: Netdata-Prometheus
    type: prometheus
    url: http://netdata:19999/api/v1/allmetrics?format=prometheus
    access: proxy
    isDefault: false
    jsonData:
      timeInterval: "1s"

Kubernetes DaemonSet (Advanced) #

For full host-level visibility on every K8s node:

a
m
l
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netdata
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: netdata
  template:
    metadata:
      labels:
        app: netdata
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: netdata
          image: netdata/netdata:v2.5.0
          ports:
            - containerPort: 19999
              hostPort: 19999
          securityContext:
            capabilities:
              add: [SYS_PTRACE, SYS_ADMIN]
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: docker-sock
              mountPath: /var/run/docker.sock
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock

PostgreSQL Monitoring #

Enable the PostgreSQL collector in go.d/postgres.conf:

a
m
l
jobs:
  - name: local
    dsn: 'postgres://netdata_monitor:password@localhost:5432/postgres'
    collect:
      - database_statistics
      - table_statistics
      - index_statistics
      - replication_statistics
    timeout: 2

Test the collector:

a
s
h
sudo /etc/netdata/edit-config go.d/postgres.conf
# Restart to apply
sudo systemctl restart netdata

Nginx Monitoring #

Monitor Nginx stub_status and access logs:

a
m
l
# /etc/netdata/go.d/nginx.conf
jobs:
  - name: local
    url: http://localhost/stub_status

  - name: access_log
    path: /var/log/nginx/access.log
    parser:
      type: ltsv

Benchmarks / Real-World Use Cases #

Resource Footprint Comparison #

The University of Amsterdam published a peer-reviewed study (ICSOC 2023) ranking Netdata as the most energy-efficient monitoring tool for Docker-based systems. Independent benchmarks confirm:

Scenario Netdata Prometheus + Node Exporter Zabbix Agent
CPU Overhead (%) 1–5% 5–15% 10–20%
RAM per Node 100–150 MB 200–500 MB 150–300 MB
Collection Interval 1 second 15–60 seconds 30–60 seconds
Startup Time < 60 seconds 30–60 minutes 2–4 hours
Metrics per Node 2,000+ 500–1,000 1,000–2,000
Config Required Zero Moderate Extensive

Streaming Scale Benchmarks #

Netdata’s parent-child streaming architecture scales horizontally:

  • Single Parent: 1 million+ samples/second ingestion, ~3.5 GB RAM
  • 10 Child Nodes: 20k metrics each, 1-second retention for 7 days, 12 GB disk
  • Active-Active Cluster: Unlimited horizontal scaling with full data replication
  • Ephemeral Nodes: Stream metrics to parents, retain data after node termination

Real-World Deployment: 500-Node K8s Cluster #

A mid-size SaaS company running 500 Kubernetes nodes uses:

  • 5 Netdata parents (active-active) across 3 availability zones
  • 500 child agents via DaemonSet, each running in RAM mode (~50 MB)
  • Tiered storage: 1-second for 7 days, 1-minute for 1 month, 1-hour for 1 year
  • Total parent storage: 25 GB per parent (125 GB cluster-wide)
  • Alert routing via Netdata Cloud to PagerDuty and Slack

Advanced Usage / Production Hardening #

Performance Tuning: netdata.conf #

The default configuration is optimized for standalone use. For production systems, tune the database engine and disable unnecessary collectors.

Minimal footprint (production child nodes):

o
n
f
[global]
    # Run with lowest priority to avoid impacting applications
    process scheduling policy = batch
    process nice level = 19
    # Reduce threads on constrained nodes
    libuv worker threads = 4

[db]
    # Use RAM-only mode for child nodes streaming to parents
    mode = ram
    retention = 3600
    # Single tier is sufficient for local buffering
    storage tiers = 1

[ml]
    # Disable ML on child nodes — run on parents
    enabled = no

[health]
    # Disable local alerting on child nodes
    enabled = no

[web]
    # Bind only to localhost for security
    bind to = 127.0.0.1
    allow connections from = localhost

[plugins]
    # Disable unused collectors to reduce CPU/RAM
    ebpf = no
    perf = no
    nfacct = no
    fping = no
    ioping = no
    tc = no
    idlejitter = no
    debugfs = no
    systemd-journal = no

Parent node with tiered storage (central monitoring):

o
n
f
[db]
    mode = dbengine
    storage tiers = 3
    dbengine page cache size = 1.4GiB

    # Tier 0: 1-second resolution, 7 days
    update every = 1
    dbengine tier 0 retention size = 12GiB
    dbengine tier 0 retention time = 7d

    # Tier 1: 1-minute resolution, 1 month
    dbengine tier 1 update every iterations = 60
    dbengine tier 1 retention size = 4GiB
    dbengine tier 1 retention time = 1mo

    # Tier 2: 1-hour resolution, 1 year
    dbengine tier 2 update every iterations = 60
    dbengine tier 2 retention size = 2GiB
    dbengine tier 2 retention time = 1y

[ml]
    enabled = yes
    # 18 ML models per metric, trains every 3 hours
    train every = 10800
    number of models per dimension = 18

[health]
    enabled = yes
    # Run health checks every 10 seconds
    run at least every seconds = 10

[web]
    # Expose to network for dashboard access
    bind to = *
    # Restrict access to internal networks
    allow connections from = 10.* 192.168.* 172.16.* 172.17.*

Streaming Configuration: stream.conf #

On child nodes (/etc/netdata/stream.conf):

o
n
f
[stream]
    enabled = yes
    destination = tcp:netdata-parent.monitoring.svc.cluster.local:19999
    api key = YOUR_API_KEY_HERE
    timeout seconds = 60
    default port = 19999
    send charts matching = *
    buffer size bytes = 1048576
    reconnect delay seconds = 5
    initial clock resync iterations = 60

On the parent (/etc/netdata/stream.conf):

o
n
f
[API_KEY]
    enabled = yes
    default memory mode = dbengine
    health enabled by default = yes

Security Hardening #

Enable TLS for the web interface:

o
n
f
[web]
    tls version = 1.3
    ssl key = /etc/netdata/ssl/key.pem
    ssl certificate = /etc/netdata/ssl/cert.pem
    # Require TLS for all connections
    bind to = *=dashboard|registry|badges|management|streaming|netdata.conf|readable|writable

Monitoring Netdata Itself #

Track the agent’s own resource usage:

a
s
h
# View internal metrics
curl -s http://localhost:19999/api/v1/info | jq '.version, .hog'

# Check dbengine statistics
curl -s http://localhost:19999/api/v1/data?chart=netdata.dbengine_main_page_stats

Comparison with Alternatives #

Feature Netdata Prometheus Datadog Zabbix
Collection Interval 1 second 15–60 seconds 15 seconds 30–60 seconds
Agent RAM 100–150 MB 200–500 MB 200–400 MB 150–300 MB
Agent CPU 1–5% 5–15% 3–8% 10–20%
Setup Time < 60 seconds 30–60 min 10–20 min 2–4 hours
Auto-Discovery 800+ integrations Limited 600+ integrations Template-based
Built-in Dashboard Yes (real-time) No (PromQL only) Yes Yes (dated)
ML Anomaly Detection 18 models/metric No (requires extra) Yes Limited
Long-Term Storage Tiered dbengine 15 days default Cloud External DB
Licensing GPL-3.0 (free) Apache-2.0 (free) $15/host/month GPL-2.0 (free)
Alert Routing Built-in + Cloud Alertmanager Built-in Built-in
Kubernetes Native Yes (Helm chart) Yes (operator) Yes (operator) Limited

When to choose what:

  • Netdata: Single-node visibility, real-time troubleshooting, resource-constrained environments, teams that need zero-config monitoring.
  • Prometheus: Cloud-native metric aggregation, complex PromQL queries, long-term storage with Thanos/VictoriaMetrics.
  • Datadog: Enterprise-wide observability (metrics + logs + traces + APM), managed SaaS with SOC 2 compliance.
  • Zabbix: Mixed infrastructure (network gear + servers + SNMP), organizations needing agent + agentless monitoring.

Limitations / Honest Assessment #

Netdata is not the right tool for every monitoring scenario:

  1. No centralized multi-host view without Netdata Cloud: The open-source agent dashboards are per-node. For a unified view across 100+ nodes, you either need Netdata Cloud (SaaS) or a parent streaming setup with external visualization.
  2. Limited long-term storage in default config: The dbengine defaults to ~256 MB disk per tier. For years of retention at scale, you need explicit tiered storage configuration or an external TSDB like VictoriaMetrics.
  3. Alerting pipeline less flexible than Alertmanager: While Netdata has built-in health checks and notifications, complex routing trees, silencing, and on-call rotation require Netdata Cloud or integration with PagerDuty/OpsGenie.
  4. Not a full observability platform: Netdata focuses on metrics. For distributed tracing, structured logging, and APM, pair it with Jaeger, Loki, or OpenTelemetry.
  5. Windows support is newer: The Windows agent is functional but has fewer collectors than Linux. For Windows-centric environments, verify collector coverage before deploying.

Frequently Asked Questions #

Q: How much RAM does Netdata actually use in production?

On a child node streaming to a parent, ~50 MB with the RAM-mode configuration shown above. On a parent storing 7 days of 1-second data for 10 nodes, expect 2.5–3.5 GB RAM with a 1.4 GB page cache. The agent auto-tunes cache sizes based on available system memory.

Q: Can I run Netdata without Netdata Cloud?

Yes. The open-source agent is fully functional without any cloud connection. Use parent-child streaming for centralized aggregation and access dashboards directly on any agent at port 19999. Netdata Cloud adds unified dashboards, mobile apps, and advanced alert routing but is not required.

Q: How does Netdata compare to Prometheus + Grafana?

Netdata excels at real-time, per-node visibility with zero configuration. Prometheus excels at metric aggregation across dynamic, ephemeral fleets with PromQL. Many teams run both: Netdata on every node for real-time troubleshooting, Prometheus for cluster-level aggregation and long-term trending. Netdata can export metrics to Prometheus via OpenMetrics format.

Q: What is the maximum scale Netdata can handle?

A single parent node can ingest 1+ million samples/second. For horizontal scaling, deploy active-active parent clusters. There is no theoretical limit — the architecture is distributed by design. The Netdata Cloud SaaS handles millions of nodes.

Q: How do I back up Netdata’s database and configuration?

Configuration lives in /etc/netdata/ and can be version-controlled (Git, Ansible, Puppet). The dbengine database in /var/cache/netdata/ is self-healing and does not require manual backup — streaming to multiple parent nodes provides natural redundancy. For critical environments, run active-active parent pairs.

Q: Does Netdata support custom application metrics?

Yes. Use the built-in StatsD server (port 8125), the OpenMetrics endpoint, or write a custom collector in Python or Go. The go.d.plugin framework supports building new collectors with minimal boilerplate.

Conclusion #

Netdata delivers on a promise most monitoring tools fail to keep: immediate, per-second visibility with negligible resource overhead. For teams running Docker, Kubernetes, or bare-metal infrastructure, it is the fastest path from “nothing” to “monitoring everything.” The streaming architecture scales from a single Raspberry Pi to a multi-region Kubernetes cluster, and the GPL-3.0 license means zero licensing costs.

Action items:

  1. Run the one-line installer on your most critical server today
  2. Deploy the Helm chart on your Kubernetes cluster
  3. Configure parent-child streaming for production hardening
  4. Tune netdata.conf for your resource constraints using the configs above

Join the Netdata community on Telegram for real-time support and discussions with 5,000+ engineers.

Affiliate disclosure: This article contains affiliate links for DigitalOcean and HTStack. If you purchase hosting services through these links, dibi8.com may receive a commission at no additional cost to you.

Before you deploy any of the tools above into production, you’ll need solid infrastructure. Two options dibi8 actually uses and recommends:

  • DigitalOcean — $200 free credit for 60 days across 14+ global regions. The default option for indie devs running open-source AI tools.
  • HTStack — Hong Kong VPS with low-latency access from mainland China. This is the same IDC that hosts dibi8.com — battle-tested in production.

Affiliate links — they don’t cost you extra and they help keep dibi8.com running.

Sources & Further Reading #

References & Sources #

💬 Bình luận & Thảo luận