Netdata sử dụng bao nhiêu RAM và CPU trên mỗi node?

Trong cấu hình mặc định, Netdata sử dụng dưới 5% CPU và khoảng 100-150 MB RAM trên mỗi node. Một node con streaming đến node cha ở chế độ RAM chỉ cần khoảng 50 MB, trong khi node cha lưu trữ dữ liệu 1 giây trong 7 ngày cho 10 node cần 2.5-3.5 GB RAM cùng với 1.4 GB page cache.

Netdata có thể chạy mà không cần Netdata Cloud không?

Có. Agent mã nguồn mở hoạt động đầy đủ mà không cần kết nối cloud. Bạn truy cập dashboard trực tiếp trên bất kỳ Agent nào qua cổng 19999 và sử dụng streaming parent-child để tập trung dữ liệu. Netdata Cloud chỉ bổ sung dashboard đa host thống nhất, ứng dụng di động và định tuyến cảnh báo nâng cao.

Làm thế nào để cài đặt Netdata bằng một lệnh duy nhất trên Linux?

Chạy curl -Ss https://get.netdata.cloud/kickstart.sh | sudo bash, lệnh này sẽ cài đặt Netdata với toàn bộ cấu hình mặc định trong dưới 60 giây. Xác minh bằng sudo systemctl status netdata và mở dashboard tại http://localhost:19999.

Netdata so sánh với Prometheus như thế nào trong việc giám sát?

Netdata thu thập metrics với độ chi tiết 1 giây, không cần cấu hình và có dashboard thời gian thực tích hợp sẵn; trong khi Prometheus thăm dò mỗi 15-60 giây và yêu cầu PromQL cùng công cụ trực quan hóa bên ngoài như Grafana. Nhiều nhóm chạy cả hai: dùng Netdata để xử lý sự cố thời gian thực trên từng node và Prometheus để tổng hợp ở cấp độ cluster. Netdata có thể xuất metrics sang Prometheus qua định dạng OpenMetrics.

Netdata có hỗ trợ metrics tùy chỉnh cho ứng dụng không?

Có. Bạn có thể gửi metrics tùy chỉnh qua server StatsD tích hợp trên cổng 8125, hiển thị chúng qua endpoint OpenMetrics, hoặc viết collector tùy chỉnh bằng Python hoặc Go sử dụng framework go.d.plugin.

Netdata: Giám Sát Thở Gian Thự 78K+ Star

lang: vi slug: netdata title: ‘Netdata: Real-Time Monitoring with 78K+ Stars’ description: ‘Netdata (ND) is a high-performance real-time monitoring agent with per-second metrics and visualization. Compatible with Docker, Kubernetes, Prometheus, and Grafana. Covers netdata tutorial, netdata setup, real time monitoring, netdata vs prometheus, and netdata performance tuning.’ tags: [‘devops’, ‘guide’, ‘monitoring’, ‘observability’, ‘open-source’, ‘reference’, ’tutorial’] date: 2026-05-19 00:00:00+08:00 lastmod: 2026-05-19 00:00:00+08:00 tech_stack: [] application_domain: Dev Utils source_version: ’’ licensing_model: Open Source license_type: GPL-3.0 file_size: ’’ file_md5: ’’ download_url: ’’ backup_url: ’’ github_repo: ‘https://github.com/netdata/netdata' last_maintained: ‘2026-05-19’ draft: false categories: [‘dev-utils’] aliases:

/posts/netdata/ faqs:
- q: ‘How much RAM and CPU does Netdata use per node?’ a: ‘In its default configuration Netdata uses under 5% CPU and roughly 100-150 MB RAM per node. A child node streaming to a parent in RAM mode uses about 50 MB, while a parent storing 7 days of 1-second data for 10 nodes needs 2.5-3.5 GB RAM with a 1.4 GB page cache.’
- q: ‘Can Netdata run without Netdata Cloud?’ a: ‘Yes. The open-source agent is fully functional with no cloud connection. You access dashboards directly on any agent at port 19999 and use parent-child streaming for centralized aggregation. Netdata Cloud only adds unified multi-host dashboards, mobile apps, and advanced alert routing.’
- q: ‘How do I install Netdata with one command on Linux?’ a: ‘Run curl -Ss https://get.netdata.cloud/kickstart.sh | sudo bash, which installs Netdata with all defaults in under 60 seconds. Verify it with sudo systemctl status netdata and open the dashboard at http://localhost:19999.’
- q: ‘How does Netdata compare to Prometheus for monitoring?’ a: ‘Netdata collects metrics at 1-second granularity with zero configuration and a built-in real-time dashboard, while Prometheus polls every 15-60 seconds and requires PromQL with external visualization like Grafana. Many teams run both, using Netdata for per-node real-time troubleshooting and Prometheus for cluster-level aggregation; Netdata can export metrics to Prometheus via OpenMetrics format.’
- q: ‘Does Netdata support custom application metrics?’ a: ‘Yes. You can send custom metrics through the built-in StatsD server on port 8125, expose them via the OpenMetrics endpoint, or write a custom collector in Python or Go using the go.d.plugin framework.’

Introduction #

Most monitoring tools show you what happened 30 seconds ago. By then, the microburst that killed your database connection pool is already gone — leaving only a cryptic log entry and an angry pager. Netdata closes that gap with per-second metrics, sub-2-second visualization latency, and a single binary that installs in under 60 seconds. With 78,874 GitHub stars and a GPL-3.0 license, it is one of the most popular open-source monitoring agents in production today. This guide walks through a complete Netdata setup — from a single-node Docker deployment to a streaming Kubernetes cluster — with real performance-tuning configurations you can apply immediately.

What Is Netdata? #

Netdata is a distributed real-time monitoring agent that collects, stores, and visualizes system and application metrics with per-second granularity. Unlike traditional polling-based systems that sample every 15–60 seconds, Netdata runs directly on each node, capturing thousands of metrics locally and streaming them to central parents when needed. Written in C (54.2%), Go (29%), and Rust (8.5%), it is designed for minimal resource overhead: under 5% CPU and ~150 MB RAM per node in default configuration.

How Netdata Works #

Netdata’s architecture follows an edge-first, distributed model. Each node runs an independent agent that auto-discovers 800+ integrations without configuration files.

Core components:

Data Collection Layer: Built-in collectors for CPU, memory, disk, network, containers, databases, and applications. No external dependencies like Telegraf or Exporters required.
Database Engine (dbengine): A custom high-performance time-series store using ~0.5 bytes per sample with tiered storage for long-term retention.
ML Anomaly Detection: 18 unsupervised ML models per metric run at the edge, achieving consensus-based anomaly detection with 99% false-positive reduction.
Streaming & Parent-Child: Any agent can act as a “parent” to aggregate metrics from child nodes, enabling horizontal scaling without central bottlenecks.
Web Dashboard: Embedded static-threaded web server serves real-time charts directly from the agent on port 19999.

Installation & Setup #

One-Line Install (Linux) #

The fastest way to get Netdata running:

a
s
h
# Install Netdata with all defaultscurl -Ss https://get.netdata.cloud/kickstart.sh | sudo bash

Verify the installation:

a
s
h
sudo systemctl status netdata
# Active: active (running) since ...

Access the local dashboard at http://localhost:19999.

Docker Deploy #

For containerized environments:

a
s
h
docker run -d --name=netdata \
  -p 19999:19999 \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  --cap-add SYS_PTRACE \
  --security-opt apparmor=unconfined \
  netdata/netdata:latest

Docker Compose (Production-Ready) #

a
m
l
version: '3.8'
services:
  netdata:
    image: netdata/netdata:v2.5.0
    container_name: netdata
    hostname: "netdata-${HOSTNAME}"
    ports:
      - "19999:19999"
    restart: unless-stopped
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - netdata-config:/etc/netdata
      - netdata-lib:/var/lib/netdata
      - netdata-cache:/var/cache/netdata
    environment:
      - NETDATA_CLAIM_TOKEN=${NETDATA_CLAIM_TOKEN}
      - NETDATA_CLAIM_URL=https://app.netdata.cloud
      - NETDATA_CLAIM_ROOMS=${NETDATA_CLAIM_ROOMS}
volumes:
  netdata-config:
  netdata-lib:
  netdata-cache:

Kubernetes Helm Install #

a
s
h
# Add the Netdata Helm repository
helm repo add netdata https://netdata.github.io/helmchart/
helm repo update

# Install with default values
helm install netdata netdata/netdata \
  --namespace monitoring \
  --create-namespace

Verify the pods:

a
s
h
kubectl get pods -n monitoring
# NAME                    READY   STATUS
# netdata-parent-0        1/1     Running
# netdata-child-xxx       1/1     Running

Generate Current Config #

Download the running configuration to customize:

a
s
h
# Download current effective config
curl -o /etc/netdata/netdata.conf http://localhost:19999/netdata.conf
# Or use the edit-config script
sudo /etc/netdata/edit-config netdata.conf

Integration with Popular Tools #

Prometheus Remote Write #

Export Netdata metrics to Prometheus for long-term storage and PromQL queries:

a
s
h
sudo /etc/netdata/edit-config exporting.conf

o
n
f
[prometheus:remote_write]
    enabled = yes
    destination = prometheus:9090
    remote write URL path = /api/v1/write
    data source = average
    prefix = netdata
    send charts matching = *
    send hosts matching = *

Restart Netdata:

a
s
h
sudo systemctl restart netdata

Grafana Dashboard #

While Netdata has a built-in dashboard, many teams prefer Grafana for centralized visualization. Add Netdata as a Prometheus data source in Grafana:

a
m
l
# datasource.yaml in Grafana
apiVersion: 1
datasources:
  - name: Netdata-Prometheus
    type: prometheus
    url: http://netdata:19999/api/v1/allmetrics?format=prometheus
    access: proxy
    isDefault: false
    jsonData:
      timeInterval: "1s"

Kubernetes DaemonSet (Advanced) #

For full host-level visibility on every K8s node:

a
m
l
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: netdata
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: netdata
  template:
    metadata:
      labels:
        app: netdata
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: netdata
          image: netdata/netdata:v2.5.0
          ports:
            - containerPort: 19999
              hostPort: 19999
          securityContext:
            capabilities:
              add: [SYS_PTRACE, SYS_ADMIN]
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
            - name: docker-sock
              mountPath: /var/run/docker.sock
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock

PostgreSQL Monitoring #

Enable the PostgreSQL collector in go.d/postgres.conf:

a
m
l
jobs:
  - name: local
    dsn: 'postgres://netdata_monitor:password@localhost:5432/postgres'
    collect:
      - database_statistics
      - table_statistics
      - index_statistics
      - replication_statistics
    timeout: 2

Test the collector:

a
s
h
sudo /etc/netdata/edit-config go.d/postgres.conf
# Restart to apply
sudo systemctl restart netdata

Nginx Monitoring #

Monitor Nginx stub_status and access logs:

a
m
l
# /etc/netdata/go.d/nginx.conf
jobs:
  - name: local
    url: http://localhost/stub_status

  - name: access_log
    path: /var/log/nginx/access.log
    parser:
      type: ltsv

Benchmarks / Real-World Use Cases #

Resource Footprint Comparison #

The University of Amsterdam published a peer-reviewed study (ICSOC 2023) ranking Netdata as the most energy-efficient monitoring tool for Docker-based systems. Independent benchmarks confirm:

Scenario	Netdata	Prometheus + Node Exporter	Zabbix Agent
CPU Overhead (%)	1–5%	5–15%	10–20%
RAM per Node	100–150 MB	200–500 MB	150–300 MB
Collection Interval	1 second	15–60 seconds	30–60 seconds
Startup Time	< 60 seconds	30–60 minutes	2–4 hours
Metrics per Node	2,000+	500–1,000	1,000–2,000
Config Required	Zero	Moderate	Extensive

Streaming Scale Benchmarks #

Netdata’s parent-child streaming architecture scales horizontally:

Single Parent: 1 million+ samples/second ingestion, ~3.5 GB RAM
10 Child Nodes: 20k metrics each, 1-second retention for 7 days, 12 GB disk
Active-Active Cluster: Unlimited horizontal scaling with full data replication
Ephemeral Nodes: Stream metrics to parents, retain data after node termination

Real-World Deployment: 500-Node K8s Cluster #

A mid-size SaaS company running 500 Kubernetes nodes uses:

5 Netdata parents (active-active) across 3 availability zones
500 child agents via DaemonSet, each running in RAM mode (~50 MB)
Tiered storage: 1-second for 7 days, 1-minute for 1 month, 1-hour for 1 year
Total parent storage: 25 GB per parent (125 GB cluster-wide)
Alert routing via Netdata Cloud to PagerDuty and Slack

Advanced Usage / Production Hardening #

Performance Tuning: netdata.conf #

The default configuration is optimized for standalone use. For production systems, tune the database engine and disable unnecessary collectors.

Minimal footprint (production child nodes):

o
n
f
[global]
    # Run with lowest priority to avoid impacting applications
    process scheduling policy = batch
    process nice level = 19
    # Reduce threads on constrained nodes
    libuv worker threads = 4

[db]
    # Use RAM-only mode for child nodes streaming to parents
    mode = ram
    retention = 3600
    # Single tier is sufficient for local buffering
    storage tiers = 1

[ml]
    # Disable ML on child nodes — run on parents
    enabled = no

[health]
    # Disable local alerting on child nodes
    enabled = no

[web]
    # Bind only to localhost for security
    bind to = 127.0.0.1
    allow connections from = localhost

[plugins]
    # Disable unused collectors to reduce CPU/RAM
    ebpf = no
    perf = no
    nfacct = no
    fping = no
    ioping = no
    tc = no
    idlejitter = no
    debugfs = no
    systemd-journal = no

Parent node with tiered storage (central monitoring):

o
n
f
[db]
    mode = dbengine
    storage tiers = 3
    dbengine page cache size = 1.4GiB

    # Tier 0: 1-second resolution, 7 days
    update every = 1
    dbengine tier 0 retention size = 12GiB
    dbengine tier 0 retention time = 7d

    # Tier 1: 1-minute resolution, 1 month
    dbengine tier 1 update every iterations = 60
    dbengine tier 1 retention size = 4GiB
    dbengine tier 1 retention time = 1mo

    # Tier 2: 1-hour resolution, 1 year
    dbengine tier 2 update every iterations = 60
    dbengine tier 2 retention size = 2GiB
    dbengine tier 2 retention time = 1y

[ml]
    enabled = yes
    # 18 ML models per metric, trains every 3 hours
    train every = 10800
    number of models per dimension = 18

[health]
    enabled = yes
    # Run health checks every 10 seconds
    run at least every seconds = 10

[web]
    # Expose to network for dashboard access
    bind to = *
    # Restrict access to internal networks
    allow connections from = 10.* 192.168.* 172.16.* 172.17.*

Streaming Configuration: stream.conf #

On child nodes (/etc/netdata/stream.conf):

o
n
f
[stream]
    enabled = yes
    destination = tcp:netdata-parent.monitoring.svc.cluster.local:19999
    api key = YOUR_API_KEY_HERE
    timeout seconds = 60
    default port = 19999
    send charts matching = *
    buffer size bytes = 1048576
    reconnect delay seconds = 5
    initial clock resync iterations = 60

On the parent (/etc/netdata/stream.conf):

o
n
f
[API_KEY]
    enabled = yes
    default memory mode = dbengine
    health enabled by default = yes

Security Hardening #

Enable TLS for the web interface:

o
n
f
[web]
    tls version = 1.3
    ssl key = /etc/netdata/ssl/key.pem
    ssl certificate = /etc/netdata/ssl/cert.pem
    # Require TLS for all connections
    bind to = *=dashboard|registry|badges|management|streaming|netdata.conf|readable|writable

Monitoring Netdata Itself #

Track the agent’s own resource usage:

a
s
h
# View internal metrics
curl -s http://localhost:19999/api/v1/info | jq '.version, .hog'

# Check dbengine statistics
curl -s http://localhost:19999/api/v1/data?chart=netdata.dbengine_main_page_stats

Comparison with Alternatives #

Feature	Netdata	Prometheus	Datadog	Zabbix
Collection Interval	1 second	15–60 seconds	15 seconds	30–60 seconds
Agent RAM	100–150 MB	200–500 MB	200–400 MB	150–300 MB
Agent CPU	1–5%	5–15%	3–8%	10–20%
Setup Time	< 60 seconds	30–60 min	10–20 min	2–4 hours
Auto-Discovery	800+ integrations	Limited	600+ integrations	Template-based
Built-in Dashboard	Yes (real-time)	No (PromQL only)	Yes	Yes (dated)
ML Anomaly Detection	18 models/metric	No (requires extra)	Yes	Limited
Long-Term Storage	Tiered dbengine	15 days default	Cloud	External DB
Licensing	GPL-3.0 (free)	Apache-2.0 (free)	$15/host/month	GPL-2.0 (free)
Alert Routing	Built-in + Cloud	Alertmanager	Built-in	Built-in
Kubernetes Native	Yes (Helm chart)	Yes (operator)	Yes (operator)	Limited

When to choose what:

Netdata: Single-node visibility, real-time troubleshooting, resource-constrained environments, teams that need zero-config monitoring.
Prometheus: Cloud-native metric aggregation, complex PromQL queries, long-term storage with Thanos/VictoriaMetrics.
Datadog: Enterprise-wide observability (metrics + logs + traces + APM), managed SaaS with SOC 2 compliance.
Zabbix: Mixed infrastructure (network gear + servers + SNMP), organizations needing agent + agentless monitoring.

Limitations / Honest Assessment #

Netdata is not the right tool for every monitoring scenario:

No centralized multi-host view without Netdata Cloud: The open-source agent dashboards are per-node. For a unified view across 100+ nodes, you either need Netdata Cloud (SaaS) or a parent streaming setup with external visualization.
Limited long-term storage in default config: The dbengine defaults to ~256 MB disk per tier. For years of retention at scale, you need explicit tiered storage configuration or an external TSDB like VictoriaMetrics.
Alerting pipeline less flexible than Alertmanager: While Netdata has built-in health checks and notifications, complex routing trees, silencing, and on-call rotation require Netdata Cloud or integration with PagerDuty/OpsGenie.
Not a full observability platform: Netdata focuses on metrics. For distributed tracing, structured logging, and APM, pair it with Jaeger, Loki, or OpenTelemetry.
Windows support is newer: The Windows agent is functional but has fewer collectors than Linux. For Windows-centric environments, verify collector coverage before deploying.

Frequently Asked Questions #

Q: How much RAM does Netdata actually use in production?

On a child node streaming to a parent, ~50 MB with the RAM-mode configuration shown above. On a parent storing 7 days of 1-second data for 10 nodes, expect 2.5–3.5 GB RAM with a 1.4 GB page cache. The agent auto-tunes cache sizes based on available system memory.

Q: Can I run Netdata without Netdata Cloud?

Yes. The open-source agent is fully functional without any cloud connection. Use parent-child streaming for centralized aggregation and access dashboards directly on any agent at port 19999. Netdata Cloud adds unified dashboards, mobile apps, and advanced alert routing but is not required.

Q: How does Netdata compare to Prometheus + Grafana?

Netdata excels at real-time, per-node visibility with zero configuration. Prometheus excels at metric aggregation across dynamic, ephemeral fleets with PromQL. Many teams run both: Netdata on every node for real-time troubleshooting, Prometheus for cluster-level aggregation and long-term trending. Netdata can export metrics to Prometheus via OpenMetrics format.

Q: What is the maximum scale Netdata can handle?

A single parent node can ingest 1+ million samples/second. For horizontal scaling, deploy active-active parent clusters. There is no theoretical limit — the architecture is distributed by design. The Netdata Cloud SaaS handles millions of nodes.

Q: How do I back up Netdata’s database and configuration?

Configuration lives in /etc/netdata/ and can be version-controlled (Git, Ansible, Puppet). The dbengine database in /var/cache/netdata/ is self-healing and does not require manual backup — streaming to multiple parent nodes provides natural redundancy. For critical environments, run active-active parent pairs.

Q: Does Netdata support custom application metrics?

Yes. Use the built-in StatsD server (port 8125), the OpenMetrics endpoint, or write a custom collector in Python or Go. The go.d.plugin framework supports building new collectors with minimal boilerplate.

Conclusion #

Netdata delivers on a promise most monitoring tools fail to keep: immediate, per-second visibility with negligible resource overhead. For teams running Docker, Kubernetes, or bare-metal infrastructure, it is the fastest path from “nothing” to “monitoring everything.” The streaming architecture scales from a single Raspberry Pi to a multi-region Kubernetes cluster, and the GPL-3.0 license means zero licensing costs.

Action items:

Run the one-line installer on your most critical server today
Deploy the Helm chart on your Kubernetes cluster
Configure parent-child streaming for production hardening
Tune netdata.conf for your resource constraints using the configs above

Join the Netdata community on Telegram for real-time support and discussions with 5,000+ engineers.

Affiliate disclosure: This article contains affiliate links for DigitalOcean and HTStack. If you purchase hosting services through these links, dibi8.com may receive a commission at no additional cost to you.

Recommended Hosting & Infrastructure #

Before you deploy any of the tools above into production, you’ll need solid infrastructure. Two options dibi8 actually uses and recommends:

DigitalOcean — $200 free credit for 60 days across 14+ global regions. The default option for indie devs running open-source AI tools.
HTStack — Hong Kong VPS with low-latency access from mainland China. This is the same IDC that hosts dibi8.com — battle-tested in production.

Affiliate links — they don’t cost you extra and they help keep dibi8.com running.

Netdata: Giám Sát Thở Gian Thự 78K+ Star

Introduction #

What Is Netdata? #

How Netdata Works #

Installation & Setup #

One-Line Install (Linux) #

Docker Deploy #

Docker Compose (Production-Ready) #

Kubernetes Helm Install #

Generate Current Config #

Integration with Popular Tools #

Prometheus Remote Write #

Grafana Dashboard #

Kubernetes DaemonSet (Advanced) #

PostgreSQL Monitoring #

Nginx Monitoring #

Benchmarks / Real-World Use Cases #

Resource Footprint Comparison #

Streaming Scale Benchmarks #

Real-World Deployment: 500-Node K8s Cluster #

Advanced Usage / Production Hardening #

Performance Tuning: netdata.conf #

Streaming Configuration: stream.conf #

Security Hardening #

Monitoring Netdata Itself #

Comparison with Alternatives #

Limitations / Honest Assessment #

Frequently Asked Questions #

Conclusion #

Recommended Hosting & Infrastructure #

Sources & Further Reading #

References & Sources #

💬 Bình luận & Thảo luận

Introduction #

What Is Netdata? #

How Netdata Works #

Installation & Setup #

One-Line Install (Linux) #

Docker Deploy #

Docker Compose (Production-Ready) #

Kubernetes Helm Install #

Generate Current Config #

Integration with Popular Tools #

Prometheus Remote Write #

Grafana Dashboard #

Kubernetes DaemonSet (Advanced) #

PostgreSQL Monitoring #

Nginx Monitoring #

Benchmarks / Real-World Use Cases #

Resource Footprint Comparison #

Streaming Scale Benchmarks #

Real-World Deployment: 500-Node K8s Cluster #

Advanced Usage / Production Hardening #

Performance Tuning: netdata.conf #

Streaming Configuration: stream.conf #

Security Hardening #

Monitoring Netdata Itself #

Comparison with Alternatives #

Limitations / Honest Assessment #

Frequently Asked Questions #

Conclusion #

Recommended Hosting & Infrastructure #

Sources & Further Reading #

References & Sources #

🔗 Tài nguyên liên quan

💬 Bình luận & Thảo luận