1. 概述

See the source image

node_exporter是我们最常用的exporter之一,我们可以把他认为是一个agent,需要被安装在操作系统之上,然后才能采集到系统的数据。采集Linux类的exporter被称为node_exporter,如果我们使用sidecar的方式来运行的话,他就可以采集pod的信息。

我们在前面的指标采集一章中,已经使用node_exporter做了一个demo,他的采集的指标比zabbix多了好多。其实,这只是他的冰山一角,因为我们配置exporter的时候,使用的是默认启动的方式,他还有很多开关,需要添加参数才能生效。同时,他还可以指定那类指标不收集,以此来节省我们的资源消耗。

2. 详解node_exporter

node_exporter是prometheus官方提供的agent,项目被托管在prometheus的账号之下。我们也可以通过官方提供的链接下载最新的版本。在官方的github里面已经提供了非常详细使用说明,我们这里带大家梳理一下

2.1. 默认启用的参数

官方地址,这些参数就是我们前面做实验的时候默认收集的指标

Name Description OS
arp Exposes ARP statistics from /proc/net/arp. Linux
bcache Exposes bcache statistics from /sys/fs/bcache/. Linux
bonding Exposes the number of configured and active slaves of Linux bonding interfaces. Linux
boottime Exposes system boot time derived from the kern.boottime sysctl. Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack Shows conntrack statistics (does nothing if no /proc/sys/net/netfilter/ present). Linux
cpu Exposes CPU statistics Darwin, Dragonfly, FreeBSD, Linux, Solaris
cpufreq Exposes CPU frequency statistics Linux, Solaris
diskstats Exposes disk I/O statistics. Darwin, Linux, OpenBSD
edac Exposes error detection and correction statistics. Linux
entropy Exposes available entropy. Linux
exec Exposes execution statistics. Dragonfly, FreeBSD
filefd Exposes file descriptor statistics from /proc/sys/fs/file-nr. Linux
filesystem Exposes filesystem statistics, such as disk space used. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon Expose hardware monitoring and sensor data from /sys/class/hwmon/. Linux
infiniband Exposes network statistics specific to InfiniBand and Intel OmniPath configurations. Linux
ipvs Exposes IPVS status from /proc/net/ip_vs and stats from /proc/net/ip_vs_stats. Linux
loadavg Exposes load average. Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm Exposes statistics about devices in /proc/mdstat (does nothing if no /proc/mdstat present). Linux
meminfo Exposes memory statistics. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass Exposes network interface info from /sys/class/net/ Linux
netdev Exposes network interface statistics such as bytes transferred. Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. Linux
nfs Exposes NFS client statistics from /proc/net/rpc/nfs. This is the same information as nfsstat -c. Linux
nfsd Exposes NFS kernel server statistics from /proc/net/rpc/nfsd. This is the same information as nfsstat -s. Linux
pressure Exposes pressure stall statistics from /proc/pressure/. Linux (kernel 4.20+ and/or CONFIG_PSI)
rapl Exposes various statistics from /sys/class/powercap. Linux
schedstat Exposes task scheduler statistics from /proc/schedstat. Linux
sockstat Exposes various statistics from /proc/net/sockstat. Linux
softnet Exposes statistics from /proc/net/softnet_stat. Linux
stat Exposes various statistics from /proc/stat. This includes boot time, forks and interrupts. Linux
textfile Exposes statistics read from local disk. The --collector.textfile.directory flag must be set. any
thermal_zone Exposes thermal zone & cooling device statistics from /sys/class/thermal. Linux
time Exposes the current system time. any
timex Exposes selected adjtimex(2) system call stats. Linux
udp_queues Exposes UDP total lengths of the rx_queue and tx_queue from /proc/net/udp and /proc/net/udp6. Linux
uname Exposes system information as provided by the uname system call. Darwin, FreeBSD, Linux, OpenBSD
vmstat Exposes statistics from /proc/vmstat. Linux
xfs Exposes XFS runtime statistics. Linux (kernel 4.4+)
zfs Exposes ZFS performance statistics. Linux, Solaris

可以看到,每个指标都有他对应的操作系统,并不是每种系统都适用的。如果不想收集某个类型的指标,就使用--no-collector.<name>参数,比如

./node_exporter --no-collector.time

2.2. 默认不启用的参数

默认不启用的参数需要通过--collector.<name>参数来启用,官方提供的不启用的参数如下

Name Description OS
buddyinfo Exposes statistics of memory fragments as reported by /proc/buddyinfo. Linux
devstat Exposes device statistics Dragonfly, FreeBSD
drbd Exposes Distributed Replicated Block Device statistics (to version 8.4) Linux
interrupts Exposes detailed interrupts statistics. Linux, OpenBSD
ksmd Exposes kernel and system statistics from /sys/kernel/mm/ksm. Linux
logind Exposes session counts from logind. Linux
meminfo_numa Exposes memory statistics from /proc/meminfo_numa. Linux
mountstats Exposes filesystem statistics from /proc/self/mountstats. Exposes detailed NFS client statistics. Linux
ntp Exposes local NTP daemon health to check time any
processes Exposes aggregate process statistics from /proc. Linux
qdisc Exposes queuing discipline statistics Linux
runit Exposes service status from runit. any
supervisord Exposes service status from supervisord. any
systemd Exposes service and system status from systemd. Linux
tcpstat Exposes TCP connection status information from /proc/net/tcp and /proc/net/tcp6. (Warning: the current version has potential performance issues in high load situations.) Linux
wifi Exposes WiFi device and station statistics. Linux
perf Exposes perf based metrics (Warning: Metrics are dependent on kernel configuration and settings). Linux

如果大家使用的是Centos系,或者ubuntu系的,我们建议大家打开ntp,mountstats,systemd,ntp,tcpstat这几个选项

2.3. 文本收集器

使用--collector.textfile.directory选项可以把文本中的内容也用metrics的方式收集起来,但是用的不多,大家了解就好

2.4. 在prometheus中过滤collector

我们还可以在prometheus中过滤要采集的选项,在exporter的配置文件中使用下面的配置

  params:
    collect[]:
      - foo
      - bar

这个功能可以减少prometheus的压力,但是并不会减少node_exporter采集到的数据。有点鸡肋。。。既然不要这个,就不要采集了嘛。。。

2.5. TLS的配置

我们默认暴露的都是http的请求的,也没有验证,也就是明文传输,很容易存在安全隐患,如果需要对数据加密,我们就要使用参数./node_exporter --web.config=web-config.yml。这个文件中含有TLS或者basic auth等选项的配置。我们先来配置TLS

  • 生成证书

    # mkdir -p prometheus-tls
    # cd prometheus-tls
    # prometheus-tls openssl req -new -newkey rsa:2048 -days 365 -nodes -x509 -keyout node_exporter.key -out node_exporter.crt -subj "/C=CN/ST=Beijing/L=Beijing/O=jormun.com/CN=localhost"
    Generating a RSA private key
    ...........................................+++++
    ..+++++
    writing new private key to 'node_exporter.key'
    -----
    # prometheus-tls ls
    node_exporter.crt  node_exporter.key
    
  • 配置web-config.yml文件

    tls_server_config:
      cert_file: node_exporter.crt
      key_file: node_exporter.key
    

2.6. 配置http的basic认证

同样是修改--web.config=web-config.yml中的web-config.yml文件中的参数。我们需要使用一些工具来生成密码。比如,我们的用户名为prometheus,密码为node_exporter。

  • 注意:这个功能是在node_exporter 1.0.0之后才有的功能,老版本想要认证,就需要借助反向代理,比如nginx上的认证

我们先配置node_exporter,这里面的密码是使用hash来加密的,我们常用的工具是htpasswd

htpasswd -nBC 12 '' | tr -d ':\n'
New password: 
Re-type new password: 
$2y$12$YND5ZlkC/pbUh4pvKnE1wehfpnha2DRdR.GrmRSQAcLff1.gOmh0.

使用不交互的方式,生成文件

htpasswd -bcBC 12 /application/prometheus/conf/htpasswd prometheus node_exporter

在原有文件中添加一个用户

htpasswd -bBC 12 /application/prometheus/conf/htpasswd prometheus node_exporter

不更新密码文件,只在屏幕上输出用户名和经过加密后的密码

htpasswd -nbBC 12 prometheus node_exporter

修改web-config.yml的配置如下

basic_auth_users:
  prometheus: $2y$12$c97Lg30Imrl6hXZJ8h9IC.KQ0YhL8rQtHCuKlfrgSYv6RmmUbBlXO

2.6.1. 配置prometheus使用scrapy

修改prometheus的配置如下

global:
  scrape_interval:     15s 
  evaluation_interval: 15s 

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scheme: http
    basic_auth:
      username: prometheus
      password: node_exporter
    static_configs:
    - targets: ['localhost:9100']

2.6.2. 在consul中注册node_exporter

如果要把node_exporter注册在consul中,从而给prometheus作为数据源的话,我们在前面介绍过了。在添加认证之后的node_exporter则需要在注册服务的同时,在check选项中,加入header字段。但是,header字段是通过username:password的方式同时传递给header的,因此,我们需要使用这个方式,把prometheus:node_exporter同时做base64处理。

perl -e 'use MIME::Base64; print encode_base64("prometheus:node_exporter")'
cHJvbWV0aGV1czpub2RlX2V4cG9ydGVy

然后在注册的时候,json文件如下

{
  "ID": "node-exporter-10-114-2-73",
  "Name": "node-exporter-10-114-2-73",
  "Tags": [
    "node-exporter",
    "nginx",
    "keepalived"
  ],
  "Address": "10.114.2.73",
  "Port": 9100,
  "Meta": {
    "team": "cloud",
    "project": "monitoring"
  },
  "EnableTagOverride": false,
  "Check": {
    "HTTP": "http://10.114.2.73:9100/metrics",
    "header": {"Authorization":["Basic cHJvbWV0aGV1czpub2RlX2V4cG9ydGVy"]},
    "Interval": "10s"
  },
  "Weights": {
    "Passing": 10,
    "Warning": 1
  }
}

2.6. systemd例子

为了便于管理,我们使用systemd来管理这个服务

cat > /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/bin/node_exporter \
          --web.config=web-config.yml \
          --collector.ntp \
          --collector.mountstats \
          --collector.systemd \
          --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target
EOF