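I. Prometheus Overview
Prometheus is an open-source combination of monitoring, alerting, and a time-series database. It was originally developed at SoundCloud and is widely used for monitoring and performance analysis of cloud-native environments and containerized applications. It provides a common data model together with convenient interfaces for collecting, storing, and querying data. Its core component, the Prometheus server, periodically pulls data from targets that are either configured statically or discovered automatically through service discovery. Newly scraped samples are first buffered in memory and then persisted to the storage device.
The basic principle of Prometheus is to periodically scrape the state of monitored components over HTTP. The advantage is that any component can join the monitoring system as long as it exposes an HTTP endpoint, with no SDK or other integration work required. This makes it well suited to virtualized environments such as VMs or Docker.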
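Here is what the pull model looks like in practice. What Prometheus fetches from a target is plain text in the Prometheus exposition format: one sample per line, with optional # HELP and # TYPE comment lines. The `emit_metric` helper below is hypothetical and prints one metric in that format. Once node_exporter is running, you can compare its output with the real thing from `curl -s http://192.168.213.131:9100/metrics`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Prometheus): print one unlabelled metric
# in the Prometheus text exposition format.
# Usage: emit_metric <name> <type> <help text> <value>
emit_metric() {
  local name="$1" type="$2" help="$3" value="$4"
  printf '# HELP %s %s\n' "$name" "$help"
  printf '# TYPE %s %s\n' "$name" "$type"
  printf '%s %s\n' "$name" "$value"
}
```

For example, `emit_metric demo_up gauge "Whether the demo service is up." 1` prints the two comment lines followed by `demo_up 1`. Serving such text over HTTP is all an exporter fundamentally does.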
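Prometheus is one of the few monitoring systems that fit Docker, Mesos, and Kubernetes environments well.
Each monitored host exposes its monitoring data through a dedicated exporter program. The exporter collects data on the target and serves it on an HTTP endpoint for the Prometheus server to query. Prometheus collects this data periodically by pulling it over HTTP.
If alerting rules are configured, Prometheus evaluates them against the scraped data. When a rule's condition is met, it generates an alert and sends it to Alertmanager, which groups and dispatches the alerts.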
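A classic alerting rule fires when a target has been unreachable (`up == 0`) for one minute. Prometheus generates the `up` series automatically for every scrape target. The helper name `write_instance_down_rule`, the group name, and the severity label below are hypothetical. To take effect, the file would also have to be listed under `rule_files` in prometheus.yml, and `alerting.alertmanagers` would need to point at a running Alertmanager (for example `alertmanager:9093`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal alerting rules file.
# The InstanceDown rule fires when a target has been down (up == 0) for 1 minute.
# Usage: write_instance_down_rule /opt/prometheus-3.0.1.linux-amd64/first_rules.yml
write_instance_down_rule() {
  local path="$1"
  # Quoted 'EOF' keeps $labels literal so Prometheus, not the shell, expands it.
  cat > "$path" <<'EOF'
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
EOF
}
```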
When a monitored target needs to push its data, the Pushgateway component can receive and temporarily store that data until the Prometheus server scrapes it.
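For the push case, the Pushgateway accepts metrics in the same text format via an HTTP request to /metrics/job/<job_name>. Below is a minimal sketch, assuming a Pushgateway listening on its default port 9091. `pushgateway_url` and `push_metric` are hypothetical helpers, and the job name is up to you.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for pushing one metric to a Pushgateway.
# Assumption: the Pushgateway listens on its default port 9091.
pushgateway_url() {
  local host="$1" job="$2"
  printf 'http://%s:9091/metrics/job/%s\n' "$host" "$job"
}

# Usage: push_metric <host> <job> <metric_name> <value>
push_metric() {
  local url
  url="$(pushgateway_url "$1" "$2")"
  # The request body is plain exposition-format text, read by curl from stdin.
  printf '%s %s\n' "$3" "$4" | curl -fsS --data-binary @- "$url"
}
```

Prometheus then scrapes the Pushgateway like any other target, so the Pushgateway itself must be added to scrape_configs.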
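A target must be registered with the monitoring system before its time-series data can be collected, stored, alerted on, and displayed. Targets can be specified statically in the configuration, or managed dynamically by Prometheus through service discovery.
Prometheus can use the Kubernetes API server directly as a service discovery system. It can then dynamically discover and monitor every monitorable object in the cluster.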
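II. Prometheus Architecture
The Prometheus ecosystem consists of several key components: the Prometheus server, Pushgateway, Alertmanager, the web UI, and others.
Prometheus server: scrapes, stores, aggregates, and queries the data.
Alertmanager: handles alerting.
Pushgateway: receives metrics pushed by clients; the Prometheus server then scrapes them at its regular interval.
*_exporter: collects metrics from physical machines, middleware, and so on, and exposes them for scraping.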
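The default listening ports of the components used in this article come up again when opening firewall ports or reading lsof output. For quick reference, here they are as a small shell lookup. The function name `default_port` is only for illustration. The port numbers are the upstream defaults, and each can be changed with command-line flags.

```shell
#!/usr/bin/env bash
# Hypothetical lookup of upstream default ports for the components in this article.
# Usage: default_port <component>
default_port() {
  case "$1" in
    prometheus)      echo 9090 ;;
    pushgateway)     echo 9091 ;;
    alertmanager)    echo 9093 ;;
    node_exporter)   echo 9100 ;;
    mysqld_exporter) echo 9104 ;;
    grafana)         echo 3000 ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
}
```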
III. Installing Prometheus
Host | IP address |
---|---|
Prometheus | 192.168.213.142 |
Grafana | 192.168.213.142 |
Client | 192.168.213.131 |
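① Environment preparation
Operating system of the Prometheus server: CentOS 7 (64-bit)
Grafana and Prometheus are deployed on the same machine; for the Client I simply picked an existing Kubernetes host.
Preparation before deployment:
1. Stop and disable the firewall on all machines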
systemctl stop firewalld.service && systemctl disable firewalld.service
2. Switch the yum repositories to the Aliyun mirror
curl -o /etc/yum.repos.d/CentOS-Base.repo https://mirrors.aliyun.com/repo/Centos-7.repo
sed -i -e '/mirrors.cloud.aliyuncs.com/d' -e '/mirrors.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
yum clean all && yum makecache
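3. Make sure the time is correct on all machines. Check it with the date command; if it is wrong, correct it, for example by syncing with a network time server using ntpdate: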
yum install -y ntp
ntpdate ntp.aliyun.com
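Disabling firewalld as in step 1 is the quickest option for a test setup. If you would rather keep it running, a sketch like the following opens only the ports this article uses: 9090 and 3000 on the Prometheus/Grafana host, and 9100 and 9104 on the Client. The `open_ports` helper and the `FIREWALL_CMD` override are hypothetical conveniences; the firewall-cmd options themselves are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open specific TCP ports in firewalld instead of disabling it.
# FIREWALL_CMD can be overridden (e.g. FIREWALL_CMD=echo) for a dry run.
FIREWALL_CMD="${FIREWALL_CMD:-firewall-cmd}"

# Usage: open_ports 9090 3000   (Prometheus/Grafana host)
#        open_ports 9100 9104   (Client)
open_ports() {
  local port
  for port in "$@"; do
    "$FIREWALL_CMD" --permanent --add-port="${port}/tcp"
  done
  # Permanent rules only take effect after a reload.
  "$FIREWALL_CMD" --reload
}
```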
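② Download Prometheus from the official site:
https://prometheus.io/download/
Download the latest package (at the time of writing): prometheus-3.0.1.linux-amd64.tar.gz
Upload it to the server and extract it; I keep it under /opt: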
cd /opt
tar -zxvf prometheus-3.0.1.linux-amd64.tar.gz
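Before extracting, it is worth checking the tarball against the sha256sums.txt file published alongside each release on GitHub. Confirm that the file exists on the release page for your version. The sketch below is meant to be run from the download directory; `verify_tarball` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify a tarball against a sha256sums.txt-style file.
# Run it in the directory that contains both files.
# Usage: verify_tarball prometheus-3.0.1.linux-amd64.tar.gz sha256sums.txt
verify_tarball() {
  local file="$1" sums="$2" line
  # sha256sums.txt lists every platform; keep only the exact line for this file.
  line="$(awk -v f="$file" '$2 == f' "$sums")"
  if [ -z "$line" ]; then
    echo "no checksum for $file in $sums" >&2
    return 1
  fi
  # sha256sum -c exits non-zero on a mismatch.
  printf '%s\n' "$line" | sha256sum -c -
}
```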
③ Start it directly:
nohup /opt/prometheus-3.0.1.linux-amd64/prometheus --config.file="/opt/prometheus-3.0.1.linux-amd64/prometheus.yml" &
④ Open the web UI, which listens on port 9090 by default: http://192.168.213.142:9090
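Because Prometheus is started in the background with nohup, the command returns before the server is ready. Prometheus exposes /-/healthy and /-/ready endpoints, so a small retry loop can wait for readiness. `wait_until` is a hypothetical helper, not part of Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until <attempts> <sleep_seconds> <command...>
#   e.g. wait_until 10 1 curl -fsS http://192.168.213.142:9090/-/ready
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```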
⑤ Start Prometheus at boot:
[root@localhost ~]# vim /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
touch /var/lock/subsys/local
nohup /opt/prometheus-3.0.1.linux-amd64/prometheus --config.file="/opt/prometheus-3.0.1.linux-amd64/prometheus.yml" &
[root@localhost ~]# chmod +x /etc/rc.local
[root@localhost ~]#
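The rc.local header itself recommends systemd services over this file. As a sketch of that alternative, the helper below writes a basic unit file. Note the assumptions:

- `write_unit` is a hypothetical name.
- The unit runs the binary as root, just like the nohup commands in this article.
- WorkingDirectory is set to the binary's directory, because Prometheus stores its data in data/ relative to the working directory by default. Adjust it if your existing data lives elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a basic systemd unit for a long-running binary.
# Usage: write_unit <name> <exec_start> [unit_dir]
#   write_unit prometheus "/opt/prometheus-3.0.1.linux-amd64/prometheus --config.file=/opt/prometheus-3.0.1.linux-amd64/prometheus.yml"
write_unit() {
  local name="$1" exec_start="$2" dir="${3:-/etc/systemd/system}" workdir
  # Use the directory of the binary (first word of exec_start) as working directory.
  workdir="$(dirname "${exec_start%% *}")"
  cat > "${dir}/${name}.service" <<EOF
[Unit]
Description=${name}
Wants=network-online.target
After=network-online.target

[Service]
WorkingDirectory=${workdir}
ExecStart=${exec_start}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

After writing the unit:

1. Run `systemctl daemon-reload && systemctl enable --now prometheus`.
2. Remove the matching nohup line from /etc/rc.local so the process is not started twice.

The same helper works for node_exporter and mysqld_exporter.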
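IV. Adding the Client to Prometheus Monitoring
① Install node_exporter on the machine to be monitored, choosing the package that matches its operating system.
Download page: https://prometheus.io/download/
My Client runs CentOS 7, so I downloaded node_exporter-1.8.2.linux-amd64.tar.gz, uploaded it to /root on the Client, and extracted it: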
tar -zxvf node_exporter-1.8.2.linux-amd64.tar.gz
② Start it directly:
nohup /root/node_exporter-1.8.2.linux-amd64/node_exporter &
③ Check that port 9100 is listening (lsof shows the port by its /etc/services name, so 9100 appears as jetdirect):
[root@k8s-master ~]# lsof -i:9100
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
node_expo 5028 root 3u IPv6 52025 0t0 TCP *:jetdirect (LISTEN)
node_expo 5028 root 7u IPv6 52152 0t0 TCP k8s-master:jetdirect->192.168.213.142:54952 (ESTABLISHED)
# If you see "lsof: command not found", install it with: yum install -y lsof
You can view the collected metrics in a browser:
http://192.168.213.131:9100/metrics
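To check a single value from the command line instead of the browser, the scraped text can be filtered with awk. This sketch uses `metric_value`, a hypothetical helper. It only handles metrics without labels, such as node_load1. Labelled series such as node_cpu_seconds_total need a proper parser or a PromQL query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read exposition-format text on stdin and print the value
# of an unlabelled metric; exits 1 if the metric is not present.
# Usage: curl -s http://192.168.213.131:9100/metrics | metric_value node_load1
metric_value() {
  # "# HELP"/"# TYPE" lines have "#" as their first field, so they never match.
  awk -v m="$1" '$1 == m { print $2; found = 1 } END { exit !found }'
}
```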
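V. Adding the Client to the Prometheus Configuration
① Add a scrape job to the YAML file
vim /opt/prometheus-3.0.1.linux-amd64/prometheus.yml
The last three lines are my additions; everything else is the default configuration.
Note: the new lines must be indented consistently with the existing entries; otherwise YAML parsing fails and Prometheus will not start.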
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "k8s-master-131" # a job name that identifies the monitored machine
    static_configs:
      - targets: ["192.168.213.131:9100"] # IP of the monitored machine; node_exporter listens on 9100
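Indentation mistakes are the most common reason this file fails to parse. A small helper can print a job stanza with the indentation used above. Some caveats:

- `render_job` is hypothetical.
- Appending its output with >> only works because scrape_configs is the last section of the default file.
- Afterwards, `promtool check config prometheus.yml` (promtool ships in the same tarball) can confirm that the file parses before you restart Prometheus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a static scrape job indented to sit under scrape_configs:.
# Usage: render_job k8s-master-131 192.168.213.131:9100 >> prometheus.yml
render_job() {
  local name="$1" target="$2"
  printf '  - job_name: "%s"\n' "$name"
  printf '    static_configs:\n'
  printf '      - targets: ["%s"]\n' "$target"
}
```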
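② Stop the running Prometheus process
pkill prometheus
③ Confirm that it has stopped (no output means nothing is listening on port 9090 any more)
lsof -i:9090
④ Start Prometheus again
nohup /opt/prometheus-3.0.1.linux-amd64/prometheus --config.file="/opt/prometheus-3.0.1.linux-amd64/prometheus.yml" &
⑤ Check that it is running (lsof shows port 9090 by its /etc/services name, websm):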
[root@localhost opt]# lsof -i:9090
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
prometheu 16209 root 6u IPv6 40997 0t0 TCP *:websm (LISTEN)
prometheu 16209 root 11u IPv6 40112 0t0 TCP localhost:42456->localhost:websm (ESTABLISHED)
prometheu 16209 root 12u IPv6 40113 0t0 TCP localhost:websm->localhost:42456 (ESTABLISHED)
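Instead of killing and restarting the process, you can ask Prometheus to reload its configuration by sending it SIGHUP. Running promtool first validates the file, so a typo does not take the server down. The sketch below relies on those two mechanisms; `reload_prometheus` and the `PROM_DIR` variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate prometheus.yml, then hot-reload it with SIGHUP.
# PROM_DIR defaults to the install directory used in this article.
PROM_DIR="${PROM_DIR:-/opt/prometheus-3.0.1.linux-amd64}"

# Usage: reload_prometheus "$(pgrep -x prometheus)"
reload_prometheus() {
  local pid="$1"
  # promtool ships in the same tarball as the prometheus binary.
  if ! "$PROM_DIR/promtool" check config "$PROM_DIR/prometheus.yml"; then
    echo "prometheus.yml is invalid; not reloading" >&2
    return 1
  fi
  # Prometheus re-reads its configuration file when it receives SIGHUP.
  kill -HUP "$pid"
}
```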
⑥ The Client now shows up on the Targets page of the Prometheus web UI:
⑦ Start node_exporter at boot (on the Client)
[root@k8s-master ~]# vim /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
touch /var/lock/subsys/local
nohup /root/node_exporter-1.8.2.linux-amd64/node_exporter &
[root@k8s-master ~]# chmod +x /etc/rc.local
[root@k8s-master ~]#
VI. Monitoring MySQL
① To install MySQL on the test Client, you can follow this guide:
https://blog.csdn.net/m0_72532428/article/details/140465141
② Install mysqld_exporter
Download page: https://prometheus.io/download/
[root@k8s-master ~]# tar -zxvf mysqld_exporter-0.16.0.linux-amd64.tar.gz
mysqld_exporter-0.16.0.linux-amd64/
mysqld_exporter-0.16.0.linux-amd64/NOTICE
mysqld_exporter-0.16.0.linux-amd64/LICENSE
mysqld_exporter-0.16.0.linux-amd64/mysqld_exporter
③ Create a dedicated user for the exporter in MySQL
[root@k8s-master ~]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 8.0.40 MySQL Community Server - GPL
Copyright (c) 2000, 2024, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SET GLOBAL validate_password.policy = LOW; ## lower the password policy
Query OK, 0 rows affected (0.04 sec)
mysql> SET GLOBAL validate_password.length = 6; ## require at least 6 characters
Query OK, 0 rows affected (0.04 sec)
mysql> create user 'mysql_monitor'@'localhost' identified by '123456'; ## create the user
Query OK, 0 rows affected (0.04 sec)
mysql> FLUSH PRIVILEGES; ## reload the privilege tables
Query OK, 0 rows affected (0.04 sec)
mysql> quit
Bye
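Note: the user is created for host localhost because the Prometheus server does not query MySQL directly. Prometheus scrapes mysqld_exporter, and mysqld_exporter queries MySQL, so localhost here refers to the host where mysqld_exporter runs (the same machine as MySQL).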
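The session above creates the user but does not grant it any privileges, so some mysqld_exporter collectors will fail. The mysqld_exporter README recommends two things:

- Grant PROCESS, REPLICATION CLIENT and SELECT.
- Limit the user's connections.

The hypothetical helper below prints that SQL so it can be piped into the mysql client. It assumes the password contains no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print SQL creating a mysqld_exporter user with the
# privileges listed in the mysqld_exporter README.
# Usage: exporter_user_sql mysql_monitor 'your-password' | mysql -uroot -p
exporter_user_sql() {
  local user="$1" password="$2"
  # Heredocs are not subject to globbing, so *.* is emitted literally.
  cat <<EOF
CREATE USER IF NOT EXISTS '${user}'@'localhost' IDENTIFIED BY '${password}' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO '${user}'@'localhost';
EOF
}
```

Because the user from the session above already exists, IF NOT EXISTS leaves it unchanged and only the GRANT takes effect.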
④ Back in the mysqld_exporter directory, configure the credentials of that user
vim /root/mysqld_exporter-0.16.0.linux-amd64/.my.cnf
Add the following:
[client]
user=mysql_monitor
password=123456
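Because .my.cnf holds the password in plain text, it is sensible to make it readable only by its owner. Below is a sketch of writing the same [client] section with mode 600. `write_my_cnf` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the [client] credentials file read by
# mysqld_exporter --config.my-cnf, readable only by its owner.
# Usage: write_my_cnf /root/mysqld_exporter-0.16.0.linux-amd64/.my.cnf mysql_monitor 'your-password'
write_my_cnf() {
  local path="$1" user="$2" password="$3"
  # umask 077 makes a newly created file 600 before the secret is written.
  ( umask 077 && printf '[client]\nuser=%s\npassword=%s\n' "$user" "$password" > "$path" )
  # Also tighten permissions if the file already existed.
  chmod 600 "$path"
}
```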
⑤ Start mysqld_exporter and check that it is running (lsof shows port 9104 by its /etc/services name, peerwire):
[root@k8s-master mysqld_exporter-0.16.0.linux-amd64]# nohup /root/mysqld_exporter-0.16.0.linux-amd64/mysqld_exporter --config.my-cnf="/root/mysqld_exporter-0.16.0.linux-amd64/.my.cnf" &
[2] 10574
[root@k8s-master mysqld_exporter-0.16.0.linux-amd64]# nohup: ignoring input and appending output to ‘nohup.out’
[root@k8s-master mysqld_exporter-0.16.0.linux-amd64]# lsof -i:9104
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld_ex 10574 root 3u IPv6 3993984 0t0 TCP *:peerwire (LISTEN)
⑥ Add a mysqld_exporter job to the Prometheus configuration (the last three lines are new):
[root@localhost opt]# vim /opt/prometheus-3.0.1.linux-amd64/prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "k8s-master-131"
    static_configs:
      - targets: ["192.168.213.131:9100"]
  - job_name: "k8s-master-131-mysql"
    static_configs:
      - targets: ["192.168.213.131:9104"]
⑦ Restart Prometheus:
[root@localhost opt]# pkill prometheus
[root@localhost opt]# nohup /opt/prometheus-3.0.1.linux-amd64/prometheus --config.file="/opt/prometheus-3.0.1.linux-amd64/prometheus.yml" &
[2] 21747
[1] Done nohup /opt/prometheus-3.0.1.linux-amd64/prometheus --config.file="/opt/prometheus-3.0.1.linux-amd64/prometheus.yml"
[root@localhost opt]# nohup: ignoring input and appending output to ‘nohup.out’
[root@localhost opt]# lsof -i:9090
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
prometheu 21747 root 6u IPv6 64326 0t0 TCP *:websm (LISTEN)
[root@localhost opt]#
⑧ Refresh the Targets page and the MySQL target appears:
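Target health can also be checked from the shell through the HTTP API at /api/v1/targets, where each active target carries a health field of up, down or unknown. The sketch below counts those values with grep. `count_target_health` is hypothetical and assumes the compact JSON that Prometheus returns. When jq is installed, it is the more robust choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count scrape targets by health from /api/v1/targets JSON on stdin.
# Crude grep-based parsing; assumes compact JSON without spaces around ':'.
# Usage: curl -s http://192.168.213.142:9090/api/v1/targets | count_target_health
count_target_health() {
  # Extract each "health":"<value>", keep <value>, then count occurrences.
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c
}
```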
⑨ Start mysqld_exporter at boot (on the Client)
[root@k8s-master ~]# vim /etc/rc.local
#!/bin/bash
# THIS FILE IS ADDED FOR COMPATIBILITY PURPOSES
#
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
#
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
#
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
touch /var/lock/subsys/local
nohup /root/mysqld_exporter-0.16.0.linux-amd64/mysqld_exporter --config.my-cnf="/root/mysqld_exporter-0.16.0.linux-amd64/.my.cnf" &
[root@k8s-master ~]# chmod +x /etc/rc.local
[root@k8s-master ~]#
VII. Deploying the Grafana Visualization Tool
① Download page: https://grafana.com/grafana/download
② Download and install it with the following command:
sudo yum install -y https://dl.grafana.com/oss/release/grafana-11.3.2-1.x86_64.rpm
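After installation, start Grafana with systemctl start grafana-server (also run systemctl enable grafana-server if it should start at boot).
Open Grafana in a browser at http://192.168.213.142:3000 (the default username and password are both admin).
Skip the password change. From the home page, add a Prometheus data source (Connections > Data sources > Add data source > Prometheus) whose server URL is the Prometheus server, here http://192.168.213.142:9090.
Click Save & test at the bottom of the page; a success message means the data source is configured. Then go back.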
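If you prefer not to click through the UI, Grafana can also load data sources at startup from YAML files in its provisioning directory (/etc/grafana/provisioning/datasources for the RPM install). Below is a sketch that writes such a file. `write_grafana_datasource` and the file name prometheus.yaml are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Grafana data source provisioning file for Prometheus.
# Usage: write_grafana_datasource http://192.168.213.142:9090
#        systemctl restart grafana-server
write_grafana_datasource() {
  local url="$1" dir="${2:-/etc/grafana/provisioning/datasources}"
  cat > "${dir}/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}
```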
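③ Import a dashboard: simply import the dashboard with ID 8919.
Alternatively, download a dashboard from https://grafana.com/grafana/dashboards/ and import it.
④ The monitoring dashboards now display normally.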