Prometheus configuration

大番茄, December 6, 2019

Official configuration documentation:
https://prometheus.io/docs/prometheus/latest/configuration/configuration


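Reloading the configuration:

  1. kill -1 PID # Send SIGHUP to reload the configuration.
  2. Start Prometheus with the --web.enable-lifecycle flag, then send a request to the /-/reload endpoint.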

https://prometheus.io/docs/prometheus/latest/management_api/

# Use PUT or POST on the reload endpoint to reload the configuration.
[root@k8s-op prometheus]# curl -X PUT http://127.0.0.1:9090/-/reload -i
HTTP/1.1 200 OK
Date: Fri, 06 Dec 2019 23:22:59 GMT
Content-Length: 0

# Use GET on the healthy endpoint to check the health status.
[root@k8s-op prometheus]# curl -X GET http://127.0.0.1:9090/-/healthy -i
HTTP/1.1 200 OK
Date: Fri, 06 Dec 2019 23:23:18 GMT
Content-Length: 23
Content-Type: text/plain; charset=utf-8

Prometheus is Healthy.

# Use PUT or POST on the quit endpoint to shut Prometheus down gracefully. If it is still running afterwards, systemd has probably restarted it.
[root@k8s-op prometheus]# curl -X PUT http://127.0.0.1:9090/-/quit -i
HTTP/1.1 200 OK
Date: Fri, 06 Dec 2019 23:25:23 GMT
Content-Length: 34
Content-Type: text/plain; charset=utf-8

Requesting termination... Goodbye!

Both reload and quit require the --web.enable-lifecycle flag.
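The same lifecycle calls can be scripted. The sketch below only builds the request object and leaves sending it to the caller. The helper name `build_lifecycle_request` and the default base URL are assumptions made for illustration, and the server is assumed to have been started with --web.enable-lifecycle.

```python
import urllib.request


def build_lifecycle_request(action, base_url="http://127.0.0.1:9090"):
    """Build a POST request for the /-/reload or /-/quit endpoint.

    Hypothetical helper: the name and the default base URL are illustrative.
    """
    if action not in ("reload", "quit"):
        raise ValueError("action must be 'reload' or 'quit'")
    url = base_url.rstrip("/") + "/-/" + action
    return urllib.request.Request(url, method="POST")


# To actually send it (needs a running Prometheus):
#   with urllib.request.urlopen(build_lifecycle_request("reload"), timeout=5) as resp:
#       print(resp.status)
```

Keeping request construction separate from sending makes the helper easy to check without a live server.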


Placeholder descriptions used in the official configuration reference:

Generic placeholders are defined as follows:  
  
* `<boolean>`: a boolean that can take the values `true` or `false`  
* `<duration>`: a duration matching the regular expression `[0-9]+(ms|[smhdwy])`  
* `<labelname>`: a string matching the regular expression `[a-zA-Z_][a-zA-Z0-9_]*`  
* `<labelvalue>`: a string of unicode characters  
* `<filename>`: a valid path in the current working directory  
* `<host>`: a valid string consisting of a hostname or IP followed by an optional port number  
* `<path>`: a valid URL path  
* `<scheme>`: a string that can take the values `http` or `https`  
* `<string>`: a regular string  
* `<secret>`: a regular string that is a secret, such as a password  
* `<tmpl_string>`: a string which is template-expanded before usage  
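As a rough illustration of the `<duration>` placeholder, the sketch below checks a string against the regular expression quoted above and converts it to milliseconds. The function name and the unit table are assumptions for this example (a week is taken as 7 days and a year as 365 days), and newer Prometheus versions may accept forms that this pattern rejects.

```python
import re

# Pattern quoted in the placeholder list above.
DURATION_RE = re.compile(r"[0-9]+(ms|[smhdwy])")

# Assumed unit sizes in milliseconds (week = 7 days, year = 365 days).
UNIT_MS = {
    "ms": 1,
    "s": 1000,
    "m": 60 * 1000,
    "h": 3600 * 1000,
    "d": 86400 * 1000,
    "w": 7 * 86400 * 1000,
    "y": 365 * 86400 * 1000,
}


def duration_to_ms(text):
    """Convert a duration such as '15s' or '100ms' to milliseconds."""
    match = DURATION_RE.fullmatch(text)
    if match is None:
        raise ValueError("not a valid duration: %r" % text)
    unit = match.group(1)
    number = int(text[: -len(unit)])
    return number * UNIT_MS[unit]
```

For instance, `duration_to_ms("1m")` gives the default scrape_interval in milliseconds, while a value such as `"1.5h"` is rejected.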

I. Global

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

  # File to which PromQL queries are logged.
  # Reloading the configuration will reopen the file.
  [ query_log_file: <string> ]

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

# Settings related to the remote write feature.
remote_write:
  [ - <remote_write> ... ]

# Settings related to the remote read feature.
remote_read:
  [ - <remote_read> ... ]

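The configuration has six top-level parts:

1. global: global settings

scrape_interval: how often to scrape targets. The value set here is the default for all jobs.
scrape_timeout: how long a scrape may take before it times out.
evaluation_interval: how often alerting and recording rules are evaluated, that is, how often alert conditions are checked against their thresholds.
external_labels: labels added to data sent to external systems. For example, alerts sent to Alertmanager carry these labels, so the notifications Alertmanager sends out carry them too.
query_log_file: logs every query, including the client IP, the HTTP method, and the query parameters: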

{"httpRequest":{"clientIP":"192.168.1.6","method":"GET","path":"/api/v1/query"},"params":{"end":"2021-07-25T02:35:33.590Z","query":"go_memstats_heap_inuse_bytes{job=\"prometheus\"}","start":"2021-07-25T02:35:33.590Z","step":0},"stats":{"timings":{"evalTotalTime":0.000082209,"resultSortTime":0,"queryPreparationTime":0.00004556,"innerEvalTime":0.000031775,"execQueueTime":0.000026771,"execTotalTime":0.000114635}},"ts":"2021-07-25T02:35:33.941Z"}
{"httpRequest":{"clientIP":"::1","method":"POST","path":"/api/v1/query"},"params":{"end":"2021-07-25T02:36:34.868Z","query":"go_memstats_heap_inuse_bytes","start":"2021-07-25T02:36:34.868Z","step":0},"stats":{"timings":{"evalTotalTime":0.000097273,"resultSortTime":0,"queryPreparationTime":0.000069433,"innerEvalTime":0.000023828,"execQueueTime":0.000006443,"execTotalTime":0.000109044}},"ts":"2021-07-25T02:36:34.869Z"}

The "reopen" mentioned in the documentation means exactly that: the file is reopened, and its existing contents are not deleted.
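Since each log entry is a single JSON object per line, it is easy to post-process. The sketch below pulls a few fields out of one line, using only the field names visible in the sample entries above. The function name and the keys of the returned dictionary are assumptions for illustration.

```python
import json


def summarize_query_log_line(line):
    """Extract a few fields from one query log line (one JSON object per line)."""
    entry = json.loads(line)
    return {
        "client_ip": entry["httpRequest"]["clientIP"],
        "method": entry["httpRequest"]["method"],
        "query": entry["params"]["query"],
        # Total execution time as logged, presumably in seconds.
        "exec_total_seconds": entry["stats"]["timings"]["execTotalTime"],
    }
```

Run over a whole file line by line, this could be used to spot the slowest queries.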

2. rule_files: the files that contain alerting (and recording) rules

https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/

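3. scrape_configs: defines the monitored targets and how they are scraped.

4. alerting

alert_relabel_configs: relabels alerts before they are sent to Alertmanager, for example to drop labels that are not useful.
alertmanagers: the addresses of the Alertmanager instances.

5. remote_write

Specifies a URL on a remote server to which samples are written. I have not tested this yet.

6. remote_read

Specifies a URL on a remote server from which samples are read. I have not tested this yet.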


II. Alerting

https://www.yxingxing.net/articles/2019/12/18/1576663983469.html

rule_files specifies the alerting rule files. The rule file format and the Alertmanager configuration are described at the link above.

rule_files:
   - "rules/test.yml"

alerting: specifies the Alertmanager addresses, and can also filter the data sent to Alertmanager.

alerting:
  alert_relabel_configs:
  - regex: ^(id|image|pod|job)$
    action: labeldrop
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093

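alert_relabel_configs is configured the same way as relabel_configs. Here it removes every label whose name matches the regex from the alerts sent to Alertmanager.
Alertmanager instances can also be found through service discovery, which works much like it does in scrape_configs.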


III. scrape_configs: defining targets and scraping their metrics

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config

scrape_configs:
  [ - <scrape_config> ... ]

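The - above marks a list item; each <scrape_config> block is one item of the list.
The official reference has a great many options, most of them for service discovery on various platforms. For service discovery this post only covers the file-based and Kubernetes mechanisms; all the other options are explained.

Start with a simple configuration: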

scrape_configs:
  - job_name: 'mariadb-01'
    static_configs:
    - targets: ['172.100.102.100:9104']
      labels:
        alias: mariadb-01

  - job_name: 'mariadb-02'
    static_configs:
    - targets: ['172.100.102.102:9104']
      labels:
        alias: mariadb-02

1. General settings

job_name:

Names this scrape configuration. Each such block is called a job.

# The job name assigned to scraped metrics by default.
job_name: <job_name>

For example:
image-fb71cee8
Each entry is a job, and the numbers show its targets (the monitored endpoints).
image-0b7ec16a


2. Settings for scraping metrics from targets

scrape_interval:

Same meaning as in global, applied to this job only.

scrape_timeout:

Same meaning as in global, applied to this job only.

metrics_path:

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]

The URL path used to fetch metrics from targets. The default is /metrics.

scrape_configs:
  - job_name: 'pod'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: '/metrics/cadvisor'
    static_configs:
    - targets: ['172.100.101.193:10255', '172.100.101.194:10255']

I have not tested the two honor_* options below, and my English is limited, so I am not sure my reading of them is correct.

honor_labels:

# honor_labels controls how Prometheus handles conflicts between labels that are
# already present in scraped data and labels that Prometheus would attach
# server-side ("job" and "instance" labels, manually configured target
# labels, and labels generated by service discovery implementations).
#
# If honor_labels is set to "true", label conflicts are resolved by keeping label
# values from the scraped data and ignoring the conflicting server-side labels.
#
# If honor_labels is set to "false", label conflicts are resolved by renaming
# conflicting labels in the scraped data to "exported_<original-label>" (for
# example "exported_instance", "exported_job") and then attaching server-side
# labels.
#
# Setting honor_labels to "true" is useful for use cases such as federation and
# scraping the Pushgateway, where all labels specified in the target should be
# preserved.
#
# Note that any globally configured "external_labels" are unaffected by this
# setting. In communication with external systems, they are always applied only
# when a time series does not have a given label yet and are ignored otherwise.
[ honor_labels: <boolean> | default = false ]

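Controls what happens when labels conflict, that is, when a label already present in the scraped metrics clashes with one that Prometheus attaches on the server side (manually configured labels and automatically added ones). Labels added through external_labels are not affected.
true: keep the label from the scraped metrics and ignore the server-side label.
false: rename the label from the scraped metrics to exported_<label_name>, then attach the server-side label.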
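The two outcomes can be modeled with a few lines of Python. This is a simplified sketch of the rule described in the quoted documentation, not Prometheus's actual implementation; the function name `merge_labels` and the sample label values in the usage note are made up for illustration.

```python
def merge_labels(scraped, server_side, honor_labels=False):
    """Merge labels from scraped data with server-side labels.

    Simplified model of the documented honor_labels behavior.
    """
    if honor_labels:
        # Scraped labels win; conflicting server-side labels are ignored.
        merged = dict(server_side)
        merged.update(scraped)
        return merged
    merged = {}
    for name, value in scraped.items():
        if name in server_side:
            # Conflicting scraped labels are renamed to exported_<name>.
            merged["exported_" + name] = value
        else:
            merged[name] = value
    merged.update(server_side)
    return merged
```

With a scraped label `job="app"` and a server-side label `job="pushgateway"`, the default setting yields both `job="pushgateway"` and `exported_job="app"`, while `honor_labels=True` yields only `job="app"`.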

honor_timestamps:

# honor_timestamps controls whether Prometheus respects the timestamps present
# in scraped data.
#
# If honor_timestamps is set to "true", the timestamps of the metrics exposed
# by the target will be used.
#
# If honor_timestamps is set to "false", the timestamps of the metrics exposed
# by the target will be ignored.
[ honor_timestamps: <boolean> | default = true ]

Whether to keep the timestamps carried in the scraped metrics: true keeps them, false ignores them.

scheme:

# Configures the protocol scheme used for requests.
[ scheme: <scheme> | default = http ]

The protocol used to reach targets. The default is http.

params:

# Optional HTTP URL parameters.
params:
  [ <string>: [<string>, ...] ]

Query parameters appended to the scrape URL. I have not tested this either, and it is rarely needed.

basic_auth:

# Sets the `Authorization` header on every scrape request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

Uses HTTP basic authentication when scraping. Use this if the target has HTTP basic authentication enabled.

bearer_token:

# Sets the `Authorization` header on every scrape request with
# the configured bearer token. It is mutually exclusive with `bearer_token_file`.
[ bearer_token: <secret> ]

Uses bearer token authentication when scraping. The token is given directly in the configuration.

bearer_token_file:

# Sets the `Authorization` header on every scrape request with the bearer token
# read from the configured file. It is mutually exclusive with `bearer_token`.
[ bearer_token_file: /path/to/bearer/token/file ]

Same as above, except that the token is read from a file. Only one of bearer_token and bearer_token_file may be set.

tls_config:

# Configures the scrape request's TLS settings.
tls_config:
  [ <tls_config> ]

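Configures TLS for scrape requests, including mutual (client certificate) authentication. This is where the HTTPS certificates are specified.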

# CA certificate to validate API server certificate with.
[ ca_file: <filename> ]

# Certificate and key files for client cert authentication to the server.
[ cert_file: <filename> ]
[ key_file: <filename> ]

# ServerName extension to indicate the name of the server.
# https://tools.ietf.org/html/rfc4366#section-3.1
[ server_name: <string> ]

# Disable validation of the server certificate.
[ insecure_skip_verify: <boolean> ]

For example:

  - job_name: 'k8s-apiserver'
    scheme: https
    tls_config:
      ca_file: /home/qfpay/prometheus/ssl/cacert.pem
      cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
      key_file: /home/qfpay/prometheus/ssl/prometheus.key

If there is no ca_file that can validate the target's certificate, set insecure_skip_verify to skip Prometheus's verification of the target.

proxy_url:

Sets a proxy through which targets are reached.

metric_relabel_configs:

Configured the same way as relabel_configs.
The metrics as currently scraped:
image-f0909a4f

Add metric_relabel_configs:

 relabel_configs:  
 - source_labels: [ __address__ ]  
   target_label: __address__  
   regex: (.*):10250  
   replacement: $1:10255  
   action: replace  
 metric_relabel_configs:  
 - regex: ^(id|image|pod|job)$  
   action: labeldrop  

image-baf36949

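After a reload the change may seem slow to take effect, with both the old and the new results showing up for a long time. Restarting Prometheus clears this up.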

sample_limit:

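Limits the number of samples (metric values) that one scrape of a target's metrics endpoint may return. The default is 0, which means no limit.
If a scrape returns more samples than the limit, the whole scrape fails. Lines starting with # are not counted.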
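To estimate how many samples a target exposes before choosing a limit, the counting rule described above can be sketched as follows. This is a rough approximation that only counts non-comment lines of the text exposition format; the function names are assumptions for illustration, and the real check happens inside Prometheus during the scrape.

```python
def count_samples(exposition_text):
    """Count sample lines, skipping blank lines and lines starting with '#'."""
    count = 0
    for line in exposition_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


def within_sample_limit(exposition_text, sample_limit=0):
    """Return True if the scrape would pass; a limit of 0 means no limit."""
    return sample_limit == 0 or count_samples(exposition_text) <= sample_limit
```

Feeding it the body returned by a target's /metrics endpoint gives a quick idea of whether a planned sample_limit would cause scrapes to fail.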

image-a21af315

image-ad036f42


3. Settings that act on targets: service discovery and relabel filtering

relabel_configs:

# List of target relabel configurations.
relabel_configs:
  [ - <relabel_config> ... ]

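Rewrites labels. Note that the labels rewritten or added here are target labels, not labels inside the scraped metrics. relabel_configs is typically used to filter targets during service discovery, and sometimes to add new labels, because a label attached to a target is added to every series scraped from that target. I have not tested what happens if such a label conflicts with a label already present in the metrics.
There is also metric_relabel_configs, which is dedicated to the labels inside the scraped metrics and operates on the ones you specify there.

The images below show the labels of a target.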

image-f85d4cb3

image-98c62091

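Each job can have several relabel_config entries; they are applied in the order in which they appear in the configuration file.

The boxed part shows the built-in labels available to relabel_configs.
__address__ is the host and port of the target. To change the address that is scraped, change __address__.
instance is usually the same as __address__, but with service discovery it is sometimes a host name.
__scheme__ is the scheme from the configuration.
__metrics_path__ is the metrics_path from the configuration.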

# The source labels select values from existing labels. Their content is concatenated
# using the configured separator and matched against the configured regular expression
# for the replace, keep, and drop actions.
[ source_labels: '[' <labelname> [, ...] ']' ]

# Separator placed between concatenated source label values.
[ separator: <string> | default = ; ]

# Label to which the resulting value is written in a replace action.
# It is mandatory for replace actions. Regex capture groups are available.
[ target_label: <labelname> ]

# Regular expression against which the extracted value is matched.
[ regex: <regex> | default = (.*) ]

# Modulus to take of the hash of the source label values.
[ modulus: <uint64> ]

# Replacement value against which a regex replace is performed if the
# regular expression matches. Regex capture groups are available.
[ replacement: <string> | default = $1 ]

# Action to perform based on regex matching.
[ action: <relabel_action> | default = replace ]

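First, the actions: different actions give different behavior.

action:

Actions can rewrite target labels, add a target label derived from the values of source labels, keep or drop targets based on a given label and its value, drop target labels, or keep only the specified target labels.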

  • replace: Match regex against the concatenated source_labels. Then, set target_label to replacement, with match group references (${1}, ${2}, ...) in replacement substituted by their value. If regex does not match, no replacement takes place.
  • keep: Drop targets for which regex does not match the concatenated source_labels.
  • drop: Drop targets for which regex matches the concatenated source_labels.
  • hashmod: Set target_label to the modulus of a hash of the concatenated source_labels.
  • labelmap: Match regex against all label names. Then copy the values of the matching labels to label names given by replacement with match group references (${1}, ${2}, ...) in replacement substituted by their value.
  • labeldrop: Match regex against all label names. Any label that matches will be removed from the set of labels.
  • labelkeep: Match regex against all label names. Any label that does not match will be removed from the set of labels.

replace:

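Rewrites a label, although "rewrite" is not quite the right word: it takes the result of matching the regex against the values of the specified source labels and writes it into target_label. If target_label is a new label, the source label is unchanged and a new label is created. If target_label is the source label itself, that label is overwritten.
If the regex does not match, nothing happens.

  • Matches against label values

keep:

Matches the regex against the values of the specified source labels and keeps only the targets that match.

  • Matches against label values

drop:

Matches the regex against the values of the specified source labels and drops only the targets that match.

  • Matches against label values

labeldrop:

Matches the regex against all label names and removes the labels that match.

  • Matches against label names

labelkeep:

Matches the regex against all label names and keeps only the labels that match.

  • Matches against label names

I have not looked into the remaining two actions (hashmod and labelmap).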


source_labels:

The source labels to select; more than one may be given. For example:

    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_name,__meta_kubernetes_pod_container_port_number]

separator:

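The separator placed between label values.
When several source labels are given, their values are concatenated. Without something between them the combined string is hard to match with a regex, hence the separator.

target_label:

The destination label. It is used by action: replace, where the result of the regex match is written into the label named by target_label (per the action list above, hashmod writes to it as well).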

regex:

The regular expression to match.

replacement:

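Used by action: replace (and, per the action list above, by labelmap). It gives the value written to target_label and may reference capture groups from regex.
For example, changing the port: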

    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:9101
      action: replace

Examples:

1. Changing the port
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:9101
      action: replace

image-ae221a19

The port used to reach the target has changed. The values in the black box are the ones from before relabeling.

2. Changing the address and adding a label
    relabel_configs:
    - source_labels: [__address__]
      target_label: __address__
      regex: .*
      replacement: 127.0.0.1:9090
      action: replace
    - source_labels: [__address__, __metrics_path__]
      target_label: test
      separator: '-'
      regex: (.*)
      replacement: $1
      action: replace

image-fde832bd
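The address was originally localhost and is now 127.0.0.1. The test label was added from the values of __address__ and __metrics_path__.
The new __address__ is also shown as 127.0.0.1.

The two relabel rules run in order from top to bottom. If the second rule were placed first, the value of test would be localhost:9090-/metrics.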
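The ordering effect can be reproduced with a small model of relabeling. The sketch below is a simplified illustration, not Prometheus's implementation: it covers only replace, keep, drop, labeldrop, and labelkeep, treats the regex as fully anchored, and expands `$1` / `${1}` references. The function names `expand` and `relabel`, and the use of plain dictionaries for rules, are assumptions made for this example.

```python
import re


def expand(replacement, match):
    """Substitute $1 / ${1} style references with the captured groups."""
    def group_value(ref):
        index = int(ref.group(1) or ref.group(2))
        return match.group(index) or ""
    return re.sub(r"\$(?:\{(\d+)\}|(\d+))", group_value, replacement)


def relabel(labels, rules):
    """Apply rules in order; return new labels, or None if the target is dropped."""
    labels = dict(labels)
    for rule in rules:
        action = rule.get("action", "replace")
        pattern = re.compile(rule.get("regex", "(.*)"))
        if action in ("labeldrop", "labelkeep"):
            # These two actions match label names, not values.
            keep_matching = action == "labelkeep"
            labels = {name: value for name, value in labels.items()
                      if bool(pattern.fullmatch(name)) == keep_matching}
            continue
        # Concatenate source label values; missing labels count as empty.
        value = rule.get("separator", ";").join(
            labels.get(name, "") for name in rule.get("source_labels", []))
        match = pattern.fullmatch(value)
        if action == "keep" and not match:
            return None
        if action == "drop" and match:
            return None
        if action == "replace" and match:
            labels[rule["target_label"]] = expand(rule.get("replacement", "$1"), match)
    return labels
```

Passing the two rules from example 2 in the order shown gives test the value 127.0.0.1:9090-/metrics, and swapping them gives localhost:9090-/metrics, because each rule sees the labels as left by the previous one.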

3. Keeping only the targets that match
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_name,__meta_kubernetes_pod_container_port_number]
      separator: ;
      regex: coredns;9153
      action: keep

Useful when service discovery returns many targets and only those matching a rule are wanted.

4. Dropping labels
    relabel_configs:
    - regex: ^.*(annotation|label|uid|job).*$
      action: labeldrop

image-170d53a8
The web page shows only target labels, not the discovered labels, so all that is visible here is that job is gone; the other removals cannot be seen.


static_configs:

Statically configured targets.

# List of labeled statically configured targets for this job.
static_configs:
  [ - <static_config> ... ]

The <static_config> block:

# The targets specified by the static config.
targets:
  [ - '<host>' ]

# Labels assigned to all metrics scraped from the targets.
labels:
  [ <labelname>: <labelvalue> ... ]

targets lists the targets, and labels adds labels to them.
Example:

  - job_name: 'mariadb-01'
    static_configs:
    - targets: ['172.100.102.100:9104']
      labels:
        alias: mariadb-01

  - job_name: 'mariadb-02'
    static_configs:
    - targets: ['172.100.102.102:9104']
      labels:
        alias: mariadb-02

image-a819bbf3

file_sd_configs:

Targets can be discovered from files. The files still have to list the targets explicitly; the benefit is simply that adding a target no longer requires reloading the configuration.

# List of file service discovery configurations.
file_sd_configs:
  [ - <file_sd_config> ... ]

The <file_sd_config> block:

# Patterns for files from which target groups are extracted.
files:
  [ - <filename_pattern> ... ]

# Refresh interval to re-read the files.
[ refresh_interval: <duration> | default = 5m ]

The files matched by <filename_pattern> may be YAML or JSON.
A JSON file has this format:

[
  {
    "targets": [ "<host>", ... ],
    "labels": {
      "<labelname>": "<labelvalue>", ...
    }
  },
  ...
]

A YAML file uses the same layout as static_configs.
Example:

  - job_name: 'test'
    file_sd_configs:
    - files:
      - discovery/*node.yml
      refresh_interval: 1m

files lists file paths; the file name may contain wildcards.

[root@k8s-op prometheus]# cat discovery/k8s-node.yml 
- targets:
  - '172.100.101.190:9101'
  - '172.100.101.192:9101'
  - '172.100.101.193:9101'
  - '172.100.101.194:9101'
  - '172.100.102.140:9101'
  - '172.100.102.141:9101'
  - '172.100.101.196:9101'
  - '172.100.101.197:9101'
  - '172.100.101.198:9101'
  labels: 
    role: k8s

image-b9d9f901

kubernetes_sd_configs:

Discovers targets in a Kubernetes environment.
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#kubernetes_sd_config

Some examples from the official repository:
https://github.com/prometheus/prometheus/blob/release-2.14/documentation/examples/prometheus-kubernetes.yml

# List of Kubernetes service discovery configurations.
kubernetes_sd_configs:
  [ - <kubernetes_sd_config> ... ]

The <kubernetes_sd_config> block:

# The information to access the Kubernetes API.

# The API server addresses. If left empty, Prometheus is assumed to run inside
# of the cluster and will discover API servers automatically and use the pod's
# CA certificate and bearer token file at /var/run/secrets/kubernetes.io/serviceaccount/.
[ api_server: <host> ]

# The Kubernetes role of entities that should be discovered.
role: <role>

# Optional authentication information used to authenticate to the API server.
# Note that `basic_auth`, `bearer_token` and `bearer_token_file` options are
# mutually exclusive.
# password and password_file are mutually exclusive.

# Optional HTTP basic authentication information.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Optional bearer token authentication information.
[ bearer_token: <secret> ]

# Optional bearer token file authentication information.
[ bearer_token_file: <filename> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# TLS configuration.
tls_config:
  [ <tls_config> ]

# Optional namespace discovery. If omitted, all namespaces are used.
namespaces:
  names:
    [ - <string> ]

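api_server: an address at which the Kubernetes API server can be reached.

role:

What to discover: node, pod, service, endpoints, or ingress.
Each role attaches a different set of labels (discovered labels) to the targets it finds. Because they all start with __, they are called meta labels. The labels for each role are listed below.


Translating the page gave odd results, so the descriptions are quoted in the original English.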

node

The node role discovers one target per cluster node with the address defaulting to the Kubelet's HTTP port. The target address defaults to the first existing address of the Kubernetes node object in the address type order of NodeInternalIP, NodeExternalIP, NodeLegacyHostIP, and NodeHostName.

Available meta labels:

  • __meta_kubernetes_node_name: The name of the node object.
  • __meta_kubernetes_node_label_<labelname>: Each label from the node object.
  • __meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
  • __meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
  • __meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
  • __meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.

In addition, the instance label for the node will be set to the node name as retrieved from the API server.


service

The service role discovers a target for each service port for each service. This is generally useful for blackbox monitoring of a service. The address will be set to the Kubernetes DNS name of the service and respective service port.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the service object.
  • __meta_kubernetes_service_annotation_<annotationname>: Each annotation from the service object.
  • __meta_kubernetes_service_annotationpresent_<annotationname>: "true" for each annotation of the service object.
  • __meta_kubernetes_service_cluster_ip: The cluster IP address of the service. (Does not apply to services of type ExternalName)
  • __meta_kubernetes_service_external_name: The DNS name of the service. (Applies to services of type ExternalName)
  • __meta_kubernetes_service_label_<labelname>: Each label from the service object.
  • __meta_kubernetes_service_labelpresent_<labelname>: true for each label of the service object.
  • __meta_kubernetes_service_name: The name of the service object.
  • __meta_kubernetes_service_port_name: Name of the service port for the target.
  • __meta_kubernetes_service_port_protocol: Protocol of the service port for the target.

pod

The pod role discovers all pods and exposes their containers as targets. For each declared port of a container, a single target is generated. If a container has no specified ports, a port-free target per container is created for manually adding a port via relabeling.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the pod object.
  • __meta_kubernetes_pod_name: The name of the pod object.
  • __meta_kubernetes_pod_ip: The pod IP of the pod object.
  • __meta_kubernetes_pod_label_<labelname>: Each label from the pod object.
  • __meta_kubernetes_pod_labelpresent_<labelname>: true for each label from the pod object.
  • __meta_kubernetes_pod_annotation_<annotationname>: Each annotation from the pod object.
  • __meta_kubernetes_pod_annotationpresent_<annotationname>: true for each annotation from the pod object.
  • __meta_kubernetes_pod_container_init: true if the container is an InitContainer
  • __meta_kubernetes_pod_container_name: Name of the container the target address points to.
  • __meta_kubernetes_pod_container_port_name: Name of the container port.
  • __meta_kubernetes_pod_container_port_number: Number of the container port.
  • __meta_kubernetes_pod_container_port_protocol: Protocol of the container port.
  • __meta_kubernetes_pod_ready: Set to true or false for the pod's ready state.
  • __meta_kubernetes_pod_phase: Set to Pending, Running, Succeeded, Failed or Unknown in the lifecycle.
  • __meta_kubernetes_pod_node_name: The name of the node the pod is scheduled onto.
  • __meta_kubernetes_pod_host_ip: The current host IP of the pod object.
  • __meta_kubernetes_pod_uid: The UID of the pod object.
  • __meta_kubernetes_pod_controller_kind: Object kind of the pod controller.
  • __meta_kubernetes_pod_controller_name: Name of the pod controller.

endpoints

The endpoints role discovers targets from listed endpoints of a service. For each endpoint address one target is discovered per port. If the endpoint is backed by a pod, all additional container ports of the pod, not bound to an endpoint port, are discovered as targets as well.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the endpoints object.
  • __meta_kubernetes_endpoints_name: The names of the endpoints object.
  • For all targets discovered directly from the endpoints list (those not additionally inferred from underlying pods), the following labels are attached:
    • __meta_kubernetes_endpoint_hostname: Hostname of the endpoint.
    • __meta_kubernetes_endpoint_node_name: Name of the node hosting the endpoint.
    • __meta_kubernetes_endpoint_ready: Set to true or false for the endpoint's ready state.
    • __meta_kubernetes_endpoint_port_name: Name of the endpoint port.
    • __meta_kubernetes_endpoint_port_protocol: Protocol of the endpoint port.
    • __meta_kubernetes_endpoint_address_target_kind: Kind of the endpoint address target.
    • __meta_kubernetes_endpoint_address_target_name: Name of the endpoint address target.
  • If the endpoints belong to a service, all labels of the role: service discovery are attached.
  • For all targets backed by a pod, all labels of the role: pod discovery are attached.

ingress

The ingress role discovers a target for each path of each ingress. This is generally useful for blackbox monitoring of an ingress. The address will be set to the host specified in the ingress spec.

Available meta labels:

  • __meta_kubernetes_namespace: The namespace of the ingress object.
  • __meta_kubernetes_ingress_name: The name of the ingress object.
  • __meta_kubernetes_ingress_label_<labelname>: Each label from the ingress object.
  • __meta_kubernetes_ingress_labelpresent_<labelname>: true for each label from the ingress object.
  • __meta_kubernetes_ingress_annotation_<annotationname>: Each annotation from the ingress object.
  • __meta_kubernetes_ingress_annotationpresent_<annotationname>: true for each annotation from the ingress object.
  • __meta_kubernetes_ingress_scheme: Protocol scheme of ingress, https if TLS config is set. Defaults to http.
  • __meta_kubernetes_ingress_path: Path from ingress spec. Defaults to /.

basic_auth:

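HTTP basic authentication. Same meaning as the basic_auth above, except that the one above authenticates scrape requests to targets, while this one authenticates the discovery requests.

bearer_token:

Same idea: token authentication, used during discovery.

bearer_token_file:

Same as bearer_token, except that the token is read from a file.
If Prometheus runs inside the Kubernetes cluster, this is the service account token file.

tls_config:

TLS settings, including mutual authentication, used during discovery.

proxy_url:

A proxy. I have not tested it; presumably it behaves like any ordinary proxy, and requests to the API server are sent through it.

namespaces:

Restricts discovery to the listed Kubernetes namespaces. If unset, targets are discovered in all namespaces.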


There is not much Prometheus-specific material in this part; most of it is about Kubernetes. If you are interested, see:
https://www.yxingxing.net/articles/2019/12/03/1575356398124.html

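What each role discovers:
node: the nodes of the Kubernetes cluster. By changing the port and filtering on labels with relabel_configs, this can be used to monitor the performance of each node, monitor selected nodes only, monitor the API server, and monitor the kubelet and pod resource usage.

service: the DNS name of each service. If the application containers expose a metrics endpoint, the applications can be monitored at the service level. The discovered name is not fully qualified (ServiceName.NameSpace.svc), so when Prometheus runs outside the cluster it has to be adjusted with relabel_configs, for example by appending cluster.local.

pod: like service, except that what is discovered is each pod's IP and port. If the containers in the pod support it, pods can be monitored directly. This also makes it possible to monitor the cluster DNS service, since CoreDNS has a metrics endpoint.

endpoints: appears to discover the same things as pod.

ingress: the host names configured in the ingress objects.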
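For the service role, the rewrite that appends the cluster domain can be previewed outside Prometheus. The sketch below mirrors a relabel rule with regex (.*):([0-9]+) and replacement $1.cluster.local:$2; the function name, the default domain argument, and the sample address in the usage note are assumptions for illustration.

```python
import re


def qualify_service_address(address, cluster_domain="cluster.local"):
    """Append the cluster domain to a discovered host:port service address."""
    match = re.fullmatch(r"(.*):([0-9]+)", address)
    if match is None:
        # Like a non-matching replace rule, leave the address unchanged.
        return address
    return "%s.%s:%s" % (match.group(1), cluster_domain, match.group(2))
```

For a hypothetical discovered address `kube-dns.kube-system.svc:9153`, this returns `kube-dns.kube-system.svc.cluster.local:9153`.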


Monitoring by changing the port: the discovered target port is 10250, and it has to be changed to 10255.

  - job_name: 'k8s-node-kubelet'
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:10255
      action: replace

node_exporter is deployed on the hosts, on port 9101.

  - job_name: 'k8s-node-host'
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:9101
      action: replace

To finish, here is a complete configuration.

Configuration file

[root@k8s-op prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 1m # Evaluate rules every 1 minute, which is also the default.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
#  alert_relabel_configs:
#  - regex: ^(id|image|pod|job)$
#    action: labeldrop
  alertmanagers:
  - static_configs:
    - targets:
      - 127.0.0.1:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "rules/test.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # ------------kubernetes-------------------------------------------
  - job_name: 'k8s-node-kubelet'
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:10255
      action: replace

  - job_name: 'k8s-kubelet-pod'
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:10255
      action: replace

  - job_name: 'k8s-proxy'
    metrics_path: /metrics
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:10249
      action: replace

  - job_name: 'k8s-node-host'
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:9101
      action: replace

  - job_name: 'k8s-resource-info'
    static_configs:
    - targets: ['10.0.243.185:8080']

  - job_name: 'k8s-apiserver'
    scheme: https
    tls_config:
      ca_file: /home/qfpay/prometheus/ssl/cacert.pem
      cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
      key_file: /home/qfpay/prometheus/ssl/prometheus.key
    kubernetes_sd_configs:
    - role: node
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
    relabel_configs:
    - source_labels: [__meta_kubernetes_node_label_kubernetes_io_k8s_apiserver]
      regex: ^true$
      action: keep
    - source_labels: [ __address__ ]
      target_label: __address__
      regex: (.*):10250
      replacement: $1:6443
      action: replace

  - job_name: 'CoreDNS'
    kubernetes_sd_configs:
    - role: pod
      api_server: https://172.100.101.195:8443
      tls_config:
        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
        key_file: /home/qfpay/prometheus/ssl/prometheus.key
      namespaces:
        names:
        - kube-system
    relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_name,__meta_kubernetes_pod_container_port_number]
      separator: ;
      regex: coredns;9153
      action: keep

#  - job_name: 'k8s-service'
#    kubernetes_sd_configs:
#    - role: service
#      api_server: https://172.100.101.195:8443
#      tls_config:
#        ca_file: /home/qfpay/prometheus/ssl/cacert.pem
#        cert_file: /home/qfpay/prometheus/ssl/prometheus.crt
#        key_file: /home/qfpay/prometheus/ssl/prometheus.key
#      namespaces:
#        names:
#        - kube-system
#    relabel_configs:
#    - source_labels: [__meta_kubernetes_service_name]
#      separator: ;
#      regex: ^kube-dns|kube-state-metrics$
#      action: drop
#    - source_labels: [ __address__ ]
#      target_label: __address__
#      regex: ^(.*):([0-9]+)$
#      replacement: $1.cluster.local:$2
#      action: replace



  # ------------end----------------------------------------------


  # -----------------------------------------------------
  #
  - job_name: 'prometheus'
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /metrics
    static_configs:
    - targets: ['localhost:9090']



A well-written site I came across today, noted here for reference:
https://yunlzheng.gitbook.io/prometheus-book/