Kafka Monitoring
Kafka Monitoring with Prometheus and Grafana
Modules for collecting Kafka metrics include jmx_exporter, kafka_exporter, kminion, and Burrow.
- jmx_exporter: Kafka runs on the JVM, so the JVM state needs to be monitored.
- kafka_exporter and kminion: exporters that collect Kafka metrics. Installing one of the two is enough.
- Burrow: a tool for monitoring Kafka consumer lag.
jmx_exporter
https://github.com/prometheus/jmx_exporter
# cd /usr/local/kafka
# wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.19.0/jmx_prometheus_javaagent-0.19.0.jar
# wget https://grafana.com/docs/grafana-cloud/monitor-infrastructure/integrations/integration-reference/integration-kafka/kafka_broker.yml
Add the environment variable
KAFKA_OPTS="-javaagent:/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar=7071:/usr/local/kafka/kafka_broker.yml"
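The agent argument has the shape -javaagent:<jar>=<port>:<config>. As a minimal sketch, the value can be assembled in Python when the jar path or port differs per broker. The function name build_javaagent_opt is hypothetical and only illustrates the format used above.

```python
def build_javaagent_opt(jar_path: str, port: int, config_path: str) -> str:
    """Return a -javaagent value in the form <jar>=<port>:<config>."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"-javaagent:{jar_path}={port}:{config_path}"


if __name__ == "__main__":
    # Paths and port are the ones used in this post.
    print(build_javaagent_opt(
        "/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar",
        7071,
        "/usr/local/kafka/kafka_broker.yml",
    ))
```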
If Kafka is registered with systemd, add the following to the service file
[Service]
...
SyslogIdentifier = kafka-server
WorkingDirectory = /usr/local/kafka
Environment="KAFKA_HEAP_OPTS=-Xmx2G -Xms2G"
Environment="KAFKA_OPTS=-javaagent:/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar=7071:/usr/local/kafka/kafka_broker.yml"
Environment="KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka1 -Djava.net.preferIPv4Stack=true"
...
Restart Kafka
systemctl daemon-reload
systemctl restart kafka-server
Check the Kafka metrics
curl http://localhost:7071/metrics
...
# TYPE jvm_memory_pool_allocated_bytes_created gauge
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'profiled nmethods'",} 1.692682361551E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Old Gen",} 1.692682361555E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Eden Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'non-profiled nmethods'",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Survivor Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="Compressed Class Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="Metaspace",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'non-nmethods'",} 1.692682361553E9
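To inspect this output programmatically instead of reading it by eye, the text format can be split into name, labels, and value. The following is a rough sketch, not a full parser for the Prometheus exposition format. The function name parse_metrics is an assumption for illustration, and the sketch tolerates the trailing comma inside the label braces seen above.

```python
import re

# metric name, optional {labels}, then the sample value
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# one key="value" pair inside the braces
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from /metrics output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue  # e.g. the "..." placeholders in the listing above
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Saving the curl output to a file and passing its contents to parse_metrics gives a list that can be filtered by metric name or label.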
Check the actual Kafka heap size
# jhsdb jmap --pid 72233 --heap
JVM version is 11.0.11+9
using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 1073741824 (1024.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 643825664 (614.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 1048576 (1.0MB)
Heap Usage:
G1 Heap:
regions = 1024
capacity = 1073741824 (1024.0MB)
used = 605777560 (577.7145004272461MB)
free = 467964264 (446.2854995727539MB)
56.41743168234825% used
G1 Young Generation:
Eden Space:
regions = 305
capacity = 668991488 (638.0MB)
used = 319815680 (305.0MB)
free = 349175808 (333.0MB)
47.80564263322884% used
Survivor Space:
regions = 7
capacity = 7340032 (7.0MB)
used = 7340032 (7.0MB)
free = 0 (0.0MB)
100.0% used
G1 Old Generation:
regions = 267
capacity = 397410304 (379.0MB)
used = 278621848 (265.7145004272461MB)
free = 118788456 (113.2854995727539MB)
70.10936686734725% used
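The percentages in this output are simply used / capacity, and the MB figures are bytes divided by 1024 * 1024. A small sketch that reproduces the arithmetic for any region follows. heap_usage is a hypothetical helper name, and the numbers in the example call are the G1 Heap values printed above.

```python
def heap_usage(capacity_bytes: int, used_bytes: int) -> dict:
    """Convert raw byte counts into the MB and percent figures jmap prints."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    mb = 1024 * 1024
    return {
        "capacity_mb": capacity_bytes / mb,
        "used_mb": used_bytes / mb,
        "free_mb": (capacity_bytes - used_bytes) / mb,
        "used_percent": used_bytes / capacity_bytes * 100,
    }


if __name__ == "__main__":
    # G1 Heap: capacity = 1073741824, used = 605777560
    print(heap_usage(1073741824, 605777560))
```

For the G1 Heap line this gives roughly 56.4% used, matching the jmap output.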
Add firewall rules
# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="7071" protocol="tcp" accept'
# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="9100" protocol="tcp" accept'
## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all
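After reloading, it helps to confirm from the Prometheus host that the port is actually reachable. A minimal sketch using only the Python standard library is shown here. is_port_open is a hypothetical helper, and a successful TCP connect only shows that the firewall and listener allow the connection, not that the metrics are valid.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, is_port_open("192.168.10.174", 7071) run on the Prometheus server should return True once the rule is active.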
Add the Prometheus config
- job_name: 'kafka'
  static_configs:
    - targets: ['192.168.10.174:7071']
Restart Prometheus
systemctl restart prometheus
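Once Prometheus is back up, its HTTP API endpoint /api/v1/targets reports whether each scrape target is healthy. The sketch below takes the JSON body of that response and lists targets of a given job that are not up. It assumes the response was fetched separately, for example with curl against the Prometheus server (port 9090 is the Prometheus default and an assumption here), and unhealthy_targets is a hypothetical helper name.

```python
import json


def unhealthy_targets(api_response_text: str, job: str):
    """Return (scrapeUrl, lastError) for targets of `job` whose health is not 'up'."""
    payload = json.loads(api_response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API did not return status=success")
    problems = []
    for target in payload["data"]["activeTargets"]:
        if target.get("labels", {}).get("job") != job:
            continue
        if target.get("health") != "up":
            problems.append((target.get("scrapeUrl"), target.get("lastError", "")))
    return problems
```

An empty list for job "kafka" means Prometheus is scraping the jmx_exporter endpoint successfully.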
Import the Grafana dashboard
https://grafana.com/grafana/dashboards/721-kafka/
kafka_exporter
Download a release from https://github.com/danielqsj/kafka_exporter (a Docker image is also available)
# cd /usr/local
# wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.7.0/kafka_exporter-1.7.0.linux-amd64.tar.gz
# tar -xvf kafka_exporter-1.7.0.linux-amd64.tar.gz
# cd /usr/local/kafka_exporter-1.7.0.linux-amd64
# ls
LICENSE kafka_exporter
Write the config file
# vim /usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter_config
OPTIONS="--kafka.server=kafka1:9092"
Write the systemd unit
vim /etc/systemd/system/kafka_exporter.service
[Unit]
Description=Kafka Exporter
After=syslog.target network.target
[Service]
Type=simple
User=root
Group=root
EnvironmentFile=/usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter_config
ExecStart=/usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter $OPTIONS
Restart=always
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable kafka_exporter
# systemctl start kafka_exporter
# systemctl status kafka_exporter
● kafka_exporter.service - Kafka Exporter
Loaded: loaded (/etc/systemd/system/kafka_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2023-08-18 13:37:41 KST; 4s ago
Main PID: 59373 (kafka_exporter)
Tasks: 6 (limit: 23672)
Memory: 3.1M
CGroup: /system.slice/kafka_exporter.service
└─59373 /usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter --kafka.server=kafka1:9092
Check the kafka_exporter metrics
# curl -s http://localhost:9308/metrics > curl.log
# vim curl.log
...
go_gc_duration_seconds{quantile="0"} 4.6017e-05
go_gc_duration_seconds{quantile="0.25"} 4.6017e-05
go_gc_duration_seconds{quantile="0.5"} 4.9114e-05
go_gc_duration_seconds{quantile="0.75"} 6.0338e-05
go_gc_duration_seconds{quantile="1"} 6.0338e-05
go_gc_duration_seconds_sum 0.000155469
go_gc_duration_seconds_count 3
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
...
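Besides the Go runtime metrics shown here, kafka_exporter exposes Kafka metrics such as kafka_consumergroup_lag, labelled with consumergroup, topic, and partition. As a rough sketch, the per-partition lag in the saved curl.log can be summed per consumer group and topic. lag_by_group is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Only the per-partition series; the "{" right after the name excludes
# kafka_consumergroup_lag_sum.
_LAG_LINE = re.compile(r'^kafka_consumergroup_lag\{(.*)\}\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def lag_by_group(metrics_text: str) -> dict:
    """Sum kafka_consumergroup_lag over partitions, keyed by (group, topic)."""
    totals = defaultdict(float)
    for line in metrics_text.splitlines():
        match = _LAG_LINE.match(line.strip())
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group(1)))
        key = (labels.get("consumergroup", ""), labels.get("topic", ""))
        totals[key] += float(match.group(2))
    return dict(totals)
```

Calling lag_by_group(open("curl.log").read()) returns a dictionary that is empty if no consumer group has committed offsets yet.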
Configure the firewall so that Prometheus can pull the kafka_exporter metrics
## 9308: kafka_exporter default port
# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="9308" protocol="tcp" accept'
## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all
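kminion
Both kafka_exporter and kminion are decent tools, but kminion seems to be the slightly better choice, for the following reasons.
- Its GitHub release cadence, number of committers, and overall project activity look healthier.
- Other: kafka_exporter started in 2017, while kminion started in 2019.
- Among the Kafka dashboards on the Grafana dashboards site, the kminion ones have higher popularity and download counts.
To install, download the latest release from https://github.com/redpanda-data/kminion (a Docker image is also available).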
# cd /usr/local
# wget https://github.com/redpanda-data/kminion/releases/download/v2.2.5/kminion_2.2.5_linux_amd64.tar.gz
# mkdir kminion_2.2.5
# tar -xvf kminion_2.2.5_linux_amd64.tar.gz -C ./kminion_2.2.5
# ls kminion_2.2.5
LICENSE README.md kminion
Download the kminion config file
# cd /usr/local/kminion_2.2.5
# wget https://raw.githubusercontent.com/redpanda-data/kminion/master/docs/reference-config.yaml
# cp reference-config.yaml ./kminion.yml
kminion.yml contains various settings, including the port kminion listens on (8080 by default). The default port 8080 can be changed to 8585 as follows.
...
exporter:
  # Namespace is the prefix for all exported Prometheus metrics
  namespace: "kminion"
  # Host that shall be used to bind the HTTP server on
  host: ""
  # Port that shall be used to bind the HTTP server on
  port: 8585
Write the systemd unit
vim /etc/systemd/system/kminion.service
[Unit]
Description=Kminion Kafka Metric
After=syslog.target network.target
[Service]
Type=simple
User=root
Group=root
Environment=KAFKA_BROKERS=kafka1:9092,kafka2:9092,kafka3:9092
Environment=CONFIG_FILEPATH=/usr/local/kminion_2.2.5/kminion.yml
ExecStart=/usr/local/kminion_2.2.5/kminion
Restart=always
[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable kminion
# systemctl start kminion
# systemctl status kminion
● kminion.service - Kminion Kafka Metric
Loaded: loaded (/etc/systemd/system/kminion.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2023-08-21 11:15:48 KST; 7s ago
Main PID: 70685 (kminion)
Tasks: 8 (limit: 23672)
Memory: 7.7M
CGroup: /system.slice/kminion.service
└─70685 /usr/local/kminion_2.2.5/kminion
Firewall
# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="8585" protocol="tcp" accept'
## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all
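Register kminion in Prometheus (job_name values must be unique, so rename this job if the jmx_exporter job added earlier is still called 'kafka')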
vim /usr/local/src/prometheus-2.27.1.linux-amd64/prometheus.yml
...
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['192.168.10.174:8585']
...
Restart Prometheus
sudo systemctl restart prometheus
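Prometheus refuses to load a configuration in which two scrape configs share the same job_name, and this post uses 'kafka' for both the jmx_exporter job and the kminion job. A small sketch for spotting such a clash before restarting follows. duplicate_job_names is a hypothetical helper, and the list of names is assumed to be collected by hand or by any YAML reader from prometheus.yml.

```python
from collections import Counter


def duplicate_job_names(job_names):
    """Return the sorted job names that appear more than once."""
    counts = Counter(job_names)
    return sorted(name for name, count in counts.items() if count > 1)


if __name__ == "__main__":
    # Example: both exporters registered under the same name.
    print(duplicate_job_names(["prometheus", "kafka", "kafka"]))
```

If the result is not empty, giving one of the jobs a distinct name (for instance one per exporter) avoids the startup error.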
Connect the Grafana dashboard
Search for kafka at https://grafana.com/grafana/dashboards/ (direct URL: https://grafana.com/grafana/dashboards/14012-kminion-cluster/)
Import the dashboard in Grafana
Click the Load button
Click Import
KMinion Topic Dashboard
Also import the KMinion Topic Dashboard from the Grafana dashboards site.
Burrow
https://github.com/linkedin/Burrow
https://blog.voidmainvoid.net/244
References
https://github.com/oded-dd/prometheus-jmx-kafka/blob/master/README.md
https://grafana.com/docs/grafana-cloud/monitor-infrastructure/integrations/integration-reference/integration-kafka/
https://sarc.io/index.php/miscellaneous/2251-kafka-prometheus