Kafka Monitoring

Kafka Monitoring with Prometheus and Grafana

Tools for collecting Kafka metrics include jmx_exporter, kafka_exporter, kminion, and Burrow.

  • jmx_exporter: Kafka runs on the JVM, so the state of the JVM needs to be monitored.
  • kafka_exporter and kminion: exporters that collect Kafka metrics. Only one of the two needs to be installed.
  • Burrow: a tool for monitoring Kafka consumer lag.

jmx_exporter

https://github.com/prometheus/jmx_exporter

# cd /usr/local/kafka
# wget https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/0.19.0/jmx_prometheus_javaagent-0.19.0.jar
# wget https://grafana.com/docs/grafana-cloud/monitor-infrastructure/integrations/integration-reference/integration-kafka/kafka_broker.yml

Add the environment variable

KAFKA_OPTS="-javaagent:/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar=7071:/usr/local/kafka/kafka_broker.yml"
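The agent argument above follows the pattern -javaagent:<jar path>=<port>:<config path>. As a minimal sketch, the helper below assembles that string so the port and paths can be changed in one place. build_javaagent_opt is a hypothetical helper written for this note, not part of jmx_exporter.

```python
def build_javaagent_opt(jar_path: str, port: int, config_path: str) -> str:
    """Return a -javaagent option in the jar=port:config form used above."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"-javaagent:{jar_path}={port}:{config_path}"


# Paths and port are the ones used in this post.
kafka_opts = build_javaagent_opt(
    "/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar",
    7071,
    "/usr/local/kafka/kafka_broker.yml",
)
print(f'KAFKA_OPTS="{kafka_opts}"')
```

The port given here (7071) is the one the metrics endpoint listens on, so it has to match the port used later for curl, the firewall rule, and the Prometheus target.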

If Kafka is registered as a systemd service, add the following to the service file

[Service]
...
SyslogIdentifier = kafka-server
WorkingDirectory = /usr/local/kafka
Environment="KAFKA_HEAP_OPTS=-Xmx2G -Xms2G"
Environment="KAFKA_OPTS=-javaagent:/usr/local/kafka/jmx_prometheus_javaagent-0.19.0.jar=7071:/usr/local/kafka/kafka_broker.yml"
Environment="KAFKA_JMX_OPTS=-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=kafka1 -Djava.net.preferIPv4Stack=true"

...

Restart Kafka

systemctl daemon-reload

systemctl restart kafka-server

Check the Kafka metrics

curl http://localhost:7071/metrics

...
# TYPE jvm_memory_pool_allocated_bytes_created gauge
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'profiled nmethods'",} 1.692682361551E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Old Gen",} 1.692682361555E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Eden Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'non-profiled nmethods'",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="G1 Survivor Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="Compressed Class Space",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="Metaspace",} 1.692682361553E9
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'non-nmethods'",} 1.692682361553E9
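To inspect this output in a script rather than by eye, the text format can be split into name, labels, and value. The following is a rough sketch using only the standard library. It handles the simple lines shown above (including the trailing comma inside the braces) and is not a full parser for the Prometheus exposition format. parse_samples is a name made up for this example.

```python
import re

# metric_name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# key="value" pairs; a trailing comma after the last pair is simply ignored
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line, e.g. an elided "..."
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        yield name, labels, float(value)


sample = """# TYPE jvm_memory_pool_allocated_bytes_created gauge
jvm_memory_pool_allocated_bytes_created{pool="G1 Old Gen",} 1.692682361555E9
jvm_memory_pool_allocated_bytes_created{pool="CodeHeap 'profiled nmethods'",} 1.692682361551E9
"""

for name, labels, value in parse_samples(sample):
    print(name, labels["pool"], value)
```

Note that the values of these _created series (about 1.69e9) look like Unix timestamps in seconds rather than byte counts.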

Check the actual Kafka heap size

# jhsdb jmap --pid 72233 --heap


JVM version is 11.0.11+9                                    
                                                            
using thread-local object allocation.                       
Garbage-First (G1) GC with 4 thread(s)                      
                                                            
Heap Configuration:                                         
   MinHeapFreeRatio         = 40                            
   MaxHeapFreeRatio         = 70                            
   MaxHeapSize              = 1073741824 (1024.0MB)         
   NewSize                  = 1363144 (1.2999954223632812MB)
   MaxNewSize               = 643825664 (614.0MB)           
   OldSize                  = 5452592 (5.1999969482421875MB)
   NewRatio                 = 2                             
   SurvivorRatio            = 8                             
   MetaspaceSize            = 21807104 (20.796875MB)        
   CompressedClassSpaceSize = 1073741824 (1024.0MB)         
   MaxMetaspaceSize         = 17592186044415 MB             
   G1HeapRegionSize         = 1048576 (1.0MB)               
                                                            
Heap Usage:                                                 
G1 Heap:                                                    
   regions  = 1024                                          
   capacity = 1073741824 (1024.0MB)                         
   used     = 605777560 (577.7145004272461MB)               
   free     = 467964264 (446.2854995727539MB)               
   56.41743168234825% used                                  
G1 Young Generation:                                        
Eden Space:                                                 
   regions  = 305                                           
   capacity = 668991488 (638.0MB)                           
   used     = 319815680 (305.0MB)                           
   free     = 349175808 (333.0MB)                           
   47.80564263322884% used                                  
Survivor Space:                                             
   regions  = 7                                             
   capacity = 7340032 (7.0MB)                               
   used     = 7340032 (7.0MB)                               
   free     = 0 (0.0MB)                                     
   100.0% used                                              
G1 Old Generation:                                          
   regions  = 267                                           
   capacity = 397410304 (379.0MB)                           
   used     = 278621848 (265.7145004272461MB)               
   free     = 118788456 (113.2854995727539MB)               
   70.10936686734725% used                                  
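The percentages in this output are used / capacity for each space, and the MB figures are the byte counts divided by 1024 * 1024. The short sketch below reproduces the G1 Heap and Eden Space numbers from the output above; the function names are made up for this example.

```python
def percent_used(used_bytes: int, capacity_bytes: int) -> float:
    """Return used / capacity as a percentage."""
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    return used_bytes / capacity_bytes * 100


def to_mib(num_bytes: int) -> float:
    """Convert bytes to the MB unit printed by jhsdb (1 MB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


# Values copied from the jhsdb output above.
g1_heap = percent_used(605777560, 1073741824)
eden = percent_used(319815680, 668991488)

print(f"G1 Heap: {to_mib(1073741824):.1f}MB capacity, {g1_heap:.2f}% used")
print(f"Eden:    {to_mib(668991488):.1f}MB capacity, {eden:.2f}% used")
```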

Add firewall rules

# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="7071" protocol="tcp" accept'

# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="9100" protocol="tcp" accept'


## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all

Add the Prometheus config

- job_name: 'kafka'
  static_configs:
  - targets: ['192.168.10.174:7071']
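Because the JMX agent runs inside each broker's JVM, every broker exposes its own endpoint and each one needs an entry in targets. As a sketch, the function below renders the same scrape job for a list of hosts. render_scrape_job is a hypothetical helper, and the kafka1 to kafka3 hostnames in the second call are an assumption borrowed from the broker list used later in this post.

```python
def render_scrape_job(job_name, hosts, port):
    """Render a static scrape job in the same layout as the config above."""
    targets = ", ".join(f"'{host}:{port}'" for host in hosts)
    return (
        f"- job_name: '{job_name}'\n"
        f"  static_configs:\n"
        f"  - targets: [{targets}]\n"
    )


# Single broker, identical to the config shown above.
single = render_scrape_job("kafka", ["192.168.10.174"], 7071)
print(single)

# Assumed three-broker cluster; hostnames are illustrative only.
cluster = render_scrape_job("kafka", ["kafka1", "kafka2", "kafka3"], 7071)
print(cluster)
```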

Restart Prometheus

systemctl restart prometheus

Import the Grafana dashboard

https://grafana.com/grafana/dashboards/721-kafka/

kafka_exporter

Download a release from https://github.com/danielqsj/kafka_exporter (a Docker image is also available).

# cd /usr/local
# wget https://github.com/danielqsj/kafka_exporter/releases/download/v1.7.0/kafka_exporter-1.7.0.linux-amd64.tar.gz
# tar -xvf kafka_exporter-1.7.0.linux-amd64.tar.gz
# cd /usr/local/kafka_exporter-1.7.0.linux-amd64
# ls
LICENSE  kafka_exporter

Write the config file

# vim /usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter_config
OPTIONS="--kafka.server=kafka1:9092"

Write the systemd unit

vim /etc/systemd/system/kafka_exporter.service

[Unit]
Description=Kafka Exporter
After=syslog.target network.target

[Service]
Type=simple
User=root
Group=root
EnvironmentFile=/usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter_config
ExecStart=/usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter $OPTIONS
Restart=always

[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable kafka_exporter
# systemctl start kafka_exporter
# systemctl status kafka_exporter
● kafka_exporter.service - Kafka Exporter
   Loaded: loaded (/etc/systemd/system/kafka_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2023-08-18 13:37:41 KST; 4s ago
 Main PID: 59373 (kafka_exporter)
    Tasks: 6 (limit: 23672)
   Memory: 3.1M
   CGroup: /system.slice/kafka_exporter.service
           └─59373 /usr/local/kafka_exporter-1.7.0.linux-amd64/kafka_exporter --kafka.server=kafka1:9092

Check the kafka_exporter metrics

# curl -s http://localhost:9308/metrics > curl.log
# vim curl.log
...
go_gc_duration_seconds{quantile="0"} 4.6017e-05
go_gc_duration_seconds{quantile="0.25"} 4.6017e-05
go_gc_duration_seconds{quantile="0.5"} 4.9114e-05
go_gc_duration_seconds{quantile="0.75"} 6.0338e-05
go_gc_duration_seconds{quantile="1"} 6.0338e-05
go_gc_duration_seconds_sum 0.000155469
go_gc_duration_seconds_count 3
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 8
...
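The Go runtime metrics shown above are only part of the output. kafka_exporter also exposes Kafka-level series, including kafka_consumergroup_lag with consumergroup, topic, and partition labels. As a sketch, the snippet below sums that lag per consumer group from the saved text. The sample lines and group names are made up for illustration and are not taken from a real cluster.

```python
import re
from collections import defaultdict

# Match only kafka_consumergroup_lag{...}, not e.g. kafka_consumergroup_lag_sum
LAG_RE = re.compile(r'^kafka_consumergroup_lag\{(.*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def lag_by_group(text):
    """Return {consumer group: total lag across all topics and partitions}."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = LAG_RE.match(line.strip())
        if match is None:
            continue  # comments, other metrics, elided lines
        labels = dict(LABEL_RE.findall(match.group(1)))
        totals[labels.get("consumergroup", "")] += float(match.group(2))
    return dict(totals)


# Hypothetical sample; in practice read the file saved by curl above.
sample = """# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="orders",partition="0",topic="orders-v1"} 12
kafka_consumergroup_lag{consumergroup="orders",partition="1",topic="orders-v1"} 3
kafka_consumergroup_lag{consumergroup="billing",partition="0",topic="invoices"} 0
"""

for group, lag in sorted(lag_by_group(sample).items()):
    print(f"{group}: {lag:.0f}")
```

In a running setup the same aggregation would normally be done in Prometheus or Grafana; the script is only meant for a quick check of the saved output.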

Configure the firewall so that Prometheus can pull the kafka_exporter metrics

## 9308: default kafka_exporter port
# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="9308" protocol="tcp" accept'

## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all

kminion

Both kafka_exporter and kminion are solid tools, but kminion looks like the better choice, for the following reasons.

  1. Its GitHub release cadence and the number and activity of its committers look healthier.
    • Note: kafka_exporter started in 2017, while kminion started in 2019.
  2. Among the Kafka dashboards on grafana.com, the kminion dashboard is more popular and has more downloads.

To install, download the latest release from https://github.com/redpanda-data/kminion (a Docker image is also available).

# cd /usr/local
# wget https://github.com/redpanda-data/kminion/releases/download/v2.2.5/kminion_2.2.5_linux_amd64.tar.gz
# mkdir kminion_2.2.5
# tar -xvf kminion_2.2.5_linux_amd64.tar.gz -C ./kminion_2.2.5
# ls kminion_2.2.5
LICENSE  README.md  kminion

Download the kminion config file

# cd /usr/local/kminion_2.2.5

# wget https://raw.githubusercontent.com/redpanda-data/kminion/master/docs/reference-config.yaml

# cp reference-config.yaml ./kminion.yml

kminion.yml contains various settings, including the port kminion listens on (8080 by default). As shown below, the default port 8080 can be changed to 8585.

...
exporter:
  # Namespace is the prefix for all exported Prometheus metrics
  namespace: "kminion"
  # Host that shall be used to bind the HTTP server on
  host: ""
  # Port that shall be used to bind the HTTP server on
  port: 8585

Write the systemd unit

vim /etc/systemd/system/kminion.service

[Unit]
Description=Kminion Kafka Metric
After=syslog.target network.target

[Service]
Type=simple
User=root
Group=root
Environment=KAFKA_BROKERS=kafka1:9092,kafka2:9092,kafka3:9092
Environment=CONFIG_FILEPATH=/usr/local/kminion_2.2.5/kminion.yml
ExecStart=/usr/local/kminion_2.2.5/kminion
Restart=always

[Install]
WantedBy=multi-user.target
# systemctl daemon-reload
# systemctl enable kminion
# systemctl start kminion
# systemctl status kminion
● kminion.service - Kminion Kafka Metric
   Loaded: loaded (/etc/systemd/system/kminion.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2023-08-21 11:15:48 KST; 7s ago
 Main PID: 70685 (kminion)
    Tasks: 8 (limit: 23672)
   Memory: 7.7M
   CGroup: /system.slice/kminion.service
           └─70685 /usr/local/kminion_2.2.5/kminion

Firewall

# sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address=192.168.10.0/24 port port="8585" protocol="tcp" accept'

## Reload the firewall
# firewall-cmd --reload
success
# firewall-cmd --list-all

Register kminion with Prometheus

vim /usr/local/src/prometheus-2.27.1.linux-amd64/prometheus.yml

...
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['192.168.10.174:8585']
...

Restart Prometheus

 sudo systemctl restart prometheus

Connect the Grafana dashboard

Search for kafka at https://grafana.com/grafana/dashboards/ (direct URL: https://grafana.com/grafana/dashboards/14012-kminion-cluster/)


Import the dashboard in Grafana


Click the Load button


Click Import


KMinion Topic Dashboard

Also import the KMinion Topic Dashboard from the Grafana dashboards site.


Burrow

https://github.com/linkedin/Burrow

https://blog.voidmainvoid.net/244

References

https://github.com/oded-dd/prometheus-jmx-kafka/blob/master/README.md

https://grafana.com/docs/grafana-cloud/monitor-infrastructure/integrations/integration-reference/integration-kafka/

https://sarc.io/index.php/miscellaneous/2251-kafka-prometheus
