Monitoring Dockerized Kafka Using TIBCO Hawk JMX Plug-in - Community Edition


    Manoj Chaurasia



TIBCO Hawk® is a sophisticated tool for monitoring and managing distributed applications and systems throughout the enterprise. With Hawk, system administrators can monitor application parameters, behavior, and load for all nodes in a local- or wide-area network and take action when predefined conditions occur. In many cases, runtime failures or slowdowns can be repaired automatically within seconds of their discovery, reducing unscheduled outages and slowdowns of critical business systems.


This article describes how to use TIBCO Hawk® Container Edition and the TIBCO Hawk® Plug-in for JMX - Community Edition to monitor your dockerized Kafka deployments.

    Introduction to Kafka

Apache Kafka is an open-source, publish-subscribe-based messaging system used for building real-time streaming data pipelines and streaming applications. In short, Kafka works as follows:

• Kafka runs as a cluster on one or more servers that can span multiple datacenters. These Kafka servers are also called brokers.
• The Kafka cluster stores streams of records in categories called topics.
• Topics are divided into distributed partitions for load balancing and replication.
• Producers send records to topics.
• Consumers read the records published to a particular topic.

    Monitoring Dockerized Kafka Using Hawk JMX Plug-in

Kafka provides many performance metrics via JMX. We can use the Hawk JMX Plug-in to expose these metrics as Hawk microagents and extend Hawk's monitoring and management capability to the Kafka world. The metrics exposed by Kafka cover the Kafka broker, producer, and consumer. Details about these metrics can be found in the official Kafka documentation: http://kafka.apache.org/documentation/#monitoring
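Each of these metrics is identified by a JMX ObjectName of the standard form `domain:key=value,key=value,...`, and the plug-in derives its microagent names from these names. A small Python sketch of how such a name decomposes (the example MBean is one of the broker metrics listed later in this article):

```python
def parse_object_name(name: str):
    """Split a JMX ObjectName into its domain and its key properties.

    Example: "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
    """
    domain, _, props = name.partition(":")
    properties = dict(p.split("=", 1) for p in props.split(","))
    return domain, properties

domain, props = parse_object_name(
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"
)
print(domain)          # kafka.server
print(props["type"])   # ReplicaManager
print(props["name"])   # UnderReplicatedPartitions
```

This is only an illustration of the naming scheme; the plug-in itself reads the MBeans directly over the JMX connection.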

Apache Kafka Docker distribution used in this article: https://hub.docker.com/r/wurstmeister/kafka/

    Prerequisites

Adding the Hawk JMX Plug-in to the hkce_agent container

To use the Hawk JMX Plug-in, we need to add it to the hkce_agent container. The steps to add any custom plug-in to hkce_agent are available at https://docs.tibco.com/pub/hkce/1.0.0/doc/html/GUID-F12317DD-CC65-42FD-9F8D-A42B5DD14F5D.html

To add the Hawk JMX Plug-in to the hkce_agent container, follow these steps:

    • Download the Hawk JMX Plug-in - Community Edition.
    • Extract the zip file and copy all the files (including hawkjmxhma.jar, JMXPluginConfig.xml, and JMXServiceMA.hma) into the folder <TEMP_DIRECTORY>/tibco.home/hkce/1.0/plugin, where <TEMP_DIRECTORY> is the location where you extracted the Hawk Container Edition software package.
    • Edit the JMXPluginConfig.xml file and set the JMXServiceURL parameter to the JMX endpoint of your Kafka container. Since Kafka exposes a large number of MBeans, you can use the MBeanFilter parameter to select only the MBeans you want as Hawk microagents.
    • Build the hkce_agent Docker image, following the documentation steps.
    • Run all Hawk Container Edition containers in standalone mode using docker-compose.
    • If the Hawk Console container is running, you can access it at http://<Console_host_IP>:<Host_port>/HawkConsole. There you should see the Kafka MBeans available as Hawk microagents, and you can start creating rulebases for monitoring Kafka.
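For the JMXServiceURL step to work, the Kafka container must actually expose a JMX endpoint. A minimal docker-compose sketch is shown below; the service name, ports, and hostname are assumptions for illustration (the JMX_PORT and KAFKA_JMX_OPTS environment variables are the conventional way to enable JMX in Kafka Docker images, and the JVM flags are the standard com.sun.management.jmxremote options):

```yaml
# Illustrative fragment only: service name, ports, and hostname are
# assumptions, not part of the Hawk or Kafka documentation.
services:
  kafka:
    image: wurstmeister/kafka
    ports:
      - "9092:9092"
      - "9999:9999"   # JMX port, reachable by the Hawk JMX Plug-in
    environment:
      JMX_PORT: 9999
      KAFKA_JMX_OPTS: >-
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
        -Dcom.sun.management.jmxremote.rmi.port=9999
        -Djava.rmi.server.hostname=kafka
```

With this in place, the JMX endpoint of the Kafka container would be reachable at service:jmx:rmi:///jndi/rmi://kafka:9999/jmxrmi from other containers on the same network.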

The JMXPluginConfig.xml provided with the Hawk JMX Plug-in - Community Edition contains a configuration for only one MBeanServer. Since Kafka has multiple components (broker, producer, consumer) that expose JMX metrics, you can use the same JMXPluginConfig.xml file to monitor all of these components by adding multiple <MBeanServer> configurations under <MBeanServerList>.
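As a rough illustration, such a multi-server configuration might look like the sketch below. Only the element names mentioned in this article (MBeanServerList, MBeanServer, JMXServiceURL, MBeanFilter) are taken from the plug-in; the exact schema, attribute names, and the host names and ports are assumptions, so consult the JMXPluginConfig.xml shipped with the plug-in for the authoritative layout:

```xml
<!-- Illustrative sketch only: verify against the shipped JMXPluginConfig.xml.
     One MBeanServer entry per Kafka component to be monitored. -->
<MBeanServerList>
  <MBeanServer>
    <JMXServiceURL>service:jmx:rmi:///jndi/rmi://kafka-broker:9999/jmxrmi</JMXServiceURL>
    <MBeanFilter>kafka.server:*</MBeanFilter>
  </MBeanServer>
  <MBeanServer>
    <JMXServiceURL>service:jmx:rmi:///jndi/rmi://kafka-consumer:9998/jmxrmi</JMXServiceURL>
    <MBeanFilter>kafka.consumer:*</MBeanFilter>
  </MBeanServer>
</MBeanServerList>
```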


    Kafka Metrics to Monitor

Kafka exposes a large number of MBeans, but not all of them are relevant from a monitoring perspective. The tables below list a few sample Kafka MBeans, their equivalent Hawk microagent methods, and sample Hawk rule test conditions for generating alerts; many more Kafka MBeans and corresponding Hawk microagents are available.

Kafka broker metrics (the Hawk microagent name is identical to the MBean name):

| Metric | MBean / Hawk Microagent | Hawk Method | Sample Hawk Rule Test |
|---|---|---|---|
| Number of offline partitions | kafka.controller:type=KafkaController,name=OfflinePartitionsCount | _getValue | Value > 0 |
| Average fraction of time the network processors are idle | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | _getValue | Value < 0.3 (as suggested by the official Kafka documentation) |
| FetchConsumer request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchConsumer | _getMean | |
| FetchFollower request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request=FetchFollower | _getMean | |
| Produce request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request=Produce | _getMean | |
| Total time spent for FetchConsumer requests | kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer | _getMean | |
| Total time spent for FetchFollower requests | kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower | _getMean | |
| Total time spent for Produce requests | kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce | _getMean | |
| Count of under-replicated partitions | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | _getValue | Value > 0 |
| Count of partitions below min ISR (ISR size < min.insync.replicas) | kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount | _getValue | Value > 0 |
| Offline replica count | kafka.server:type=ReplicaManager,name=OfflineReplicaCount | _getValue | Value > 0 |
| ISR shrink rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | _getOneMinuteRate | If a broker goes down, the ISR for some partitions will shrink; when that broker is up again, the ISR expands once the replicas are fully caught up. Other than that, the expected value for both the shrink and expansion rates is 0. |
| ISR expansion rate | kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | _getOneMinuteRate | See above |
| Number of incoming messages per second | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | _getCount (total messages), _getOneMinuteRate (messages/minute) | |
| Byte-in rate from clients | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | _getOneMinuteRate | |
| Byte-out rate to clients | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | _getOneMinuteRate | |
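The sample rule tests above are simple threshold comparisons on a single returned value. A hypothetical Python sketch of the same logic follows; the metric values are invented for illustration, and in practice these tests run inside a Hawk rulebase rather than in a script:

```python
# Hypothetical threshold checks mirroring the sample Hawk rule tests above.
THRESHOLDS = {
    # microagent method name:                    (comparison, limit)
    "OfflinePartitionsCount_getValue":           (">", 0),
    "UnderReplicatedPartitions_getValue":        (">", 0),
    "NetworkProcessorAvgIdlePercent_getValue":   ("<", 0.3),
}

def breached(metric: str, value: float) -> bool:
    """Return True when the value violates the configured threshold."""
    op, limit = THRESHOLDS[metric]
    return value > limit if op == ">" else value < limit

# Invented sample values: one broker has two under-replicated partitions.
sample = {
    "OfflinePartitionsCount_getValue": 0,
    "UnderReplicatedPartitions_getValue": 2,
    "NetworkProcessorAvgIdlePercent_getValue": 0.85,
}
alerts = [m for m, v in sample.items() if breached(m, v)]
print(alerts)  # ['UnderReplicatedPartitions_getValue']
```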
Kafka consumer metrics (the Hawk microagent name is identical to the MBean name):

| Metric | MBean / Hawk Microagent | Hawk Method | Sample Hawk Rule Test |
|---|---|---|---|
| Average time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id={client-id} | _getcommit-latency-avg | |
| Average number of heartbeats per second | kafka.consumer:type=consumer-coordinator-metrics,client-id={client-id} | _getheartbeat-rate | |
| Number of seconds since the last controller heartbeat | kafka.consumer:type=consumer-coordinator-metrics,client-id={client-id} | _getlast-heartbeat-seconds-ago | |
| Maximum number of messages the consumer lags behind the producer (published by the consumer, not the broker) | kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} | _getrecords-lag-max | records-lag-max > some predefined value |
| Average number of records consumed per second for a topic | kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id},topic={topic} | _getrecords-consumed-rate | |
| Maximum request latency between the consumer and a broker node | kafka.consumer:type=consumer-node-metrics,client-id={client-id},node-id={node-id} | _getrequest-latency-max | |

Note:

1. In the above table, {client-id} represents the ID of the Kafka consumer, so there will be a corresponding MBean and microagent for each consumer in your application.

2. Similarly, {topic} represents the Kafka topic that your consumer is consuming, so there will be a corresponding MBean and microagent for each consumer-topic pair.

3. The same applies to {node-id}, which represents the broker node ID: there will be an MBean and microagent for each consumer-node pair.
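Because every concrete {client-id}/{topic} combination yields its own MBean, the number of microagents grows with the deployment. A quick sketch of that multiplication (the client IDs and topic names below are invented):

```python
def mbean_name(client_id: str, topic: str) -> str:
    """Concrete consumer fetch-manager MBean name for one client/topic pair."""
    return (
        "kafka.consumer:type=consumer-fetch-manager-metrics,"
        f"client-id={client_id},topic={topic}"
    )

clients = ["consumer-1", "consumer-2"]  # invented client IDs
topics = ["orders", "payments"]         # invented topic names

mbeans = [mbean_name(c, t) for c in clients for t in topics]
for name in mbeans:
    print(name)
# 2 clients x 2 topics -> 4 distinct MBeans (and Hawk microagents)
```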

Kafka producer metrics (the Hawk microagent name is identical to the MBean name):

| Metric | MBean / Hawk Microagent | Hawk Method | Sample Hawk Rule Test |
|---|---|---|---|
| Maximum time in ms a request was throttled by a broker | kafka.producer:type=producer-metrics,client-id={client-id} | _getproduce-throttle-time-max | |
| Total number of connections closed | kafka.producer:type=producer-metrics,client-id={client-id} | _getconnection-close-total | |
| Number of network operations (reads or writes) on all connections per second | kafka.producer:type=producer-metrics,client-id={client-id} | _getnetwork-io-rate | |
| Average request latency in ms | kafka.producer:type=producer-node-metrics,client-id={client-id},node-id={node-id} | _getrequest-latency-avg | |
| Average per-second number of retried record sends for a topic | kafka.producer:type=producer-topic-metrics,client-id={client-id},topic={topic} | _getrecord-retry-rate | Alert if the retry rate is high |
| Average per-second number of record sends that resulted in errors for a topic | kafka.producer:type=producer-topic-metrics,client-id={client-id},topic={topic} | _getrecord-error-rate | Alert if the error rate is high |

Note:

1. In the above table, {client-id} represents the ID of the Kafka producer, so there will be a corresponding MBean and microagent for each producer in your application.

2. Similarly, {topic} represents the Kafka topic that your producer is sending data to, so there will be a corresponding MBean and microagent for each producer-topic pair.

3. The same applies to {node-id}, which represents the broker node ID: there will be an MBean and microagent for each producer-node pair.

Similar Articles

    • Monitoring your applications using TIBCO Hawk®
    • Monitoring TIBCO BusinessWorks™ Container Edition Infrastructure and Applications using TIBCO Hawk® Container Edition
    • Monitoring TIBCO BusinessWorks™ Container Edition applications with TIBCO Hawk® Container Edition
    • Welcome to the TIBCO Hawk® Community Wiki
    • Sending Docker Container logs to TIBCO LogLogic Log Management Intelligence using TIBCO Hawk Container Edition