eG Monitoring
 

Measures reported by AWSMSKCluInfTest

In an AWS MSK cluster, one of the brokers serves as the controller. The controller manages the states of partitions and replicas and performs administrative tasks such as reassigning partitions.

This test monitors the partitions managed by the active controller and reports the number of partitions that are in an offline state, a condition that leads to a lag in read/write operations.

Outputs of the test: One set of results for each cluster executing in the target AWS Managed Service Kafka server.

Descriptor: Cluster

The measures made by this test are as follows:

Measurement | Description | Measurement Unit | Interpretation
State Indicates the current state of the cluster, that is, whether or not the cluster is active.

The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Active 100
Updating 90
Creating 80
Maintenance 70
Healing 60
Rebooting broker 50
Deleting 40
Failed 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether or not the cluster is active.
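The mapping between the State values and their numeric equivalents can be sketched as a simple lookup. This is only an illustration: the names STATE_TO_NUMERIC, NUMERIC_TO_STATE and state_to_numeric are hypothetical and are not part of the eG product, and the tolerance for upper-case, underscore-separated spellings is an assumption made for convenience.

```python
# Numeric equivalents copied from the State table above.
STATE_TO_NUMERIC = {
    "Active": 100,
    "Updating": 90,
    "Creating": 80,
    "Maintenance": 70,
    "Healing": 60,
    "Rebooting broker": 50,
    "Deleting": 40,
    "Failed": 0,
}

# Reverse lookup, for turning a numeric value back into a state name.
NUMERIC_TO_STATE = {number: name for name, number in STATE_TO_NUMERIC.items()}


def state_to_numeric(state: str) -> int:
    """Return the numeric equivalent of a State measure value."""
    # Assumption: accept spellings such as "REBOOTING_BROKER" as well as
    # the table spelling "Rebooting broker".
    key = state.strip().replace("_", " ").capitalize()
    return STATE_TO_NUMERIC[key]
```

For instance, state_to_numeric("Healing") returns 60, and NUMERIC_TO_STATE[0] returns "Failed".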

numberOfBrokerNodes Indicates the number of broker nodes. Number

 

enhancedMonitoring Indicates whether or not the target broker is ready for monitoring, and at which monitoring level.

The values reported by this measure and their numeric equivalents are listed in the table below:

Measure Value Numeric Value
Default 100
Per broker 75
Per topic per broker 50
Per topic per partition 25
No monitoring 0

Note:

By default, this measure reports the Measure Values listed in the table above to indicate whether or not the target broker is ready for monitoring.
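The enhancedMonitoring values can likewise be modelled as an enumeration whose members carry the numeric equivalents from the table above. This is a minimal sketch: the class name EnhancedMonitoring, the member names and the helper label are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class EnhancedMonitoring(IntEnum):
    """Numeric equivalents copied from the enhancedMonitoring table above."""
    DEFAULT = 100
    PER_BROKER = 75
    PER_TOPIC_PER_BROKER = 50
    PER_TOPIC_PER_PARTITION = 25
    NO_MONITORING = 0


def label(level: EnhancedMonitoring) -> str:
    """Render a member the way the Measure Value column spells it."""
    return level.name.replace("_", " ").capitalize()
```

Looking a member up by number, as in EnhancedMonitoring(75), gives PER_BROKER, and label() turns it back into the "Per broker" spelling used in the table.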

activeControllerCount Indicates the number of active controllers in the target cluster. Number

Only one controller per cluster should be active at any given time.
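The rule above can be expressed as a small check on the reported value. This is a sketch only: the function name check_active_controllers and the wording of the returned messages are hypothetical and do not reflect how eG itself evaluates the measure.

```python
def check_active_controllers(active_controller_count: int) -> str:
    """Classify an activeControllerCount reading for one cluster."""
    if active_controller_count < 0:
        raise ValueError("controller count cannot be negative")
    if active_controller_count == 1:
        # Exactly one active controller is the expected condition.
        return "ok"
    if active_controller_count == 0:
        return "no active controller"
    # More than one broker believes it is the controller.
    return "multiple active controllers"
```

Any result other than "ok" means the cluster is not in the expected one-controller condition and is worth investigating.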

globalpartitionCount Indicates the number of partitions across all topics in the cluster. Number

Global Partition Count is updated when the controller event thread receives a topic change, topic deletion, or partition reassignment request, and it is purged on controller failover.

Because Global Partition Count does not include replicas, the sum of the Partition Count values can be higher than Global Partition Count if the replication factor for a topic is greater than 1.
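A short worked example shows why the two figures differ. It assumes that Partition Count is reported per broker and includes replicas, so that summing it across brokers counts every replica once. The topic names, partition counts and replication factors below are invented for illustration.

```python
# Hypothetical topics: name -> (number of partitions, replication factor).
topics = {
    "orders": (6, 3),
    "audit": (4, 1),
}


def global_partition_count(topics: dict) -> int:
    """Partitions across all topics, with replicas excluded."""
    return sum(partitions for partitions, _ in topics.values())


def summed_partition_count(topics: dict) -> int:
    """Partitions across all topics, with every replica counted."""
    return sum(partitions * factor for partitions, factor in topics.values())


print(global_partition_count(topics))   # 6 + 4 = 10
print(summed_partition_count(topics))   # 6 * 3 + 4 * 1 = 22
```

Only the "orders" topic, whose replication factor is greater than 1, contributes to the gap between the two totals.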

offlinePartitionsCount Indicates the number of partitions that are offline in the cluster. Number

An alert is raised when this value is greater than 0.

globalTopicCount Indicates the number of topics across all brokers in the cluster. Number

 

kafkaDataLogsDiskUsed Indicates the percentage of disk space used for data logs. Percent