Kubernetes Metrics

List of the metrics used to monitor Kubernetes resources.

Introduction

The master agent deployment collects the k8s-apiserver, k8s-controller, k8s-scheduler, k8s-kube-state, k8s-metrics-server, and k8s-coreDNS / kubeDNS metrics required to monitor Kubernetes.

Metrics for Docker

| Metrics | Display Name | Description | Units |
|---|---|---|---|
| docker.containers.running_total | Docker Container Running Total | The total number of containers running on the host machine | - |
| docker.containers.stopped_total | Total Containers Stopped | The total number of containers stopped (not running) on the host machine | - |
| docker.container.states | Docker Container States | The state of the container | - |
| docker.containers.running | Containers Running by Image | The number of containers running on the host, plotted with image as instance | - |
| docker.containers.stopped | Containers Stopped by Image | The number of containers stopped on the host, plotted with image as instance | - |
| docker.image.size | Image Size | The amount of data (on disk) that is used for the writable layer of each container | MB |
| docker.image.virtual_size | Image Virtual Size | The total amount of disk space used for the read-only image data used (shared) by each container and the writable layer of each container | MB |
| docker.images.available | Images Available | The number of top-level images | - |
| docker.images.intermediate | Images Intermediate | The number of intermediate images, which are intermediate layers that make up other images | - |
| docker.container.size_rootfs | Root Filesystem Size | The total size of all the files in the container | MB |
| docker.container.size_rw | Total Files Size | The total size of the files (plotted as MB) that have been changed or newly created compared with the container's base image. Just after container creation the size should be zero; it increases as files are modified or created. | MB |
| docker.cpu.usage | CPU Usage | The percentage of CPU time obtained by the container with regard to all CPUs | % |
| docker.cpu.usage.overlimit | CPU Usage Over Limit | The percentage of CPU time obtained by the container over its CPU limit. If the limit is not set, this metric is not monitored and no graph is plotted. | % |
| docker.cpu.usage.percpu | CPU Usage per CPU | The percentage of CPU time obtained by the container with regard to each CPU | % |
| docker.cpu.shares | Shares of CPU | Shares of CPU usage allocated to the container | - |
| docker.cpu.system | CPU System | The percentage of time the CPU is executing system calls on behalf of processes of this container, unnormalized | % |
| docker.cpu.throttled | CPU Throttled | Number of times the cgroup has been throttled | - |
| docker.cpu.user | CPU User | The percentage of time the CPU is under direct control of processes of this container, unnormalized | % |
| docker.mem.usage | Memory Usage | The percentage of used memory out of total node memory | % |
| docker.mem.usage.overlimit | Memory Usage Over Limit | The percentage of used memory out of the memory limit. If the limit is not set, this metric is not monitored and no metric value or graph is plotted. | % |
| docker.mem.in_use | Memory In Use | The fraction of used memory to the available memory limit, if the limit is set. Otherwise, it is measured against the node memory. | - |
| docker.mem.limit | Memory Limit | The memory limit for the container, if set | MB |
| docker.io.read_bytes | IO Read Bytes | Bytes read per second from disk by the processes of the container | Bytes/Sec |
| docker.io.write_bytes | IO Write Bytes | Bytes written per second to disk by the processes of the container | Bytes/Sec |
| docker.mem.active_anon | Active RSS Memory | The amount of active RSS memory. Active memory is not swapped to disk. | MB |
| docker.mem.active_file | Active Cache Memory | The amount of active cache memory. Active memory is reclaimed by the system only after inactive memory has been reclaimed. | MB |
| docker.mem.cache | Cache Size | The amount of memory that is being used to cache data from disk (for example, memory content that can be associated precisely with a block on a block device) | MB |
| docker.mem.inactive_anon | Inactive RSS Memory | The amount of inactive RSS memory. Inactive memory is swapped to disk when necessary. | MB |
| docker.mem.inactive_file | Inactive Cache Memory | The amount of inactive cache memory. Inactive memory may be reclaimed first when the system needs memory. | MB |
| docker.mem.mapped_file | Memory Mapped by Process | The amount of memory mapped by the processes in the control group | MB |
| docker.mem.pgfault | Memory Page Faults | The rate at which processes in the container trigger page faults by accessing a non-existent or protected part of their virtual address space. Usually a page fault of this type results in a segmentation fault. | /sec |
| docker.mem.pgmajfault | Memory Page Faults Virtual | The rate at which processes in the container trigger page faults by accessing a part of their virtual address space that was swapped out or corresponded to a mapped file. Usually a page fault of this type results in fetching the data from disk instead of memory. | /sec |
| docker.mem.pgpgin | Pages Charged Rate | The rate at which pages are charged (added to the accounting) of a cgroup | /sec |
| docker.mem.pgpgout | Pages Uncharged Rate | The rate at which pages are uncharged (removed from the accounting) of a cgroup | /sec |
| docker.mem.rss | RSS Memory | The amount of non-cache memory that belongs to the container's processes, for example memory used for stacks and heaps | MB |
| docker.mem.soft_limit | Memory Reservation Limit | The memory reservation limit for the container, when set | MB |
| docker.mem.sw_in_use | Swap Memory In Use | The fraction of used swap + memory to available swap + memory, if the limit is set | - |
| docker.mem.sw_limit | Swap Memory Limit | The swap + memory limit for the container, when set | MB |
| docker.container.interface.traffic.in | Network Rx Bytes per Sec | Network Rx bytes per second | Bytes/Sec |
| docker.container.interface.traffic.out | Network Tx Bytes per Sec | Network Tx bytes per second | Bytes/Sec |
| docker.container.interface.packets.in | Network Rx Packets per Sec | Network Rx packets per second | /sec |
| docker.container.interface.packets.out | Network Tx Packets per Sec | Network Tx packets per second | /sec |
| docker.container.interface.errors.in | Network Rx Errors per Sec | Network Rx errors per second | /sec |
| docker.container.interface.errors.out | Network Tx Errors per Sec | Network Tx errors per second | /sec |
| docker.container.interface.discards.in | Network Rx Drops per Sec | Network Rx drops per second | /sec |
| docker.container.interface.discards.out | Network Tx Drops per Sec | Network Tx drops per second | /sec |
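
The description of docker.mem.in_use says the fraction is measured against the container memory limit when one is set and against node memory otherwise. A minimal sketch of that fallback is shown here. The function name and parameters are hypothetical and only illustrate the arithmetic; they are not the agent's actual implementation.

```python
from typing import Optional


def mem_in_use(used_mb: float, limit_mb: Optional[float], node_total_mb: float) -> float:
    """Return the fraction of memory in use (hypothetical helper).

    The denominator is the container memory limit when one is set,
    and the total node memory otherwise.
    """
    denominator = limit_mb if limit_mb is not None else node_total_mb
    if denominator <= 0:
        raise ValueError("memory denominator must be positive")
    return used_mb / denominator


# Container with a 1024 MB limit on an 8192 MB node.
print(mem_in_use(256.0, 1024.0, 8192.0))  # 0.25
# Same container without a limit: measured against node memory.
print(mem_in_use(256.0, None, 8192.0))    # 0.03125
```

The same usage therefore reads very differently depending on whether a limit is configured, which is worth keeping in mind when comparing containers.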

Metrics for ContainerD

| Metrics | Display Name | Description | Units |
|---|---|---|---|
| containerd_hugetlb_failcnt | ContainerD HugeTLB fail Rate | Rate of allocation failure due to HugeTLB limit | - |
| containerd_hugetlb_max | ContainerD HugeTLB max usage | Max hugepagesize hugetlb usage recorded | Bytes |
| containerd_hugetlb_usage | ContainerD HugeTLB usage | Current usage for hugepagesize hugetlb | Bytes |
| containerd_memory_usage | ContainerD Memory Usage | Memory usage in bytes | Bytes |
| containerd_memory_usage_failcnt | ContainerD Memory Usage fail Rate | Rate of the number of times the cgroup limit was exceeded | - |
| containerd_memory_usage_limit | ContainerD Memory Usage Limit | Limit of memory usage | Bytes |
| containerd_memory_usage_max | ContainerD Memory Usage Max | Maximum memory usage recorded | Bytes |
| containerd_memory_cache | ContainerD Memory Cache | Bytes of page cache memory | Bytes |
| containerd_memory_rss | ContainerD Memory RSS | Bytes of anonymous and swap cache memory (includes transparent huge pages) | Bytes |
| containerd_memory_rss_huge | ContainerD Memory RSS Huge | Bytes of anonymous transparent huge pages | Bytes |
| containerd_memory_dirty | ContainerD Memory Dirty | Bytes that are waiting to get written back to the disk | Bytes |
| containerd_memory_swap_usage | ContainerD Swap Usage | Swap usage in bytes | Bytes |
| containerd_memory_swap_failcnt | ContainerD Swap Usage fail Rate | Rate of the number of times the cgroup swap limit was exceeded | - |
| containerd_memory_swap_limit | ContainerD Swap Usage Limit | Limit of swap usage | Bytes |
| containerd_memory_swap_max | ContainerD Swap Usage Max | Maximum swap usage recorded | Bytes |
| containerd_memory_kernel_usage | ContainerD Kernel Usage Name | Current kernel memory allocation | Bytes |
| containerd_memory_kernel_failcnt | ContainerD Kernel fail count | Rate of the number of times kernel memory usage hit its limit | - |
| containerd_memory_kernel_limit | ContainerD Kernel Limit | Hard limit for kernel memory | Bytes |
| containerd_memory_kernel_max | ContainerD Kernel Max | Maximum kernel memory usage recorded | Bytes |
| containerd_memory_kernel_tcp_usage | ContainerD Kernel TCP Usage | Current TCP buffer memory allocation | Bytes |
| containerd_memory_kernel_tcp_failcnt | ContainerD Kernel TCP fail rate | Rate of the number of times TCP buffer memory usage hit its limit | - |
| containerd_memory_kernel_tcp_limit | ContainerD Kernel TCP Limit | Hard limit for TCP buffer memory | Bytes |
| containerd_memory_kernel_tcp_max | ContainerD Kernel TCP Max | Maximum TCP buffer memory usage recorded | Bytes |
| containerd_cpu_throttling_throttledTime | ContainerD CPU Throttled Time | CPU throttled time | % |
| containerd_cpu_usage_system | ContainerD CPU System Usage | System CPU usage of the container with respect to the host system | % |
| containerd_cpu_usage_total | ContainerD CPU Total Usage | Total CPU usage of the container with respect to the host system | % |
| containerd_cpu_usage_user | ContainerD CPU User Usage | User CPU usage of the container with respect to the host system | % |
| containerd_blkio_service_bytes_recursive | ContainerD BlkIO Service Bytes | Number of bytes transferred to/from the disk | Bytes |
| containerd_blkio_serviced_recursive | ContainerD BlkIO Serviced | Number of IOs (bio) issued to the disk by the group | Bytes |
| containerd_blkio_queued_recursive | ContainerD BlkIO Queued | Total number of requests queued up at any given instant for the cgroup | Bytes |
| containerd_blkio_service_time_recursive | ContainerD BlkIO Service Time | Total amount of time between request dispatch and request completion for the IOs | Bytes |
| containerd_blkio_wait_time_recursive | ContainerD BlkIO Wait Time | Total amount of time the IOs for this cgroup spent waiting in the scheduler queues for service | Bytes |
| containerd_blkio_merged_recursive | ContainerD BlkIO Merged | Total number of bios/requests merged into requests belonging to this cgroup | Bytes |
| containerd_blkio_time_recursive | ContainerD BlkIO Time | Disk time allocated to the cgroup per device, in milliseconds | Bytes |
| containerd_blkio_sectors_recursive | ContainerD BlkIO Sectors | Number of sectors transferred to/from disk by the group | Bytes |
| containerd_proc_open_fds | ContainerD number of open fd | Number of open file descriptors | - |
| containerd_container_uptime | ContainerD Container Uptime | Uptime of the current container | Seconds |
| containerd_containers_running | ContainerD Running Containers | Total number of running containers | - |
| containerd_containers_stopped | ContainerD Stopped Containers | Total number of stopped containers | - |
| containerd_image_size | ContainerD Image Size | Image sizes of different container images | Bytes |

Metrics for Kubelet

| Metrics | Display Name | Description | Units |
|---|---|---|---|
| kube_pods_running | Pods Running | The number of running pods | - |
| kube_containers_running | Containers Running | The number of running containers | - |
| kube_containers_restarts | Containers Restarts | The number of times the container has been restarted | - |
| kube_cpu_load_10s_avg | Cpu Load 10S Avg | Container CPU load average over the last 10 seconds | - |
| kube_cpu_system_total | Cpu System Total | System CPU time consumed in seconds | /sec |
| kube_cpu_user_total | Cpu User Total | User CPU time consumed in seconds | /sec |
| kube_cpu_cfs_periods | Cpu Cfs Periods | Number of elapsed enforcement period intervals | /sec |
| kube_cpu_cfs_throttled_periods | Cpu Cfs Throttled Periods | Number of throttled period intervals | /sec |
| kube_cpu_cfs_throttled_seconds | Cpu Cfs Throttled Seconds | Total duration of the container being throttled | /sec |
| kube_node_cpu_capacity | Node Cpu Capacity | CPU capacity of the node (plotted in millicores) | Millicores |
| kube_node_memory_capacity | Node Memory Capacity | Memory capacity of the node (plotted in megabytes) | Megabytes |
| kube_node_cpu_usage_percentage | Node Cpu Usage Percentage | CPU usage percentage of the node | % |
| kube_node_memory_usage_percentage | Node Memory Usage Percentage | Memory usage percentage of the node | % |
| kube_node_cpu_allocatable | Node Cpu Allocatable | Allocatable CPU of the node | Millicores |
| kube_node_memory_allocatable | Node Memory Allocatable | Allocatable memory of the node | Megabytes |
| kube_node_cpu_usage | Node Cpu Usage | CPU usage of the node (plotted in millicores) | Millicores |
| kube_node_memory_usage | Node Memory Usage | Memory usage of the node (plotted in megabytes) | Megabytes |
| kube_cpu_usage_total | Cpu Usage Total | CPU time consumed in seconds | /sec |
| kube_cpu_limits | Cpu Limits | The limit of CPU cores set | Millicores |
| kube_cpu_requests | Cpu Requests | The requested CPU cores | Millicores |
| kube_filesystem_usage | Filesystem Usage | Number of megabytes that are consumed by the container on this filesystem | Megabytes |
| kube_filesystem_usage_pct | Filesystem Usage Pct | Number of megabytes that can be consumed by the container on this filesystem | Fraction |
| kube_io_read_bytes | Io Read Bytes | The number of bytes read from the disk | Bytes / Second |
| kube_io_write_bytes | Io Write Bytes | The number of bytes written to the disk | Bytes / Second |
| kube_memory_limits | Memory Limits | Memory limit for the container | Megabytes |
| kube_memory_sw_limit | Memory Sw Limit | Memory swap limit for the container | Bytes |
| kube_memory_requests | Memory Requests | The requested memory | Megabytes |
| kube_memory_usage | Memory Usage | Current memory usage in bytes, including all memory regardless of when it was accessed | Bytes |
| kube_memory_working_set | Memory Working Set | Current working set in megabytes; this is what the OOM killer watches | Megabytes |
| kube_memory_cache | Memory Cache | Number of bytes of page cache memory | Bytes |
| kube_memory_rss | Memory Rss | Size of RSS in bytes | Bytes |
| kube_memory_swap | Memory Swap | Container swap usage in bytes | Bytes |
| kube_network_rx_bytes | Network Rx Bytes | The number of bytes received per second | Bytes / Second |
| kube_network_rx_dropped | Network Rx Dropped | The number of Rx packets dropped per second | Packets / Second |
| kube_network_rx_errors | Network Rx Errors | The number of Rx errors per second | Errors / Second |
| kube_network_tx_bytes | Network Tx Bytes | The number of bytes transmitted per second | Bytes / Second |
| kube_network_tx_dropped | Network Tx Dropped | The number of Tx packets dropped per second | Packets / Second |
| kube_network_tx_errors | Network Tx Errors | The number of Tx errors per second | Errors / Second |
| kube_apiserver_certificate_expiration | Apiserver Certificate Expiration | Average distribution of the remaining lifetime on the certificate used to authenticate a request since the last poll | Seconds |
| kube_rest_client_requests | Rest Client Requests | The number of HTTP requests | Operations / Second |
| kube_rest_client_latency | Rest Client Latency | Average request latency in seconds, broken down by verb and URL since the last poll | Seconds |
| kube_kubelet_runtime_operations | Kubelet Runtime Operations | The number of runtime operations | Operations / Second |
| kube_kubelet_runtime_errors | Kubelet Runtime Errors | The number of runtime operation errors | Operations / Second |
| kube_kubelet_network_plugin_latency | Kubelet Network Plugin Latency | Average latency in seconds of network plugin operations, broken down by operation type since the last poll | Seconds |
| kube_kubelet_volume_stats_available_bytes | Kubelet Volume Stats Available Bytes | The number of available bytes in the volume | Bytes |
| kube_kubelet_volume_stats_capacity_bytes | Kubelet Volume Stats Capacity Bytes | The capacity in bytes of the volume | Bytes |
| kube_kubelet_volume_stats_used_bytes | Kubelet Volume Stats Used Bytes | The number of used bytes in the volume | Bytes |
| kube_kubelet_volume_stats_inodes | Kubelet Volume Stats Inodes | The maximum number of inodes in the volume | Inode |
| kube_kubelet_volume_stats_inodes_free | Kubelet Volume Stats Inodes Free | The number of free inodes in the volume | Inode |
| kube_kubelet_volume_stats_inodes_used | Kubelet Volume Stats Inodes Used | The number of used inodes in the volume | Inode |
| kube_ephemeral_storage_usage | Ephemeral Storage Usage | Ephemeral storage usage of the pod | Megabytes |
| kube_kubelet_evictions | Kubelet Evictions | The number of pods that have been evicted from the kubelet (ALPHA in Kubernetes v1.16) | - |
| kube_kubelet_cpu_usage | Kubelet Cpu Usage | The number of cores used by the kubelet | Millicores |
| kube_kubelet_memory_rss | Kubelet Memory Rss | Size of kubelet RSS in megabytes | Megabytes |
| kube_runtime_cpu_usage | Runtime Cpu Usage | The number of cores used by the runtime | Millicores |
| kube_runtime_memory_rss | Runtime Memory Rss | Size of runtime RSS in megabytes | Megabytes |
| kube_kubelet_container_log_filesystem_used_bytes | Kubelet Container Log Filesystem Used Bytes | Bytes used by the container's logs on the filesystem (requires Kubernetes 1.14+) | Bytes |
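
Several Kubelet metrics are plotted in millicores, where 1 CPU core equals 1000 millicores, alongside percentage metrics for the node. The sketch below shows how the units relate. It assumes the usage percentage is usage divided by capacity; the source does not state whether capacity or allocatable is the denominator, and the helper names are hypothetical.

```python
def cores_to_millicores(cores: float) -> float:
    """Convert CPU cores to millicores (1 core = 1000 millicores)."""
    return cores * 1000.0


def usage_percentage(usage: float, capacity: float) -> float:
    """Usage as a percentage of capacity (both in the same unit).

    Assumption: capacity is the denominator; the metrics list does not
    say how kube_node_cpu_usage_percentage is derived.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return usage / capacity * 100.0


# A 4-core node using 1500 millicores.
capacity_millicores = cores_to_millicores(4)           # 4000.0
print(usage_percentage(1500.0, capacity_millicores))   # 37.5
```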

Metrics for Kube State

| Metrics | Display Name | Description | Units |
|---|---|---|---|
| kubernetes_state.container.cpu_limit | Container Cpu Limit | The limit on CPU cores to be used by a container | cpu |
| kubernetes_state.container.cpu_requested | Container Cpu Requested | The number of CPU cores requested by a container | cpu |
| kubernetes_state.container.memory_limit | Container Memory Limit | The limit on memory to be used by a container | byte |
| kubernetes_state.container.memory_requested | Container Memory Requested | The number of memory bytes requested by a container | byte |
| kubernetes_state.container.ready | Container Ready | Describes whether the container's readiness check succeeded | - |
| kubernetes_state.container.ready.total | Total Containers Ready | Total containers whose readiness check succeeded | - |
| kubernetes_state.container.restarts | Container Restarts | The number of restarts per container | - |
| kubernetes_state.container.restarts.total | Total Containers Restarts Count | Total container restart count | - |
| kubernetes_state.container.running | Container Running | Describes whether the container is currently in the running state | - |
| kubernetes_state.container.running.total | Total Containers Running | Total containers currently in the running state | - |
| kubernetes_state.container.terminated | Container Terminated | Describes whether the container is currently in the terminated state | - |
| kubernetes_state.container.terminated.total | Total Containers Terminated | Total containers currently in the terminated state | - |
| kubernetes_state.container.waiting | Container Waiting | Describes whether the container is currently in the waiting state | - |
| kubernetes_state.container.waiting.total | Total Containers Waiting | Total containers currently in the waiting state | - |
| kubernetes_state.daemonset.desired | Daemonset Desired | The number of nodes that should be running the daemon pod | - |
| kubernetes_state.daemonset.misscheduled | Daemonset Misscheduled | The number of nodes that are running a daemon pod but are not expected to | - |
| kubernetes_state.daemonset.ready | Daemonset Ready | The number of nodes that should be running the daemon pod and have one or more of the daemon pods running and ready | - |
| kubernetes_state.daemonset.scheduled | Daemonset Scheduled | The number of nodes running at least one daemon pod as expected | - |
| kubernetes_state.deployment.paused | Deployment Paused | Whether the deployment is paused and will not be processed by the deployment controller | - |
| kubernetes_state.deployment.replicas | Deployment Replicas | The number of replicas per deployment | - |
| kubernetes_state.deployment.replicas_available | Deployment Replicas Available | The number of available replicas per deployment | - |
| kubernetes_state.deployment.replicas_desired | Deployment Replicas Desired | The number of desired replicas per deployment | - |
| kubernetes_state.deployment.replicas_unavailable | Deployment Replicas Unavailable | The number of unavailable replicas per deployment | - |
| kubernetes_state.deployment.replicas_updated | Deployment Replicas Updated | The number of updated replicas per deployment | - |
| kubernetes_state.deployment.rollingupdate.max_unavailable | Deployment Rollingupdate Max Unavailable | Maximum number of unavailable replicas during a rolling update of a deployment | - |
| kubernetes_state.node.cpu_allocatable | Node Cpu Allocatable | The CPU resources of a node that are available for scheduling | - |
| kubernetes_state.node.cpu_capacity | Node Cpu Capacity | The total CPU resources of the node | cpu |
| kubernetes_state.node.memory_allocatable | Node Memory Allocatable | The memory resources of a node that are available for scheduling | byte |
| kubernetes_state.node.memory_capacity | Node Memory Capacity | The total memory resources of the node | byte |
| kubernetes_state.node.pods_allocatable | Node Pods Allocatable | The pod resources of a node that are available for scheduling | - |
| kubernetes_state.node.pods_capacity | Node Pods Capacity | The total pod resources of the node | - |
| kubernetes_state.node.status | Node Status | The condition of a cluster node, plotted with node as an instance. This metric gives the status of each node as either 0 or 1. | - |
| kubernetes_state.pod.ready | Pod Ready | Describes whether the pod is ready to serve requests, in association with the condition tag. For example, condition:true keeps the pods that are in a ready state. | - |
| kubernetes_state.pod.scheduled | Pod Scheduled | Describes the status of the scheduling process for the pod | - |
| kubernetes_state.replicaset.fully_labeled_replicas | Replicaset Fully Labeled Replicas | The number of fully labeled replicas per ReplicaSet | - |
| kubernetes_state.replicaset.replicas | Replicaset Replicas | The number of replicas per ReplicaSet | - |
| kubernetes_state.replicaset.replicas_desired | Replicaset Replicas Desired | The number of desired pods for a ReplicaSet | - |
| kubernetes_state.replicaset.replicas_ready | Replicaset Replicas Ready | The number of ready replicas per ReplicaSet | - |
| kubernetes_state.resourcequota.limits.cpu.limit | Resourcequota Limits Cpu Limit | Hard limit on the sum of CPU core limits for a resource quota | cpu |
| kubernetes_state.resourcequota.limits.cpu.used | Resourcequota Limits Cpu Used | Observed sum of limits for CPU cores for a resource quota | cpu |
| kubernetes_state.resourcequota.limits.memory.limit | Resourcequota Limits Memory Limit | Hard limit on the sum of memory bytes limits for a resource quota | byte |
| kubernetes_state.resourcequota.limits.memory.used | Resourcequota Limits Memory Used | Observed sum of limits for memory bytes for a resource quota | byte |
| kubernetes_state.resourcequota.persistentvolumeclaims.limit | Resourcequota Persistentvolumeclaims Limit | Hard limit on the number of PVCs for a resource quota | - |
| kubernetes_state.resourcequota.persistentvolumeclaims.used | Resourcequota Persistentvolumeclaims Used | Observed number of persistent volume claims used for a resource quota | - |
| kubernetes_state.resourcequota.pods.limit | Resourcequota Pods Limit | Hard limit on the number of pods for a resource quota | - |
| kubernetes_state.resourcequota.pods.used | Resourcequota Pods Used | Observed number of pods used for a resource quota | - |
| kubernetes_state.resourcequota.requests.cpu.limit | Resourcequota Requests Cpu Limit | Hard limit on the total of CPU cores requested for a resource quota | cpu |
| kubernetes_state.resourcequota.requests.cpu.used | Resourcequota Requests Cpu Used | Observed sum of CPU cores requested for a resource quota | cpu |
| kubernetes_state.resourcequota.requests.memory.limit | Resourcequota Requests Memory Limit | Hard limit on the total of memory bytes requested for a resource quota | byte |
| kubernetes_state.resourcequota.requests.memory.used | Resourcequota Requests Memory Used | Observed sum of memory bytes requested for a resource quota | byte |
| kubernetes_state.resourcequota.requests.storage.limit | Resourcequota Requests Storage Limit | Hard limit on the total of storage bytes requested for a resource quota | byte |
| kubernetes_state.resourcequota.requests.storage.used | Resourcequota Requests Storage Used | Observed sum of storage bytes requested for a resource quota | byte |
| kubernetes_state.resourcequota.services.limit | Resourcequota Services Limit | Hard limit on the number of services for a resource quota | - |
| kubernetes_state.resourcequota.services.loadbalancers.limit | Resourcequota Services Loadbalancers Limit | Hard limit on the number of load balancers for a resource quota | - |
| kubernetes_state.resourcequota.services.loadbalancers.used | Resourcequota Services Loadbalancers Used | Observed number of load balancers used for a resource quota | - |
| kubernetes_state.resourcequota.services.nodeports.limit | Resourcequota Services Nodeports Limit | Hard limit on the number of node ports for a resource quota | - |
| kubernetes_state.resourcequota.services.nodeports.used | Resourcequota Services Nodeports Used | Observed number of node ports used for a resource quota | - |
| kubernetes_state.resourcequota.services.used | Resourcequota Services Used | Observed number of services used for a resource quota | - |

Metrics for CoreDNS

| Metrics | Display Name | Description |
|---|---|---|
| coredns.panics | Total Panics | Total number of panics |
| coredns.query.count | Query count | Total query count |
| coredns.request_duration.seconds.sum | Request Duration Seconds Sum | Duration to process each query |
| coredns.request_duration.seconds.count | Request Duration Seconds Count | Duration per upstream interaction |
| coredns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response in bytes |

Metrics for KubeDNS

| Metrics | Display Name | Description |
|---|---|---|
| kubedns.cachemiss_count | Cachemiss Count | Number of DNS cache misses (from start of process) |
| kubedns.error_count | Error Count | Number of DNS requests resulting in an error |
| kubedns.request_count | Request Count | Total number of DNS requests made |
| kubedns.request_duration.seconds.count | Request Duration Seconds Count | Number of requests on which the kubedns.request_duration.seconds.sum metric is evaluated |
| kubedns.request_duration.seconds.sum | Request Duration Seconds Sum | Time (in seconds) taken to resolve each request |
| kubedns.response_size.bytes.count | Response Size Bytes Count | Number of responses on which the kubedns.response_size.bytes.sum metric is evaluated |
| kubedns.response_size.bytes.sum | Response Size Bytes Sum | Size of the returned response in bytes |
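
The paired .sum and .count metrics are meant to be read together: dividing the change in the sum by the change in the count gives an average per request over a polling interval. The sketch below assumes both values are cumulative totals sampled at two polls, which the source does not state explicitly; the function name is hypothetical.

```python
from typing import Optional


def average_between(sum_prev: float, count_prev: int,
                    sum_now: float, count_now: int) -> Optional[float]:
    """Average value per observation between two samples.

    Assumes sum and count are cumulative totals. Returns None when no new
    observations arrived or a counter was reset in between.
    """
    delta_count = count_now - count_prev
    if delta_count <= 0:
        return None
    return (sum_now - sum_prev) / delta_count


# kubedns.request_duration.seconds.sum went from 12.0 to 15.0 while
# kubedns.request_duration.seconds.count went from 400 to 500.
print(average_between(12.0, 400, 15.0, 500))  # 0.03 seconds per request
```

The same calculation applies to the other sum and count pairs on this page, such as the response size, workqueue, and scheduler duration metrics.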

Metrics for Kube Controller

| Metrics | Display Name | Description | Units |
|---|---|---|---|
| controller.workqueue.work_duration.sum | Kube Controller Workqueue Work Duration Seconds Sum | Duration in seconds taken to process an item from the workqueue | Seconds |
| controller.workqueue.work_duration.count | Kube Controller Workqueue Work Duration Seconds Count | Total time taken in seconds to process an item from the workqueue | Seconds |
| controller.workqueue.work_unfinished_duration | Kube Controller Workqueue Unfinished Work Seconds | Time in seconds spent on work that is in progress and has not yet been observed by work_duration. Large values indicate stuck threads. | Seconds |
| controller.workqueue.work_longest_duration | Kube Controller Workqueue Longest Running Processor Seconds | Time in seconds for which the longest-running processor for the workqueue has been running | - |
| controller.workqueue.queue_duration.sum | Kube Controller Workqueue Queue Duration Seconds Sum | Duration in seconds for which an item remains in the workqueue before being requested | - |
| controller.workqueue.queue_duration.count | Kube Controller Workqueue Queue Duration Seconds Count | Total duration in seconds for which an item remains in the workqueue before being requested | - |
| controller.workqueue.nodes.count | Kube Controller Registered Nodes | Number of registered nodes per zone | - |
| controller.workqueue.nodes.unhealthy | Kube Controller Node Collector Unhealthy Nodes in Zone | Number of nodes not ready per zone | - |
| controller.workqueue.nodes.evictions | Kube Controller Node Collector Evictions Number | Number of node evictions that have happened since the current instance of NodeController started | - |
| controller.workqueue.depth | Kube Controller Workqueue Depth | Current depth of the workqueue | - |
| controller.workqueue.adds | Kube Controller Workqueue Adds Total | Total number of additions/insertions handled by the workqueue | - |
| controller.workqueue.retries | Kube Controller Workqueue Retries Total | Total number of retries handled by the workqueue | - |
| controller.rate_limiter.use | Kube Controller Node Lifecycle Controller Rate Limiter Use | A metric measuring the saturation of the rate limiter for node_lifecycle_controller | - |
| controller.go.goroutines | Kube Controller Go Goroutines | Number of goroutines that currently exist | - |
| controller.threads | Kube Controller Os Threads | Number of OS threads created | - |
| controller.process.max_fds | Kube Controller Process Max Fds | Maximum number of open file descriptors | - |
| controller.process.open_fds | Kube Controller Process Open Fds | Number of open file descriptors | - |

Metrics for Kube Scheduler

MetricsDisplay NameDescriptionUnits
scheduler.binding.duration.countKube Scheduler Binding Duration Seconds CountTotal Binding duration in secondsSeconds
scheduler.binding.duration.secondsKube Scheduler Binding Duration Seconds SumBinding duration in secondsSeconds
scheduler.binding.latency.countKube Scheduler Binding Latency Microseconds CountTotal Binding latency in microsecondsMicroseconds
scheduler.binding.latency.sumKube Scheduler Binding Latency MicrosecondsBinding latency in microseconds sumMicroseconds
scheduler.cache.lookupsKube Scheduler Equiv Cache Lookups TotalTotal number of equivalent cache lookups, by whether a cache entry was found-
scheduler.client.http.requestsKube Scheduler Rest Client Requests TotalNumber of HTTP requests, partitioned by status code, method, and host-
scheduler.client.http.requests_duration.countKube Scheduler Rest Client Request Latency Seconds CountTotal request latency in seconds. Broken down by verb and URLSeconds
scheduler.client.http.requests_duration.sumKube Scheduler Rest Client Request Latency Seconds SumRequest latency in seconds. Broken down by verb and URLSeconds
scheduler.gc_duration_seconds.countKube Scheduler Go GC Duration Seconds CountA summary of the GC invocation durations-
scheduler.gc_duration_seconds.quantileKube Scheduler Go GC Duration SecondsA summary of the GC invocation durations-
scheduler.gc_duration_seconds.sumKube Scheduler Go GC Duration Seconds SumA summary of the GC invocation durations-
scheduler.go.goroutinesKube Scheduler Go GoroutinesNumber of goroutines that currently exist-
scheduler.process.max_fdsKube Scheduler Process Max FdsMaximum number of open file descriptors-
scheduler.process.open_fdsKube Scheduler Process Open FdsNumber of open file descriptors-
scheduler.pod_preemption.victimsKube Scheduler Pod Preemption VictimsNumber of selected preemption victims-
scheduler.pod_preemption.attemptsKube Scheduler Total Preemption AttemptsTotal preemption attempts in the cluster till now-
scheduler.schedule_attempts.totalKube Scheduler Schedule Attempts TotalNumber of attempts to schedule pods, by the result. unschedulable means a pod could not be scheduled, while error means an internal scheduler problem-
| scheduler.scheduling.algorithm_duration.count | Kube Scheduler Scheduling Algorithm Duration Seconds Count | Total Scheduling algorithm latency in seconds | Seconds |
| scheduler.scheduling.algorithm_duration.sum | Kube Scheduler Scheduling Algorithm Duration Seconds Sum | Scheduling algorithm latency in seconds | Seconds |
| scheduler.scheduling.algorithm_latency.count | Kube Scheduler Scheduling Algorithm Latency Microseconds Count | Total Scheduling algorithm latency in microseconds | Microseconds |
| scheduler.scheduling.algorithm_latency.sum | Kube Scheduler Scheduling Algorithm Latency Microseconds Sum | Scheduling algorithm latency in microseconds | Microseconds |
| scheduler.scheduling.algorithm.predicate_duration.count | Kube Scheduler Scheduling Algorithm Predicate Evaluation Count | Scheduling algorithm predicate evaluation duration | - |
| scheduler.scheduling.algorithm.predicate_duration.sum | Kube Scheduler Scheduling Algorithm Predicate Evaluation Sum | Scheduling algorithm predicate evaluation duration | - |
| scheduler.scheduling.algorithm.preemption_duration.count | Kube Scheduler Scheduling Algorithm Preemption Evaluation Count | Scheduling algorithm preemption evaluation duration | - |
| scheduler.scheduling.algorithm.preemption_duration.sum | Kube Scheduler Scheduling Algorithm Preemption Evaluation Sum | Scheduling algorithm preemption evaluation duration | - |
| scheduler.scheduling.algorithm.priority_duration.count | Kube Scheduler Scheduling Algorithm Priority Evaluation Count | Scheduling algorithm priority evaluation duration | - |
| scheduler.scheduling.algorithm.priority_duration.sum | Kube Scheduler Scheduling Algorithm Priority Evaluation Sum | Scheduling algorithm priority evaluation duration | - |
| scheduler.e2e.scheduling_duration.count | Kube Scheduler E2E Scheduling Duration Seconds Count | Total E2e scheduling latency in seconds (scheduling algorithm + binding) | - |
| scheduler.e2e.scheduling_duration.sum | Kube Scheduler E2E Scheduling Duration Seconds Sum | E2e scheduling latency in seconds (scheduling algorithm + binding) | - |
| scheduler.e2e.scheduling_latency.count | Kube Scheduler E2E Scheduling Latency Microseconds Count | Total E2e scheduling latency in microseconds (scheduling algorithm + binding) | - |
| scheduler.e2e.scheduling_latency.sum | Kube Scheduler E2E Scheduling Latency Microseconds Sum | E2e scheduling latency in microseconds (scheduling algorithm + binding) | - |
| scheduler.scheduling.scheduling_duration.count | Kube Scheduler Scheduling Duration Seconds Count | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.scheduling.scheduling_duration.quantile | Kube Scheduler Scheduling Duration Seconds | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.scheduling.scheduling_duration.sum | Kube Scheduler Scheduling Duration Seconds Sum | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.scheduling.scheduling_latency.count | Kube Scheduler Scheduling Latency Seconds Count | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.scheduling.scheduling_latency.quantile | Kube Scheduler Scheduling Latency Seconds | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.scheduling.scheduling_latency.sum | Kube Scheduler Scheduling Latency Seconds Sum | Scheduling latency in seconds split by sub-parts of the scheduling operation | - |
| scheduler.threads | Kube Scheduler OS Threads | Number of OS threads created | - |
| scheduler.volume_scheduling_duration.sum | Kube Scheduler Volume Scheduling Duration Seconds Sum | Volume scheduling stage latency sum | - |
| scheduler.volume_scheduling_duration.count | Kube Scheduler Volume Scheduling Duration Seconds Count | Volume scheduling stage latency count | - |
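
Many of the scheduler metrics above come in `.sum` / `.count` pairs. As a rough illustration of how such a pair can be combined, the sketch below derives an average latency per observation from two samples. It assumes both values are cumulative (as in the usual Prometheus convention), which this page does not state; `mean_latency` and the sample numbers are hypothetical and not part of any product API.

```python
from typing import Optional


def mean_latency(sum_start: float, count_start: float,
                 sum_end: float, count_end: float) -> Optional[float]:
    """Average latency per observation between two samples of a .sum/.count pair.

    Assumes both series are cumulative totals (an assumption, not confirmed
    by this page).
    """
    observations = count_end - count_start
    if observations <= 0:
        # No new observations, or the counters were reset: mean is undefined.
        return None
    return (sum_end - sum_start) / observations


# Hypothetical samples of scheduler.e2e.scheduling_duration.sum and .count
print(mean_latency(sum_start=10.0, count_start=100, sum_end=16.0, count_end=130))
```

With these made-up samples, 6 seconds of latency spread over 30 new observations gives an average of 0.2 seconds per scheduling attempt.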

Metrics for Server

Metrics for Server
| Metrics | Display Name | Description |
|---|---|---|
| metrics_server.go_gc_duration_seconds_sum | Go GC Duration Seconds Sum | A summary of the GC invocation durations. |
| metrics_server.authenticated_user_requests | Authenticated User Requests | Counter of authenticated requests broken out by username. |
| metrics_server.go_goroutines | Go Goroutines | Number of goroutines that currently exist. |
| metrics_server.manager_tick_duration_sum | Manager Tick Duration Sum | The total time spent collecting and storing metrics in seconds. |
| metrics_server.scraper_duration_count | Scraper Duration Count | Time spent scraping sources in seconds. |
| metrics_server.scraper_duration_sum | Scraper Duration Sum | Time spent scraping sources in seconds. |
| metrics_server.scraper_last_time | Scraper Last Time | Last time metrics-server performed a scrape since unix epoch in seconds. |
| metrics_server.go_gc_duration_seconds_quantile | Go GC Duration Seconds Quantile | A summary of the GC invocation durations in seconds. |
| metrics_server.kubelet_summary_request_duration_sum | Kubelet Summary Request Duration Sum | The Kubelet summary request latencies in seconds. |
| metrics_server.kubelet_summary_scrapes_total | Kubelet Summary Scrapes Total | Total number of attempted Summary API scrapes done by Metrics Server. |
| metrics_server.manager_tick_duration_count | Manager Tick Duration Count | The total time in seconds spent collecting and storing metrics. |
| metrics_server.process_max_fds | Process Max Fds | Maximum number of open file descriptors. |
| metrics_server.process_open_fds | Process Open Fds | Number of open file descriptors. |
| metrics_server.go_gc_duration_seconds_count | Go GC Duration Seconds Count | A summary of the GC invocation durations. |
| metrics_server.kubelet_summary_request_duration_count | Kubelet Summary Request Duration Count | The Kubelet summary request latencies in seconds. |
| metrics_server.process_cpu_seconds_total | Process Cpu Seconds Total | Total user and system CPU time spent in seconds. |
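
The `metrics_server.process_open_fds` and `metrics_server.process_max_fds` metrics listed above are most useful when read together. A minimal sketch of that arithmetic follows, assuming both values are sampled at the same moment; `fd_utilization_percent` and the sample values are hypothetical names and numbers used only for illustration.

```python
def fd_utilization_percent(open_fds: float, max_fds: float) -> float:
    """Share of the file descriptor limit currently in use, as a percentage."""
    if max_fds <= 0:
        raise ValueError("max_fds must be positive")
    return 100.0 * open_fds / max_fds


# Hypothetical samples of process_open_fds and process_max_fds
print(fd_utilization_percent(open_fds=256, max_fds=1024))
```

A value that keeps climbing toward 100 may suggest the process is approaching its descriptor limit; any alert threshold is a local choice rather than something this page defines.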

Metrics for Kube API Server

Metrics for Kube API Server
| Metrics | Display Name | Description |
|---|---|---|
| apiserver.go.threads.total | Kube apiserver Go Threads Total | Number of OS threads created |
| apiserver.authenticated.user.requests | Kube apiserver Authenticated User Requests | Counter of authenticated requests broken out by username. |
| apiserver.http.requests.total.count | Kube apiserver HTTP Requests Total Count | Total number of HTTP requests made. |
| apiserver.authenticated.user.requests.count | Kube apiserver Authenticated User Requests Count | Counter of authenticated requests broken out by username. |
| apiserver.dropped.requests.total | Kube apiserver Dropped Requests Total | Accumulated number of requests dropped with Try-again-later response |
| apiserver.http.requests.total | Kube apiserver HTTP Requests Total | Total number of HTTP requests made. |
| apiserver.audit.event.total | Kube apiserver Audit Event Total | Counter of audit events generated and sent to the audit backend. |
| apiserver.rest.client.requests.total | Kube apiserver Rest Client Requests Total | Number of HTTP requests, partitioned by status code, method, and host. |
| apiserver.request.count | Kube apiserver Request Count | Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code. |
| apiserver.request.count.count | Kube apiserver Request Count Count | Counter of API server requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code. |
| apiserver.dropped.requests.total.count | Kube apiserver Dropped Requests Total Count | Monotonic count of requests dropped with Try-again-later response |
| apiserver.inflight.requests | Kube apiserver Inflight Requests | Maximal number of currently used inflight request limit of this API server per request kind in the last second. |
| apiserver.go.goroutines | Kube apiserver Goroutines | Number of goroutines that currently exist. |
| apiserver.APIServiceRegistrationController.depth | Kube apiserver APIService Registration Controller Depth | Current depth of workqueue: APIServiceRegistrationController |
| apiserver.etcd.object.counts | Kube apiserver ETCD Object Counts | Number of stored objects at the time of last check split by kind. |
| apiserver.rest.client.requests.total.count | Kube apiserver Rest Client Requests Total Count | Number of HTTP requests, partitioned by status code, method, and host. |