The status of a node in Kubernetes is a critical aspect of managing a Kubernetes cluster. In this article, we'll cover the basics of monitoring and maintaining node status to ensure a healthy and stable cluster.
A Node's status contains the following information:
You can use kubectl to view a Node's status and other details:
kubectl describe node <insert-node-name-here>
Each section of the output is described below.
The usage of these fields varies depending on your cloud provider or bare metal configuration.
--hostname-override parameter.The conditions field describes the status of all Running nodes. Examples of conditions include:
| Node Condition | Description |
|---|---|
Ready |
True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 50 seconds) |
DiskPressure |
True if pressure exists on the disk size—that is, if the disk capacity is low; otherwise False |
MemoryPressure |
True if pressure exists on the node memory—that is, if the node memory is low; otherwise False |
PIDPressure |
True if pressure exists on the processes—that is, if there are too many processes on the node; otherwise False |
NetworkUnavailable |
True if the network for the node is not correctly configured, otherwise False |
SchedulingDisabled. SchedulingDisabled is not a Condition in the Kubernetes API; instead,
cordoned nodes are marked Unschedulable in their spec.In the Kubernetes API, a node's condition is represented as part of the .status
of the Node resource. For example, the following JSON structure describes a healthy node:
"conditions": [
{
"type": "Ready",
"status": "True",
"reason": "KubeletReady",
"message": "kubelet is posting ready status",
"lastHeartbeatTime": "2019-06-05T18:38:35Z",
"lastTransitionTime": "2019-06-05T11:41:27Z"
}
]
When problems occur on nodes, the Kubernetes control plane automatically creates
taints that match the conditions
affecting the node. An example of this is when the status of the Ready condition
remains Unknown or False for longer than the kube-controller-manager's NodeMonitorGracePeriod,
which defaults to 50 seconds. This will cause either an node.kubernetes.io/unreachable taint, for an Unknown status,
or a node.kubernetes.io/not-ready taint, for a False status, to be added to the Node.
These taints affect pending pods as the scheduler takes the Node's taints into consideration when
assigning a pod to a Node. Existing pods scheduled to the node may be evicted due to the application
of NoExecute taints. Pods may also have tolerations that let
them schedule to and continue running on a Node even though it has a specific taint.
See Taint Based Evictions and Taint Nodes by Condition for more details.
Describes the resources available on the node: CPU, memory, and the maximum number of pods that can be scheduled onto the node.
The fields in the capacity block indicate the total amount of resources that a Node has. The allocatable block indicates the amount of resources on a Node that is available to be consumed by normal Pods.
You may read more about capacity and allocatable resources while learning how to reserve compute resources on a Node.
Describes general information about the node, such as kernel version, Kubernetes version (kubelet and kube-proxy version), container runtime details, and which operating system the node uses. The kubelet gathers this information from the node and publishes it into the Kubernetes API.
Kubernetes v1.35 [alpha](disabled by default)This field lists specific Kubernetes features that are currently enabled on the
node's kubelet via feature gates.
The features are reported by the kubelet as a list of strings in the
.status.declaredFeatures field of the Node object.
This field is intended for newer features under active development; features that have graduated and no longer require a feature gate are considered baseline and are not declared in this field. This reflects the enablement of Kubernetes features, not the underlying operating system or kernel capabilities of the node.
See Node Declared Features for more details.
Heartbeats, sent by Kubernetes nodes, help your cluster determine the availability of each node, and to take action when failures are detected.
For nodes there are two forms of heartbeats:
.status of a Nodekube-node-lease
namespace.
Each Node has an associated Lease object.Compared to updates to .status of a Node, a Lease is a lightweight resource.
Using Leases for heartbeats reduces the performance impact of these updates
for large clusters.
The kubelet is responsible for creating and updating the .status of Nodes,
and for updating their related Leases.
.status either when there is change in status
or if there has been no update for a configured interval. The default interval
for .status updates to Nodes is 5 minutes, which is much longer than the 40
second default timeout for unreachable nodes..status. If the Lease update fails, the kubelet retries,
using exponential backoff that starts at 200 milliseconds and capped at 7 seconds.