Node Pressure Eviction in Kubernetes: When Your Node Says "Enough is Enough"
What Is Node-Pressure Eviction?
Node-pressure eviction is the process where the kubelet (the Kubernetes agent on each node) proactively terminates Pods to reclaim resources on that node. The goal is to prevent the node from becoming unstable or completely unresponsive when key resources are running low.
Key points:
- It is automatic and driven by the kubelet on the node.
- It is different from API-initiated eviction (like kubectl drain) and does not honor PodDisruptionBudgets.
- It focuses on protecting node health first, individual Pods second.
When eviction happens, the kubelet marks the selected Pods as Failed and terminates them so that the scheduler can place new replicas on healthier nodes.
What Triggers Node-Pressure Eviction?
The kubelet continuously monitors a set of resource signals for each node. When those signals cross certain thresholds, the node is considered under pressure, and eviction can begin.
Common pressure types:
Memory pressure
- Signal: memory.available
- Node condition: MemoryPressure
- Example: Available memory on the node falls below a configured limit, like memory.available<500Mi.
Disk pressure
- Signals: nodefs.available, nodefs.inodesFree, imagefs.available, imagefs.inodesFree
- Node condition: DiskPressure
- Example: Root filesystem free space drops below a threshold (e.g., nodefs.available<10%).
PID pressure
- Signal: pid.available
- Node condition: PIDPressure
- Example: The node is running so many processes that there are too few PIDs left for new workloads.
These thresholds are configured as eviction thresholds. When a signal crosses its threshold, the kubelet knows it has to reclaim resources by evicting Pods.
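All three signal families can be given hard thresholds in a single kubelet configuration. A sketch combining memory, disk, and PID signals (the values are illustrative, not recommendations):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"    # memory pressure
  nodefs.available: "10%"      # disk pressure (node filesystem)
  nodefs.inodesFree: "5%"      # disk pressure (inodes)
  imagefs.available: "15%"     # disk pressure (image filesystem)
  pid.available: "10%"         # PID pressure
```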
Soft vs Hard Eviction Thresholds
Kubernetes supports two kinds of eviction thresholds: soft and hard. Understanding them helps you tune how aggressive evictions should be.
Soft eviction thresholds
- Example: memory.available<500Mi with a grace period like 1m30s.
- The kubelet waits for a configured grace period before evicting Pods.
- Evicted Pods receive a non-zero termination grace period, capped by the kubelet's evictionMaxPodGracePeriod setting.
Hard eviction thresholds
- Example: memory.available<100Mi or nodefs.available<10%.
- Trigger immediate eviction once crossed, with a 0s grace period by default.
- Designed as a last-resort safety net to save the node from crashing.
You configure these thresholds in the kubelet's configuration. Here is a typical kubelet config snippet:
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
evictionSoft:
  memory.available: "500Mi"
evictionSoftGracePeriod:
  memory.available: "1m30s"
```
How Kubernetes Chooses Which Pods to Evict
Once a node is under pressure, the kubelet must pick which Pods to remove. The selection is not random: it relies on Pod priority, QoS class, and a few other factors.
1. QoS Classes
Pods are assigned a QoS (Quality of Service) class based on their resource requests and limits:
- Guaranteed - All containers have memory and CPU requests equal to their limits. Evicted last under memory pressure.
- Burstable - At least one container has requests set, but not all are equal to limits. Evicted after BestEffort, before Guaranteed.
- BestEffort - No resource requests or limits specified. Evicted first when memory runs low.
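For example, a Pod whose containers set requests equal to limits lands in the Guaranteed class (the names here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-demo        # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:                # equal to requests => Guaranteed QoS
          memory: "256Mi"
          cpu: "250m"
```

Setting the limits higher than the requests would make this Pod Burstable; omitting the resources block entirely would make it BestEffort.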
2. Priority
Kubernetes Pod Priority and Preemption also influence eviction order. Higher-priority Pods are protected; lower-priority Pods are more likely to be evicted first.
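A PriorityClass and a Pod that references it might look like this (names and values are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical      # illustrative name
value: 100000                  # higher value = protected longer
globalDefault: false
description: "For revenue-critical services."
---
apiVersion: v1
kind: Pod
metadata:
  name: payments-api           # illustrative name
spec:
  priorityClassName: business-critical
  containers:
    - name: app
      image: nginx:1.25
```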
3. Age and Usage
The kubelet also considers how far each Pod's actual usage exceeds its requests: Pods consuming the most beyond what they requested are ranked for eviction first. System-critical Pods (those using the system-node-critical or system-cluster-critical priority classes) are normally protected from eviction.
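The ranking under memory pressure boils down to three sort keys: whether usage exceeds requests, Pod priority, and how far usage exceeds the request. A rough sketch of that ordering in Python (a simplified model for intuition, not the kubelet's actual code; the Pod structure and names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    priority: int        # from the Pod's PriorityClass (0 if none)
    memory_usage: int    # current working set, in MiB
    memory_request: int  # sum of container memory requests, in MiB

def eviction_rank(pod: Pod):
    """Sort key approximating kubelet ordering under memory pressure.

    Pods whose usage exceeds their requests go first; among those,
    lower priority first, then by how far usage exceeds the request.
    """
    exceeds = pod.memory_usage > pod.memory_request
    overage = pod.memory_usage - pod.memory_request
    # Sorted ascending: "not exceeds" puts over-request Pods first,
    # then lower priority, then larger overage first.
    return (not exceeds, pod.priority, -overage)

pods = [
    Pod("besteffort-batch", priority=0, memory_usage=300, memory_request=0),
    Pod("burstable-api", priority=1000, memory_usage=450, memory_request=400),
    Pod("guaranteed-db", priority=1000, memory_usage=380, memory_request=400),
]
victims = sorted(pods, key=eviction_rank)
print([p.name for p in victims])
# → ['besteffort-batch', 'burstable-api', 'guaranteed-db']
```

Note how this matches the QoS intuition above: the BestEffort Pod (no requests, so any usage exceeds them) ranks first, and the Pod staying within its requests ranks last.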
Node Conditions and Oscillation
When a threshold is crossed, the kubelet sets the corresponding node condition (like MemoryPressure: True). The control plane and scheduler can then see that the node is unhealthy and may avoid placing new Pods there.
However, nodes can oscillate around soft thresholds: they repeatedly go above and below the limit in a short time. That can cause node conditions to flap and lead to poor eviction decisions. To prevent this, the kubelet provides the evictionPressureTransitionPeriod setting, which forces the kubelet to wait a minimum time (default 5 minutes) before it changes a node condition again.
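In the kubelet configuration file, that setting looks like this (5m shown is the default):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionPressureTransitionPeriod: "5m"
```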
Example: Memory Pressure in Action
Consider a node with 10 GiB of memory. You configure:
- evictionSoft: memory.available<500Mi with a 90s grace period.
- evictionHard: memory.available<100Mi.
If your workloads spike and free memory drops to 400 MiB:
- The soft threshold is crossed, so the kubelet marks the node under memory pressure and starts the soft eviction countdown.
- If memory stays below 500 MiB for 90 seconds, the kubelet begins evicting low-priority, BestEffort Pods first.
- If memory plunges below 100 MiB, hard eviction kicks in, and Pods can be terminated almost immediately to save the node.
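The walkthrough above can be captured in a few lines of Python (a toy state model of the configured thresholds, not kubelet code):

```python
SOFT_MIB = 500   # evictionSoft: memory.available < 500Mi
HARD_MIB = 100   # evictionHard: memory.available < 100Mi
GRACE_S = 90     # evictionSoftGracePeriod for memory.available

def pressure_state(available_mib: int, seconds_below_soft: int) -> str:
    """Classify the node's state for the thresholds configured above."""
    if available_mib < HARD_MIB:
        return "hard-evict"          # immediate eviction, 0s grace
    if available_mib < SOFT_MIB:
        if seconds_below_soft >= GRACE_S:
            return "soft-evict"      # grace period expired, start evicting
        return "pressure-countdown"  # soft threshold crossed, waiting
    return "ok"

print(pressure_state(400, 10))   # → pressure-countdown
print(pressure_state(400, 120))  # → soft-evict
print(pressure_state(80, 0))     # → hard-evict
```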
How to Reduce Unexpected Evictions
To survive node-pressure evictions in production, you need both good configuration and good application design. Here are practical tips:
- Set realistic resource requests and limits - Avoid running everything as BestEffort; give critical workloads proper requests and limits so they get at least Burstable, preferably Guaranteed QoS.
- Use Pod Priority wisely - Assign higher priority to business-critical services, lower to batch or test workloads.
- Tune kubelet eviction thresholds - Align soft and hard thresholds with your node size and typical usage patterns, not just defaults.
- Monitor node conditions and evictions - Watch the MemoryPressure, DiskPressure, and PIDPressure conditions, and track Pods with reason: Evicted in your monitoring stack.
- Clean up disk usage - Use log rotation and image garbage collection to avoid disk pressure evictions.
Quick Reference Table
| Aspect | What It Means | Example / Notes |
|---|---|---|
| Node-pressure eviction | Kubelet evicts Pods to protect node when resources are low. | Automatically triggered on resource pressure. |
| Key signals | memory.available, nodefs.available, imagefs.available, pid.available | Mapped to Memory/Disk/PID pressure conditions. |
| Soft threshold | Eviction after grace period if pressure continues. | Example: memory.available<500Mi for 1m30s. |
| Hard threshold | Immediate eviction with 0s grace by default. | Example: memory.available<100Mi. |
| QoS eviction order | BestEffort -> Burstable -> Guaranteed. | BestEffort gets evicted first under memory pressure. |
| Priority impact | Lower priority Pods evicted before higher priority. | Protects critical workloads. |
| Node condition flapping | Rapid toggling of pressure conditions. | Controlled with eviction-pressure-transition-period. |
| Typical eviction reason | Pod status shows Evicted with an explanation in events. | Check kubectl describe pod. |
Wrapping Up
Node-pressure eviction is Kubernetes' way of saying enough is enough. By understanding the triggers, thresholds, and eviction priorities, you can design your workloads and cluster configurations to handle pressure gracefully. Set your QoS classes wisely, monitor your node conditions, and tune those eviction thresholds to match your environment. Your Pods (and your on-call pager) will thank you.
Have you ever had a Pod mysteriously evicted in production? Drop a comment below and share your war stories!