Problem Overview
The k3s service/Pods on Nextgen Gateway virtual machines (VMs) may restart repeatedly due to time synchronization issues. This is especially prevalent in multi-node gateway environments, where even a slight discrepancy in system time between nodes can cause instability and cluster failures.
How to identify?
Step 1: Check k3s Logs on the Host
Run the following command on the host machine (VM or node) where the gateway is deployed:
```
journalctl -u k3s
```
This displays logs from the k3s systemd service, including warnings and informational messages about cluster behavior.
Step 2: Look for “clock drift” Warnings
You are specifically looking for log lines that mention:
```
prober found high clock drift
```
This message indicates that a node's clock differs significantly from its peers', which can lead to raft communication issues and cluster instability.
Sample Log Output
Here is a sample snippet indicating clock drift issues:
```
May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.885588Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":33171221}
May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.925854Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":33171221,"took":"39.719829ms","hash":953028177,"current-db-size-bytes":31735808,"current-db-size":"32 MB","current-db-size-in-use-bytes":6545408,"current-db-size-in-use":"6.5 MB"}
May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.925886Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":953028177,"revision":33171221,"compact-revision":33169927}
May 29 14:32:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:32:32.812427Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311189639s","rtt":"4.147522ms"}
May 29 14:32:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:32:32.812443Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.312742337s","rtt":"1.142367ms"}
May 29 14:33:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:02.813282Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311002896s","rtt":"5.108188ms"}
May 29 14:33:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:02.813269Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313569613s","rtt":"1.125167ms"}
May 29 14:33:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:32.813373Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313391202s","rtt":"1.104257ms"}
May 29 14:33:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:32.813385Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311738912s","rtt":"4.172173ms"}
May 29 14:34:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:02.813682Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311778609s","rtt":"4.922604ms"}
May 29 14:34:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:02.813696Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313586078s","rtt":"1.047753ms"}
May 29 14:34:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:32.814117Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.312262985s","rtt":"4.236741ms"}
```
You may see several similar warnings for both round trippers:
- "ROUND_TRIPPER_RAFT_MESSAGE"
- "ROUND_TRIPPER_SNAPSHOT"
Each of these messages indicates a node’s clock is drifting by nearly 60 seconds, which is unacceptable for a distributed cluster.
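To quantify the drift programmatically, the clock-drift field can be pulled out of a warning line and compared against a threshold. A minimal sketch, run against one of the sample lines above (the one-second threshold is an illustrative choice, not an etcd constant):

```shell
# Extract the clock-drift value from a captured k3s warning line and
# flag it if it exceeds a one-second threshold.
line='{"level":"warn","msg":"prober found high clock drift","clock-drift":"59.311189639s","rtt":"4.147522ms"}'
drift=$(printf '%s' "$line" | grep -o '"clock-drift":"[^"]*"' | cut -d'"' -f4)
seconds=${drift%s}                      # strip the trailing "s" unit
echo "drift=${seconds}s"
# Compare as a float using awk (POSIX shell arithmetic is integer-only)
awk -v d="$seconds" 'BEGIN { exit !(d > 1) }' && echo "HIGH DRIFT"
```

The same pipeline can be pointed at live output, e.g. `journalctl -u k3s | grep "clock drift"`.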
Root Cause
- Kubernetes (k3s) relies on synchronized time across nodes for proper communication and coordination.
- If the system clocks are not in sync, nodes misinterpret each other's state, leading to repeated restarts and potential deployment issues.
Step-by-Step Solution
Prerequisites
- Public NTP server: Ensure UDP port 123 is allowed on the network. See prerequisites
- Custom NTP server: Ensure the NTP server is reachable from the Gateway, then proceed directly to Step 2.
1. Verify Time Synchronization Status
Log in to each Gateway VM and run the following command:
```
timedatectl
```
Sample output:
```
Local time: Mon 2025-07-14 17:37:39 UTC
Universal time: Mon 2025-07-14 17:37:39 UTC
RTC time: Mon 2025-07-14 17:37:39
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
```
Key Fields to Check:
- System clock synchronized: yes → time sync is working
- NTP service: active → the NTP service is running
If System clock synchronized is no, proceed to configure or troubleshoot the time synchronization service.
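For scripted checks, `timedatectl show` prints the same information as key=value pairs; the NTPSynchronized property mirrors the "System clock synchronized" field. A sketch that parses a captured sample so it stays self-contained (on a live VM you would use the commented command instead):

```shell
# Parse timedatectl's machine-readable output for the sync flag.
# output=$(timedatectl show)            # on a live VM
output='NTP=yes
NTPSynchronized=yes'                    # captured sample for illustration
sync=$(printf '%s\n' "$output" | awk -F= '$1 == "NTPSynchronized" { print $2 }')
if [ "$sync" = "yes" ]; then
  echo "clock is synchronized"
else
  echo "clock is NOT synchronized - configure NTP"
fi
```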
2. Configure or Update NTP Servers
If the system clock is not synchronized, or if custom/internal NTP servers are required, follow the steps below:
a. Enable NTP Sync (if disabled)
If the following is shown:
```
System clock synchronized: no
```
Enable NTP sync using:
```
sudo timedatectl set-ntp true
```
Re-run timedatectl to confirm the sync is active:
```
timedatectl
```
b. Configure Custom NTP Servers
By default, Ubuntu uses its own NTP servers. However, if you have custom NTP servers, you can configure them by editing the systemd-timesyncd configuration.
Edit Configuration File:
```
sudo vi /etc/systemd/timesyncd.conf
```
Sample Configuration:
```
[Time]
NTP=216.239.35.4 45.32.199.189
FallbackNTP=ntp.ubuntu.com
```
You can replace the IPs with your organization's NTP servers as needed.
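If you need a large offset corrected sooner, systemd-timesyncd also accepts polling-interval options in the same file. A commented sketch (server IPs are the sample values from above; the interval values are illustrative, not recommendations):

```ini
[Time]
# Primary servers, space-separated; replace with your organization's servers
NTP=216.239.35.4 45.32.199.189
# Used only when no NTP= server is reachable
FallbackNTP=ntp.ubuntu.com
# Optional: poll more often so corrections apply sooner (defaults: 32s / 2048s)
PollIntervalMinSec=32
PollIntervalMaxSec=512
```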
3. Restart Time Sync Service
After making changes to the configuration:
```
sudo systemctl restart systemd-timesyncd
```
Verify the sync status again:
```
timedatectl
```
Ensure System clock synchronized is yes.
4. Manual Time Update (Only If NTP Fails)
If the system cannot access NTP servers (e.g., due to network issues or firewall blocks), manually set the time.
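Before setting the clock by hand, you can sanity-check that the timestamp string parses cleanly. A sketch using GNU date (standard on Ubuntu), with a sample value:

```shell
# Round-trip the timestamp through GNU date: if parsing succeeds and the
# string comes back unchanged, the "YYYY-MM-DD HH:MM:SS" format is valid.
ts="2025-07-14 17:37:39"
epoch=$(date -u -d "$ts" +%s)                      # parse as UTC
roundtrip=$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')
echo "$roundtrip"                                  # → 2025-07-14 17:37:39
```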
Check Current Time:
```
date
```
Set Correct Time (Format: YYYY-MM-DD HH:MM:SS):
```
sudo date -s "2025-07-14 17:37:39"
```
Note
This is a temporary workaround. NTP sync should be re-enabled once connectivity is restored.
5. Restart k3s Service
After time synchronization is confirmed or corrected, restart the k3s service:
```
sudo service k3s restart
```
This ensures the Kubernetes components align with the corrected system time.
Best Practices
- Always enable automatic time synchronization on all nodes.
- Use the same NTP source for all VMs in a multi-node setup.
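To catch drift between nodes before it destabilizes raft, you can compare epoch seconds across VMs. A sketch with the remote read stubbed out so it runs standalone (the ssh command and peer hostname in the comment are placeholders you would substitute on a real cluster):

```shell
# Compare this node's clock with a peer's, in whole seconds.
local_epoch=$(date +%s)
remote_epoch=$((local_epoch - 59))       # stub; really: ssh <peer-vm> date +%s
drift=$((local_epoch - remote_epoch))
[ "$drift" -lt 0 ] && drift=$((-drift))  # absolute value
echo "drift=${drift}s"
if [ "$drift" -gt 1 ]; then
  echo "nodes out of sync - fix NTP before restarting k3s"
fi
```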
Info
If you see persistent clock drift in logs despite NTP being active:
- You may be running in an environment where NTP traffic (UDP port 123) is blocked by a firewall
- A VM snapshot or suspend/resume may have desynchronized the guest OS clock
- Manually setting time should be a temporary fix only; always aim for automated NTP sync across nodes