Problem Overview

The k3s service/Pods on Nextgen Gateway virtual machines (VMs) may restart repeatedly due to time synchronization issues. This is especially prevalent in multi-node gateway environments, where even a slight discrepancy in system time between nodes can cause instability and cluster failures.

How to Identify the Issue

Step 1: Check k3s Logs on the Host

Run the following command on the host machine (VM or node) where the gateway is deployed:

journalctl -u k3s

This will display logs from the k3s systemd service, including warnings and info logs related to cluster behavior.

Step 2: Look for “clock drift” Warnings

You are specifically looking for log lines that mention:

"prober found high clock drift"

This message indicates that a node’s clock differs significantly from its peers, which may lead to raft communication issues and cluster instability.
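The relevant lines can be isolated with grep; a minimal sketch, assuming the systemd unit is named k3s as in the command above:

```shell
# Show only the clock-drift warnings from the k3s journal;
# the guard and || true keep this a safe no-op where journalctl is absent
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u k3s --no-pager | grep "prober found high clock drift" || true
fi
```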

Sample Log Output

Here is a sample snippet indicating clock drift issues:

May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.885588Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":33171221}
May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.925854Z","caller":"mvcc/kvstore_compaction.go:68","msg":"finished scheduled compaction","compact-revision":33171221,"took":"39.719829ms","hash":953028177,"current-db-size-bytes":31735808,"current-db-size":"32 MB","current-db-size-in-use-bytes":6545408,"current-db-size-in-use":"6.5 MB"}
May 29 14:32:09 az01loprmnwp01 k3s[3436081]: {"level":"info","ts":"2025-05-29T14:32:09.925886Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":953028177,"revision":33171221,"compact-revision":33169927}
May 29 14:32:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:32:32.812427Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311189639s","rtt":"4.147522ms"}
May 29 14:32:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:32:32.812443Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.312742337s","rtt":"1.142367ms"}
May 29 14:33:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:02.813282Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311002896s","rtt":"5.108188ms"}
May 29 14:33:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:02.813269Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313569613s","rtt":"1.125167ms"}
May 29 14:33:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:32.813373Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313391202s","rtt":"1.104257ms"}
May 29 14:33:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:33:32.813385Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311738912s","rtt":"4.172173ms"}
May 29 14:34:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:02.813682Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.311778609s","rtt":"4.922604ms"}
May 29 14:34:02 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:02.813696Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f923a016411d53bc","clock-drift":"59.313586078s","rtt":"1.047753ms"}
May 29 14:34:32 az01loprmnwp01 k3s[3436081]: {"level":"warn","ts":"2025-05-29T14:34:32.814117Z","caller":"rafthttp/probing_status.go:82","msg":"prober found high clock drift","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f923a016411d53bc","clock-drift":"59.312262985s","rtt":"4.236741ms"}

The warning appears for both raft transport probers:

  • "ROUND_TRIPPER_RAFT_MESSAGE"
  • "ROUND_TRIPPER_SNAPSHOT"

In each message, the clock-drift field shows the node’s clock differing from its peer by nearly 60 seconds, which is unacceptable for a distributed cluster.

Root Cause

  • Kubernetes (k3s) relies on synchronized time across nodes for proper communication and coordination.
  • If the system clocks are not in sync, nodes misinterpret each other’s state (the embedded etcd’s raft prober flags peers with high drift, as in the logs above), leading to repeated restarts and potential deployment issues.

Step-by-Step Solution

Prerequisites

  • Public NTP server: ensure outbound UDP port 123 is allowed on the network (see prerequisites).
  • Custom NTP server: ensure the NTP server is reachable from the Gateway, then proceed directly to Step 2.

1. Verify Time Synchronization Status

Log in to each Gateway VM and run the following command:

timedatectl

Sample output:

Local time: Mon 2025-07-14 17:37:39 UTC
Universal time: Mon 2025-07-14 17:37:39 UTC
RTC time: Mon 2025-07-14 17:37:39
Time zone: Etc/UTC (UTC, +0000)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no

Key Fields to Check:

  • System clock synchronized: yes → Time sync is working
  • NTP service: active → NTP service is running

If System clock synchronized is no, proceed to configure or troubleshoot the time synchronization service.
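The same check can be scripted so it prints only the two key fields; a minimal sketch, guarded so it is skipped on hosts where timedatectl is unavailable:

```shell
# Print only the two fields that matter from timedatectl's output;
# skipped entirely on hosts without timedatectl
if command -v timedatectl >/dev/null 2>&1; then
  timedatectl | grep -E "System clock synchronized|NTP service" || true
fi
```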

2. Configure or Update NTP Servers

By default, Ubuntu uses its own NTP servers. However, if the system clock is not synchronized, or if custom/internal NTP servers are needed, follow the steps below:

a. Enable NTP Sync (if disabled)

If the following is shown:

System clock synchronized: no

Enable NTP sync using:

sudo timedatectl set-ntp true

Re-run timedatectl to confirm the sync is active:

timedatectl

b. Configure Custom NTP Servers

If the system clock is still not synchronized against the default servers, or if your organization runs its own NTP servers, configure them by editing the systemd-timesyncd configuration.

Edit Configuration File:

sudo vi /etc/systemd/timesyncd.conf

Sample Configuration:

[Time]
NTP=216.239.35.4 45.32.199.189
FallbackNTP=ntp.ubuntu.com

You can replace the IPs with your organization’s NTP servers as needed.
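After saving, the configured values can be read back out of the file as a quick sanity check (the path matches the file edited above):

```shell
# Read the configured server lists back out of timesyncd.conf
# (2>/dev/null and || true keep this harmless if the file is absent)
grep -E '^(NTP|FallbackNTP)=' /etc/systemd/timesyncd.conf 2>/dev/null || true
```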

3. Restart Time Sync Service

After making changes to the configuration:

sudo systemctl restart systemd-timesyncd

Verify the sync status again:

timedatectl

Ensure System clock synchronized is yes.

4. Manual Time Update (Only If NTP Fails)

If the system cannot access NTP servers (e.g., due to network issues or firewall blocks), manually set the time.

Check Current Time:

date

Set Correct Time (Format: YYYY-MM-DD HH:MM:SS):

sudo date -s "2025-07-14 17:37:39"
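If looking up the correct time by hand is awkward, one common stopgap is to read a web server’s Date response header over HTTPS. This is a hedged sketch, not a replacement for NTP: example.com is a placeholder for any reliable host, and applying the time still requires root.

```shell
# Fetch the current time from a web server's Date header over HTTPS.
# example.com is a placeholder host; this is a stopgap, not a substitute for NTP.
http_date=$(curl -sI https://example.com 2>/dev/null | grep -i '^date:' | cut -d' ' -f2- | tr -d '\r') || true
echo "HTTP time source reports: ${http_date:-unavailable}"
# Then apply it as root:
#   sudo date -s "$http_date"
```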

5. Restart k3s Service

After time synchronization is confirmed or corrected, restart the k3s service:

sudo service k3s restart

This ensures the Kubernetes components align with the new system time.
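Once k3s is back up, the journal should stop accumulating drift warnings. A quick check (again assuming the unit name k3s); a count of 0 means no new warnings in the window:

```shell
# Count drift warnings logged in the last few minutes; 0 means healthy
# (2>/dev/null and || true keep this harmless where journalctl is absent)
journalctl -u k3s --since "5 minutes ago" 2>/dev/null | grep -c "prober found high clock drift" || true
```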

Best Practices

  • Always enable automatic time synchronization on all nodes.
  • Use the same NTP source for all VMs in a multi-node setup.
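The second practice can be spot-checked by comparing epoch seconds across nodes. A minimal sketch: drift_seconds is a hypothetical helper, and node1 in the comment is a placeholder hostname.

```shell
# Hypothetical helper: absolute drift between two epoch-second readings
drift_seconds() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then d=$(( -d )); fi
  echo "$d"
}

# In practice, compare the local clock with each node over SSH
# (node1 is a placeholder hostname):
#   drift_seconds "$(date -u +%s)" "$(ssh node1 date -u +%s)"
drift_seconds 1752514659 1752514600   # prints 59
```

Any result approaching tens of seconds, as in the sample logs above, warrants fixing time synchronization before the cluster destabilizes.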