Basic info about kube-aws and the problem
We’re running our clusters in the eu-west-1 region, and at some point we started having problems with time synchronization on our nodes. One day this resulted in an outage of our etcd instances. After some investigation I found out that systemd-timesyncd is sometimes unable to synchronize time with the NTP pool servers:
```
Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd: Timed out waiting for reply from 220.127.116.11:123 (0.coreos.pool.ntp.org).
Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd: Synchronized to time server 18.104.22.168:123 (0.coreos.pool.ntp.org).
```
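This is just one example of an NTP pool server that failed to answer. So we decided to switch to the AWS NTP server (the Amazon Time Sync Service, reachable from EC2 instances at 169.254.169.123). Because we were not sure whether CoreOS would pick up an NTP server from the DHCP options, and we didn’t want to change the DHCP options anyway, we simply adjusted the cluster.yaml config of our cluster instead.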
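To make the "Timed out waiting for reply" message above more concrete, here is a minimal SNTP (RFC 4330) query sketch in Python. It is not how systemd-timesyncd is implemented: it just sends one 48-byte client packet to UDP port 123, waits with a timeout, and computes the clock offset from the four timestamps. The function names and the 2-second default timeout are illustrative assumptions, not anything from kube-aws or systemd.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2_208_988_800


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, VN=4, Mode=3 (client), rest zeroed."""
    return bytes([0b00_100_011]) + bytes(47)


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to Unix time."""
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """Server clock offset relative to ours, per RFC 4330:
    t1 = client send, t2 = server receive, t3 = server transmit, t4 = client receive."""
    return ((t2 - t1) + (t3 - t4)) / 2


def query_offset(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP query. Raises socket.timeout if no reply arrives in time,
    which is the situation behind timesyncd's "Timed out waiting for reply"."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(1024)
        t4 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # Receive timestamp is at bytes 32..39, transmit timestamp at bytes 40..47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return clock_offset(t1, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t4)
```

On an EC2 instance, `query_offset("169.254.169.123")` should return a small offset in seconds, while an unreachable pool server would raise `socket.timeout` instead.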
To set the NTP server for timesyncd manually, we need to update part of the cloud-init configuration. For this, we add two sections to worker.nodePools.<nodename> in our cluster.yaml.
Here is what we added in the end:
```yaml
customFiles:
  - path: /etc/systemd/timesyncd.conf
    permissions: 0644  # octal (rw-r--r--); a bare 644 would be parsed as decimal
    content: |
      [Time]
      NTP=169.254.169.123
customSystemdUnits:
  - name: systemd-timesyncd.service
    command: restart
```
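To confirm on a node that the file written by customFiles ended up as intended, a small Python check could parse /etc/systemd/timesyncd.conf and look at its mode. This is a hypothetical helper, not part of kube-aws; it assumes the file has a plain `[Time]` section with a space-separated `NTP=` list, as in the snippet above.

```python
from __future__ import annotations

import configparser
import os
import stat

# Amazon Time Sync Service address used in the cluster.yaml snippet.
AWS_TIME_SYNC = "169.254.169.123"


def ntp_servers(conf_text: str) -> list[str]:
    """Return the NTP= servers listed in the [Time] section of a timesyncd.conf."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # NTP= takes a space-separated list; a missing key means no servers configured.
    return parser.get("Time", "NTP", fallback="").split()


def check_timesyncd_conf(path: str = "/etc/systemd/timesyncd.conf") -> list[str]:
    """Return a list of problems found on this node (empty list = looks fine)."""
    problems = []
    with open(path) as f:
        servers = ntp_servers(f.read())
    if AWS_TIME_SYNC not in servers:
        problems.append(f"NTP= is {servers}, expected {AWS_TIME_SYNC}")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode != 0o644:
        # Assumption: a decimal 644 in the YAML would show up here as 0o1204.
        problems.append(f"mode is {oct(mode)}, expected 0o644")
    return problems
```

Running `check_timesyncd_conf()` on a worker should return an empty list once the new config has been rolled out.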
customFiles writes /etc/systemd/timesyncd.conf, which sets the AWS NTP server. customSystemdUnits restarts the systemd-timesyncd service so that it picks up the updated config.
After that, everything should be OK. At least so far, our monitoring hasn’t shown any NTP-related problems.
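For monitoring, one simple signal is the timesyncd journal itself: which server it last synchronized to, and how many timeouts it logged. The Python sketch below scans lines in the format of the log excerpt at the top of this post; the regular expressions and the `summarize` name are assumptions for illustration, and the exact wording of timesyncd messages may differ between systemd versions.

```python
import re

# Patterns for the timesyncd journal lines shown earlier (format may vary by systemd version).
SYNC_RE = re.compile(r"Synchronized to time server (\S+):123 \(([^)]*)\)")
TIMEOUT_RE = re.compile(r"Timed out waiting for reply from (\S+):123 \(([^)]*)\)")


def summarize(journal_lines):
    """Return (address of the last server synchronized to, number of timeouts)."""
    last_sync = None
    timeouts = 0
    for line in journal_lines:
        if m := SYNC_RE.search(line):
            last_sync = m.group(1)
        elif TIMEOUT_RE.search(line):
            timeouts += 1
    return last_sync, timeouts
```

The lines could come from `journalctl -u systemd-timesyncd --no-pager`; after the restart, the last synchronized server would be expected to be 169.254.169.123, and the timeout count should stop growing.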