Basic info about kube-aws and the problem
kube-aws is an open-source tool for provisioning Kubernetes clusters in the Amazon AWS cloud. It provisions EC2 instances running CoreOS to host Kubernetes.
We’re running our clusters in the eu-west-1 region, and at some point we started having problems with time synchronization on our nodes. One day this resulted in an outage of our etcd instances. After some investigation I found that timesyncd sometimes cannot synchronize time with the NTP pool server:
Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd[769]: Timed out waiting for reply from 52.209.118.149:123 (0.coreos.pool.ntp.org).
Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd[769]: Synchronized to time server 54.229.222.210:123 (0.coreos.pool.ntp.org).
This is just one example of such an unresponsive NTP server. So we decided to switch to the AWS NTP server (Amazon Time Sync Service, 169.254.169.123). Because we were not sure whether CoreOS would pick up an NTP server from the DHCP options, and we didn’t want to change the DHCP options anyway, we simply adjusted the cluster.yaml config for our cluster.
Solution
To set the NTP server for timesyncd manually, we need to update part of the cloud-init configuration. For this we need to add two sections to etcd, controller and worker.nodePools.<nodename> in our config.
Here is what we added in the end:
customFiles:
  - path: /etc/systemd/timesyncd.conf
    permissions: 0644  # leading 0 makes YAML read the mode as octal (rw-r--r--)
    content: |
      [Time]
      NTP=169.254.169.123
customSystemdUnits:
  - name: systemd-timesyncd.service
    command: restart
customFiles is used to set the AWS NTP server in the timesyncd configuration.
customSystemdUnits is used to restart the systemd-timesyncd service after the config has been updated.
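Because the same snippet has to appear in three places, here is a rough sketch of how it might sit inside cluster.yaml. It assumes the usual kube-aws layout where worker.nodePools is a list of pools identified by name; `nodepool1` is a hypothetical pool name, and all other settings in each section are left out.

```yaml
# Sketch only: other keys in each section are omitted.
etcd:
  customFiles:
    - path: /etc/systemd/timesyncd.conf
      permissions: 0644
      content: |
        [Time]
        NTP=169.254.169.123
  customSystemdUnits:
    - name: systemd-timesyncd.service
      command: restart

controller:
  customFiles:
    - path: /etc/systemd/timesyncd.conf
      permissions: 0644
      content: |
        [Time]
        NTP=169.254.169.123
  customSystemdUnits:
    - name: systemd-timesyncd.service
      command: restart

worker:
  nodePools:
    - name: nodepool1   # hypothetical pool name; repeat for every pool you run
      customFiles:
        - path: /etc/systemd/timesyncd.conf
          permissions: 0644
          content: |
            [Time]
            NTP=169.254.169.123
      customSystemdUnits:
        - name: systemd-timesyncd.service
          command: restart
```

If you run several node pools, each entry under worker.nodePools needs its own copy of the two sections.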
After that everything should be OK. At least so far we haven’t seen any NTP-related problems in our monitoring.