Background: kube-aws and the problem

kube-aws is an open-source tool for provisioning Kubernetes clusters in the Amazon AWS cloud. It provisions EC2 instances running CoreOS to host Kubernetes.

We’re running our clusters in the eu-west-1 region, and at some point we started having problems with time synchronization on our nodes. One day this resulted in an outage of our etcd instances. After some investigation I found out that timesyncd is sometimes unable to synchronize time with an NTP pool server:

Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd[769]: Timed out waiting for reply from 52.209.118.149:123 (0.coreos.pool.ntp.org).
Jul 04 13:02:51 ip-172-16-0-79.eu-west-1.compute.internal systemd-timesyncd[769]: Synchronized to time server 54.229.222.210:123 (0.coreos.pool.ntp.org).

This is just one example of an NTP pool server that timed out. We therefore decided to switch to the AWS NTP server. Because we were not sure whether CoreOS would pick up an NTP server from the DHCP options, and we did not want to change the DHCP options anyway, we simply adjusted the cluster.yaml config for our cluster.

Solution

To set the NTP server for timesyncd manually, we need to update part of the cloud-init configuration. To do this, we add two sections to each of etcd, controller and worker.nodePools.<nodename> in our config.

Here is what we added in the end:

  customFiles:
    - path: /etc/systemd/timesyncd.conf
      permissions: 644
      content: |
        [Time]
        NTP=169.254.169.123
  customSystemdUnits:
    - name: systemd-timesyncd.service
      command: restart

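customFiles writes /etc/systemd/timesyncd.conf so that timesyncd uses the AWS NTP server (the Amazon Time Sync Service, reachable from instances at the link-local address 169.254.169.123).

customSystemdUnits restarts the systemd-timesyncd service so that it picks up the updated config.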
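To make the placement explicit, here is a rough sketch of how the same two sections could sit under etcd, controller and a worker node pool in cluster.yaml. It assumes that worker.nodePools is a list of named pools; the pool name nodepool1 is a placeholder and not taken from our real config, and all other keys of the file are omitted.

```yaml
# Sketch only: unrelated cluster.yaml keys are omitted.
etcd:
  customFiles:
    - path: /etc/systemd/timesyncd.conf
      permissions: 644
      content: |
        [Time]
        NTP=169.254.169.123
  customSystemdUnits:
    - name: systemd-timesyncd.service
      command: restart

controller:
  customFiles:
    - path: /etc/systemd/timesyncd.conf
      permissions: 644
      content: |
        [Time]
        NTP=169.254.169.123
  customSystemdUnits:
    - name: systemd-timesyncd.service
      command: restart

worker:
  nodePools:
    - name: nodepool1   # placeholder pool name (assumption)
      customFiles:
        - path: /etc/systemd/timesyncd.conf
          permissions: 644
          content: |
            [Time]
            NTP=169.254.169.123
      customSystemdUnits:
        - name: systemd-timesyncd.service
          command: restart
```

If there are several node pools, the same two sections would presumably need to be repeated for each of them.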

After that, everything should be OK. At least so far, our monitoring has not shown any NTP-related problems.