flanneld: failed to register network: failed to acquire lease: out of subnets

flannel uses a DHCP-like lease system to give each peer a segment of your network.

Let’s say that you have a 172.18.1.0/16 network and you configured flannel with a SubnetLen of 24, so each node gets its own /24. This gives you a maximum of 255 nodes on your flannel network.
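The arithmetic behind that limit can be sketched in a few lines of shell. The function name `flannel_subnet_count` is made up for this illustration, and the "minus one" is an assumption based on the range flanneld logs (172.18.1.0 ... 172.18.255.0), which suggests the first subnet is skipped.

```shell
# Rough count of per-node subnets flannel can hand out.
#   $1 = prefix length of the flannel network (16 for a /16)
#   $2 = SubnetLen (24 for a /24 per node)
flannel_subnet_count() {
    net_prefix=$1
    subnet_len=$2
    total=$(( 1 << (subnet_len - net_prefix) ))
    # Assumption: the first subnet is not handed out, as the logged
    # range 172.18.1.0 ... 172.18.255.0 suggests.
    echo $(( total - 1 ))
}

flannel_subnet_count 16 24   # prints 255
flannel_subnet_count 16 26   # prints 1023
```

So every extra bit of SubnetLen roughly doubles the number of nodes that can get a lease, at the cost of fewer pod addresses per node.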

If you have a big Kubernetes or OpenShift cluster running on Auto Scaling Groups or Spot Instances, sooner or later new nodes will fail to join the cluster with the following error:

Jan 23 13:56:51 ip-172-27-13-41 flanneld[13207]: I0123 13:56:51.176167   13207 local_manager.go:179] Picking subnet in range 172.18.1.0 ... 172.18.255.0
Jan 23 13:56:51 ip-172-27-13-41 flanneld[13207]: E0123 13:56:51.176565 13207 network.go:102] failed to register network: failed to acquire lease: out of subnets
Jan 23 13:56:52 ip-172-27-13-41 flanneld[13207]: I0123 13:56:52.193929 13207 local_manager.go:179] Picking subnet in range 172.18.1.0 ... 172.18.255.0
Jan 23 13:56:52 ip-172-27-13-41 flanneld[13207]: E0123 13:56:52.194215 13207 network.go:102] failed to register network: failed to acquire lease: out of subnets

The not-so-easy solution is to make each node's subnet smaller by raising the SubnetLen value above 24 (a longer prefix means smaller but more numerous subnets). In my use case I have big nodes with a low pod count, so 26 is a more sensible value.

But this change means you need to restart both flanneld and kubelet/openshift-node on all nodes.
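For reference, a minimal sketch of what the changed network config could look like. The helper name `flannel_config_json` is made up, the network value is a placeholder, the vxlan backend is taken from the lease data shown in this post, and the `/openshift.com/network/config` key is an assumption derived from the prefix my leases live under. Check your own flanneld settings before writing anything.

```shell
# Print a flannel network config as JSON.
#   $1 = network CIDR (placeholder value below, use your own)
#   $2 = SubnetLen
flannel_config_json() {
    printf '{"Network":"%s","SubnetLen":%s,"Backend":{"Type":"vxlan"}}\n' "$1" "$2"
}

flannel_config_json 172.18.0.0/16 26

# To store it (assumed key, not verified against your setup):
#   etcdctl set /openshift.com/network/config "$(flannel_config_json 172.18.0.0/16 26)"
```

Printing the JSON first and only then writing it to etcd makes it easy to review the value before it affects every node.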

A faster solution, which might give you more time to plan this, is to force all flannel leases to renew. This frees the leases that are still assigned but no longer used.

You can use etcdctl to explore and edit keys in an etcd cluster (the backend for both Kubernetes and flanneld).

Use etcdctl to find where your flannel keys are stored. In my case they were in /openshift.com/network/subnets:

root@openshift-etcd057c45d5cf0d40f77:~# etcdctl ls /openshift.com/network/subnets
/openshift.com/network/subnets/172.18.73.0-24
/openshift.com/network/subnets/172.18.85.0-24
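Each key name encodes the leased subnet, with a dash where the slash of the CIDR would be. A small sketch for turning a key back into CIDR notation (the function name `lease_key_to_cidr` is made up for this example):

```shell
# Convert a flannel lease key such as
#   /openshift.com/network/subnets/172.18.73.0-24
# into CIDR notation (172.18.73.0/24).
lease_key_to_cidr() {
    name=${1##*/}                   # drop the directory part
    echo "${name%-*}/${name##*-}"   # swap the last dash for a slash
}

lease_key_to_cidr /openshift.com/network/subnets/172.18.73.0-24
```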

etcdctl get shows what is stored in a key, but it won’t tell you when the key is going to expire:

root@openshift-etcd057c45d5cf0d40f77:~# etcdctl get /openshift.com/network/subnets/172.18.73.0-24
{"PublicIP":"172.27.11.216","BackendType":"vxlan","BackendData":{"VtepMAC":"d2:c1:06:90:6e:04"}}
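The PublicIP field tells you which node holds the lease, which helps when checking whether that node still exists. A dependency-free sketch for pulling it out of the value (the helper name `lease_public_ip` is made up; a real JSON parser such as jq would be more robust than this sed pattern):

```shell
# Read a flannel lease value (JSON) on stdin and print its PublicIP.
# Assumes the compact, single-line JSON shown in this post.
lease_public_ip() {
    sed -n 's/.*"PublicIP":"\([^"]*\)".*/\1/p'
}

echo '{"PublicIP":"172.27.11.216","BackendType":"vxlan","BackendData":{"VtepMAC":"d2:c1:06:90:6e:04"}}' | lease_public_ip
# Against a live cluster this would be:
#   etcdctl get /openshift.com/network/subnets/172.18.73.0-24 | lease_public_ip
```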

The get command has an output option (-o extended) that shows more information, including the remaining TTL in seconds:

root@openshift-etcd057c45d5cf0d40f77:~# etcdctl -o extended get /openshift.com/network/subnets/172.18.73.0-24
Key: /openshift.com/network/subnets/172.18.73.0-24
Created-Index: 94470178
Modified-Index: 94470178
TTL: 85873
Index: 94481824
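The TTL is given in seconds, which is not very readable for values close to a day. A small sketch for extracting and formatting it (both helper names are made up for this example):

```shell
# Extract the TTL (in seconds) from `etcdctl -o extended get` output on stdin.
lease_ttl_seconds() {
    awk '$1 == "TTL:" { print $2 }'
}

# Format a number of seconds as hours, minutes and seconds.
ttl_human() {
    s=$1
    printf '%dh%dm%ds\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

ttl_human 85873   # prints 23h51m13s
# Against a live cluster this would be:
#   ttl_human "$(etcdctl -o extended get /openshift.com/network/subnets/172.18.73.0-24 | lease_ttl_seconds)"
```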

By default flanneld sets a 24-hour TTL on each lease key in etcd. At the time of writing this does not seem to be configurable, or at least my google-fu didn’t find anything useful.

https://github.com/coreos/flannel/issues/875

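To free the unused leases, you can force nodes to refresh theirs by shortening the TTL on a key: nodes that are still alive renew their lease, and keys that belong to dead nodes simply expire. For a particular key: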

etcdctl set --ttl 120 /openshift.com/network/subnets/172.18.73.0-24 "$(etcdctl get /openshift.com/network/subnets/172.18.73.0-24)"

To do this on all keys:

for i in $(etcdctl ls /openshift.com/network/subnets/); do etcdctl set --ttl 120 "$i" "$(etcdctl get "$i")"; done
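To see whether this actually freed anything, you can compare the number of lease keys before the change and again once the shortened TTL has elapsed. A minimal sketch, with a made-up helper name `count_flannel_leases`:

```shell
# Count the lease keys currently stored under a flannel subnets prefix.
#   $1 = etcd directory that holds the leases
count_flannel_leases() {
    etcdctl ls "$1" | grep -c .
}

# Usage, before the change and again a few minutes after it:
#   count_flannel_leases /openshift.com/network/subnets
```

If the second number is lower, the difference is the number of stale leases that expired and are now available to new nodes.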