Many Kubernetes Clusters
As a reply to Zalando's "Running 80+ clusters in production" post, someone asked whether this (80+ clusters) would not defeat "the purpose". My answer would not fit in a Tweet, so here it is as a blog post.
RnL writes on Twitter:
"I don’t know any reasoning behind running 80 clusters I am sure smarter people than me thought about it. However, wasn’t the purpose of schedulers to scale nodes and use resources efficiently?"
This was not the first question I got; here is a LinkedIn comment by Daniel D'Abate regarding our Zalando Kubernetes story from four months ago:
"I have a question and your insight will be very helpful for our next steps. Why are you using 100 clusters instead of 1 huge cluster? Is it related with better cost isolation or are any other reasons involved?"
Zalando runs 100+ Kubernetes clusters on AWS. Each cluster runs in its own AWS account. We always create a pair of prod/non-prod clusters per "product community" [1], i.e. only half of our clusters (50+) are marked as "production" and have full 24x7 on-call support.
We decided to go with "many" (that's relative) clusters for various reasons:
Kubernetes has no strong story for multi-tenancy; having "smaller" clusters mitigates part of this problem
some infrastructure is shared per cluster, e.g. Prometheus and the Ingress proxy (Skipper) --- this requires appropriate (vertical) scaling of these components; smaller clusters make this easier to handle
the blast radius is limited --- anything going wrong in one cluster (outage, security incident, ..) does not necessarily affect the whole organization
cost attribution is easier (every cluster belongs to a cost center) [2]
the cluster (and its AWS account) serves as a natural trust boundary for access control (you can either deploy via CI/CD to a cluster or not)
Overall, this setup simply fits our world view better. Even smaller clusters would also be possible, but they would produce too much overhead [3] and would not leverage the advantages of larger clusters (in-cluster communication, better utilization).
You can find some Kubernetes Failure Stories mentioning cluster size, e.g.:
On Infrastructure at Scale: A Cascading Failure of Distributed Systems - Target
Breaking Kubernetes: How We Broke and Fixed our K8s Cluster - Civis Analytics
My colleague Sandor reminded me that AWS rate limits are another reason against "too large" clusters: all AWS integrations (Controller-Manager, kube2iam, External DNS, kube-ingress-aws-controller, kube-static-egress-controller, ..) call AWS APIs, and these calls count towards the per-account rate limits.
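To make this concrete, here is a minimal, hypothetical sketch (Python with boto3, not our actual controller code) of the kind of AWS API traffic such integrations generate, and how a client can at least back off when it gets throttled:

```python
import boto3
from botocore.config import Config

# Hypothetical example, not Zalando's controller code: it only illustrates
# that every in-cluster AWS integration issues calls like these, and that
# all of them share the same per-account/per-region rate limits.
throttle_aware = Config(
    retries={
        "max_attempts": 10,  # keep retrying when AWS throttles the client
        "mode": "adaptive",  # client-side rate limiting on top of backoff
    }
)

ec2 = boto3.client("ec2", config=throttle_aware)

# A reconcile loop typically re-lists cloud resources over and over;
# in one big cluster, all these controllers' calls add up quickly.
reservations = ec2.describe_instances(MaxResults=100)["Reservations"]
print(f"saw {sum(len(r['Instances']) for r in reservations)} instances")
```

Backoff only softens the symptom; the underlying quota is per AWS account, which is another argument for splitting workloads across clusters (and accounts).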
To summarize:
you want big clusters for utilization, in-cluster communication, and reduced overhead
you want small clusters for isolation, reduced blast radius, and fewer challenges around scaling
We have 1200+ developers at Zalando, should they all share the same huge cluster? Probably not! Should every team (~4-8 people) get its own cluster? Probably not! So the truth is somewhere in the middle and the story might be different for your organization.
UPDATE 2019-04-29
I clarified that we always create a pair of clusters and added Sandor's point about AWS rate limits.
UPDATE 2019-05-03
We just hit our configured CIDR limits in one cluster, i.e. the cluster reached 250+ nodes and we had configured 10.2.0.0/16 for --cluster-cidr and --node-cidr-mask-size=24.
The Kubernetes controller-manager failed with the error message "Error while processing Node Add/Delete: failed to allocate cidr: CIDR allocation failed; there are no remaining CIDRs left to allocate in the accepted range".
To mitigate hitting this CIDR/node limit, we made the node CIDR mask size configurable. Setting this to /25 would allow ~510 nodes, at the cost of limiting the number of pods per node to ~62.
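The arithmetic behind these numbers can be checked with a few lines of Python (a throwaway sketch using the stdlib ipaddress module, plugging in the CIDR values from above):

```python
import ipaddress

# Values from the incident above; 24 and 25 are the two mask sizes discussed.
cluster_cidr = ipaddress.ip_network("10.2.0.0/16")

for node_mask in (24, 25):
    max_nodes = 2 ** (node_mask - cluster_cidr.prefixlen)  # node CIDRs that fit
    pod_ips = 2 ** (32 - node_mask)                        # addresses per node CIDR
    print(f"--node-cidr-mask-size={node_mask}: "
          f"{max_nodes} node CIDRs, {pod_ips} pod IPs per node")

# --node-cidr-mask-size=24: 256 node CIDRs, 256 pod IPs per node
# --node-cidr-mask-size=25: 512 node CIDRs, 128 pod IPs per node
```

The ~250 and ~510 node figures are these CIDR counts minus some headroom, and the 128 addresses of a /25 leave roughly 62 usable pods per node once reserved IPs are accounted for.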
See also the GKE docs on flexible Pod CIDR and the Kubernetes docs for large clusters (not so helpful).
So in short: tuning the CIDR ranges is another topic to take care of when designing your cluster size(s).