Hello!

---

So, to kick things off: my name is Chris Nesbitt-Smith, I'm based in London and currently work with some well-known brands like learnk8s, Control Plane, and various bits of UK Government. I'm also a tinkerer of open source stuff. I've been using and abusing Kubernetes in production since it was 0.4, and believe me when I say it's been a journey! I've definitely got the war wounds to show for it. We should have time for questions and your best heckles at the end, but if we run out of time or you're not watching this live, then please find me on LinkedIn or in the Loft Labs Slack. Right, let's get going.

---

We all came here in search of minimizing wasted overheads in our compute estate, right? Well, let's go with that, and pretend it wasn't just an exercise in technical navel gazing and CV enhancement.

---

So ideally we'll want to isolate one workload from another.

---

We'll want managing all of those isolated things to be super easy, because we need that collective spare time to think up exciting new abstractions for running processes on computers, or a new job title. Shout out to my DevSecOps, SREs, and Platform Engineers!

---

And our ability to reduce waste on the hardware should give us some cost savings, which we'll all need to buy the Apple Vision Pro for some DevAROps. So how do we realize those dreams?

---

Here comes the science bit. Kubernetes embraces the idea of treating your servers as a single unit and abstracts away how individual computer resources operate.

---

Imagine having three servers.

---

You can use one of those to install the Kubernetes control plane.

---

The remaining two can join the cluster as worker nodes.

---

Once the setup is complete, the physical servers are mostly abstracted from you. You deal with Kubernetes as a single unit.

---

When you want to deploy a container, you submit your request to the cluster. Kubernetes takes care of executing `docker run` and selecting the best server for the job.

---

The same happens for

---

all other containers.

---

For every deployment, Kubernetes finds the best place to run the application.

---

Out of the box, Kubernetes does give you namespaces.

---

So if you have three applications or teams,

---

you could create three Kubernetes namespaces and place one in each. And you'd be forgiven for thinking that namespaces might refer to a lower-level kernel capability that would provide you with some isolation.

---

However, a Kubernetes namespace is just a logical partition of the cluster that is used to group Kubernetes objects together. It doesn't offer any isolation; it's just a convenient way to categorize and compartmentalize objects, with some performance benefits in scheduling by limiting the scope.

---

Let's explore how we can apply namespaces to start with, since it's a useful baseline and it's how many folks have done it in the past, at least before managing multiple clusters became a thing, when we put down Kubernetes the Hard Way (thanks, Kelsey) and remembered we can at the very least script things.

---

But how do we carve up the estate?
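Before we start carving, a quick sketch to convince yourself of that "no isolation" point: on a stock cluster with no network policies, a pod in one namespace can happily reach a service in another. The team names here are made up purely for illustration.

```bash
# Namespaces alone don't isolate anything: a pod in team-a can reach a service in team-b.
kubectl create namespace team-a
kubectl create namespace team-b

# Run an nginx pod in team-b and expose it with a ClusterIP service
kubectl -n team-b run web --image=nginx --port=80 --expose
kubectl -n team-b wait --for=condition=Ready pod/web --timeout=60s

# From team-a, fetch the page across the namespace boundary; this succeeds
kubectl -n team-a run probe --rm -it --restart=Never --image=busybox -- \
  wget -qO- http://web.team-b.svc.cluster.local
```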
You could split it up by your three teams.

---

Or perhaps by lifecycle stage, say dev, test, prod.

---

But what about when your teams need multiple lifecycle stages?

---

You start to see

---

this getting quite busy.

---

So how big can this get?

---

Well, here's a cluster with 10 tenants,

---

and another with 50 tenants.

---

Assuming they're all serving some HTTP traffic, you'll probably use an ingress controller like nginx,

---

which is a single pod that can serve your 10 tenants

---

just as easily as your 50.

---

Notice how that didn't mean we needed to scale up to 30 nginx ingress controllers for our 10-tenant cluster,

---

or worse, 150 for our 50-tenant cluster.

---

So what are the cost implications of this choice?

---

If we look at the ingress-nginx Helm chart config, you'll see we've got a resource request of 100 millicores of CPU and 90Mi of RAM.

---

Let's put scaling for the volume of traffic aside for a minute and assume we're just looking at the standing-still cost. While the numbers might seem low in isolation, they are likely going to turn into papercuts.

---

So our 10-tenant, 3-lifecycle-stage cluster looks like this,

---

and our 50-tenant one looks even worse. That's just math, and compute is cheap, so who cares?

---

Well, someone might when they get the bill for your ingress node requirement without it actually doing anything real: 248 bucks a month, just to stand still, before you add monitoring and logging and other stuff.

---

So a single ingress might look like a good idea.

---

But what happens when one of your snowflake teams wants a configuration option that isn't readily exposed through the ingress annotations, such as tweaking the keep-alive,

---

but others don't? Sure, you could use the configuration-snippet annotation to tweak this, but that's quite a blunt instrument you shouldn't really give all your teams, since it allows any one of them to break ingress for everyone.

---

So you're stuck having to define that configuration option at a full cluster level.

---

Consequently, those that didn't want that configuration have it forced upon them.

---

If we look at our patent-pending multi-tenancy-o-meter, you'll see that the configuration of a single shared ingress is tracking nice and cheap.

---

But let's face it, when was the last time ingress was the only thing you installed in a cluster? Those Custom Resources won't define themselves, after all.

---

Something like this likely looks familiar: logos as far as the eye can see. After all, our DevSecFinOps CV needed something new and shiny added to it.

---

Let's take Prometheus, for example. You don't normally run it like this, where you have one shared instance for every application workload.

---

So your cluster is going to be looking something more like this.

---

Taking a look at our multi-tenancy-o-meter, we can see the costs of this sort of practice accruing.

---

So, since we're logo-loving DevOps, let's see what more complexity we can add to manage it more easily, and give ourselves enough time to anxiously refresh the CNCF landscape for new logos to find what we need to install next.

---

The first tool we'll look at is the Hierarchical Namespace Controller.

---

We'll then look at vCluster from our kind friends at Loft Labs. The v stands for virtual; we'll look at what that means in reality, since while novel, it doesn't really align with the word virtual in other computing senses.
---

And lastly we'll look at Karmada as a means of managing many clusters.

---

The Hierarchical Namespace Controller, as the name suggests, is a controller

---

that allows you to define a root, or parent, namespace

---

and then define a child tree off the back of that.

---

The rabbit hole can go as deep as you like.

---

This is the first of my three demos today, because Daniele and Salman, who did the previous two talks, thought it'd be really funny if I demoed all the hard stuff. So join me in prayers to the demo gods while I try not to mess this up.

DEMO

cd /Users/cns/httpdocs/learnk8s_loft
export PATH=$PATH:${PWD}
kind create cluster
kubectl apply -f https://github.com/kubernetes-sigs/hierarchical-namespaces/releases/download/v1.1.0/default.yaml # setup
kubectl -n hnc-system patch deployments.apps hnc-controller-manager --patch-file hnc-patch.yaml # setup
kubectl create ns parent
kubectl hns create tenant-1 -n parent
kubectl hns config describe
kubectl hns config set-resource resourcequota --mode Propagate
kubectl hns config describe
cat resource-quota.yaml
kubectl apply -f resource-quota.yaml
kubectl get resourcequotas -A
cat hnc-50.sh
./hnc-50.sh
kubectl get resourcequotas -A
kubectl hns set parent --allowCascadingDeletion
kubectl delete ns parent
kubectl delete -f https://github.com/kubernetes-sigs/hierarchical-namespaces/releases/download/v1.1.0/default.yaml # setup

---

As you can see, the namespaces aren't truly nested; it's all smoke and mirrors.

---

There's a single controller doing the templating work for you,

---

but at the end of the day it's just regular namespaces.

---

Our multi-tenancy-o-meter shows it's cheap, since the standing-still cost of the tenants is zero: there's no workload in them.

---

In fact, since these are just empty namespaces, a construct stored in etcd, there's absolutely no inherent cost associated with them.

---

So what practical use are these namespaces?

---

Let's have a look at roles. If our team has some roles that let them read and write pods and PVs in the root namespace,

---

we could cascade those role bindings to all the children,

---

and we could do the same for many teams,

---

all allowing them access to the PV.

---

Our next approach is isolating the control plane,

---

which is what vCluster does, whereby a namespace is created

---

and inside that we run a small k3s instance. For those that aren't familiar with k3s, it's a lightweight Kubernetes distribution that by default uses SQLite as its database backend instead of etcd, and lends itself very well to running as a single binary.

---

So we can hit the API server with kubectl and create a pod, but this is just an API server; there are no nodes to run the workload.

---

What's more, it's in the wrong database for the real scheduler to pick it up. So what can we do?
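To make that concrete, here's a hypothetical check, assuming kubectl is pointed at one of these bare tenant control planes with no nodes registered:

```bash
# The tenant API server happily accepts and stores the pod...
kubectl run nginx --image=nginx

# ...but with no nodes to schedule it onto, it just sits in Pending indefinitely.
kubectl get pods
```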
---

We could copy the pod spec,

---

which would magically cause the pod to be scheduled somewhere on the parent cluster.

---

And that's exactly what the vCluster syncer does: you select which resources to sync between the clusters, and in which direction.

---

And because global resources like an ingress controller or CRDs can live in the parent, I can just sync the primitive resources like pods, ConfigMaps, Secrets, PVs and so on.

---

So if I apply my PV to the tenant cluster,

---

it'll get stored in the tenant control plane

---

and then synced to the parent,

---

allowing the PVC to bind to that PV and the pods to connect to it.

---

DEMO

cat pv-values.yaml
vcluster create tenant1 --upgrade -f pv-values.yaml
cat pv.yaml
kubectl apply -f pv.yaml
vcluster disconnect
kubectl get pv
vcluster create tenant2 --upgrade -f pv-values.yaml
kubectl apply -f pv.yaml
kubectl get pv
vcluster disconnect
kubectl get pv
vcluster delete tenant1
vcluster delete tenant2
kubectl delete pv --all
cat vcluster-50.sh
./vcluster-50.sh
kubectl get ns
kubectl get pod -A

---

So this gives us a kind of nested control plane,

---

with a sense that we can hand out what feels like cluster-admin selectively to tenants. If you squint, it's a virtual cluster.

---

But there's a single parent cluster that exists and can have controls and configuration set there, and we don't have heaps of fully independent clusters, each needing to be made highly available.

---

Our multi-tenancy-o-meter is twitching, but it's still reasonably cheap.

---

We'll need around an extra 17 nodes to handle our 50 tenants' control plane pods,

---

and 50 PVs for all the databases,

---

totalling out at around 254 bucks, or about $5 per tenant, a month.

---

What does that mean for your workloads, though, when you apply a pod?

---

Well, the pods will share nodes with other tenants,

---

which introduces a noisy neighbor concern.

---

But we could create node pools to separate that, so a single control plane could operate multiple pools. vCluster makes this a bit easier with its node selector and enforce node selector options.

---

So that all sounds great, but what about the network?

---

Well, the inherent rules of Kubernetes networking still apply: any pod can talk to any pod, and that overarching cluster is a flat network.

---

Depending on your network plugin, you may be able to apply network policies in order to limit what is permitted to talk to what,

---

in effect giving us an isolated network boundary. vCluster makes this easy with the --isolate parameter.
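As a minimal sketch of what that per-tenant boundary looks like in plain Kubernetes terms, whether you write it yourself or let tooling generate it for you (the namespace name is illustrative, and it only bites if your CNI actually enforces NetworkPolicy):

```bash
# Default-deny from other namespaces: pods in tenant-1 accept traffic
# only from other pods in tenant-1.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: tenant-1
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # allows only traffic from pods in this same namespace
EOF
```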
---

So that's the network isolated.

---

But if any one of the containers escapes to the host, perhaps through a kernel vulnerability,

---

well, if you've got to the node, then it's not a big leap to then take the kubelet,

---

and in turn take the control plane, at which point your entire multi-tenant cluster and everything it connects to has been compromised.

---

It's a bad day.

---

So what alternatives do we have left?

---

Well, we could run a bunch of different clusters.

---

Enter Karmada to make some sense of that,

---

which abstracts some of the things you might be used to seeing in Kubernetes with

---

some Karmada equivalents.

---

To dig into the architecture of that:

---

our master cluster can command the child cluster's agent,

---

which in turn communicates with that cluster's API server.

---

So a kubectl apply to the master

---

will cascade into the child cluster.

---

This effectively gives us independent clusters with overarching central management,

---

with calls made to the manager

---

being passed on to child clusters, giving me both standard fixed workloads, perhaps some logging, a policy engine and config,

---

along with workloads

---

that are individual to

---

each team or environment. And the child clusters could be anywhere, from all being in the same physical data centre, all the way through to geographically disparate multi-cloud vendors, with some hybrid cloud thrown in for good measure, because fun times.

---

Last demo god prayer to make; wish me luck.

DEMO

export KUBECONFIG=karmada-apiserver.config
kubectl get cluster
# deploy something on a worker
cat deploy.yaml
KUBECONFIG=worker-1-kubeconfig.yaml kubectl apply -f deploy.yaml
kubectl get --raw /apis/search.karmada.io/v1alpha1/search/cache/apis/apps/v1/deployments | jq '.items[] | (.metadata.annotations["resource.karmada.io/cached-from-cluster"] + " " + .metadata.name)'
## demo propagating a deploy
cat propagating-deploy.yaml
kubectl apply -f propagating-deploy.yaml
KUBECONFIG=worker-1-kubeconfig.yaml kubectl get pods -A
KUBECONFIG=worker-2-kubeconfig.yaml kubectl get pods -A
KUBECONFIG=worker-1-kubeconfig.yaml kubectl delete -f deploy.yaml
kubectl delete -f propagating-deploy.yaml
unset KUBECONFIG
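For reference, the propagating-deploy.yaml in that demo pairs an ordinary Deployment with a Karmada PropagationPolicy saying where to place it. A minimal sketch of that shape, applied against the Karmada API server, might look like this; the resource names, and the worker-1/worker-2 cluster names, are assumptions based on the kubeconfigs above rather than the actual demo file.

```bash
kubectl apply -f - <<'EOF'
# An ordinary Deployment, stored in the Karmada control plane...
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo-nginx
  template:
    metadata:
      labels:
        app: demo-nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
---
# ...plus a PropagationPolicy telling Karmada which member clusters to push it to.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: demo-nginx
spec:
  resourceSelectors:
    - apiVersion: apps/v1
      kind: Deployment
      name: demo-nginx
  placement:
    clusterAffinity:
      clusterNames:
        - worker-1
        - worker-2
EOF
```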
---

So we saw a cluster of clusters,

---

with administrative control being carried down to the tenant clusters,

---

and we saw there was no sharing of any network or worker nodes,

---

but still some consolidation, and ultimately strong isolation.

---

Our multi-tenancy-o-meter is well into the red, however.

---

But what is that in dollars? We need 51 clusters; well, the control plane nodes don't cost us anything with our cloud vendor,

---

but there'll need to be at least one node per cluster in order to support the agent.

---

So that totals out at 612 bucks, or about 12 per tenant, per month.

---

So, by means of a review:

---

we started with basic Kubernetes, and added some multi-tenancy through namespaces.

---

We then added in some isolation with node pools, giving us runtime isolation.

---

We saw what happened when we added in some common things like monitoring,

---

logging,

---

storage,

---

CI/CD,

---

and ingress.

---

And of course we saw the cost implications of those choices.

---

We saw the costs ranging from zero for the Hierarchical Namespace Controller, through vCluster in at 252, to Karmada at 612 dollars a month.

---

On top of that, there's the cost of a dedicated ingress for each of those 50 tenants,

---

bringing our totals to this.

---

Now add on some monitoring and logging,

---

and any other tooling, and you can see the exponential costs mounting.

---

Some caveats to the costs:

---

so that's with a dedicated ingress.

---

What if we dropped that requirement and had a single ingress controller? Well, Karmada obviously needs one per real cluster, but we can share one with vCluster and HNC,

---

which gives us a slightly more balanced result on our multi-tenancy-o-meter.

---

So, a full recap on everything we've covered today:

---

we saw that isolation comes at a cost, and only you can be the judge of where you want to shoot for on the multi-tenancy-o-meter. Hopefully we've covered some detail that will allow you to make a more informed decision.

---

We looked at some of the common things folks tend to run in clusters and want configured differently, such as ingress controllers.

---

We saw how there are some baseline costs, some linearly increasing costs, and ultimately some exponentially increasing costs that can really be a gotcha.

---

We looked at some tools along the journey of how to manage something that has something resembling the ergonomics of multi-cluster, but fundamentally has the limitations of a single cluster under the hood.

---

And then we covered Karmada, which is one of many products in the space of supporting you in wrangling multiple clusters.

---

Finally, a quick thank you to Loft Labs for hosting these three sessions with us. If you've not seen them, then I'd consider Daniele and Salman's talks essential watching, so absolutely go feast on those. A special big thanks to Salman, who's hopefully been furiously working in the background to support this session by giving me some well-researched hints to any of your questions; I'll also use that as an opportunity to blame him for any misinformation I peddle.

---

Thank you so much for your time. We've hopefully got some time for questions; if you don't get a chance to ask yours, or you're not watching this live, then please do drop a line in the Loft Labs Slack, or hunt me down on LinkedIn.