Travis Van
Contributing Writer

Cutting Kubernetes costs with virtual clusters

Aug 19, 2024

vCluster creates lightweight virtual Kubernetes clusters within physical Kubernetes host clusters, dramatically reducing resource consumption while increasing agility and control.


Straight out of the webscale playbook, platform engineering was considered a futuristic discipline until just a few years ago. Would platform engineering really trickle down to mainstream enterprise teams? Did companies really want to operate their infrastructure like the major cloud providers? Now Gartner predicts that 80% of large software engineering organizations will have established platform engineering teams by 2026.

Platform engineering means different things to different people, and there’s no golden path that prescribes exactly how to do it right. But the main goals are universally understood. On the one hand, platform engineering strives to boost developer velocity by removing bottlenecks and adding self-service. On the other hand, it aims to standardize on central controls like security and compliance, so you can keep costs and complexity in check.

Increasing developer velocity has been a clear win for platform engineering. Containers, microservices, Kubernetes, CI/CD, and the modern development workflow have undeniably made software development faster, more productive, and more automated for distributed teams.

But the โ€œcentral controlโ€ part of platform engineering? Itโ€™s not so easy to declare victory quite yet. Weโ€™re in the midst of aย multi-year backlashย against the high cost and complexity of the cloud operating model. And today that central control side of platform engineering isnโ€™t just a platform engineering team issue, itโ€™s a CFO issue, as cloud bills soar and companies feel severe pressure to find cost savings. The cloud and Kubernetes are here to stay, but fixing a broken central control plane is a multi-million dollar dilemma that many enterprises are struggling with today.

That’s why an open-source project called vCluster is having a breakthrough moment. vCluster takes aim at the heart of the Kubernetes operating model, the cluster abstraction, to deliver a range of benefits to organizations building on Kubernetes. vCluster not only dramatically reduces resource overhead, which in turn can add up to significant cost savings, but also brings more agility and more central control to platform engineering teams.

The open-source path to opportunity

vCluster co-creators (and Loft Labs co-founders) Lukas Gentele and Fabian Kramm met as computer science students at the University of Mannheim. They followed a similar technology path, from Java and graph databases like Neo4j to web-focused technologies like PHP and JavaScript, and then into the Go language when Docker and Kubernetes took off, convinced that Go was the future. When Gentele started an IT consultancy while still in college, Kramm was his first hire.

Within that IT services business, Gentele and Kramm created a project called DevSpace (essentially a Docker Compose alternative focused on streamlining Kubernetes workflows) and put it on GitHub. That was their first experience developing and maintaining an open-source project. They had both contributed fixes to open-source projects before, but had never owned a project or driven one as maintainers. Once they saw the magic of open source, distributing a project and watching people use it, value it, and contribute to it, they were hooked.

After graduating, the two set out to build a PaaS product (which Gentele describes as “like Heroku for Kubernetes”). They applied to Y Combinator, were turned down but invited to apply again, and then parlayed that into a spot in the SkyDeck accelerator program at UC Berkeley. Ultimately, they concluded that PaaS was a very difficult business to run. They weren’t the first to hit this wall, as the struggles of Cloud Foundry and Heroku, and the Docker founders’ difficulty monetizing dotCloud, had demonstrated.

“We realized we had a lot of free users for our PaaS but not a lot of willingness to pay,” Gentele said. “OK, so what did we learn? We learned that running large Kubernetes clusters and sharing those clusters with users was extremely complicated and expensive, and that there was a much better way to do this that was a much bigger opportunity than the PaaS. An idea that could be useful for anyone running Kubernetes clusters.”

Fleets are the wrong abstraction

In the early days of container orchestration, the market embraced the idea of treating servers as “cattle” (interchangeable hardware that can be swapped out) rather than “pets” (each server given its own care and feeding), a core concept in turning servers into clusters.

For years, there was an architectural debate over whether to run a fleet of small clusters or a single massive cluster. “Kubernetes itself was designed to run at large scale,” said Gentele. “It’s not meant to run as a five-node cluster. If you have these small clusters, you get so much duplication and inefficiency.”

But as the major cloud providers rolled out their Kubernetes offerings, small single-tenant clusters and fleets of multiple clusters were the units of abstraction sold to the enterprise market, complete with “fleet management” solutions for coordinating all the moving parts and keeping services in sync in the “clusters of mini clusters” approach.

Gentele attributes a large portion of today’s cloud cost overruns to this original sin by the cloud providers.

The first consequence of the fleet approach is that the cost of heavyweight infrastructure components is paid many times over. Platform teams want to standardize on core services like Istio and Open Policy Agent, services that are designed to run at scale and are highly inefficient when many copies run at small scale. In the fleet approach, these services are installed in every cluster, so the entire platform stack is duplicated across many small clusters.

The other major consequence is that these clusters run all the time. Nobody turns them off. The major cloud offerings have no easy way to turn off an entire cluster with the click of a button. Instead, shutting one down is a manual process that requires a policy to be put in place, and bringing it back means waiting 30 minutes for the entire platform stack of services used to connect, manage, secure, and monitor the cluster to spin up again. It’s also hard to tell when a cluster is truly “idle” while all of these platform services (security components, policy agents, compliance, backup, monitoring, and logging) continue running underneath.

vCluster: Addition by subtraction

Gentele and Kramm had the epiphany that the fleet approach to clusters could be vastly improved upon, and that Kubernetes multitenancy could be redefined beyond traditional namespace approaches.

In 2021, they released vCluster and introduced the concept of “virtual clusters,” an abstraction for creating lightweight virtual Kubernetes clusters. Just as virtual private networks create a virtual network over physical infrastructure, virtual clusters create isolated Kubernetes environments over a shared physical Kubernetes cluster.

vCluster is a certified Kubernetes distribution, so virtual clusters behave exactly like any other Kubernetes cluster, with one important difference. Although each virtual cluster manages its own namespaces, pods, and services, it does not replicate the platform stack. Instead, it shares the heavyweight platform components, such as Istio or Open Policy Agent, run by the underlying physical cluster. Because the platform stack is shared, virtual clusters no longer carry the burden of replicated platform services.
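To make the duplication argument concrete, here is a back-of-the-envelope sketch in Go. Every resource figure in it is a hypothetical placeholder, not a measurement of Istio, Open Policy Agent, or vCluster; only the shape of the arithmetic matters. In the fleet model each cluster pays for its own control plane and its own copy of the platform stack, so cost grows as N × (control plane + stack). With virtual clusters the stack runs once on the host, and only a small per-tenant control plane grows with N.

```go
package main

import "fmt"

// Hypothetical CPU figures in millicores, chosen only to illustrate the
// arithmetic. They are NOT measurements of any real component.
const (
	platformStackCPU = 4000 // assumed: service mesh, policy agent, monitoring, etc.
	controlPlaneCPU  = 1000 // assumed: control plane of one physical cluster
	virtualCPCPU     = 200  // assumed: control plane of one virtual cluster
)

// fleetCPU models the fleet approach: every small physical cluster carries
// its own control plane and its own copy of the platform stack.
func fleetCPU(clusters int) int {
	return clusters * (controlPlaneCPU + platformStackCPU)
}

// virtualCPU models the virtual-cluster approach: one host cluster runs the
// platform stack once, and each tenant adds only a lightweight control plane.
func virtualCPU(tenants int) int {
	return controlPlaneCPU + platformStackCPU + tenants*virtualCPCPU
}

func main() {
	for _, n := range []int{1, 10, 100} {
		fmt.Printf("%3d tenants: fleet=%6dm  virtual=%6dm\n", n, fleetCPU(n), virtualCPU(n))
	}
}
```

With these made-up numbers a single tenant is slightly cheaper on its own physical cluster (5,000m versus 5,200m), because the virtual control plane is pure overhead. At 10 tenants the fleet needs 50,000m against 7,000m for the shared host, and the gap keeps widening as tenants are added.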

And yet each vCluster has its own API server and control plane, providing strong isolation to tenants and giving platform teams the ability to create their own custom policies for security and governance. They can create their own namespaces, deploy their own custom resources, protect cluster data by using their own backing stores, and apply their own access control policies to users.
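Tenants can create namespaces freely because objects from a virtual cluster are renamed as they are placed on the host, so identical names from different tenants never collide. The Go sketch below is a simplified, hypothetical illustration of that translation, assuming each virtual cluster is backed by a single host namespace. The `hostName` helper and its `name-x-namespace-x-vcluster` format echo the style of names vCluster’s syncer produces, but the real syncer also shortens and hashes long names and handles many resource types this sketch ignores.

```go
package main

import "fmt"

// hostName is a hypothetical helper that maps an object created inside a
// virtual cluster to a name in the host namespace backing that virtual
// cluster. Simplified: no length limits or hashing are applied here.
func hostName(name, virtualNamespace, vclusterName string) string {
	return fmt.Sprintf("%s-x-%s-x-%s", name, virtualNamespace, vclusterName)
}

func main() {
	// Two tenants both deploy "nginx" into their own "default" namespace.
	a := hostName("nginx", "default", "team-a")
	b := hostName("nginx", "default", "team-b")
	fmt.Println(a) // nginx-x-default-x-team-a
	fmt.Println(b) // nginx-x-default-x-team-b
}
```

Each tenant still sees a plain `nginx` in `default` through its own API server; only the host sees the translated names.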

At the same time, vCluster gives platform teams far greater speed and agility than a physical cluster. A virtual cluster can be spun up in a fraction of the time it takes to spin up a physical cluster. A restart of a vCluster takes about six seconds, versus 30 or 45 minutes to restart a physical Kubernetes cluster that’s running heavyweight platform services like Istio underneath.

“Kubernetes is great from an API perspective, a tooling perspective, a standardization perspective, but the architecture that the cloud providers advocated in running clusters of small clusters took the industry back to the physical server in terms of cost and heaviness,” Gentele said. “In the ’90s, someone had to actually physically walk into a data center, plug in a server, issue credentials, and take some other manual steps,” he said. “We’re in a similar boat with Kubernetes today. You have so many enterprises running their entire application stack in each cluster, which creates a lot of duplication.”

vCluster makes the Kubernetes cluster more lightweight and ephemeral, much as virtual machines did for physical servers and containers did for workloads in general.

“Spinning up small single-tenant Kubernetes clusters was a really terrible idea in the first place, because it’s very costly, and it’s very, very hard to manage,” Gentele said. “You’re going to end up with hundreds of clusters. And then you’ve got to maintain things like ingress controller, cert manager, Prometheus, and metrics across all these clusters. That’s a lot of work, and it’s really hard to keep in sync.”

vCluster by the numbers

vCluster has more than 6,000 stars on GitHub and more than 120 contributors. The project has drawn the attention of Kubernetes experts such as Rancher’s former CTO and co-founder Darren Shepherd, who has been advocating for the use of virtual clusters. Teams from Adobe, CoreWeave, and Codefresh have been outspoken about their use of vCluster at events like KubeCon.

Gentele and Kramm’s startup Loft Labs recently raised funding to extend enterprise capabilities around vCluster. The $24M Series A was led by Khosla Ventures, which is known for being the first institutional investor in companies like GitLab and OpenAI.

The startup’s commercial offering on top of vCluster has generated particular excitement over its “sleep mode,” which turns off inactive virtual clusters automatically. Enterprises that spin up clusters typically leave them running all the time. Loft Labs’ product instead measures virtual cluster activity by monitoring incoming API requests, and sleep mode automatically scales down virtual clusters that aren’t being used, saving cloud resources and overall cost.

vCluster may help enterprises run cloud infrastructure more efficiently and drive down cloud costs, but it also gives them a clearer path to winning on both central control and developer velocity. In addition to stretching physical cluster resources, vCluster provides each virtual cluster with its own separate API server and control plane, giving platform teams both more flexibility and more control over the management, security, resource allocations, and scaling of their Kubernetes clusters.


Travis Van has been following open source and distributed computing for more than 20 years, with a particular focus on cloud and network infrastructure, programming languages, developer frameworks, and platform engineering trends. He is the founder of information technology news aggregation service TechNews.io. As an InfoWorld contributor, he tells the stories of open source creators and maintainers who are tackling the hardest problems of distributed computing and laying the foundations for the next wave of enterprise computing.
