Farewell to Dangling HTTP(S) Load Balancers (GKE)

credit: https://commons.wikimedia.org/wiki/File:The_Grim_Reaper_-_geograph.org.uk_-_522625.jpg

I do a lot of GKE work.

When doing lots of development on GKE, or automating tasks with it, I used to casually bring a cluster up, create resources, delete the cluster, and try again. Why not? After all, GKE makes this process a breeze.

This works fine in most cases, but for Kubernetes Ingress resources that create HTTP(S) load balancers… well, there’s one thing you need to be aware of: if you delete the cluster (as opposed to, say, deleting a Kubernetes namespace), Kubernetes has no chance to clean up the load balancers, and they are left there forever, until you do something about it.

Well, I decided to do something about it. (I would love to know if there are better ways! Please leave a comment.)

(UPDATE: supposedly this has been fixed, but I have not been creating and deleting clusters left and right recently, so I have not confirmed it.)

I would almost like to leave these load balancers dangling as they are, but there are two problems:

  1. They cost you money, and
  2. GCP imposes quotas on the maximum number of forwarding rules, backend services, and other resources you can create.

This means it is crucial for us to clean up after ourselves.

If you don’t care about how load balancers are structured in GCP, then stop reading and head straight over to my repository:

The Anatomy Of A Load Balancer

There is no such resource as a “Load Balancer” on GCP. Instead, a load balancer consists of multiple resources that collectively act as your trusty worker. If you are ever going to actually work with GCP load balancers, you need to know these terms:

  • Forwarding Rules: Determine how you receive your inbound traffic. Sometimes called the “frontend.”
  • Target Proxies: Receive traffic from the frontend and route it to the appropriate backends. Note: target HTTPS proxies and target HTTP proxies exist as separate resources and are managed separately.
  • URL Maps: The actual maps containing the routing rules for the target proxies.
  • SSL Certificates: The certificates used for HTTPS load balancers.
  • Backend Services: Groupings of VMs that handle the requests. In our case, each points to one or more ports on an instance group.
  • Instance Groups: The same instance groups that you know from GCE.
  • Instances: The actual VMs.
  • Health Checks: Agents that periodically check your backend services to make sure they are still reachable.

These are the components that make up a load balancer. If you know these terms, the GCP Networking UI will actually make more sense! :)
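To make the terms concrete, here is a minimal sketch (mine, not code from the repository) using the Go client for the Compute API. It lists the “frontends” of the HTTP(S) load balancers in a project; my-project is a placeholder, and Application Default Credentials are assumed.

```go
package main

import (
	"context"
	"fmt"
	"log"

	compute "google.golang.org/api/compute/v1"
)

func main() {
	ctx := context.Background()
	// Assumes Application Default Credentials are set up.
	svc, err := compute.NewService(ctx)
	if err != nil {
		log.Fatal(err)
	}

	const project = "my-project" // placeholder

	// The "frontend": HTTP(S) load balancers use *global* forwarding rules.
	rules, err := svc.GlobalForwardingRules.List(project).Do()
	if err != nil {
		log.Fatal(err)
	}
	for _, r := range rules.Items {
		// Each rule's Target is the URL of a target (HTTP/HTTPS) proxy,
		// which references a URL map, which references backend services,
		// which reference instance groups.
		fmt.Printf("%s -> %s\n", r.Name, r.Target)
	}
}
```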

Finding Dangling Load Balancers

We basically need to find all load balancers whose backend instance group has ceased to be.

So first, we look for forwarding rules whose names start with k8s-fw, the prefix automatically given to load balancers created by Kubernetes Ingress resources.

Then we successively search through the structures until we reach the instance group level. The way we do this is as follows (a sketch in Go follows after the walk):

  1. Find the applicable forwarding rules.
  2. Find the target proxies referenced by the forwarding rules’ target fields.
  3. Find the URL maps referenced by the target proxies’ urlMap fields.
  4. Find the backend services referenced by the URL map’s defaultService and pathMatchers.pathRules fields.
  5. Find the instance groups referenced by the backend services’ backends.group fields.

At this point we can list the instances associated with the backend service’s instance groups. If the total number of instances alive behind this backend service is zero, we declare the entire load balancer dangling.
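Here is a minimal sketch of that walk, under the same assumptions as the snippet above. Error handling is elided, the HTTPS-proxy variant is omitted, and parseGroupURL is a hypothetical helper that splits a zonal instance group URL into zone and name; the real code in the repository is more thorough.

```go
import (
	"path"
	"strings"

	compute "google.golang.org/api/compute/v1"
)

// isDangling follows one k8s-fw forwarding rule down to its instance
// groups and reports whether no instance is left alive behind it.
func isDangling(svc *compute.Service, project string, rule *compute.ForwardingRule) bool {
	if !strings.HasPrefix(rule.Name, "k8s-fw") {
		return false // not created by a Kubernetes Ingress
	}

	// Step 2: forwarding rule -> target proxy. Reference fields hold
	// full URLs, so we take the last path segment as the name.
	proxy, _ := svc.TargetHttpProxies.Get(project, path.Base(rule.Target)).Do()

	// Step 3: target proxy -> URL map.
	urlMap, _ := svc.UrlMaps.Get(project, path.Base(proxy.UrlMap)).Do()

	// Step 4: URL map -> backend services.
	services := []string{urlMap.DefaultService}
	for _, pm := range urlMap.PathMatchers {
		for _, pr := range pm.PathRules {
			services = append(services, pr.Service)
		}
	}

	// Step 5: backend services -> instance groups -> live instances.
	alive := 0
	for _, s := range services {
		bs, _ := svc.BackendServices.Get(project, path.Base(s)).Do()
		for _, b := range bs.Backends {
			zone, name := parseGroupURL(b.Group) // hypothetical helper
			list, _ := svc.InstanceGroups.ListInstances(project, zone, name,
				&compute.InstanceGroupsListInstancesRequest{}).Do()
			alive += len(list.Items)
		}
	}
	return alive == 0
}
```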

Once we know a load balancer is dangling, we can finally backpedal a bit and start deleting all the associated resources, plus a few more that we did not need to query to reach the instances, such as the SSL certificates and health checks.
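Deletion has to happen roughly in reverse dependency order, because GCP refuses to delete a resource that another resource still references. A sketch under the same assumptions as above; the *Name variables are placeholders for names recovered during the walk.

```go
// Delete in reverse dependency order; each call returns an async
// Operation, which real code should wait on and error-check.
svc.GlobalForwardingRules.Delete(project, ruleName).Do()
svc.TargetHttpsProxies.Delete(project, proxyName).Do() // or TargetHttpProxies
svc.SslCertificates.Delete(project, certName).Do()     // HTTPS only
svc.UrlMaps.Delete(project, urlMapName).Do()
svc.BackendServices.Delete(project, serviceName).Do()
svc.HttpHealthChecks.Delete(project, checkName).Do()
```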

Now, we’d be almost done, except there are cases where the forwarding rule is missing but everything from the target proxy onward exists. This can happen if we deleted the cluster in the middle of initialization, or if a faulty setup prevented the load balancer from forming properly.

To capture these cases, after the regular walk over the forwarding rules, we list the target proxies and pick the ones whose names start with k8s-tp and that were not already visited in the previous walk. Then we run the same check.

Lastly, to avoid accidentally deleting load balancers that are still being initialized, we make sure the target proxy is at least one hour old. If your proxy has not initialized after an hour, you have a different problem…
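A sketch of that second pass, with the same svc and project as before; seen is a hypothetical set of proxy URLs already visited during the forwarding-rule walk. CreationTimestamp on Compute resources is an RFC 3339 string, which gives us the age check.

```go
proxies, _ := svc.TargetHttpProxies.List(project).Do()
for _, p := range proxies.Items {
	if !strings.HasPrefix(p.Name, "k8s-tp") {
		continue
	}
	if seen[p.SelfLink] { // hypothetical: already reached via a forwarding rule
		continue
	}
	// Skip proxies younger than an hour so we don't tear down a
	// load balancer that is still initializing.
	created, err := time.Parse(time.RFC3339, p.CreationTimestamp)
	if err != nil || time.Since(created) < time.Hour {
		continue
	}
	// ...then run the same urlMap -> backend service -> instance group check.
}
```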

And that is basically all that is required to find dangling load balancers.

Deleting Through Google App Engine

Now the above must be checked periodically. Since we’re in the GCP environment, the easiest way to periodically run a task is to use App Engine.

If you are interested in how I actually implemented it, please take a look at my repository above. It’s a bit hackish, but App Engine was the right choice for the job, as the Task Queue proved itself most useful.

One thing you should know if you are going to read that code is how the Go bindings for the Google APIs are organized. If you are not used to the calling convention, they sure look odd, but once you get the hang of it, you quickly realize that the API is that way precisely because it is by far the easiest way to generate code that makes RPC calls.
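As a taste of that convention (my illustration, not code from the post): every call is built by chaining from the service object through the resource and verb, adding optional parameters, and finally firing the request with Do(). The filter here uses the legacy `name eq <regex>` syntax of the v1 Compute API.

```go
// service -> resource -> verb -> optional parameters -> Do()
rules, err := svc.GlobalForwardingRules.
	List(project).
	Filter("name eq k8s-fw.*"). // optional server-side filter
	Do()
if err != nil {
	log.Fatal(err)
}
```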

If you are interested, I explain how the API is used in one of my repositories where I stole their coding style:

So that was it. I deployed my code to App Engine, set up my cron.yaml to look for dangling HTTP(S) load balancers every 10 minutes, and now I can rest easy knowing that I won’t be slapped with a fee and won’t have to worry about exceeding GCP quotas.
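For reference, the cron.yaml entry for such a schedule looks roughly like this (the handler path is my placeholder, not necessarily what the repository uses):

```yaml
cron:
- description: look for dangling HTTP(S) load balancers
  url: /cleanup            # hypothetical handler path
  schedule: every 10 minutes
```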

Thanks for reading, and please let me know if there’s a better way to do this. Happy hacking.
