
202504030034 Replacing Metallb + Caddy with Tailscale Ingress as Private NLB Provider and Certs Resolver

Current Architecture

  1. External Reverse Proxy as a VM (let’s call it Tailscale Caddy)
    1. The VM runs a Caddy reverse proxy plus Tailscale
    2. A Cloudflare public DNS A record points to Tailscale Caddy’s private Tailscale IPv4 address
      1. *.app.mydomain.com -> <your-tailscale-private-ip>
    3. (Optional) I configured AdGuard Home to do DNS rewrites locally, so all wildcard requests for *.app.mydomain.com get rewritten to a local private IP. This IP, e.g. 10.0.10.10, is assigned by MetalLB, my bare-metal load balancer.
      1. I chose MetalLB because I needed an HA way of provisioning IP addresses.

Current Tailscale AGH, Metallb Implementation.png

New Implementation

We will use the Tailscale Kubernetes Operator, which we already use for egress (cluster to other tailnet devices).

We will use Cloudflare’s DNS challenge method for our cert resolver. For that, we need to create a Cloudflare API token and store it in the cluster, so Traefik can use it to complete the challenges and request certs.
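A minimal sketch of storing the token as a Kubernetes Secret. The Secret name `cloudflare-api-token` and the `traefik` namespace are assumptions for illustration; `CF_DNS_API_TOKEN` is the variable name used in the steps below.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-token   # hypothetical name
  namespace: traefik           # assumed namespace; use wherever Traefik runs
type: Opaque
stringData:
  # Placeholder only; never commit a real token to version control
  CF_DNS_API_TOKEN: "<your-cloudflare-api-token>"
```

Using `stringData` avoids base64-encoding the value by hand; Kubernetes encodes it when the Secret is created.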

  1. Uninstall Traefik from the cluster
  2. Re-install Traefik via its Helm chart and set the chart values
    1. Configure chown via init-container
    2. Store CF_DNS_API_TOKEN in the cluster
  3. Configure a Tailscale Ingress for the Traefik service
    1. In the chart configuration (or values.yaml), set service.spec.loadBalancerClass to tailscale
    2. Tailscale Ingress will assign the Traefik service an external tailnet name and private IP, instead of MetalLB assigning one from its configured pool.
  4. Find the external names and IP that Tailscale Ingress provisioned for the Traefik service with kubectl get svc -n <namespace> traefik
  5. Add the external IP for Traefik to the public DNS (Cloudflare) as an A record.
    1. I do mine via Terraform
  6. Delete any AGH DNS Rewrites
  7. Create IngressRoute objects to configure Traefik routing for the applications, ensuring they have the correct spec.entryPoints, spec.routes.match, spec.tls.certResolver and spec.tls.domains
    1. Each app = 1 IngressRoute
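  8. Configure the Tailscale ACL to allow devices to access the Tailscale Ingress pod
    1. Reference the ingress pod’s tag, which can be found in your Tailscale Admin Console, in the ACL rule. Since devices connect to the pod, the tag would normally go in dst, with the allowed devices in src.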
  9. Re-deploy Traefik
  10. Traefik will begin requesting TLS certs from Let’s Encrypt
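Steps 2 and 3 might translate into chart values roughly like the sketch below. The resolver name `letsencrypt`, the Secret name `cloudflare-api-token`, the email address and the init-container command are all assumptions for illustration. The sketch uses a chmod rather than a chown as the permission fix, and the key names should be checked against the version of the Traefik chart you install.

```yaml
# values.yaml (sketch, not a complete configuration)
deployment:
  initContainers:
    # Assumed permission fix: create the ACME storage file and restrict its mode.
    - name: volume-permissions
      image: busybox:latest
      command: ["sh", "-c", "touch /data/acme.json && chmod 600 /data/acme.json"]
      volumeMounts:
        - name: data
          mountPath: /data

persistence:
  enabled: true   # keep acme.json (issued certs) across pod restarts
  path: /data

env:
  # Expose the Cloudflare token to Traefik's DNS challenge provider
  - name: CF_DNS_API_TOKEN
    valueFrom:
      secretKeyRef:
        name: cloudflare-api-token   # hypothetical Secret name
        key: CF_DNS_API_TOKEN

additionalArguments:
  # "letsencrypt" is a hypothetical resolver name; IngressRoutes refer to it
  - "--certificatesresolvers.letsencrypt.acme.email=you@example.com"
  - "--certificatesresolvers.letsencrypt.acme.storage=/data/acme.json"
  - "--certificatesresolvers.letsencrypt.acme.dnschallenge.provider=cloudflare"

service:
  type: LoadBalancer
  spec:
    # Hand the Service to the Tailscale operator instead of MetalLB
    loadBalancerClass: tailscale
```

Setting the load balancer class on the Service is what lets the Tailscale operator, rather than MetalLB, provision the external address.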

New Tailscale As NLB Implementation.png

Benefits

This new implementation is far simpler and easier to change.

  1. All changes are done via Kubernetes
    1. Routes are defined via the IngressRoute CRD
      1. Declaring routes for apps becomes more declarative, done the “Kubernetes way”, and follows IaC concepts closely
      2. We no longer need to manage a Caddyfile
      3. We no longer need a separate VM as a reverse proxy, which reduces the attack surface and minimizes resource management overhead
  2. Certs are managed automatically: Traefik requests them from Let’s Encrypt using the Cloudflare DNS challenge, which replaces cert-manager and feels more lightweight
  3. Using Traefik as the ingress controller without a separate VM “fronting” it makes the solution more HA.
    1. I no longer depend on the Caddy server to provision HTTPS certs and handle routing
    2. I can provision routes in an “HA way” by deploying a multi-replica, self-healing instance to configure them, the way we want to do it with Kubernetes
  4. The architecture becomes far simpler
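
To illustrate the “one IngressRoute per app” approach, here is a hedged sketch of a route for a hypothetical `whoami` app. The app name, namespace, service port, `websecure` entry point and `letsencrypt` resolver name are assumptions; older Traefik releases use the `traefik.containo.us/v1alpha1` API group instead.

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: whoami          # hypothetical app
  namespace: default    # assumed namespace
spec:
  entryPoints:
    - websecure         # assumed HTTPS entry point name
  routes:
    - match: Host(`whoami.app.mydomain.com`)
      kind: Rule
      services:
        - name: whoami  # Kubernetes Service backing the app
          port: 80
  tls:
    certResolver: letsencrypt   # must match the resolver name configured in Traefik
    domains:
      - main: app.mydomain.com
        sans:
          - "*.app.mydomain.com"
```

Requesting a wildcard SAN is one reason to use the DNS challenge, since Let’s Encrypt only issues wildcard certificates via DNS validation.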