Local Kubernetes Lab – The Easy Way With podman and kind

Summary

In a few articles, I’ve shared how to provision your own local Kubernetes (K8s) lab using VMs. Over the years this has simplified from Intro To Kubernetes to Spinning Up Kubernetes Easily with kubeadm init!. With modern tools, many of us do not need to manage our own VMs.

These days there are many options but one of my favorites is kind. To use kind, you need either Docker Desktop or Podman Desktop.

Podman is quickly becoming a favorite because instead of the monolithic architecture of Docker Desktop, Podman is broken out a bit and the licensing may be more appeasable to people.

Installation

Installing Podman is relatively easy. Just navigate to https://podman.io and download the installer for your platform. Once installed you will need to provision a Podman Machine. The back end depends on your platform. Windows will use WSL2 if available. MacOS will use a qemu VM. This is nicely abstracted.

Installing kind is very easy. Depending on your platform you may opt for a package manager. I use brew so its a simple

brew install kind

The site https://kind.sigs.k8s.io/docs/user/quick-start/#installation goes through all the options.

Provisioning the Cluster

Once these two dependencies are in play its a simple case of

kind create cluster

The defaults are enough to get you a control plane that is able to schedule workloads. There are custom options such as specific K8s version and multiple control plans and worker nodes to more properly lab up a production environment.

kind create cluster

From here we should be good to go. Kind will provision the node and set up your KUBECONFIG to select the cluster.

As a quick validation we’ll run the recommended command.

We can see success and we’re able to apply manifests as we would expect. We have a working K8s instance for local testing.

Extending Functionality

In IaC & GitOps – Better Together we talked about GitOps. From here we can apply that GitOps repo if we wanted to.

We can clone the specific tag for this post and run the bootstrap

git clone https://github.com/dcatwoohoo/k8-fcos-gitops.git --branch postid_838 --single-branch

cd k8-fcos-gitops

./bootstrap_cluster1.sh

# We can check the status with the following until ingress-nginx is ready
kubectl get pods -A

# From there, check helm
helm ls -A

And here we are. A local testing cluster we stood up and is controlled by GitOps. Once flux was bootstrapped it pulled down and installed the nginx ingress controller.

A Look at Telsa Through the Team of Teams Lens

Summary

Going down a reading rabbit hole, I recently read Team of Teams: New Rules of Engagement for a Complex World by General Stanley McChrystal. I was not sure what I was in for with this. Paraphrasing what General Stanley McChrystal wrote, while he was going through the events in the book, he was not sure if what he experienced was a fluke or there was something more to it.

While this article is not a review as there are plenty around, this book did start to move me. I started thinking about other applicable lenses to view this through. In the book, the automotive industry is cited a couple of times. This opened Pandora’s box for me. In it, one of the examples was the GM Ignition Recall that took nearly 10 years to have fixed for a $2 part. Ultimately it was an organizational structure failure. The low level teams had known about this months after the new ignition switches were sent out into the wild and reports had started coming back.

Why Tesla?

It is easy to get wrapped up in the politics and public displays for which Elon Musk is known. Setting that aside, what Tesla is doing is revolutionary for a few reasons. In these times, it is extremely hard to start a new automobile company. Tesla not only a new automobile company but using a fuel source that is not industry standard.

Tesla is different. They are not just an automobile manufacturing company. Elon himself in numerous interviews cites that Tesla is actually a “hardcore” engineering company. They manufacture numerous parts for the vehicle in house as well as write all of the software (Software Engineering).

Outside the scope of this article, they’re also a data mining company. They have driving details on now millions of their vehicles. This has various uses such as road mapping, driving patterns and improving their autonomous driving.

How Legacy Automotive Companies Operate

Many of the legacy automotive manufacturers are extremely siloed using the “reductionist” methodology of breaking down areas into small teams and pushing them for efficiency. There are many different vendors that make components for legacy car companies. They build them to the Original Equipment Manufacturers specifications to ensure interoperability. These vendors do not typically communicate with each other or all of them to understand the whole picture. What this means is that the Engine Control Module (ECM) may be manufactured by one company and the Transmission Control Module (TCM) may be manufactured by another. The software may then be subcontracted out by those vendors. They use interoperability standards but may have little idea of how the Battery Control Module (BCM) interacts with these two modules.

This allows scale and efficiency. Vendor management is a very strong tool to help mitigate concerns. Many, like Toyota are great at this. They many times will have supply manufacturing happen in the same plant as the cars are assembled. Contracts also tend to indicate suppliers have a certain stock of supplies to weather temporary supply chain issues.

How Tesla Operates?

Many of its key components are manufactured in house, such as its seats. This is not to say it does not outsource any manufacturing. It certainly does. One critical piece that Telsa handles in house is to write its own software. This was instrumental in its adaptability during the computer chip shortages of 2020 and onward.

Chip Shortage

During the chip shortages, OEMs could not get their hands on chips. Many of the big ones had lots filled with unfinished vehicles. They were simply waiting on chips to arrive with no end in sight. Cars were delivered without features, in many cases.

Tesla did deal with a delay in production because of this. Its adaptability in writing its software, allowed it to utilize chips that were available. Not only did it adapt its software, Tesla realized it could in some cases reduce the need for some of them. This is very well documented in https://www.utilitydive.com/news/tesla-chip-semiconductor-shortage/628150/

Wrapping it Up

Traditional car manufacturers are very siloed. They are built this way for efficiency and scalability. With this, they are very inflexible and not very adaptable. Many of them are struggling to become profitable on their Electric Vehicles (EVs). Recently even Ford has started to bring its software development in house. This allows for constant updates that are needed without ever having to go into the dealership. Many of the Tesla recalls have been corrected via OTA (Over the Air) updates to software.

Conclusion

In a modern world of complexity, teams cannot work in isolation. They need to be aware of what other teams are doing to have a shared vision. Cognitive load needs to be minimized or information overload will occur but in this new world of complexity and constant information, silos do not work.

IaC & GitOps – Better Together

Summary

Building on the prior article of Fedora CoreOS + Ansible => K8s we want complete Infrastructure As Code. The newest way of doing this is GitOps where nearly everything is controlled by SCM. For that, flux is one of my favorites but Argo will also work.

The benefit of GitOps and K8s is that developers can have complete but indirect access to various environments. This makes it really easy for a DevOps team to provision the tooling very easily to either spin up environments effortlessly or let the developers do it themselves. That helps us get close to Platform Engineering.

Flux GitOps Repo

For this article, this is the tagged version of the GitOps repo used. At its core, we manually generated the yaml manifests via scripts commands. Namely upgrade_cluster1.sh and generate_cluster1.sh. Combined these create the yaml manifests needed. Upgrade cluster can be run to refresh the yaml during an upgrade but do not let it trick you. It can also be used to generate the initial component yaml. The generate_cluster1.sh should only need to be run once.

The bootstrap_cluster1.sh is used by our ansible playbook to actually apply the yaml.

Bootstrapping Flux

The flux cli has a bootstrap command that can be used but for this, we want disposable K8s clusters that can be torn down and then new ones rebuilt and attached to the same repo. Not only does this allow the workloads running to be treated like cattle but also the infrastructure itself.

To achieve this, we are manually creating the yaml manifests (still using the supported CLI tools) but decoupling that from the initial setup, deploy and running of the environment.

What Did We Get?

From a simple set of changes to pull and deploy flux, we have a sample ingress controller (nginx). In it you can specify any parameter about it and have clear visibility as to what is deployed. In this scenario we are just specifying the version but we could also specify how many instances or whether to deploy via daemonset (one instance per worker node).

Wrapping It All Up – What Is The Big Deal?

It might be a natural question as to what is the big deal about K8s, IaC, GitOps and this entire ecosystem. True IaC combined with GitOps allows complete transparency into what is deployed into production because flux ensures what is in Git is reconciled with the configured cluster. No more, one off configurations that nobody knows about until upgrade to replacement time on the server.

The fact that we have so much automation allows for tearing down and rebuilding as necessary. This allows for easy patching. Instead of applying updates and hoping for the best, just instantiate new instances and tear down the old ones.

Fedora CoreOS + Ansible => K8s

Summary

Kubernetes is a personal passion of mine and I have written a few times about how to standup one of my favorite Container Optimized Operating Systems, PhotonOS. Most recently I wanted to rebuild my lab because it has been a while. While some of my prior posts have served as a Standard Operating Procedure for how to do it, its lost its luster doing it manually.

Because of this, I sought out to automate the deployment of PhotonOS with Ansible. Having already learned and written about SaltStack, I wanted to tool around with Ansible. I thought, great, Photon is highly orchestrated by VMware, this should be simple.

Unfortunately PhotonOS 5 does not work well with Ansible, namely due to the package manager.

Container Optimized Operating Systems

In my search for one that did work well with Ansible, I came across a few. Flatcar was the first. It seemed to have plenty of options. I think came across Fedora CoreOS. These seem to be two of many forks of an older “CoreOS” distribution. Since Ansible and Fedora fall under the RedHat umbrella, I went with FCOS.

The interesting thing about Flatcar and CoreOS is that they use Ignition (and Butane) for bootstrapping. This allows for first time boot provisioning. This is the primary method for adding authentication such as ssh keys.

My Lab

My lab consists of VMware Fusion since I’m on a Mac. For that a lot of my steps are specific to that but I attempted to make them generic enough so that it could be modified for your environment.

Here is my full repo on this – https://github.com/dcatwoohoo/k8-fcos/tree/postid_822

Setting up Ignition

To help ensure your ssh keys are put into the environment, you’ll need to update butane with the appropriate details. Particularly the section “ssh_authorized_keys”

Butane is a yaml based format that is designed to be human read/writable. Ignition is designed to be human readable but not easily writable. For that reason, we use a conversion tool to help.

docker run -i --rm quay.io/coreos/butane:release --pretty --strict < ignition/butane.yaml > ignition/ignition.json

Don’t worry, this is baked into the ovf.sh script

Instantiating the OVF

The first step was acquiring the OVA template (Open Virtual Appliance). On https://fedoraproject.org/coreos/download?stream=stable#arches that would be over here!

For this I scripted it via ovf.sh that instantiates it for a given number of instances. As documented, its 2 nodes, fcos-node01 and fcos-node02

Once they are done and powered one, along with a 45 second pause/sleep, we’re good to run Ansible and get the cluster going.

Running Ansible

Because I can be lazy, I created a shell script called k8s-fcos-playbook.sh that runs the playbook. At this point, sit back and enjoy the wait.

If all goes well you’ll have a list of pods up and running successfully and a bare bones cluster.

kubectl get pods -A

Concluding thoughts

While I did not go specifically into Ansible and the details, it is a public repository and written in a fairly self explanatory way. It’s not the best or most modular but is easy to follow.

Special Thanks!

A special thanks to Péter Vámos and his repository on doing this similarly. He gave me some great ideas, although some of it I went in a different direction.

Bootstrapping a CA for Kubernetes

Summary

The purpose of this article is to walk through bootstrapping a CA for Kubernetes clusters for use in the ingresses and other possible needs like a private docker repository. For this we will use https://cert-manager.io. We will assume you have an operational K8 cluster/node but if not check out https://blog.woohoosvcs.com/2023/06/photon-os-5-0-kubernetes-1-25/ on how to do that.

Use Case

A really good use case for this is when you want to use self-signed certificates in your lab but want the browser to trust it. For certificates to work, they require FQDNs. One could certainly have host file entries for every endpoint they need but I recently came across a more elegant solution “localdev.me”. This was referenced in a few places but namely https://kubernetes.github.io/ingress-nginx/deploy/

The beauty of localdev.me is that any subdomain resolves to 127.0.0.1 so you can easily run

kubectl port-forward svc/ingress-nginx-controller -n ingress-nginx 443:443

To forward all of your ingresses to localhost. Its a neat trick and in today’s world we want to test TLS encryption using HTTPS.

Requirements

For this, we simply need to install cert-manager. There are two main ways. kubectl apply or using the Helm Chart. If you’re not familiar with Helm, please go down that rabbit hole. For this we’ll assume you are just running kubectl apply.

Installation

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

With any luck you will have some cert-manager related pods running

% kubectl get pods -n cert-manager
NAME                                      READY   STATUS    RESTARTS   AGE
cert-manager-559b5d5b7d-tq7rt             1/1     Running   0          29s
cert-manager-cainjector-f5c6565d4-vv652   1/1     Running   0          29s
cert-manager-webhook-5f44bc85f4-qkg7s     1/1     Running   0          29s

What Next?

Cert-Manager is a fairly extensible framework. It can connect to ACME compatible authorities to request and process certificate creation and renewals but for this we will be using two other configurations for it. We will be using the “CA” ClusterIssuer. In order to bootstrap a CA though we also have to use the “SelfSigned” Issuer.

Show Me the YAML

At the beginning is a Self signed issuer. What this means in our case is that the certificate’s common name matches the issuer name. You will find this in any public certificate authority root as well. There are two types of “issuers” in cert-manager. An “Issuer” which is namespaced and can only issue for that namespace and a “ClusterIssuer” which can issue for the cluster. For labs I like to use ClusterIssuers so do not need to have multiple issuers.

Certificates are namespaced though. My preference is to have a wildcard certificate in each namespace but you can also have the ingress request certificates.

No – Really Show Me the YAML!

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-issuer
spec:
  selfSigned: {}

Here we have a simple self-signed issuer. An Issuer is just a construct to issuer certificates. We still need to create the CA Certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-selfsigned-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: my-selfsigned-ca
  duration: 43800h
  secretName: root-secret
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned-issuer
    kind: ClusterIssuer
    group: cert-manager.io

Here we are requesting a root certificate that will expire in 5 years. We can’t swap these out too regularly because its a pain getting our OS and other tools to trust them. In the issuerRef we see the self-signed-issuer referenced with many other attributes we’ll use later.

Then we need to create a cluster issuer for the certificates we want to issue. We tell it to use the ca root-secret for the chain.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: my-ca-cluster-issuer
  namespace: cert-manager
spec:
  ca:
    secretName: root-secret 

Next we will issue a certificate in a namespace that will chain off the self-signed root. It is namespaced to the sandbox namespace. They will expire after 90 days and renew 15 days before expiration.

Make sure to create the sandbox namespace first if you want to use this or change it to the namespace you want.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: localdev-me
  namespace: sandbox
spec:
  # Secret names are always required.
  secretName: localdev-me-tls

  # Secret template is optional. If set, these annotations
  # and labels will be copied to the secret named example-com-tls.

  # Note: Labels and annotations from the template are only synced to the Secret at the time when the certificate 
  # is created or renewed. Currently labels and annotations can only be added, but not removed. Removing any 
  # labels or annotations from the template or removing the template itself will have no effect.
  # See https://github.com/cert-manager/cert-manager/issues/4292.
  secretTemplate:
    annotations:
      my-secret-annotation-1: "foo"
      my-secret-annotation-2: "bar"
    labels:
      my-secret-label: foo

  duration: 2160h # 90d
  renewBefore: 360h # 15d
  subject:
    organizations:
      - Woohoo Services
  # The use of the common name field has been deprecated since 2000 and is
  # discouraged from being used.
  commonName: localdev.me
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 2048
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - localdev.me
    - "*.localdev.me"
  # Issuer references are always required.
  issuerRef:
    name: my-ca-cluster-issuer
    # We can reference ClusterIssuers by changing the kind here.
    # The default value is Issuer (i.e. a locally namespaced Issuer)
    kind: ClusterIssuer
    # This is optional since cert-manager will default to this value however
    # if you are using an external issuer, change this to that issuer group.
    group: cert-manager.io

We now have a wildcard certificate for use in the sandbox namespace

% kubectl get secret/localdev-me-tls -n sandbox -o yaml
apiVersion: v1
data:
  ca.crt: XXXXX
  tls.crt: XXXXX
  tls.key: XXXXX
kind: Secret
metadata:
  annotations:
    cert-manager.io/alt-names: localdev.me,*.localdev.me
    cert-manager.io/certificate-name: localdev-me
    cert-manager.io/common-name: localdev.me
    cert-manager.io/ip-sans: ""
    cert-manager.io/issuer-group: cert-manager.io
    cert-manager.io/issuer-kind: ClusterIssuer
    cert-manager.io/issuer-name: my-ca-cluster-issuer
    cert-manager.io/subject-organizations: Woohoo Services
    cert-manager.io/uri-sans: ""
    my-secret-annotation-1: foo
    my-secret-annotation-2: bar
  creationTimestamp: "2023-06-25T19:20:05Z"
  labels:
    controller.cert-manager.io/fao: "true"
    my-secret-label: foo
  name: localdev-me-tls
  namespace: sandbox
  resourceVersion: "3711"
  uid: 3fcca4e2-2918-486c-b191-e10bd585259e
type: kubernetes.io/tls

Where is the Trust?

You may be wondering, great but how do I get my browser to trust this? You’re right, this is essentially an untrusted certificate chain. We need to base64decode the ca.crt section and then import it into a few places. The most important is your OS/browser so that it trusts the root.

On MacOS you’ll use keychain to import and set the trust on it to allow it much like this article – https://tosbourn.com/getting-os-x-to-trust-self-signed-ssl-certificates/

On Windows it will look closer to https://techcommunity.microsoft.com/t5/windows-server-essentials-and/installing-a-self-signed-certificate-as-a-trusted-root-ca-in/ba-p/396105

There may be cases where you need your Kubernetes cluster to trust it as well. That will depend on your distribution but for Photon running a stock K8 distribution its fairly trivial.

You’ll simply put a copy in /etc/ssl/certs/ using a unique name. You will need “openssl-c_rehash” as mentioned in https://github.com/vmware/photon/issues/592 to be able to get the OS to trust it.

You will also want to add the PEM to /etc/docker/certs.d/ so that docker itself (or containerd) trusts it. You will need to restart docker/containerd to get it to accept the cert though. The use case for this is that if you want to mount the certificate in the private repository doing something like this you can. In this case the kubelet on the kubernetes node will call docker/containerd and that will need to trust the certificate.

     volumes:
      - name: localdev-me-tls
        secret:
          secretName: localdev-me-tls
      ...
      containers:
        - image: registry:2
          name: private-repository-k8s
          imagePullPolicy: IfNotPresent
          env:
          - name: REGISTRY_HTTP_TLS_CERTIFICATE
            value: "/certs/tls.crt"
          - name: REGISTRY_HTTP_TLS_KEY
            value: "/certs/tls.key"
          ...
          volumeMounts:
          - name: localdev-me-tls
            mountPath: /certs

Final Words

There you have it. A cluster wide CA that you can have your K8 nodes and local machine trust for TLS encryption. Once setup in this manner it makes it easy and portable to using something like letsencrypt when going to production because most of the framework and configuration is there and has been tested.

Photon OS 5.0 & Kubernetes 1.25

Summary

It is 2023 and a lot has changed since my prior Photon OS and Kubernetes posts. Some things have become much easier but also things have changed such as the migration from docker/dockershim to containerd.

Installation

Installation is as simple as ever. It requires having the ISO and installing. It will require that you navigate to https://github.com/vmware/photon/wiki/Downloading-Photon-OS and download 5.0 (full or minimal).

For the VM specifications, you will need at least 2 cores to run the control plane. I set my lab up with 2 cores, 4GB RAM and 30GB HDD.

My local fusion defaults to an IDE for the CD/DVD and Photon OS 5.0 does not do well with that and recommends changing it to SATA.

After boot into the ISO, just a few simple questions and we’re off! For my lab I’m naming it KUBELAB

Once booted, I prefer to access via SSH and since only a root user has been provisioned I will allow root logins via ssh by editing /etc/ssh/sshd_config

That’s it, we’re installed!

Setting up Dependencies

Instead of using the Photon OS Kubernetes packages, we’ll be using the Google ones. They provide better granularity which is required for upgrades. Instructions for adding them are https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

In particular we want to do the following

cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-\$basearch
enabled=1
gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF

Since Photon OS uses tdnf, we’ll want to install via

# Install 1.25.11-0 of Kubernetes!
tdnf install kubeadm-1.25.11-0 kubectl-1.25.11-0 kubelet-1.25.11-0 cri-tools-1.25.0-0

For this, we’re choosing 1.25.11-0 as I’ve exhaustively tested it for the series I am doing. Once installed, we want to disable the /etc/yum.repos.d/kubernetes.repo by setting “enabled=0” so that OS updates do not push Kubernetes update as we want to control that.

After doing that let’s update the OS packages. But first let’s remove docker we not use containerd

# We are using containerd now
tdnf remove docker*

# OS Updates
tdnf --refresh update

Before we reboot there are a few tried and true settings we’ll want to update.

In /etc/sysctl.d/90-kubernetes.conf

# These are required for it to work properly
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1

# This helps an issue later on when we get into running promtail
fs.inotify.max_user_instances = 256


In /etc/modules-load.d/20-kubernetes.conf
# Required to allow the basic networking to work
br_netfilter

In /etc/security/limits.conf
# Bump up the default of 1024 max open files per process to 10000
*		hard	nofile		10000

In /etc/containerd/config.toml
version = 2
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
   [plugins."io.containerd.grpc.v1.cri".containerd]
      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
          runtime_type = "io.containerd.runc.v2"
          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
            SystemdCgroup = true


In /etc/crictl.yaml

runtime-endpoint: "unix:///run/containerd/containerd.sock"
image-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
pull-image-on-create: false
disable-pull-on-run: false

In /etc/hosts
# Change the localhost binding to the IP of the VM
# Without this kubectl get node - won't show ready
#127.0.0.1   KUBELAB
192.168.83.170  KUBELAB

The config.tomly should look like this.

Now we reboot!

Initialize The Cluster

For this we will run the following with –pod-network-cidr. This is the default for flannel which we’ll use for ease.

# Allow kubelet to start when kubeadm init allows it
systemctl enable kubelet.service

# Initialize - this may take a while as the pods are pulled down
kubeadm init --pod-network-cidr=10.244.0.0/16

This will take 5-10 minutes, maybe longer depending on your internet connection. When you come back with any luck you’ll see success!!!

follow the steps listed.

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

# Check to see if pods are up!
root@KUBELAB [ ~ ]# kubectl get pods -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-565d847f94-6gmrp          0/1     Pending   0          2m31s
kube-system   coredns-565d847f94-wrvqk          0/1     Pending   0          2m31s
kube-system   etcd-kubelab                      1/1     Running   0          2m46s
kube-system   kube-apiserver-kubelab            1/1     Running   0          2m44s
kube-system   kube-controller-manager-kubelab   1/1     Running   0          2m46s
kube-system   kube-proxy-g9kpj                  1/1     Running   0          2m31s
kube-system   kube-scheduler-kubelab            1/1     Running   0          2m44s

Almost there

The cluster(single node) is up but we still need networking which we’ll use Flannel and to remove a taint to allow scheduling on the control plane since this will be a single node lab with control plane and worker nodes on the same VM.

# Reference https://github.com/flannel-io/flannel

# Apply the flannel network overlay
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# Remove the taint - the minus removes it
kubectl taint nodes kubelab node-role.kubernetes.io/control-plane:NoSchedule-

Validations!

By running kubectl get pods -A and kubectl get nodes – we can see pods are running and the node is “Ready”

My First Tesla Service Center Visit

Summary

Today I had my first Tesla Service Center visit. Rewinding a bit my Tesla journey started on 8/2/21 when I decided to downsize my Yukon and make the plunge into an EV. Fast forward a few months of constantly watching the Estimated Delivery Date and finally receiving ownership on 11/12/2021. The car has been great to drive but over the holidays I started getting a strange error.

The Error – DI_a138

On 12/29/2021 I pulled my Model Y out of the garage and helped my mother in law in to take to the doctors. After getting back in and putting into park I received a strange error.

Power Reduced – OK to drive
Front motor temporarily disabled
DI_a138

Power Reduced – OK to drive
Front motor temporarily disabled
DI_a138 Power Reduced – OK to drive

I figured this was just a fluke but created a technical support case with Tesla on it via https://www.tesla.com/support. They checked and did not see anything but recommended I create a Service Appointment to request a virtual diagnostic if it persisted. It did and I did!

The Visit

After some back and forth chatting on the app, it was requested I drop off the vehicle early before the scheduled appointment for this issue to diagnose the drive unit and it would need to be there for 72-96 hours. I had hoped more could be done virtually but I got it and brought it in.

I spoke with the Service Advisor and he consulted with the lead tech. The Service Advisor then told me he had great news. They had already pulled logs and were ordering a new drive unit just incase and I was free to go with my vehicle and would not need to come back until further notice, likely when parts arrived.

This was really neat. I’ve heard the service visits being a mixed bag. Many times on the groups you only hear complaints but in the marketing materials you hear AI powering automatic parts ordering and virtual diagnosis. In this case while it would have been nice to save the drive I didn’t mind it. It also also nice for them not to need to keep my vehicle until parts arrived, particularly with the shortages happening these days.

More DI_a138 Details

So this was an interesting one. There is a higher probability of this happening when running errands sometimes and quick in and outs. This never happens on a “cold” drive where the car has sat for a bit. It appears that when some pre-drive check is done on the drive unit. It hits an error it doesn’t understand or some safety check and shuts down the front motor.

After finding some other articles in this happening in the past I realized I could just get out and back in and it goes away and drives with both motors enabled.

In some Model 3’s and Y’s from 2019 and 2020 there was a glitch where the motor needed to vent heat and it got confused. If this is the case it may just be a regression on the latest software. At this time it is 2021.44.30

Updates

As I hear more I’ll update this article but overall a great first experience with the service center.

1/19/2022 – Brought it in this morning and the short version is known firmware issue that’s now fixed.

“Performed diagnostics for front drive unit, Performed inspections of harness and all connectors and connector pins. No issues found. This was a past firmware issue that has been resolved with firmware 2021.44.30.6 and vehicle is now 1 firmware higher. test drove vehicle to verify no further issues.”

Vulnerability Management With Greenbone aka OpenVAS

Summary

It seem every week we see a new article of some sort of IT security related incident. While many of these larger companies tend to have security tools to help combat them, many smaller companies do not know where to start. An eye opening experience would be to run an internal vulnerability scan on your corporate network to see what types of vulnerabilities you have.

These scans results may then drive you to more promptly patch and mitigate vulnerabilities. Other times it can help give data to management to ask for resources like security tools and people to manage them.

What is This Article?

This article is a quick and dirty install guide for installing Greenbone Vulnerability Management on Kali Linux. Both have been around for quite some time and are free to install. You can attempt to install Greenbone on any Linux distrubution but Kali tends to be the easier and more stable platform since its very commonly used for vulnerability scanning and management.

What you’ll need

  • Kali Linux Iso – https://www.kali.org/downloads/
  • A Server (Physical or VM) to install that has 2GB RAM, 1 Core and 20GB HDD space
  • Basic understanding of installing Linux

Installing Kali Linux

We won’t get too much into the details of this but if you’re a first time installer, select the Graphical Install and you can’t go too wrong with most of the defaults.

Kali Linux - Install
Kali Linux – Install

You can install a Desktop but for the purpose of this tutorial, I’ll be using command line so I unselected the Desktops. It saves space and a little compute.

Kali - Desktop Environment
Kali – Desktop Environment

Prepping for Greenbone Vulnerability Management

I always like to start out with a freshly updated operating system. Since Kali is based off Debian we’ll be using apt-get. The first step is to configure network access.

# This will tell us the interface name, most likely eth0
ifconfig -a

# This will allow us to edit the interfaces file
sudo nano /etc/network/interfaces

We can then populate it with eth0 information for DHCP

sudo nano /etc/network/interfaces

The new lines are the “auto eth0” and “iface eth0 inet dhcp”.

Control X to save and then a “sudo systemctl restart networking” and your ifconfig -a should show an ip address for eth0.

From here we can update the system. Its always best to update before the install to ensure you’re running the latest fixes.

# We're going to switch to root since the next few commands will need to be under that context
sudo su - root

# This command updates the package metadata
apt-get update

# This command performs the actual update
apt-get upgrade

# Reboot when done to start fresh (optional but should be done at some point)
reboot

Depending on your internet connection, this could take a few minutes.

Installing and GVM

Installing is the simplest part.

sudo apt-get install gvm

And wait a few minutes after answering “Y” to download and install.

Next we’ll run “sudo gvm-setup” which aside from running through the configuration will start pulling down the various feeds. This will take a good 30-40 minutes. It does an rsync on community feeds for CVEs, Vulnerabilities and other scanner options.

Once it has completed this process, we do need to modify it to listen on 0.0.0.0 instead of 127.0.0.1 so that it is network accessible. Before that though it will output the admin username and password which we need to capture before proceeding. It is randomized for every install and since this install will get blown away after this lab, I’ll share it unfiltered.

Kali -GVM – Password

We can do that by running the following to edit the systemd startup file

sudo systemctl edit --full greenbone-security-assistant

This 127.0.0.1 needs to change to 0.0.0.0

systemctctl edit --full greenbone-security-assistant
systemctctl edit –full greenbone-security-assistant

From here a “sudo gvm-start” is all that is needed.

If all went well you should be able to https:// to the IP of the server on port 9392

With any luck aside from a certificate error the following screen should show up.

gsa - login
gsa – login

Configuring a Scan

Now that we have access to the tool, we need to configure a scan. This starts with configuring a target list. Under Configuration / Targets. Then create a new target

gsa - target - new
gsa – target – new
gsa - new -target
gsa – new -target

This requires just a few fields entered. The name is fairly self explanatory and is just a name for the targets to be scanned. For the list of hosts, you can list subnets and/or IPs. In this example I picked the scanner itself.

For the PortList, The default can be used but when possible I like to scan All TCP and Nmap top 100 UDP ports as many times applications run on non standard ports.

If you feel comfortable providing your scanner credentials it can login and perform authenticated vulnerability checks as some may not be able to be detected anonymously.

Under Scans / Tasks we want to create a new Scan Task

GSA - Scan Task
GSA – Scan Task

Once that has been created and saved a scan can be kicked off.

GSA - Run Scan
GSA – Run Scan

When running a scan it can take some time and that is determined by the remote endpoint but it is not uncommon for it to take 30 minutes per IP address depending on how quickly the endpoint can be scanned and what it finds. By default it does have quite a bit of concurrency and in order to fully utilize that you may have to tune your CPU and RAM configurations but that is out of scope for this document.

Once a scan is complete you have a report. Unfortunately this scan did not turn up much due to it being already fairly secure but this is what you can expect to see!

gsa - report
gsa – report

Final Words

That is it! You now have a vulnerability scanner you can use to scan your network for vulnerabilities. If you choose to use this for your vulnerability management needs, tweaks will need to be applied to port ranges, hosts and other settings but in a nutshell this is the nuts and bolts of what you need.

Spinning Up Kubernetes Easily with kubeadm init!

Summary

Some time ago when I was just learning Kubernetes, I wrote a series of articles that started with Intro To Kubernetes. This was on an earlier version (1.14) where many of these manual steps worked. Spinning up a new instance on 1.17, these steps really did not hold up and the cluster was not fully functional. I decided to give kubeadm init a try and it made my life infinitely easier.

Google does have a great install guide here that attempts to work for various operating systems but this article is specific to PhotonOS. Here is Google’s guide – https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

Provisioning

Many of the provisioning steps in Intro To Kubernetes are similar, if not the same. Instead of manually starting services on hardware though, kubeadm spins up a bit of it as pods which saves time.

For this lab we need a VM we will name kcp1 (Kubernetes Control Plane 1). Since most of the master/slave terminology is going by the wayside to be more sensitive to what it represented in the past, what was previously referred to as master is now a control plane. The nodes were almost always referred to as worker nodes so it makes sense.

kcp1 needs 2GB of RAM and 2 cores/vCPUs. kwn1 can get away with 1GB RAM and 1 core. For storage, since this is a test lab, 8GB is more than sufficient. For the OS, I still am using VMware Photon OS version 3. I choose the ISO and its a quick install – https://vmware.github.io/photon/. PhotonOS was chosen because it is highly optimized for being a VM and minimized to be a very light weight OS to run docker containers. These traits make it perfect for running Kubernetes.

Installing Packages

For this we will need the kubernetes and kubernetes-kubeadm packages installed. It also requires iptables and docker but that comes installed even on the minimal install.

tdnf install docker kubernetes kubernetes-kubeadm
tdnf install docker kubernetes kubernetes-kubeadm

Install and we’re off to the races! We’ll need this on kcp1 and kwn1.

Firewall Rules

In order for key parts of it to work, we’ll need to open a few firewall rules. This is done by editing /etc/systemd/scripts/ip4save and adding a few lines and then restarting iptables.

kcp1 needs the following line. This is for the API calls which must of the cluster makes to the control plane nodes.

-A INPUT -p tcp -m tcp --dport 6443 -j ACCEPT

kcp1 & kwn1 need the following. The first two are for Flannel’s Pod Network overlay. The last one is for the Kubelet service which runs on all nodes. Google provides a listing here of all ports but with Photon this is all we need – https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/

-A INPUT -p udp -m udp --dport 8285 -j ACCEPT
-A INPUT -p udp -m udp --dport 8472 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT

It should look something like this afterwards

/etc/systemd/scripts/ip4save

Then a simple restart to make the rules take effect.

systemctl restart iptables

kubeadm init Magic!

The magic here is kubeadm init. We need to use something like flannel for the pod network overlay. We don’t get into that in this article but if we don’t pass the right flags for that into kubeadm init, it won’t work. Per flannel

This needs to first be run on the control plane node.

kubeadm init --pod-network-cidr=10.244.0.0/16

On our first run we get some errors about docker not running. We need to enable and start it!

systemctl enable docker
systemctl start docker

Giving kubeadm init another run and its off! It may appear to hang for a minute and then spew out lines of actions its performing.

It will then tell you some next steps. It may seem like a lot.

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.83.15:6443 --token xspmbf.kko3d24t74rpaeyo \
    --discovery-token-ca-cert-hash sha256:483e0e6ef33580b2c8f1e61210e3f50d8163dc6312b5d89940e38236cd2f04b6 

For the most part these are copy and paste. The first three lines literally are so we’ll do that so we can use kubectl.

Minimal Configuration

One of the big remaining steps is to deploy a pod network, usually an overlay. As we mentioned earlier, I prefer flannel and it usually just works.

This just needs to be done via kubectl once. It instantiates a DaemonSet which essentially pushes this out to every node that attaches to the cluster.

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Over on kwn1 we can run the join command it listed. It will take a little bit of time and then be done. Once done it recommends doing a “kubectl get nodes”.

root@kcp1 [ ~ ]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kcp1 Ready master 6m35s v1.17.11
kwn1 Ready 32s v1.17.11

Validation

You can also check all the running pods to see the health. All is well and pretty turn key.

kubectl get pods -A
kubectl get pods -A

Final Thoughts

We have a working cluster that was fairly turn key and much easier than my earlier attempts. If you are fairly new to Kubernetes I would recommend deploying the dashboard which I outline Kubernetes Dashboard. This is mostly correct except that the version 2.0 is no longer beta and has a final release that works pretty well now.

Demystifying Linux Auditd

Summary

My first real exposure to auditd has been deploying Rapid7 InsightIDR. Its been a great tool but I did not understand why they require auditd to be disabled. Is it not more secure having an audit daemon running? After a server had rebooted to completely disable this, it ran into a ton of performance issues and my path lead me down better understanding auditd.

What is Auditd?

At a high level it is a piece of software that allows you to view and audit actions being performed on the operating system. These actions can be directory listing, program execution, audit policy changes.

How is this Facilitated?

There are two pieces to the puzzle. There is a kernel based kauditd which queues up audit events for consumption. Various syscalls and other events cause an audit event to trigger.

These events are consumed via a userland auditd process. This process enables the auditing (via auditctl) and registers its PID and starts consuming events pushed through a PIPE. by default these events are simply logged to /var/log/audit/audit.log.

In the event that the auditd can’t consume these properly there is an audisp process that handles the overflow. Perhaps the auditd crashes but audit events are still pumped to the PIPE. Audisp will generically pump these to syslog which may push to /var/log/messages instead.

Auditctl

You can use the auditctl command to view the status or add rules. These rules would allow you to monitor extra resources or ignore certain ones.

auditctl -s

Why Does Rapid7 Want This Disabled?

The Rapid7 ir_agent has its own consumer process that scans the auditd events. Unfortunately (unless compatibility mode is enabled), only one process can consume these and Rapid7 wants its agent to be the consumer. It does so in an interesting way though. kauditd knows events can’t always be consumed in realtime so there are provisions for a backlog buffer which Rapid7 increases. It does this so a background job can run periodically to pull these events from the queue instead of listening in realtime. I’m not sure of the benefit of this method but it seems to work.

Without auditd disabled and stopped though, no others can listen, including Rapid7.

# auditctl -s
enabled 0
failure 0
pid 0
rate_limit 0
backlog_limit 65752
lost 0
backlog 0

Here you can see, it is currently disabled and no pid is registered. Importantly though you can see the high backlog_limit (buffer). It is normally under 300 because auditd is running constantly.

So Why Did Our Server Blow Up?

Well, in our case its a server that generates a ton of audit logs because it kicks off tens of processes per second. It was easy to dump those to a audit.log file but a security tool actually parsing those just couldn’t keep up. We will either need to filter some of those auditd events or disable the agent.

Summary

If you ever need to use auditd, hopefully this gives you a basic insight into it.