HashiCorp Vault + Vault Secrets Operator + GCP for imagePullSecrets

Summary

The need for this mix of buzzwords comes from a very specific use case. All of my production hosting runs on Google Cloud. My local environment is podman+kind provisioned by Terraform.

Usually, to load container images I build them locally and push them into kind. I do this to alleviate the requirement of an internet connection to do my work. But it got me thinking: if I wanted to, couldn't I just pull from my us.gcr.io private repository?

Sure – I could load a static key, but I'd likely forget about it and it could become an attack vector for compromise. I decided to play with Vault to see if I could accomplish this. Spoiler: you can, but there aren't great instructions for it!

Why Vault?

There are a great many articles on why Vault or another secret manager is a great idea. What it comes down to is minimizing the time a credential is valid by using short-lived credentials, so that if one is compromised, the window of that compromise is minimized.

Vault Setup

I will not go into full detail on the setup, but Vault was deployed via its Helm chart into the K8s cluster, and I used this guide from HashiCorp to enable the GCP secrets engine.
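If you follow that guide, getting the secrets engine going boils down to something like the following (the credentials file name is a placeholder for your own service account key):

# Enable the GCP secrets engine and hand it service account credentials
vault secrets enable gcp
vault write gcp/config credentials=@my-vault-sa-key.json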

Your gcpbindings.hcl will need to look something like this at a minimum. You likely don’t need the roles/viewer.

 resource "//cloudresourcemanager.googleapis.com/projects/woohoo-blog-2414" {
        roles = ["roles/viewer", "roles/artifactregistry.reader"]
      }

For the roleset, I called mine “app-token” which you will see later.
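Creating that roleset looks roughly like the following. The token scope here is an assumption on my part; cloud-platform is broad and a narrower scope may suffice for registry reads.

vault write gcp/roleset/app-token \
    project="woohoo-blog-2414" \
    secret_type="access_token" \
    token_scopes="https://www.googleapis.com/auth/cloud-platform" \
    bindings=@gcpbindings.hcl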

The values I used for Vault's Helm chart were simply as follows, because I don't need the injector and I don't think it would even work for what we're trying to do.

#vault values.yaml
injector:
  enabled: "false"

For the Vault Secrets Operator, it was simply these values, as Vault was installed in the default namespace. I did this for simplicity, just to get it up and running. A lot of the steps I will share ARE NOT BEST PRACTICES, but they will help you get it up quickly so you can then learn the best practices. This includes disabling client caching and not encrypting the cache in storage (which is the default BUT NOT BEST PRACTICE). Ideally client caching is enabled for near zero downtime upgrades, and the cache is therefore encrypted in transit and at rest.

defaultVaultConnection:
  enabled: true
  address: "http://vault.default.svc.cluster.local:8200"
  skipTLSVerify: false

Vault Operator CRDs

First we will start with a VaultConnection and a VaultAuth. This is how the operator will connect to Vault.

apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultConnection
metadata:
  name: vault-connection
  namespace: default
spec:
  # required configuration
  # address to the Vault server.
  address: http://vault.default.svc.cluster.local:8200
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  name: static-auth
  namespace: default
spec:
  vaultConnectionRef: vault-connection
  method: kubernetes
  mount: kubernetes
  kubernetes:
    role: test
    serviceAccount: default

The test role attaches to a policy called "test" that looks like this:

path "gcp/roleset/*" {
    capabilities = ["read"]
}

This allows us to read the "gcp/roleset/app-token/token" path. The policy above should likely be more specific, such as "gcp/roleset/app-token/+", to lock it down to the specific tokens that need to be read. A rough sketch of wiring this up on the Vault side is shown below.
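This is how that policy and role might be created, assuming the Kubernetes auth method is already enabled and configured per the HashiCorp guide:

# Policy allowing reads of the roleset token
vault policy write test - <<EOF
path "gcp/roleset/app-token/+" {
    capabilities = ["read"]
}
EOF

# Kubernetes auth role referenced by the VaultAuth CRD above
vault write auth/kubernetes/role/test \
    bound_service_account_names=default \
    bound_service_account_namespaces=default \
    policies=test \
    ttl=1h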

All of this gets us to the VaultStaticSecret CRD.

apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  annotations:
    imageRepository: us.gcr.io
  name: vso-gcr-imagepullref
spec:
  # This is important, otherwise it will try to pull from gcp/data/roleset
  type: kv-v1

  # mount path
  mount: gcp

  # path of the secret
  path: roleset/app-token/token

  # dest k8s secret
  destination:
    name: gcr-imagepullref
    create: true
    type: kubernetes.io/dockerconfigjson
    #type: Opaque
    transformation:
      excludeRaw: true
      excludes:
        - .*
      templates:
        ".dockerconfigjson":
          text: |
            {{- $hostname := .Annotations.imageRepository -}}
            {{- $token := .Secrets.token -}}
            {{- $login := printf "oauth2accesstoken:%s" $token | b64enc -}}
            {{- $auth := dict "auth" $login -}}
            {{- dict "auths" (dict $hostname $auth) | mustToJson -}}

  # static secret refresh interval
  refreshAfter: 30s

  # Name of the CRD to authenticate to Vault
  vaultAuthRef: static-auth

The bulk of this is in the transformation.templates section. This is the magic. We can easily pull the token, but it's not in a format that Kubernetes would understand and use. Most of the template is there to format the output to mirror the dockerconfigjson format.

To make it more clear, we use an annotation to store the repository hostname.

In case the template text is a little confusing, a more readable version of the template section would be as follows.

{{- $hostname := "us.gcr.io" -}}
{{- $token := .Secrets.token -}}
{{- $login := printf "oauth2accesstoken:%s" $token | b64enc -}}
{
  "auths": {
    "{{ $hostname}}": {
      "auth": "{{ $login }}"
    }
  }
}

Apply the manifest, and if all went well, you should have a secret named "gcr-imagepullref" which you can use in the "imagePullSecrets" section of your workload manifests.
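For completeness, consuming the generated secret from a workload would look something like this (the image path is just a placeholder for something in the private registry):

apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  imagePullSecrets:
    - name: gcr-imagepullref
  containers:
    - name: demo-app
      # placeholder image in the private us.gcr.io repository
      image: us.gcr.io/woohoo-blog-2414/demo-app:latest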

In Closing

We leveraged the GCP secrets engine and Kubernetes auth to obtain time-limited OAuth tokens and inject them into a secret used for pulling images from a private repository. There are a number of situations where you may want to do something like this, such as when you are multicloud but want to use one repository, or when you have on-premise clusters but want to use your cloud repository. Compared to pulling a long-lived key, this is more secure and minimizes the attack surface.

Following the best practices will help further as well, such as limiting the scope of roles and ACLs and enabling encryption of the data in storage and in transit.

For more on the transformation templating, you can go here.

Further Down AI Powered Chatbot Rabbit Hole

Summary

In my previous article Chatbots, AI and Docker! I talked a little about the theory behind this, but for this article I wanted to fully go down the rabbit hole and produce my own chatbot. To do this, I had to find an updated chatterbot fork, learn a little more Python to handle dependencies better, create my own fork of the corpus/training material, and learn Google Cloud Run. Ultimately, you can skip straight to the source if you like. That's the great part of GitOps/IaC.

And Then Some

Previously, I had a workable local instance that I was able to host in podman/kind, but I wanted to put this in my hosting environment on Google. In order to personalize it, I wanted to be able to add some training data and use some better practices. Having previously used Google App Engine, I assumed that would naturally be the landing place for this. I then ran into some hiccups and came across Cloud Run, which was not originally available and seemed like a suitable fit since it is built for containerized workloads. It provided me a way to use my existing Dockerfile to unify the build and deploy. For tooling, I have a separate build and test workflow in my cloudbuild.yaml.

Get on with Chatbots!

In my last article, I mentioned I had to find a fork of chatterbot because it has not been maintained recently. In reality, though, it only allowed command-line prompting, which is not terribly useful for a wider audience to test. I came across this amazing Medium post, which I have to give full credit for (and do in the HTML as well). The skin is pretty amazing, and the post provides a wealth of in-depth details.

For the web framework, I opted to use Flask and gunicorn, which was fairly trivial to get going after finding that great Medium post above.
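The serving side is little more than a route that hands the posted text to chatterbot and returns the reply. A stripped-down sketch, with endpoint names and template of my own choosing rather than lifted from the Medium post, looks roughly like this:

from flask import Flask, render_template, request
from chatterbot import ChatBot

app = Flask(__name__)

# Same read-only bot as in the Logic Adapters section below
chatbot = ChatBot("Sure, Not", read_only=True)

@app.route("/")
def home():
    # Renders the chat UI skin
    return render_template("index.html")

@app.route("/get")
def get_bot_response():
    # The UI sends the user's text as a query parameter
    user_text = request.args.get("msg")
    return str(chatbot.get_response(user_text))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Under gunicorn, the same app object can be served with something like "gunicorn -b 0.0.0.0:5000 chatter:app" (module name assumed).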

Training Data

Without any training data, AI/ML does not really exist. It needs to be pre-trained and/or trained "on the job". For this, chatterbot-corpus comes into play. It is a pre-built training data set for the chatterbot library with some decent basic training. I wanted to be able to add my own and, based on the input of casmith, it's in Python, so shouldn't it be able to converse with Monty Python quotes? So I did, and created my own section for that.

categories:
- Monty
- humor
conversations:
- - What is your name?
  - My name is Sir Lancelot of Camelot.
- - What is your quest?
  - To seek the Holy Grail.
- - What is the air speed velocity of an unladen swallow?
  - What do you mean? An African or European swallow?
- - How do know so much about swallows?
  - Well, you have to know these things when you're a king, you know.

I have the real-time training disabled, or rather I put my chatterbot into read-only mode, because the internet can be a cruel place and I don't need my creation coming home with a foul mouth! For my lab, the training is loaded at image creation time. This is primarily because it's using the default sqlite back end. I could easily use a database for this and load the training out of band so it doesn't require a deploy.
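Loading the stock corpus plus my Monty Python additions at image build time only takes a few lines; a sketch of that training step (the custom corpus path is illustrative) would be:

from chatterbot import ChatBot
from chatterbot.trainers import ChatterBotCorpusTrainer

# Uses the default sqlite storage adapter, so training bakes into the image
chatbot = ChatBot("Sure, Not")
trainer = ChatterBotCorpusTrainer(chatbot)

# Stock training data from chatterbot-corpus
trainer.train("chatterbot.corpus.english")

# My Monty Python additions (illustrative path to the custom corpus file)
trainer.train("./corpus/monty.yml")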

Logic Adapters

You may be thinking this is a simple bot that's just doing string matching to figure out how to respond. For the most part you're correct. This is not deep learning, and it doesn't fully understand what you are asking. With that said, it's very extensible with multiple logic adapters. The default is "BestMatch", based on the text input. Others allow it to report the time and do math. It weighs the confidence of the response from each adapter and lets the highest-scoring response win. Pretty neat!

chatbot = ChatBot(
    "Sure, Not",
    logic_adapters=[
        'chatterbot.logic.BestMatch',
        'chatterbot.logic.MathematicalEvaluation',
        'chatterbot.logic.TimeLogicAdapter'
    ],
    read_only=True
)

Over To The Infrastructure

For all of this, it starts with a Dockerfile. I already had one, but it was a little bloated with build dependencies. Therefore, I created a multistage image using a virtual Python environment as guided by https://pythonspeed.com/articles/multi-stage-docker-python. I am not new to multistage images; my golang images use them. I was, however, new to doing this with Python. Not only did it reduce my image size by 100MB, but it also removed 30 vulnerabilities from the image because of a dependency on git for some of the Python libraries.
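The shape of that multistage build, following the pythonspeed article, is roughly the following sketch; the requirements file contents are whatever your project needs:

FROM python:3.12-slim AS build

# Build dependencies and the virtual environment only live in this stage
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim

# Only the prebuilt virtual environment is copied into the runtime image
COPY --from=build /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /usr/src/app
COPY ./chatter.py .

CMD [ "python", "./chatter.py" ]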

Cloud Run

Getting deployed to Cloud Run was pretty simple, although there was a bit of trial and error due to permissions. The service account needed Cloud Run Admin access. Aside from that, this pumped everything through and let me keep my singular Dockerfile.

steps:
  # Docker Build
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 
           'us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}', '.']

  # Docker push to Google Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push',  'us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}']

  # Deploy to Cloud Run
  - name: google/cloud-sdk
    args: ['gcloud', 'run', 'deploy', 'chatbot', 
           '--image=us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}', 
           '--region', 'us-central1', '--platform', 'managed', 
           '--allow-unauthenticated', '--port', '5000', '--memory', '256Mi',
           '--no-cpu-boost']

# Store images in Google Artifact Registry 
images:
  - us.gcr.io/${PROJECT_ID}/chatbot:${SHORT_SHA}

It really was this simple since I had a working local environment and working Dockerfile. Just don’t look at my commit history 🙂 Quite a few silly mistakes were made if you look deep enough.

Caveat

Google App Engine lets you use custom domain mapping and bring your own certificates. I use Cloudflare to protect my entire environment, and for GAE I placed a Cloudflare Origin certificate on it to help prevent it from being accessed directly from the outside world, since no browser would trust that certificate when bypassing Cloudflare.

Google Cloud Run has a preview feature for custom domain mapping. The easiest of the options doesn't support custom certificates and therefore wants to issue you a certificate. The temporary workaround is to not proxy through Cloudflare until the certificate is issued and then turn the proxy on. Rinse and repeat yearly when the cert needs to be renewed.

I have to imagine this will get rectified once it is out of preview, to reach feature parity with Google App Engine, since it seems Cloud Run intends to replace GAE.

Credits

For Multi-stage help with Python Docker Images – https://pythonspeed.com/articles/multi-stage-docker-python

For the entire UI of this demo/test – https://medium.com/@kumaramanjha2901/building-a-chatbot-in-python-using-chatterbot-and-deploying-it-on-web-7a66871e1d9b

Chatbots, AI and Docker!

Summary

I have started my learning journey about AI. With that I started reading Artificial Intelligence & Generative AI for Beginners. One of the use cases it went through for NLP (Natural Language Processing) was Chatbots.

To the internet I went, ready to go down a rabbit hole, and came across a Python library called ChatterBot. I knew I did not want to bloat and taint my local environment, so I started using a Docker instance in Podman.

Down the Rabbit Hole

I quickly realized the project has not been actively maintained in a number of years and has some specific and dated dependencies. For example, it seemed to do best with Python 3.6, whereas the latest at the time of this writing is 3.12.

This is where Docker shines, though. It is really easy to find older images and declare which versions you want. The syntax of a Dockerfile is such that you can specify the base image and layer the commands you want to run on it. It will work every time, no matter where it is deployed from there.

I eventually found a somewhat updated fork of it here, which simplified things, but it still had its nuances. chatterbot-corpus (the training data) required PyYAML 3.13, but to get this to work it needed version 5.

Dockerfile

FROM python:3.6-slim

WORKDIR /usr/src/app

#COPY requirements.txt ./
#RUN pip install --no-cache-dir -r requirements.txt
RUN pip install spacy==2.2.4
RUN pip install pytz pyyaml chatterbot_corpus
RUN python -m spacy download en

RUN pip install --no-cache-dir chatterbot==1.0.8

COPY ./chatter.py .

CMD [ "python", "./chatter.py" ]

Here we can see I needed a specific version of Python (3.6), whereas at the time of writing the latest is 3.12. It also required a specific spacy package version. With this I have a repeatable environment that I can reproduce and share (with peers or even to production!)

Dockerfile2

Just for grins, when I was able to use the updated fork it did not take much!

FROM python:3.12-slim

WORKDIR /usr/src/app

#COPY requirements.txt ./
#RUN pip install --no-cache-dir -r requirements.txt
RUN pip install spacy==3.7.4 --only-binary=:all:
RUN python -m spacy download en_core_web_sm

RUN apt-get update && apt-get install -y git
RUN pip install git+https://github.com/ShoneGK/ChatterPy

RUN pip install chatterbot-corpus

RUN pip uninstall -y PyYaml
RUN pip install --upgrade PyYaml

COPY ./chatter.py .

CMD [ "python", "./chatter.py" ]

Without Docker

Without Docker (Podman), I would have tainted my local environment with many different dependencies. At the point of getting it all working, I couldn't be sure it would work properly on another machine. Even if it did, was their environment tainted as well? With Docker, I knew I could easily repeat the process from a fresh image to validate.

Previous Python-related projects I worked on could have also tainted my local environment, causing unexpected results on other machines or excessive hours troubleshooting something unique to my machine. All of that is avoided with a container image!

Declarative Version Management

When it becomes time to update to the next version of Python, it will be a really easy feat. Many tools, like Dependabot or Snyk, will even parse these types of files and do dependency management.
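For example, a minimal Dependabot configuration that watches a Dockerfile for base image updates is only a few lines:

# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "weekly"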

Mozilla SOPS To Protect My cloudflared Secrets In Kubernetes

Summary

Aren't these titles getting ridiculous? When talking about some of these stacks, you need a laundry list of names to drop. In this case I was working on publishing my CloudFlare Tunnels FTW work that houses my kind lab into my public GitHub repository. I wanted to tie FluxCD into it and essentially be able to easily blow away the cluster and recreate it, with secrets, all through FluxCD.

I was able to successfully achieve that with everything except the private key, which needs to be manually loaded into the cluster so it can decrypt the sensitive information.

Why Do We Care About This?

While trying to go fully GitOps for Kubernetes, everything is stored in a Git repository. This makes change management extremely simple and reduces the complexities of compliance. Things like policy bots can automate change approval processes and document them. But generally, everything in Git is clear text.

Sure, there are private repositories, but do all the developers that work on the project need to read sensitive records like passwords for that project? It's best that they don't, and as a developer you really don't want that responsibility!

Mozilla SOPS To The Rescue!

Mozilla SOPS is very well documented. In my case I’m using Flux which also has great documentation. For my lab, this work is focusing on “cluster3” which simply deploys my https://www.woohoosvcs.com and https://tools.woohoosvcs.com in my kind lab for local testing before pushing out to production.

Create Key with Age

Age appears to be the preferred encryption tool to use right now. It is pretty simple to use, and going by the Flux documentation, we simply need to run

age-keygen -o age.agekey

This will create a file that contains both the public and private key. The public key will be in the comment, and the command will also print it to the terminal. We will need the private key later to add as a secret manually for decryption. I'm sure there are ways of getting this into the cluster securely, but for this blog article it is the only thing done outside of GitOps.

Let’s Get To the Details!

With Flux I have a bootstrap script to load flux into the environment. I also have a generate_cluster3.sh script that creates the yaml.

The pertinent lines to add to it, beyond the standard ones, are the following. The first indicates that sops is the decryption provider. The second is the name of the secret where the decryption key is stored. Flux requires this secret to be in the flux-system namespace.

    --decryption-provider=sops \
    --decryption-secret=sops-age \

From there you simply need to run bootstrap_cluster3.sh, which just loads the yaml manifests for Flux. With Flux you can do this on the command line, but I preferred to have this generation and bootstrapping in Git. When you want to upgrade Flux, there's also an upgrade_cluster3.sh script that is really a one-liner.

flux install --export > ./clusters/cluster3/flux-system/gotk-components.yaml

This will update the components. If you're already bootstrapped and running Flux, you can run this and commit to push out the upgrades, using Flux to upgrade itself!

In the root of the cluster3 folder I have a .sops.yaml. This tells the kustomization module in Flux what to decrypt and which public key to use, as sketched below.
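A sketch of that .sops.yaml looks something like the following; the age recipient is a placeholder for the public key from the keygen step above.

# .sops.yaml - the age recipient is a placeholder for your own public key
creation_rules:
  - path_regex: .*\.yaml
    encrypted_regex: ^(data|stringData)$
    age: age1yourpublickeygoeshere

The cloudflared secret manifest is then encrypted in place before committing, with something like "sops --encrypt --in-place" against the secret's yaml file.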

Loading Private Key Via Secret

Once you have run the bootstrap_cluster3.sh you can then load the private key via

cat age.agekey | kubectl create secret generic sops-age \
  --namespace=flux-system --from-file=age.agekey=/dev/stdin

Caveat

This lab won't work for you out of the box because it requires a few confidential details:

  1. My cloudflared secret is encrypted with my public key. You do not have my private key so you cannot load it into your cluster to decrypt it
  2. I have some private applications I am pushing into my kind cluster. You will have to clone and modify for your needs

Kubernetes SSL Configuration

Summary

Picking up where we left off in the Initializing Kubernetes article, we will now be setting up certificates! This will closely follow the Kubernetes "Certificates" documentation, specifically using OpenSSL, as easyrsa has some dependency issues with Photon.

OpenSSL

Generating Files

We'll be running the following commands, and I keep them in /root/kube/certs. They won't remain there, but it's a good staging area that needs to be cleaned up or secured afterwards so we don't have keys lying around.

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=192.168.116.174" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048

We then need to generate a csr.conf

[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn

[ dn ]
C = <country>
ST = <state>
L = <city>
O = <organization>
OU = <organization unit>
CN = <MASTER_IP>

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = <MASTER_IP>
IP.2 = <MASTER_CLUSTER_IP>

[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names

In my environment the MASTER_IP is 192.168.116.174, and the cluster IP is usually a default, but we can get it by running kubectl:

root@kube-master [ ~/kube ]# kubectl get services kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   60m

With those values filled in, my csr.conf becomes:
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn

[ dn ]
C = US
ST = Texas
L = Katy
O = Woohoo Services
OU = IT
CN = 192.168.116.174

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = 192.168.116.174
IP.2 = 10.254.0.1

[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names

We then run

openssl req -new -key server.key -out server.csr -config csr.conf

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out server.crt -days 10000 \
-extensions v3_ext -extfile csr.conf

# For verification only
openssl x509  -noout -text -in ./server.crt

Placing Files

I created a /secrets directory and moved the files in as follows:

mkdir /secrets
chmod 700 /secrets
chown kube:kube /secrets

cp ca.crt /secrets/
cp server.crt /secrets/
cp server.key /secrets/
chmod 700 /secrets/*
chown kube:kube /secrets/*

Configure API Server

On the master, edit /etc/kubernetes/apiserver and add the following parameters

--client-ca-file=/secrets/ca.crt
--tls-cert-file=/secrets/server.crt
--tls-private-key-file=/secrets/server.key

KUBE_API_ARGS="--client-ca-file=/secrets/ca.crt --tls-cert-file=/secrets/server.crt --tls-private-key-file=/secrets/server.key"

Restart kube-apiserver. We also need to edit /etc/kubernetes/controller-manager

KUBE_CONTROLLER_MANAGER_ARGS="--root-ca-file=/secrets/ca.crt  --service-account-private-key-file=/secrets/server.key"
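After both files are edited, restart the services so they pick up the new flags (both are systemd units, as seen in the service-start loop from the previous article):

systemctl restart kube-apiserver kube-controller-manager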

Trusting the CA

We need to copy the ca.crt to /etc/ssl/certs/kube-ca.pem on each node and then install the package "openssl-c_rehash", as I found here. Photon is very minimalistic, so you will find you keep having to add packages for things you take for granted.
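Copying the CA from the staging directory out to each node can be as simple as an scp, using the root key access set up in the earlier article:

scp /root/kube/certs/ca.crt root@kube-node1:/etc/ssl/certs/kube-ca.pem
scp /root/kube/certs/ca.crt root@kube-node2:/etc/ssl/certs/kube-ca.pem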

tdnf install openssl-c_rehash

c_rehash
Doing //etc/ssl/certs
link 3513523f.pem => 3513523f.0
link 76faf6c0.pem => 76faf6c0.0
link 68dd7389.pem => 68dd7389.0
link e2799e36.pem => e2799e36.0
.....
link kube-ca.pem => 8e7edafa.0

Final Words

At this point, you have a Kubernetes cluster set up with some basic security. Not very exciting, at least in terms of seeing results, but the next article should be meaningful in showing how to set up flannel.

Next – Flannel Configuration

Initializing Kubernetes

Summary

In my previous article Intro To Kubernetes, we walked through installing dependencies and setting the stage for initializing Kubernetes. At this point you should have a master and one or two nodes with the required software installed.

A Little More Configuration

Master Config Prep

We have just a little more configuration to do. On kube-master we need to change the "/etc/kubernetes/apiserver" lines as follows. This allows other hosts to connect to it. If you don't want to bind to 0.0.0.0 you could bind to the specific IP, but you would lose localhost binding.

# From this
KUBE_API_ADDRESS="--insecure-bind-address=127.0.0.1"

# To this
KUBE_API_ADDRESS="--address=0.0.0.0"

Create the Cluster Member Metadata

Save the following as a file; we'll call it create_nodes.json. When standing up a cluster I like to start out doing it on the master, so I create a /root/kube directory and put my files in there for reference.

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-master",
         "labels":{ "name": "kube-master-label"}
     },
     "spec": {
         "externalID": "kube-master"
     }
 }

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-node1",
         "labels":{ "name": "kube-node-label"}
     },
     "spec": {
         "externalID": "kube-node1"
     }
 }

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-node2",
         "labels":{ "name": "kube-node-label"}
     },
     "spec": {
         "externalID": "kube-node2"
     }
 }

We can then run kubectl to create the nodes based on that json. Keep in mind this is just creating metadata.

root@kube-master [ ~/kube ]# kubectl create -f /root/kube/create_nodes.json
node/kube-master created
node/kube-node1 created
node/kube-node2 created

# We also want to "taint" the master so no app workloads get scheduled.

kubectl taint nodes kube-master key=value:NoSchedule

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS     ROLES    AGE   VERSION
kube-master   NotReady   <none>   88s   
kube-node1    NotReady   <none>   88s   
kube-node2    NotReady   <none>   88s   

You can see they’re “NotReady” because the services have not been started. This is expected at this point.

All Machine Config Prep

This will be run on all machines, master and node. We need to edit “/etc/kubernetes/kubelet”

KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_HOSTNAME=""

Also edit /etc/kubernetes/kubeconfig

server: http://127.0.0.1:8080

# Should be

server: http://kube-master:8080

In /etc/kubernetes/config

KUBE_MASTER="--master=http://kube-master:8080"

Starting Services

Master

The VMware Photon Kubernetes guide we have been following has the following snippet, which I want to credit. Please run this on the master.

for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet docker; do
     systemctl restart $SERVICES
     systemctl enable $SERVICES
     systemctl status $SERVICES
 done

You can then run "netstat -an | grep 8080" to see it is listening, particularly on 0.0.0.0 or the expected bind address.

Nodes

On the nodes we are only starting kube-proxy, kubelet and docker

for SERVICES in kube-proxy kubelet docker; do 
     systemctl restart $SERVICES
     systemctl enable $SERVICES
     systemctl status $SERVICES 
 done

Health Check

At this point we’ll run kubectl get nodes and see the status

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS     ROLES    AGE     VERSION
127.0.0.1     Ready      <none>   23s     v1.14.6
kube-master   NotReady   <none>   3m13s   
kube-node1    NotReady   <none>   3m13s   
kube-node2    NotReady   <none>   3m13s   

Oops, we didn't add 127.0.0.1 – it showed up because I forgot to clear the hostname override in /etc/kubernetes/kubelet. I fixed that, restarted kubelet, and then ran "kubectl delete nodes 127.0.0.1".

It does take a while for these to start showing up. The provisioning and orchestration processes are not fast, but you should slowly see the version show up and then the status change to Ready, and here we are.

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
kube-master   Ready    <none>   9m42s   v1.14.6
kube-node1    Ready    <none>   9m42s   v1.14.6
kube-node2    Ready    <none>   9m42s   v1.14.6

Final Words

At this point we could start some pods if we wanted, but there are a few other things that should be configured for a proper bare metal (or virtual) install. Many pods now depend on auto discovery, which uses TLS. Service accounts are also needed, and service accounts use secrets.

For the networking we will go over flannel, which will provide our network overlay using VXLAN. This is needed so that pods running on each node have a unique and routable address space that each node can see. Right now each node has a docker interface with the same address, and pods on different nodes cannot communicate with each other.

Flannel uses TLS-based auto discovery against the ClusterIP. Rather than hacking around that, it is best to just enable SSL/TLS certificates, which is also a security best practice.

root@kube-master [ ~/kube ]# kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   49m
root@kube-master [ ~/kube ]# kubectl describe services/kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.254.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.116.174:6443
Session Affinity:  None
Events:            <none>

Next – SSL Configuration

Intro To Kubernetes

Summary

This will be part of a multi-part set of posts on Kubernetes. There are many other technical articles on this, but I could not find one that got me end to end to my desired state with Kubernetes. This series of posts will help carry you through my journey of standing it up.

What This Is Not

Currently, this series is not a high-level architecture overview. It does not go into detail on the various daemons and their functions. I may create a separate article on this at a later date.

Why Kubernetes?

Kubernetes, aka k8s, is great at provisioning resources and maintaining them for containerized workloads using Docker. Per the site's tagline, it is "Production-Grade Container Orchestration". It was developed in-house by Google and shared with the public, so Google Cloud's Kubernetes offering is one of the better ones. Docker Swarm is Docker's response to the need this fills.

Let’s Get Started!

For this series I will be using VMware Photon OS. You are more than welcome to use any distribution you wish, although many of the commands may not be the same, particularly the package management commands to install software. I use VMware Fusion, but any hypervisor or bare metal systems will suffice. We will be standing up 3 total nodes, but you can do it with 2 if resources are at a minimum.

We will also be following VMware's guide to installing Kubernetes on Photon, with a minor tweak.

Installation

Install the OS

If you are looking to install something like Kubernetes, it is assumed you are fairly familiar with installing an OS. For this we will need 3 instances of Photon. I am provisioning them with a 4GB HDD, 1 core, and 768 MB of RAM, and removing any excess virtual hardware that is not needed, since the machine I am running this on only has 8GB of RAM and a dual core CPU.

The machine names will be kube-master, kube-node1 and kube-node2

For Photon, you can pretty much accept the defaults, with the kernel type being the only one you may need to think about. Photon can go on bare metal or even other hypervisors, but it does have a VMware-optimized kernel with VM tools if you choose.

Photon Linux Kernel - VMware hypervisor optimized

Photon is very proud of its install times, and it is indeed nice not waiting 10-20 minutes for an OS install.

Photon install in under 30 seconds

Login to the OS

By default, most recent distributions of Linux, including Photon, are locked down. You can log in as root at the console, but not remotely unless you use ssh key authentication. For production workloads, I would highly recommend not using the root login and instead using another login and sudo, but for the purpose of this lab we will just add my local key to root and be on our way.

Temporarily disable prohibit-password to add key remotely

I personally use ssh-copy-id, which is a best practice.

dwcjr@Davids-MacBook-Pro ~ % ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/Users/dwcjr/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
Password: 

Number of key(s) added:        1

Now try logging into the machine, with:   "ssh '[email protected]'"
and check to make sure that only the key(s) you wanted were added.

Installing Kubernetes on Master and Nodes

Photon uses tdnf, so it's quite simple. This is also where we deviate slightly from the instructions. We will be enabling all of the node services on the master so that it can run docker images. We do not want to run actual app images on it, but there is a particular system image we will want to run that I will get into later.

On Master and Nodes run the following

tdnf install kubernetes iptables docker

# Good idea to run through updates afterwards as well
tdnf update

Preparing Hosts

Next, it's a good idea to have hosts file entries since we will not be using DNS for the scope of these tutorials. These are my IPs in this case.

#Kubernetes
192.168.116.174 kube-master
192.168.116.175 kube-node1
192.168.116.177 kube-node2

We then need to set /etc/kubernetes/config on all hosts, specifically updating:

KUBE_MASTER="--master=http://kube-master:8080"

On the master, we need to edit “/etc/systemd/scripts/ip4save” to add the following lines

-A INPUT -p tcp -m tcp --dport 8080 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT

#Then restart iptables.  On photon it doesn't appear to save IP tables between reboots so this is how it persists.

systemctl restart iptables

On the nodes you will need to add a similar line and restart iptables but it will be

-A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT

Ending Note

At this point you do not quite have anything near a functional Kubernetes cluster, but this was the first part of a few. I decided to break the article here, as some people may be able to easily get this far without these instructions.

For those that made it here, my next article covers the initial Kubernetes configuration.

Next – Initializing Kubernetes