A Look at Tesla Through the Team of Teams Lens

Summary

Going down a reading rabbit hole, I recently read Team of Teams: New Rules of Engagement for a Complex World by General Stanley McChrystal. I was not sure what I was in for. Paraphrasing the author: while he was living through the events in the book, he was not sure whether what he experienced was a fluke or whether there was something more to it.

While this article is not a review (there are plenty of those around), the book did start to move me, and I began thinking about other lenses to view it through. The automotive industry is cited a couple of times in the book, and that opened Pandora’s box for me. One of the examples was the GM ignition recall, which took nearly ten years to fix a $2 part. Ultimately it was an organizational structure failure: low-level teams had known about the defect within months of the new ignition switches being sent out into the wild, once reports started coming back.

Why Tesla?

It is easy to get wrapped up in the politics and public displays for which Elon Musk is known. Setting that aside, what Tesla is doing is revolutionary for a few reasons. In these times, it is extremely hard to start a new automobile company. Tesla is not only a new automobile company but one using a fuel source that is not the industry standard.

Tesla is different. It is not just an automobile manufacturing company. In numerous interviews, Elon Musk has said that Tesla is actually a “hardcore” engineering company. It manufactures numerous parts for its vehicles in house and writes all of its own software.

Outside the scope of this article, Tesla is also a data mining company. It has driving details on millions of its vehicles, with uses such as road mapping, analyzing driving patterns, and improving autonomous driving.

How Legacy Automotive Companies Operate

Many of the legacy automotive manufacturers are extremely siloed, using a “reductionist” methodology of breaking areas down into small teams and pushing them for efficiency. Many different vendors make components for legacy car companies, building them to the Original Equipment Manufacturer’s (OEM’s) specifications to ensure interoperability. These vendors do not typically communicate with each other, and none of them sees the whole picture. The Engine Control Module (ECM) may be manufactured by one company while the Transmission Control Module (TCM) is manufactured by another. The software may then be subcontracted out by those vendors. They follow interoperability standards but may have little idea of how, say, the Body Control Module (BCM) interacts with these two modules.

This model allows scale and efficiency. Vendor management is a strong tool to help mitigate the risks, and many companies, like Toyota, are great at it. Toyota will often have supplier manufacturing happen in the same plant where the cars are assembled, and contracts tend to require suppliers to keep a certain stock on hand to weather temporary supply chain issues.

How Tesla Operates

Many of Tesla’s key components are manufactured in house, such as its seats. This is not to say it does not outsource any manufacturing; it certainly does. One critical piece that Tesla handles in house is writing its own software. This was instrumental in its adaptability during the computer chip shortages of 2020 and onward.

Chip Shortage

During the chip shortages, OEMs could not get their hands on chips. Many of the big manufacturers had lots filled with unfinished vehicles, simply waiting on chips to arrive with no end in sight. In many cases, cars were delivered without features.

Tesla did deal with production delays because of this, but its control over its own software allowed it to use whatever chips were available. Not only did Tesla adapt its software to new chips, it realized that in some cases it could reduce the need for them entirely. This is well documented at https://www.utilitydive.com/news/tesla-chip-semiconductor-shortage/628150/

Wrapping it Up

Traditional car manufacturers are very siloed. They are built this way for efficiency and scalability, but the trade-off is that they are inflexible and not very adaptable. Many of them are struggling to become profitable on their Electric Vehicles (EVs). Recently, even Ford has started to bring its software development in house. In-house software allows for the constant updates modern vehicles need without a trip to the dealership; many Tesla recalls have been corrected via Over the Air (OTA) software updates.

Conclusion

In a modern world of complexity, teams cannot work in isolation. They need to be aware of what other teams are doing and share a common vision. Cognitive load still needs to be managed, or information overload will occur, but in this world of complexity and constant information, silos do not work.

IaC & GitOps – Better Together

Summary

Building on the prior article, Fedora CoreOS + Ansible => K8s, we want complete Infrastructure as Code (IaC). The newest way of doing this is GitOps, where nearly everything is controlled through source control. For that, Flux is one of my favorites, though Argo CD will also work.

The benefit of GitOps on K8s is that developers can have complete but indirect access to various environments. A DevOps team can provision the tooling once and then either spin up environments effortlessly or let developers do it themselves. That gets us close to Platform Engineering.

Flux GitOps Repo

For this article, this is the tagged version of the GitOps repo used. At its core, we manually generate the yaml manifests via scripted commands, namely upgrade_cluster1.sh and generate_cluster1.sh. Combined, these create the yaml manifests needed. Despite its name, upgrade_cluster1.sh is not just for refreshing the yaml during an upgrade; it can also be used to generate the initial component yaml. The generate_cluster1.sh script should only need to be run once.
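The scripts themselves live in the repo; as a hedged sketch of the general approach (not the scripts’ exact contents), the flux CLI can export manifests to files instead of applying them to a cluster. The repository URL and directory layout below are illustrative placeholders:

```shell
# Sketch: generate Flux manifests with --export (prints yaml to stdout
# rather than applying it). Paths and the Git URL are placeholders.

# Component manifests (source-controller, kustomize-controller, etc.)
flux install --export > clusters/cluster1/flux-system/gotk-components.yaml

# A GitRepository source pointing Flux at the GitOps repo
flux create source git flux-system \
  --url=https://github.com/example/gitops-repo \
  --branch=main \
  --interval=1m \
  --export > clusters/cluster1/flux-system/gotk-sync.yaml

# A Kustomization that reconciles the cluster directory from that source
flux create kustomization flux-system \
  --source=GitRepository/flux-system \
  --path=./clusters/cluster1 \
  --prune=true \
  --interval=10m \
  --export >> clusters/cluster1/flux-system/gotk-sync.yaml
```

Because everything is exported to files and committed, the manifests can be regenerated for an upgrade by simply re-running the export with a newer flux CLI.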

The bootstrap_cluster1.sh script is used by our Ansible playbook to actually apply the yaml.

Bootstrapping Flux

The flux CLI has a bootstrap command that could be used, but for this we want disposable K8s clusters that can be torn down, rebuilt, and attached to the same repo. This allows not only the running workloads but the infrastructure itself to be treated like cattle.

To achieve this, we manually create the yaml manifests (still using the supported CLI tools) but decouple that step from the initial setup, deployment, and running of the environment.
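Decoupled this way, “bootstrapping” a fresh cluster reduces to applying the pre-generated manifests; a minimal sketch (file paths are illustrative, not the repo’s exact layout):

```shell
# Apply the pre-generated Flux manifests to a brand-new cluster.
# Once the sync manifests are applied, Flux reconciles everything
# else directly from Git, so the cluster is disposable.
kubectl apply -f clusters/cluster1/flux-system/gotk-components.yaml
kubectl apply -f clusters/cluster1/flux-system/gotk-sync.yaml

# Watch the Flux controllers come up
kubectl -n flux-system get pods
```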

What Did We Get?

From a simple set of changes to pull and deploy Flux, we get a sample ingress controller (nginx). In its manifest you can specify any parameter about it and have clear visibility into what is deployed. In this scenario we are just pinning the version, but we could also specify how many instances to run, or whether to deploy via a daemonset (one instance per worker node).
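As an illustration of what that visibility looks like (this is a sketch, not the repo’s exact manifest), a Flux HelmRelease for ingress-nginx might pin the chart version and switch to a daemonset like so; the names, namespaces, and version are placeholders:

```yaml
# Illustrative Flux HelmRelease: pins the ingress-nginx chart version
# and deploys the controller as a DaemonSet (one pod per worker node).
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.7.1"        # pinned chart version (placeholder)
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
        namespace: flux-system
  values:
    controller:
      kind: DaemonSet         # or Deployment with a replica count
```

Because this file lives in Git, changing the version (or the controller kind) is a commit and a reconcile, not a manual kubectl operation.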

Wrapping It All Up – What Is The Big Deal?

It is a natural question: what is the big deal about K8s, IaC, GitOps, and this entire ecosystem? True IaC combined with GitOps gives complete transparency into what is deployed in production, because Flux ensures that what is in Git is reconciled with the configured cluster. No more one-off configurations that nobody knows about until upgrade-or-replacement time on the server.

The fact that we have so much automation allows for tearing down and rebuilding as necessary, which makes patching easy. Instead of applying updates in place and hoping for the best, just instantiate new instances and tear down the old ones.

Fedora CoreOS + Ansible => K8s

Summary

Kubernetes is a personal passion of mine, and I have written a few times about how to stand up one of my favorite container-optimized operating systems, PhotonOS. Most recently I wanted to rebuild my lab, because it had been a while. While some of my prior posts have served as a standard operating procedure for doing this, it has lost its luster doing it manually.

Because of this, I set out to automate the deployment of PhotonOS with Ansible. Having already learned and written about SaltStack, I wanted to tool around with Ansible. I thought: great, Photon is highly orchestrated by VMware, this should be simple.

Unfortunately, PhotonOS 5 does not work well with Ansible, mainly due to its package manager.

Container Optimized Operating Systems

In my search for one that did work well with Ansible, I came across a few. Flatcar was the first; it seemed to have plenty of options. I then came across Fedora CoreOS. These are two of the successors to the older CoreOS Container Linux distribution. Since Ansible and Fedora both fall under the Red Hat umbrella, I went with FCOS (Fedora CoreOS).

The interesting thing about Flatcar and Fedora CoreOS is that they use Ignition (and Butane) for bootstrapping, which allows for first-boot provisioning. This is the primary method for adding authentication details such as ssh keys.

My Lab

My lab runs on VMware Fusion, since I’m on a Mac. Because of that, a lot of my steps are specific to Fusion, but I attempted to make them generic enough to be modified for your environment.

Here is my full repo for this: https://github.com/dcatwoohoo/k8-fcos/tree/postid_822

Setting up Ignition

To help ensure your ssh keys are put into the environment, you’ll need to update the Butane file with the appropriate details, particularly the “ssh_authorized_keys” section.

Butane is a yaml-based format designed to be human readable and writable. Ignition’s JSON, by contrast, is machine-oriented and not easy to write by hand. For that reason, we use a conversion tool.
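For reference, a minimal FCOS Butane file with an ssh key looks roughly like this (the user name and key are placeholders; your actual file will have more in it):

```yaml
# Minimal Butane sketch for Fedora CoreOS: authorize an ssh key
# for the default "core" user. The key below is a placeholder.
variant: fcos
version: 1.5.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAAC3Nza... user@example
```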

docker run -i --rm quay.io/coreos/butane:release --pretty --strict < ignition/butane.yaml > ignition/ignition.json

Don’t worry, this is baked into the ovf.sh script.

Instantiating the OVF

The first step was acquiring the OVA template (Open Virtual Appliance) from the Fedora CoreOS download page: https://fedoraproject.org/coreos/download?stream=stable#arches (look for the VMware OVA under the stable stream).

For this I scripted it via ovf.sh, which instantiates the template for a given number of instances. As documented, it’s two nodes: fcos-node01 and fcos-node02.
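I won’t reproduce ovf.sh here, but as a hedged sketch of the technique, deploying the OVA on VMware and handing the Ignition config to FCOS is typically done with ovftool plus the guestinfo.ignition properties; the OVA filename and target path below are placeholders:

```shell
# Illustrative sketch, not the actual ovf.sh: deploy the OVA per node
# and pass the Ignition config via guestinfo (base64-encoded).
IGN=$(base64 -i ignition/ignition.json)

for i in 01 02; do
  ovftool \
    --name="fcos-node${i}" \
    --allowExtraConfig \
    --extraConfig:guestinfo.ignition.config.data="${IGN}" \
    --extraConfig:guestinfo.ignition.config.data.encoding="base64" \
    fedora-coreos-stable-vmware.x86_64.ova \
    "${HOME}/Virtual Machines.localized/"
done
```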

Once they are done and powered on, following a 45-second pause, we’re good to run Ansible and get the cluster going.

Running Ansible

Because I can be lazy, I created a shell script called k8s-fcos-playbook.sh that runs the playbook. At this point, sit back and enjoy the wait.
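The script itself is in the repo; conceptually it is little more than the following (the inventory path, playbook name, and remote user are placeholders):

```shell
# Roughly what a playbook wrapper script does: point ansible-playbook
# at an inventory listing the FCOS nodes and run the cluster playbook
# as the default "core" user.
ansible-playbook \
  -i inventory/hosts.yaml \
  --user core \
  site.yaml
```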

If all goes well, you’ll have a list of pods up and running successfully and a bare-bones cluster:

kubectl get pods -A

Concluding thoughts

While I did not go deep into the Ansible details, it is a public repository written in a fairly self-explanatory way. It’s not the best or most modular code, but it is easy to follow.

Special Thanks!

A special thanks to Péter Vámos and his repository doing this in a similar way. He gave me some great ideas, although in some areas I went in a different direction.