Network Configuration Management With Rancid

Summary

Network configuration management is often overlooked. Worse yet, companies with strong Change Management practices often believe those practices make config management unnecessary.

The issue is that commands entered on network gear do not always take effect as we expect. We also want to be able to compare history and easily diff changes for root cause analysis.

Rancid

Rancid is a free, open source tool that handles just this. I have used it successfully for the past few years. It has been a great tool and has caught the occasional typo as well as commands that applied in unexpected ways.

At a high level, the way it works is to pull a full config each time and push it into a version control system like CVS or Subversion. Git is also a popular choice but not really necessary as we will not be branching.

Once the configs are pumped into a versioning system, it is easy to produce diffs and any time rancid runs, it outputs the diffs so you can see the change.
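
The core idea can be sketched without rancid at all: keep the previous copy of a device config, fetch a new one, and diff the two. Below is a minimal sketch, assuming two saved snapshots. The file names and config lines are made up for illustration; rancid does the fetching and the version control for you.

```shell
# Compare two saved snapshots of a device config and print a unified diff.
# Prints nothing when the snapshots are identical.
config_diff() {
    old="$1"
    new="$2"
    # diff exits non-zero when the files differ; swallow that so a caller
    # running under "set -e" does not abort.
    diff -u "$old" "$new" || true
}

# Hypothetical snapshots, for illustration only.
workdir=$(mktemp -d)
printf 'hostname test-f5\nntp server 10.0.0.1\n' > "$workdir/before.cfg"
printf 'hostname test-f5\nntp server 10.0.0.2\n' > "$workdir/after.cfg"

config_diff "$workdir/before.cfg" "$workdir/after.cfg"
```

The changed ntp line shows up as one removed and one added line, which is essentially what rancid reports after each run.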

The initial setup of rancid is often a barrier to entry. Once you get it set up the first time, upgrades are fairly simple.

Installing

For this demo, we are using a VM. We installed a minimal CentOS 8.0 on a VM with 1GB RAM, a 10GB HDD and 1 CPU core. Production specs are not much more than this, depending on how many devices you are querying and how often.

Let’s download the tar first!

[root@rancid ~]# curl -O https://shrubbery.net/pub/rancid/rancid-3.10.tar.gz

We need to install some dependencies! Expect is the brains of rancid and used to send and receive data from the network devices. Many of the modules that manipulate the data received are perl. Gcc and make are used to build the source code.

We need some sort of mailer, hence sendmail. You can use postfix if you prefer that.

We will be using CVS for simplicity and the default configuration of rancid.

[root@rancid ~]# yum install expect perl gcc make sendmail

[root@rancid ~]# yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm

[root@rancid ~]# yum install cvs

We then need to extract and build the source!

[root@rancid ~]# tar xzf rancid-3.10.tar.gz 

[root@rancid ~]# ls -la | grep rancid
drwxr-xr-x.  8 7053 wheel   4096 Sep 30 18:15 rancid-3.10
-rw-r--r--.  1 root root  533821 Nov  5 06:13 rancid-3.10.tar.gz

[root@rancid ~]# cd rancid-3.10

[root@rancid rancid-3.10]# ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
.....
config.status: creating include/config.h
config.status: executing depfiles commands

[root@rancid rancid-3.10]# make
... tons of output
gmake[1]: Leaving directory '/root/rancid-3.10/share'

[root@rancid rancid-3.10]# make install

[root@rancid rancid-3.10]# ls -la /usr/local/rancid/
total 4
drwxr-xr-x.  7 root root   63 Nov  5 06:21 .
drwxr-xr-x. 13 root root  145 Nov  5 06:20 ..
drwxr-xr-x.  2 root root 4096 Nov  5 06:21 bin
drwxr-xr-x.  2 root root   90 Nov  5 06:21 etc
drwxr-xr-x.  3 root root   20 Nov  5 06:21 lib
drwxr-xr-x.  4 root root   31 Nov  5 06:21 share
drwxr-xr-x.  2 root root    6 Nov  5 06:20 var

We very likely do not want this to run as root, so we will create a dedicated user. By default, rancid installs to /usr/local/rancid, so we will make that the user’s home directory.

[root@rancid rancid-3.10]# useradd -d /usr/local/rancid -M -U rancid

[root@rancid rancid-3.10]# chown rancid:rancid /usr/local/rancid/
[root@rancid rancid-3.10]# chown -R rancid:rancid /usr/local/rancid/*

[root@rancid rancid-3.10]# su - rancid
[rancid@rancid ~]$ pwd
/usr/local/rancid

To preserve permissions, all further changes should be made under the rancid user.

Configuring Rancid

The global rancid configuration, ~/etc/rancid.conf, follows the format described here – https://www.shrubbery.net/rancid/man/rancid.conf.5.html

We will need to modify the following line

# list of rancid groups
LIST_OF_GROUPS="networking"
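
Since rancid.conf uses sh syntax, one quick way to confirm which groups are active is to source the file in a subshell and print the variable. This is only a sketch: the function name is my own, and the sample file stands in for the real ~/etc/rancid.conf.

```shell
# Print the groups a rancid.conf defines. The file is sourced in a
# subshell so nothing it sets leaks into the current shell.
rancid_groups() {
    conf="$1"
    ( . "$conf" >/dev/null 2>&1; printf '%s\n' "$LIST_OF_GROUPS" )
}

# Hypothetical stand-in for /usr/local/rancid/etc/rancid.conf.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
# list of rancid groups
LIST_OF_GROUPS="networking"
EOF

rancid_groups "$sample_conf"
# prints: networking
```

On the real server you would point it at the installed file, for example rancid_groups ~/etc/rancid.conf as the rancid user.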

Configuring Devices

cloginrc

This follows a specific format as described here – https://www.shrubbery.net/rancid/man/cloginrc.5.html

[rancid@rancid ~]$ cat .cloginrc
add user test-f5	root
add password test-f5	XXXXXXX

router.db

This follows a specific format as described here – https://www.shrubbery.net/rancid/man/router.db.5.html

For our example we put in the following line. Keep in mind you can use any name you wish, but it has to resolve either via DNS or the hosts file.

[rancid@rancid var]$ cat router.db 
test-f5;bigip;up
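
A malformed router.db line is an easy mistake to make, so a quick sanity check can save a troubleshooting round. The sketch below only checks that each entry has the three semicolon-separated fields used above (name;type;state). It does not validate device types or name resolution, and the function name is my own.

```shell
# Report router.db lines that do not have three non-empty
# semicolon-separated fields. Returns non-zero if any are found.
check_router_db() {
    awk -F';' '
        /^[[:space:]]*(#|$)/ { next }   # skip comments and blank lines
        NF < 3 || $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Sample file using the entry from this article.
sample_db=$(mktemp)
printf 'test-f5;bigip;up\n' > "$sample_db"

if check_router_db "$sample_db"; then
    echo "router.db looks OK"
fi
```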

First Run

[rancid@rancid ~]$ bin/rancid-run
[rancid@rancid ~]$

Well, that was anticlimactic. Rancid typically prints nothing to the console and writes to the logs in ~/var/logs instead.

[rancid@rancid logs]$ pwd
/usr/local/rancid/var/logs
[rancid@rancid logs]$ ls -altrh
total 4.0K
drwxr-xr-x. 3 rancid rancid  35 Nov  5 07:00 ..
-rw-r-----. 1 rancid rancid 270 Nov  5 07:00 networking.20191105.070023
drwxr-x---. 2 rancid rancid  40 Nov  5 07:00 .

[rancid@rancid logs]$ cat networking.20191105.070023 
starting: Tue Nov 5 07:00:23 CST 2019

/usr/local/rancid/var/networking does not exist.
Run bin/rancid-cvs networking to make all of the needed directories.

ending: Tue Nov 5 07:00:23 CST 2019
[rancid@rancid logs]$ 
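
Since you will be reading the newest log after nearly every run, a small helper is handy. This is a sketch with a function name of my own. It relies on the log names ending in a date and time stamp, as seen above, so sorting by name also sorts by time.

```shell
# Print the newest log file for a rancid group. Log names end in
# YYYYMMDD.HHMMSS, so a lexical sort is also a chronological sort.
latest_rancid_log() {
    logdir="$1"
    group="$2"
    ls "$logdir"/"$group".* 2>/dev/null | sort | tail -n 1
}

# Demo against a scratch directory with two empty, made-up log files.
demo_logs=$(mktemp -d)
: > "$demo_logs/networking.20191105.070023"
: > "$demo_logs/networking.20191105.070555"

latest_rancid_log "$demo_logs" networking
```

On the rancid server this would be used as cat "$(latest_rancid_log ~/var/logs networking)".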

Ok, let’s run rancid-cvs. It’s nice that it creates the repos for you. It versions both the router configs and the router.db files.

[rancid@rancid ~]$ ~/bin/rancid-cvs

No conflicts created by this import

cvs checkout: Updating networking
Directory /usr/local/rancid/var/CVS/networking/configs added to the repository
cvs commit: Examining configs
cvs add: scheduling file `router.db' for addition
cvs add: use 'cvs commit' to add this file permanently
RCS file: /usr/local/rancid/var/CVS/networking/router.db,v
done
Checking in router.db;
/usr/local/rancid/var/CVS/networking/router.db,v  <--  router.db
initial revision: 1.1
done

# Proof of CVS creation
[rancid@rancid ~]$ find ./ -type d -name CVS
./var/CVS
./var/networking/CVS
./var/networking/configs/CVS

Now run rancid-run again and check the new log!

[rancid@rancid ~]$ cd var/logs
[rancid@rancid logs]$ ls -altrh
total 8.0K
-rw-r-----. 1 rancid rancid 270 Nov  5 07:00 networking.20191105.070023
drwxr-xr-x. 5 rancid rancid  64 Nov  5 07:04 ..
drwxr-x---. 2 rancid rancid  74 Nov  5 07:05 .
-rw-r-----. 1 rancid rancid 741 Nov  5 07:05 networking.20191105.070555
[rancid@rancid logs]$ cat networking.20191105.070555
starting: Tue Nov 5 07:05:55 CST 2019

cvs add: scheduling file `.cvsignore' for addition
cvs add: use 'cvs commit' to add this file permanently
cvs add: scheduling file `configs/.cvsignore' for addition
cvs add: use 'cvs commit' to add this file permanently

cvs commit: Examining .
cvs commit: Examining configs
RCS file: /usr/local/rancid/var/CVS/networking/.cvsignore,v
done
Checking in .cvsignore;
/usr/local/rancid/var/CVS/networking/.cvsignore,v  <--  .cvsignore
initial revision: 1.1
done
RCS file: /usr/local/rancid/var/CVS/networking/configs/.cvsignore,v
done
Checking in configs/.cvsignore;
/usr/local/rancid/var/CVS/networking/configs/.cvsignore,v  <--  .cvsignore
initial revision: 1.1
done

ending: Tue Nov 5 07:05:56 CST 2019

The router.db we created in ~/var/router.db needs to move to ~/var/networking/router.db

[rancid@rancid var]$ mv ~/var/router.db ~/var/networking/
[rancid@rancid var]$ ~/bin/rancid-run 
[rancid@rancid var]$ cd logs
[rancid@rancid logs]$ ls -la
total 12
drwxr-x---. 2 rancid rancid  108 Nov  5 07:08 .
drwxr-xr-x. 5 rancid rancid   47 Nov  5 07:08 ..
-rw-r-----. 1 rancid rancid  270 Nov  5 07:00 networking.20191105.070023
-rw-r-----. 1 rancid rancid  741 Nov  5 07:05 networking.20191105.070555
-rw-r-----. 1 rancid rancid 1899 Nov  5 07:08 networking.20191105.070840

[rancid@rancid logs]$ cat networking.20191105.070840
starting: Tue Nov 5 07:08:40 CST 2019

/usr/local/rancid/bin/control_rancid: line 433: sendmail: command not found
cvs add: scheduling file `test-f5' for addition
cvs add: use 'cvs commit' to add this file permanently
RCS file: /usr/local/rancid/var/CVS/networking/configs/test-f5,v
done
Checking in test-f5;
/usr/local/rancid/var/CVS/networking/configs/test-f5,v  <--  test-f5
initial revision: 1.1
done
Added test-f5



Trying to get all of the configs.
test-f5: missed cmd(s): all commands
test-f5: End of run not found
test-f5 clogin error: Error: /usr/local/rancid/.cloginrc must not be world readable/writable
#

This file does contain passwords after all, so let’s lock it down. A mode of 600 (read/write for the owner only) is all a credentials file needs.

[rancid@rancid ~]$ chmod 600 .cloginrc 
[rancid@rancid ~]$ 
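
To catch this before clogin does, you can check the file mode yourself. The sketch below is my own helper, not part of rancid. It assumes the complaint is only about the "other" permission bits, which is what the error message suggests.

```shell
# Succeeds only when "other" has no read, write or execute bit on the file.
cloginrc_is_private() {
    # Characters 8-10 of the ls -l mode string are the "other" permissions.
    other=$(ls -ld "$1" | cut -c8-10)
    [ "$other" = "---" ]
}

# Demo on a scratch file rather than the real ~/.cloginrc.
demo_rc=$(mktemp)
chmod 644 "$demo_rc"
if cloginrc_is_private "$demo_rc"; then echo "private"; else echo "too open"; fi
chmod 600 "$demo_rc"
if cloginrc_is_private "$demo_rc"; then echo "private"; else echo "too open"; fi
```

The first check reports "too open" and the second reports "private".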

Iterative Approach

I went through a few iterations of troubleshooting and looking at the logs. I am showing this because almost nobody gets the install 100% correct the first time. Therefore, it’s great to understand how to check the logs and make changes accordingly.

The final cloginrc looks like this

[rancid@rancid ~]$ cat .cloginrc
add user test-f5	root
add password test-f5	XXXXXXXXXX

#defaults for most devices
add autoenable *	1
add method *		ssh

The rancid.conf needed this line changed

SENDMAIL="/usr/sbin/sendmail"

And now we have a clean run!

[rancid@rancid ~]$ cat var/logs/networking.20191105.073209
starting: Tue Nov 5 07:32:09 CST 2019



Trying to get all of the configs.
All routers successfully completed.

cvs diff: Diffing .
cvs diff: Diffing configs
cvs commit: Examining .
cvs commit: Examining configs

ending: Tue Nov 5 07:32:20 CST 2019
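
Rather than reading every log by eye, you can grep for the failure messages seen during the earlier runs. This is a sketch: the pattern list only covers the errors that appeared in this article, so treat it as a starting point rather than a complete list.

```shell
# Print lines from a rancid log that indicate a problem. The patterns are
# taken from the failed runs shown earlier in this article.
rancid_log_problems() {
    grep -E 'missed cmd\(s\)|End of run not found|clogin error|command not found' "$1"
}

# Sample log assembled from lines shown earlier.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
Trying to get all of the configs.
test-f5: missed cmd(s): all commands
test-f5: End of run not found
EOF

rancid_log_problems "$demo_log"
```

On a clean log the function prints nothing and returns non-zero, which makes it easy to use in an if statement.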

Scheduled Runs

On UNIX, crontab is the typical default to run scheduled jobs and here is a good one to run. You can edit your crontab by running “crontab -e” or list it by running “crontab -l”

#Run config differ twice daily
02 1,14 * * * /usr/local/rancid/bin/rancid-run

#Clean out config differ logs
58 22 * * * /usr/bin/find /usr/local/rancid/var/logs -type f -mtime +7 -delete

This crontab runs rancid at 2 minutes past the hour, at 01:02 and 14:02. It then clears logs older than 7 days once a day at 22:58, so the drive does not fill up with noisy logs.
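
Before trusting a find with -delete in cron, it is worth a dry run. The sketch below uses the same expression with -print instead, so it only lists what would be removed. The function name and the scratch directory are my own.

```shell
# List (rather than delete) files older than the given number of days.
# Same find expression as the cron job, with -print in place of -delete.
old_rancid_logs() {
    find "$1" -type f -mtime +"$2" -print
}

# Demo: one file backdated to the year 2000 and one fresh file.
demo_dir=$(mktemp -d)
: > "$demo_dir/networking.new"
touch -t 200001010000 "$demo_dir/networking.old"

old_rancid_logs "$demo_dir" 7
```

Only the backdated file is listed. On the rancid server the call would be old_rancid_logs /usr/local/rancid/var/logs 7.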

Web Interface

Rancid is nearly 100% CLI, but there are add-on tools for CVS that we can use, namely cvsweb. FreeBSD was a heavy user of CVS and created this project/package.

cvsweb will require apache and “rcs”. RCS does not yet exist in EPEL for CentOS 8.0.

[root@rancid ~]# yum install httpd

[root@rancid ~]# curl -O https://people.freebsd.org/~scop/cvsweb/cvsweb-3.0.6.tar.gz
[root@rancid ~]# tar xzf cvsweb-3.0.6.tar.gz

[root@rancid cvsweb-3.0.6]# cp cvsweb.cgi /var/www/cgi-bin/

[root@rancid cvsweb-3.0.6]# mkdir /usr/local/etc/cvsweb/
[root@rancid cvsweb-3.0.6]# cp cvsweb.conf /usr/local/etc/cvsweb/

[root@rancid httpd]# chmod 755 /var/www/cgi-bin/cvsweb.cgi 

We need to tell cvsweb where the repo is! Find the following section to add ‘Rancid’ in /usr/local/etc/cvsweb/cvsweb.conf

@CVSrepositories = (
        'Rancid'  => ['Rancid Repository', '/usr/local/rancid/var/CVS'],

Now let’s start up apache and let it rip!

[root@rancid cvsweb-3.0.6]# systemctl enable httpd
Created symlink /etc/systemd/system/multi-user.target.wants/httpd.service → /usr/lib/systemd/system/httpd.service.
[root@rancid cvsweb-3.0.6]# systemctl start httpd

# Enable port 80 on firewall

[root@rancid httpd]# firewall-cmd --zone=public --add-service=http --permanent
success
[root@rancid httpd]# firewall-cmd --reload
success

Wait, it still doesn’t work. Let’s check /var/log/httpd/error_log

Can't locate IPC/Run.pm in @INC (you may need to install the IPC::Run module) (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5) at /var/www/cgi-bin/cvsweb.cgi line 100.
BEGIN failed--compilation aborted at /var/www/cgi-bin/cvsweb.cgi line 100.
[Tue Nov 05 08:02:58.091776 2019] [cgid:error] [pid 20354:tid 140030446114560] [client ::1:37398] End of script output before headers: cvsweb.cgi

On CentOS 8 – It seems the best way to get this is via https://centos.pkgs.org/8/centos-powertools-x86_64/perl-IPC-Run-0.99-1.el8.noarch.rpm.html

[root@rancid httpd]# dnf --enablerepo=PowerTools install perl-IPC-Run

Then I ran into the following issue which seems to be a known bug. I manually edited the file as recommended in the patch.

"my" variable $tmp masks earlier declaration in same statement at /var/www/cgi-bin/cvsweb.cgi line 1338.
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1195, near "$v qw(hidecvsroot hidenonreadable)"
Global symbol "$v" requires explicit package name (did you forget to declare "my $v"?) at /var/www/cgi-bin/cvsweb.cgi line 1197.
Global symbol "$v" requires explicit package name (did you forget to declare "my $v"?) at /var/www/cgi-bin/cvsweb.cgi line 1197.
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1276, near "}"
  (Might be a runaway multi-line << string starting on line 1267)
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1289, near "}"
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1295, near "}"
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1302, near "}"
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1312, near "}"
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1336, near "}"
syntax error at /var/www/cgi-bin/cvsweb.cgi line 1338, near ""$tmp,v" }"
/var/www/cgi-bin/cvsweb.cgi has too many errors.

Are we there yet?

Yay – We can see the CVS root!
And we can drill into router.db and other areas!

Security

We really should secure this page for two reasons: 1) we are running Perl scripts, and cgi-bin is notoriously insecure; 2) we have router configs, possibly with passwords and passphrases.

[root@rancid ~]# htpasswd -c /etc/httpd/.htpasswd dwchapmanjr
New password: 
Re-type new password: 
Adding password for user dwchapmanjr
[root@rancid ~]# 

Create the /var/www/cgi-bin/.htaccess

AuthType Basic
AuthName "Restricted Content"
AuthUserFile /etc/httpd/.htpasswd
Require valid-user

Set permissions

[root@rancid html]# chmod 640 /etc/httpd/.htpasswd 
[root@rancid html]# chmod 640 /var/www/cgi-bin/.htaccess
[root@rancid html]# chown apache /etc/httpd/.htpasswd 
[root@rancid html]# chown apache /var/www/cgi-bin/.htaccess 

We then want to allow overrides so that the .htaccess will actually take effect, by editing /etc/httpd/conf/httpd.conf

# Change AllowOverride to All

<Directory "/var/www/cgi-bin">
    #AllowOverride None
    AllowOverride All
    Options None
    Require all granted
</Directory>

And then “systemctl restart httpd”

With any luck you should get a user/pass prompt now! It is not the most secure but it is something.

Final Words

In this article we have stood up rancid from scratch. We have also gone over some basic troubleshooting steps and configured apache and cvsweb to visually browse the files.

How to Cheaply Host Your Business Online

Summary

Hosting your company online could mean a few things to different people. The purpose of this post is to help guide new startups through hosting email, web presence and some other related tools.

About 20 years ago this was a more costly endeavor. There were fewer players and each component cost more. Hosting has since been highly commoditized and DNS hosting is usually free these days.

This will not go into the technical details of doing it. If there is a need for that, I am happy to expand on this article. This article simply describes the route you can take.

Leading By Example

An example I will provide is setting up my wife’s business Pretty Hare Salon. She had a need for online presence, email and online booking for appointments.

Registrar and DNS Hosting

Once she filed the proper paperwork to register her business, we went searching for a domain name and settled on prettyharesalon.com. Registering with Google Registrar services was easy and it was a mere $12/year for this domain including DNS services.

Some people get confused by the difference between the two. Registrar services involve reserving the name and pointing it to the DNS hosting provider, which in this case is also Google. DNS hosting is what actually answers lookups for the domain.

Forwarding

A neat feature of Google DNS is that you can do HTTP(S) forwarding. For her business she does not need a published web site as social media suffices for this for the time being. We opted to use HTTPS forwarding for www.prettyharesalon.com and redirect it to her Facebook page. This is a free service Google provides and quite helpful.

Domain forwarding to Facebook

Email

For email, we opted for a personal Gmail account to save cost. Google Domains also allows forwarding email from her prettyharesalon.com addresses to her personal account. We have a few addresses that we forward. Her clients aren’t concerned whether the correspondence comes from @prettyharesalon.com as they all know her. With that said, publishing the @prettyharesalon.com email addresses on social media greatly helps those just finding her business.

If the email forwarding doesn’t work for your business, G Suite (Google’s Platform) offers flexible email hosting starting at roughly $6/month/account. I use that here at Woohoo Services for mine.

Web Site Hosting

If you do need to host a web site, Google Cloud does have hosting offerings including WordPress and a few others. See my article on how I set this up for this blog if you’re interested. I highly recommend putting Cloudflare or something similar in front of it though. Web sites get scanned and attacked on a daily basis, unbeknownst to the owner. A compromised site can damage your reputation.

Wix is another popular web hosting platform that is fairly easy to use.

Online Platform

Square is a great platform for startups that need to accept credit cards but cannot commit to monthly fees. It offers a fairly flat percentage structure on card swipes/dips that is predictable. It also offers free add-ons like the appointment module for businesses that require/prefer appointments and have a set list of services to offer.

Social Media

It goes without saying that social media is a must. It can be a free source of marketing and a great way to keep in touch with your clients. Do not limit yourself to just one though. Get on as many as you believe are relevant to your business. If you have a brick and mortar store, register that business with the major search engines, add your business hours to it, etc. You’d be surprised at how many people exclusively use Yelp or Google. Different people use different social media and it’s best to try to capture it all.

A Beginner’s Guide to SEO From a Beginner

Summary

In my attempt to stand up this blog, I have gone through the Search Engine Optimization (SEO) process. This is the first time I have had to do this in roughly a decade or two. Quite a bit has changed but some things are still the same.

Patience

I started out expecting it to be fairly instant, refreshing the screens every hour and then daily. The crawling and indexing seemed to run on their own schedule.

One of the most important things to have is patience. It takes time for your site to get crawled, even after submitting sitemaps to the various search engines. Even in today’s world of instant gratification, a new site can take days or weeks to get indexed.

Analytics

Analytics are the second most important part of this. You need to be able to measure traffic to your site. Google Analytics is a great thing to inject into your site. It is not the only source of analytics though.

I currently use Cloudflare for this site. Cloudflare is great at providing some analytics as well. I particularly like the ability to see which “crawlers” are going through my site.

Cloudflare “crawlers”

Not that it is terribly useful but since my DNS is hosted with Cloudflare, I get some metrics from that. It is neat to see where the queries are coming from.

Top DNS Queries

Analytics can help you determine the format of your information or the time of day to release new content. Google Analytics is also great about telling you how you obtained the audience. Do most people find it through your Facebook feed, through organic search, or by coming directly to your site?

Registering With Search Engines

Search engines need to know about you and your site. One quick way is to submit your site to the major ones. Google (Search Console), Bing (Webmaster) and Yandex (Webmaster) are a great start.

If your pages are not heavily linked to each other, a great idea is to upload a sitemap. This blog has that issue. If you are using a tool like WordPress, Yoast is a great plugin for this, as it generates the sitemap dynamically.

Validating Site Content

Google Search Console allows you to inspect the URL for issues which can help you pre-empt issues before they get into the index. Here you can see it has actually already crawled the site but not indexed it yet. Still worth a test!

URL Inspection in Google Search Console
URL Inspection in Google Search Console – Live Test – All “green”

Bing and Yandex have similar tools.

Addressing Issues

Try to resolve any issues the various tools detect fairly quickly. Excessive redirects, bad robots, and 4XX or 5XX responses can cause some “crawl” pains. With limited resources, search engines have to optimize their crawl and allocate a “budget” to each site. Up and coming sites without a reputation have a minimal budget, so you do not want issues eating into it.

Focus on Quality Content

Whether your site is fairly static or you are pumping out content, ensure the quality remains. Google and other search engines prioritize original quality content. They also rate on readability and mobile friendliness.

Final Words

Hopefully this has helped you as a very introductory article on SEO. There are plenty of guides that go into much more depth on the various aspects of SEO.

CheckMK Distributed Monitoring

Summary

The purpose of this guide is to show the strength and flexibility of CheckMK’s distributed monitoring. As you add hosts and services, the requirements can grow. It can be easy to get in the rut of adding more CPU and RAM until you have a monstrosity of a server that you cannot expand anymore.

Centrally monitoring all sites may not even work. The central CheckMK server may not have access to all of the remote devices.

Pre-requisites and Installation

To start off, we will need another CheckMK instance. If you do not already have one, check out the Introduction to CheckMK guide. Once you have CheckMK installed and a new unique site set up, the rest is trivial.

Distributed monitoring also involves the slave listening on TCP/6557, so we need to open that up.

[root@chckmk2 ~]# firewall-cmd --zone=public --add-port=6557/tcp --permanent
[root@chckmk2 ~]# firewall-cmd --reload

[root@chckmk2 ~]# su - second

OMD[second]:~$ omd config

We then enable distributed monitoring and enable livestatus

Select “Distributed Monitoring”

Enable livestatus, which will listen on port 6557
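
Once livestatus is listening, a quick way to prove the port works is to send it a small query by hand. Livestatus speaks a plain-text protocol: a GET line naming a table, optional header lines such as Columns, and a blank line to end the request. The sketch below only builds the query text. The helper name and the host name in the comment are my own assumptions, and the netcat behaviour may vary between versions.

```shell
# Build a minimal Livestatus query: a GET line naming the table, a Columns
# header, and a trailing blank line that ends the request.
livestatus_query() {
    table="$1"
    shift
    printf 'GET %s\nColumns: %s\n\n' "$table" "$*"
}

livestatus_query hosts name state

# To send it to the second site (host name is an assumption, needs netcat):
#   livestatus_query hosts name state | nc chckmk2.woohoosvcs.com 6557
```

If the firewall and livestatus are set up correctly, the remote site should answer with one line per host.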

Configure Connection

From the main site (in our series, http://chckmk1.woohoosvcs.com/main), navigate to Distributed Monitoring.

Distributed Monitoring
New Connection
Fill out the appropriate options.
Select the credentials that have access to log in to the second site.
Login again!
Save changes
Activate!

Use Connection

Now that we have the connection, how do we actually use it? One of the easiest and most likely ways is to have a folder configured to use just that monitoring site.

Create a folder with a monitored site

Next I added a host and put it in this new folder. Here you can see CheckMK is smart enough to know only the “second” site needs to be reloaded due to the changes that only affect it.

Activate second site changes

Final Words

This article was mostly pictures but hopefully the point comes across. That point is how flexible and easy it is to set up distributed monitoring via CheckMK.

Upgrading CheckMK

Summary

In this article we will discuss the upgrade process using OMD. We will also go over the “werks” or changes and incompatibilities in the versions after upgrading.

If you have a fully functional environment such as one installed per Introduction to CheckMK – this should be fairly straightforward.

Prerequisites

CheckMK 1.5 RPM – We downloaded this in the previous article

[root@chckmk1 ~]# curl -O https://checkmk.com/support/1.5.0p23/check-mk-raw-1.5.0p23-el7-38.x86_64.rpm

[root@chckmk1 ~]# yum install check-mk-raw-1.5.0p23-el7-38.x86_64.rpm 

Wait, what just happened? Was that it? Did it just get upgraded? Yes and no. CheckMK 1.5 was installed but our instance is not upgraded to it.

[root@chckmk1 ~]# omd sites
SITE             VERSION          COMMENTS
main             1.4.0p38.cre      

[root@chckmk1 ~]# omd versions
1.4.0p38.cre
1.5.0p23.cre (default)
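
With more than one site on a server it is easy to lose track of which ones still need updating. Here is a small sketch that filters the "omd sites" output for sites not yet on the version you want. The function name is my own and the sample input is copied from the output above.

```shell
# Read "omd sites" output on stdin and print every site that is not yet
# running the wanted version. The first line (the header) is skipped.
sites_needing_update() {
    want="$1"
    awk -v want="$want" 'NR > 1 && NF >= 2 && $2 != want { print $1 }'
}

# Sample input from this article; on a real server use:
#   omd sites | sites_needing_update 1.5.0p23.cre
printf 'SITE             VERSION          COMMENTS\nmain             1.4.0p38.cre\n' |
    sites_needing_update 1.5.0p23.cre
# prints: main
```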

Each omd instance runs as its own user, so we will su to main to run commands for that instance.

[root@chckmk1 ~]# su - main
OMD[main]:~$ omd update
Please completely stop 'main' before updating it.

# Yes that's a great idea!
OMD[main]:~$ omd stop
OMD[main]:~$ omd update

“omd update”
OMD[main]:~$ omd version
OMD - Open Monitoring Distribution Version 1.5.0p23.cre
OMD[main]:~$ omd start
Version 1.5.0p23 with “57”

Are we there yet?

We most definitely are. We are on 1.5.0p23 and that went fairly well. But what is the “57”?

Werks

From the CheckMK page, a werk is a bug or change that has an effect on the UI – https://checkmk.de/check_mk-werks.php

We can click on the 57 and there are 57 incompatible “werks” that we should be aware of. Many times these are non issues. Other times it is certain metrics that have gone away or configurations that have changed.

“Werks” AKA Release Notes

This is a huge help when upgrading. Instead of having to dig through text file release notes to see what major changes happened, we have werks! You can see any incompatible changes here and drilling into them will give you details on what to do. Once you have addressed it, you can acknowledge the werk.

Click on the “Show unacknowledged incompatible werks”

Incompatible werk. Address and acknowledge

As we can see here, there is a clear description of what changed. Once we have addressed it, we can acknowledge it and the 57 will decrement to 56.

You also have the option on the release notes page to “acknowledge all” if you really do not care to work through the werks. For this lab, I have gone through this before, so I will just acknowledge all.

If you have a large deployment, most of your time will be spent going through the werks and addressing them. With that said, going from 1.4 to 1.5 has been a breeze. 1.2.8 to 1.4 was a bit rougher, with more incompatible werks that caused issues.

Rollback

The power of OMD makes rollbacks fairly easy. OMD does not care if an “upgrade” is going forward or backwards. The only issue you may have is if you made a config change that is only compatible with, or only implemented in, 1.5. In that case, 1.4 may have issues with it. Otherwise the rollback is the same as the upgrade: 1) stop the site, 2) omd update.

Final Words

Now we have a fairly up to date CheckMK. Try doing the same to 1.6.0? For my production deploys I usually wait for a few revisions and 1.6.0 is still very early for my tastes.

Introduction to CheckMK

Summary

The purpose of this guide is to provide a high level overview of CheckMK. CheckMK is a great monitoring tool that has progressed greatly over the years. I have heavily depended on it for at least 3 years now.

Background

I came across CheckMK on a project that required a shotgun replacement of the current monitoring solution. A few solutions were vetted and Nagios was attempted. Unfortunately, the time to tweak and tune it was not compatible with the project timelines.

About CheckMK

CheckMK is an ecosystem that was originally built around Nagios. Many of the components of Nagios still exist in it. Mathias Kettner is the founder of CheckMK.

There are quite a few editions but this guide covers the “Raw” edition. This is essentially the free unlimited tier.

Installation

Enough of the background, let’s get down to the technical installation

Requirements

For the purposes of this installation, we will be using a vanilla “minimal” install of CentOS 7.0. CentOS 7 is the latest version CheckMK supports. The VM will have 1 core, 1GB RAM and 8GB HDD. We will first be installing 1.4.0 so that the upgrade process can be shown.

curl -O https://checkmk.com/support/1.4.0p38/check-mk-raw-1.4.0p38-el7-85.x86_64.rpm

curl -O https://checkmk.com/support/1.5.0p23/check-mk-raw-1.5.0p23-el7-38.x86_64.rpm

# Always good to update first!
yum update

# Enable EPEL package repo
yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# Then attempt to install
yum install check-mk-raw-1.4.0p38-el7-85.x86_64.rpm

Ok, great, ran through all of that. What’s next?

Configuration

Open Monitoring Distribution

CheckMK builds upon a framework called Open Monitoring Distribution (OMD). You may be asking, why the complexity? OMD makes upgrades much easier and lets you run multiple versions of CheckMK on the same machine, since it also supports multiple instances. The CheckMK RPMs install it.

The first step after installing CheckMK will be to create an OMD site

# Here you can see there are no sites
[root@chckmk1 ~]# omd sites
SITE             VERSION          COMMENTS

# We then create a site
[root@chckmk1 ~]# omd create main
Adding /opt/omd/sites/main/tmp to /etc/fstab.
Creating temporary filesystem /omd/sites/main/tmp...OK
Restarting Apache...OK
Created new site main with version 1.4.0p38.cre.

  The site can be started with omd start main.
  The default web UI is available at http://chckmk1.woohoosvcs.com/main/

  The admin user for the web applications is cmkadmin with password: OkWZHNQr
  (It can be changed with 'htpasswd -m ~/etc/htpasswd cmkadmin' as site user.
)
  Please do a su - main for administration of this site.


[root@chckmk1 ~]# omd sites
SITE             VERSION          COMMENTS
main             1.4.0p38.cre     default version 

[root@chckmk1 ~]# omd start main
Starting mkeventd...OK
Starting rrdcached...OK
Starting npcd...OK
Starting nagios...2019-11-02 15:43:45 [6] updating log file index
2019-11-02 15:43:45 [6] updating log file index
OK
Starting dedicated Apache for site main...OK
Initializing Crontab...OK

# Open port 80 with firewalld
[root@chckmk1 ~]# firewall-cmd --zone=public --add-service=http --permanent
success
[root@chckmk1 ~]# firewall-cmd --reload

# Set SELINUX

setsebool -P httpd_can_network_connect 1

If everything went well, you should be able to browse to the IP or URL and get a login page.

CheckMK 1.4.0 Login Page

It is really that simple. You now have a working CheckMK instance ready to be configured.

CheckMK Main Page

Configuring CheckMK Application

Now that we have a working instance and want to actually monitor something, why not the CheckMK server itself? There are a few options for this. You can use SNMP, CheckMK Agent or both. We will go over installing and configuring the agent.

Installing the CheckMK Agent

The agent requires xinetd as it essentially ties a script (the agent) to a socket/tcp port (6556). We will put the agent directly on the checkmk server.

The agents can be found in the “Monitoring Agents” section. For RPM based distributions it is easy to just install the RPM.

[root@chckmk1 ~]# curl -O http://chckmk1.woohoosvcs.com/main/check_mk/agents/check-mk-agent-1.4.0p38-1.noarch.rpm

[root@chckmk1 ~]# yum install check-mk-agent-1.4.0p38-1.noarch.rpm

[root@chckmk1 ~]# netstat -an | grep 6556
tcp6       0      0 :::6556                 :::*                    LISTEN     
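
Seeing the port in LISTEN state only proves xinetd is up. To see what the agent actually reports, you can look at its output, which is plain text split into sections by headers of the form <<<name>>>. The sketch below lists those section names. The function name and the sample excerpt are my own, not real output from this server.

```shell
# Read agent output on stdin and print the name of each section header.
# Options after a colon, as in <<<lnx_if:sep(58)>>>, are dropped.
agent_sections() {
    sed -n 's/^<<<\([^>:]*\).*>>>$/\1/p'
}

# Hypothetical excerpt of agent output, for illustration only.
cat <<'EOF' | agent_sections
<<<check_mk>>>
Version: 1.4.0p38
AgentOS: linux
<<<df>>>
<<<lnx_if:sep(58)>>>
EOF
```

On the server itself you could feed it the live agent, for example check_mk_agent | agent_sections.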

Adding Host

Now we get to add the host to CheckMK.

New hosts
Go to “Hosts” and then “New host”
Select host monitoring configuration
Enter the Hostname or IP. The agent type defaults to Check_MK Agent but I wanted to point it out. You then want to “Save & go to Services”.
Fix all missing/vanished services
This screen shows all of the services it can monitor. We want to “Fix all missing/vanished” and then save the pending changes. The button reads “1 change” at first and should read “2 changes” after the fix.
Activate new changes made
At this point we need to activate those changes. This typically requires a reload of Nagios which the activation does.
Host added but stale host and services
But wait, what am I seeing now and why are those hosts stale?
Some activity
Here we go, some activity!

At this point, we added a host and added some metrics to it. It ran through a discovery and found some more. Many times this happens. This is because some checks run asynchronously in the background. The first time you check a host, it does not return all of the services. On the second run they show up. From here you can go through similar steps to click on the host and acknowledge the new services.

In this case I simply forgot to follow my own instructions and click the “fix button”. Likely more services would have shown up later but not as many in the screenshot.

Other Configurations

Just because we have a monitoring system in place does not mean it is fully configured. We still have notifications, alert levels and plenty of other tuning to do. Those are out of scope for this document, but I will likely write them up going forward.

Final Words

We stood up a Check_MK server from scratch and are monitoring one host. At the beginning of the article I discussed upgrading. I will follow up with another article on upgrading. The process is fairly simple but there

Kubernetes Dashboard

Summary

Now that we’ve stood up a majority of the framework we can get to some of the fun stuff, namely Kubernetes Dashboard. For compatibility reasons we will be using 2.0.0-beta1. Newer 2.0 betas are not well tested, and I ran into some issues with them on the Kubernetes 1.14 that Photon comes with.

Download and Install

This is short and sweet. As usual, I like to download and then install. I didn’t like the name of this file though so I renamed it.

curl -O https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.0-beta1/aio/deploy/recommended.yaml

mv recommended.yaml dashboard-2b1.yaml

kubectl apply -f dashboard-2b1.yaml 
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
deployment.apps/kubernetes-metrics-scraper created

Health Check

The dashboard namespace is kubernetes-dashboard so we run the following.

root@kube-master [ ~/kube ]# kubectl get all --namespace=kubernetes-dashboard
NAME                                              READY   STATUS    RESTARTS   AGE
pod/kubernetes-dashboard-6f89577b77-pbngw         1/1     Running   0          27s
pod/kubernetes-metrics-scraper-79c9985bc6-kj6h5   1/1     Running   0          28s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/dashboard-metrics-scraper   ClusterIP   10.254.189.11    <none>        8000/TCP   57s
service/kubernetes-dashboard        ClusterIP   10.254.127.216   <none>        443/TCP    61s

NAME                                         READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kubernetes-dashboard         1/1     1            1           57s
deployment.apps/kubernetes-metrics-scraper   1/1     1            1           57s

NAME                                                    DESIRED   CURRENT   READY   AGE
replicaset.apps/kubernetes-dashboard-6f89577b77         1         1         1       29s
replicaset.apps/kubernetes-metrics-scraper-79c9985bc6   1         1         1       29s

Connecting

The main Dashboard page indicates you can get access by running “kubectl proxy” and opening the URL. This is where it can get a little tricky, though not for us since we have flannel working, even on the master. Simply download the Kubernetes kubectl client for your OS and run it locally.

dwcjr@Davids-MacBook-Pro ~ % kubectl proxy
Starting to serve on 127.0.0.1:8001

Now access the indicated link in the article. Namespace changed as it changed in 2.0 – http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
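
The URL follows the API server's service proxy pattern: namespace, then scheme:service:port-name, then /proxy/. Here the port name is empty, which is why the service name is followed by a bare colon. A small sketch that builds the URL, in case your namespace or local proxy port differs; the function name dashboard_proxy_url is hypothetical.

```shell
# dashboard_proxy_url: print the kubectl-proxy URL for an HTTPS service.
# Arguments: namespace, service name, optional local proxy port (default 8001).
dashboard_proxy_url() {
    local ns="$1" svc="$2" port="${3:-8001}"
    printf 'http://localhost:%s/api/v1/namespaces/%s/services/https:%s:/proxy/\n' \
        "$port" "$ns" "$svc"
}

# Usage:
#   dashboard_proxy_url kubernetes-dashboard kubernetes-dashboard
```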

Kubernetes Login Screen

Authenticating

The Kubernetes Access Control page does a good job of describing this, but at a high level the steps are as follows.

Create a dashboard-adminuser.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard

kubectl apply -f dashboard-adminuser.yaml

Then use this cool snippet to find the token. If you’re doing this on the master, make sure awk is installed.

kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')

At the bottom of the output should be a token section that you can plug into the token request.
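
If you only want the token itself, for example to paste it straight into the login form, the describe output can be filtered down to that one field. A minimal sketch; extract_token is a name I made up, and it assumes the output has a line starting with "token:" as described above.

```shell
# extract_token: read `kubectl describe secret` output on stdin
# and print only the value of the "token:" line.
extract_token() {
    awk '$1 == "token:" { print $2 }'
}

# Usage, reusing the command from above:
#   kubectl -n kubernetes-dashboard describe secret \
#     "$(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')" \
#     | extract_token
```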

From here you’ve made it. Things just got a whole lot easier if you’re a visual learner!

Kubernetes Dashboard View

Final Words

I may write a few more articles on this, but at this point we have a very functional Kubernetes cluster that can deploy apps, given we throw enough resources at the VMs. Other topics that need to be covered are networking and the actual topology. I feel that one of the best ways to learn a platform or technology is to push through a guided install and then understand what the components are. This works for me but not everyone.

Kubernetes Flannel Configuration

Summary

With all the pre-requisites met, including SSL, flannel is fairly simple to install and configure. Where it goes wrong is if some of those pre-requisites have not been met or are misconfigured. You will start to find that out in this step.

We will be running flannel as a Docker container, even on the master, rather than as a standalone service. The containerized version is much easier to manage.

Why Do We Need Flannel Or An Overlay?

Without flannel, each node has the same IP range associated with docker. We could change this and manage it ourselves. We would then need to set up firewall rules and routing table entries to handle this. Then we would also need to keep up with IP allocations.

Flannel does all of this for us. It does so with a minimal amount of effort.

Staging for Flannel

Config

We need to update /etc/kubernetes/controller-manager again and add

--allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16

KUBE_CONTROLLER_MANAGER_ARGS="--root-ca-file=/secrets/ca.crt  --service-account-private-key-file=/secrets/server.key --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16"

And then restart kube-controller-manager
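
With --allocate-node-cidrs=true, the controller manager carves a pod subnet for each node out of --cluster-cidr. The sketch below only illustrates the arithmetic. It assumes the default node mask of /24, since we are not setting --node-cidr-mask-size, and it assumes a /16 cluster CIDR like ours. Which node ends up with which subnet is up to Kubernetes, so the index is purely illustrative, and node_subnet is a hypothetical helper.

```shell
# node_subnet: print the Nth /24 (0-based) inside a /16 cluster CIDR.
# Assumes a /16 cluster CIDR and /24 node subnets; other sizes are not handled.
node_subnet() {
    local cidr="$1" index="$2"
    local base="${cidr%/*}"            # strip the prefix length, e.g. 10.244.0.0
    local o1 o2
    o1="$(echo "$base" | cut -d. -f1)"
    o2="$(echo "$base" | cut -d. -f2)"
    if [ "$index" -lt 0 ] || [ "$index" -gt 255 ]; then
        echo "index out of range: $index" >&2
        return 1
    fi
    echo "$o1.$o2.$index.0/24"
}

# Usage:
#   node_subnet 10.244.0.0/16 0
#   node_subnet 10.244.0.0/16 1
```

A /16 split into /24s gives room for 256 nodes with roughly 254 pod addresses each, which is far more than this lab needs.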

I always prefer to download my yaml files so I can review and replay as necessary. Per their documentation I am just going to curl the URL and then apply it.

On each node we need to add the following to the /etc/kubernetes/kubelet config and then restart kubelet.

KUBELET_ARGS="--network-plugin=cni"

Firewall

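Since flannel is an overlay, it rides on top of the existing network, and the hosts must accept its encapsulated traffic. Per their doc, that is UDP/8472 for the VXLAN backend and UDP/8285 for the UDP backend. Therefore we need to put this in iptables on each host.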

# This line for VXLAN
-A INPUT -p udp -m udp --dport 8472 -j ACCEPT

# This line for UDP
-A INPUT -p udp -m udp --dport 8285 -j ACCEPT

Fire it up!

Now we are ready to apply and let it all spin up!

root@kube-master [ ~/kube ]# curl -O https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

root@kube-master [ ~/kube ]# kubectl apply -f kube-flannel.yml
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds-amd64 created
daemonset.apps/kube-flannel-ds-arm64 created
daemonset.apps/kube-flannel-ds-arm created
daemonset.apps/kube-flannel-ds-ppc64le created
daemonset.apps/kube-flannel-ds-s390x created

If all is well at this point, it should be chewing through CPU and disk and in a minute or two the pods are deployed!

root@kube-master [ ~/kube ]# kubectl get pods --namespace=kube-system
NAME                          READY   STATUS    RESTARTS   AGE
kube-flannel-ds-amd64-7dqd4   1/1     Running   17         138m
kube-flannel-ds-amd64-hs6c7   1/1     Running   1          138m
kube-flannel-ds-amd64-txz9g   1/1     Running   18         139m

On each node you should see a “flannel” interface now too.

root@kube-master [ ~/kube ]# ifconfig -a | grep flannel
flannel.1 Link encap:Ethernet  HWaddr 1a:f8:1a:65:2f:75

Troubleshooting Flannel

From the “RESTARTS” section you can see some of them had some issues. What kind of blog would this be if I didn’t walk you through some troubleshooting steps?
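
On a larger cluster it is easier to filter for the restarting pods than to eyeball the column. A minimal sketch; restarted_pods is a made-up name, and it assumes the default column order of kubectl get pods (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# restarted_pods: read `kubectl get pods` output on stdin and print
# "name restarts" for every pod whose RESTARTS count exceeds the threshold.
# Argument: threshold (default 0).
restarted_pods() {
    local threshold="${1:-0}"
    awk -v t="$threshold" 'NR > 1 && $4 + 0 > t { print $1, $4 }'
}

# Usage:
#   kubectl get pods --namespace=kube-system | restarted_pods 5
```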

I knew that the successful one was the master so it was likely a connectivity issue. Testing “curl -v https://10.254.0.1” passed on the master but failed on the nodes. By pass, I mean it made a connection but complained about the TLS certificate (which is fine). The nodes, however, indicated some sort of connectivity issue or firewall issue. So I tried the back end service member https://192.168.116.174:6443 and same symptoms. I would have expected Kubernetes to open up this port but it didn’t so I added it to iptables and updated my own documentation.
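
The pass/fail distinction above maps onto curl's exit status, which makes the test easy to repeat on every node. This is a sketch rather than a complete diagnostic: classify_curl_exit is a hypothetical helper, and it only names the few exit codes relevant here (0 success, 60 untrusted certificate, 7 failed to connect, 28 timeout).

```shell
# classify_curl_exit: map a curl exit status to a short diagnosis.
classify_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        60) echo "reachable, certificate not trusted" ;;
        7)  echo "connection failed (refused or blocked)" ;;
        28) echo "timed out (often a firewall drop)" ;;
        *)  echo "other curl error $1" ;;
    esac
}

# Usage on each node:
#   curl -s -o /dev/null --connect-timeout 5 https://10.254.0.1
#   classify_curl_exit $?
```

Either of the first two results means the network path is fine, which matches what the master showed. The last two point at connectivity, which is what the nodes showed.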

Some other good commands are “kubectl logs <resource>” such as

root@kube-master [ ~/kube ]# kubectl logs pod/kube-flannel-ds-amd64-txz9g --namespace=kube-system
I1031 18:47:14.419895       1 main.go:514] Determining IP address of default interface
I1031 18:47:14.420829       1 main.go:527] Using interface with name eth0 and address 192.168.116.175
I1031 18:47:14.421008       1 main.go:544] Defaulting external address to interface address (192.168.116.175)
I1031 18:47:14.612398       1 kube.go:126] Waiting 10m0s for node controller to sync
I1031 18:47:14.612648       1 kube.go:309] Starting kube subnet manager
....

You will notice the “namespace” flag. Kubernetes can segment resources into namespaces. If you’re unsure of which namespace something exists in, you can use “--all-namespaces”.

Final Words

Now we have a robust network topology where pods can have unique IP ranges and communicate to pods on other nodes.

Next we will be talking about Kubernetes Dashboard and how to load it. The CLI is not for everyone and the dashboard helps put things into perspective.

Next – Kubernetes Dashboard
Next – Spinning Up Rancher With Kubernetes

Kubernetes SSL Configuration

Summary

Picking up where we left off in the Initializing Kubernetes article, we will now be setting up certificates! This will be closely following the Kubernetes “Certificates” article. Specifically using OpenSSL as easyrsa has some dependency issues with Photon.

OpenSSL

Generating Files

We’ll be running the following commands, and I keep the resulting files in /root/kube/certs. They won’t remain there, but it’s a good staging area. It needs to be cleaned up or secured afterwards so we don’t have keys lying around.

openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=192.168.116.174" -days 10000 -out ca.crt
openssl genrsa -out server.key 2048

We then need to generate a csr.conf

[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn

[ dn ]
C = <country>
ST = <state>
L = <city>
O = <organization>
OU = <organization unit>
CN = <MASTER_IP>

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = <MASTER_IP>
IP.2 = <MASTER_CLUSTER_IP>

[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names

In my environment the MASTER_IP is 192.168.116.174. The cluster IP is usually a default, but we can confirm it by running kubectl.

root@kube-master [ ~/kube ]# kubectl get services kubernetes
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   60m

Filled in with those values, my csr.conf looks like this

[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn

[ dn ]
C = US
ST = Texas
L = Katy
O = Woohoo Services
OU = IT
CN = 192.168.116.174

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = 192.168.116.174
IP.2 = 10.254.0.1

[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names
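
If you rebuild the cluster often, the two IPs are the only parts of this file that really change, so the file can be generated. A minimal sketch under a few assumptions: write_csr_conf is a hypothetical helper, and for brevity its [ dn ] section carries only the CN. The C, ST, L, O and OU lines shown above can be added back in the same place.

```shell
# write_csr_conf: print a csr.conf for the given master IP and cluster IP.
# Simplification: the [ dn ] section only sets CN.
write_csr_conf() {
    local master_ip="$1" cluster_ip="$2"
    if [ -z "$master_ip" ] || [ -z "$cluster_ip" ]; then
        echo "usage: write_csr_conf MASTER_IP MASTER_CLUSTER_IP" >&2
        return 1
    fi
    cat <<EOF
[ req ]
default_bits = 2048
prompt = no
default_md = sha256
req_extensions = req_ext
distinguished_name = dn

[ dn ]
CN = $master_ip

[ req_ext ]
subjectAltName = @alt_names

[ alt_names ]
DNS.1 = kubernetes
DNS.2 = kubernetes.default
DNS.3 = kubernetes.default.svc
DNS.4 = kubernetes.default.svc.cluster
DNS.5 = kubernetes.default.svc.cluster.local
IP.1 = $master_ip
IP.2 = $cluster_ip

[ v3_ext ]
authorityKeyIdentifier=keyid,issuer:always
basicConstraints=CA:FALSE
keyUsage=keyEncipherment,dataEncipherment
extendedKeyUsage=serverAuth,clientAuth
subjectAltName=@alt_names
EOF
}

# Usage:
#   write_csr_conf 192.168.116.174 10.254.0.1 > csr.conf
```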

We then run

openssl req -new -key server.key -out server.csr -config csr.conf

openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
-CAcreateserial -out server.crt -days 10000 \
-extensions v3_ext -extfile csr.conf

# For verification only
openssl x509  -noout -text -in ./server.crt
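
Reading the full text dump works, but two things matter most here: that the certificate chains to our CA, and that the Subject Alternative Names include the master IP and the cluster IP. A small sketch that checks both; verify_server_cert is a name made up for this example, and the grep assumes openssl prints the SAN values on the line after the "Subject Alternative Name" header.

```shell
# verify_server_cert: check that a certificate chains to the given CA,
# then print its Subject Alternative Name block.
# Arguments: CA certificate file, server certificate file.
verify_server_cert() {
    local ca="$1" cert="$2"
    openssl verify -CAfile "$ca" "$cert" || return 1
    openssl x509 -noout -text -in "$cert" | grep -A1 'Subject Alternative Name'
}

# Usage:
#   verify_server_cert ca.crt server.crt
```

If an IP is missing from the SAN list, clients connecting to that address will fail with TLS errors later on, so it is worth catching here.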

Placing Files

I created a /secrets directory and placed the files in it as follows

mkdir /secrets
chmod 700 /secrets
chown kube:kube /secrets

cp ca.crt /secrets/
cp server.crt /secrets/
cp server.key /secrets/
chmod 700 /secrets/*
chown kube:kube /secrets/*

Configure API Server

On the master, edit /etc/kubernetes/apiserver and add the following parameters

--client-ca-file=/secrets/ca.crt
--tls-cert-file=/secrets/server.crt
--tls-private-key-file=/secrets/server.key

KUBE_API_ARGS="--client-ca-file=/secrets/ca.crt --tls-cert-file=/secrets/server.crt --tls-private-key-file=/secrets/server.key"

Restart kube-apiserver. We also need to edit /etc/kubernetes/controller-manager

KUBE_CONTROLLER_MANAGER_ARGS="--root-ca-file=/secrets/ca.crt  --service-account-private-key-file=/secrets/server.key"

Trusting the CA

We need to copy the ca.crt to /etc/ssl/certs/kube-ca.pem on each node and then install the package “openssl-c_rehash” as I found here. Photon is very minimalistic so you will find you keep having to add packages for things you take for granted.

tdnf install openssl-c_rehash

c_rehash
Doing //etc/ssl/certs
link 3513523f.pem => 3513523f.0
link 76faf6c0.pem => 76faf6c0.0
link 68dd7389.pem => 68dd7389.0
link e2799e36.pem => e2799e36.0
.....
link kube-ca.pem => 8e7edafa.0

Final Words

At this point, you have a Kubernetes cluster set up with some basic security. It is not very exciting, at least in terms of seeing results, but the next article should be more meaningful as it shows how to set up flannel.

Next – Flannel Configuration

Initializing Kubernetes

Summary

In my previous article Intro To Kubernetes, we walked through installing dependencies and setting the stage for initializing Kubernetes. At this point you should have a master and one or two nodes with the required software installed.

A Little More Configuration

Master Config Prep

We have just a little more configuration to do. On kube-master we need to change the “/etc/kubernetes/apiserver” lines as follows. This allows other hosts to connect to it. If you don’t want to bind to 0.0.0.0 you could bind to the specific IP but would lose localhost binding.

# From this
KUBE_API_ADDRESS="--insecure-bind-address=127.0.0.1"

# To this
KUBE_API_ADDRESS="--address=0.0.0.0"

Create the Cluster Member Metadata

Save the following as a file, we’ll call it create_nodes.json. When standing up a cluster I like to start out with doing it on the master so I create a /root/kube and put my files in there for reference.

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-master",
         "labels":{ "name": "kube-master-label"}
     },
     "spec": {
         "externalID": "kube-master"
     }
 }

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-node1",
         "labels":{ "name": "kube-node-label"}
     },
     "spec": {
         "externalID": "kube-node1"
     }
 }

{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "kube-node2",
         "labels":{ "name": "kube-node-label"}
     },
     "spec": {
         "externalID": "kube-node2"
     }
 }
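
With more than a handful of nodes, typing these objects by hand gets tedious and error prone. A minimal sketch that prints the same structure for a given host name; node_json is a hypothetical helper, and the default label value is simply the one used for the nodes above.

```shell
# node_json: print the Node metadata object for one host name.
# Arguments: node name, optional label value (default: kube-node-label).
node_json() {
    local name="$1" label="${2:-kube-node-label}"
    cat <<EOF
{
     "apiVersion": "v1",
     "kind": "Node",
     "metadata": {
         "name": "$name",
         "labels":{ "name": "$label"}
     },
     "spec": {
         "externalID": "$name"
     }
 }
EOF
}

# Usage: build the same file as above.
#   { node_json kube-master kube-master-label
#     node_json kube-node1
#     node_json kube-node2; } > create_nodes.json
```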

We can then run kubectl to create the nodes based on that json. Keep in mind this is just creating metadata.

root@kube-master [ ~/kube ]# kubectl create -f /root/kube/create_nodes.json
node/kube-master created
node/kube-node1 created
node/kube-node2 created

# We also want to "taint" the master so no app workloads get scheduled.

kubectl taint nodes kube-master key=value:NoSchedule

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS     ROLES    AGE   VERSION
kube-master   NotReady   <none>   88s   
kube-node1    NotReady   <none>   88s   
kube-node2    NotReady   <none>   88s   

You can see they’re “NotReady” because the services have not been started. This is expected at this point.

All Machine Config Prep

This will be run on all machines, master and node. We need to edit “/etc/kubernetes/kubelet”

KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_HOSTNAME=""

Also edit /etc/kubernetes/kubeconfig

server: http://127.0.0.1:8080

# Should be

server: http://kube-master:8080

In /etc/kubernetes/config

KUBE_MASTER="--master=http://kube-master:8080"

Starting Services

Master

The VMware Photon Kubernetes guide we have been going by has the following snippet, which I want to give credit to. Please run this on the master.

for SERVICES in etcd kube-apiserver kube-controller-manager kube-scheduler kube-proxy kubelet docker; do
     systemctl restart $SERVICES
     systemctl enable $SERVICES
     systemctl status $SERVICES
 done

You can then run “netstat -an | grep 8080” to confirm it is listening, in particular on 0.0.0.0 or the expected bind address.

Nodes

On the nodes we are only starting kube-proxy, kubelet and docker

for SERVICES in kube-proxy kubelet docker; do 
     systemctl restart $SERVICES
     systemctl enable $SERVICES
     systemctl status $SERVICES 
 done

Health Check

At this point we’ll run kubectl get nodes and see the status

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS     ROLES    AGE     VERSION
127.0.0.1     Ready      <none>   23s     v1.14.6
kube-master   NotReady   <none>   3m13s   
kube-node1    NotReady   <none>   3m13s   
kube-node2    NotReady   <none>   3m13s   

Oops, we didn’t add 127.0.0.1 – I forgot to clear the hostname override in /etc/kubernetes/kubelet. Fixed that, restarted kubelet and then “kubectl delete nodes 127.0.0.1”

It does take a while for these to start showing up. The provisioning and orchestration processes are not fast, but you should slowly see the version show up and then the status change to Ready, and here we are.

root@kube-master [ ~/kube ]# kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
kube-master   Ready    <none>   9m42s   v1.14.6
kube-node1    Ready    <none>   9m42s   v1.14.6
kube-node2    Ready    <none>   9m42s   v1.14.6
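
Instead of re-running the command by hand while you wait, the output can be checked in a loop. A sketch only; nodes_all_ready is a made-up helper that treats anything other than an exact "Ready" in the STATUS column as not ready, and the 10 second interval is arbitrary.

```shell
# nodes_all_ready: read `kubectl get nodes` output on stdin.
# Succeeds only if at least one node is listed and every STATUS is exactly "Ready".
nodes_all_ready() {
    awk 'NR > 1 { n++; if ($2 != "Ready") bad++ }
         END { exit (n > 0 && bad == 0) ? 0 : 1 }'
}

# Usage: poll every 10 seconds until the cluster settles.
#   until kubectl get nodes | nodes_all_ready; do sleep 10; done
```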

Final Words

At this point we could start some pods if we wanted, but there are a few other things that should be configured for a proper bare metal (or virtual) install. Many pods now depend on auto discovery, which uses TLS. Service accounts need it as well, since they rely on secrets.

For the networking we will go over flannel which will provide our networking overlay using VXLAN. This is needed so that pods running on each node have a unique and routable address space that each node can see. Right now each node has a docker interface with the same address and pods on different nodes cannot communicate with each other.

Flannel uses the TLS-based auto discovery to reach the ClusterIP. Rather than hacking around it, it is best to simply enable SSL/TLS certificates, which is also a security best practice.

root@kube-master [ ~/kube ]# kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   49m
root@kube-master [ ~/kube ]# kubectl describe services/kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.254.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.116.174:6443
Session Affinity:  None
Events:            <none>

Next – SSL Configuration