Demystifying Linux Auditd

Summary

My first real exposure to auditd came while deploying Rapid7 InsightIDR. It’s been a great tool, but I did not understand why it requires auditd to be disabled. Isn’t it more secure to have an audit daemon running? After a server was rebooted to completely disable auditd, it ran into a ton of performance issues, and that led me down the path of better understanding auditd.

What is Auditd?

At a high level, it is a piece of software that lets you view and audit actions performed on the operating system. These actions can include directory listings, program execution and audit policy changes.

How is this Facilitated?

There are two pieces to the puzzle. There is a kernel based kauditd which queues up audit events for consumption. Various syscalls and other events cause an audit event to trigger.

These events are consumed by a userland auditd process. This process enables the auditing (via auditctl), registers its PID with the kernel and starts consuming events pushed through a netlink socket. By default these events are simply logged to /var/log/audit/audit.log.

If auditd can’t consume these properly (perhaps auditd crashed while audit events are still being generated), the events are not simply thrown away. In that case the kernel hands them to syslog, which may push them to /var/log/messages instead. The audisp dispatcher is a related piece: it relays events from auditd to plugins, one of which can forward them to syslog.

Auditctl

You can use the auditctl command to view the status or add rules. These rules would allow you to monitor extra resources or ignore certain ones.

auditctl -s

Why Does Rapid7 Want This Disabled?

The Rapid7 ir_agent has its own consumer process that scans the auditd events. Unfortunately (unless compatibility mode is enabled), only one process can consume these and Rapid7 wants its agent to be the consumer. It does so in an interesting way though. kauditd knows events can’t always be consumed in realtime so there are provisions for a backlog buffer which Rapid7 increases. It does this so a background job can run periodically to pull these events from the queue instead of listening in realtime. I’m not sure of the benefit of this method but it seems to work.

Without auditd disabled and stopped though, no others can listen, including Rapid7.

# auditctl -s
enabled 0
failure 0
pid 0
rate_limit 0
backlog_limit 65752
lost 0
backlog 0

Here you can see it is currently disabled and no PID is registered. Importantly, you can also see the high backlog_limit (buffer). It is normally under 300 because auditd is running constantly and draining the queue.
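If you want to check these values from a script, here is a minimal sketch in Python. It assumes the one "name value" pair per line layout shown above (other auditctl versions may format the status differently), and the function name parse_audit_status is just an illustrative choice.

```python
def parse_audit_status(text):
    """Parse "name value" lines, as printed by `auditctl -s`, into a dict of ints."""
    status = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only lines that look like "<name> <integer>"; skip prompts and blanks.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            status[parts[0]] = int(parts[1])
    return status


# Sample copied from the output shown above.
sample = """\
enabled 0
failure 0
pid 0
rate_limit 0
backlog_limit 65752
lost 0
backlog 0
"""

status = parse_audit_status(sample)
daemon_registered = status["pid"] != 0  # pid 0 means no consumer is registered
print(status["enabled"], daemon_registered, status["backlog_limit"])
```

In practice you would feed it the real command output instead of the pasted sample, and watch the lost and backlog counters to see whether the consumer is keeping up.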

So Why Did Our Server Blow Up?

Well, in our case it’s a server that generates a ton of audit logs because it kicks off tens of processes per second. It was easy to dump those to an audit.log file, but a security tool actually parsing those just couldn’t keep up. We will either need to filter some of those auditd events or disable the agent.

Summary

If you ever need to use auditd, hopefully this gives you a basic insight into it.

Quick and Dirty Noise Vibration Harshness Quantification

Summary

Car owners increasingly complain about road vibrations. You may get some new tires and notice a vibration you can’t explain. Over time a vibration may start or get worse. You bring it in and they can’t find anything wrong and recommend a tire balance and rotation. The vibration is still there but its cause can’t be found. On the repair shop side, a customer comes in with a vibration complaint. Maybe they are being picky and it’s a minor vibration that is normal. Or perhaps you find the vibration and fix it but they still complain about it. How do you quantify the mitigation of the vibration?

This is what the science of Noise Vibration Harshness (NVH) aims to address. NVH “scopes” have been around for decades but many times they are difficult to read and understand. Newer scopes that make this much easier are out of budget for many people and repair shops. For repair shops they don’t tend to make money off troubleshooting these kinds of issues so it is hard to justify the expense.

My Exposure

In full disclosure, I came across this with my own vehicle vibration. I took it into the dealer and they did not find anything but recommended a road force balance. It seemed to help a little but not really.

I was researching scientific methods of identifying the vibration and came across NVH. My intent is to use this data to provide some guidance and extra data for the dealer when asking them to use a PicoScope. I always like going in informed and being able to provide objective, quantifiable data to any repair shop. I don’t know that this will actually help narrow it down, but understanding the science of how the more professional tools work was a neat discovery.

High Level Science

The high level science behind how a scope like this works is that it detects vibrations and uses calculations to convert them to frequencies in hertz (Hz). That is basically a quantification of how many times per second the vibration happens. Each set of components that rotates has a set frequency it will vibrate at. For example, an engine at 1200 Rotations Per Minute (RPM) is rotating at 20 Rotations Per Second. A hertz is a unit of measurement that represents one cycle per second. Something that cycles or rotates 20 times per second would be 20Hz.

It becomes a little complicated because vibrations can have an order to them. Take the above example of an engine at 1200 RPM. Its first order vibration would be 20Hz. A second order vibration would be 40Hz and a third order 60Hz, with each subsequent order adding another multiple of the first order frequency. Certain components have natural vibrations at various orders. For example, an 8 cylinder engine will have a 1st order vibration and a 4th order vibration. In a standard 4 stroke engine, each engine has an Nth order vibration where N is equal to half the number of cylinders.
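The order arithmetic can be sketched in a few lines of Python. The function names here are illustrative, not from any NVH tool, and the cylinder rule assumes a standard 4 stroke engine where each cylinder fires once every two crankshaft rotations.

```python
def order_frequency_hz(rpm, order=1):
    """Frequency in Hz of the given vibration order for a part spinning at rpm."""
    return rpm / 60.0 * order


def firing_order(cylinders):
    """Dominant order of a 4 stroke engine: half the cylinder count."""
    return cylinders / 2


rpm = 1200
for order in (1, 2, 3):
    print(order, order_frequency_hz(rpm, order))

# An 8 cylinder engine has a 4th order vibration on top of its 1st order.
print(firing_order(8), order_frequency_hz(rpm, firing_order(8)))
```

Each order is simply the first order frequency multiplied by the order number.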

Everything in a vehicle vibrates, so how do we know what to look for? The amplitude of each vibration is the key. Many software packages measure this in meters per second squared or in milli-g (mg), thousandths of the acceleration due to gravity. Standard gravity at the Earth’s surface is about 9.81 meters per second squared, so something accelerating at 1G is increasing speed at a rate of 9.81 meters per second every second. In the automotive world, typical vibrations are so small that we use milli-g, or mg.
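If one app reports in mg and another in meters per second squared, converting between them is a one-liner. This is a small sketch with made-up function names. It uses the standard gravity value of 9.80665, which the text above rounds to 9.81, and the 50 mg input is only a sample figure.

```python
STANDARD_GRAVITY = 9.80665  # meters per second squared in 1 g


def mg_to_ms2(mg):
    """Convert an amplitude in milli-g to meters per second squared."""
    return mg / 1000.0 * STANDARD_GRAVITY


def ms2_to_mg(ms2):
    """Convert an amplitude in meters per second squared to milli-g."""
    return ms2 / STANDARD_GRAVITY * 1000.0


# Sample value only: a 50 mg vibration expressed in meters per second squared.
print(round(mg_to_ms2(50), 3))
```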

Understanding the Science For Automotive

The key to this is to find order vibrations that relate to major components. A vibration that relates to the engine does not mean the engine is to blame. It just means that something that spins at the same rate as the engine is the cause. This could be an engine fan, crankshaft, camshaft or other moving part at the same speed or one of the vibration orders.

The same applies to tires. Just because you found a vibration that correlates to tires doesn’t mean it is the tires. It could be a wheel bearing or other part that moves at the same speed as the tire.

The third major component that is measured is the drive shaft or prop shaft (in rear wheel drive vehicles). It rotates at a fixed rate relative to the tires: tire speed multiplied by your rear end ratio. The rear end ratio is usually not a whole number like 4 to 1 or 4:1. It is usually something like 3.73:1, 4.11:1 or 3.23:1. This allows us to differentiate it from tire vibration orders fairly easily.

Dealer Tools

To combat this, the Dealers have tools at their disposal. In particular, in recent times they have a PicoScope with NVH modules – https://www.picoauto.com/products/noise-vibration-and-balancing/nvh-overview. There is a specific version for GM. It has a price point into the thousands of dollars depending on the model of unit and how many NVH modules. More NVH modules gives you different reference points for the concern and helps isolate it to the area of the vehicle.

One nice thing about PicoScope though is the software itself is free. If you can get a dealer or someone else to run the diagnostics and send you the file, you can open it up and view/analyze it.

Lower End tools

There are other options. Since smart phones have the necessary equipment to capture many of the metrics necessary, one in particular called “NVH App” by VibrateSoftware caught my eye. These types of software use your phone’s accelerometer to detect vibrations and its GPS to detect speed. It is unfortunately out of most people’s budget at $399.00 USD. That does put it closer to budget for smaller repair shops or hobbyists that deal with these kinds of issues frequently. Personally, I think it would do better if they offered a 3-4 day subscription for $50 for individuals and kept $399 as a yearly repair shop price. Even AC Delco’s TIS2WEB offers 3 day pricing on their software. In any case it is very promising.

Even Cheaper!

If you really want to save a buck though, there are quite a few vibration apps not specifically geared towards this use case that you can use. I went through a few of them and came across myFrequency by APPtodate. It was fairly economical at around $9. The main feature you need is the ability to detect multiple frequencies at the same time. I won’t claim to understand the math behind it but I believe it has to do with determining velocities of the vibrations to distinguish them from each other when you’re using one accelerometer.

In this particular example I was driving my truck at 50mph and maintaining 1200rpm. The tires are P275/55R20 and the rear end ratio is 3.23:1. We’ll get into what that all means later on. From here you can see the top frequency by amplitude is 26Hz. Drilling into the app, it is at 53.52mg, which is substantial.

Vibration Analysis in X axis

Here you can see the full frequency spectrum

Frequency Spectrum Analysis

This does however require you to do quite a few calculations and use a constant RPM and speed for the duration of the test. These tests are usually only 10-20 seconds so it is possible.

Deep Dive Into Math

As discussed, engine Hz is the easiest to calculate. Simply divide RPM by 60 to get RPS or Hz. In the above case 1200RPM / 60 = 20Hz.

At 50mph, the tire RPM calculation is a little more lengthy. My tires are P275/55R20, which using a tire calculator like https://tiresize.com/calculator/ comes out to a 31.9 inch diameter. 31.9 * 3.1416 (the constant pi) gives us a 100.2 inch circumference. We already knew that because the calculator provides it. Divide that by 12 and we get an 8.35 foot circumference. Every rotation of the tire travels 8.35 feet.

Every mile has 5280 feet in it. 50 miles per hour multiplied by 5280 gives us feet per hour. Divide that by 60 to get 4400 feet per minute. Divide that by 8.35 feet to come to 526.95 RPM. We can then divide that by 60 to get 8.78 as our first order tire vibration.

The rear end ratio of 3.23 means that the prop shaft turns 3.23 times per one turn of the tire, so every tire RPM/Hz equals 3.23 turns of the prop shaft. Multiplying 8.78 by 3.23 comes out to 28.36 Hz.

So now we have our first order vibrations of Engine 20Hz, Tire 8.78Hz and 28.36Hz of the prop shaft. We’re also looking for direct multiples of those as well.
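The same arithmetic can be written as a short Python sketch, which is roughly what a spreadsheet for this would do. The function names are my own for illustration, and the inputs are the ones from this test run (50mph, 31.9 inch tire diameter, 3.23 rear end ratio, 1200 RPM).

```python
import math

FEET_PER_MILE = 5280


def engine_hz(rpm):
    """First order engine frequency: rotations per second."""
    return rpm / 60


def tire_hz(speed_mph, tire_diameter_in):
    """First order tire frequency from road speed and tire diameter."""
    circumference_ft = tire_diameter_in * math.pi / 12
    feet_per_minute = speed_mph * FEET_PER_MILE / 60
    tire_rpm = feet_per_minute / circumference_ft
    return tire_rpm / 60


def prop_shaft_hz(tire_frequency_hz, rear_end_ratio):
    """The prop shaft turns rear_end_ratio times per tire rotation."""
    return tire_frequency_hz * rear_end_ratio


tire = tire_hz(50, 31.9)
prop = prop_shaft_hz(tire, 3.23)
print(engine_hz(1200), round(tire, 2), round(prop, 2))
```

The results should land on the same first order figures as the hand calculation, give or take rounding of pi and the circumference.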

Reading the Graph

Pointing out some of the peaks of the graph, you can see we have a 20Hz disturbance with a high amplitude and a 26Hz one as well. 20Hz matches up to the engine and 26 is fairly close. Usually it would have to be closer but I’ve not found the proper mounting point of the vehicle yet. I’m also just starting to take these measurements.

Some things could skew the tire and prop shaft numbers, like not holding an exact speed. Also, while the tire diameter used is the factory spec, a tire carrying the weight of a vehicle will not have a rolling diameter that matches it perfectly. Tire wear can also shift the numbers a little.
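Since we are also looking for direct multiples of the first order frequencies, it can help to list which orders sit near a measured peak. Below is a rough Python sketch of that idea. The function name, the cap of 4 orders and the 1Hz tolerance are all assumptions I picked for illustration, not values from any NVH tool, and the output is a list of candidates to investigate rather than a diagnosis.

```python
def candidate_matches(peak_hz, first_orders, max_order=4, tolerance_hz=1.0):
    """Return (source, order, delta_hz) for every order within tolerance of the peak."""
    matches = []
    for source, base_hz in first_orders.items():
        for order in range(1, max_order + 1):
            delta = abs(peak_hz - base_hz * order)
            if delta <= tolerance_hz:
                matches.append((source, order, round(delta, 2)))
    # Closest candidates first.
    return sorted(matches, key=lambda match: match[2])


# First order frequencies calculated earlier for this test run.
first_orders = {"engine": 20.0, "tire": 8.78, "prop shaft": 28.36}

for peak in (20.0, 26.0):
    print(peak, candidate_matches(peak, first_orders))
```

With these assumed settings, the 20Hz peak lines up with the first engine order, while the 26Hz peak falls within tolerance of the third tire order (3 x 8.78 = 26.34Hz), so that is one more candidate worth ruling out on the next test run.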

In any case, this particular one seems to point more to engine RPM so I’ll likely take it for another run and bring up the engine RPM. This particular truck has had torque converter lockup issues and pulsates the TCC at around 1200RPM so I could be catching that pulsation on the TCC.

Final Words

If you don’t have access to a shop that has a PicoScope or similar NVH tool and don’t have the budget for the NVH app, you may have what you need to perform these calculations on your own. In my case I created a spreadsheet to calculate these frequencies based on engine RPM, speed and a few vehicle variables. For my test runs I just need to plug in the variables and see where the graphs line up.

Phone mounting is a huge variable in this. The more professional tools have heavy duty magnets that you adhere to the seat rail which is an excellent position to detect vibration. You can move them to various seats to find the source of the vibration as the amplitude is limited.

With a phone it is more difficult to find mounting points in various spots in the vehicle, but hopefully this gives you an economical method to track down vibrations.

GM 8L90 Transmission – P0711

Summary

This is another automotive post on my journey through P0711 on my GM 8L90 transmission for my 2015 Yukon. It is on the K2XX platform. At 5 years old it is starting to get some mileage and wear as seen in GMT K2XX Magnetic Ride Control Shock Inspection.

Background

Monday morning I was out for a drive. With COVID all around us, I find I don’t get out of the house often. My truck still needs some miles on it or it would just sit for a while. All of a sudden I look down and notice the check engine light. I think this is the first time I have actually seen one on my truck. Despite being 5 years old, it only has 38k miles on it.

I’m fairly technical and like to do my own diagnostics when possible. I do it so that I can speak intelligently to the repair shop (the dealer in this case). My truck was in the shop a few days on and off last week for some tire issues so I also wanted to avoid an unnecessary visit.

Diagnostic

Recently I acquired a Foxwell NT510 Elite. It’s a pretty useful bidirectional scan tool. Bidirectional means it can not only read codes but also run active tests and minor calibration resets. It seems to have most of the features of the more expensive ones but it is locked/licensed by vehicle brand and usually comes with one brand free, in my case GM. You can purchase other brands and add them to the unit. On this model it’s a lifetime purchase, which is nice.

In my case, I checked the codes starting with the ECM and had a P0700 – Transmission Control Module Requested MIL Illumination. This simply told me to check the Transmission Control Module which showed me P0711 – Transmission Fluid Temperature Sensor Performance. Here is a good link someone on www.gm-trucks.com shared with me – https://www.dtcdecode.com/GMC/P0711

The TCM wants to know transmission temp so it can make informed decisions on shift pressures. It will also do things like not lock up the torque converter clutch if it is too cold. I imagine the shift adapts require certain temperature ranges too, to learn properly.

Doing a bit of reading, it seems many people have had Transmission Fluid Temperature (TFT) sensors simply fail. This was more so on the 2016 models, which switched to a 1 piece harness that encompassed the sensor.

On my dash, if in tow mode, I can tell TFT and it was showing a value. Using my scantool, it showed -40F originally but that ended up being a firmware bug and Foxwell sent me an updated code which fixed it. Cold ambient temp from letting it sit over night seemed in line with TFT. Monitoring it further though, the transmission never warms up to the recommended 195-200F. It also had a few cases where the temp dipped to 100 on the dash (lowest number).

Transmission Thermostat

On the 6L80 and 8L90 series, they introduced a thermostat to help increase the temperature to around 200 on daily driving. It is supposed to support better shifts. Others have said it also helps boil off the moisture which has been a huge problem for these transmissions. There is a small subset of people that feel this is too warm and they actually “delete” the thermostat to make it run cool constantly. They are either in much hotter climates or don’t believe the engineers at GM did it right. There is an old line of thinking that 200F will cook a transmission. They could be right but the warning temp on my dash indicates 300 and these newer fluids are supposed to easily get into the 240-250’s. Nobody on the forums with a properly working transmission seems to hit anywhere above 220 though and that’s under extreme loads. Normally the thermostat opens at 192 and any transmission cooler helps keep it down from much above that.

Freeze Frame

In any case, after clearing the codes one day, the next day only P0711 returned. It requires a second occurrence before the check engine light shows up. I realized my Foxwell supports reading Freeze Frame. It’s kind of like a black box but for less severe circumstances. When certain codes are set, it captures the parameters that surrounded the event. In my case I lucked out and it captured a 48F TFT, which is definitely out of line.

P0711 Freeze Frame - Transmission Fluid Temperature and Malfunction Counter
P0711 Freeze Frame

Graphing

I was even able to graph the temperature over the drive. It does seem like there is a bit of fluctuation but that could be because it’s not allowed to warm up with a possibly malfunctioning thermostat. Or the harness could be loose. Or the sensor could be inaccurate!

Replacing Transmission Fluid Temperature Sensor

There are quite a few videos on this. Here is a great one if you have the one piece harness. At this time I think they have switched back to the two piece, which is less involved.

Conclusion

Since I have a GM Protection Plan (extended warranty), I’m not terribly interested in possibly voiding it. I also don’t have the expertise or comfort level to start toying with the transmission itself as the temperature sensor is in the pan. Appointment to the dealer made and I may update this Monday with the results. Regardless, anyone that gets a P0711 on a K2 platform on the 8L90 or even 6L80 transmission, this may help you diagnose it if you’re out of warranty. It could be an economical sensor/harness or even an easier to replace thermostat.

GMT K2XX Magnetic Ride Control Shock Inspection

Yes, this blog does tend to touch on other areas I am passionate about other than IT related topics. I’m an engineer at heart so I love investigating and understanding things, particularly those that tend to lead me down a rabbit hole.

Summary

About 6 months ago I started what would be my journey of understanding how to inspect and diagnose Magnetic Ride Control shocks. I took my 2015 GMC Yukon Denali to the dealer for what I thought was going to be an oil change and a recall that simply required a code update. I was kindly informed that my front strut was leaking and needed replacement. I was hopeful that my extended warranty might cover it, but was also kindly informed that was not the case. This was a maintenance item and parts & labor were going to be roughly $900.

Fast forward a few months and my original extended warranty was about to expire so I purchased a new one that came with a 30 day / 1,000 mile wait period. With COVID-19, I was doing 80 miles a month. I had to step it up and started doing 50-80 miles a day to make up for it. Towards the last 200 miles I realized the ride was very rough. I was not sure if the roads were terrible or not driving so much had increased my expectation of a smooth ride. Remembering that I had a leak on my front strut, I decided to take a look at the remaining strut and shocks.

Magnetic Ride Control

For those that do not know, Magnetic Ride Control is a premium suspension option in select GM vehicles. It has been around since circa 2003 and is now on its 3rd iteration. There are many sources that can do a better job of explaining it than me, but essentially, instead of regular fluid there is a magnetic fluid and an electromagnet that is used to make adjustments in realtime based on many driving conditions such as gas pedal position, steering position, incline of vehicle and ride height.

The rear shocks in the vehicle also have air springs that are inflated or deflated to provide auto leveling on top of this. This allows the vehicle to maintain its level independent of the load (to a degree).

For my vehicle platform (K2XX) the suspension RPO (Regular Production Option) is Z95. Here’s a good link on RPOs and what they’re used for – https://www.newgmparts.com/decoding-general-motors-rpo-codes

Leaky Shocks?

This is my first time having to deal with these particular shocks so I went to a few sources of information such as r/MechanicAdvice on reddit. Someone responded but they indicated that these shocks can’t leak as the air spring would contain the fluid. I would later find out this was partially correct.

I decided to wipe it down and see if more fluid showed up, and it did. This caused me to reach out to a few personal contacts, and their opinion was that they were indeed leaking.

This one had a ton of build up but was only slightly leaking. Perhaps it had almost completely leaked out?
This one was very wet all the way around and you could tell with all the build up
Leaky Leaky

Bite the Bullet and Replace

After wiping down again, I was able to find the part number AC DELCO 23290661 which had been replaced by AC DELCO 84176675. The new part number hopefully indicated I might get more than 5 years / 37k miles on these! I shopped around and found Rock Auto had the best price for OE replacements although Arnott seemed to have a nice rebuild for $100 per shock cheaper. Maybe next time I’ll go that route.

They arrived and they were beautiful! Clear as day what they should look like.

So Pretty!

I’m a hobbyist and have done shocks before but just did not want to mess with this, so I hired a local mobile mechanic to throw them in. Check out their Instagram video of them on the job! I’m not sure why it says it’s violent, but I promise it’s not!

After removing the old ones, they sure looked like they were in worse shape than I expected.

Anatomy

While learning about this I was curious of the anatomy of this. People had described it through my investigations but I’m a visual learner. I did come across this Youtube video on a Mercedes Air Shock which was similar but I wanted to see mine and how they failed.

Z95 Rear Shock Tear Down

To tear it down I used some rubber gloves as the magnetic fluid can be fairly abrasive, a very sharp box cutter and some pliers/cutters.

Start cutting!

Peekaboo!

At the end you can see a fully opened MRC shock. The lines point to the retaining metallic bands that hold the bladder in place. With it removed, it’s essentially a regular shock in appearance.

So technically the bladder should hold the fluid but apparently the abrasiveness of it causes it to start leaking out over time by eating away at the rubber.

Looking towards the bottom of the shock
Looking towards the top where the air chamber is

Conclusion

Now that I am familiar with these shocks and how they fail, it does appear the remaining strut in the front is leaking and time for a replacement. Luckily I came across this very informative video on how to do it. I have a buddy with a spring compressor and better yet knows how to use it so wish me luck!

Extending Old Hardware Life With ChromeOS

Summary

I came across an interesting idea of extending the life of old hardware by using ChromeOS. What is ChromeOS? We’ll get there. The problem is that many households have an old computer lying around collecting dust. It lacks the memory, hard drive space or compute power to be usable though. At the same time, with everyone social distancing, there’s a need for an extra device. You may have children trying to share devices to get on the internet or need an extra one for a family member that is in your house. This article may help you with that.

What is ChromeOS?

On wikipedia, ChromeOS is described as the following.

Chrome OS is a Linux kernel-based operating system designed by Google. It is derived from the free software Chromium OS and uses the Google Chrome web browser as its principal user interface.

https://en.wikipedia.org/wiki/Chrome_OS

For those who are not technical, it is basically an Operating System that is minimized to just the basic needs to get on the internet. Its assumption is that most use cases for computers these days are to access the internet. When it boots up and you get logged in, its primary user interface is Google Chrome.

The problem

Google does not provide installation files to install ChromeOS onto your legacy hardware. ChromeOS is designed to come pre-installed on its Google ChromeBooks.

The Solution!

Chromium OS, the free software that ChromeOS is derived from, is open source, which means other companies can take its code and build upon it. A company named neverware has done just that. It does have paid offerings for education and business but there is also a free download for home users.

What You Need

  • 8GB or larger USB stick
  • Operational computer you wish to install ChromeOS onto that has 2GB of RAM and a 16GB hard drive
  • Ideally a newer and operational computer to generate a bootable USB stick.
  • A Google Account (you can create one as a part of this process)

Installing CloudReady

If you have a Windows machine, the installation is fairly straight forward as you will download the windows “USB Maker” and insert your USB drive into that computer and go through the guided wizard.

Once the bootable USB drive is created, insert it into the destination computer and boot or reboot it. You may need to go into that machine’s system BIOS to select a different boot order or ensure that it will boot off the USB drive. Sometimes during boot there is a keystroke to press such as F12 or ESC to select that. You may have to refer to your computer’s user guide on that. neverware has a list of these for the major manufacturers – https://guide.neverware.com/install-and-setup/boot-usb/

When booting off the USB drive, it will boot directly into ChromeOS, or rather neverware’s version of it called CloudReady. It is fully usable at this point, so even if the computer’s hard drive has crashed you could perform basic tasks. Ideally after booting you will go through an install process to install it locally so it no longer requires the external USB stick.

Booting

During the boot process you will see the following Welcome screen

Welcome!

Next it will require some sort of network connectivity. If you have a wired connection you are good to go. If you are on wifi, it will ask you to authenticate to your network.

Connect to network

Once connected to the network, it will ask you to login to your Google account.

Sign in to your Chromebook

Finally – here we are, a Chrome browser. The above is a one time setup process. Sometimes two times. When doing the above on the Live environment, after installing you may need to go through this once more.

Here we are!

Actually Installing

After you have used the live environment for a while and decided you want to permanently convert that computer to it, you can click on the clock and select “Install OS”.

https://guide.neverware.com/install-and-setup/home-edition/

Limitations

  • No Google App or Play Store due to licensing issues
  • This is not Microsoft Windows and your apps are limited to what’s in the Chrome Extension store

Final Words

At this point you will have neverware’s version of ChromeOS installed and have basic web browser functionality on this device. It should be more than sufficient for basic browsing since that is the typical use-case. Hopefully it has helped you make some use of a device that was just collecting dust.

ISC BIND Look Aside Related Outage

Summary

I had a fun issue today. All of a sudden BIND stopped returning results for recursive queries to external zones.

My logs were filled with lines like the following

Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (network unreachable) resolving 'com.dlv.isc.org/DS/IN': 2001:500:2c::254#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 149.20.64.4#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 156.154.100.23#53

Troubleshooting

Naturally I tried bouncing named, without luck. I then thought there was an issue with the root zones and configured forwarders, without luck. I also disabled DNSSEC via “dnssec-enable no”, without luck.

This seemed fairly strange. Ultimately, since it was DNSSEC related, I opted to disable both DNSSEC and its validation with the following as a temporary workaround. It appears the validation was the issue.

dnssec-enable no;
dnssec-validation no;

After some investigation and troubleshooting it appeared to be related to ISC’s DLV and letting RRSIG expire accidentally. It failed in an unexpected manner when this happened.

What is DLV?

DLV stands for DNSSEC Lookaside Validation. DLV is a service that ISC has provided since circa 2006. It allowed DNSSEC to be enabled on zones that could not otherwise be enabled. Not all Top Level Domains (TLD) implemented DNSSEC until the past few years. This was a workaround to allow DNSSEC until then.

In 2017 it was finally decommissioned with DNSSEC being fully available to all TLDs. The A record was left in place and many resolvers still attempt to connect but it does not provide any data.

What is RRSIG?

If you want a full view of DNSSEC and how it works, CloudFlare has a great article for that here – https://www.cloudflare.com/dns/dnssec/how-dnssec-works/ . In short though, RRSIG records contain cryptographic details, particularly start and end dates for the validity of that data. This is much like an SSL Certificate that has a valid period.

RRSIG records are designed to require frequent updates to ensure security, much like SSL Certificates need to be renewed. This helps prevent a replay attack where an older compromised key is reused.

RRSIG Value

Running the following I could see it expired

# dig +dnssec dlv.isc.org

dlv.isc.org.		3599	IN	RRSIG	DNSKEY 5 3 3600 20200325160456 20200224153150 19297 dlv.isc.org. TyUbbNgG/Oru7TQFHbDC9E208hB8Szheu634Q03nawQFz4dosOFg+ZB5 z8Svh8fw/g35a/ZW5AP1jbSKh19u4c7Ujre3iygS0Tjycmi0mYG6dS7I CcWLOxZpOKf8uw9mzgbIR/VDEFmKj0OJKdkxAqfaWxXLqBBWgFqIucC6 9Tb98clinCPW34xgk6Fzi+OKAFmiGH6/e8wk/h5RMWxipx5KAk2NsWsw QMyEDaA7eLzZTbBenftVR86g6QO4bR+LOKzxGBFQ2XW0ArQKDiuoBqEw 8cmRcGKzVJ761d7EK+LDvnktRNxRMJ9y5LPgxlO2Xm3Un8oExjVbLKi7 OigQnA==

The value that mattered was 20200325160456, the signature expiration, which translates to 3/25/2020 16:04:56 UTC, about when the issue started. Further down in the “References” section, the ISC-USERS list confirmed this was by mistake. I suppose it was a good “scream” test to remove lookaside. Newer BIND versions do not even support this anymore.
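To decode these timestamps without squinting, here is a small Python sketch. The two values are copied from the dig output above. In the RRSIG presentation format the expiration comes first and the inception second, both as YYYYMMDDHHmmSS in UTC. The function names are my own for illustration.

```python
from datetime import datetime, timezone


def parse_rrsig_time(value):
    """Parse an RRSIG YYYYMMDDHHmmSS timestamp as a UTC datetime."""
    return datetime.strptime(value, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def is_valid_at(inception, expiration, when):
    """True if `when` falls inside the signature validity window."""
    return inception <= when <= expiration


expiration = parse_rrsig_time("20200325160456")
inception = parse_rrsig_time("20200224153150")

print("valid from", inception.isoformat(), "to", expiration.isoformat())
print("valid now?", is_valid_at(inception, expiration, datetime.now(timezone.utc)))
```

Running a check like this against the signatures of zones you depend on is a cheap way to spot one that is about to lapse.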

What Happened?

Many older BIND servers deployed before 2017 were configured with the following.

dnssec-lookaside auto;

Auto would try to query dlv.isc.org first and then query root name servers. The expected behavior was that it simply would not return any data and then the root zones would be queried.

Unfortunately, with an expired RRSIG it failed in a way that made BIND treat the query response as invalid rather than as the expected empty answer. For all BIND knew, it was preventing a replay attack.
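If you have a fleet of older BIND servers, one way to spot which ones still carry the directive is to scan their configuration text. Below is a rough Python sketch of that idea. The function name and the sample configuration are hypothetical, and it only looks at active (uncommented) lines in text you pass in. It does not parse the full named.conf grammar or follow include statements.

```python
import re

# Match an active "dnssec-lookaside ...;" statement at the start of a line.
# Lines commented out with // or # will not match.
LOOKASIDE = re.compile(r"^[ \t]*dnssec-lookaside\b[^;]*;", re.MULTILINE)


def find_lookaside(config_text):
    """Return every active dnssec-lookaside statement found in config_text."""
    return [m.group(0).strip() for m in LOOKASIDE.finditer(config_text)]


# Hypothetical named.conf fragment for illustration only.
sample = """options {
    directory "/var/named";
    dnssec-validation yes;
    dnssec-lookaside auto;
    // dnssec-lookaside auto;
};
"""

print(find_lookaside(sample))  # ['dnssec-lookaside auto;']
```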

References

I had originally posted on Reddit and was also pointed to ISC-USERS.

Using Certificate Based Authentication

Summary

Recently a client needed to put a web application on the internet for end users to access. They wanted to lock it down so that not everyone on the internet could access it. Whitelisting IP addresses was not an option because the users were remote with dynamic IPs, and the overhead of maintaining the whitelist would be problematic.

The use case was a password recovery tool that their remote users could use to reset and recover passwords. Simple authentication would not suffice. For starters if the users’ passwords expired they wouldn’t be able to easily log into the site. Along with that it would be a high profile target for brute forcing.

Why Not IP Whitelisting?

IP whitelisting used to be, and for some organizations still is, the de-facto method of filtering traffic. It can be very problematic though. Users today are on the go, working remotely or using their mobile device on cellular data as well as home internet. Other times it involves sales staff at client sites. Keeping up with these IP whitelists can be a chore. Updating the whitelist can be time sensitive to avoid halting productivity. When it is not maintained, there is a chance someone unexpected could gain access simply by having an IP that was previously whitelisted.

A workaround for this is VPN, but that carries overhead in user training and support. It can be clunky for users who are not used to using VPN.

Why Certificates

Many larger organizations already have internal Certificate Authorities in place. For Microsoft Active Directory deployments, when a CA has been installed, end users are likely auto enrolling in user certificates. Domain joined workstations already have these and trust the internal Root CA.

Certificates also have a built in expiration. In an auto enrollment environment, this expiration could be lowered substantially to below 1 year.

TLS Handshake

One of the nice features of TLS is that it includes a mechanism for this. Below is an example of a TLS handshake where the server requests a certificate and the client provides it.

TLS Handshake - Certificate Authentication

In Frame 19, the client makes the TLS request with a Client Hello. In Frame 23, the server responds with a Server Hello. This is where they set parameters and negotiate things like TLS versions and encryption algorithms.

Frame 26 is part of the Server Hello but it was large and split up. Boxed in red is the “Certificate Request” where the server is requesting a certificate to authenticate.

Frame 33 is where the client actually provides it.

From here you can see this happens before the application level (HTTP) protocol starts communicating in frame 43. What this means is that before the user reaches the web application for authentication, the device requiring TLS Certificate Authentication is filtering the requests. Many times this is a reverse proxy or load balancer that is not vulnerable to the same exploits as the web servers.
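To make the server side of that handshake a little more concrete, here is a minimal sketch using Python’s standard ssl module. It is not how any particular reverse proxy or load balancer is configured. The function name and the file path parameters are placeholders of my own, and the paths are optional only so the sketch runs without real certificates.

```python
import ssl


def build_mtls_server_context(ca_file=None, cert_file=None, key_file=None):
    """Build a server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the server ask for a client certificate during the
    # handshake and fail the handshake if none is presented or it does not
    # validate against the trusted CA(s).
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # The internal Root CA that issued the user certificates (assumed path).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file:
        # The server's own certificate and private key (assumed paths).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = build_mtls_server_context()
print(ctx.verify_mode)
```

Because the check happens inside the handshake, a client without a valid certificate never gets as far as sending an HTTP request.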

Browsers

When this is set up properly and the client has a certificate, the browser will prompt the user for a certificate to use, as pictured below.

Browser Certificate Authentication Prompt

Other Applications

A really neat application for this is when you have a legacy plain text protocol in play but you want to open it up over the internet and secure it similarly. Perhaps you have a legacy application that uses raw text and is not SSL/TLS capable. You can still put this on the internet through a reverse proxy like F5 LTM or stunnel.

Traditionally this type of traffic would be protected via an encrypted IPsec tunnel or a dedicated circuit such as MPLS. That does require specific hardware and/or monthly circuit costs to accommodate.

stunnel is extremely useful in this scenario. You can install it on the local machine that has the legacy application, have the application connect to localhost on a random port, and let stunnel proxy the traffic out over TLS using certificate based authentication.

Here is a graphical example of what that may look like with an stunnel server broken out. stunnel could be installed on the end user’s workstation though.

Legacy App Secured with TLS 1.2 or higher & Certificate Based Authentication

stunnel could be put on the local end user workstation to minimize that unencrypted leg. Typically on the server side the reverse proxy has a leg directly on the same VLAN/subnet as the application, which minimizes exposure there. Either way, this helps secure the application traffic over the untrusted internet.

Final Words

In this article we learned a little on Certificate Based Authentication. We also learned how it may help your organization better secure your applications and possibly avoid more costly solutions.

Internet Routing and BGP Looking Glasses

Summary

From time to time I get requests from colleagues, “Can you ping this address?”. Many times what is going on is they are bringing up a new internet link and want to check routing. Sometimes they are adding a public endpoint and want to make sure it’s accessible. They are asking me because within their network it works but they need to make sure it is accessible or routing properly over the internet.

BGP Looking Glasses are a great tool for this if you would like to be self sufficient. You can also get a wider view than just a few colleagues. My favorite one is HE.net’s https://lg.he.net

What is BGP?

BGP stands for Border Gateway Protocol. It is the standard exterior gateway protocol for internet routing. While an internal network may use something like OSPF for its interior, BGP is better suited for exterior routing.

One of the few keys to understanding BGP at a high level is to understand it is a path vector routing protocol, a close relative of distance vector protocols. These protocols are typically better suited for WAN routing as they are a bit more lightweight. The downside is that each router’s perception of the internet is key. There is no unified single view of internet routing. Each router has its own perception of the internet based on the BGP routes it receives.

This is why Looking Glasses are so important. You can see the perception of various points on the internet. This can help you determine whether internet traffic destined to your infrastructure is taking the expected and optimal path.

HE.net’s Looking Glass

Here is a small snippet of HE’s LG. They have various routers all over the world that would not fit into this screenshot. You can also see the various functions you can perform on the right hand side.

HE.net Looking Glass

Ping and traceroute are fairly self explanatory. For ping it just returns the results of an ICMP ping and traceroute shows a list of routers in the path of getting to the destination.

HE.net Looking Glass - Traceroute

The real value here is the “BGP Route” option.

HE.net BGP Looking Glass Details

Here we can see all of the BGP peers from which this particular router has learned the route to 1.1.1.0/24, the AS path each one takes and which one it selects as the best path.

Autonomous Systems

If you are new to BGP and dynamic routing protocols you may be wondering what an AS (Autonomous System) is. In the BGP world, it is basically a grouping of routers under common administration that announce a similar set of subnets, or prefixes as they are called in BGP. BGP groups systems together by AS. As a route is passed from one AS to the next, each AS prepends its own AS number to the AS path before passing it along.

The above is a bad example because it shows a single direct AS path as HE appears to be directly peered with CloudFlare (AS 13335). CloudFlare is very well peered on the internet. Below is a better example. It at least shows it passing through AS 1299 (Telia) to AS 174 (Cogent)

It seems HE.net is fairly well peered but here is another router output that shows some decent AS paths and the differences. With AS 174 being Cogent, AS 209 being CenturyLink and AS 3356 being Level3, it chooses the shortest AS path (174 7922). Keep in mind the traceroute through CenturyLink could possibly have fewer actual router hops. A shorter AS path does not necessarily mean less latency or fewer traceroute hops.

#show ip bgp 73.0.0.0/8 
BGP routing table entry for 73.0.0.0/8, version 767462940
Paths: (2 available, best #1, table default)
  Advertised to update-groups:
     18        
  Refresh Epoch 1
  174 7922, (received & used)
    X.X.X.X from X.X.X.X (X.X.X.X)
      Origin IGP, metric 13031, localpref 100, valid, external, best
      Community: 174:21000 174:22013
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 3
  209 3356 7922, (received & used)
    X.X.X.X from X.X.X.X (X.X.X.X)
      Origin IGP, metric 7800026, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
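To illustrate why the first path wins in the output above, here is a small Python sketch of the AS path comparison. It is heavily simplified and only models two steps, local preference and then AS path length. Real BGP best path selection has more tie breakers (weight, origin, metric, external versus internal and so on). The class and function names are my own, and AS 64500 is a documentation AS number used purely as an example.

```python
from dataclasses import dataclass


@dataclass
class BgpPath:
    as_path: tuple          # AS numbers, nearest AS first
    local_pref: int = 100   # both paths in the output above show localpref 100


def advertise(path, own_as):
    """Return the path as the next AS would receive it, with own_as prepended."""
    return BgpPath((own_as,) + path.as_path, path.local_pref)


def best_path(paths):
    """Pick the highest local preference, then the shortest AS path."""
    return min(paths, key=lambda p: (-p.local_pref, len(p.as_path)))


# The two AS paths from the router output above.
paths = [BgpPath((174, 7922)), BgpPath((209, 3356, 7922))]
best = best_path(paths)
print(best.as_path)  # (174, 7922)

# If a router in AS 64500 passed this route along, its AS would be prepended.
print(advertise(best, 64500).as_path)  # (64500, 174, 7922)
```

Raising local_pref on the longer path would make it win instead, which is one reason a looking glass view can differ from what AS path length alone would suggest.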

Final Words

If you ever find yourself needing to ping or traceroute from a remote endpoint, lg.he.net has you covered. Many carriers have their own looking glass. This is useful in case you want to see how your routes are perceived from their end. If you use BGP at your edge and receive full routes, this is another avenue of seeing those prefixes. With looking glasses, you can do this from various endpoints across the internet pretty easily.

OpenVPN with Encrypted Private Key – Issue Resolved

Summary

I was working on Azure Client VPN with OpenVPN, and during testing I had removed the passphrase from the private key used for authentication. When I wanted to put it back, the connection would not work. Some quick searches did not turn up much other than common complaints about this.

Reason

With certificate based authentication on OpenVPN, the certificate (public key) and private key are put in the ovpn file. With an unencrypted private key this is not the most secure, as anyone who obtains the file can simply connect.

With a passphrase on it, there is less concern over the ovpn being disseminated and the key reused.
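Before handing out an ovpn file, it can be worth checking whether the embedded key actually has a passphrase on it. Here is a rough Python sketch that inspects the PEM labels inside an inline key block. The function name is my own, the sample key bodies are placeholders rather than real keys, and it assumes the key is embedded inline between key tags rather than referenced as a separate file.

```python
import re

# An inline private key in an ovpn file sits between <key> and </key>.
KEY_BLOCK = re.compile(r"<key>(.*?)</key>", re.DOTALL)


def inline_key_is_encrypted(ovpn_text):
    """Return True or False for an inline key, or None if no key block exists."""
    match = KEY_BLOCK.search(ovpn_text)
    if match is None:
        return None
    pem = match.group(1)
    # PKCS#8 encrypted keys use a dedicated PEM label. The traditional
    # OpenSSL format keeps the normal label and adds a Proc-Type header.
    return ("BEGIN ENCRYPTED PRIVATE KEY" in pem
            or "Proc-Type: 4,ENCRYPTED" in pem)


# Placeholder profiles for illustration; the key bodies are elided.
unprotected = "<key>\n-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n</key>\n"
protected = "<key>\n-----BEGIN ENCRYPTED PRIVATE KEY-----\n...\n-----END ENCRYPTED PRIVATE KEY-----\n</key>\n"

print(inline_key_is_encrypted(unprotected))  # False
print(inline_key_is_encrypted(protected))    # True
```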

Issue

The OpenVPN client logs indicated the following

2019-12-20 12:22:42-0600 [-] OVPN 40_89_167_211_p4967 ERR: ">FATAL:CLIENT_EXCEPTION: connect error: PEM_PASSWORD_FAIL: mbed TLS: error parsing config private key : PK - Private key password can't be empty"
2019-12-20 12:22:42-0600 [MyOMIClient,0,] FROM OMI: u">FATAL:CLIENT_EXCEPTION: connect error: PEM_PASSWORD_FAIL: mbed TLS: error parsing config private key : PK - Private key password can't be empty"
2019-12-20 12:22:42-0600 [HTTPChannel,224,] *** API CALL f=xmlrpc_Poll args=['sess_40_89_167_211_p4967_VuxJ7YkTyUVYXeKV_1', 10] kw={} ret=[{'active': True, 'timestamp': 1576866162, 'type': 'ACTIVE', 'last': None}, {'timestamp': 1576866162, 'type': 'FATAL', 'error': u"CLIENT_EXCEPTION: connect error: PEM_PASSWORD_FAIL: mbed TLS: error parsing config private key : PK - Private key password can't be empty"}]

It seemed like it was just not prompting me to enter the passphrase and was using a blank one instead.

Solution

I came across the following GitHub issue ( https://github.com/pivpn/pivpn/issues/372 ) which had quite a few tangents in it, but most recently someone indicated that OpenVPN Connect 3.1.0 worked for them. I tried that and was surprised, it actually prompts me for the passphrase!

It seems the community version of the OpenVPN GUI client supported this but OpenVPN Connect lacked the feature until recently. As of the time of this writing 3.1.0 is Beta but seems to work great!

I had tried upgrading from 2.5 to 2.7 without luck. Finally after installing 3.1.0 it worked again.

Azure Client VPN with OpenVPN

Summary

In my article Intro To Azure Active Directory Domain Services we discussed environments with minimal infrastructure. With all of the RDP exploits, it is typically best not to expose RDP over the internet. Since Bastion is not yet fully available, the next best thing aside from setting up a VPN appliance is to use the Point-to-site functionality of a Virtual Network Gateway.

Prerequisites

The first prerequisite for client VPN using a Virtual Network Gateway is to actually provision one. For OpenVPN compatibility it does require at least SKU VpnGw1 and will not work with Basic.

It will require 2 subnets, one for the inside leg of the gateway and another for the client-side pool.

The Virtual Network Gateway does want an inside subnet dedicated to the Virtual Network Gateway and not shared amongst other devices.

Authentication is handled either via RADIUS or certificates. If you are reading this article for a minimized infrastructure you probably do not have RADIUS servers.
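When planning those two subnets it helps to sanity check that the ranges do not collide with each other or with existing subnets. Below is a minimal Python sketch using the standard ipaddress module. The subnet names and address ranges are made up for illustration, and the assumption that the client pool must not overlap anything else is mine, so check it against the Microsoft documentation for your setup.

```python
import ipaddress


def overlapping_pairs(named_networks):
    """Return sorted name pairs whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in named_networks.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical address plan for illustration only.
plan = {
    "gateway_subnet": "10.0.255.0/27",   # inside leg, dedicated to the gateway
    "client_pool": "172.16.201.0/24",    # addresses handed to VPN clients
    "workload_subnet": "10.0.1.0/24",    # existing servers
}

print(overlapping_pairs(plan))  # []
```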

Provisioning

The provisioning process is fairly simple although it can take 30-60 minutes for the Virtual Network Gateway to fully provision before you can use the Point-to-site configuration. There are a few simple questions.

Virtual Network Gateway - Create

That’s really it for the initial provisioning.

Configuration

Some basic Point-to-site configurations need to be set.

Point-to-site configuration

The next part is the most difficult part of this. A root and at least one child certificate have to be provisioned. Microsoft has some good documentation on it. To do this in PowerShell, it does require Windows 10 or Server 2016 or higher.

Root Certificate

The name is arbitrary but the “Public certificate data” is the area between the “-----BEGIN CERTIFICATE-----” line and the “-----END CERTIFICATE-----” line.

The following Microsoft article describes and outlines the process much better than I can, so I will just share it here – https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-certificates-point-to-site
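If you would rather not copy that section by hand, here is a small Python sketch that pulls out everything between the two marker lines of an exported Base-64 certificate. The function name is my own and the sample body is a placeholder rather than a real certificate. It also joins the lines into one continuous string, which I am assuming is what the portal field expects.

```python
BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"
END_MARKER = "-----END CERTIFICATE-----"


def public_certificate_data(pem_text):
    """Return the base64 body between the BEGIN and END marker lines."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    begin = lines.index(BEGIN_MARKER)
    end = lines.index(END_MARKER)
    # Join the body lines so the result has no line breaks.
    return "".join(lines[begin + 1:end])


# Placeholder content for illustration; not a real certificate.
sample = """-----BEGIN CERTIFICATE-----
AAAA
BBBB
-----END CERTIFICATE-----
"""

print(public_certificate_data(sample))  # AAAABBBB
```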

Client Configuration

Again – Microsoft does a really good job on instructions for configuring the client so I will just share this link – https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-howto-openvpn-clients

Final Words

It can be a pain for those of us not familiar with certificates and command line tools like openssl. The idea is that you have a root certificate authority that then issues individual certificates per user or group of users. If that key becomes compromised you can then revoke the individual certificate or untrust the entire certificate authority. I like the idea of creating a CA chain per organization you grant access to.

In this article we walked through creating and configuring the required resources. We did rely heavily upon the Microsoft documentation, but it was fairly complete and well presented.

Make sure you distribute your ovpn files with encrypted private keys! – OpenVPN with Encrypted Private Key – Issue Resolved