ISC BIND Look Aside Related Outage

Summary

I had a fun issue today. All of a sudden BIND stopped returning results for recursive queries to external zones.

My logs were filled with lines like the following

Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (network unreachable) resolving 'com.dlv.isc.org/DS/IN': 2001:500:2c::254#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 149.20.64.4#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 156.154.100.23#53

Troubleshooting

Naturally I tried bouncing named without luck. I then thought there was an issue with the root zones and configured forwarders without luck. I had disabled dnssec via “dnssec-enable no” without luck.

This seemed fairly strange. Ultimately since it was DNSSEC related I opted to disable it via as a temp workaround. It appears the validation was the issue.

dnssec-enable no;
dnssec-validation no;

After some investigation and troubleshooting it appeared to be related to ISC’s DLV and letting RRSIG expire accidentally. It failed in an unexpected manner when this happened.

What is DLV?

DLV stands for DNSSEC Lookaside Validation. DLV is a service that ISC has provided since circa 2006. It allowed DNSSEC to be enabled on zones that could not otherwise be enabled. Not all Top Level Domains (TLD) implemented DNSSEC until the past few years. This was a workaround to allow DNSSEC until then.

In 2017 it was finally decommissioned with DNSSEC being fully available to all TLDs. The A record was left in place and many resolvers still attempt to connect but it does not provide any data.

What is RRSIG?

If you want a full view of DNSSEC and how it works, CloudFlare has a great article for that here – https://www.cloudflare.com/dns/dnssec/how-dnssec-works/ . In short though, RRSIG records contain cryptographic details, particularly start and end dates for the validity of that data. This is much like an SSL Certificate that has a valid period.

The RRSIG records are designed to be required to be updated frequently to ensure the security much like SSL Certificates need to be renewed. This helps prevent a replay attack where an older compromised key is reused.

RRSIG Value

Running the following I could see it expired

# dig +dnssec dlv.isc.org

dlv.isc.org.		3599	IN	RRSIG	DNSKEY 5 3 3600 20200325160456 20200224153150 19297 dlv.isc.org. TyUbbNgG/Oru7TQFHbDC9E208hB8Szheu634Q03nawQFz4dosOFg+ZB5 z8Svh8fw/g35a/ZW5AP1jbSKh19u4c7Ujre3iygS0Tjycmi0mYG6dS7I CcWLOxZpOKf8uw9mzgbIR/VDEFmKj0OJKdkxAqfaWxXLqBBWgFqIucC6 9Tb98clinCPW34xgk6Fzi+OKAFmiGH6/e8wk/h5RMWxipx5KAk2NsWsw QMyEDaA7eLzZTbBenftVR86g6QO4bR+LOKzxGBFQ2XW0ArQKDiuoBqEw 8cmRcGKzVJ761d7EK+LDvnktRNxRMJ9y5LPgxlO2Xm3Un8oExjVbLKi7 OigQnA==

20200325160456 was the key, that translated to 3/25/2020 16:04:56 UTC which is about when the issue started. Further down in the “References” section the ISC-USERS list confirmed this was by mistake. I suppose it was a good “scream” test to remove lookaside. Newer BIND versions do not even support this anymore.

What Happened?

On many older BIND servers deployed before 2017, they were configured with the following.

dnssec-lookaside auto;

Auto would try to query dlv.isc.org first and then query root name servers. The expected behavior was that it simply would not return any data and then the root zones would be queried.

Unfortunately with an expired RRSIG it failed in a way that made BIND think the query response was not valid and an expected failure. For all BIND knew, it was preventing a replay attack.

References

I had originally posted on Reddit and was also pointed to ISC-USERS.

Using Certificate Based Authentication

Summary

Recently a client had a need for putting a web application on the internet that end users could access. They wanted to lock it down so that not everyone on the internet could access. Whitelisting IP Address was not an option because they were remote users with dynamic IPs and the overhead of maintaining this whitelist would be problematic.

The use case was a password recovery tool that their remote users could use to reset and recover passwords. Simple authentication would not suffice. For starters if the users’ passwords expired they wouldn’t be able to easily log into the site. Along with that it would be a high profile target for brute forcing.

Why Not IP Whitelisting?

IP whitelisting used to be and still us for some organizations the de-facto method of filtering traffic. It can be very problematic though. Users today are on the go, working remotely or using their mobile device on cellular data as well as home internet. Other times it involves sales staff at client sites. Keeping up with these IP whitelists can be a chore. Updating this whitelist can be time sensitive to avoid halting productivity. When not maintained, there is a chance someone unexpected could gain access due to simply having an IP previously whitelisted.

A workaround for this is VPN but that requires a bit of support overhead in user training and support. This can be clunky for users that are not used to to using VPN.

Why Certificates

Many larger organizations already have internal Certificate Authorities in place. For Microsoft Active Directory deployments, when CA has been installed, end users are likely auto enrolling in user certificates. Domain joined workstations already have these and trust it the internal Root CA.

Certificates also have a built in expiration. In an auto enrollment environment, this expiration could be lowered substantially to below 1 year.

TLS Handshake

Once of the nice features of TLS is that it does include a mechanism for this. Below is an example of a TLS handshake where the server requests a certificate and the client provides it.

TLS Handshake - Certificate Authentication
TLS Handshake – Certificate Authentication

In Frame 19, the client makes the TLS request with a Client Hello. Frame 23 the Server response with a Server Hello. This is where they set parameters and negotiate things like TLS versions and encryption algorithms.

Frame 26 is part of the Server Hello but it was large and split up. Boxed in red is the “Certificate Request” where the server is requesting a certificate to authenticate.

Frame 33 is where the client actually provides it.

From here you can see this happens before the application level (HTTP) protocol communicates starting in frame 43. What this means is that before the user reaches the web application for authentication, the device requiring TLS Certificate Authentication is filtering the requests. Many times this is a reverse proxy or load balancer is not vulnerable to the same exploits as the web servers.

Browsers

When used properly and the client has a certificate, the browser will prompt users for a certificate to use such as pictured below.

Browser Certificate Authentication Prompt
Browser Certificate Authentication Prompt

Other Applications

A really neat application for this when you have a legacy plain text protocol in play but you want to open it up over the internet and secure similarly. Perhaps you have a legacy application that uses raw text and is not SSL/TLS capable. You can still put this on the internet through a reverse proxy like F5 LTM or stunnel.

Traditionally this type of traffic would be protected via IPSEC tunnel that is encrypted or a dedicated circuit such as MPLS. That does require specific hardware and/or monthly circuit costs to accommodate.

stunnel is extremely useful in this scenario as you can install it on the local machine that has the legacy application and configure it to connect to localhost on a random port and proxy information out over TLS and configure it to use the certificate based authentication.

Here is a graphical example of what that may look like with an stunnel server broken out. stunnel could be installed on the end user’s workstation though.

Legacy App Secured with TLS 1.2 or higher & Certificate Based Authentication
Legacy App Secured with TLS 1.2 or higher & Certificate Based Authentication

stunnel could be put on the local end user workstation to minimize that unencrypted leg. Typically on the server side the reverse proxy has a leg directly on the same VLAN/subnet as the application which minimizes exposure over that but this does help secure the application traffic over the untrusted internet.

Final Words

In this article we learned a little on Certificate Based Authentication. We also learned how it may help your organization better secure your applications and possibly avoid more costly solutions.