ISC BIND Look Aside Related Outage

Summary

I had a fun issue today. All of a sudden BIND stopped returning results for recursive queries to external zones.

My logs were filled with lines like the following:

Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (network unreachable) resolving 'com.dlv.isc.org/DS/IN': 2001:500:2c::254#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 149.20.64.4#53
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450065e40: dlv.isc.org NSEC: no valid signature found
Mar 25 09:05:00 XXXXXXX named[XXXXXXX]: error (no valid RRSIG) resolving 'com.dlv.isc.org/DS/IN': 156.154.100.23#53
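
When the log is this noisy, it can help to tally which name and key the validator is complaining about. The following is a minimal sketch, not part of the original troubleshooting; the regular expression is an assumption based only on the log format shown above, and count_expired is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern modeled on the validator lines above (an assumption, not an official format).
EXPIRED = re.compile(
    r"validating @0x[0-9a-f]+: (?P<name>\S+) (?P<rrtype>\S+): "
    r"verify failed due to bad signature \(keyid=(?P<keyid>\d+)\): RRSIG has expired"
)

def count_expired(lines):
    """Count 'RRSIG has expired' failures per (name, record type, key id)."""
    counts = Counter()
    for line in lines:
        match = EXPIRED.search(line)
        if match:
            key = (match["name"], match["rrtype"], int(match["keyid"]))
            counts[key] += 1
    return counts

# Three lines copied from the log excerpt above.
sample = [
    "Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired",
    "Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3458067420: dlv.isc.org NSEC: no valid signature found",
    "Mar 25 09:05:00 XXXXXXX named[XXXXXXX]:   validating @0x7f3450030b60: dlv.isc.org NSEC: verify failed due to bad signature (keyid=64263): RRSIG has expired",
]

for key, hits in count_expired(sample).items():
    print(key, hits)
```

If every hit points at the same name, as it does here with dlv.isc.org, that is a strong hint the problem lies with one upstream zone rather than with the local resolver.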

Troubleshooting

Naturally, I tried bouncing named, without luck. I then suspected an issue with the root zones and configured forwarders, which did not help either. I had also disabled DNSSEC via “dnssec-enable no”, again without luck.

This seemed fairly strange. Ultimately, since it was DNSSEC related, I opted to disable it via the following as a temporary workaround. Validation appears to have been the issue.

dnssec-enable no;
dnssec-validation no;

After some investigation and troubleshooting, it appeared to be related to ISC’s DLV: the RRSIG had accidentally been allowed to expire, and BIND failed in an unexpected manner when this happened.

What is DLV?

DLV stands for DNSSEC Lookaside Validation. DLV is a service that ISC has provided since circa 2006. It allowed DNSSEC to be used on zones that could not otherwise be validated, because not all Top Level Domains (TLDs) implemented DNSSEC until the past few years. DLV was a workaround to allow DNSSEC until then.

In 2017 it was finally decommissioned with DNSSEC being fully available to all TLDs. The A record was left in place and many resolvers still attempt to connect but it does not provide any data.

What is RRSIG?

If you want a full view of DNSSEC and how it works, CloudFlare has a great article on it here: https://www.cloudflare.com/dns/dnssec/how-dnssec-works/. In short though, RRSIG records contain the cryptographic signature for a set of records, along with start and end dates for the validity of that signature. This is much like an SSL Certificate that has a valid period.

RRSIG records are designed to require frequent renewal, much like SSL Certificates need to be renewed. This helps prevent a replay attack where an older compromised key is reused.

RRSIG Value

Running the following, I could see it had expired:

# dig +dnssec dlv.isc.org

dlv.isc.org.		3599	IN	RRSIG	DNSKEY 5 3 3600 20200325160456 20200224153150 19297 dlv.isc.org. TyUbbNgG/Oru7TQFHbDC9E208hB8Szheu634Q03nawQFz4dosOFg+ZB5 z8Svh8fw/g35a/ZW5AP1jbSKh19u4c7Ujre3iygS0Tjycmi0mYG6dS7I CcWLOxZpOKf8uw9mzgbIR/VDEFmKj0OJKdkxAqfaWxXLqBBWgFqIucC6 9Tb98clinCPW34xgk6Fzi+OKAFmiGH6/e8wk/h5RMWxipx5KAk2NsWsw QMyEDaA7eLzZTbBenftVR86g6QO4bR+LOKzxGBFQ2XW0ArQKDiuoBqEw 8cmRcGKzVJ761d7EK+LDvnktRNxRMJ9y5LPgxlO2Xm3Un8oExjVbLKi7 OigQnA==

The key value is 20200325160456, the signature expiration time. That translates to 3/25/2020 16:04:56 UTC, which is about when the issue started. The ISC-USERS list mentioned further down in the “References” section confirmed this was by mistake. I suppose it was a good “scream” test to remove lookaside. Newer BIND versions do not even support this anymore.
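
The timestamp conversion above can be double-checked with a few lines of Python. This is only a sketch: the field positions follow the standard RRSIG presentation order (type covered, algorithm, labels, original TTL, expiration, inception, key tag, signer, signature), the signature text is truncated because it is not needed, and check_time is an assumed moment just after expiration rather than a value taken from the logs. Real validators compare these times using 32-bit serial arithmetic, which this sketch does not attempt.

```python
from datetime import datetime, timezone

def parse_rrsig_time(stamp):
    """Parse an RRSIG YYYYMMDDHHmmSS timestamp, which is always UTC."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

def rrsig_window(answer_line):
    """Return (inception, expiration) from one RRSIG line of dig output."""
    fields = answer_line.split()
    # fields: owner, TTL, class, type, then the RDATA fields in order.
    if fields[3] != "RRSIG":
        raise ValueError("not an RRSIG record")
    expiration = parse_rrsig_time(fields[8])
    inception = parse_rrsig_time(fields[9])
    return inception, expiration

# Leading fields copied from the dig output above; signature truncated on purpose.
line = ("dlv.isc.org. 3599 IN RRSIG DNSKEY 5 3 3600 "
        "20200325160456 20200224153150 19297 dlv.isc.org. TyUbbNgG/Oru7TQF")

inception, expiration = rrsig_window(line)

# Assumed check time: a few seconds after the expiration (hypothetical value).
check_time = datetime(2020, 3, 25, 16, 5, 0, tzinfo=timezone.utc)

print(expiration.isoformat())                  # 2020-03-25T16:04:56+00:00
print(inception <= check_time <= expiration)   # False, the signature has expired
```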

What Happened?

Many older BIND servers deployed before 2017 were configured with the following.

dnssec-lookaside auto;

With auto, BIND would try to query dlv.isc.org first and then query the root name servers. The expected behavior was that dlv.isc.org simply would not return any data and then the root zones would be queried.

Unfortunately, with an expired RRSIG, the lookup failed in a way that made BIND treat the response as invalid rather than as an expected empty answer. For all BIND knew, it was preventing a replay attack.
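
To find servers that still carry the lookaside setting described above, a rough scan of the configuration text may be enough. The sketch below is an assumption-laden helper, not a real named.conf parser: find_lookaside is a hypothetical name, the sample configuration is made up for illustration, and the pattern only matches statements that start a line, so it skips // and # comments but will not recognize statements inside /* ... */ blocks.

```python
import re

# Matches an active "dnssec-lookaside <args>;" statement at the start of a line.
LOOKASIDE = re.compile(r"^\s*dnssec-lookaside\s+([^;]+);", re.MULTILINE)

def find_lookaside(config_text):
    """Return the arguments of every dnssec-lookaside statement found."""
    return [match.group(1).strip() for match in LOOKASIDE.finditer(config_text)]

# Made-up options block for illustration only.
sample_config = """
options {
    recursion yes;
    dnssec-lookaside auto;
};
"""

print(find_lookaside(sample_config))  # ['auto']
```

Reading the actual configuration file and feeding its contents to find_lookaside is left out, since file locations vary between distributions.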

References

I had originally posted on Reddit and was also pointed to ISC-USERS.