DNS is currently a “once it runs, never touch it again” infrastructure. This changes with the introduction of DNSSEC. Managing a DNSSEC-signed zone involves a continuous effort of re-signing zones and generating key material. Apart from that, DNS is a fundamental Internet protocol, so the changes required to implement DNSSEC have an impact at many levels of the Internet infrastructure. In turn, DNSSEC is affected by many network elements. As a result, a number of operational issues can affect a DNSSEC-signed zone.
Four categories of operational DNSSEC issues can be distinguished:
- Network related issues, such as firewall problems
- Trust issues, such as incorrect secure parent-to-child delegations
- Zone related issues, such as time/duration problems (TTL of a record vs. signature validity)
- DNSSEC choices, such as NSEC vs. NSEC3
Most of the known operational issues can be found by either monitoring actively (fully automated online monitoring) or by running an integrity check (non-realtime checking of zone integrity and sanity).
Some efforts to develop a DNSSEC monitoring solution have been made. These include SecSpider (a distributed polling system that crawls the Internet to monitor worldwide DNSSEC deployment), DNSCheck.se (a web-based zone checking tool for DNS zones that includes some support for DNSSEC) and some tool suites from NLnet Labs and SPARTA Inc. None of these efforts, however, comprehensively addresses the known operational issues, nor do they provide active online monitoring for organisations that deploy DNSSEC-signed zones.
To address this gap in monitoring capabilities, SURFnet introduces a DNSSEC monitoring plugin which checks the known operational issues. This plugin can be used with Nagios or as a standalone web application. The Nagios-based plugin can warn a DNS operator when something is wrong, whereas the standalone web application can be used to manually validate whether or not a zone is operating properly.
Our monitoring plugin solution, which is based on Unbound (by NLnet Labs) and dnspython, consists of four basic tests that, together, check the operational issues that affect a DNSSEC-signed zone.
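As an illustration of how four test results could be reported to Nagios, here is a minimal sketch. It is not the plugin's actual code: the function name `summarize`, the test names and the severity ordering are assumptions made for this example. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
from typing import Dict, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed severity order (least to most severe) used to pick the overall status.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def summarize(results: Dict[str, Tuple[int, str]]) -> Tuple[int, str]:
    """Combine per-test (status, message) pairs into one exit code and status line."""
    worst = max(
        (status for status, _ in results.values()),
        key=SEVERITY.index,
        default=OK,
    )
    # Only tests that did not pass are listed in the status line.
    details = [
        f"{name}: {message}"
        for name, (status, message) in results.items()
        if status != OK
    ]
    text = "; ".join(details) if details else "all tests passed"
    return worst, f"DNSSEC {LABELS[worst]} - {text}"
```

A plugin built this way would print the returned line and exit with the returned code, which is how Nagios learns the state of the check.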
DNSSEC uses public key cryptography to build a chain of trust between various parent and child name servers. The first test is a chain check, which checks and validates the complete chain of trust, starting at the secure entry point and ending with the signature of a signed record. It is important to keep monitoring this chain of trust, because the chain can change during key rollovers and when signatures are updated.
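One link in such a chain is the match between a DS record in the parent zone and a DNSKEY in the child zone. The sketch below shows that single step using only the Python standard library. It assumes a DS with digest type 2 (SHA-256) and a key algorithm other than 1, and all function names are hypothetical. Full validation, including signature verification, is what a validating resolver such as Unbound does; that part is not shown here.

```python
import hashlib
import struct


def name_to_wire(name: str) -> bytes:
    """Encode a domain name in lowercase DNS wire format (no escapes handled)."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    return wire + b"\x00"


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """Build DNSKEY RDATA: flags (2 bytes), protocol, algorithm, public key."""
    return struct.pack("!HBB", flags, protocol, algorithm) + public_key


def key_tag(rdata: bytes) -> int:
    """Key tag as defined in RFC 4034, Appendix B (not valid for algorithm 1)."""
    acc = 0
    for index, byte in enumerate(rdata):
        acc += byte if index & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def ds_matches_dnskey(owner: str, ds_key_tag: int, ds_digest_hex: str, rdata: bytes) -> bool:
    """Check that a SHA-256 DS record in the parent refers to this child DNSKEY."""
    digest = hashlib.sha256(name_to_wire(owner) + rdata).hexdigest()
    return key_tag(rdata) == ds_key_tag and digest == ds_digest_hex.lower()
```

If this comparison fails for every DNSKEY in the child zone, the secure delegation is broken, which is the kind of trust issue the chain check is meant to catch.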
Before the introduction of DNSSEC, the DNS payload of a regular UDP packet was limited to 512 bytes. If the response to a UDP request did not fit in 512 bytes, a truncation flag bit was set in the response and the resolver had to try again using TCP. TCP has a substantially higher set-up and tear-down overhead and is therefore not preferred. Because responses became much larger with DNSSEC, the 512-byte limit was no longer workable. DNS was therefore extended with EDNS0, which lets a resolver advertise a larger UDP buffer size while maintaining compatibility with earlier versions of the protocol. Some routers and firewalls that have not been updated do not support EDNS0 or block DNS UDP packets larger than 512 bytes. This is why our second test is an EDNS0 check, which uses a binary search to determine the minimal packet size needed for the DNSKEY records and, in doing so, detects potential network maximum transmission unit problems caused by, for example, firewalls.
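The binary search can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `probe` is a hypothetical callable that sends a DNSKEY query advertising the given EDNS0 buffer size and returns True only if a complete, untruncated answer comes back. The bounds of 512 and 4096 bytes are also assumptions. The search further assumes that once a size works, every larger size works too; when even the largest size fails, the function returns None, which hints at a path problem such as a firewall dropping large UDP packets.

```python
from typing import Callable, Optional


def minimal_edns_size(
    probe: Callable[[int], bool],
    low: int = 512,
    high: int = 4096,
) -> Optional[int]:
    """Return the smallest buffer size in [low, high] for which probe() succeeds."""
    if not probe(high):
        # No complete answer even at the largest size: possible network problem.
        return None
    while low < high:
        mid = (low + high) // 2
        if probe(mid):
            high = mid        # mid works, so the answer is mid or smaller
        else:
            low = mid + 1     # mid is too small, so the answer is larger
    return low
```

With a real `probe`, each step costs one DNS query, so the search needs about twelve queries for this range instead of several thousand.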
To keep DNSSEC fully secure, the Next Secure (NSEC) record was defined to provide authenticated denial of existence, so that an attacker cannot return an NXDOMAIN for a record that actually exists. An NSEC record, which links one existing name to the next existing name, is returned by an authoritative server when a non-existent record is requested. The requested name should fall between the two names in the verified NSEC record, which proves that the requested record truly does not exist. Because this feature made it possible to enumerate a complete zone, NSEC3 was introduced. NSEC3 does the same, but operates on hashes of the names, so that the domain names can no longer be read directly from the responses. Our third test is an NSEC(3) test, which checks whether a zone is using NSEC or NSEC3.
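To make the two mechanisms concrete, the sketch below shows how a name can be tested against an NSEC record using canonical DNS name ordering, and how an NSEC3 owner hash is computed with SHA-1, a salt and extra iterations. It is an illustration only: the function names are hypothetical, names with escaped characters are not handled, and this is not how the monitor itself decides between NSEC and NSEC3.

```python
import base64
import hashlib
from typing import List

# Translation from the standard base32 alphabet to the "base32hex" alphabet.
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def canonical_key(name: str) -> List[bytes]:
    """Sort key for canonical ordering: lowercase labels, rightmost label first."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return [label.encode("ascii") for label in reversed(labels)]


def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if qname falls strictly between the NSEC owner and its next name."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # Last NSEC in the zone: its next name wraps around to the zone apex.
    return q > o


def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """Hash a name with SHA-1, appending the salt on every round."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    wire = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).translate(_B32_TO_B32HEX)
    return encoded.decode("ascii").lower()
```

Walking the NSEC chain reveals every name in the zone in plain text, while the NSEC3 chain only reveals hashes like the ones produced by `nsec3_hash`.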
The final check is a TTL check, which verifies whether the TTL parameters used in the zone comply with the recommendations in RFC 4641bis. These include:
- The TTL of an RR has to be equal to the TTL of the RRSIG that belongs to it. If a resolver has cached both and one of them expires earlier than the other, it could fetch a non-matching RR or RRSIG from the authoritative server.
- When the maximum zone TTL is, for example, equal to the signature validity period, all signatures are cached until the signature expiration time. This can cause a high load on the authoritative servers, because all resolvers will request updates at the same time. The maximum zone TTL should therefore be a fraction of the signature validity period.
- Re-signing a zone shortly before the end of the signature validity period may cause simultaneous expiration of data from caches, which can again lead to peak load on the authoritative servers. Signatures should therefore be replaced at least one maximum zone TTL before they expire.
- A validator should be able to complete validation before a record has expired, therefore the minimum zone TTL should not be smaller than 5 minutes.
- If a secondary authoritative server serves a DNSSEC zone and cannot get updates from its primary authoritative server, the signatures may expire before the SOA expiration timer counts down to zero. Although this cannot be prevented completely, the effects can be minimized by making the SOA expiration time equal to or shorter than the signature validity period.
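The recommendations above can be expressed as a handful of comparisons. The following is a minimal sketch, not the monitor's implementation. The function name, the parameter names and two thresholds are assumptions of this example: "a fraction" of the signature validity period is taken to mean at most half, and the five-minute floor is applied to the smallest TTL found in the zone. All values are in seconds.

```python
from typing import List

MIN_TTL_FLOOR = 300  # five minutes


def check_ttls(
    rr_ttl: int,
    rrsig_ttl: int,
    min_ttl: int,
    max_ttl: int,
    sig_validity: int,
    resign_margin: int,
    soa_expire: int,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means no findings.

    resign_margin is the validity time left on the old signatures at the
    moment the zone is re-signed (a name chosen for this sketch).
    """
    problems = []
    if rr_ttl != rrsig_ttl:
        problems.append("RR and RRSIG TTL differ")
    if max_ttl * 2 > sig_validity:  # assumed threshold: at most half
        problems.append("maximum zone TTL is too close to the signature validity period")
    if resign_margin < max_ttl:
        problems.append("zone is re-signed less than one maximum TTL before signatures expire")
    if min_ttl < MIN_TTL_FLOOR:
        problems.append("TTL below five minutes leaves little time for validation")
    if soa_expire > sig_validity:
        problems.append("SOA expire is longer than the signature validity period")
    return problems
```

For example, a zone with a one-day maximum TTL, 30-day signatures, re-signing one week before expiry and a 14-day SOA expire produces no findings with these thresholds.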
The SURFnet DNSSEC monitor can be tested live at http://www.dnssecmonitor.org/. It is also possible to download the source code and use the DNSSEC monitor within your own Nagios environment. We encourage you to try it out and give us feedback!