Introduction to DNS
The Domain Name System is a huge distributed eventually-consistent database
mapping names, like www.barrucadu.co.uk, to numbers, like 116.203.34.201.
It's federated, with trusted entities delegating control of segments of the DNS
namespace to others. It holds hundreds of millions of records, and updates to
this database are typically visible in minutes to hours.
And the protocol behind it is not massively different to when it was standardised in the 1980s in RFC 1034: Domain Names - Concepts and Facilities and RFC 1035: Domain Names - Implementation and Specification.
This page gives an overview of how all this works, covering:
- The DNS protocol
- How your browser gets from www.barrucadu.co.uk to an IP address
- What a "zone" is
- The difference between authoritative, recursive, and forwarding nameservers
- What happens when you update a DNS record (there's no such thing as "propagation")
The DNS protocol
Let's start with an example (the +noedns flag turns off some extensions to the
basic DNS protocol):
$ dig +noedns www.barrucadu.co.uk
; <<>> DiG 9.18.19 <<>> +noedns www.barrucadu.co.uk
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42090
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.barrucadu.co.uk. IN A
;; ANSWER SECTION:
www.barrucadu.co.uk. 300 IN CNAME barrucadu.co.uk.
barrucadu.co.uk. 300 IN A 116.203.34.201
;; Query time: 28 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Sat Oct 14 15:22:49 BST 2023
;; MSG SIZE rcvd: 67
I've used dig a lot so I'm fairly used to reading this output, but I've since
realised I wasn't really reading it.
What does flags: qr rd ra mean? What about AUTHORITY and ADDITIONAL?
Also, all the domain names there have a trailing dot. What's that about?
Time to dig into the protocol. RFC 1035 is our guide here.
Format of a DNS Message
DNS has two types of messages, queries and responses, and uses port 53. It prefers UDP but, if a message is too long to send in a single UDP datagram, it falls back to TCP.
A DNS message has five parts. These are:
- A header, which specifies what sort of message this is and how many entries are in the other parts. This also has those flags we saw in the dig output.
- The "question section", which specifies what sort of records the client is interested in. In principle you can ask multiple questions in a single query, but in practice this isn't widely supported.
- The "answer section", a collection of records directly answering the questions.
- The "authority section", a series of NS records pointing to an authoritative source which can answer the questions.
- The "additional section", a series of records which may be useful when using records from the answer and authority sections. For example, the A records for any nameservers given in the authority section. This is often omitted to save space.
The answer, authority, and additional sections won't be present in a query. But the question section will be present in a response: it's copied over from the query.
The Header
The header is 12 bytes long and has a few different fields packed in there. RFC 1035 has some nice ASCII art illustrations:
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ID |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR| Opcode |AA|TC|RD|RA| Z | RCODE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QDCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ANCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NSCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ARCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Where,
- ID is a 16-bit random identifier set by the client and copied into the response by the server. Since UDP is connectionless, the ID is essential for the client to know which response goes with which query.
- QR indicates whether this is a query (0) or a response (1).
- OPCODE is a four-bit field, set by the client and copied into the response by the server, indicating what type of query this message is. The most common opcode is 0, which is a "standard query".
- AA ("Authoritative Answer") is set by the server and means that this response is authoritative. More on authority in zones.
- TC ("Truncation") is set by the server and means that the full response couldn't fit in a single UDP datagram, and so the client should try again using TCP.
- RD ("Recursion Desired") is set by the client, and copied into the response by the server, and means that the client would like the server to answer the question recursively, if it can. More on recursive and non-recursive resolution in how resolution happens.
- RA ("Recursion Available") is set by the server and means that it can perform recursive resolution, if requested.
- Z is reserved for future use, and so should be set to zero if you don't implement those future standards.
- RCODE is a four-bit field, set by the server, indicating what type of response this message is. There are a few common ones:
  - 0 means no error
  - 1 means the server couldn't understand the query
  - 2 means the server encountered an error processing the query
  - 3 means the domain name in the query doesn't exist
  - 4 means the server doesn't support this sort of query
  - 5 means the server refused to answer the query
- QDCOUNT, ANCOUNT, NSCOUNT, and ARCOUNT are 16-bit integers specifying the number of entries in the question, answer, authority, and additional sections (respectively) of the message.
All the multi-byte fields in a DNS message are unsigned and big endian.
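To make the bit layout concrete, here's a rough Python sketch that unpacks and repacks those 12 bytes. The Header class and the parse_header / build_header names are my own for illustration, not anything from RFC 1035, and there's no validation of the field ranges.

```python
import struct
from dataclasses import dataclass


@dataclass
class Header:
    id: int
    qr: int
    opcode: int
    aa: int
    tc: int
    rd: int
    ra: int
    z: int
    rcode: int
    qdcount: int
    ancount: int
    nscount: int
    arcount: int


def parse_header(message: bytes) -> Header:
    # The header is six unsigned big-endian 16-bit integers.
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return Header(
        id=ident,
        qr=(flags >> 15) & 0x1,
        opcode=(flags >> 11) & 0xF,
        aa=(flags >> 10) & 0x1,
        tc=(flags >> 9) & 0x1,
        rd=(flags >> 8) & 0x1,
        ra=(flags >> 7) & 0x1,
        z=(flags >> 4) & 0x7,
        rcode=flags & 0xF,
        qdcount=qd,
        ancount=an,
        nscount=ns,
        arcount=ar,
    )


def build_header(h: Header) -> bytes:
    # Shift each field back into its position in the 16-bit flags word.
    flags = (
        (h.qr << 15) | (h.opcode << 11) | (h.aa << 10) | (h.tc << 9)
        | (h.rd << 8) | (h.ra << 7) | (h.z << 4) | h.rcode
    )
    return struct.pack("!6H", h.id, flags, h.qdcount, h.ancount, h.nscount, h.arcount)
```

Feeding it the twelve bytes a4 6a 81 80 00 01 00 02 00 00 00 00 should give back the same ID (42090), flags (qr, rd, ra), and answer count (2) that dig printed at the top of this section.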
Domain Names
Before diving into the other sections, let's have a look at how domain names are encoded. They show up a lot, after all.
Let's take the domain name www.barrucadu.co.uk., and separate it by dots.
This gives us a sequence of labels:
- www
- barrucadu
- co
- uk
- (the empty label)
How you actually interpret those labels is a bit confused, unfortunately.
RFC 1035 says that they are sequences of arbitrary octets and that you can't assume any particular character encoding, but it also says that labels are to be compared case-insensitively.
RFC 4343 clarifies that this means octets in the range 0x41 to 0x5a (the
upper case ASCII letters) are considered equal to the corresponding octets in
the range 0x61 to 0x7a (the lower case ASCII letters), and vice versa. But that
still doesn't mean that labels are ASCII, as they can also contain
arbitrary non-ASCII octets.
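As a small sketch of that comparison rule (the function names are mine, and this is just my reading of RFC 4343): fold only the ASCII upper case octets, and leave every other octet exactly as it is.

```python
def fold_label(label: bytes) -> bytes:
    # Map 0x41-0x5a (A-Z) onto 0x61-0x7a (a-z); every other octet is untouched.
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)


def labels_equal(a: bytes, b: bytes) -> bool:
    # Two labels match if they are identical after ASCII-only case folding.
    return fold_label(a) == fold_label(b)
```

So b"WWW" and b"www" compare equal, but two different non-ASCII octets never do, even if some character encoding would call them upper and lower case versions of the same letter.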
But there's also RFC 3492, which defines the Punycode standard for encoding internationalised, i.e. Unicode, domain names into ASCII. So maybe domain names are ASCII after all?
There may well be a later RFC which resolves this ambiguity and says that labels are definitely ASCII, but I haven't seen it yet.
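For a feel of what Punycode-encoded names look like, here's a tiny example using Python's built-in "idna" codec. The name bücher.example is just an illustration I picked, and the "xn--" prefix on the encoded label comes from the IDNA layer built on top of Punycode rather than from RFC 3492 itself.

```python
# Python's built-in "idna" codec applies Punycode label by label and adds
# the "xn--" prefix that marks an encoded label.
name = "bücher.example"
encoded = name.encode("idna")

print(encoded)                 # b'xn--bcher-kva.example'
print(encoded.decode("idna"))  # bücher.example
```

The encoded form is plain ASCII, so it fits in a DNS message like any other name.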
Anyway, back to the topic of encoding.
A label is encoded as a one-octet length field followed by the octets of the
label. And an encoded domain name is a sequence of encoded labels. This means
that a domain name ends with 0x00, the length of the empty label (and also
makes encoded domain names work as null-terminated C strings, what a handy
coincidence).
So www.barrucadu.co.uk. is encoded as:
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
There are two restrictions on domain names:
- A single label may be no more than 63 octets long (not including the length octet)
- An entire encoded domain name may be no more than 255 octets long (including the label length octets)
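Here's a minimal sketch of that encoding in Python, with both restrictions checked. The encode_name name is mine, and I'm assuming ASCII labels to keep it short, even though the wire format allows arbitrary octets.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted domain name as length-prefixed labels (no compression)."""
    if name.endswith("."):
        name = name[:-1]  # drop the dot that stands for the empty root label
    labels = name.split(".") if name else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # assumption: ASCII-only labels
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label must be 1 to 63 octets long: {label!r}")
        out.append(len(raw))
        out += raw
    out.append(0)  # the empty label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name is longer than 255 octets")
    return bytes(out)
```

Calling encode_name("www.barrucadu.co.uk.") should produce the 21 octets shown above, and encode_name(".") should produce a lone 0x00.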
Compression
Domain names get repeated a lot in DNS messages, and the 512 bytes that basic DNS allows for a message sent over UDP can start to feel pretty limiting. So DNS also has a compression mechanism, where some suffix of a domain name can be replaced with a pointer to an earlier occurrence of that name.
So if the name www.barrucadu.co.uk. appears in a message twice, the
second occurrence could be represented as any of:
www.barrucadu.co.uk.
www.barrucadu.co.[pointer to 'uk.']
www.barrucadu.[pointer to 'co.uk.']
www.[pointer to 'barrucadu.co.uk.']
[pointer to 'www.barrucadu.co.uk.']
But how do you distinguish between a regular label and a pointer? Well, remember that a label can't be longer than 63 octets. And what's 63 as an 8-bit binary number?
It's 00111111.
There are two whole bits there at the front which are completely wasted!
So pointers are encoded as the two-octet sequence 11[14-bit index into message].
Pretty clever.
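Decoding is where the pointers actually matter, so here's a sketch of a name decoder that follows them. The decode_name name and the hop limit are my own choices, and I'm again assuming ASCII labels.

```python
def decode_name(message: bytes, offset: int) -> tuple[str, int]:
    """Return (name, offset just past the name as written at `offset`)."""
    labels = []
    end = None  # where the caller should carry on reading from
    hops = 0
    while True:
        length = message[offset]
        if length & 0xC0 == 0xC0:
            # Pointer: top two bits set, the other 14 bits index the message.
            if end is None:
                end = offset + 2
            hops += 1
            if hops > len(message):
                raise ValueError("compression pointer loop")
            offset = ((length & 0x3F) << 8) | message[offset + 1]
        elif length & 0xC0:
            raise ValueError("reserved label type")  # 01 and 10 prefixes
        elif length == 0:
            if end is None:
                end = offset + 1
            return ".".join(labels) + ".", end
        else:
            labels.append(message[offset + 1 : offset + 1 + length].decode("ascii"))
            offset += 1 + length
```

Note that the returned offset is the position after the first pointer, not after wherever the pointer led: the rest of the record carries on from there. The hop counter is a guard against a malicious message whose pointers form a loop.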
Questions
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| |
/ QNAME /
/ /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QTYPE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QCLASS |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Where,
- QNAME is the domain name, which can be any length (so long as it's properly encoded); it's not padded to any specific size.
- QTYPE is a 16-bit integer specifying the type of records the client is interested in. This will usually be a record type (see the next subsection) or 255, meaning "all records". There are a few other QTYPEs but those are less common.
- QCLASS is a 16-bit integer specifying which network class the client is interested in. These days this will always be 1, or IN, for "internet".
As an aside, it feels kind of wasteful that we effectively throw away 16 whole bits for each question and record on the historical "class" artefact. UDP messages are short, so we compress domain names to squeeze out a little extra space, but then we waste a bunch like this! Even worse, there never were very many network classes: RFC 1035 only defines four. Did the IETF really expect there to be so many non-internet networks in the future?
We can now understand the question section of our dig example!
;; QUESTION SECTION:
;www.barrucadu.co.uk. IN A
This means that it's looking for an internet address record for
www.barrucadu.co.uk. (yes, it shows the type and class the other
way around). That question is encoded as:
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 ; qname: www.barrucadu.co.uk.
0x00 0x01 ; qtype: A
0x00 0x01 ; qclass: IN
Resource Records
The answer, authority, and additional sections are all a sequence of resource records:
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| |
/ /
/ NAME /
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| TYPE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| CLASS |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| TTL |
| |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| RDLENGTH |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
/ RDATA /
/ /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
Where,
- NAME is the domain name, which is variable-length like the QNAME of a question.
- TYPE is a 16-bit integer specifying what sort of record this is. There are a fair few of these, but some common ones are:
  - 1, an A record
  - 2, a NS record
  - 5, a CNAME record
  - 28, a AAAA record (from RFC 3596)
  - and plenty others
- CLASS is a 16-bit integer specifying what network class this record applies to. Like the QCLASS, these days this will always be 1. Unless you're specifically running some sort of old non-IP-based network for fun.
- TTL is a 32-bit integer specifying the number of seconds that this record is valid for. This is important for caching purposes. Zero has a special meaning: it means that you can use the record to do whatever it is you're doing right now, but that you can't cache it at all.
- RDLENGTH is a 16-bit integer specifying the length, in octets, of the RDATA section.
- RDATA is the record data, which is type- and class-specific. For example:
  - IN A records hold an IPv4 address, as a 32-bit number
  - IN NS and IN CNAME records hold a domain name
  - IN AAAA records hold an IPv6 address, as a 128-bit number
Returning to our dig example, we had two different resource records in the
response:
;; ANSWER SECTION:
www.barrucadu.co.uk. 300 IN CNAME barrucadu.co.uk.
barrucadu.co.uk. 300 IN A 116.203.34.201
We have one IN CNAME record for www.barrucadu.co.uk. and one IN A record
for barrucadu.co.uk. This is because, upon encountering a CNAME,
resolution starts again with whatever name the CNAME refers to (unless the
query was for, say, IN CNAME www.barrucadu.co.uk - more on this in how
resolution happens).
Leaving out the name compression for simplicity, those records are encoded as:
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 ; name: www.barrucadu.co.uk.
0x00 0x05 ; type: CNAME
0x00 0x01 ; class: IN
0x00 0x00 0x01 0x2c ; ttl: 300
0x00 0x11 ; rdlength: 17
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 ; rdata: barrucadu.co.uk.
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 ; name: barrucadu.co.uk.
0x00 0x01 ; type: A
0x00 0x01 ; class: IN
0x00 0x00 0x01 0x2c ; ttl: 300
0x00 0x04 ; rdlength: 4
0x74 0xcb 0x22 0xc9 ; rdata: 116.203.34.201
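If you'd rather generate those bytes than write them out by hand, here's a rough sketch. The encode_name and encode_record helpers are names I've made up, there's no compression, and I'm assuming ASCII labels with no length checks.

```python
import ipaddress
import struct

TYPE_A, TYPE_CNAME, CLASS_IN = 1, 5, 1


def encode_name(name: str) -> bytes:
    # Length-prefixed labels, ending with the empty root label (0x00).
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def encode_record(name: str, rtype: int, ttl: int, rdata: bytes, rclass: int = CLASS_IN) -> bytes:
    # NAME, then TYPE (16 bits), CLASS (16), TTL (32), RDLENGTH (16), then RDATA.
    fixed = struct.pack("!HHIH", rtype, rclass, ttl, len(rdata))
    return encode_name(name) + fixed + rdata


cname_record = encode_record(
    "www.barrucadu.co.uk.", TYPE_CNAME, 300, encode_name("barrucadu.co.uk.")
)
a_record = encode_record(
    "barrucadu.co.uk.", TYPE_A, 300, ipaddress.IPv4Address("116.203.34.201").packed
)
```

The RDLENGTH is computed from the RDATA, which is why the CNAME record ends up with 17 there: that's the length of the encoded barrucadu.co.uk.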
Example DNS query & response
Returning to our dig +noedns www.barrucadu.co.uk example from the beginning,
we can now see the whole encoded query and response. I've included comments and
linebreaks to make it clear what's what.
Here's the query:
;; header
0xa4 0x6a ; ID: 42090
0x01 0x00 ; flags: RD
0x00 0x01 ; QDCOUNT: 1
0x00 0x00 ; ANCOUNT: 0
0x00 0x00 ; NSCOUNT: 0
0x00 0x00 ; ARCOUNT: 0
;; question section
; www.barrucadu.co.uk. A IN
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 0x00 0x01 0x00 0x01
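A query really is just those two pieces glued together. As a sketch (build_query and encode_name are my own hypothetical helpers, assuming ASCII labels and a single question):

```python
import struct


def encode_name(name: str) -> bytes:
    # Length-prefixed labels, ending with the empty root label (0x00).
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, ident: int, qtype: int = 1, qclass: int = 1) -> bytes:
    flags = 0x0100  # only RD set: QR=0 (a query), opcode 0 (standard query)
    header = struct.pack("!6H", ident, flags, 1, 0, 0, 0)  # QDCOUNT is 1
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    return header + question


query = build_query("www.barrucadu.co.uk.", ident=0xA46A)
```

That comes to 37 bytes. Sending them in a UDP datagram to port 53 of a resolver should get a response back, though I've left the socket handling out of the sketch.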
And here's the response (omitting compression):
;; header
0xa4 0x6a ; ID: 42090
0x81 0x80 ; flags: QR, RD, RA
0x00 0x01 ; QDCOUNT: 1
0x00 0x02 ; ANCOUNT: 2
0x00 0x00 ; NSCOUNT: 0
0x00 0x00 ; ARCOUNT: 0
;; question section
; www.barrucadu.co.uk. A IN
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 0x00 0x01 0x00 0x01
;; answer section
; www.barrucadu.co.uk. CNAME IN 300 barrucadu.co.uk.
0x03 w w w 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 0x00 0x05 0x00 0x01 0x00 0x00 0x01 0x2c 0x00 0x11 0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00
; barrucadu.co.uk. A IN 300 116.203.34.201
0x09 b a r r u c a d u 0x02 c o 0x02 u k 0x00 0x00 0x01 0x00 0x01 0x00 0x00 0x01 0x2c 0x00 0x04 0x74 0xcb 0x22 0xc9
And that's that!
The DNS protocol isn't very complicated. But it is somewhat fiddly, what with
each record type having its own RDATA format, and the domain name compression.
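Putting the pieces together, here's a sketch of a parser that walks a whole response: header, questions, then resource records. All of the names are mine, it only understands A, NS, and CNAME RDATA, and it assumes ASCII labels.

```python
import struct


def read_name(msg: bytes, pos: int) -> tuple[str, int]:
    """Decode a (possibly compressed) name; return it and the next offset."""
    labels, resume, jumps = [], None, 0
    while True:
        n = msg[pos]
        if n & 0xC0 == 0xC0:  # compression pointer
            if resume is None:
                resume = pos + 2
            jumps += 1
            if jumps > 64:  # arbitrary limit to stop pointer loops
                raise ValueError("too many compression pointers")
            pos = ((n & 0x3F) << 8) | msg[pos + 1]
        elif n == 0:  # the empty label ends the name
            return ".".join(labels) + ".", (pos + 1 if resume is None else resume)
        else:
            labels.append(msg[pos + 1 : pos + 1 + n].decode("ascii"))
            pos += 1 + n


def parse_response(msg: bytes):
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", msg[:12])
    pos = 12
    for _ in range(qd):
        _, pos = read_name(msg, pos)
        pos += 4  # skip QTYPE and QCLASS
    records = []
    for _ in range(an + ns + ar):
        name, pos = read_name(msg, pos)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", msg[pos : pos + 10])
        pos += 10
        rdata = msg[pos : pos + rdlength]
        if rtype == 1 and rclass == 1:  # IN A: four octets of IPv4 address
            value = ".".join(str(octet) for octet in rdata)
        elif rtype in (2, 5):  # NS and CNAME: a domain name
            value, _ = read_name(msg, pos)
        else:
            value = rdata  # anything else is left as raw bytes
        pos += rdlength
        records.append((name, rtype, ttl, value))
    return ident, records
```

Run over the response bytes above, this should give back the CNAME and A records from the dig output. It reads names in the RDATA from the full message rather than from the RDATA slice, since a compressed name there can point anywhere earlier in the message.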
How resolution happens
When we ran dig +noedns www.barrucadu.co.uk in the previous section, we got an
answer. We found the IP address which www.barrucadu.co.uk. refers to.
But how?
Well, dig tells us that it talked to some server at 1.1.1.1. But how did
that server know? Does it have a copy of the entire DNS? Unlikely, since
there are hundreds of millions of records in use.
The answer is that the server followed a process called recursive resolution. This is described in section 5.3.3 of RFC 1034:
1. See if we already know the answer (e.g. the relevant records are already cached), and return it to the client if so
2. Figure out the best nameservers to ask
3. Send them queries until one responds
4. Analyse the response:
   a. If the response answers the question, cache it and return it to the client
   b. If the response gives some better nameservers to use, cache them and go back to step 2
   c. If the response gives a CNAME, and this is not the answer, cache the CNAME record and start again with the new name
   d. If the response is an error or doesn't make sense, go back to step 3
On the face of it this looks pretty straightforward, but on closer inspection step 2 is doing a lot of work: how exactly do we "figure out the best nameservers to ask"? (Step 1 is also doing a surprising amount of work if your nameserver supports authoritative zones, which are covered in the next section. For the full details, see section 4.3.2 of RFC 1034.)
Well, step 4.b gives us a clue here: if the response gives some better nameservers to use, cache them and go back to step 2. So we don't need to pick the correct nameservers at the very beginning. We only need to know about a nameserver which will be able to point us to a nameserver which knows the answer (or is closer to knowing it).
There are thirteen nameservers which, transitively, know about every domain name. These are the root nameservers, and they're where recursive resolution starts.
You can find them at a.root-servers.net. through m.root-servers.net.
So you just point your recursive resolver at, say, j.root-servers.net.
and... oh wait, we have a chicken-and-egg problem. Ultimately, you need to know
their IP addresses. IANA, the Internet Assigned Numbers Authority, provides the
"root hints" file, which has the IPv4 and IPv6 addresses of these root
nameservers.
How do you download that file if you don't have DNS working to resolve
www.iana.org.? Look, you just need IP addresses to get DNS and DNS to get IP
addresses. Use 1.1.1.1 or something while you get your fancy recursive resolver
working.
Alright, let's resolve www.barrucadu.co.uk. recursively! Starting with:
$ dig www.barrucadu.co.uk @j.root-servers.net
; <<>> DiG 9.18.19 <<>> www.barrucadu.co.uk @j.root-servers.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 52538
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 8, ADDITIONAL: 17
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1472
;; QUESTION SECTION:
;www.barrucadu.co.uk. IN A
;; AUTHORITY SECTION:
uk. 172800 IN NS nsa.nic.uk.
uk. 172800 IN NS nsb.nic.uk.
uk. 172800 IN NS nsc.nic.uk.
uk. 172800 IN NS nsd.nic.uk.
uk. 172800 IN NS dns1.nic.uk.
uk. 172800 IN NS dns2.nic.uk.
uk. 172800 IN NS dns3.nic.uk.
uk. 172800 IN NS dns4.nic.uk.
;; ADDITIONAL SECTION:
nsa.nic.uk. 172800 IN A 156.154.100.3
nsb.nic.uk. 172800 IN A 156.154.101.3
nsc.nic.uk. 172800 IN A 156.154.102.3
nsd.nic.uk. 172800 IN A 156.154.103.3
dns1.nic.uk. 172800 IN A 213.248.216.1
dns2.nic.uk. 172800 IN A 103.49.80.1
dns3.nic.uk. 172800 IN A 213.248.220.1
dns4.nic.uk. 172800 IN A 43.230.48.1
nsa.nic.uk. 172800 IN AAAA 2001:502:ad09::3
nsb.nic.uk. 172800 IN AAAA 2001:502:2eda::3
nsc.nic.uk. 172800 IN AAAA 2610:a1:1009::3
nsd.nic.uk. 172800 IN AAAA 2610:a1:1010::3
dns1.nic.uk. 172800 IN AAAA 2a01:618:400::1
dns2.nic.uk. 172800 IN AAAA 2401:fd80:400::1
dns3.nic.uk. 172800 IN AAAA 2a01:618:404::1
dns4.nic.uk. 172800 IN AAAA 2401:fd80:404::1
;; Query time: 12 msec
;; SERVER: 192.58.128.30#53(j.root-servers.net) (UDP)
;; WHEN: Sat Oct 14 15:26:42 BST 2023
;; MSG SIZE rcvd: 552
Alright, we now know the names and IP addresses of the uk. nameservers.
Thanks, additional section!
On we go:
$ dig www.barrucadu.co.uk @156.154.100.3
; <<>> DiG 9.18.19 <<>> www.barrucadu.co.uk @156.154.100.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 32107
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: e393a7276308963401000000652aa5442e48e9fbe6ee0c99 (good)
;; QUESTION SECTION:
;www.barrucadu.co.uk. IN A
;; AUTHORITY SECTION:
barrucadu.co.uk. 172800 IN NS ns-1828.awsdns-36.co.uk.
barrucadu.co.uk. 172800 IN NS ns-1520.awsdns-62.org.
barrucadu.co.uk. 172800 IN NS ns-98.awsdns-12.com.
barrucadu.co.uk. 172800 IN NS ns-763.awsdns-31.net.
;; Query time: 12 msec
;; SERVER: 156.154.100.3#53(156.154.100.3) (UDP)
;; WHEN: Sat Oct 14 15:27:16 BST 2023
;; MSG SIZE rcvd: 215
No additional section here, so we'll need to resolve one of those nameservers.
However, if we pick ns-1828.awsdns-36.co.uk. we can just ask the uk.
nameservers again rather than going back to the root:
$ dig ns-1828.awsdns-36.co.uk. @156.154.101.3
; <<>> DiG 9.18.19 <<>> ns-1828.awsdns-36.co.uk. @156.154.101.3
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1223
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 3976ec588720dc3001000000652aa5b0bad98873c996aaa1 (good)
;; QUESTION SECTION:
;ns-1828.awsdns-36.co.uk. IN A
;; AUTHORITY SECTION:
awsdns-36.co.uk. 172800 IN NS g-ns-356.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-1511.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-932.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-1832.awsdns-36.co.uk.
;; ADDITIONAL SECTION:
g-ns-1832.awsdns-36.co.uk. 172800 IN A 205.251.199.40
g-ns-1511.awsdns-36.co.uk. 172800 IN A 205.251.197.231
g-ns-932.awsdns-36.co.uk. 172800 IN A 205.251.195.164
g-ns-356.awsdns-36.co.uk. 172800 IN A 205.251.193.100
g-ns-1832.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5307:2800::1
g-ns-1511.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5305:e700::1
g-ns-932.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5303:a400::1
g-ns-356.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5301:6400::1
;; Query time: 10 msec
;; SERVER: 156.154.101.3#53(156.154.101.3) (UDP)
;; WHEN: Sat Oct 14 15:29:04 BST 2023
;; MSG SIZE rcvd: 350
Getting there, we've now got down to the AWS DNS nameservers. Next!
$ dig ns-1828.awsdns-36.co.uk. @205.251.199.40
; <<>> DiG 9.18.19 <<>> ns-1828.awsdns-36.co.uk. @205.251.199.40
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8253
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;ns-1828.awsdns-36.co.uk. IN A
;; ANSWER SECTION:
ns-1828.awsdns-36.co.uk. 172800 IN A 205.251.199.36
;; AUTHORITY SECTION:
awsdns-36.co.uk. 172800 IN NS g-ns-1511.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-1832.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-356.awsdns-36.co.uk.
awsdns-36.co.uk. 172800 IN NS g-ns-932.awsdns-36.co.uk.
;; ADDITIONAL SECTION:
g-ns-1511.awsdns-36.co.uk. 172800 IN A 205.251.197.231
g-ns-1511.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5305:e700::1
g-ns-1832.awsdns-36.co.uk. 172800 IN A 205.251.199.40
g-ns-1832.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5307:2800::1
g-ns-356.awsdns-36.co.uk. 172800 IN A 205.251.193.100
g-ns-356.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5301:6400::1
g-ns-932.awsdns-36.co.uk. 172800 IN A 205.251.195.164
g-ns-932.awsdns-36.co.uk. 172800 IN AAAA 2600:9000:5303:a400::1
;; Query time: 12 msec
;; SERVER: 205.251.199.40#53(205.251.199.40) (UDP)
;; WHEN: Sat Oct 14 15:30:26 BST 2023
;; MSG SIZE rcvd: 338
We now have an IP address for ns-1828.awsdns-36.co.uk. and can answer our
original question:
$ dig www.barrucadu.co.uk @205.251.199.36
; <<>> DiG 9.18.19 <<>> www.barrucadu.co.uk @205.251.199.36
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62416
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.barrucadu.co.uk. IN A
;; ANSWER SECTION:
www.barrucadu.co.uk. 300 IN CNAME barrucadu.co.uk.
barrucadu.co.uk. 300 IN A 116.203.34.201
;; AUTHORITY SECTION:
barrucadu.co.uk. 172800 IN NS ns-1520.awsdns-62.org.
barrucadu.co.uk. 172800 IN NS ns-1828.awsdns-36.co.uk.
barrucadu.co.uk. 172800 IN NS ns-763.awsdns-31.net.
barrucadu.co.uk. 172800 IN NS ns-98.awsdns-12.com.
;; Query time: 12 msec
;; SERVER: 205.251.199.36#53(205.251.199.36) (UDP)
;; WHEN: Sat Oct 14 15:31:20 BST 2023
;; MSG SIZE rcvd: 212
And we're done, after 5 requests to other nameservers. But in practice, a recursive resolver would likely have some of those already cached and wouldn't need to fetch them again.
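The walk-through above can be sketched as a loop. This is a toy: the SERVERS table is a hypothetical in-memory stand-in for the network, loosely modelled on the dig output, and I'm assuming every referral comes with the nameserver's IP address (as if the additional section were always filled in), which skips the detour to resolve ns-1828.awsdns-36.co.uk.

```python
ROOT = "192.58.128.30"

# Hypothetical stand-in for the real network: server address -> what it knows.
SERVERS = {
    "192.58.128.30": {"delegations": {"uk.": ["156.154.100.3"]}, "records": {}},
    "156.154.100.3": {"delegations": {"barrucadu.co.uk.": ["205.251.199.36"]}, "records": {}},
    "205.251.199.36": {
        "delegations": {},
        "records": {
            "www.barrucadu.co.uk.": ("CNAME", "barrucadu.co.uk."),
            "barrucadu.co.uk.": ("A", "116.203.34.201"),
        },
    },
}


def contains(zone, name):
    return zone == "." or name == zone or name.endswith("." + zone)


def ask(server, name):
    """Pretend to send a query; returns (kind, payload)."""
    data = SERVERS[server]
    if name in data["records"]:
        return "answer", data["records"][name]
    for zone, nameservers in data["delegations"].items():
        if contains(zone, name):
            return "referral", (zone, nameservers)
    return "error", None


def best_servers(name, ns_cache):
    # Step 2: the longest cached zone which contains the name wins.
    zone = max((z for z in ns_cache if contains(z, name)), key=len)
    return ns_cache[zone]


def resolve(name, rr_cache, ns_cache, max_steps=20):
    for _ in range(max_steps):
        if name not in rr_cache:                       # step 1: cache miss
            kind, payload = "error", None
            for server in best_servers(name, ns_cache):
                kind, payload = ask(server, name)      # step 3
                if kind != "error":                    # step 4.d: else try the next
                    break
            if kind == "error":
                raise LookupError(f"no usable response for {name}")
            if kind == "referral":                     # step 4.b
                zone, nameservers = payload
                ns_cache[zone] = nameservers
                continue
            rr_cache[name] = payload                   # steps 4.a and 4.c both cache
        rtype, value = rr_cache[name]
        if rtype == "CNAME":                           # step 4.c: start again
            name = value
            continue
        return value
    raise LookupError("gave up: too many steps")


answer = resolve("www.barrucadu.co.uk.", rr_cache={}, ns_cache={".": [ROOT]})
```

The ns_cache starts with just the root hints, and every referral adds a closer zone to it, so later lookups under barrucadu.co.uk. can skip the root and uk. nameservers entirely.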
Zones
In the previous section, it looked very much like the DNS was broken up into
subtrees (or "zones", if you will) based on the label structure: the .
nameservers knew about the uk. nameservers, but couldn't answer queries about
subdomains of those directly; and similarly, the uk. nameservers knew about
the nameservers for barrucadu.co.uk., but not any of its other records.
This makes sense. Imagine if the root nameservers knew every DNS record! Their databases would be huge! It would be infeasible to run a handful of servers which know hundreds of millions of records and which the whole world uses.
So . is a zone. And uk. is a zone. And barrucadu.co.uk. is a zone. All
of the TLDs are zones, and every domain you can buy creates a new zone. A zone
can be bigger than a single label, e.g. foo.bar.baz.barrucadu.co.uk. is in
the barrucadu.co.uk. zone unless I delegate it to someone else, by creating
some NS records for, say, baz.barrucadu.co.uk.
That's exactly how registering a domain name works, by the way. The registrars
have privileged access to the TLD nameservers, and you pay them some money for
them to send a message to the nameservers saying "please delegate barrucadu to
these other nameservers".
Zones are traditionally represented in a textual format defined in RFC 1035.
You've seen this format before: it's the format dig gives its responses in and
it's the format of the root hints file.
Here's the zone file I use for my LAN DNS (which is served by resolved):
$ORIGIN lan.
@ 300 IN SOA @ @ 6 300 300 300 300
router 300 IN A 10.0.0.1
nyarlathotep 300 IN A 10.0.0.3
*.nyarlathotep 300 IN CNAME nyarlathotep
help 300 IN CNAME nyarlathotep
*.help 300 IN CNAME help
nas 300 IN CNAME nyarlathotep
bedroom.awair 300 IN A 10.0.20.187
living-room.awair 300 IN A 10.0.20.117
It's a list of records, but note that they all use relative domain names (no dot
at the end). I could write them as absolute domain names, but that would be
repetitive, and who doesn't want to golf their zone files? The $ORIGIN line
at the top is used to complete any relative names, and the @ is an alias for
the origin, so this zone file could also be written as:
lan. 300 IN SOA lan. lan. 6 300 300 300 300
router.lan. 300 IN A 10.0.0.1
nyarlathotep.lan. 300 IN A 10.0.0.3
*.nyarlathotep.lan. 300 IN CNAME nyarlathotep.lan.
help.lan. 300 IN CNAME nyarlathotep.lan.
*.help.lan. 300 IN CNAME help.lan.
nas.lan. 300 IN CNAME nyarlathotep.lan.
bedroom.awair.lan. 300 IN A 10.0.20.187
living-room.awair.lan. 300 IN A 10.0.20.117
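That expansion is mechanical enough to sketch in a few lines. The qualify and expand_record names are mine, and this only handles the fixed five-column layout used above with a single name in the record data: real zone files allow omitted fields, comments, and parentheses, and the SOA record has two names to expand, none of which this covers.

```python
NAME_RDATA_TYPES = {"CNAME", "NS"}  # record types whose data is a domain name


def qualify(name: str, origin: str) -> str:
    """Expand a zone-file name against $ORIGIN."""
    if name == "@":
        return origin  # @ is an alias for the origin
    if name.endswith("."):
        return name  # already absolute
    if origin == ".":
        return name + "."
    return f"{name}.{origin}"


def expand_record(line: str, origin: str) -> str:
    owner, ttl, rclass, rtype, *rdata = line.split()
    owner = qualify(owner, origin)
    if rtype in NAME_RDATA_TYPES:
        rdata = [qualify(rdata[0], origin)]
    return " ".join([owner, ttl, rclass, rtype, *rdata])
```

For example, expand_record("*.nyarlathotep 300 IN CNAME nyarlathotep", "lan.") qualifies both the owner name and the CNAME target, while an A record only has its owner name changed.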
Zones come in two types: authoritative (also just called a zone, or a master zone) and non-authoritative (also called hints). An authoritative zone has a SOA record, and causes the nameserver to give authoritative responses to questions which fall into that zone (see the next section).
Non-authoritative zones are primarily useful as a sort of permanent cache. Take
the root hints file for example: all recursive resolvers need to know the NS
records for the root zone (.). But they should not act as if they're
authoritative for it; they just know a little bit about it.
Since any nameserver could claim to be authoritative for any zone it wants, and
I'm sure malicious nameservers often do try to claim ownership of big sites like
google.com., how does the DNS work?
It works on trust.
You trust that the root nameservers will give you the correct nameservers for all the TLDs. You then, in turn, trust that the TLD nameservers will give the correct nameservers for the domains registered under those TLDs. And so on, all the way down to the domain you actually want to resolve.
Not every nameserver operator will be equally trustworthy or competent, so that trust does erode somewhat as you move further and further away from the root, but if you do some basic validation of DNS responses (e.g. validating that the answers you get are for domains you know to be delegated to this nameserver), you can do pretty well.
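That sort of validation is often called a bailiwick check. As a minimal sketch, assuming records are (name, type, value) tuples (a representation I've made up for this example):

```python
def in_bailiwick(name: str, zone: str) -> bool:
    """Is `name` equal to, or a subdomain of, `zone`?"""
    name, zone = name.lower(), zone.lower()
    # Matching on "." + zone stops notbarrucadu.co.uk. passing as a subdomain.
    return zone == "." or name == zone or name.endswith("." + zone)


def filter_records(records, delegated_zone):
    # Keep only records whose owner name falls inside the zone which this
    # nameserver was delegated; anything else is none of its business.
    return [r for r in records if in_bailiwick(r[0], delegated_zone)]
```

So if a nameserver you asked about barrucadu.co.uk. helpfully volunteers an A record for google.com., you just drop it.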
DNS doesn't "propagate"
When I first got into web development, the common wisdom was that DNS changes took 24 to 48 hours to "propagate". But having seen some details of the DNS protocol and how recursive resolution works, does that really make sense? Shouldn't changes be visible as soon as the TTL of the old record expires? And shouldn't new records be visible immediately? Why do changes need to propagate? Where do they propagate to?
Propagation implies a push model, where you make your changes and then they get sent to the resolvers which need them. But that's not what happens at all: instead, caches expire.
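To see why, here's roughly what a resolver's cache does, as a sketch. The TtlCache class is hypothetical, and the clock is injectable only so that the behaviour can be demonstrated without waiting five minutes.

```python
import time


class TtlCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def put(self, key, value, ttl):
        if ttl > 0:  # a TTL of zero means "use it now, but don't cache it"
            self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: the next lookup has to ask again
            return None
        return value


now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put(("barrucadu.co.uk.", "A"), "116.203.34.201", ttl=300)
now[0] = 299.0
still_cached = cache.get(("barrucadu.co.uk.", "A"))
now[0] = 300.0
after_expiry = cache.get(("barrucadu.co.uk.", "A"))
```

Nothing ever tells the cache that the record changed. The old value just stops being served once its TTL is up, and the next query pulls whatever the authoritative nameserver says at that point.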
Ok, there are two cases in which DNS does propagate:
- If you update your domain's NS records, your registrar needs to push those changes to the TLD nameservers. Apparently this used to be kind of slow, like, 20+ years ago. These days it's very fast.
- If you run a very high traffic authoritative nameserver, you'll operate multiple instances of it around the world to improve reliability and latency. So if you change a record, that change needs to be pushed out to all your servers. But this should take under a minute unless something is very wrong.
My hunch is that this 24 to 48 hour window came from:
- Registrars being slow to update the TLD nameservers once upon a time
- ISPs running notoriously poorly-behaving nameservers
Ah, ISP DNS. Almost the first thing any self-respecting nerd changes when setting up a new home network. They often do nefarious things like redirect misspelled domain names to ad-covered search pages, trying to profit off your typos. And, as it turns out, a lot of them ignore TTLs, and will cache something for a long period if they feel like it.
How long? Well, I've seen reports of 24 hours...
Well, no matter what the cause of the occasional slow DNS update is, "propagation" is the wrong mental model. (Not that I've experienced slow DNS updates in a very long time, and updates are evidently fast enough for changing an A record to be considered a viable failover mechanism for big sites.)
DNS is pull, not push.