Note: this is a very high-level introduction to the Domain Name System (DNS), so don’t treat it as a complete study “resource”.
Whenever we type https://my.site.com, the browser doesn’t know where to go, because to visit a website it has to know the IP address the site lives on. It’s pretty similar to how we tap a name in our phone’s contacts to call someone even though we don’t really remember their phone number (you’d have to be Mike Ross to remember everyone’s phone number).
That’s why the browser asks a server for the IP address, and that server is what we call a DNS server. The flow is simple:
We ask the DNS server for the IP of our target website -> the DNS server gives a response -> done. Right? That’s not incorrect, but for the sake of learning we will assume the case in which our DNS server doesn’t know the IP of the target domain.
Note: DNS servers only work with the domain name (my.site.com). They don’t care about the protocol (https://), the port, or the path.
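To see which part of a URL actually reaches DNS, here is a small Python sketch. The helper name dns_name is made up for this post; urlsplit is the standard-library URL parser.

```python
from urllib.parse import urlsplit


def dns_name(url: str) -> str:
    """Return the only part of a URL that DNS ever sees: the host name."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host name found in {url!r}")
    return host


# The scheme, port, path and query string are all dropped.
print(dns_name("https://my.site.com:8443/blog?page=2"))
```

Whether we type http:// or https://, the name handed to DNS is the same my.site.com.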
From the Beginning
When we type my.site.com, first and foremost our system hands the name to its
stub resolver [an operating system component that performs DNS name resolution for applications on the machine, also called the DNS client].
The stub resolver searches its own cache first [if we visited the site recently, the IP will still be stored there]. Again, for the sake of learning we are assuming that everything goes wrong, so that we get to see the whole process.
Suppose we have never visited this website before. Then what? In that scenario the stub resolver asks the DNS server it has been configured with. Our machine’s network configuration contains, along with its own IP address, the address of a DNS server it can talk to. That address usually comes from the router or the ISP when we join a network, or we can set it by hand. A very popular manual choice is Google Public DNS at 8.8.8.8, so for the rest of this post we will assume that our machine uses it.
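From a program’s point of view, all of this hides behind a single call into the operating system. A minimal Python sketch: resolve is a helper name made up for this post, and socket.getaddrinfo is the standard-library call that hands the name to the system’s resolver.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the addresses of a host name."""
    # proto=IPPROTO_TCP keeps the result to one entry per address.
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("my.site.com") would go through the cache and the configured DNS server exactly as described above. Passing something that is already an IP address, like resolve("127.0.0.1"), returns it unchanged without any DNS query.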
Our stub resolver sends a DNS query to 8.8.8.8 and asks for the IP address
of my.site.com. At this stage almost all DNS queries can be
answered straight away. I mean, it’s Google! But let’s say we are trying to visit some
very obscure website that no one has looked up through Google recently. If Google’s DNS server doesn’t have the answer, then who does?
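To make “sends a DNS query” a bit more concrete, here is a rough Python sketch of what such a query looks like on the wire. The helper names build_query and ask are made up for this post, the query ID 0x1234 is arbitrary, only A records are requested, and the response is returned as raw bytes without being parsed.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = "recursion desired"), 1 question,
    # and 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is sent label by label, each prefixed with its length,
    # and terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def ask(server: str, hostname: str, timeout: float = 3.0) -> bytes:
    """Send the query to server over UDP port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        response, _ = sock.recvfrom(512)
        return response
```

With network access, ask("8.8.8.8", "my.site.com") would send this packet to Google’s server. Notice that the whole name travels as plain bytes.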
Google’s DNS server is a recursive DNS server (also called a recursive resolver), which means it performs
the whole lookup on our behalf. It queries other DNS servers,
follows their referrals further down the tree,
and eventually returns the IP the client was looking for. It’s basically
a situation of “I don’t know this but I know a guy who knows a guy who knows a guy who maybe knows the IP for this query”.
If anyone anywhere looked up this site through Google recently, the IP is probably still on the DNS server, because DNS servers also have their own cache. (We are assuming that the answer isn’t cached here either, so the recursive server has to start from the top of the hierarchy.)
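Cached answers don’t live forever: each DNS record comes with a time to live (TTL) in seconds, and the cache has to forget the record once that time is up. A toy Python sketch of such a cache, where the class name DnsCache and the address 203.0.113.10 (taken from a range reserved for documentation) are made up for this post:

```python
import time


class DnsCache:
    """A toy DNS cache that forgets entries once their TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the cache can be tested without waiting
        self._entries = {}       # name -> (ip, expiry time)

    def put(self, name: str, ip: str, ttl: float) -> None:
        # Domain names are case-insensitive, so store them in lower case.
        self._entries[name.lower()] = (ip, self._clock() + ttl)

    def get(self, name: str):
        key = name.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None          # never seen: a cache miss
        ip, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None          # seen, but too old: also a miss
        return ip


cache = DnsCache()
cache.put("my.site.com", "203.0.113.10", ttl=300)
print(cache.get("MY.SITE.COM"))
```

A miss (None) is exactly the “we didn’t find our IP here” case: the resolver has to go and ask someone else.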
DNS Hierarchy & Root Servers
At the top of the DNS hierarchy, we have the root servers.
There are 13 named root servers (a.root-servers.net through m.root-servers.net), run by 12 organizations and backed by well over a thousand physical servers
spread across the planet. The lookup for any domain, no matter how deep or hidden it is,
can always start at these servers, but the root servers don’t resolve
the domains themselves. They only deal with top-level domains [TLDs]. .com, .net, and .co
are all examples of top-level domains. For each TLD, the root servers
hold a list of the DNS servers that handle it.
For “my.site.com”, the root server identifies
“.com” in our query and replies with a referral: the names and IP addresses
of the DNS servers responsible for “.com”. In other words, the root server points the
client towards the DNS servers that hold the information about the TLD present
in our request. We call these servers top-level domain (TLD) servers.
Now, after getting the IP address of a .com TLD server, 8.8.8.8 sends its query to that TLD server. But does the TLD server know the IP address we are actually looking for? Remember that our website name is my.site.com.
The TLD server has NS records for the second-level domains [SLDs] under .com,
such as google.com, portflex.com, and amazon.com. In our case it’s site.com.
These records point to the authoritative name servers responsible for
those domains.
Finally, Google DNS queries the authoritative name servers for the site.com zone, which return the DNS records for my.site.com (such as A, AAAA, or CNAME), after which the result is cached and returned to the stub resolver. But what’s the my. in our my.site.com?
Note: The authoritative servers don’t always answer with an IP address. An A record holds an IPv4 address and an AAAA record holds an IPv6 address, but the answer can also be a CNAME (an alias that points to another name) or an NXDOMAIN response code, which means the name doesn’t exist.
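The whole walk from the root down to the authoritative server can be sketched with a few dictionaries. This is a toy model in Python, not how a real resolver is implemented: every server name and IP address below is made up (the IP comes from a range reserved for documentation), and a real resolver talks to these servers over the network instead of reading dictionaries.

```python
# Toy stand-ins for the three levels of servers.
ROOT_SERVER = {"com": "tld-server-for-com"}
TLD_SERVERS = {"tld-server-for-com": {"site.com": "ns-for-site-com"}}
AUTHORITATIVE_SERVERS = {"ns-for-site-com": {"my.site.com": "203.0.113.10"}}


def resolve_from_root(name: str):
    """Walk root -> TLD -> authoritative and return (ip or None, steps taken)."""
    name = name.lower().rstrip(".")
    labels = name.split(".")
    steps = []

    # 1. The root server only looks at the TLD and refers us onward.
    tld_server = ROOT_SERVER.get(labels[-1])
    if tld_server is None:
        return None, steps
    steps.append(f"root: for .{labels[-1]} ask {tld_server}")

    # 2. The TLD server knows who is authoritative for the second-level domain.
    zone = ".".join(labels[-2:])
    auth_server = TLD_SERVERS[tld_server].get(zone)
    if auth_server is None:
        return None, steps
    steps.append(f"{tld_server}: for {zone} ask {auth_server}")

    # 3. The authoritative server holds the record, or says the name doesn't exist.
    ip = AUTHORITATIVE_SERVERS[auth_server].get(name)
    steps.append(f"{auth_server}: {name} -> {ip or 'NXDOMAIN'}")
    return ip, steps


ip, steps = resolve_from_root("my.site.com")
for step in steps:
    print(step)
```

Asking for a name the authoritative server has never heard of, such as nope.site.com, goes through the same three steps and ends in NXDOMAIN.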
Subdomains
The my. portion of my.site.com is called a subdomain. It’s similar to a directory within
a larger domain. You can create multiple subdomains under the second-level
domain, and each subdomain can have different content and point users to
different servers. For example: blog.site.com can host a blog,
shop.site.com can sell products, portfolio.site.com can
be a personal portfolio, etc.
You can create as many subdomains as you want, and each can point to a different IP address or even be hosted on a completely different server. Companies use subdomains to separate different parts of their website or for staging/testing environments. Subdomains can also be nested, as in dev.my.site.com.
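The pieces of a name can be pulled apart by splitting on the dots. A naive Python sketch follows. split_name is a helper name made up for this post, and it assumes the TLD is a single label, which holds for .com but not for suffixes such as .co.uk.

```python
def split_name(name: str) -> dict:
    """Split a domain name into TLD, second-level domain and subdomain."""
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"{name!r} has no second-level domain")
    return {
        "tld": labels[-1],
        "second_level_domain": ".".join(labels[-2:]),
        # Everything to the left of the second-level domain; empty if there is none.
        "subdomain": ".".join(labels[:-2]),
    }


print(split_name("my.site.com"))
print(split_name("dev.my.site.com"))
```

For dev.my.site.com the subdomain part comes out as dev.my, which shows that subdomains can be nested.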
DNS Security and Encryption
By the way, this whole process is insecure by default. Our
system’s stub resolver sends its requests over UDP port 53 without encryption.
This means anyone on the network path can sniff the traffic and see which domains we query!
An attacker could even intercept the response that 8.8.8.8
sends and replace it with one that carries the IP address of another
website, sending the user to a malicious server. This is called DNS spoofing. Also,
just like malicious users, our ISPs can see the queries our stub
resolver sends. So how do we solve this issue? There is
something called DoH [DNS over HTTPS]. We know HTTPS is the secure version of
HTTP, which encrypts the traffic between our browser and a website. DoH sends
our DNS queries over that same kind of encrypted connection, so people on the
network can no longer read them or tamper with them. This is DNS.
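For the curious, a DoH lookup can be tried with a few lines of Python. This sketch assumes Google’s public JSON endpoint at https://dns.google/resolve with its name and type parameters. The helper names doh_url and doh_lookup are made up for this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: Google's public JSON API for DNS over HTTPS.
DOH_ENDPOINT = "https://dns.google/resolve"


def doh_url(hostname: str, record_type: str = "A") -> str:
    """Build the HTTPS URL that carries the DNS question."""
    return f"{DOH_ENDPOINT}?{urlencode({'name': hostname, 'type': record_type})}"


def doh_lookup(hostname: str, record_type: str = "A", timeout: float = 5.0) -> list:
    """Send the question over HTTPS and return the data of each answer record."""
    request = Request(
        doh_url(hostname, record_type),
        headers={"Accept": "application/dns-json"},
    )
    with urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # "Answer" is missing when there are no records (for example on NXDOMAIN).
    return [answer["data"] for answer in body.get("Answer", [])]


print(doh_url("my.site.com"))
```

With network access, doh_lookup("my.site.com") would fetch the answers. The list may contain names as well as IP addresses if CNAME records are part of the answer. The question now travels inside an ordinary HTTPS request instead of a plain UDP packet on port 53.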