Detecting DNS Tunneling/Exfiltration with Elastic machine learning

Patrick Cosic

21 September 2020

If you have never heard of DNS tunneling or exfiltration, or have already encountered it in your enterprise network and are wondering how to detect it, you have come to the right place. In this blog post, we explain what a DNS exfiltration or tunneling attack is, why it has been difficult to detect, and how the built-in machine learning capabilities of Elasticsearch can detect it successfully.

The anatomy of the DNS Tunneling/Exfiltration attack

The DNS tunneling attack, also known as DNS exfiltration, is characterized by sending encoded data hidden in DNS queries to the attacker's DNS server. In the following diagram, you can see that the hijacked host splits the payload into small fragments, encodes each fragment as a subdomain, and sends it as a query to the attacker. In reality, this can amount to thousands or even millions of queries. On the other side, the attacker collects these queries, decodes them, and reassembles the payload.

This attack usually stays unnoticed for the following reasons:

Outbound DNS traffic is rarely blocked, because virtually all applications depend on it.
DNS generates a lot of traffic. If firewalls had DNS rules, they would have to inspect every packet closely, which is impractical without a big-data platform. This is why the traffic is often not monitored at all, making it a perfect breeding ground for malicious behavior.
Sending data via tunneling or exfiltration is very inefficient and slow, so it can take weeks to complete. This further increases the chance that it goes undetected in the enterprise network.

So far we have used the terms tunneling and exfiltration interchangeably. In principle, tunneling differs only in that the attacker's DNS server also sends a response, which makes it possible to carry other protocols such as SSH or FTP inside DNS. The Machine Learning Job detects both.

Architecture and configuration of the DNS attack detection

Next, we show the detection method. The Machine Learning Job is one of the out-of-the-box ML jobs from the Elastic SIEM app. We adapted and tuned it for better results. To reduce false positives and minimize noise, we built custom rules into the anomaly detection. Furthermore, instead of analyzing the whole domain, we separate the subdomain from the second-level domain with an ingest pipeline, which makes a more precise analysis possible. Each of these topics is covered in more detail below.

Ingest DNS Traffic in Elasticsearch

First, we need to ingest DNS traffic data into Elasticsearch so that it can be monitored. For this we used Packetbeat, which lets us analyze the transmitted packet data.

If you are planning to use other data sources, make sure to provide at least these ECS fields:

destination.ip
dns.question.registered_domain
host.name
dns.question.name
event.dataset
agent.type
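
If your DNS events come from a source other than Packetbeat, a quick check like the following can confirm that a document carries the fields listed above. This is only a minimal sketch: the helper names get_field and missing_ecs_fields are made up for illustration, and the sample event is invented.

```python
# Fields the article lists as the minimum for the detection job.
REQUIRED_ECS_FIELDS = [
    "destination.ip",
    "dns.question.registered_domain",
    "host.name",
    "dns.question.name",
    "event.dataset",
    "agent.type",
]


def get_field(doc, dotted_path):
    """Return the value at a dotted path in a nested dict, or None if absent."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def missing_ecs_fields(doc):
    """List the required ECS fields that the document does not contain."""
    return [f for f in REQUIRED_ECS_FIELDS if get_field(doc, f) is None]


# Invented sample event, shaped like a nested ECS document.
event = {
    "destination": {"ip": "10.0.0.53"},
    "dns": {"question": {"name": "www.example.com",
                         "registered_domain": "example.com"}},
    "host": {"name": "workstation-01"},
    "event": {"dataset": "dns"},
    "agent": {"type": "packetbeat"},
}
print(missing_ecs_fields(event))
```

An empty list means the event has everything the job needs; any returned field names have to be mapped in your own ingestion before the job can use the data.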

At this point, it is important to configure two things in the packetbeat.yml file. First, the protocol we want to monitor, in this case DNS on port 53, with authorities and additionals included for extra information (the include_authorities and include_additionals options).

Second, the ingest pipeline that enriches the data with the subdomain and geo information. It is important to specify this pipeline, otherwise no indices are created.
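
To illustrate what separating the subdomain from the registered domain means, here is a small Python sketch. It is not the ingest pipeline itself, which runs inside Elasticsearch. The function split_domain and its suffix_labels parameter are hypothetical, and the sketch simply assumes that you know how many labels the public suffix has, whereas a real implementation would consult the public suffix list.

```python
def split_domain(query_name, suffix_labels=1):
    """Split a DNS name into (subdomain, registered_domain).

    suffix_labels is the number of labels in the public suffix:
    1 for "com", 2 for "co.uk". This is an assumption of the sketch;
    a production setup would look the suffix up instead.
    """
    labels = query_name.rstrip(".").split(".")
    keep = suffix_labels + 1  # public suffix plus the registered label
    if len(labels) <= keep:
        return "", ".".join(labels).lower()
    subdomain = ".".join(labels[:-keep])  # case kept: it may carry encoded data
    registered = ".".join(labels[-keep:]).lower()
    return subdomain, registered


print(split_domain("ZGF0YQ.tunnel.example.com"))
print(split_domain("a.b.example.co.uk", suffix_labels=2))
```

Analyzing only the subdomain part keeps the registered domain, which is identical across all queries of one tunnel, from diluting the measurement.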

Anomaly detection in Elasticsearch

With that data stored in Elasticsearch, we have laid the foundation for a machine learning job. Showing the complete machine learning configuration would go beyond the scope of this blog post, so we only show parts of it, but still enough to fully understand it.

The Machine Learning job is configured as a population analysis with the high_info_content function. When you use this function in a detector of your anomaly detection job, it models the information that is present in the subdomain string. It detects anomalies where the information content is unusually high compared to the other values of “highest_registered_domain” (dns.question.name.etld_plus_one). There are three main information content functions:

info_content: detects unusually high or unusually low amounts of information in a string.
low_info_content: detects only unusually low amounts of information in a string.
high_info_content: detects only unusually high amounts of information in a string.

Since we are most interested in high amounts of information in subdomains, we use high_info_content as the function in the Machine Learning Job, as shown in the picture above.
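
To get a feeling for what "amount of information" in a subdomain means, the following sketch uses Shannon entropy as a rough stand-in. This is an assumption made purely for illustration: it is not the algorithm Elastic uses internally, and the function name shannon_information_bits is made up.

```python
import math
from collections import Counter


def shannon_information_bits(text):
    """Rough information estimate: per-character Shannon entropy times length."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    per_char = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_char * n


# An ordinary subdomain versus an encoded-looking one (both invented).
for subdomain in ["www", "mail", "ZXhmaWx0cmF0ZWQtZGF0YS1jaHVuay0wMDE"]:
    print(subdomain, round(shannon_information_bits(subdomain), 1))
```

Long, random-looking subdomains score far higher than everyday names such as www or mail, and it is this kind of gap that a high_info_content detector is designed to surface.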

Now, to test whether everything works and detects correctly, I ran a script on one of the hosts. Its queries imitate a DNS data exfiltration. In the screenshot above you can clearly see the Base64-encoded subdomain.
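
The script itself is not shown here, so the following is only a hedged sketch of how such test query names could be built, not the script that was actually used. The function build_query_names, the sequence-number prefix, and the domain example.test are assumptions for illustration. The sketch only constructs the names and does not send any queries.

```python
import base64

MAX_LABEL_LEN = 63  # DNS limit for the length of a single label


def build_query_names(payload, domain, chunk_size=30):
    """Encode a payload into DNS names, one payload chunk per name."""
    names = []
    for seq, start in enumerate(range(0, len(payload), chunk_size)):
        chunk = payload[start:start + chunk_size]
        # URL-safe Base64 without padding avoids "+", "/" and "=" in the label.
        label = base64.urlsafe_b64encode(chunk).decode("ascii").rstrip("=")
        if len(label) > MAX_LABEL_LEN:
            raise ValueError("chunk_size too large for one DNS label")
        names.append(f"{seq}.{label}.{domain}")
    return names


for name in build_query_names(b"harmless test data for the detector " * 2,
                              "example.test"):
    print(name)
```

Resolving names like these from a lab host against a domain you control should be enough to produce the cryptic subdomains that the job is meant to flag.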

The screenshot below shows the Anomaly Explorer, which is basically the dashboard of a machine learning job. In the center of the image, in the Anomaly Timeline, we see a heatmap whose cells are colored by severity, for example red, yellow, or light blue.

In the upper left corner, we can see the top influencers: the IP addresses that might be involved, or domains that already look anomalous at a glance. What matters most for further analysis, however, is knowing on which host the anomaly was detected. We can now see that an anomaly has been detected in our selected time period that is rated 100 times higher than usual. To start a further analysis and check whether it is actually a DNS attack or a false positive, we can open the host in the SIEM app by clicking on Host Details.

Now we have filtered all events on the domain for the period of the detected anomaly and see all the relevant packets. You can clearly see that a cryptic subdomain has been used here, which indicates a serious threat that should be investigated.

Another important difference from tunneling is that exfiltration needs no response. It even works when the query is answered with NXDOMAIN, because only the queries carry data.

How we beat background noise

One of the most common challenges is to display only the important anomalies, with as few false positives as possible. We took the top 100 domains of the Majestic Million and configured a custom rule to exclude possible false positives from these highly trusted websites. The filter is called safe_domains and can be created or extended after the Machine Learning Job has been created, if necessary. So if another website is flagged as a false positive in the future, you can exclude that specific domain with just a few clicks.

As we see in the screenshot above, we also use a rule that skips all results with a value below 350, which reduces the amount of noise further.
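
The combined effect of the two rules can be pictured as a simple filter. The sketch below mimics that logic in plain Python and is not the job configuration itself. The name keep_result and the two example domains are placeholders, and the safe-domain list here is only an invented excerpt.

```python
# Invented excerpt; the real filter holds the top 100 Majestic Million domains.
SAFE_DOMAINS = {"google.com", "microsoft.com"}
MIN_VALUE = 350  # threshold below which results are skipped


def keep_result(registered_domain, value,
                safe_domains=SAFE_DOMAINS, min_value=MIN_VALUE):
    """Return True if a result should still be reported after both rules."""
    if registered_domain.lower() in safe_domains:
        return False  # rule 1: trusted domain, skip the result
    if value < min_value:
        return False  # rule 2: below the threshold, skip the result
    return True


print(keep_result("google.com", 900))
print(keep_result("example.test", 120))
print(keep_result("example.test", 900))
```

Only a result that passes both checks would reach the Anomaly Explorer, which is why extending safe_domains is an effective way to silence a recurring false positive.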

Summary

As we have seen, with a few clicks and a bit of JSON, we were able to create a Machine Learning Job that gave us a whole new insight into our DNS traffic, without having to understand much of the algorithms or the math behind it. We are able to automatically detect DNS anomalies in our network and take early action. And this is just the beginning: we will be able to build more dashboards based on the data we already have and speed up the process. We are also able to implement watchers that automatically notify us, e.g. by email, if something is wrong in our network.