
Machine Clicks

Target Audience: Email marketers
Estimated read time: 20 minutes

Like opens, clicks are an important metric of engagement. In this article we look specifically at machine clicks in email.

What are Machine Clicks?

We usually expect every click registered on a URL in an email to come from end-user interaction. However, security applications installed on the recipient’s system, or on the recipient’s receiving email platform, can also visit the links in a recipient’s email. This causes the sending platform to count these “machine” clicks in addition to the clicks made by the actual recipient.

Mail systems usually count two kinds of click statistics: unique clicks per link and total clicks per link. Unique click statistics are fairly accurate. Total clicks per link (where a recipient clicks on a link multiple times), however, are usually skewed by machine clicks.

How to track clicks in an email

Click tracking starts from a source message whose links point directly to a specific website. The sending platform identifies these links and rewrites them so that they pass through the system used to send the message. When a link is clicked, the platform's web server receives the details and parameters carried by the link. This allows the platform to identify the system, the message, the recipient and the actual link clicked. It also captures basic details of the browser used to visit the link, the clicker’s source IP address, and the date and time of the click event. The statistics database stores all of these details. The web server then answers with a redirection that sends the recipient to the original link.
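The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular sending platform is built: the tracking address `TRACKING_BASE`, the in-memory `LINKS` table, the `r` recipient parameter and both function names are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical tracking endpoint and in-memory link table (assumptions for illustration).
TRACKING_BASE = "http://example.com/redirect"
LINKS = {}  # link id -> original target URL


def build_tracking_url(link_id, target_url, recipient_id):
    """Remember the original target and return the rewritten tracking URL."""
    LINKS[link_id] = target_url
    return TRACKING_BASE + "?" + urlencode({"link": link_id, "r": recipient_id})


def handle_click(tracking_url, source_ip, user_agent, clicked_at):
    """Build the click record and look up the original target for the redirect."""
    params = parse_qs(urlparse(tracking_url).query)
    link_id = params["link"][0]
    event = {
        "link": link_id,
        "recipient": params["r"][0],
        "ip": source_ip,
        "user_agent": user_agent,
        "time": clicked_at,
    }
    # A real web server would store the event, then answer with a 302 to this URL.
    return event, LINKS[link_id]


url = build_tracking_url("abc123", "https://shop.example/offer", "user42")
event, target = handle_click(url, "98.51.100.36", "Mozilla/5.0", "2019-03-03T22:06:33")
print(url)
print(target)
```

Note that the server only sees what the request carries. Nothing in the record says whether a person or a program followed the link.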

All this happens in a fraction of a second, and is globally transparent to the recipient.

What does a click look like technically?

Filtering out machine clicks is extremely difficult. When a link is clicked, the web server records technical information such as:

example.com:80 98.51.100.36 - - [03/Mar/2019:22:06:33 -0100] "GET /redirect?link=kYhjd36684679Gopkfdgjhf_htwVylYcl
l=D0I3r8wfwlIrtfku43I0IEk3felfqy&s=RBMKHJZJHECCFGRAFOF HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/536.39 (KHTML, like Gecko) Chrome/64.0.3386.130 Safari/536.39 Edge/18.17633"

This can be broken down into the following useful pieces of information:

  • example.com:80: This request to the system is on port 80 using the delegated domain called example.com
  • 98.51.100.36: The public IP address of the clicker
  • [03/Mar/2019:22:06:33 -0100]: The time of the actual call
  • GET /redirect?link=kYhjd36684679Gopkfdgjhf_htwVylYcl HTTP/1.1: The URL called on the example.com domain using the protocol HTTP/1.1
  • Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/536.39 (KHTML, like Gecko) Chrome/64.0.3386.130 Safari/536.39 Edge/18.17633: The browser identification string.
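As a rough illustration, a log line in this layout can be split into those fields with a regular expression. The pattern below is an assumption based only on the sample lines in this article. Other web servers or log configurations will need a different pattern, and the names `LOG_PATTERN` and `parse_click` are made up for the sketch.

```python
import re
from datetime import datetime

# Pattern written against the sample lines in this article (an assumption, not a standard).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

SAMPLE_LINE = (
    'example.com:80 98.51.100.36 - - [03/Mar/2019:22:06:33 -0100] '
    '"GET /redirect?link=MVhld34628972JprnGmWsOq_jepgownrk HTTP/1.1" 302 - "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/536.39 '
    '(KHTML, like Gecko) Chrome/64.0.3386.130 Safari/536.39 Edge/18.17633"'
)


def parse_click(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Turn the bracketed timestamp into a timezone-aware datetime.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields


click = parse_click(SAMPLE_LINE)
print(click["host"], click["ip"], click["status"])  # example.com:80 98.51.100.36 302
```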

All this looks normal. The IP address is legitimate (though anonymized in this example). The link redirection values l and s are correct. The browser identification string, which is free text sent voluntarily by the clicker and not under the ESP’s control, states that the request came from a legitimate end-user browser.

[Image: a robot playing a piano. Photo by Franck V. on Unsplash]

Why does the system not filter out these clicks?

Compare this request with another:

example.com:80 98.51.100.36 - - [03/Mar/2019:22:06:33 -0100] "GET /redirect?link=MVhld34628972JprnGmWsOq_jepgownrk HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/536.39 (KHTML, like Gecko) Chrome/64.0.3386.130 Safari/536.39 Edge/18.17633"
example.com:80 34.232.127.0 - - [03/Mar/2019:22:08:15 -0100] "GET /redirect?link=MVhld34628972JprnGmWsOq_jepgownrk HTTP/1.1" 302 - "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/18.17763"

The first link is our example link. The second one is an anonymised real machine click. Can we spot the difference?

  • The target domain is correct.
  • The public IP address of the clicker is valid
  • The time stamp is correct
  • The URLs are identical
  • The browser identification strings are slightly different (different browser versions) but both indicate a legitimate recipient browser.
  • The two clicks came about 2 minutes apart.

There is no immediately detectable difference between these two clicks.

The only real difference is the public IP address of the clicker. The first link is the real recipient click. The second we believe is a machine click, as the IP address corresponds to one of Amazon Web Services’ IP ranges.

IP version 4, the most widely used IP addressing system, allows for about 4.3 billion different addresses. These can be broken down into blocks or “ranges”. These ranges are not listed in any central database and can be reallocated on the fly, so a static map of ranges is not technically feasible and cannot be implemented as a block list with 100% accuracy. In addition, some companies may use AWS IPs as their internet endpoint, for example through some VPNs. There is therefore no direct way to differentiate legitimate clicks from automated clicks initiated by security software, which follows the link to scan the target content for malware and needs to impersonate a user to be sure it sees the same content as the end user.
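For illustration, checking an address against a list of ranges takes only a few lines with Python's standard `ipaddress` module. The range below is a placeholder chosen to cover the sample address in this article. It is an assumption, not a verified list of any provider's networks, and a match remains a hint rather than proof.

```python
import ipaddress

# Placeholder range for illustration only (assumption): not a verified provider list.
SUSPECT_RANGES = [ipaddress.ip_network("34.232.0.0/16")]


def in_suspect_range(ip):
    """Return True if the address falls inside any of the listed ranges."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in SUSPECT_RANGES)


print(in_suspect_range("34.232.127.0"))  # True
print(in_suspect_range("98.51.100.36"))  # False
```

A real person browsing through a VPN hosted in such a range would be flagged by exactly the same check.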

The two known types of machine clicks: Pre-Reception and Post-Click

Pre-Reception

Certain security software on corporate email servers scans the content of incoming emails and checks any links present for malicious content. This happens before the email is delivered to the recipient’s inbox. Generally, all links are scanned in one block, all within the same second.

One known email protection provider that scans emails in this way is Mimecast. The recipient’s incoming email server is defined as Mimecast, acting as their email security software, so Mimecast processes all mail sent to the recipient’s email address. It runs security and anti-spam scanning before sending the email on to the recipient’s real mail server. In effect, Mimecast sits “transparently” between the sending MTAs and the recipient’s receiving MTA, as set up by the client’s IT team.

This can generate a mass of clicks on all mails sent to recipients who use Mimecast scanning on incoming emails. It also has a strange secondary effect. Mimecast scans the links in an incoming email, and the sending platform records these scans as valid clicks. If Mimecast accepts the mail after successful scanning, it forwards it to the recipient’s actual mail server. If the email then bounces for any reason, the clicks are still counted as if there had been engagement with the message, and a non-delivery bounce is recorded several seconds to several minutes after the clicks.

This depends on how the client’s infrastructure is set up. There is no way around this, as all the data is valid. However, it is confusing!

Therefore, a cluster of clicks on multiple links in a mail, from an identical IP address in under 2 seconds, is probably made up of machine clicks.
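A minimal sketch of this heuristic is shown below. It assumes click records are dictionaries with `message`, `ip`, `link` and `time` keys. Those field names and the function name are hypothetical, and the 2 second window is simply the figure quoted above.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_burst_clicks(clicks, window=timedelta(seconds=2)):
    """Return clicks that hit two or more links of one message from one IP
    address in under `window`."""
    groups = defaultdict(list)
    for click in clicks:
        groups[(click["message"], click["ip"])].append(click)

    flagged = []
    for group in groups.values():
        group.sort(key=lambda c: c["time"])
        start = 0
        for end in range(1, len(group) + 1):
            # Close the current cluster at the end of the group, or when the
            # next click is no longer within the window of the cluster's start.
            if end == len(group) or group[end]["time"] - group[start]["time"] >= window:
                cluster = group[start:end]
                if len({c["link"] for c in cluster}) >= 2:
                    flagged.extend(cluster)
                start = end
    return flagged


base = datetime(2019, 3, 3, 22, 6, 33)
clicks = [
    {"message": "m1", "ip": "34.232.127.0", "link": "a", "time": base},
    {"message": "m1", "ip": "34.232.127.0", "link": "b", "time": base + timedelta(milliseconds=400)},
    {"message": "m1", "ip": "34.232.127.0", "link": "c", "time": base + timedelta(milliseconds=900)},
    {"message": "m1", "ip": "98.51.100.36", "link": "a", "time": base + timedelta(minutes=5)},
]
print(len(find_burst_clicks(clicks)))  # 3
```

The single later click from a different address is left alone, since one click on one link does not form a cluster.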

Post-Click

Large freemail providers, such as Microsoft (outlook.com, hotmail, live) and Google’s Gmail, use Post-Click scanning. Other corporate security software, such as Proofpoint, implements it too.

Mails in a recipient’s inbox have all their links rewritten to point to a different URL, and the software stores the original link from the received email. Outlook.com rewrites all links so that they go through safelinks.protection.outlook.com. Clicking on the link takes you to this domain, which then redirects to the link that was originally present in the received email.

Microsoft then directs the recipient to the actual link: via Safelinks to the sending system, which records the click, and from there to the target link. So far, so good.

The issue is that several minutes later, a second and possibly a third click will be recorded, also with valid browser identification details. This time, though, the click comes from an IP range that belongs to Microsoft.

Gmail works in the same way. The initial click on a link takes you to the correct web page. Then a second and possibly third click comes through, but from a Google IP range.

Post-Click Analysis

Post-Click analysis is usually run by companies that receive a high volume of emails, where click rates may be comparatively low. To avoid scanning every link in every incoming message for malicious content, the provider notes that a link was clicked and sends the recipient to the correct target. In parallel, it adds the target URL to a scanning queue. Later it clicks on the link itself and may generate a security alert if malicious content is discovered, thereby protecting other users later on without overloading its processing infrastructure.

In this case, you will first see only one click for a message for a given user. Within 10 minutes, you will see one to three extra clicks, possibly coming from different IP ranges. The first click is valid. The additional clicks from a different IP address within that 10 minute period are probably machine-generated clicks.
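This rule can be sketched as follows, assuming the clicks passed in all belong to one recipient and one message, and that each record is a dictionary with `ip` and `time` keys. The function name and field names are hypothetical, and the 10 minute window is the figure quoted above.

```python
from datetime import datetime, timedelta


def split_follow_up_clicks(clicks, window=timedelta(minutes=10)):
    """Split one recipient's clicks on one message into (likely_human, likely_machine).

    The earliest click is kept as legitimate. Later clicks from a different IP
    address within `window` of that first click are treated as probable machine clicks.
    """
    ordered = sorted(clicks, key=lambda c: c["time"])
    if not ordered:
        return [], []
    first = ordered[0]
    likely_human, likely_machine = [first], []
    for click in ordered[1:]:
        within_window = click["time"] - first["time"] <= window
        if within_window and click["ip"] != first["ip"]:
            likely_machine.append(click)
        else:
            likely_human.append(click)
    return likely_human, likely_machine


base = datetime(2019, 3, 3, 22, 6, 33)
clicks = [
    {"ip": "98.51.100.36", "time": base},
    {"ip": "34.232.127.0", "time": base + timedelta(minutes=3)},
    {"ip": "34.232.127.9", "time": base + timedelta(minutes=7)},
    {"ip": "98.51.100.36", "time": base + timedelta(hours=2)},
]
human, machine = split_follow_up_clicks(clicks)
print(len(human), len(machine))  # 2 2
```

This is a probability call, not a certainty. A recipient who switches from office Wi-Fi to mobile data and clicks again within the window would be misclassified.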

Known networks that generate machine clicks

This is a non-exhaustive CIDR list of IP ranges that we have seen generating potential machine clicks.

To convert the CIDR notation addresses above to a classic IP address range, use this tool.
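The conversion can also be done locally with Python's standard `ipaddress` module. In this short sketch, `192.0.2.0/24` is a documentation-only example range, not one of the networks referred to above, and the function name is hypothetical.

```python
import ipaddress


def cidr_to_range(cidr):
    """Return (first address, last address, number of addresses) for a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network[0]), str(network[-1]), network.num_addresses


print(cidr_to_range("192.0.2.0/24"))  # ('192.0.2.0', '192.0.2.255', 256)
```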

Final filtering tips in a nutshell

  • If a click comes from one of the IP ranges above, it is probably a machine click. However, anything from Cogent’s range may also be legitimate business traffic.
  • If a series of multiple clicks (minimum two) for a specific email sent and a specific recipient all happen within 2 seconds, they are probably all machine clicks, irrespective of their source IP address.
  • If several clicks from multiple IP addresses are recorded within a 10 minute time frame, the first click is probably legitimate, and the others coming from different IP addresses in that time frame are probably machine clicks.