This post explains the basics of browser fingerprinting, one of the most effective and advanced tracking methods used on the modern web, capable of identifying users across several websites even if they are behind a proxy or VPN.
What is browser fingerprinting?
Essentially, fingerprinting boils down to collecting information from a remote device so that it can be identified later. This information can be used to fully or partially identify users even if they have cookies disabled. Whenever you connect to a website via your laptop, phone, or any other device, you are automatically sending information about yourself to that site. And if your browser has JavaScript enabled, the accuracy of these fingerprints is increased substantially.
Simple fingerprinting
The most straightforward type of browser fingerprinting is HTTP fingerprinting.
What is an HTTP request?
Whenever your browser has to fetch a resource from a server it usually has to perform an HTTP request, such as a GET request. In this case the server is the website you are requesting data from, and the resource can be something arbitrary like an image, a video, an audio file, or an HTML document.
Each HTTP request includes a head and, optionally, a body. The head of an HTTP request contains crucial data such as what encodings your browser supports, browser version, operating system, language, and more. A simple GET request usually has no body; the resource you're trying to access, such as an image or video, comes back in the body of the server's response. If you're submitting data to a server, the body of the request could be something like form field input or a file you are uploading.
Here is an example of a very simple HTTP request and response for the URL https://www.google.com/favicon.ico:
INITIAL REQUEST:
GET https://www.google.com/favicon.ico HTTP/1.1
Host: google.com
User-Agent: Mozilla/5.0 (Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US
Accept-Encoding: gzip, deflate, br
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
If-Modified-Since: Sun, 29 Dec 2019 04:10:14 GMT
If-None-Match: "5e082726-613f"
Cache-Control: max-age=0
SERVER'S RESPONSE HEAD:
HTTP/1.1 200 OK
Server: GWS
Date: Tue, 31 Dec 2019 06:41:26 GMT
Last-Modified: Sun, 29 Dec 2019 04:10:14 GMT
Connection: keep-alive
ETag: "5e082726-613f"
SERVER'S RESPONSE BODY:
(the favicon image itself, which is binary data and not shown here)
Now, what most basic fingerprinting software would do is take the HTTP headers of your initial request (e.g. User-Agent, Accept, Accept-Language, Accept-Encoding, DNT), concatenate their values together, and hash the result. This incredibly simple process can immediately narrow down your browser from millions of possible users to thousands.
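The concatenate-and-hash step can be sketched in a few lines of Python. This is only an illustration: the post does not say which hash function or separator real trackers use, so SHA-256 and the "|" separator are assumptions, and the name http_fingerprint is made up for this example.

```python
import hashlib

# Header names taken from the example request above. Which headers a real
# tracker actually uses is an assumption.
FINGERPRINT_HEADERS = [
    "User-Agent",
    "Accept",
    "Accept-Language",
    "Accept-Encoding",
    "DNT",
]


def http_fingerprint(headers: dict[str, str]) -> str:
    """Concatenate the selected header values and return a SHA-256 hex digest."""
    # A missing header becomes an empty string, so its absence also shapes the hash.
    values = [headers.get(name, "") for name in FINGERPRINT_HEADERS]
    joined = "|".join(values)  # separator choice is an assumption
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Header values copied from the example request in this post.
request_headers = {
    "User-Agent": "Mozilla/5.0 (Linux x86_64; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
}

if __name__ == "__main__":
    print(http_fingerprint(request_headers))
```

Two visitors who send exactly the same values for these headers get the same digest, while changing even one value (say, Accept-Language) produces a completely different one.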
However, this form of fingerprinting is fairly simple and can be unreliable for sites with a high traffic volume, as multiple users could end up with the same hash, especially when it comes to Mac users and iOS devices. Therefore this method is usually used in conjunction with the more advanced fingerprinting techniques discussed below.
Hardware fingerprinting
Hardware fingerprinting is what you should be worried about. Hardware fingerprinters are far more accurate than HTTP fingerprinters, and they can vary greatly or even be embedded directly into a website's functionality, making them harder to identify. On top of that, most are able to work in conjunction with each other, increasing their reliability. The following more advanced techniques are more than capable of identifying specific users even if you never log in and have cookies disabled.
Canvas fingerprinting:
The HTML5 Canvas is a type of DOM element which allows a website to render/draw 2D and 3D objects within your browser. Web developers typically utilize this element to create animations and draw shapes with less overhead than previously possible. The issue is that this element can also be abused as a fingerprinting mechanism. Canvas fingerprinting is usually done by writing some text within a canvas element. The funny thing is that the canvas does not have to be visible to users and in most cases isn't. Once the text is drawn and rendered on the canvas, the script reads back the array of pixel values and hashes it, returning a somewhat unique fingerprint.
This works because fonts are not like images, which are already rasterized and render more or less the same across devices. Fonts are essentially just mathematical equations as far as your computer is concerned. Because of this, different graphics cards, browsers, and even operating systems use different rendering engines/methods while also applying different types of anti-aliasing and font hinting algorithms. The result is that the text ends up looking slightly different at the pixel level. To humans these differences can be very hard to notice, but to computers it's fairly easy, as the only thing a computer needs to do is compare the pixel hashes and see if they match any other hashes that were previously identified. This type of hardware fingerprinting has the potential to make you stick out like my boy Shaun when he walks into a building. Silly Shaun, go back outside where you truly belong.
For more information on font rendering differences across browsers check the following articles:
- https://css-tricks.com/font-rendering-differences-firefox-vs-ie-vs-safari/
- https://multilogin.com/everything-you-need-to-know-about-canvas-fingerprinting/
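To see why tiny rendering differences matter, here is a rough Python sketch of the hashing half of canvas fingerprinting. It does not touch a real canvas: the pixel values are invented stand-ins for what two devices might read back, and SHA-256 and the name pixel_hash are assumptions made for illustration.

```python
import hashlib

Pixel = tuple[int, int, int, int]  # red, green, blue, alpha (0-255 each)


def pixel_hash(pixels: list[Pixel]) -> str:
    """Flatten RGBA pixels into bytes and return their SHA-256 hex digest."""
    raw = bytes(channel for pixel in pixels for channel in pixel)
    return hashlib.sha256(raw).hexdigest()


# Hypothetical readouts of the same three pixels along the edge of a glyph.
# Device B's anti-aliasing makes one channel of one pixel differ by 1.
device_a = [(0, 0, 0, 255), (128, 128, 128, 255), (255, 255, 255, 255)]
device_b = [(0, 0, 0, 255), (127, 128, 128, 255), (255, 255, 255, 255)]

if __name__ == "__main__":
    print(pixel_hash(device_a))
    print(pixel_hash(device_b))
```

A difference no human would notice on screen still produces two unrelated digests, which is all a tracker needs to tell the two devices apart.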
Audio Context fingerprinting:
This form of hardware fingerprinting utilizes your browser's AudioContext API and is based on your device's audio stack. Essentially, your browser uses your device's audio stack to generate a low-frequency sound signal and then measures how the device processes that signal. Because the result depends on the audio hardware and software rather than on the browser alone, the AudioContext API can help identify the same user across different browsers on the same device.
More information on AudioContext fingerprinting
WebGL fingerprinting:
WebGL is a JavaScript API for rendering 3D objects on a canvas element. An interesting thing about the WebGL API is that there are two different ways to fingerprint a user with it.
The first method is WebGL Report Hashing, in which the fingerprinting script hashes the WebGL Browser Report table. This table contains a listing of your device's WebGL capabilities and supported extensions. The hash is usually taken from the highest supported WebGL context dump. The advantage of this technique over other canvas fingerprinting methods is that your browser doesn't have to render anything, which would take up time and resources. Instead a table is quickly loaded and hashed: very fast, very simple.
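Report hashing can be sketched in Python as hashing a small table of capabilities. The report below is a hypothetical stand-in for what a browser would expose (the renderer string and texture size are invented, and only two extension names are listed), and the canonical JSON serialization plus SHA-256 are assumptions, not the method of any particular tracker.

```python
import hashlib
import json


def report_hash(report: dict) -> str:
    """Serialize a capability report deterministically and hash it."""
    # sort_keys gives a stable serialization, so equal reports hash equally
    # no matter what order the keys were collected in.
    canonical = json.dumps(report, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical WebGL report; the values are made up for illustration.
webgl_report = {
    "renderer": "Example GPU",
    "max_texture_size": 16384,
    "extensions": sorted(["OES_texture_float", "WEBGL_debug_renderer_info"]),
}

if __name__ == "__main__":
    print(report_hash(webgl_report))
```

Nothing is rendered here, which mirrors why this method is cheap: the only work is serializing a table and hashing it.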
The second method is WebGL Image Hashing, which is very similar to the canvas fingerprinting discussed above. WebGL Image Hashing involves drawing an image that would typically utilize more advanced rendering features such as gradients or shaders. Once again, because of the discrepancies between rendering methods across different browsers, graphics cards, and operating systems, all the script has to do is read back the array of pixels and hash the result.
How to defeat fingerprinting
If you want to defeat fingerprinting software then you have to understand that these programs rely on two types of user-supplied data: data sent via the HTTP headers, and data gathered by JavaScript code executed by the browser.
As a general rule of thumb, the more common your fingerprint is, the harder it is to single you out from other users. Your task should therefore be to make your fingerprint as common as possible.
MacBooks and iOS devices are a good example, as they are more difficult to fingerprint. This is because most of these devices have similar configurations and similar if not identical hardware. Therefore it's pretty common for Mac users to end up with the same fingerprint, effectively thwarting some fingerprinting mechanisms.
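One way to put a number on "common" is the standard information-theoretic measure of surprisal: a fingerprint shared by a fraction p of all users reveals -log2(p) bits about you. The short Python sketch below applies that formula; the user counts are invented purely for illustration and identifying_bits is a made-up name.

```python
import math


def identifying_bits(users_sharing: int, total_users: int) -> float:
    """Bits of identifying information revealed by a fingerprint.

    users_sharing: how many users have this exact fingerprint.
    total_users:   size of the whole population being compared.
    """
    if not 0 < users_sharing <= total_users:
        raise ValueError("users_sharing must be between 1 and total_users")
    return -math.log2(users_sharing / total_users)


if __name__ == "__main__":
    total = 2**20  # hypothetical population of about a million users
    print(identifying_bits(1, total))      # a fingerprint nobody else has
    print(identifying_bits(1024, total))   # a fingerprint shared with 1023 others
```

In this made-up population a unique fingerprint carries 20 bits, enough to pick out one user, while a fingerprint shared by 1024 users carries only 10 bits and leaves you hidden in that crowd.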
Important browsing habits
The following steps are also very helpful, if not essential, to guarding your privacy online. I will be making a separate post going into detail in the near future.
- Disable Flash; in 2020 Adobe is officially dropping support for the flaming pile of shit that is Flash. Since its inception in 1996, Flash has been used to violate user privacy, spread malware, and even exploit unsuspecting users online. Remember that stigma of being able to get a virus from going on a sketchy website? That's because of Flash. Thanks, Adobe. (essential)
- Disable third-party cookies; doing this simple task is enough to thwart a vast number of common trackers. (essential)
- Set cookies to expire when you exit your browser. (recommended)
- Install uBlock Origin or uMatrix, which are amazing browser extensions that block ads and trackers. (recommended)
- Use a privacy-focused browser such as Firefox, or at least switch to Chromium if you must use Chrome. (essential)
- Install an extension such as NoScript, which stops JavaScript fingerprinting on untrusted sites because it blocks untrusted JavaScript execution by default. This is a more extreme measure but a precaution I have been taking for years. Plus it's pretty cool to see how websites function without JavaScript and what kind of garbage they're trying to load. (optional)
Fighting HTTP fingerprinting
To fight HTTP fingerprinting you're going to have to dive into your browser settings and modify certain headers to something more common. A good example would be changing your Accept header to a more common value, or setting your user agent to a more common browser and operating system. These changes can also be made via extensions. Note that a spoofed user agent has to stay consistent with everything else your browser reveals. If your user agent claims Windows while your other headers or JavaScript-visible properties (such as the reported platform) point to a Unix system, the mismatch makes you an anomaly, and you will stick out and be very easy to track.
An alternative would be to have the value of certain headers such as your user agent change slightly and constantly. If your fingerprint changes constantly then it's useless to any tracker.
Fighting Javascript fingerprinting
Now, the best thing you can do to defeat these trackers is to completely disable JavaScript. As mentioned above, I am a big fan of the NoScript extension, which performs this very task flawlessly. However, to most people this isn't a practical solution, as disabling JavaScript will break most modern websites (you can thank those disgusting millennial web devs for that).
An alternative and more practical solution would be an extension that blocks canvas, WebGL, and AudioContext code by default, or at least notifies users that these elements are present on the page and asks for permission before rendering or executing them. As of now I have not found any extensions that do this effectively, but if you are able to find something please contact me so I can update this article. Heck, if enough people are interested I will develop a Firefox extension for this.
Another interesting solution I found was to run a program that generates random noise during these API calls/readouts so that the results of these readouts change slightly each time. Even though this only changes the results slightly, it's enough to get a completely new hash, so your fingerprint no longer matches the one recorded on a previous visit and cannot be tracked via the methods discussed above.
The only way to defeat this technique would be through advanced statistical analysis, which is very time-consuming and usually involves manual human labor. On top of that, the more people doing this, the harder it will be for companies to try and fingerprint us.
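The noise idea can be sketched in Python as follows. This is a toy model, not how any particular extension works: the readout is an invented list of channel values, the noise simply flips the lowest bit of one value, and the names pixel_hash and add_noise are made up for this example.

```python
import hashlib
import random


def pixel_hash(pixels: list[int]) -> str:
    """Hash a flat list of channel values (0-255) with SHA-256."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


def add_noise(pixels: list[int], rng: random.Random) -> list[int]:
    """Return a copy with the lowest bit of one randomly chosen value flipped."""
    noisy = list(pixels)
    index = rng.randrange(len(noisy))
    noisy[index] ^= 1  # changes the value by exactly 1 and keeps it in 0-255
    return noisy


# Hypothetical canvas readout (flattened channel values).
readout = [0, 0, 0, 255, 128, 128, 128, 255, 255, 255, 255, 255]

if __name__ == "__main__":
    print(pixel_hash(readout))                               # what a tracker would see without noise
    print(pixel_hash(add_noise(readout, random.Random())))   # fresh noise on each readout
    print(pixel_hash(add_noise(readout, random.Random(42)))) # seeded noise, same every time
```

Passing an unseeded random.Random() gives a readout that can differ on every call, while a fixed seed yields the same altered readout each time, which is roughly the difference between constantly changing noise and persistent noise.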
To add a unique persistent noise to canvas readouts check out this Firefox extension by Multilogin: Canvas Defender Extension
To defeat the other 3 forms of fingerprinting I recommend looking into the Multilogin App