April 02, 2019

Introduction to Device Fingerprinting

by Camilo Reyes

Customers today use all kinds of devices. Screens are everywhere, and the average user tends to own more than one. This makes it hard for a secure system to know where its real users are. With more devices in play, an app cannot easily tell a trusted device from one that belongs to an attacker. From an attacker’s perspective, more devices mean more attack vectors for mimicking real users.

An attacker is happy with more options, but is there a way to mitigate this? If it is possible to identify both the user and the device, how hard is it to put in place?

What Is Fingerprinting?

A fingerprint helps identify a trusted user on the internet by recognizing the browser and the device they use. Multi-factor authentication can verify a device as trusted and tie it to a real user. An attacker then needs more than user credentials to do any real damage to a secure system. The application has the option of managing a small set of trusted devices that belong to a user. This reduces attack vectors and increases security.

For fingerprints, we’ll pick ClientJS as the open source library to gather data points in the browser. An open source library is a good option because anyone can inspect the code for security flaws.

To start using this library, do:

var client = new ClientJS();  

Keep this client JavaScript object in mind, as we’ll come back to it throughout this piece. The idea is to focus on the main question: how can I identify an actual user behind a screen?

Device vs. Browser Fingerprints

There are two kinds of fingerprints: those that identify a browser and those that identify a device. Knowing which of the two a data point targets helps in increasing entropy. A good level of entropy is necessary because it adds uniqueness to the fingerprint, which is what lets the app tell devices apart. With low entropy, many devices end up sharing the same fingerprint.
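One way to make the idea of entropy concrete is to measure it per data point. The sketch below computes the Shannon entropy, in bits, of one data point across a set of observed devices. The function name and the `values` array are assumptions made for illustration; they are not part of ClientJS.

```javascript
// Estimate the Shannon entropy (in bits) of one data point.
// `values` is a hypothetical array holding that data point for each device seen so far.
function shannonEntropy(values) {
  if (values.length === 0) return 0;

  // Count how often each distinct value occurs.
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }

  // Sum -p * log2(p) over the distinct values.
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / values.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}
```

A data point that is identical on every device scores 0 bits and cannot tell devices apart. A data point with four equally likely values scores 2 bits.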

For factors that target the device, these are some options:

var os = client.getOS();  
var version = client.getOSVersion();  
var language = client.getLanguage();  
var timezone = client.getTimeZone();  
var resolution = client.getAvailableResolution();  

Factors such as the OS and version tend to remain static per device. The language and timezone reflect the device’s regional settings, which often follow its physical location. Customers who travel a lot get dinged on location-based data points such as the timezone because these change so often. The available resolution holds screen data, which changes with the physical configuration. If the user is behind a docked laptop, for example, then the screen resolution changes.

For factors that target both the browser and the device:

var canvas = client.getCanvasPrint();  
var fonts = client.getFonts();  
var plugins = client.getPlugins();  

These are hybrid factors because they vary per browser and device. The Canvas API in the browser taps into both hardware and browser capabilities. Fonts are the system fonts installed on the device that the browser has access to. ClientJS checks a long list of fonts, which might affect performance. If the library takes too long, be sure to reduce this list. Plugins are pieces of software installed on the device that the browser can detect.

For factors that target the browser, try:

var userAgent = client.getUserAgent();  

The user agent identifies the browser and its version. One caveat is that it changes with every new release of the browser, so it is not stable over time. From the server side, devices with identical user agents send matching headers, so the user agent alone cannot tell them apart. One differentiator is the IP address, which reveals the approximate location unless the user is behind a VPN.

It is important to know customer behavior around devices when coming up with a list of data points. Knowing device types, travel, and upgrade patterns helps in getting a good list.
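The data points covered so far can be gathered in one place. This is a minimal sketch, assuming a `client` object that exposes the ClientJS getters used above. The function name `collectDataPoints` and the grouping into `device`, `hybrid`, and `browser` are illustrative choices, not something the library provides.

```javascript
// Gather raw data points from a ClientJS-style object, grouped by what they target.
// The grouping and the function name are assumptions made for illustration.
function collectDataPoints(client) {
  return {
    // Data points that mostly describe the device.
    device: {
      os: client.getOS(),
      osVersion: client.getOSVersion(),
      language: client.getLanguage(),
      timezone: client.getTimeZone(),
      resolution: client.getAvailableResolution()
    },
    // Data points that vary per browser and device.
    hybrid: {
      canvas: client.getCanvasPrint(),
      fonts: client.getFonts(),
      plugins: client.getPlugins()
    },
    // Data points that describe the browser.
    browser: {
      userAgent: client.getUserAgent()
    }
  };
}
```

In the browser, this would be called as `collectDataPoints(new ClientJS())`. Keeping the groups separate makes it easier to weigh stable device factors differently from a browser factor that changes on every upgrade.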

Every data point comes back as a raw string. Raw strings are easy for humans to read but cumbersome for a computer to store and compare. So, is there a simple way to tell data points apart without matching on raw strings?

To Hash or Not to Hash

Hashing raw fingerprint data allows quick analysis. A computer can store and match a number much more quickly than raw string data. This reduces storage and retrieval times from a database, and it’s efficient to put in place.

For fingerprints, we only care about matches against previously stored data, so a numeric hash works well. Hash each data point separately so the matching algorithm can handle devices that only match in part. For example, say the resolution matches but the canvas and fonts data points do not. With one hash per data point, it’s efficient to compare all of them, count the matches, and check the count against a threshold. If the number of matches falls below the threshold, treat it as a new device.
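The threshold idea can be sketched in a few lines. This is only one possible approach, assuming each device is stored as a plain object that maps a data point name to its numeric hash. The function names, the object shape, and the threshold value are all hypothetical.

```javascript
// Count how many data point hashes on the current device match a stored device.
// Both arguments are hypothetical objects of the form { dataPointName: numericHash }.
function countMatches(stored, current) {
  let matches = 0;
  for (const key of Object.keys(stored)) {
    if (stored[key] === current[key]) matches += 1;
  }
  return matches;
}

// Treat the device as known when enough data points match.
function isKnownDevice(stored, current, threshold) {
  return countMatches(stored, current) >= threshold;
}

// Example: only the resolution matches, so a threshold of 2 flags a new device.
const storedDevice = { resolution: 101, canvas: 202, fonts: 303 };
const currentDevice = { resolution: 101, canvas: 909, fonts: 808 };
console.log(isKnownDevice(storedDevice, currentDevice, 2)); // false
```

Where to set the threshold depends on how much each data point is expected to drift for real customers, so it is worth tuning against actual device data.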

ClientJS uses the MurmurHash3 algorithm to hash fingerprint data. The hash comes back as an unsigned 32-bit integer in a JavaScript number type. If the server is not running JavaScript, its default integer type may be signed and too small to hold the full unsigned range. So be sure to use a type on the server that supports this hash value, such as a 64-bit integer. To learn more about hashing, check out this article on hashing algorithms.
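To see why the server type matters, the sketch below shows what happens when an unsigned 32-bit value is squeezed into a signed 32-bit integer, and how to get the original value back. The helper names are hypothetical, and the sample value is made up for illustration.

```javascript
// Reinterpret an unsigned 32-bit hash as a signed 32-bit integer,
// e.g. for a server column that only supports signed 32-bit values.
function toSigned32(hash) {
  return hash | 0;
}

// Recover the original unsigned value from the signed representation.
function toUnsigned32(value) {
  return value >>> 0;
}

const sampleHash = 3000000000;              // above the signed 32-bit maximum of 2147483647
const storedValue = toSigned32(sampleHash); // -1294967296
console.log(toUnsigned32(storedValue) === sampleHash); // true
```

Any hash above 2147483647 turns negative in a signed 32-bit type, so comparisons only work if both sides use the same representation.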

To hash a data point in ClientJS, do:

var canvasFp = client.getCustomFingerprint(canvas);  

You have the option to hash data points on either the client or the server in JavaScript. Hashing on the server allows you to have both the hash and the raw string data, if necessary. One caveat is to make sure raw data is secure, so it doesn’t leak any customer PII.
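As a rough sketch of hashing a raw data point on the server, the function below turns a string into an unsigned 32-bit number. Note that it uses FNV-1a as a simple stand-in and is not the MurmurHash3 implementation in ClientJS, so its output will not match hashes produced by `getCustomFingerprint`. The function name is an assumption.

```javascript
// Hash a raw data point string into an unsigned 32-bit number with FNV-1a.
// This is a stand-in for illustration, NOT the MurmurHash3 used by ClientJS.
// It works on UTF-16 code units, so it matches byte-based FNV-1a only for ASCII input.
function hashDataPoint(raw) {
  let hash = 2166136261; // FNV-1a 32-bit offset basis
  for (let i = 0; i < raw.length; i += 1) {
    hash ^= raw.charCodeAt(i);
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 16777619) >>> 0;
  }
  return hash;
}
```

Whichever algorithm is used, hash on one side only or use the same algorithm on both, so that stored and incoming values stay comparable.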

Conclusion

Fingerprints help mitigate attack vectors with the many devices in use today. Fingerprint data points can identify the device, the browser, or both. Hashing data points enables fast retrieval and analysis. Raw fingerprints can count as personally identifiable information (PII), so keep this data secure.
