Section: New Results
Measurement and Detection of Web Tracking
Detecting Web Trackers via Analyzing Invisible Pixels
The Web has become an essential part of our lives: billions of people use Web applications daily and, while doing so, leave digital traces on millions of websites. Such traces allow advertising companies and data brokers to continuously profit from collecting vast amounts of data associated with users.
Web tracking has been extensively studied over the last decade. To detect tracking, most research studies and user tools rely on consumer protection lists. EasyList and EasyPrivacy (EL&EP) are the most popular publicly maintained blacklists of known advertising and tracking domains, used by the popular browser extensions AdBlock Plus and uBlockOrigin. Disconnect is another very popular list of domains known for tracking, used in the Disconnect browser extension and in the integrated tracking protection of the Firefox browser. Relying on EL&EP or Disconnect has become the de facto approach to detecting third-party tracking requests in the privacy and measurement community. However, it is well known that these lists detect only known tracking and ad-related requests, and that a tracker can easily avoid detection by registering a new domain or changing the parameters of the request.
In this work, we propose a new technique for detecting trackers based on the analysis of invisible pixels, by which we mean 1x1 pixel images or images without content. These images are routinely used by trackers to send information or third-party cookies back to their servers: the simplest way to do so is to create a URL containing useful information and to dynamically add an image HTML tag to a webpage. Since invisible pixels do not provide any useful functionality, we consider them perfect suspects for tracking.
By using an Inria cluster and setting up a distributed crawler, we have collected a dataset of invisible pixels from 829,349 webpages. By analyzing this dataset, we observed that invisible pixels are widely used: more than 83% of pages incorporate at least one invisible pixel.
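The definition of an invisible pixel given above (a 1x1 image or an image without content) can be illustrated with a minimal Python sketch. It is not the crawler used in this work: the function names, the restriction to GIF and PNG headers, and the use of the `Content-Type` value to recognise images are all assumptions made for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_dimensions(body: bytes):
    """Return (width, height) for GIF or PNG data, or None if unrecognised.

    Only these two formats are handled here; this is a simplifying assumption.
    """
    # GIF: 6-byte signature, then width and height as little-endian uint16.
    if body[:6] in (b"GIF87a", b"GIF89a") and len(body) >= 10:
        return struct.unpack("<HH", body[6:10])
    # PNG: 8-byte signature, 4-byte chunk length, "IHDR",
    # then width and height as big-endian uint32.
    if body[:8] == PNG_SIGNATURE and len(body) >= 24 and body[12:16] == b"IHDR":
        return struct.unpack(">II", body[16:24])
    return None


def is_invisible_pixel(content_type: str, body: bytes) -> bool:
    """Flag image responses that are empty or exactly 1x1 pixels."""
    if not content_type.lower().startswith("image/"):
        return False
    if len(body) == 0:
        # An image response without content.
        return True
    return image_dimensions(body) == (1, 1)
```

A response would then be flagged with, for example, `is_invisible_pixel("image/gif", response_body)`, where `response_body` is a hypothetical name for the raw bytes of the response.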
Overall, we made the following key contributions:
We define a new classification of Web tracking behaviors based on the analysis of invisible pixels. By analyzing the behavior associated with the delivery of invisible pixels, we propose a new fine-grained classification of tracking behaviors that consists of 8 categories of tracking. To our knowledge, we are the first to analyze tracking behavior based on invisible pixels, which are present on more than 83% of webpages.
We apply our classification to the full dataset and uncover new collaborations between third-party domains. We detect new relationships between third-party domains beyond the basic cookie syncing detected in the past. In particular, we discovered that first-to-third-party cookie syncing is the most prevalent tracking behavior, performed by 50,812 distinct domains. Finally, we find that 76.23% of requests responsible for tracking originate from loading resources other than invisible images. To our knowledge, we are the first to discover a highly prevalent first-to-third-party syncing behavior, detected on 51.54% of all crawled domains.
We show that consumer protection lists cannot be considered ground truth for identifying trackers. We find that the browser extensions based on EasyList and EasyPrivacy (EL&EP) and on Disconnect each miss 22% of the tracking requests we detect. Moreover, if we combine all the lists, 238,439 requests originating from 7,773 domains are unknown to these lists and hence still track users on 5,098 webpages even if tracking protection is installed. We also detect instances of cookie syncing in domains unknown to these lists and therefore likely unrelated to advertising. To our knowledge, we are the first to show that the EL&EP and Disconnect lists, used in the majority of the Web tracking detection literature, miss tracking requests to 7,773 distinct domains.
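To make the notion of cookie syncing mentioned in the contributions above more concrete, the following Python sketch shows one common heuristic: a cookie value owned by one domain that reappears as a URL parameter in a request to a different domain is reported as a candidate sync. This is an illustrative assumption and not necessarily the detection method used in this work; the function names, the input format and the `MIN_ID_LENGTH` threshold are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: very short values are unlikely to be user identifiers.
MIN_ID_LENGTH = 8


def find_cookie_syncs(cookies, request_urls):
    """Return (owner_domain, receiver_host, value) triples.

    cookies: dict mapping a domain to the set of cookie values it owns.
    request_urls: iterable of URLs requested while loading a page.
    """
    syncs = []
    for url in request_urls:
        parts = urlsplit(url)
        receiver = parts.hostname or ""
        # Values of all query parameters carried by this request.
        param_values = {value for _, value in parse_qsl(parts.query)}
        for owner, values in cookies.items():
            # A domain sending its own identifier to itself is not a sync.
            if receiver == owner or receiver.endswith("." + owner):
                continue
            for value in sorted(values & param_values):
                if len(value) >= MIN_ID_LENGTH:
                    syncs.append((owner, receiver, value))
    return syncs
```

With the first-party site listed as a cookie owner, a match against a request to a third-party host would correspond to a candidate first-to-third-party sync. Real identifiers may also be encoded or embedded in URL paths, which this sketch does not handle.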
This working paper is currently under submission at an international conference.
A survey on Browser Fingerprinting
This year, we conducted a survey of the research performed in the domain of browser fingerprinting, while providing an accessible entry point for newcomers to the field. We explain how this technique works and where it stems from. We analyze the related work in detail to understand the composition of modern fingerprints and to see how this technique is currently used online. We systematize existing defense solutions into different categories and detail the challenges that remain to be overcome.
The goal of this work is twofold: first, to provide an accessible entry point for newcomers by systematizing existing work, and second, to lay the foundations for future research in the domain by eliciting the challenges that remain to be overcome. We accomplish these goals with the following contributions:
A thorough survey of the research conducted in the domain of browser fingerprinting with a summary of the framework used to evaluate the uniqueness of browser fingerprints and their adoption on the web.
This work has been submitted for publication to an international journal.
Measuring Uniqueness of Browser Extensions and Web Logins
The Web browser is the tool people use to navigate the Web, and the privacy research community has studied various forms of browser fingerprinting. Researchers have shown that a user's browser has a number of inherent “physical” characteristics that can be used to uniquely identify her browser and hence to track it across the Web. Fingerprinting of users' devices is analogous to measuring physical biometric traits of people, where only physical characteristics are studied.
As in previous demonstrations of user uniqueness based on behavior, behavioral characteristics such as browser settings and the way people use their browsers can also help to uniquely identify Web users. For example, a user installs the Web browser extensions she prefers, such as AdBlock, LastPass, or Ghostery, to enrich her Web experience. Also, while browsing the Web, she logs into her preferred social networks, such as Gmail, Facebook or LinkedIn. In this work, we study users' uniqueness based on their behavior and preferences on the Web: we analyze how unique Web users are based on their browser extensions and logins.
In this work, we performed the first large-scale study of user uniqueness based on browser extensions and Web logins, collected from more than 16,000 users who visited our website https://extensions.inrialpes.fr/. Our experimental website identifies installed Google Chrome extensions via Web Accessible Resources, and detects websites where the user is logged in by methods that rely on URL redirection and CSP violation reports. Our website is able to detect the presence of 13K Chrome extensions (the number of detected extensions varied monthly between and ), covering approximately of all free Chrome extensions (the lists of detected extensions and websites are available on our website: https://extensions.inrialpes.fr/faq.php). We also detect whether the user is connected to one or more of 60 different websites. Our main contributions are:
We study the privacy dilemma of ad-blocking and privacy extensions, that is, how well these extensions protect their users against trackers and how they also contribute to uniqueness. We evaluate the statement “the more privacy extensions you install, the more unique you are” by analyzing how a user's uniqueness increases with the number of privacy extensions she installs, and by evaluating the tradeoff between the privacy gain of blocking extensions such as Ghostery and Privacy Badger and their contribution to uniqueness.
We furthermore show that browser extensions and Web logins can be exploited to fingerprint and track users by checking only a limited number of extensions and Web logins. We have applied an advanced fingerprinting algorithm that carefully selects a limited number of extensions and logins. For example, we show that 54.86% of users are unique based on all 16,743 detectable extensions. However, by testing 485 carefully chosen extensions, we can identify more than 53.96% of users. Moreover, detecting 485 extensions takes only 625 ms.
Finally, we give suggestions to the end users as well as website owners and browser vendors on how to protect the users from the fingerprinting based on extensions and logins.
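The uniqueness figures in the contributions above can be illustrated with a short Python sketch. It computes the fraction of users whose set of detected extensions is shared with no other user, and selects a small subset of extensions to test. The greedy strategy shown here is a simple stand-in chosen for illustration; it is not the fingerprinting algorithm applied in this work, and all function names and the toy data are hypothetical.

```python
from collections import Counter


def fingerprints(users, tested):
    """Restrict each user's extension set to the extensions being tested."""
    return [frozenset(user & tested) for user in users]


def unique_fraction(users, tested):
    """Fraction of users whose fingerprint is shared with no other user."""
    if not users:
        return 0.0
    prints = fingerprints(users, tested)
    counts = Counter(prints)
    return sum(1 for p in prints if counts[p] == 1) / len(users)


def greedy_select(users, candidates, budget):
    """Pick up to `budget` extensions, each time adding the one that splits
    users into the largest number of distinct fingerprints.

    Assumption: a plain greedy heuristic, used only as an illustration.
    """
    chosen = set()
    for _ in range(budget):
        remaining = sorted(candidates - chosen)  # sorted for determinism
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda ext: len(set(fingerprints(users, chosen | {ext}))),
        )
        chosen.add(best)
    return chosen


# Toy data: each user is the set of extensions detected in her browser.
users = [{"a", "b"}, {"a"}, {"b"}, set()]
subset = greedy_select(users, {"a", "b"}, budget=2)
share = unique_fraction(users, subset)
```

Testing fewer extensions shortens detection time at the cost of some uniqueness, which is the tradeoff behind the 485-extension result reported above.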
This paper has been published at the WPES international workshop affiliated with ACM CCS 2018.