Section: New Results

Foundations of information hiding

Information hiding refers to the problem of protecting private information while performing certain tasks or interactions, preventing an adversary from inferring that information. This is one of the main areas of research in Comète; we are exploring several topics, described below.

Additive and multiplicative notions of leakage, and their capacities

Protecting sensitive information from improper disclosure is a fundamental security goal. It is, however, difficult to achieve, often because of unavoidable or even unpredictable operating conditions that can lead to breaches in planned security defences. An attractive approach is to frame the goal as a quantitative problem, and then to design methods that measure system vulnerabilities in terms of the amount of information they leak. A consequence is that the precise operating conditions, and assumptions about prior knowledge, can play a crucial role in assessing the severity of any measured vulnerability.

In [20] we developed this theme by concentrating on vulnerability measures that are robust in the sense of allowing general leakage bounds to be placed on a program, bounds that apply whatever its operating conditions and whatever the prior knowledge might be. In particular we proposed a theory of channel capacity, generalising the Shannon capacity of information theory, that can apply both to additive and to multiplicative forms of a recently-proposed measure known as g-leakage. Further, we explored the computational aspects of calculating these (new) capacities: one of these scenarios can be solved efficiently by expressing it as a Kantorovich distance, but another turns out to be NP-complete.
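As an illustration of the multiplicative case, the Bayes capacity of a channel (the supremum of multiplicative leakage over all priors) has a simple closed form: the sum of the column maxima of the channel matrix, realized by the uniform prior. A minimal sketch on a hypothetical channel:

```python
import math

# Hypothetical channel matrix: rows are secrets, columns are observables;
# each row is a conditional distribution p(observable | secret).
C = [
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
]

def mult_bayes_capacity(channel):
    # Multiplicative Bayes capacity: sup over all priors of the
    # multiplicative min-entropy leakage; it equals the sum of the
    # column maxima and is realized by the uniform prior.
    cols = range(len(channel[0]))
    return sum(max(row[y] for row in channel) for y in cols)

cap = mult_bayes_capacity(C)
print(cap, math.log2(cap))   # the capacity, and the same value in bits
```

For this channel the capacity is 0.6 + 0.7 + 0.6 = 1.9, i.e. about 0.93 bits, whatever the prior.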

We also found capacity bounds for arbitrary correlations with data not directly accessed by the channel, as in the scenario of Dalenius's Desideratum.

Compositionality Results for Quantitative Information Flow

In the min-entropy approach to quantitative information flow, the leakage is defined in terms of a minimization problem which, in the case of large systems, can be computationally heavy. The same happens for the recently proposed generalization called g-vulnerability. In [28] we studied the case in which the channel associated with the system can be decomposed into simpler channels, which typically happens when the observables consist of several components. Our main contribution was the derivation of bounds on the g-leakage of the whole system in terms of the g-leakages of its components.
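The flavour of such bounds can be sketched in the special case of Bayes vulnerability (the paper treats general g-leakage): when two channels share the same secret input and produce their observables independently, the composed channel is the row-wise tensor product of the two matrices, and its multiplicative Bayes capacity is bounded by the product of the component capacities. A small sketch with hypothetical channels:

```python
def capacity(channel):
    # Multiplicative Bayes capacity: sum of the column maxima.
    return sum(max(row[y] for row in channel) for y in range(len(channel[0])))

def parallel(C1, C2):
    # Parallel composition with independent observables: for each secret x
    # the adversary sees the pair (y1, y2), so entries multiply row-wise.
    return [[a * b for a in r1 for b in r2] for r1, r2 in zip(C1, C2)]

# Hypothetical component channels over the same two secrets.
C1 = [[0.8, 0.2], [0.3, 0.7]]
C2 = [[0.5, 0.5], [0.1, 0.9]]

whole = capacity(parallel(C1, C2))
bound = capacity(C1) * capacity(C2)
print(whole <= bound + 1e-12)   # True: the composition bound holds
```

Here the whole system has capacity 1.53, against a compositional bound of 1.5 × 1.4 = 2.1, so the bound can be computed from the (much smaller) components alone.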

LeakWatch: Estimating Information Leakage from Java Programs

Programs that process secret data may inadvertently reveal information about those secrets in their publicly-observable output. In [23] we presented LeakWatch, a quantitative information leakage analysis tool for the Java programming language; it is based on a flexible "point-to-point" information leakage model, where secret and publicly-observable data may occur at any time during a program's execution. LeakWatch repeatedly executes a Java program containing both secret and publicly-observable data and uses robust statistical techniques to provide estimates, with confidence intervals, for min-entropy leakage (using a new theoretical result presented in that paper) and mutual information. We demonstrated how LeakWatch can be used to estimate the size of information leaks in a range of real-world Java programs.
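The statistical idea can be sketched in miniature (in Python rather than Java, on a hypothetical toy program, and without the confidence-interval machinery): run the program many times, build an empirical joint distribution of secrets and observables, and estimate the min-entropy leakage from it.

```python
import math
import random
from collections import Counter

random.seed(0)
SECRETS = 4                      # uniform prior over {0, 1, 2, 3}

def program(secret):
    # Hypothetical target: leaks the secret's parity through a noisy bit.
    flip = random.random() < 0.1
    return (secret % 2) ^ int(flip)

N = 100_000
joint = Counter()
for _ in range(N):
    s = random.randrange(SECRETS)
    joint[(s, program(s))] += 1

# Empirical min-entropy leakage: prior vulnerability is 1/SECRETS;
# posterior vulnerability is the sum over observables of max_s p(s, y).
observables = {y for (_, y) in joint}
v_post = sum(max(joint[(s, y)] / N for s in range(SECRETS)) for y in observables)
leakage = math.log2(v_post * SECRETS)
print(round(leakage, 3))   # close to log2(1.8), roughly 0.85 bits
```

The toy program leaks about 0.85 of the 2 bits of the secret; LeakWatch's contribution is doing this estimation robustly, with confidence intervals, on real Java executions.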

On the information leakage of differentially-private mechanisms

Differential privacy aims at protecting the privacy of participants in statistical databases. Roughly, a mechanism satisfies differential privacy if the presence or value of a single individual in the database does not significantly change the likelihood of obtaining a certain answer to any statistical query posed by a data analyst. Differentially-private mechanisms are often oblivious: first the query is processed on the database to produce a true answer, and then this answer is adequately randomized before being reported to the data analyst. Ideally, a mechanism should minimize leakage—i.e., obfuscate as much as possible the link between reported answers and individuals' data—while maximizing utility—i.e., report answers as similar as possible to the true ones. These two goals are, however, conflicting, and a trade-off between privacy and utility is imposed.
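The oblivious scheme and its privacy guarantee can be illustrated with the standard Laplace mechanism for a counting query (a generic differential-privacy example, not the specific mechanisms analyzed in [13]): the true answer is perturbed with Laplace noise, and the density ratio between adjacent true answers never exceeds e^eps.

```python
import math

def laplace_pdf(z, mu, b):
    # Density of the Laplace distribution centred at mu with scale b.
    return math.exp(-abs(z - mu) / b) / (2 * b)

eps = 0.5
sensitivity = 1.0                # a counting query changes by at most 1
b = sensitivity / eps

# Oblivious mechanism: compute the true answer t, report t + Laplace(0, b).
# eps-differential privacy: for adjacent true answers (here 3 and 4),
# the density ratio at any reported value z is at most e^eps.
worst = max(
    laplace_pdf(z / 10, 3.0, b) / laplace_pdf(z / 10, 4.0, b)
    for z in range(-100, 201)
)
print(worst <= math.exp(eps) + 1e-12)   # True
```

The bound is tight: the ratio reaches e^eps exactly for reported values below the smaller true answer, which is where the tension between privacy (flat noise) and utility (answers close to the truth) comes from.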

In [13] we used quantitative information flow principles to analyze leakage and utility in oblivious differentially-private mechanisms. We introduced a technique that exploits graph-symmetries of the adjacency relation on databases to derive bounds on the min-entropy leakage of the mechanism. We evaluated utility using identity gain functions, which are closely related to min-entropy leakage, and we derived bounds for it. Finally, given some graph-symmetries, we provided a mechanism that maximizes utility while preserving the required level of differential privacy.

Metric-based approaches for privacy in concurrent systems

In a series of two papers we investigated metric-based techniques for verifying differential privacy in the context of concurrent systems.

The first work [30] was motivated by that of Tschantz et al., who proposed a verification method based on proving the existence of a stratified family of bijections between states that can track the privacy leakage, ensuring that it does not exceed a given leakage budget. We improved on this technique by investigating state properties which are more permissive and still imply differential privacy. We introduced a new pseudometric, still based on the existence of a family of bijections, but relaxing the relation between them by integrating the notion of amortization, and showed that this results in a more parsimonious use of the privacy budget. We also showed that for the new pseudometric the level of differential privacy is continuous in the distance between the starting states, which makes it suitable for verification.

Continuing this line of work, we studied the pseudometric based on the Kantorovich lifting, one of the most popular notions of distance between probabilistic processes proposed in the literature. Its application in verification, however, is limited to linear properties. In [19] we proposed a generalization that can deal with a wider class of properties, such as those used in security and privacy. More precisely, we proposed a family of pseudometrics, parametrized on a notion of distance which depends on the property we want to verify. Furthermore, we showed that the members of this family still characterize bisimilarity in terms of their kernel, and provided a bound on the corresponding distance between trace distributions. Finally, we studied the instance corresponding to differential privacy, and showed that it has a dual form which is easier to compute. We also proved that the typical process-algebra constructs are non-expansive, thus paving the way to a modular approach to verification.

Optimal Geo-Indistinguishable Mechanisms for Location Privacy

With location-based services becoming increasingly popular, serious concerns are being raised about the potential privacy breaches that the disclosure of location information may induce. In [21] we considered two approaches that have been proposed to limit and control the privacy loss. One is the geo-indistinguishability notion developed within Comète, which is inspired by differential privacy, and like the latter is independent of the side knowledge of the adversary and robust with respect to composition of attacks. The other is the mechanism of Shokri et al., which offers an optimal trade-off between the loss of quality of service and the privacy protection with respect to a given Bayesian adversary.

We showed that it is possible to combine the advantages of the two approaches: given a minimum threshold for the degree of geo-indistinguishability, we construct a mechanism that offers the maximal utility, as the solution of a linear program. Thanks to the fact that geo-indistinguishability is insensitive to the remapping of a Bayesian adversary, the mechanism so constructed is optimal also in the sense of Shokri et al. Furthermore, we proposed a method to reduce the number of constraints of the linear program from cubic to quadratic (with respect to the number of locations), maintaining the privacy guarantees without significantly affecting the utility of the generated mechanism. This considerably lowers the time required to solve the linear program, thus significantly enlarging the size of the location sets for which the optimal trade-off mechanism can still be computed.
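The constraints of the linear program are exactly the geo-indistinguishability inequalities, one per triple of two locations and one reported value. A minimal check on a hypothetical three-location line, using the truncated geometric mechanism as the candidate (our choice here for illustration, not the mechanism produced by the linear program of [21]):

```python
import math

eps = 1.0
a = math.exp(-eps)          # alpha = e^{-eps}
n = 2                       # locations {0, 1, 2} on a line, d(x, x') = |x - x'|

def K(x, z):
    # Truncated geometric mechanism: two-sided geometric noise on the
    # integers, with the tails folded onto the boundary locations 0 and n.
    if z == 0:
        return a ** x / (1 + a)
    if z == n:
        return a ** (n - x) / (1 + a)
    return (1 - a) / (1 + a) * a ** abs(x - z)

# Geo-indistinguishability imposes, for every pair of locations x, x'
# and every reported value z:
#     K(x, z) <= e^{eps * d(x, x')} * K(x', z)
ok = all(
    K(x, z) <= math.exp(eps * abs(x - xp)) * K(xp, z) + 1e-12
    for x in range(n + 1) for xp in range(n + 1) for z in range(n + 1)
)
print(ok)   # True
```

With |locations| = m there are m^3 such constraints, which is why reducing them from cubic to quadratic matters for scaling.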

A Predictive Differentially-Private Mechanism for Mobility Traces

With the increasing popularity of GPS-enabled handheld devices, location-based applications and services have access to accurate and real-time location information, raising serious privacy concerns for their millions of users. Trying to address these issues, the notion of geo-indistinguishability was recently introduced, adapting the well-known concept of differential privacy to the area of location-based systems. A Laplace-based obfuscation mechanism satisfying this privacy notion works well in the case of sporadic use; under repeated use, however, independently applying noise at each step leads to a quick loss of privacy, due to the correlation between the locations in the trace.

In [22] we showed that the correlations in the trace can in fact be exploited, in terms of a prediction function that tries to guess the new location based on the previously reported locations. The proposed mechanism tests the quality of the predicted location using a private test; in case of success the prediction is reported, otherwise the location is sanitized with fresh noise. If there is considerable correlation in the input trace, the extra cost of the test is small compared to the savings in budget, leading to a more efficient mechanism.
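A much-simplified one-dimensional sketch of the idea (hypothetical parameters and budget accounting chosen for illustration; the actual mechanism of [22] works on planar locations and comes with formal privacy proofs): predict "same as the last reported point", test the prediction with a noisy threshold comparison, and fall back to fresh Laplace noise only when the test fails.

```python
import math
import random

random.seed(1)

def lap(b):
    # Sample Laplace(0, b) by inverse transform.
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -b * sign * math.log(1 - 2 * abs(u))

def predictive_release(trace, eps_test, eps_noise, threshold):
    released, spent = [], 0.0
    last = None
    for x in trace:
        if last is not None:
            # Private test: is the prediction (the last reported point)
            # close enough to the new true location?
            spent += eps_test
            if abs(x - last) + lap(2 / eps_test) <= threshold:
                released.append(last)    # prediction accepted: no fresh noise
                continue
        last = x + lap(1 / eps_noise)    # sanitize with fresh noise
        spent += eps_noise
        released.append(last)
    return released, spent

# A strongly correlated trace (a user lingering near one point).
trace = [0.1 * random.random() for _ in range(20)]
out, spent = predictive_release(trace, eps_test=0.2, eps_noise=1.0, threshold=5.0)
print(len(out), round(spent, 2))
```

When the cheap tests mostly succeed, `spent` stays well below the 20.0 that twenty independently sanitized points would cost; when the trace is uncorrelated, most tests fail and the mechanism degrades gracefully to the independent one plus the test overhead.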

We evaluated the mechanism in the case of a user accessing a location-based service while moving around in a city. Using a simple prediction function and two budget spending strategies, optimizing either the utility or the budget consumption rate, we showed that the predictive mechanism can offer substantial improvements over the independently applied noise.

A differentially private mechanism of optimal utility for a region of priors

Differential privacy is a notion of privacy that was initially designed for statistical databases, and has recently been extended to a more general class of domains. Both differential privacy and its generalized version can be achieved by adding random noise to the reported data. Thus, privacy is obtained at the cost of reducing the accuracy of the data, and therefore their utility.

In [31] we considered the problem of identifying optimal mechanisms for generalized differential privacy, i.e., mechanisms that maximize the utility for a given level of privacy. The utility usually depends on a prior distribution of the data, and naturally it would be desirable to design mechanisms that are universally optimal, i.e., optimal for all priors. However, it is known that such mechanisms do not exist in general. We therefore characterized maximal classes of priors for which a mechanism that is optimal for all priors of the class does exist. We showed that such classes can be defined as convex polytopes in the space of priors.

As an application, we considered the problem of privacy that arises when using, for instance, location-based services, and we showed how to define mechanisms that maximize the quality of service while preserving the desired level of geo-indistinguishability.

Compositional analysis of information hiding

Systems concerned with information hiding often use randomization to obfuscate the link between the observables and the information to be protected. The degree of protection provided by a system can be expressed in terms of the probability of error associated with the inference of the secret information. In [14] we considered a probabilistic process calculus to specify such systems, and we studied how the operators affect the probability of error. In particular, we characterized constructs that have the property of not decreasing the degree of protection, and that can therefore be considered safe in the modular construction of these systems. As a case study, we applied these techniques to the Dining Cryptographers, and we derived a generalization of Chaum's strong anonymity result.
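Chaum's result can be checked directly for a small instance. The sketch below enumerates the three-cryptographer ring with fair coins and verifies strong anonymity: the distribution of announcements is the same whichever cryptographer pays (a finite check for one instance, not the generalization proved in [14]).

```python
import itertools
from fractions import Fraction

def announcement_dist(payer):
    # Three cryptographers in a ring; coin i is shared by i and (i+1) mod 3.
    # Cryptographer i announces the XOR of her two coins, flipped if she pays.
    dist = {}
    for coins in itertools.product([0, 1], repeat=3):
        ann = tuple(
            coins[i] ^ coins[(i - 1) % 3] ^ (1 if i == payer else 0)
            for i in range(3)
        )
        dist[ann] = dist.get(ann, Fraction(0)) + Fraction(1, 8)
    return dist

# Strong anonymity: the observables reveal nothing about who paid.
print(announcement_dist(0) == announcement_dist(1) == announcement_dist(2))   # True
```

Each payer induces the same distribution: the four odd-parity announcement vectors, each with probability 1/4, so an observer learns that someone paid but nothing about who.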