We are witnessing an exponential accumulation of personal data on central servers: data automatically gathered by administrations and companies but also data produced by individuals themselves (e.g., photos, agendas, data produced by smart appliances and quantified-self devices) and deliberately stored in the cloud for convenience. The net effect is, on the one hand, an unprecedented threat on data privacy due to abusive usage and attacks and, on the other hand, difficulties in providing powerful user-centric services (e.g. personal big data) which require crossing data stored today in isolated silos. The Personal Cloud paradigm holds the promise of a Privacy-by-Design storage and computing platform, where each individual can gather her complete digital environment in one place and share it with applications and users, while preserving her control. However, this paradigm leaves the privacy and security issues in user's hands, which leads to a paradox if we consider the weaknesses of individuals' autonomy in terms of computer security, ability and willingness to administer sharing policies. The challenge is however paramount in a society where emerging economic models are all based - directly or indirectly - on exploiting personal data.
While many research works tackle the organization of the user's workspace, the semantic unification of personal information, the personal data analytics problems, the objective of the PETRUS project-team is to tackle the privacy and security challenges from an architectural point of view. More precisely, our objective is to help providing a technical solution to the personal cloud paradox. More precisely, our goals are (i) to propose new architectures (encompassing both software and hardware aspects) and administration models (decentralized access and usage control models, data sharing, data collection and retention models) for secure personal cloud data management, (ii) to propose new secure distributed database indexing models, privacy preserving query processing strategies and data anonymization techniques for the personal cloud, and (iii) study economic, legal and societal issues linked to secure personal cloud adoption.
To tackle the challenge introduced above, we identify three main lines of research:
These contributions will also rely on tools (algorithms, protocols, proofs, etc.) from other communities, namely security (cryptography, secure multiparty computations, formal methods, differential privacy, etc.) and distributed systems (distributed hash tables, gossip protocols, etc.). Beyond the research actions, we structure our software activity around a single common platform (rather than isolated demonstrators), integrating our main research contributions, called PlugDB. This platform is the cornerstone to help validating our research results through accurate performance measurements on a real platform, a common practice in the DB community, and target the best conferences. It is also a strong vector to federate the team, simplify the bootstrapping of new PhD or master students, conduct multi-disciplinary research and open the way to industrial collaborations and technological transfers.
As stated in the software section, the Petrus research strategy aims at materializing its scientific contributions in an advanced hardware/software platform with the expectation to produce a real societal impact. Hence, our software activity is structured around a common Secure Personal Cloud platform rather than several isolated demonstrators. This platform will serve as the foundation to develop a few emblematic applications.
Several privacy-preserving applications can actually be targeted by a Personal Cloud platform, like: (i) smart disclosure applications allowing the individual to recover her personal data from external sources (e.g., bank, online shopping activity, insurance, etc.), integrate them and cross them to perform personal big data tasks (e.g., to improve her budget management) ; (ii) management of personal medical records for care coordination and well-being improvement; (iii) privacy-aware data management for the IoT (e.g., in sensors, quantified-self devices, smart meters); (iv) community-based sensing and community data sharing; (v) privacy-preserving studies (e.g., cohorts, public surveys, privacy-preserving data publishing). Such applications overlap with all the research axes described above but each of them also presents its own specificities. For instance, the smart disclosure applications will focus primarily on sharing models and enforcement, the IoT applications require to look with priority at the embedded data management and sustainability issues, while community-based sensing and privacy-preserving studies demand to study secure and efficient global query processing.
Among these applications domains, one is already receiving a particular attention from our team. Indeed, we gained a strong expertise in the management and protection of healthcare data through our past DMSP (Dossier Medico-Social Partagé) experiment in the field. This expertise is being exploited to develop a dedicated healthcare and well-being personal cloud platform. We are currently deploying 10000 boxes equipped with PlugDB in the context of the DomYcile project. In this context, we are currently setting up an Inria Innovation Lab with the Hippocad company to industrialize this platform and deploy it at large scale (see Section the bilateral contract OwnCare II-Lab).
PlugDB is a complete platform dedicated to a secure and ubiquitous management of personal data. It aims at providing an alternative to a systematic centralization of personal data. The PlugDB engine is a personal database server capable of storing data (tuples and documents) in tables and BLOBs, indexing them, querying them in SQL, sharing them through assertional access control policies and enforcing transactional properties (atomicity, integrity, durability).
The prototype version of PlugDB engine is embedded in a tamper-resistant hardware device combining the security of smartcard with the storage capacity of NAND Flash. The personal database is hosted encrypted in NAND Flash and the PlugDB engine code runs in the tamper-resistant device. Complementary modules allow to pre-compile SQL queries for the applications, communicate with the DBMS from a remote Java program, synchronize local data with remote servers (typically used for recovering the database in the case of a broken or lost devices) and participate in distributed computation (e.g., global queries). Then, PlugDB was extended to run both on secure devices provided by Gemalto and on specific secure devices designed by PETRUS and assembled by electronic SMEs. Mastering the hardware platform opens up new research and experiment opportunities (e.g., support for wireless communication, secure authentication, sensing capabilities, battery powered ...).
PlugDB engine has been registered first at APP (Agence de Protection des Programmes) in 2009 - a new version being registered every two years - and the hardware datasheets in 2015. PlugDB has been experimented in the field, notably in the healthcare domain. PlugDB was used in an educational platform that we set up : SIPD (Système d’Information Privacy- by-Design). SIPD was used at ENSIIE, INSA CVL and UVSQ through the Versailles Sciences Lab fablab, to raise students awareness of privacy protection problems and embedded programming.
PlugDB combines several research contributions from the team, at the crossroads of flash data management, embedded data processing and secure distributed computations. It then strongly federates all members of our team (permanent members, PhD students and engineers). It is also a vector of visibility, technological transfer and dissemination and gives us the opportunity to collaborate with researchers from other disciplines around a concrete privacy-enhancing platform.
PlugDB is currently industrialized in the context of the OwnCare Inria Innovation Lab (II-Lab). In OwnCare, PlugDB acts as a secure personal cloud to manage medical/social data for people receiving care at home. It is currently being deployed over 10.000 patient in the Yvelines district. The industrialization process covers the development of a complete testing environment, the writing of a detailed documentation and the development of additional features (e.g., embedded ODBC driver, TPM support, flexible access control model and embedded code upgrade notably). It has also required the design of a new hardware platform equipped with a battery power supply, introducing new energy consumption issues for the embedded software.
Personal Data Management Systems (PDMS) arrive at a rapid pace boosted by smart disclosure initiatives and new regulations such as GDPR. However, our recent survey
1indicates that the existing PDMS solutions cover partially the PDMS data life-cycle and, more importantly, focus on specific privacy threats depending on the employed architecture. To address this issue, we proposed in
1a logical reference architecture for an extensive (i.e., covering all the major functionalities) and secure (i.e., circumventing all the threats specific to the PDMS context) PDMS. We also discussed several possible physical instances fo the architecture and showed that TEEs (Trusted Execution Environments) are a prime option for building a trustworthy PDMS platform
2.
Hence, based on our previous studies, we currently develop an extensive and secure PDMS (ES-PDMS) platform using the state-of-the-art TEE technology available today, i.e., Intel Software Guard eXtension (SGX). Our ES-PDMS software stack can be deployed on any SGX-enabled machine (i.e., any relatively recent computer having an Intel CPU). We have acquired a server with 6 Intel Xeon E-2276G CPU cores allowing a collaborative development of the ES-PDMS protoype by the Petrus team as well as the related Personal Cloud applications. We note that the Intel Xeon CPU series offer access to the Intel Attestation Service1 which is required for the remote enclave attestation process. This allows us to implement (and not only to simulate) remote attestations required by several use-cases in the PDMS context (e.g., distributed computations between a network of PDMSs or attesting results of local computations on the PDMS to a third party).
To design a platform for ES-PDMS 1, our approach is to use trusted execution environments. The challenge is to orchestrate the different data-related tasks using secure hardware enclaves, in order to provide rich data-oriented functionality while meeting the desired security objectives. We have begun the implementation of a new PDMS platform based on Intel SGX (see section 5.2). By decomposing the computation into data tasks, we aim to guarantee the bilateral security property, on the one hand by protecting the data confidentiality and privacy of the PDMS owner, and on the other hand by guaranteeing the integrity of the computed results shared with third parties. This work is part of Robin Carpentier's PhD. A short paper describing the underlying research problem was presented at EuroS&P 2021 14. A demonstration illustrating the proposal has also been designed.
Participatory sensing allows individuals to contribute data across time and space in order to feed general interest computation tasks. However, there are major obstacles including the preservation of the privacy of contributors. This consideration has led to two main approaches, sometimes combined, which are, respectively, to trade privacy for rewards, and to take advantage of privacy-enhancing technologies anonymizing the collected data. Although relevant, we claim that these approaches do not sufficiently take into account the consent of the participants to the use of the personal data provided and may even lead to defects in consent even in the presence of a trusted system. To address this issue, we introduce the
Mobile participatory sensing (MPS) could benefit many application domains. A major domain is smart transportation, with applications such as vehicular traffic monitoring, vehicle routing, or driving behavior analysis. However, MPS’s success depends on finding a solution for querying large numbers of smart phones or vehicular systems, which protects user location privacy and works in real-time. This work proposes PAMPAS, a privacy-aware mobile distributed system for efficient data aggregation in MPS. In PAMPAS, mobile devices enhanced with secure hardware, called secure probes (SPs), perform distributed query processing, while preventing users from accessing other users' data. A supporting server infrastructure (SSI) coordinates the inter-SP communication and the computation tasks executed on SPs. PAMPAS ensures that SSI cannot link the location reported by SPs to the user identities even if SSI has additional background information. Moreover, an enhanced version of the protocol, named PAMPAS+, makes the system robust even against advanced hardware attacks on the SPs. Hence, the risk of user location privacy leakage remains very low even for an attacker controlling the SSI and a few corrupted SPs. Our experimental results demonstrate that these protocols work efficiently on resource constrained SPs being able to collect the data, aggregate them, and share statistics or derive models in real-time. This work 12 has been accomplished in collaboration with NJIT and DePaul University.
A Personal Data Management System (PDMS) allows its owner to easily collect, store and manage data, directly generated by her devices, or resulting from her interactions with companies or administrations. PDMSs unlock innovative usages by crossing multiple data sources from one or many users, thus requiring aggregation primitives. Indeed, aggregation primitives are essential to compute statistics on user data and are also a fundamental building block for machine learning algorithms. Typically, CozyCloud (the PETRUS partner for Julien's CIFRE) wishes to use distributed aggregation for a Naive Bayes classifier. In this work, we study a protocol allowing for secure aggregation in a massively distributed PDMS environment, which adapts to selective participation and PDMSs characteristics, and is reliable with respect to failures, with no compromise on accuracy. Initial results were published in 15 and 16 with preliminary experiments showing the effectiveness of our protocol. It can adapt to several contexts with varying PDMSs characteristics in terms of communication speed or CPU resources and can adjust the aggregation strategy to the estimated selective participation.
We call Edgelet computing the current convergence between Opportunistic Network (OppNet) and Trusted Execution Environment (TEE) at the very edge of the network. We believe that this convergence bears the seeds of a novel and important class of applications leveraging fully decentralized and highly secure computations among data scattered on multiple personal devices. In this work, we tackle the data management and distributed system issues related to this environment : we characterize the Edgelet computing paradigm by a combination of architectural and computing assumptions, an unusual threat model and three properties that guarantee the liveness, safety and security of executions; we propose two alternative execution strategies enforcing liveness in opposite ways and discuss their impact on safety; we provide preliminary quantitative and qualitative evaluations of these strategies and derives from them design rules to adapt centralized computations to the Edgelet computing paradigm. This work was submitted recently and we are now focusing more specifically on machine learning algorithms within an Edgelet computing environment.
The execution of distributed queries on populations of PDMS involves communication patterns between computing nodes (see 6), which may depend on the values of the personal data being processed (a computing node may aggregate personal data corresponding to a given range of sensitive values). An attacker observing communications, even encrypted, can potentially infer personal information about participants. The use of traditional solutions to conceal data-dependent communications at runtime results in either significant performance penalties or privacy gains that are difficult to quantify. Chapter 4 of Riad Ladel's PhD manuscript formulates as an
The place of individuals and the control of their data have emerged as central issues in the European data protection regulation. The empowerment of the individual has notably resulted in the recognition of a new prerogative for the individual: the right to the portability of personal data. The corollary of this new right is the design and deployment of technical platforms, commonly known as Personal Data Mangement Servers (PDMS) or PIMS, allowing the individual to consolidate all his or her data in a single system managed under his or her control. On the strength of these technical and legal innovations, several questions arise: what forms of empowerment are targeted in practice? What are the appropriate conditions to guarantee the objective pursued? At the crossroads of these questions, one dimension appears to be insufficiently exploited: that of agency. During this period, in collaboration with law researchers Célia Zolynski (IRJS, Sorbonne Univ.) and Sébastien Chaudat (DANTE, Univ. of Versailles), we have transposed this notion from the social sciences to the management of personal data and provided a new reading of the empowerment measures of Big Data functionalities on personal data. This study led in 2021 to a journal publication in the Global Privacy Law Review 11.
Partners: PETRUS, Hippocad
End 2016, the Yvelines district lauched a public call for tender to deploy a personal medical-social folder aiming at covering the whole distinct (10.000 patients). The Hippocad company, in partnership with Inria, won this call for tender with a solution called DomYcile and the project was launched in July 2017. DomYcile is based on a home box combining the PlugDB hardware/software technology developed by the Petus team and a communication layer based on SigFox. Hippocad and Petrus then decided to launch a joint II-Lab (Inria Innovation Lab) named OwnCare in January 2018. The objective is threefold: (1) build an industrial solution based on PlugDB and deploy it in the Yvelines district in the short-term, (2) use this Yvelines testbed to improve the solution and try to deploy it at the national/international level in the medium-term and (3) design flexible/secure/mobile personal medical folder solutions addressing new usages (IoT, machine learning models, distributed statistics, etc.) in the long-term. In 2021, important efforts have been put (1) on the deployment of the boxes in the field despite the Covid pandemia, (2) on pursuing the design of decentralized privacy-preserving distributed computations and (3) on initiating research work related to daily activity discovery thanks to home sensors and machine learning algorithms. On the administrative part, a followup of the II-Lab for the next 4 years is under preparation.
Partners: Cozy Cloud, PETRUS
A third CIFRE PhD thesis has been started between Cozy Cloud and Julien Mirval from PETRUS. Cozy Cloud is a French startup providing a personal Cloud platform. The Cozy product is a software stack that anyone can deploy to run his personal server in order to host his personal data and web services. The objective of this thesis is to propose appropriate solutions to effectively train an AI model (e.g., a deep neural network) in a fully distributed system while providing strong security guarantees to the participating nodes. The results, in the form of protocols and distributed and secure execution algorithms, will be applied to practical cases provided by the Cozy Cloud company.
Partners: Orange Labs (coordinator), PETRUS (Inria-UVSQ), Cozy Cloud, U. of Versailles.
The objective of PerSoCloud is to design, implement and validate a full-fledged Privacy-by-Design Personal Cloud Sharing Platform. One of the major difficulties linked to the concept of personal cloud lies in organizing and enforcing the security of the data sharing while the data is no longer under the control of a central server. We identify three dimensions to this problem. Devices-sharing: assuming that the primary copy of user U1's personal data is hosted in a secure place, how to share and synchronize it with U1's multiple (mobile) devices without compromising security? Peers-sharing: how user U1 could exchange a subset of his-her data with an identified user U2 while providing to U1 tangible guarantees about the usage made by U2 of this data? Community-sharing: how user U1 could exchange a subset of his-her data with a large community of users and contribute to personal big data analytics while providing to U1 tangible guarantees about the preservation of his-her anonymity? In addition to tackling these three scientific and technical issues, a legal analysis will guarantee compliance of this platform with the security and privacy French and UE regulation, which firmly promotes the Privacy by Design principle, including the current reforms of personal data regulation.
Partners: DANTE (U. of Versailles), PETRUS (Inria-UVSQ).
The role of individuals and the control of their data is a central issue in the new European regulation (GDPR) enforced on 25th May 2018. Data portability is a new right provided under those regulations. It allows citizens to retrieve their personal data from the companies and governmental agencies that collected them, in an interoperable digital format. The goals are to enable the individual to get out of a captive ecosystem, and to favor the development of innovative personal data services beyond the existing monopolistic positions. The consequence of this new right is the design and deployment of technical platforms, commonly known as Personal Cloud. But personal cloud architectures are very diverse, ranging from cloud based solutions where millions of personal cloud are managed centrally, to self-hosting solutions. These diversity is not neutral both in terms of security and from the point of view of the chain of liabilities. The GDP-ERE project tends to study those issues in an interdisciplinary approach by the involvement of jurists and computers scientists. The two main objectives are (i) to analyze the effects of the personal cloud architectures on legal liabilities, enlightened by the analysis of the rules provided under the GDPR and (ii) to propose legal and technological evolutions to highlight the share of liability between each relevant party and create adapted tools to endorse those liabilities.
Partners: Inria (PETRUS).
The objective of the Prevadom project is to integrate IoT facilities and machine learning algorithms into the DomYcile PlugDB-enabled boxes to detect and ideally prevent loneliness and despair situations. Such situations are peculiarly alarming during periods where patients are confined at home (e.g., during a pandemy). To this end, the PlugDB-enabled boxes will be augmented with IoT communication facilities in order to collect data from sensors (e.g., light, temperature, hygrometry, motion) and quantified-self devices (e.g., thermometer, oxymeter, blood pressure). Such raw data are highly intrusive. The objective is then to store them and analyse them with machine learning models locally (i.e., inside the box) and only externalize aggregated data or launch alarms when required.