Valda's focus is on both foundational and systems aspects of
complex data management, especially human-centric data.
The data we are interested in is typically heterogeneous, massively
distributed, rapidly evolving, intensional, and often subjective,
possibly erroneous, imprecise, or incomplete. In this setting, Valda is
in particular concerned with the optimization of complex resources
such as computer time and space, communication, monetary, and privacy
budgets. The goal is to extract value from data, beyond simple query answering.
Data management 40, 49 is now an old, well-established field, for which many scientific results and techniques have been accumulated since the sixties. Originally, most works dealt with static, homogeneous, and precise data. Later, works were devoted to heterogeneous data 38, 41, possibly distributed 72, but only at a small scale.
However, these classical techniques are poorly adapted to handle the new challenges of data management. Consider human-centric data, which is either produced by humans, e.g., emails, chats, recommendations, or produced by systems when dealing with humans, e.g., geolocation, business transactions, results of data analysis. When dealing with such data, and trying to accomplish any task that extracts value from it, we rapidly encounter the following facets:
These problems have already been studied individually and have led to
techniques such as
query rewriting 62 or
distributed query
optimization 68.
Among all these aspects, intensionality is perhaps the one that has been studied least, so we pay particular attention to it. Consider a user's query, taken in a very broad sense: it may be a classical database query, an information retrieval search, a clustering or classification task, or some more advanced knowledge extraction request. Because of the intensionality of data, answering such a query is typically a dynamic task: each time new data is obtained, the partial knowledge a system has of the world is revised, and query plans need to be updated, as in adaptive query processing 55 or aggregated search 80. The system then needs to decide, based on this partial knowledge, which access is best to perform next. This is reminiscent of the central problem of reinforcement learning 78 (training an agent to accomplish a task in a partially known world based on rewards obtained) and of active learning 74 (deciding which action to perform next in order to optimize a learning strategy), and we intend to explore this connection further.
Uncertainty of the data interacts with its intensionality: efforts are required to obtain more precise, more complete, sounder
results, which yields a trade-off between processing cost and
data quality.
Other aspects, such as heterogeneity and massive distribution, are of major importance as well. A standard data management task, such as query answering, information retrieval, or clustering, may become much more challenging when taking into account the fact that data is not available in a central location, or in a common format. We aim to take these aspects into account, to be able to apply our research to real-world applications.
We intend to tackle hard technical issues such as query answering, data integration,
data monitoring, verification of data-centric systems,
truth finding, knowledge extraction, and data analytics, all of which take on a different
flavor in this modern context. In particular, we are interested in
designing strategies to minimize data access cost towards a
specific goal, possibly a massive data analysis task. That cost
may be in terms of communication (accessing data in distributed
systems, on the Web), of computational resources (when data is
produced by complex tools such as information extraction, machine
learning systems, or complex query processing), of monetary budget
(paid-for application programming interfaces, crowdsourcing
platforms), or of a privacy budget (as in the standard framework of
differential privacy).
A number of data management tasks in Valda are inherently intractable. In addition to properly characterizing this intractability in terms of complexity theory, we intend to develop solutions for solving these tasks in practice, based on approximation strategies, randomized algorithms, enumeration algorithms with constant delay, or identification of restricted forms of data instances lowering the complexity of the task.
We now detail some of the scientific foundations of our research on complex data management. This is also the occasion to review the connections between data management, especially of the complex data that is the focus of Valda, and related research areas.
Data management has been connected to logic since the advent of the relational model as the main representation system for real-world data, and of first-order logic as the logical core of database query languages 40. Since these early developments, logic has also been successfully used to capture a large variety of query modes, such as data aggregation 67, recursive queries (Datalog), or the querying of XML databases 49. Logical formalisms facilitate reasoning about the expressiveness of a query language or about its complexity.
The main problem of interest in data management is that of query
evaluation, i.e., computing the results of a query over a database.
The complexity of this problem has far-reaching consequences.
For example, it is standard to distinguish data complexity, where the query is
considered to be fixed, from combined complexity, where both the
query and the data are considered to be part of the input. Thus, though
conjunctive queries, corresponding to a simple SELECT-FROM-WHERE fragment
of SQL, have PTIME data complexity, they are NP-hard in combined
complexity. Making this distinction is important, because data is often
far larger (up to the order of terabytes) than queries (rarely more than
a few hundred bytes). Beyond simple query evaluation, a central question
in data management remains
that of complexity; tools from algorithm analysis,
and complexity theory can be used to pinpoint the tractability frontier
of data management tasks.
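As a concrete illustration (a textbook example, not specific to any of the cited works), assuming relations R(A, B) and S(A, B), consider the conjunctive query

$$ q(x) \;=\; \exists y\, \exists z\; \big( R(x, y) \wedge S(y, z) \big), $$

which in SQL reads SELECT R.A FROM R, S WHERE R.B = S.A. For a fixed q, evaluation is polynomial in the size of the database (data complexity); when q is part of the input, deciding whether q has an answer is NP-hard (combined complexity), since conjunctive query evaluation encodes graph homomorphism.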
Automata theory and formal languages arise as important
components of the study of many data management tasks: in temporal
databases 39, queries, expressed in temporal
logics, can often be compiled to automata; in graph
databases 45, queries are naturally given as
automata; typical query and schema languages for XML databases such as
XPath and XML Schema
can be compiled to tree automata 71, or for more
complex languages to data tree
automata 65. Another
reason for the importance of automata theory, and tree automata in
particular, comes from Courcelle's results 53,
which show that very expressive queries (expressed in monadic
second-order logic) can be evaluated by tree automata over tree
decompositions of the original databases, yielding linear-time
algorithms (in data complexity) for a wide variety of applications.
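A classical example of such an expressive query (recalled here purely as an illustration) is graph 3-colorability, expressible in monadic second-order logic as

$$ \exists C_1\, \exists C_2\, \exists C_3\; \forall x\, \big( C_1(x) \vee C_2(x) \vee C_3(x) \big) \wedge \forall x\, \forall y\, \Big( E(x, y) \rightarrow \bigwedge_{i=1}^{3} \neg \big( C_i(x) \wedge C_i(y) \big) \Big), $$

which Courcelle's theorem therefore evaluates in linear time over graphs of bounded treewidth.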
Complex data management also has connections
to verification and static analysis. Besides query evaluation, a central
problem in data management is that of deciding whether two queries are
equivalent 40. This is critical
for query optimization, in order to determine
if the rewriting of a query, maybe cheaper to evaluate, will return
the same result as the original query. Equivalence can easily be seen to
be an instance of the problem of (non-)satisfiability:
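for Boolean queries q and q′, the standard reduction (recalled here for illustration) is

$$ q \equiv q' \quad\Longleftrightarrow\quad (q \wedge \neg q') \vee (\neg q \wedge q')\ \text{is unsatisfiable}. $$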
The orchestration of distributed activities (under the responsibility of a conductor) and their choreography (when they are fully autonomous) are complex issues that are essential for a wide range of data management applications, including notably e-commerce systems, business processes, and health-care and scientific workflows. The difficulty is to guarantee consistency or, more generally, quality of service, and to statically verify critical properties of the system. Different approaches to workflow specifications exist: automata-based, logic-based, or predicate-based control of function calls 37.
To deal with the uncertainty attached to data, proper models need to
be used (such as attaching
provenance information to data items
and viewing the whole database as being
probabilistic) and
practical methods and systems need to be developed to both reliably
estimate the uncertainty in data items and properly manage provenance
and uncertainty information throughout a long, complex system.
The simplest model of data uncertainty is that of the NULLs of SQL databases;
tables with such nulls are also called Codd tables 40. This
representation system is too basic for any complex task, and has the
major inconvenience of not being closed under even simple queries or
updates. A solution has been proposed in the form of
conditional tables 64, where every tuple is
annotated with a Boolean formula over independent Boolean random events.
This model has been recognized as foundational and extended in two
different directions: to more expressive models of provenance than
what Boolean functions capture, through a semiring
formalism 60, and to a
probabilistic formalism by assigning independent probabilities to the
Boolean events 61. These two extensions form the basis of
modern provenance and probability management, subsuming in a large way
previous works 52, 46. Research in the past
ten years has focused on a better understanding of the tractability of
query answering with provenance and probabilistic annotations, in a
variety of specializations of this
framework 77, 66, 43.
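To make the probabilistic formalism concrete, here is a minimal sketch (in Python, purely for illustration; the event names and probabilities are made up, and real systems do not enumerate worlds):

```python
from itertools import product

# Each tuple of a probabilistic c-table is annotated with a Boolean
# function over independent random events, each with a probability.
prob = {"e1": 0.9, "e2": 0.5, "e3": 0.7}

def annotation(world):
    """Provenance of some tuple t: e1 and (e2 or e3)."""
    return world["e1"] and (world["e2"] or world["e3"])

def tuple_probability(annot, prob):
    """Sum the probabilities of all worlds (truth assignments to the
    independent events) in which the annotation holds. This brute force
    is exponential in the number of events; exact probability evaluation
    is #P-hard in general, hence the interest in tractable subclasses."""
    events = list(prob)
    total = 0.0
    for values in product([False, True], repeat=len(events)):
        world = dict(zip(events, values))
        if annot(world):
            weight = 1.0
            for e in events:
                weight *= prob[e] if world[e] else 1.0 - prob[e]
            total += weight
    return total

print(tuple_probability(annotation, prob))  # 0.9 * (1 - 0.5 * 0.3) = 0.765
```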
Statistical machine learning, and its applications to data mining and data analytics, is a major foundation of data management research. A large variety of research areas in complex data management, such as wrapper induction 73, crowdsourcing 44, focused crawling 59, or automatic database tuning 47 critically rely on machine learning techniques, such as classification 63, probabilistic models 58, or reinforcement learning 78.
Machine learning is also a rich source of complex data management problems: thus, the probabilities produced by a conditional random field 69 system result in probabilistic annotations that need to be properly modeled, stored, and queried.
Finally, complex data management also brings new twists to some classical
machine learning problems. Consider for instance the area of active
learning 74, a subfield of machine
learning concerned with how to optimally use a (costly) oracle, in an
interactive manner, to label training data that will be used to build a
learning model, e.g., a classifier. In most of the active learning
literature, the cost model is very basic (uniform or fixed-value costs),
though some works 75 consider
more realistic costs. Also, oracles are usually assumed to be perfect
with only a few exceptions 56. These
assumptions usually break down when applied to complex data management
problems on real-world data, such as crowdsourcing.
At the beginning of the Valda team, the project was to focus on the following directions:
We believe the first two directions have been followed in a satisfactory manner. For various organizational reasons, however, the focus on personal information management has not been kept; the third axis of the project has been reoriented towards more general aspects of Web data management.
New permanent arrivals in the group since its creation have impacted its research directions in the following manner:
We intend to keep producing leading research on the foundations of data management. Generally speaking, the goal is to investigate the borders of feasibility of various tasks. For instance, what assumptions on the data make a problem computable? When is it not computable at all? When can we hope for efficient query answering, and when is it hopeless? These questions are of a theoretical nature, but answering them is necessary to understand the limits of the available methods and to drive research towards scenarios where positive results may be obtainable. Only once we have understood the limitations of different methods, and have accumulated many examples of what is possible, can we hope to design solid foundations allowing for a good trade-off between what can be done (the needs of users) and what can be achieved (the limitations of systems).
Similarly, we will continue our work, both foundational and practical, on various aspects of provenance and uncertainty management. One overall long-term goal is to reach a full understanding of the interactions between query evaluation, or other broader data management tasks, and uncertain and annotated data models. In particular, we want to move towards a full classification of tractable (typically polynomial-time) and intractable (typically NP-hard for decision problems, or #P-hard for probability evaluation) tasks, extending and connecting the query-based dichotomy 54 on probabilistic query evaluation with the instance-based one of 42, 43. Another long-term goal is to consider more dynamic scenarios than what has been considered so far in the uncertain data management literature: when following a workflow, or when interacting with intensional data sources, how to properly represent and update the uncertainty annotations that are associated with data? This is critical for many complex data management scenarios where one has to maintain a probabilistic current knowledge of the world while obtaining new knowledge by posing queries and accessing data sources. Such intensional tasks require jointly minimizing data uncertainty and the cost of data access.
As an application area, in addition to the historical focus on personal information management, which is now less emphasized, we target Web data (Web pages, the semantic Web, social networks, the deep Web, crowdsourcing platforms, etc.).
We aim at keeping a delicate balance between theoretical, foundational research, and systems research, including development and implementation. This is a difficult balance to find, especially since most Valda researchers have a tendency to favor theoretical work, but we believe it is also one of the strengths of the team.
We recall that Valda's focus is on human-centric data, i.e., data produced by humans, explicitly or implicitly, or more generally containing information about humans. Quite naturally, we have used personal information management systems (Pims for short) 36 as a privileged application area to validate Valda's results.
A Pims is a system that allows a user to integrate her own data, e.g., emails and other kinds of messages, calendar, contacts, web search, social network, travel information, work projects, etc. Such information is commonly spread across different services. The goal is to give back to a user the control on her information, allowing her to formulate queries such as “What kind of interaction did I have recently with Alice B.?”, “Where were my last ten business trips, and who helped me plan them?”. The system has to orchestrate queries to the various services (which means knowing the existence of these services, and how to interact with them), integrate information from them (which means having data models for this information and its representation in the services), e.g., align a GPS location of the user to a business address or place mentioned in an email, or an event in a calendar to some event in a Web search. This information must be accessed intensionally: for instance, costly information extraction tools should only be run on emails which seem relevant, perhaps identified by a less costly cursory analysis (this means, in turn, obtaining a cost model for access to the different services). Impacted people can be found by examining events in the user's calendar and determining who is likely to attend them, perhaps based on email exchanges or former events' participant lists. Of course, uncertainty has to be maintained along the entire process, and provenance information is needed to explain query results to the user (e.g., indicate which meetings and trips are relevant to each person of the output). Knowledge about services, their data models, their costs, need either to be provided by the system designer, or to be automatically learned from interaction with these services, as in 73.
One motivation for that choice is that Pims concentrate many of the problems we intend to investigate: heterogeneity (various sources, each with a different structure), massive distribution (information spread out over the Web, in numerous sources), rapid evolution (new data regularly added), intensionality (knowledge from Wikidata, OpenStreetMap...), confidentiality and security (mostly private data), and uncertainty (very variable quality). Though the data is distributed, its size is relatively modest; other applications may be considered for works focusing on processing data at large scale, which is a potential research direction within Valda, though not our main focus. Another strong motivation for the choice of Pims as application domain is the importance of this application from a societal viewpoint.
A Pims is essentially a system built on top of a user's personal
knowledge base; such knowledge bases are reminiscent of those found in
the Semantic Web, e.g., linked open data. Some issues, such as ontology
alignment 76 exist in both scenarios. However,
there are some fundamental differences in building personal knowledge
bases vs. collecting information from the Semantic Web: first, the scope
is much smaller, as one is only interested in knowledge related to a
given individual; second, only a small proportion of the data is already present
in the form of semantic information, and most of it needs to be extracted and
annotated through appropriate wrappers and enrichers; third, whereas
linked open data is mostly meant to be read-only, the only update available to a
user being the addition of new triples, a personal knowledge base is very much
something that a user needs to be able to edit, and propagating updates
from the knowledge base to original data sources is a challenge in
itself.
The choice of Pims is not exclusive; we consider other application areas as well. In particular, we have worked in the past and have a strong expertise on Web data 41 in a broad sense: semi-structured, structured, or unstructured content extracted from Web databases 73; knowledge bases from the Semantic Web 76; social networks 70; Web archives and Web crawls 57; Web applications and deep Web databases 50; crowdsourcing platforms 44. We intend to continue using Web data as a natural application domain for the research within Valda when relevant. For instance, deep Web databases are a natural application scenario for intensional data management issues 48: determining if a deep Web database contains some information requires optimizing the number of costly requests to that database.
A common aspect of both personal information and Web data is that their exploitation raises ethical considerations. Thus, a user needs to remain fully in control of the usage that is made of her personal information; a search engine or recommender system that ranks Web content for display to a specific user needs to do so in an unbiased, justifiable manner. These ethical constraints sometimes forbid solutions that may be technically useful, such as sharing a model learned from the personal data of one user with another user, or using black boxes to rank query results. We fully intend to take these ethical considerations into account within Valda. One of the main goals of a Pims is indeed to empower users with full control over the use of their data.
Data-driven algorithmic systems raise ethical and legal concerns that need to be taken into account within research. Serge Abiteboul, with collaborators from NYU, U. Washington, U. Michigan, and U. Amsterdam, wrote a position article detailing the role that data management research needs to play in ensuring responsible design and use of algorithmic data-driven systems 17.
Michaël Thomazo, together with Maxime Buron and Marie-Laure Mugnier, received the BDA (French database community) award for their work on Parallelisable Existential Rules: a Story of Pieces 31, also published at KR 2021.
The work of the Valda team in 2022 was affected by several issues within Inria; in particular major issues with the deployment of a new information system (Eksae) negatively impacted the work of our administrative assistant and made it impossible for the team leader to keep track of expenses.
The team also would like to thank the Inria evaluation committee for its admirable work in support of the research community, for its transparency, and for the integrity with which it conducts its activities.
dissem.in, the openly accessible platform promoting full-text deposit of researchers' scientific articles, which is based on the dissem.in (7.1.5) software, has been maintained by Valda since 2021. Work on the platform in 2022, in addition to work on the base software, included updating information about journals and publisher policies from the Sherpa/Romeo API.
We present the results we obtained and published in 2022. Much of the research within Valda revolves around the central problem of query answering in databases, while exploring various side questions: How to handle incomplete or inconsistent information? How to efficiently access query results when there are many of them? How to incorporate external ontologies within query answering? How to keep track of the provenance of query results? We describe our works in each of these areas in turn, and finish with other theoretical research conducted in the team, beyond data management.
We first consider databases containing incomplete (missing) or inconsistent (contradictory) information.
One of the most common scenarios of handling incomplete information occurs in relational databases. They describe incomplete knowledge with three truth values, using Kleene’s logic for propositional formulae and a rather peculiar extension to predicate calculus. This design by a committee from several decades ago is now part of the standard adopted by vendors of database management systems. But is it really the right way to handle incompleteness in propositional and predicate logics? Our goal in 13 is to answer this question. Using an epistemic approach, we first characterize possible levels of partial knowledge about propositions, which leads to six truth values. We impose rationality conditions on the semantics of the connectives of the propositional logic, and prove that Kleene’s logic is the maximal sublogic to which the standard optimization rules apply, thereby justifying this design choice. For extensions to predicate logic, however, we show that the additional truth values are not necessary: every many-valued extension of first-order logic over databases with incomplete information represented by null values is no more powerful than the usual two-valued logic with the standard Boolean interpretation of the connectives. We use this observation to analyze the logic underlying SQL query evaluation, and conclude that the many-valued extension for handling incompleteness does not add any expressiveness to it.
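To make the three-valued connectives concrete, here is a minimal sketch (in Python, purely for illustration) of Kleene's logic as used by SQL, with None playing the role of the unknown truth value:

```python
# Kleene three-valued connectives: false dominates conjunction,
# true dominates disjunction, unknown (None) propagates otherwise.
def and3(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    return None if a is None else (not a)

def eq3(x, y):
    """SQL-style comparison: unknown as soon as an operand is NULL."""
    return None if x is None or y is None else (x == y)

print(and3(True, None), or3(False, None))  # None None: unknown propagates

# SQL's WHERE keeps only rows whose condition is true, not unknown,
# which is why a NULL row passes neither a filter nor its negation:
rows = [1, None, 3]
print([x for x in rows if eq3(x, 1) is True])        # [1]
print([x for x in rows if not3(eq3(x, 1)) is True])  # [3]
```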
We continue on the topic of incomplete information in 18, where our goal is to collect and analyze the shortcomings of nulls and their treatment by SQL, and to re-evaluate existing research in this light. To this end, we designed and conducted a survey on the everyday usage of null values among database users. From the analysis of the results we reached two main conclusions. First, null values are ubiquitous and relevant in real-life scenarios, but SQL's features designed to deal with them cause multiple problems. The severity of these problems varies depending on the SQL features used, and they cannot be reduced to a single issue. Second, foundational research on nulls is misdirected and has been addressing problems of limited practical relevance. We urge the community to view the results of this survey as a way to broaden the spectrum of their research and further bridge the theory-practice gap on null values.
To answer database queries over incomplete data, the gold standard is finding certain answers: those that are true regardless of how incomplete data is interpreted. Such answers can be found efficiently for conjunctive queries and their unions, even in the presence of constraints such as keys or functional dependencies. With negation added, however, the complexity of finding certain answers becomes intractable. In 28 we exhibit a well-behaved class of queries that extends unions of conjunctive queries with a limited form of negation and that permits efficient computation of certain answers even in the presence of constraints, by means of rewriting into Datalog with negation. The class consists of queries that are the closure of conjunctive queries under the Boolean operations of union, intersection, and difference. We show that for these queries, certain answers can be expressed in Datalog with negation, even in the presence of functional dependencies, thus making them tractable in data complexity. We show that in general Datalog cannot be replaced by first-order logic, but that without constraints such a rewriting can be done in first-order logic.
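The definition of certain answers can be illustrated by a toy brute-force sketch (in Python; the table, domain, and query are made up, and actual approaches rely on rewriting rather than on enumerating completions):

```python
from itertools import product

# A Codd table: None marks a null, which may denote any constant
# of a finite domain (an assumption made here for illustration).
R = [("a", None), ("b", "c")]
domain = {"a", "b", "c"}

def completions(table):
    """Yield every complete table obtained by replacing nulls by constants."""
    nulls = [(i, j) for i, row in enumerate(table)
             for j, v in enumerate(row) if v is None]
    for values in product(domain, repeat=len(nulls)):
        complete = [list(row) for row in table]
        for (i, j), v in zip(nulls, values):
            complete[i][j] = v
        yield [tuple(row) for row in complete]

def query(table):
    """The conjunctive query q(x) :- R(x, y), i.e., projection on column 1."""
    return {x for (x, _) in table}

# An answer is certain if it belongs to q's result on every completion.
certain = set.intersection(*(query(c) for c in completions(R)))
print(certain)  # {'a', 'b'}
```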
While all relational database systems are based on the bag data model, much of theoretical research still views relations as sets. Recent attempts to provide theoretical foundations for modern data management problems under the bag semantics concentrated on applications that need to deal with incomplete relations, i.e., relations populated by constants and nulls. Our goal in 12 is to provide a complete characterization of the complexity of query answering over such relations in fragments of bag relational algebra. The main challenges that we face are twofold. First, bag relational algebra has more operations than its set analog (e.g., additive union, max-union, min-intersection, duplicate elimination) and the relationship between various fragments is not fully known. Thus we first fill this gap. Second, we look at query answering over incomplete data, which again is more complex than in the set case: rather than certainty and possibility of answers, we now have numerical information about occurrences of tuples. We then fully classify the complexity of finding this information in all the fragments of bag relational algebra.
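For example (writing {{...}} for bags, purely for illustration), additive union adds multiplicities, while max-union keeps, for each tuple, the maximum of its multiplicities:

$$ \{\!\{a, a\}\!\} \uplus \{\!\{a, b\}\!\} = \{\!\{a, a, a, b\}\!\}, \qquad \{\!\{a, a\}\!\} \cup_{\max} \{\!\{a, b\}\!\} = \{\!\{a, a, b\}\!\}. $$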
Finally, we turn to inconsistent data. In 19, 20, we investigate practical algorithms for inconsistency-tolerant query answering over prioritized knowledge bases, which consist of a logical theory, a set of facts, and a priority relation between conflicting facts. We consider three well-known semantics (AR, IAR and brave) based upon two notions of optimal repairs (Pareto and completion). Deciding whether a query answer holds under these semantics is (co)NP-complete in data complexity for a large class of logical theories, and SAT-based procedures have been devised for repair-based semantics when there is no priority relation, or the relation has a special structure. We introduce the first SAT encodings for Pareto- and completion-optimal repairs w.r.t. general priority relations and propose several ways of employing existing and new encodings to compute answers under (optimal) repair-based semantics, by exploiting different reasoning modes of SAT solvers. The comprehensive experimental evaluation of our implementation compares both (i) the impact of adopting semantics based on different kinds of repairs, and (ii) the relative performances of alternative procedures for the same semantics.
Many queries output sets of results that are too big to be generated at once. Two strategies can then be used: either design algorithms for the efficient enumeration of the query results, one after the other, or design algorithms for efficient direct access to one specific result within the set of results.
In 16, we consider the evaluation of first-order queries over classes of databases that are nowhere dense. The notion of nowhere-dense classes was introduced by Nešetřil and Ossona de Mendez as a formalization of classes of “sparse” graphs and generalizes many well-known classes of graphs, such as classes of bounded degree, bounded tree-width, or bounded expansion. It has recently been shown by Grohe, Kreutzer, and Siebertz that over nowhere-dense classes of databases, first-order sentences can be evaluated in pseudo-linear time (pseudo-linear time meaning that, for every ε > 0, there is an evaluation algorithm running in time O(n^(1+ε)), where n is the size of the database). We show that, in the same setting, the answers to a first-order query of any arity can be enumerated with constant delay after pseudo-linear-time preprocessing.
A class of relational databases has low degree if, for all ε > 0, all sufficiently large databases in the class have degree at most n^ε, where n is the size of the database. We show that, over any class of databases with low degree, first-order queries can also be enumerated with constant delay after pseudo-linear-time preprocessing.
Finally, we consider in 25 the task of lexicographic direct access to query answers. That is, we want to simulate an array containing the answers of a join query sorted in a lexicographic order chosen by the user. A recent dichotomy showed for which queries and orders this task can be done in polylogarithmic access time after quasilinear preprocessing, but this dichotomy does not tell us how much time is required in the cases classified as hard. We determine the preprocessing time needed to achieve polylogarithmic access time for all self-join free queries and all lexicographical orders. To this end, we propose a decomposition-based general algorithm for direct access on join queries. We then explore its optimality by proving lower bounds for the preprocessing time based on the hardness of a certain online Set-Disjointness problem, which shows that our algorithm’s bounds are tight for all lexicographic orders on self-join free queries. Then, we prove the hardness of Set-Disjointness based on the Zero-Clique Conjecture, which is an established conjecture from fine-grained complexity theory. We also show that similar techniques can be used to prove that, for enumerating answers to Loomis-Whitney joins, it is not possible to significantly improve upon trivially computing all answers at preprocessing. This, in turn, gives further evidence (based on the Zero-Clique Conjecture) to the enumeration hardness of self-join free cyclic joins with respect to linear preprocessing and constant delay.
We now consider cases where, to answer a query, we need to take into account external knowledge given in the form of a logical ontology (e.g., described in description logics, or through existential rules).
While ontology-mediated query answering most often adopts (unions of) conjunctive queries as the query language, some recent works have explored the use of counting queries coupled with DL-Lite ontologies. The aim of 22, 21
is to extend the study of counting queries to Horn description logics outside the DL-Lite family. Through a combination of novel techniques, adaptations of existing constructions, and new connections to closed predicates, we achieve a complete picture of the data and combined complexity of answering counting conjunctive queries (CCQs) and cardinality queries (a restricted class of CCQs) in ELHI⊥ and its sublogics.
Existential rules are a very popular ontology-mediated query language for which the chase represents a generic computational approach for query answering. It is straightforward that existential rule queries exhibiting chase termination are decidable and can only recognize properties that are preserved under homomorphisms. 24 is an extended abstract of our eponymous publication at KR 2021 where we show the converse: every decidable query that is closed under homomorphism can be expressed by an existential rule set for which the standard chase universally terminates. Membership in this fragment is not decidable, but we show via a diagonalisation argument that this is unavoidable.
In the literature, existential rules are often supposed to be in some normal form that simplifies technical developments. For instance, a common assumption is that rule heads are atomic, i.e., restricted to a single atom. Such assumptions are considered to be made without loss of generality as long as all sets of rules can be normalised while preserving entailment. However, an important question is whether the properties that ensure the decidability of reasoning are preserved as well. We provide in 26 a systematic study of the impact of these procedures on the different chase variants with respect to chase (non-)termination and FO-rewritability. This also leads us to study open problems related to chase termination of independent interest.
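For instance, a rule with a two-atom head can be normalized using a fresh auxiliary predicate Aux (a standard transformation, sketched here only to fix ideas):

$$ P(x) \rightarrow \exists y\, \big( Q(x, y) \wedge R(y) \big) \qquad\rightsquigarrow\qquad \begin{cases} P(x) \rightarrow \exists y\, \mathrm{Aux}(x, y) \\ \mathrm{Aux}(x, y) \rightarrow Q(x, y) \\ \mathrm{Aux}(x, y) \rightarrow R(y) \end{cases} $$

The question studied in 26 is whether such transformations preserve, for instance, the (non-)termination of the various chase variants.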
Data provenance consists in keeping track of meta-information during query evaluation, in order to enrich query results with their trust level, likelihood, evaluation cost, and more. The framework of semiring provenance abstracts from the specific kind of meta-information that annotates the data.
While the definition of semiring provenance is uncontroversial for unions of conjunctive queries, the picture is less clear for Datalog. Indeed, the original definition might include infinite computations, and is not consistent with other proposals for Datalog semantics over annotated data. In 23, we propose and investigate several provenance semantics, based on different approaches for defining classical Datalog semantics. We study the relationship between these semantics, and introduce properties that allow us to analyze and compare them.
In 30, 33, we establish a translation between a formalism for dynamic programming over hypergraphs and the computation of semiring-based provenance for Datalog programs. The benefit of this translation is a new method for computing the provenance of Datalog programs for specific classes of semirings, which we apply to provenance-aware querying of graph databases. Theoretical results and practical optimizations lead to an efficient implementation using Soufflé, a state-of-the-art Datalog interpreter. Experimental results on real-world data suggest this approach to be efficient in practical contexts, competing with dedicated solutions for graphs.
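The flavor of semiring provenance for Datalog can be conveyed by a minimal sketch (an assumed example in Python, not the algorithm of 30, 33 nor the Soufflé implementation): annotating reachability facts in the tropical semiring (min, +) by naive fixpoint iteration, in the spirit of dynamic programming over the program's derivations.

```python
INF = float("inf")

# Edge facts annotated with weights (tropical-semiring annotations).
edges = {("s", "a"): 1.0, ("a", "t"): 2.0, ("s", "t"): 5.0}
nodes = {n for edge in edges for n in edge}

# reach[(x, y)] is the annotation of the fact reach(x, y): the minimum
# total weight over all of its derivations, i.e., over all paths.
reach = {(x, y): INF for x in nodes for y in nodes}
for (x, y), w in edges.items():  # base rule: reach(x, y) :- edge(x, y)
    reach[(x, y)] = min(reach[(x, y)], w)

changed = True
while changed:  # terminates since annotations only ever decrease
    changed = False
    for (x, y) in reach:
        for z in nodes:  # rule: reach(x, y) :- reach(x, z), edge(z, y)
            alt = reach[(x, z)] + edges.get((z, y), INF)
            if alt < reach[(x, y)]:
                reach[(x, y)] = alt
                changed = True

print(reach[("s", "t")])  # 3.0: the cheapest derivation, not just any proof
```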
Valda's research has always encompassed other foundational topics. We conclude with the description of other theoretical computer science works (namely, in algebraic automata theory and logic) which do not fit within the previous areas of research.
The program-over-monoid model of computation originates with Barrington's proof that the model captures the complexity class NC¹.
When we bundle quantifiers and modalities together (as in
Leonid Libkin is involved in the standardization process of the GQL and SQL query languages. In particular, he is a chair of the LDBC working group on semantics of GQL, and a member of ISO/IEC JTC1 SC32 WG3 (SQL committee). He is also a member of INCITS, the US InterNational Committee for Information Technology Standards.
As part of this standardization effort, 27 presents the key elements of the graph pattern matching language at the core of both SQL/PGQ and GQL, in advance of the publication of the corresponding new standards.
Valda has strong collaborations with the following international groups:
A bilateral French–German ANR project, entitled EQUUS (Efficient Query answering Under UpdateS), started in 2020. It involves CNRS (CRIL, CRIStAL, IMJ), Télécom Paris, HU Berlin, and Bayreuth University, in addition to Inria Valda.
Valda has been part of three national ANR projects in 2022:
Camille Bourgaux has been participating in the AI Chair of Meghyn Bienvenu on
INTENDED (Intelligent handling of imperfect data) since 2020.
Pierre Senellart has held a chair within the PR[AI]RIE institute for artificial intelligence in Paris since 2019.
Licence:
Databases, L3, École
normale supérieure – Leonid Libkin, Yann Ramusat
Pierre Senellart holds various teaching responsibilities (L3 internships, M1 projects, M2 administration, entrance competition) at ENS; he is also on the managing board of the graduate program. Leonid Libkin is co-responsible for the international entrance competition at ENS. Yann Ramusat was the secretary of the entrance competition at ENS for computer science. Michaël Thomazo is an adjunct professor at PSL.
Most members of the group are also involved in tutoring ENS students, advising them on their curriculum, their internships, etc. They are also occasionally involved with reviewing internship reports, supervising student projects, etc.
PhD completed: Yann Ramusat, Provenance-based routing in probabilistic graphs 33, 2018–2022, Silviu Maniu & Pierre Senellart
PhD in progress: Étienne Toussaint, advised by Paolo Guagliardo & Leonid Libkin (being based in Edinburgh, Étienne Toussaint is not considered a Valda member)