Scool is a machine learning (ML) research group. Scool's research focuses on the study of the sequential decision making under uncertainty problem (SDMUP). In particular, we consider bandit problems 50 and the reinforcement learning (RL) problem 49. In a simplified way, RL considers the problem of learning an optimal policy in a Markov Decision Problem (MDP) 47; when the set of states collapses to a single state, this is known as the bandit problem which focuses on the exploration/exploitation problem.

Bandit and RL problems are interesting to study on their own; both types of problems share a number of fundamental issues (convergence analysis, sample complexity, representation, safety, etc.); both problems have real applications, different though closely related; the fact that while solving an RL problem, one faces an exploration/exploitation problem and has to solve a bandit problem in each state connects the two types of problems very intimately.

In our work, we also consider settings going beyond the Markovian assumption, in particular non-stationary settings, which represents a challenge common to bandits and RL. A distinctive aspect of the SDMUP with regards to the rest of the field of ML is that the learning problem takes place within a closed-loop interaction between a learning agent and its environment. This feedback loop makes our field of research very different from the two other sub-fields of ML, supervised and unsupervised learning, even when they are defined in an incremental setting. Hence, SDMUP combines ML with control: the learner is not passive, the learner acts on its environment, and learns from the consequences of these interactions; hence, the learner can act in order to obtain information from the environment. Naturally, the optimal control community is getting more and more interested by RL (see e.g. 48).

We wish to go on, studying applied questions and developing theory to come up with sound approaches to the practical resolution of SDMUP tasks, and guide their resolution. Non-stationary environments are a particularly interesting setting; we are studying this setting and developing new tools to approach it in a sound way, in order to have algorithms to detect environment changes as fast as possible, and as reliably as possible, adapt to them, and prove their behavior, in terms of their performance, measured with the regret for instance. We mostly consider non parametric statistical models, that is models in which the number of parameters is not fixed (a parameter may be of any type: a scalar, a vector, a function, etc.), so that the model can adapt along learning, and to its changing environment; this also lets the algorithm learn a representation that fits its environment.

Our research is mostly dealing with bandit problems, and reinforcement learning problems. We investigate each thread separately and also in combination, since the management of the exploration/exploitation trade-off is a major issue in reinforcement learning.

On bandit problems, we focus on:

Regarding reinforcement learning, we focus on:

Beyond these objectives, we put a particular emphasis on the study of non-stationary environments. Another area of great concern is the combination of symbolic methods with numerical methods, be it to provide knowledge to the learning algorithm to improve its learning curve, or to better understand what the algorithm has learned and explain its behavior, or to rely on causality rather than on mere correlation.

We also put a particular emphasis on real applications and how to deal with their constraints: lack of a simulator, difficulty to have a realistic model of the problem, small amount of data, dealing with risks, availability of expert knowledge on the task.

Scool has 2 main topics of application:

In each of these two domains, we put forward the investigation and the application of the idea of sequential decision making under uncertainty. Though supervised and non supervised learning have already been studied and applied extensively, sequential decision making remains far less studied; bandits have already been used in many applications of e-commerce (e.g. for computational advertising and recommendation systems). However, in applications where human beings may be severely impacted, bandits and reinforcement learning have not been studied much; moreover, these applications come along with a scarcity of data, and the non availability of a simulator, which prevents heavy computational simulations to come up with safe automatic decision making.

In 2022, in health, we investigate patient follow-up with Prof. F. Pattou's research group (CHU Lille, Inserm, Université de Lille) in project B4H. This effort comes along with investigating how we may use medical data available locally at CHU Lille, and also the national social security data. We also investigate drug repurposing with Prof. A. Delahaye-Duriez (Inserm, Université de Paris) in project Repos. We also study catheter control by way of reinforcement learning with Inria Lille group Defrost, and company Robocath (Rouen).

Regarding sustainable development, we have a set of projects and collaborations regarding agriculture and gardening. With Cirad and CGIAR, we investigate how one may recommend agricultural practices to farmers in developing countries. Through an associate team with Bihar Agriculture University (India), we investigate data collection. Inria exploratory action SR4SG concerns recommender systems at the level of individual gardens.

There are two important aspects that are amply shared by these two application fields. First, we consider that data collection is an active task: we do not passively observe and record data, we design methods and algorithms to search for useful data. This idea is exploited in most of these works oriented towards applications. Second, many of these projects include a careful management of risks for human beings. We have to take decisions taking care of their consequences on human beings, on eco-systems and life more generally.

Sustainable development is a major field of research and application of Scool. We investigate what machine learning can bring to sustainable development, identifiying challenges and obstacles, and studying how to overcome them.

Let us mention here:

More details can be found in Section 4.

The “Weight Tracjectory Predictor” algorithm is part of a larger project, whose goal is to leverage artificial intelligence techniques to improve patient care. This code is the product of a collaboration between Inria SCOOL and the UMR 1190-EGID team of the CHU Lille. It aims to predict the weight loss trajectory of a patient following bariatric surgery (treatment of severe obesity) from a set of preoperative characteristics.

We performed a retrospective study of clinical data collected prospectively on patients with up to five years postoperative follow-up (ABOS cohort, CHU Lille) and trained a supervised model to predict the relative total weight loss (“%TWL”) of a patient 1, 3, 12, 24 and 60 months after surgery. This model consists in a decision tree, written in python, taking as input a selected subset of preoperative attributes (weight, height, type of intervention, age, presence or absence of type 2 diabetes or impaired glucose tolerance, diabetes duration, smoking habits) and returns an estimation of %TWL as well as a prediction interval based on the interquartile range of %TWL observed on similar patients. The predictions of this tool have been validated both internally and externally (on French and Dutch cohorts).

The goal of this software is to improve patient follow-up after bariatric surgery: - during preoperative visits, by providing clinicians with a quantitative tool to inform the patient regarding potential weight loss outcome. - during postoperative control visits, by comparing the predicted and realized weight trajectories, which may facilitate early detection of complications.

This software component will be embedded in a web app for ease of use.

Nov 2022: Déploiement de FarmGym sur gitlab, auparavant en développement interne par Odalric-Ambrym Maillard (2021-2022). Nov-Dec 2022: Organisation de compétition interne par Timothée Mathieu.

Printemps 2023: Arrivée de Brahim Driss sur le projet, pour mettre en place tests unitaires/fonctionnels et assister Odalric-Ambrym Maillard dans le développement de fonctionnalité.

We organize our research results in a set of categories. The main categories are: bandit problems, reinforcement learning problems, and applications.

An $\u03f5$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond, 25

We propose EB-TC

Optimistic PAC Reinforcement Learning: the Instance-Dependent View, 32

Optimistic algorithms have been extensively studied for regret minimization in episodic tabular Markov Decision Processes (MDPs), both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2022) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near instance-optimal (up to a factor of the horizon). On the technical side, our analysis is very simple thanks to a new ”target trick” of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration & Planning, 30

We study the problem of episodic reinforcement learning in continuous stateaction spaces with unknown rewards and transitions. Specifically, we consider the setting where the rewards and transitions are modeled using parametric bilinear exponential families. We propose an algorithm, BEF-RLSVI, that a) uses penalized maximum likelihood estimators to learn the unknown parameters, b) injects a calibrated Gaussian noise in the parameter of rewards to ensure exploration, and c) leverages linearity of the exponential family with respect to an underlying RKHS to perform tractable planning. We further provide a frequentist regret analysis of BEF-RLSVI that yields an upper bound of

Bregman Deviations of Generic Exponential Families, 21

We revisit the method of mixture technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of Bregman divergence associated with log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite sample estimate of the parameter. Our bound is time-uniform and makes appear a quantity extending the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square yielding explicit forms of the confidence sets and the Bregman information gain. We further numerically compare the resulting confidence bounds to state-of-the-art alternatives for time-uniform concentration and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.

Active Coverage for PAC Reinforcement Learning, 15

Collecting and leveraging data with good coverage properties plays a crucial role in different aspects of reinforcement learning (RL), including reward-free exploration and offline learning. However, the notion of ”good coverage” really depends on the application at hand, as data suitable for one context may not be so for another. In this paper, we formalize the problem of active coverage in episodic Markov decision processes (MDPs), where the goal is to interact with the environment so as to fulfill given sampling requirements. This framework is sufficiently flexible to specify any desired coverage property, making it applicable to any problem that involves online exploration. Our main contribution is an instance-dependent lower bound on the sample complexity of active coverage and a simple game-theoretic algorithm, COVGAME, that nearly matches it. We then show that COVGAME can be used as a building block to solve different PAC RL tasks. In particular, we obtain a simple algorithm for PAC reward-free exploration with an instance-dependent sample complexity that, in certain MDPs which are ”easy to explore”, is lower than the minimax one. By further coupling this exploration algorithm with a new technique to do implicit eliminations in policy space, we obtain a computationally-efficient algorithm for best-policy identification whose instance-dependent sample complexity scales with gaps between policy values.

On the Existence of a Complexity in Fixed Budget Bandit Identification, 23

In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm, 27

A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received increased attention in recent years. For fixed-confidence best arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. We derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm holding for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. They are satisfied by the UCB algorithm and our proposed UCB-based Top Two algorithm enjoys simultaneously non-asymptotic guarantees and competitive empirical performance.

Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits, 18

We consider the problem of regret minimization in non-parametric stochastic bandits. When the rewards are known to be bounded from above, there exists asymptotically optimal algorithms, with asymptotic regret depending on an infimum of Kullback-Leibler divergences (KL). These algorithms are computationally expensive and require storing all past rewards, thus simpler but non-optimal algorithms are often used instead. We introduce several methods to approximate the infimum KL which reduce drastically the computational and memory costs of existing optimal algorithms, while keeping their regret guaranties. We apply our findings to design new variants of the MED and IMED algorithms, and demonstrate their interest with extensive numerical simulations.

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption, 40

We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed

Risk-aware linear bandits with convex loss, 31

In decision-making problems such as the multi-armed bandit, an agent learns sequentially by optimizing a certain feedback. While the mean reward criterion has been extensively studied, other measures that reflect an aversion to adverse outcomes, such as mean-variance or conditional value-at-risk (CVaR), can be of interest for critical applications (healthcare, agriculture). Algorithms have been proposed for such risk-aware measures under bandit feedback without contextual information. In this work, we study contextual bandits where such risk measures can be elicited as linear functions of the contexts through the minimization of a convex loss. A typical example that fits within this framework is the expectile measure, which is obtained as the solution of an asymmetric least-square problem. Using the method of mixtures for supermartingales, we derive confidence sequences for the estimation of such risk measures. We then propose an optimistic UCB algorithm to learn optimal risk-aware actions, with regret guarantees similar to those of generalized linear bandits. This approach requires solving a convex problem at each round of the algorithm, which we can relax by allowing only approximated solution obtained by online gradient descent, at the cost of slightly higher regret. We conclude by evaluating the resulting algorithms on numerical experiments.

On the Complexity of Differentially Private Best-Arm Identification with Fixed Confidence, 17

Best Arm Identification (BAI) problems are progressively used for data-sensitive applications, such as designing adaptive clinical trials, tuning hyper-parameters, and conducting user studies to name a few. Motivated by the data privacy concerns invoked by these applications, we study the problem of BAI with fixed confidence under

Interactive and Concentrated Differential Privacy for Bandits, 16

Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure

Dealing with Unknown Variances in Best-Arm Identification, 26

The problem of identifying the best arm among a collection of items having Gaussian rewards distribution is well understood when the variances are known. Despite its practical relevance for many applications, few works studied it for unknown variances. In this paper we introduce and analyze two approaches to deal with unknown variances, either by plugging in the empirical variance or by adapting the transportation costs. In order to calibrate our two stopping rules, we derive new time-uniform concentration inequalities, which are of independent interest. Then, we illustrate the theoretical and empirical performances of our two sampling rule wrappers on Track-and-Stop and on a Top Two algorithm. Moreover, by quantifying the impact on the sample complexity of not knowing the variances, we reveal that it is rather small.

Soft Action Priors: Towards Robust Policy Transfer, 20

Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework (Levine 2018)-that is, a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior-to recover state-of-the-art policy distillation techniques. Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses. In contrast to prior work, we develop algorithms for leveraging suboptimal action priors that may nevertheless impart valuable knowledge-which we call soft action priors. The proposed algorithms adapt by adjusting the strength of teacher feedback according to an estimate of the teacher's usefulness in each state. We perform tabular experiments, which show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous action deep RL problems, in which adaptive algorithms considerably improved stability when compared to existing policy distillation methods.

Pure Exploration in Bandits with Linear Constraints, 19

We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when the arms are subject to linear constraints. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.

Online Instrumental Variable Regression: Regret Analysis and Bandit Feedback, 42

The independence of noise and covariates is a standard assumption in online linear regression with unbounded noise and linear bandit literature. This assumption and the following analysis are invalid in the case of endogeneity, i.e., when the noise and covariates are correlated. In this paper, we study the online setting of Instrumental Variable (IV) regression, which is widely used in economics to identify the underlying model from an endogenous dataset. Specifically, we upper bound the identification and oracle regrets of the popular Two-Stage Least Squares (2SLS) approach to IV regression but in the online setting. Our analysis shows that Online 2SLS (O2SLS) achieves

Farm-gym: A modular reinforcement learning platform for stochastic agronomic games, 36

We introduce Farm-gym, an open-source farming environment written in Python, that models sequential decisionmaking in farms using Reinforcement Learning (RL). Farm-gym conceptualizes a farm as a dynamical system with many interacting entities. Leveraging a modular design, it enables us to instantiate from very simple to highly complicated environments. Contrasting many available gym environments, Farm-gym features intrinsically stochastic games, using stochastic growth models and weather data. Further, it enables to create farm games in a modular way, activating or not the entities (e.g. weeds, pests, pollinators), and yielding non-trivial coupled dynamics. Finally, every game can be customized with .yaml files for rewards, feasible actions, and initial/end-game conditions. We illustrate some interesting features on simple farms. We also showcase the challenges posed by Farm-gym to the deep RL algorithms, in order to stimulate studies in the RL community.

Learning crop management by reinforcement: gym-DSSAT, 35

We introduce gym-DSSAT, a gym environment for crop management tasks, that is easy to use for training Reinforcement Learning (RL) agents. gym-DSSAT is based on DSSAT, a state-of-the-art mechanistic crop growth simulator. We modify DSSAT so that an external software agent can interact with it to control the actions performed in a crop field during a growing season. The RL environment provides predefined decision problems without having to manipulate the complex crop simulator. We report encouraging preliminary results on a use case of nitrogen fertilization for maize. This work opens up opportunities to explore new sustainable crop management strategies with RL, and provides RL researchers with an original set of challenging tasks to investigate.

Adaptive Algorithms for Relaxed Pareto Set Identification, 29

In this paper we revisit the fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model. As the sample complexity to identify the exact Pareto set can be very large, a relaxation allowing to output some additional near-optimal arms has been studied. In this work we also tackle alternative relaxations that allow instead to identify a relevant subset of the Pareto set. Notably, we propose a single sampling strategy, called Adaptive Pareto Exploration, that can be used in conjunction with different stopping rules to take into account different relaxations of the Pareto Set Identification problem. We analyze the sample complexity of these different combinations, quantifying in particular the reduction in sample complexity that occurs when one seeks to identify at most k Pareto optimal arms. We showcase the good practical performance of Adaptive Pareto Exploration on a real-world scenario, in which we adaptively explore several vaccination strategies against Covid-19 in order to find the optimal ones when multiple immunogenicity criteria are taken into account.

Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer, 43

In this paper, we study the problem of transferring the available Markov Decision Process (MDP) models to learn and plan efficiently in an unknown but similar MDP. We refer to it as Model Transfer Reinforcement Learning (MTRL) problem. First, we formulate MTRL for discrete MDPs and Linear Quadratic Regulators (LQRs) with continuous state actions. Then, we propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings. In the first stage, MLEMTRL uses a constrained Maximum Likelihood Estimation (MLE)-based approach to estimate the target MDP model using a set of known MDP models. In the second stage, using the estimated target MDP model, MLEMTRL deploys a model-based planning algorithm appropriate for the MDP class. Theoretically, we prove worst-case regret bounds for MLEMTRL both in realisable and non-realisable settings. We empirically demonstrate that MLEMTRL allows faster learning in new MDPs than learning from scratch and achieves near-optimal performance depending on the similarity of the available MDPs and the target MDP.

AdaStop: sequential testing for efficient and reliable comparisons of Deep RL Agents, 45

The reproducibility of many experimental results in Deep Reinforcement Learning (RL) is under question. To solve this reproducibility crisis, we propose a theoretically sound methodology to compare multiple Deep RL algorithms. The performance of one execution of a Deep RL algorithm is random so that independent executions are needed to assess it precisely. When comparing several RL algorithms, a major question is how many executions must be made and how can we assure that the results of such a comparison is theoretically sound. Researchers in Deep RL often use less than 5 independent executions to compare algorithms: we claim that this is not enough in general. Moreover, when comparing several algorithms at once, the error of each comparison accumulates and must be taken into account with a multiple tests procedure to preserve low error guarantees. To address this problem in a statistically sound way, we introduce AdaStop, a new statistical test based on multiple group sequential tests. When comparing algorithms, AdaStop adapts the number of executions to stop as early as possible while ensuring that we have enough information to distinguish algorithms that perform better than the others in a statistical significant way. We prove both theoretically and empirically that AdaStop has a low probability of making an error (Family-Wise Error). Finally, we illustrate the effectiveness of AdaStop in multiple use-cases, including toy examples and difficult cases such as Mujoco environments.

Impact of Robotic Assistance on Complications in Bariatric Surgery at Expert Laparoscopic Surgery Centers: A Retrospective Comparative Study With Propensity Score, 12

Objective: To investigate the way robotic assistance affected rate of complications in bariatric surgery at expert robotic and laparoscopic surgery facilities. Background: While the benefits of robotic assistance were established at the beginning of surgical training, there is limited data on the robot's influence on experienced bariatric laparoscopic surgeons. Methods: We conducted a retrospective study using the BRO clinical database (2008–2022) collecting data of patients operated on in expert centers. We compared the serious complication rate (defined as a Clavien score

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study, 14

Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged

Elbow trauma in children: development and evaluation of radiological artificial intelligence models, 13

Rationale and Objectives: To develop a model using artificial intelligence (A.I.) able to detect post-traumatic injuries on pediatric elbow X-rays then to evaluate its performances in silico and its impact on radiologists' interpretation in clinical practice. Material and Methods: A total of 1956 pediatric elbow radiographs performed following a trauma were retrospectively collected from 935 patients aged between 0 and 18 years. Deep convolutional neural networks were trained on these X-rays. The two best models were selected then evaluated on an external test set involving 120 patients, whose X-rays were performed on a different radiological equipment in another time period. Eight radiologists interpreted this external test set without then with the help of the A.I. models. Results: Two models stood out: model 1 had an accuracy of 95.8% and an AUROC of 0.983 and model 2 had an accuracy of 90.5% and an AUROC of 0.975. On the external test set, model 1 kept a good accuracy of 82.5% and AUROC of 0.916 while model 2 had a loss of accuracy down to 69.2% and of AUROC to 0.793. Model 1 significantly improved radiologist's sensitivity (0.82 to 0.88, P = 0.016) and accuracy (0.86 to 0.88, P = 0,047) while model 2 significantly decreased specificity of readers (0.86 to 0.83, P = 0.031). Conclusion: End-to-end development of a deep learning model to assess post-traumatic injuries on elbow Xray in children was feasible and showed that models with close metrics in silico can unpredictably lead radiologists to either improve or lower their performances in clinical settings.

Reinforcement-learning robotic sailboats: simulator and preliminary results, 33

This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing navigation algorithms based on RL to be applied on real boats.

General System Architecture and COTS Prototyping of an AIoT-Enabled Sailboat for Autonomous Aquatic Ecosystem Monitoring, 11

Unmanned vehicles keep growing attention as they facilitate innovative commercial and civil applications within the Internet of Things (IoT) realm. In this context, autonomous sailing boats are becoming important marine platforms for performing different tasks, such as surveillance, water, and environmental monitoring. Most of these tasks heavily depend on artificial intelligence (AI) technologies, such as visual navigation and path planning, and comprise the so-called Artificial Intelligence of Things (AIoT). In this paper, we propose (i) the OpenBoat, an automating system architecture for AIoTenabled sailboats with application-agnostic autonomous environment monitoring capability, and (ii) the F-Boat, a fully functional prototype of OpenBoat built with Commercial Off-The-Shelf (COTS) components on a real sailboat. F-Boat includes low-level control strategies for autonomous path following, communication infrastructure for remote operation and cooperation with other systems, edge computing with AI accelerator, modular support for application-specific monitoring systems, and navigation aspects. F-Boat is also designed and built for robustness situations to guarantee its operation under extreme events, such as high temperatures and bad weather, through extended periods of time. We show the results of field experiments running in Guanabara Bay, an important aquatic ecosystem in Brazil, that demonstrate the functionalities of the prototype and demonstrate the AIoT capability of the proposed architecture.

A Formalization of Doob's Martingale Convergence Theorems in mathlib, 34

We present the formalization of Doob's martingale convergence theorems in the mathlib library for the Lean theorem prover. These theorems give conditions under which (sub)martingales converge, almost everywhere or in

Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning, 44

Interpretability of AI models allows for user safety checks to build trust in these models. In particular, decision trees (DTs) provide a global view on the learned model and clearly outlines the role of the features that are critical to classify a given data. However, interpretability is hindered if the DT is too large. To learn compact trees, a Reinforcement Learning (RL) framework has been recently proposed to explore the space of DTs. A given supervised classification task is modeled as a Markov decision problem (MDP) and then augmented with additional actions that gather information about the features, equivalent to building a DT. By appropriately penalizing these actions, the RL agent learns to optimally trade-off size and performance of a DT. However, to do so, this RL agent has to solve a partially observable MDP. The main contribution of this paper is to prove that it is sufficient to solve a fully observable problem to learn a DT optimizing the interpretability-performance trade-off. As such any planning or RL algorithm can be used. We demonstrate the effectiveness of this approach on a set of classical supervised classification datasets and compare our approach with other interpretability-performance optimizing methods.

“How Biased are Your Features?": Computing Fairness Influence Functions with Global Sensitivity Analysis, 24

Fairness in machine learning has attained significant focus due to the widespread application in high-stake decision-making tasks. Unregulated machine learning classifiers can exhibit bias towards certain demographic groups in data, thus the quantification and mitigation of classifier bias is a central concern in fairness in machine learning. In this paper, we aim to quantify the influence of different features in a dataset on the bias of a classifier. To do this, we introduce the Fairness Influence Function (FIF). This function breaks down bias into its components among individual features and the intersection of multiple features. The key idea is to represent existing group fairness metrics as the difference of the scaled conditional variances in the classifier’s prediction and apply a decomposition of variance according to global sensitivity analysis. To estimate FIFs, we instantiate an algorithm that applies variance decomposition of classifier’s prediction following local regression. Experiments demonstrate that captures FIFs of individual feature and intersectional features, provides a better approximation of bias based on FIFs, demonstrates higher correlation of FIFs with fairness interventions, and detects changes in bias due to fairness affirmative/punitive actions in the classifier. The code is available online.

From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning, 22

We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. We show that this new perspective recovers popular private gradient-based methods like DP-SGD and provides a principled way to design and analyze new private optimization algorithms in a flexible manner. Focusing on the widely-used Alternating Directions Method of Multipliers (ADMM) method, we use our general framework to derive novel private ADMM algorithms for centralized, federated and fully decentralized learning. For these three algorithms, we establish strong privacy guarantees leveraging privacy amplification by iteration and by subsampling. Finally, we provide utility guarantees using a unified analysis that exploits a recent linear convergence result for noisy fixed-point iterations.

Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data, 28

We study design of black-box model extraction attacks that can send minimal number of queries from a publicly available dataset to a target ML model through a predictive API with an aim to create an informative and distributionally equivalent replica of the target. First, we define distributionally equivalent and Max-Information model extraction attacks, and reduce them into a variational optimisation problem. The attacker sequentially solves this optimisation problem to select the most informative queries that simultaneously maximise the entropy and reduce the mismatch between the target and the stolen models. This leads to an active sampling-based query selection algorithm, Marich, which is model-oblivious. Then, we evaluate Marich on different text and image data sets, and different models, including CNNs and BERT. Marich extracts models that achieve

International collaborators:

Scool is involved in 4 ANR projects:

Scool is involved in 2 PEPR:

Agroecology necessarily involves crop diversification, but also the early detection of diseases, deficiencies and stresses (hydric, etc.), as well as better management of biodiversity. The main stumbling block is that this paradigm shift in agricultural practices requires expert skills in botany, plant pathology and ecology that are not generally available to those working in the field, such as farmers or agri-food technicians. Digital technologies, and artificial intelligence in particular, can play a crucial role in overcoming this barrier to access to knowledge.

The aim of the Pl@ntAgroEco project will be to design, experiment with and develop new high-impact agro-ecology services within the Pl@ntNet platform. This includes :

Ce programme de travail a pour but de produire une amélioration de la détection et reconnaissance des maladies végétales, de l’identification des niveaux infraspécifiques. Il permettra le développement d’outils d’estimation de la sévérité des symptômes, carences, stades de déclin et stress hydrique ou de caractérisation des associations d’espèces à partir d’images multi-spécimens. Il améliorera la connaissance des espèces.

Le projet Pl@ntAgroEco rassemble des forces complémentaires en matière de recherche, de développement et d'animation. S’ajouteront à l'équipe pluridisciplinaire chargée de la plateforme Pl@ntNet de nouvelles forces de recherche ayant une expertise reconnue dans les sciences participatives. Le consortium rassemblera 10 partenaires incluant des organismes de recherche, des universités, des acteurs de la société civile et des partenaires internationaux

Scool has been involved in the Challenge HY_AIAI. In this challenge, we collaborated with L. Gallaraga, CR Inria Rennes, about the combination of statistical and symbolic approaches in machine learning.

Scool is involved in the Regalia pilot-project.

Other collaborations:

“Bandits for Health” project (B4H) funded by I-Site Lille. 2020–2023. PI: Ph. Preux.

Collaboration between Scool and Prof. F. Pattou Inserm Unit 1190/CHU de Lille. We investigate how the exploitation of data may be used to improve patient follow-up.

O.-A. Maillard and Ph. Preux are supported by an AI chair. 3/5 of this chair is funded by the Metropole Européenne de Lille, the other 2/5 by the Université de Lille and Inria, through the AI Ph.D. ANR program. 2020–2024.

This chair is dedicated to the advancement of research on reinforcement learning.

Ph. Preux is:

Ph. Preux is the scientific head of the CornelIA CPER project.

D. Basu was a member of the Ph.D. defense committee for:

E. Kaufmann was a member of the Ph.D. defense committees for:

E. Kaufmann was a reviewer of the HDR defense of Raphaël Feraud (Orange Labs, Université Paris-Saclay) who defended in Feburary.

O.-A. Maillard was a member of the Ph.D. defense committees for:

Ph. Preux was a member of the Ph.D. defense committees for:

O.-A. Maillard was interviewed for an article for Inria Demain, un compagnon d’aide à la décision pour l’agriculture ?

E. Kaufmann did 4 one hour sessions of “CHICHE: Un Scientifique, Une Classe” in the Lycée Kernanec, Marcq-en-Barœul.

Ph. Preux made a presentation on AI for a group of pupils at Collège M. Yourcenar, April, Marchiennes.

E. Kaufmann gave a presentation at the “Rencontre des Jeunes Mathématiciennes et Informaticiennes” (RJMI) organized at Inria. A. Tuynman (PhD student in Scool) organized a problem session at RJMI for a group a high school female students.

O.-A. Maillard gave a presentation at table ronde “Decision making in a uncertain environment” organized for all master students of graduate programs of Université de Lille, December.

Ph. Preux: