2024 Activity report
Project-Team HEKA
RNSR: 202124127N
- Research center Inria Paris Centre
- In partnership with: INSERM, Université Paris Cité
- Team name: Health data- and model- driven approaches for Knowledge Acquisition
- In collaboration with: CENTRE DE RECHERCHE DES CORDELIERS
- Domain: Digital Health, Biology and Earth
- Theme: Computational Neuroscience and Medicine
Keywords
Computer Science and Digital Science
- A3.3. Data and knowledge analysis
- A3.4. Machine learning and statistics
- A6.1. Methods in mathematical modeling
- A6.2. Scientific computing, Numerical Analysis & Optimization
- A9.1. Knowledge
- A9.2. Machine learning
- A9.4. Natural language processing
- A9.6. Decision support
Other Research Topics and Application Domains
- B2.2. Physiology and diseases
- B2.3. Epidemiology
- B2.6. Biological and medical imaging
1 Team members, visitors, external collaborators
Research Scientists
- Sarah Zohar [Team leader, INSERM, Senior Researcher, HDR]
- Armelle Arnoux [APHP]
- Adrien Coulet [INRIA, Associate Professor Detachment, HDR]
- Jean Feydy [INRIA, Researcher]
- Agathe Guilloux [INRIA, Professor Detachment, HDR]
- Claire Leconte Rives-Lange [INRIA, Advanced Research Position, until Sep 2024]
- Moreno Ursino [INSERM, Researcher, HDR]
Faculty Members
- Stephanie Allassonnière [UNIV PARIS CITE, Professor, HDR]
- François Angoulvant [UNIV PARIS CITE, Professor, Associate Member Detachment Univ de Lausanne, HDR]
- Tom Boeken [UNIV PARIS CITE - APHP, Associate Professor, PhD Student until June, and Faculty from Sep 2024]
- Anita Burgun [UNIV PARIS CITE - APHP, Professor, until Dec 2024, HDR]
- David Drummond [UNIV PARIS CITE - APHP, Associate Professor]
- Anne-Sophie Jannot [UNIV PARIS CITE - APHP, Associate Professor, HDR]
- Sandrine Katsahian [UNIV PARIS CITE - APHP, Professor, HDR]
- Andrea Lazzati [APHP, Associate Professor, HDR]
- Bastien Rance [UNIV PARIS CITE - APHP, Associate Professor, until Dec 2024, HDR]
- Brigitte Sabatier [APHP, Professor, HDR]
- Rosy Tsopra [UNIV PARIS CITE - APHP, Associate Professor, from Sept, and until Dec 2024]
Post-Doctoral Fellows
- Charbella Abou Khalil [INRIA, Post-Doctoral Fellow, from Oct 2024]
- Nadim Ballout [INRIA, Post-Doctoral Fellow]
- Sarah Berdot [UNIV PARIS CITE - APHP, Post-Doctoral Fellow]
- Sandrine Boulet [INSERM, Post-Doctoral Fellow]
- Perrine Chassat [INRIA, Post-Doctoral Fellow, from Oct 2024]
- Lucas Ducrot [INRIA, Post-Doctoral Fellow, from Jun 2024]
- Fleur Gaudfernau [UNIV PARIS - CITE, Post-Doctoral Fellow, until Jun 2024]
- Jong Ho Jhee [INRIA, Post-Doctoral Fellow]
- Letao Li [INSERM, Post-Doctoral Fellow, from Nov 2024]
- Robin Magnet [INRIA, Post-Doctoral Fellow, from Oct 2024]
- Germain Perrin [UNIV PARIS CITE - APHP, Post-Doctoral Fellow]
PhD Students
- Rachid Abbas [Roche Holding, until Jun 2024]
- Eya Abid [SORBONNE UNIVERSITE]
- Safa Alsaidi [INRIA]
- Jean-Baptiste Baitairian [SANOFI, CIFRE]
- Hadrien Bigo-Balland [MyFit Solutions, CIFRE, from Feb 2024]
- Théau Blanchard [GE HEALTHCARE, CIFRE, from Jul 2024]
- Ilona Blanchard [INRIA, from Oct 2024]
- Linus Bleistein [UNIV EVRY]
- Pierre Clavier [LIX, until Oct 2024]
- Lea Comin [CEA]
- Aziliz Cottin [DASSAULT SYSTEMES, CIFRE, until Mar 2024]
- Charles De Ponthaud [UNIV PARIS CITE - APHP]
- Pierre Epron [INRIA, from Nov 2024]
- Thibaut Fabacher [UNIV STRASBOURG]
- Corentin Faujour [Cemka, CIFRE]
- Fabrice Gambaraza [APHP]
- Louis Goldenberg [DASSAULT SYSTEMES, CIFRE, from Oct 2024]
- Guillaume Houry [INRIA, from Sep 2024]
- Romain Jaquet [GHNE, APHP]
- Emilien Jemelen [EPICONCEPT, CIFRE]
- Enora Laas-Faron [INSTITUT CURIE, until Oct 2024]
- Renee Le Clech [INRIA, from Sep 2024]
- Emma Le Priol [KAP CODE, CIFRE]
- Ivan Lerner [UNIV PARIS CITE - APHP]
- Tristan Margate [DASSAULT SYSTEMES, from Sep 2024]
- Benjamin Maurel [INSERM, from Sep 2024]
- Fabien Maury [INSERM]
- Yifan Mei [UNIV PARIS CITE, from Nov 2024]
- Giulia Monchietto [INSERM, from Dec 2024]
- Juliette Murris [PIERRE FABRE, CIFRE, until Oct 2024]
- Lillian Muyama [INRIA]
- Van Tuan Nguyen [Califrais, CIFRE, until Sep 2024]
- Valentin Pohyer [APHP]
- Asok Rajkumar [APHP]
- Alice Rogier [Inserm & INRIA, until Sep 2024]
- Louis Romengas [APHP, from Dec 2024]
- Agathe Senellart [UNIV PARIS CITE]
- Guillaume Serieys [UNIV PARIS CITE]
- Woosub Shin [UNIV PARIS CITE, from Dec 2024]
- Tarini Singh [INSERM, from Nov 2024]
- Stylianos Tzedakis [APHP]
- Dylan Vellas [UNIV PARIS CITE, from Nov 2024]
- Louis Vincent [IMPLICITY, CIFRE, until Nov 2024]
- Axel Vuorinen [INSERM]
Technical Staff
- Vincent Damotte [INSERM, from Jun 2024]
- Diana Mandache [UNIV PARIS CITE]
- Van Tuan Nguyen [INRIA, Engineer, from Oct 2024, PhD student until Sept 2024]
- Antoine Poirot-Bourdain [CNRS, Engineer]
- Louis Pujol [UNIV PARIS CITE, Engineer, until Jun 2024]
- Caglayan Tuna [INRIA, Engineer, from Nov 2024]
- Ghislain Vaillant [INRIA, Engineer]
Interns and Apprentices
- Nour Al Ghazel [Doshas Consulting, Intern, from Apr 2024 until Sep 2024]
- Ilan Bacry [INRIA, Intern, from Jun 2024 until Jul 2024]
- Syrine Belahcene [APHP]
- Felix Berthou [INRIA, Apprentice, from Sep 2024]
- Ilona Blanchard [INSERM, Intern, from Apr 2024 until Sep 2024]
- Elisa Castagnari [INRIA, Intern, from Mar 2024 until Jul 2024]
- Naomi Daval-Pommier [APHP, from Feb 2024 until Aug 2024]
- Clarie De Mouy [INRIA, Intern, from Mar 2024 until Sep 2024]
- Loris Dematini [UNIV PARIS - CITE, Intern, from Apr 2024 until Aug 2024]
- Archibald Fraikin [Let it Care, until Jan 2024]
- Thibault Hubert [INRIA, Intern, until Feb 2024]
- Alberto Megina Gonzalo [INRIA, Intern, from Mar 2024 until Aug 2024]
- Yves Schaefer [UNIV PARIS CITE, from Mar 2024 until Aug 2024]
- Meilame Tayebjee [UNIV PARIS - CITE, Intern, from Apr 2024 until Sep 2024]
- Marine Tognia Tonou [UNIV PARIS - CITE, Intern, from Mar 2024 until Aug 2024]
- Theo Vandenhole [Doshas Consulting, from Oct 2024]
Administrative Assistant
- Meriem Guemair [INRIA]
2 Overall objectives
2.1 Context
Clinicians routinely have to make decisions about the diagnosis and treatment of complex patients for whom clinical guidelines either do not exist or do not provide clear recommendations. This is particularly the case for very heterogeneous diseases (e.g., rare diseases or cancers, in which clinical manifestations or response to treatment frequently differ from one individual to another) or when patients present with a newly emerging disease for which no recommendation has been established yet, as was the case in March 2020 for the management of COVID-19 infections. Similar situations, leading to delicate decisions, have happened in the past but, unfortunately, this experience is rarely taken into account or rationalized for clinicians. Indeed, data related to these past experiences are captured but, until now, they have not been accessible to clinicians or transformed into high-level evidence. This level of evidence is currently only reached by highly controlled analyses (such as controlled clinical trials), in which patients may differ strongly from those treated in routine care, or may never have been seen before in the case of new conditions. Meanwhile, hospital information systems are used at every step of patient care and continuously collect longitudinal data, both structured and unstructured, including clinical reports, drug prescriptions, physiology, laboratory results, imaging and omics data. These data may even be enriched with data from medical wearable devices or with large-scale claims data, such as those of the French Health Data Hub, which inform on the global clinical course of patients. Likewise, progress in artificial intelligence approaches, such as supervised machine learning, has found applications in healthcare, enabling for instance clinical decision support systems, patient prioritization, drug repurposing or drug safety monitoring.
However, many particularities of health data (e.g., their sensitivity, noisiness, incompleteness, heterogeneity and small volume when one very specific feature or several time points are required) and of health data analysis (e.g., the risk of confounding, the need for explanations and fairness) make it challenging to develop tools that are both reliable and usable within hospital workflows and patient flows. For instance, precision medicine requires stratifying smaller and smaller groups of patients, which may seem at odds with the general strategy of deep learning, which requires large amounts of data to be effective. Another challenging task is the development of tools that enable gaining knowledge from data in an agile way, i.e., updating the knowledge gained continuously without compromising on reliability. In summary, methodological developments are required at each step of the health data chain, including: (1) data access, (2) data transformation, e.g., via representation learning, (3) data analysis, predictive modelling and knowledge discovery with data- and model-driven approaches, and (4) agile, fast and reliable access to data and implementation of these approaches through applications such as decision support systems, medical devices, and next generation clinical trials for the assessment of medical knowledge.
2.2 General aim
The main objective of HeKA is to develop methodologies, tools and their applications in clinics towards a learning health system, i.e., a health system that leverages clinical data collected to extract agilely and reliably novel medical knowledge that, in turn, continuously improves healthcare. Indeed, the availability of EHRs (Electronic Health Records), cohorts and other linked data such as the national Health Data Hub, offers the opportunity to develop models for stratification and prediction with the potential of improving the precision and the personalization of treatments, and thus the quality of healthcare.
3 Research program
The HeKA project-team follows three interdependent axes: (1) knowledge extraction from clinical data, (2) stochastic and data-driven predictive modelling of health data, and (3) data-driven and designs for next generation clinical trials. These axes can be viewed and interpreted from a patient care point of view as (1) from patient data to patient representations, (2) from patient data to prediction and decision, and (3) from models to improve patient-related knowledge, respectively. All these axes participate in the development of a learning health system. Axes 1 and 2 relate to observational studies (either retrospective or prospective) and Axis 3 relates to interventional studies. As a reminder, in observational studies the investigator does not act upon study participants, but only observes relationships between factors and outcomes, while in interventional studies (i.e., clinical trials) the investigator intervenes as part of the study design.
3.1 Axis 1 - Knowledge extraction from clinical data
Clinical decision support and statistical predictive models have historically been developed by manually selecting and tuning sets of predictive variables. This is a task-dependent and time-consuming operation that neglects most of the available data. Real-world data, such as EHRs or cohorts, offer access to many variables, even those not initially thought of as predictive. For instance, EHRs consist of structured data, such as demographics, diagnoses, procedures, biological laboratory results, and medication exposures, which can be associated with unstructured data, such as clinical notes, discharge summaries, pathology and imaging reports. In addition, this core EHR data may be complemented with other sources, including images, omics data, patient-reported outcomes, or conversation transcriptions. However, the use of EHR data for any precision medicine application first poses a significant information extraction challenge because of their heterogeneity, incompleteness, and dynamic nature. The aim of this axis is to develop methods and tools for leveraging patients’ data in their wide variety and complexity. This encompasses the extraction and transformation of raw data into engineered, featured and learned representations of good quality that will enable or facilitate the development of further clinical decision support and knowledge discovery approaches, such as those presented in Axes 2 and 3.
3.1.1 Methods
Methods developed in Axis 1 can be associated with three types of tasks: (i) deep phenotyping, (ii) patient representation and (iii) reasoning with clinical knowledge. (i) Deep phenotyping consists in defining algorithms that make it possible to identify patients with a particular and potentially complex profile within large healthcare databases. It encompasses the development of natural language processing tools capable of extracting complex features and their context from clinical texts; it also includes the ability to consider the structured and unstructured data of these databases simultaneously to identify relevant patients. To this aim, our methods rely on expert rules, distant supervision and deep learning language representations. (ii) Regarding patient representation, we focus on two distinct kinds of representations. The first one is an explicit representation of patients in the form of knowledge graphs, using Semantic Web standards and tools. The second one is a representation of patients within a latent space, using representation learning methods largely inspired by results obtained by deep learning models for learning language representations. (iii) Tasks concerned with reasoning on clinical knowledge are multiple. They encompass methods to measure patient similarity between elaborate patient representations (either in the form of knowledge graphs or embeddings), hybrid approaches for analogical reasoning, and logical and statistical inference.
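As a purely illustrative sketch of task (iii), and not the team's own method, patient similarity between latent representations can be measured with cosine similarity. The function names, the patient identifiers and the three-dimensional vectors below are hypothetical and chosen only for readability; real embeddings would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_u * norm_v)


def most_similar(query_id, embeddings, k=2):
    """Return the ids of the k patients closest to query_id, most similar first."""
    query = embeddings[query_id]
    scored = [
        (cosine_similarity(query, vector), patient_id)
        for patient_id, vector in embeddings.items()
        if patient_id != query_id
    ]
    scored.sort(reverse=True)
    return [patient_id for _, patient_id in scored[:k]]


# Hypothetical toy embeddings (assumption: any real model would produce these).
patient_embeddings = {
    "patient_a": [1.0, 0.0, 0.0],
    "patient_b": [0.9, 0.1, 0.0],
    "patient_c": [0.0, 1.0, 0.0],
    "patient_d": [0.5, 0.5, 0.0],
}

neighbours = most_similar("patient_a", patient_embeddings, k=2)
```

Cosine similarity is only one possible choice; similarity over knowledge graphs would require a different, graph-based measure.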
3.2 Axis 2 - Stochastic and data-driven predictive modelling of health trajectories
The recent availability of high dimensional health data enables the emergence of data-driven models to guide clinical decisions. In this high dimensional setting, prediction methods based on machine learning have been demonstrated to be efficient, but may not be the best option in every setting. We are interested in the borders between settings, where deep learning approaches fail but alternatives succeed, and vice versa. More specifically, we consider the borders found in temporal modelling and small-sample settings. Health data provided by EHRs have several specificities, among which: (i) patient care trajectories are high dimensional, and (ii) they are censored, i.e., data are observed only until a certain time point. Current models do not succeed in tackling these two concerns simultaneously. For example, patient trajectories include comorbidity pathways where one disease may impact the others. Embedding patient trajectories in prognostic models remains a challenge, especially in the low-sample-size, high-dimension setting. Machine learning in such a setting should be adapted and compared with efficient statistical learning models.
3.2.1 Methods
During this year, we focused on adapting machine learning based methods to the following challenges: signal detection using spatio-temporal data in order to build a geographical health vigilance system, medical image classification in the low-sample setting, and pharmacovigilance signal detection using large databases. We proposed a method to model and monitor population distributions over space and time, in order to build an alert system for spatio-temporal data changes, based on a new version of the Expectation-Maximization (EM) algorithm that better estimates the number of clusters and their parameters at the same time. We validated this approach on a real data set of patients diagnosed positive for coronavirus disease 2019. We showed that our pipeline correctly models evolving real data and detects epidemic changes. We proposed a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size setting using a geometry-based variational autoencoder. We validated this approach on a medical imaging classification task where a small number of 3D brain magnetic resonance images are considered and augmented using the proposed framework. We adapted a very classical pharmacoepidemiological method, the Weighted Cumulative Exposure statistical model, which makes it possible to model the temporal relationship between the prescription of a drug and a side effect, to the high dimensional setting by implementing it using Graphics Processing Unit (GPU) programming. We analysed several real-life datasets using this implementation and showed that it can now be scaled to the millions of patients of the National Health Insurance Database (SNDS).
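To make the notion of weighted cumulative exposure concrete, the quantity it summarises can be written as WCE(u) = sum over past days t of w(u - t) X(t), where X(t) is the dose taken on day t and w weights each dose by the time elapsed since it was taken. The sketch below is a minimal CPU illustration in plain Python, not the team's GPU implementation. It assumes a fixed, hypothetical exponential-decay weight function, whereas in the statistical model the weight function is estimated from the data; the function names and the half-life and window parameters are likewise assumptions.

```python
def weighted_cumulative_exposure(doses, weight, window):
    """Compute WCE(u) for each day u from a list of daily doses.

    WCE(u) = sum of weight(u - t) * doses[t] over the last `window` days,
    i.e. for t in [u - window + 1, u], truncated at the start of follow-up.
    """
    wce = []
    for u in range(len(doses)):
        start = max(0, u - window + 1)
        total = sum(weight(u - t) * doses[t] for t in range(start, u + 1))
        wce.append(total)
    return wce


def exponential_decay(half_life):
    """Hypothetical weight function: a dose counts half as much every half_life days."""
    def weight(lag):
        return 0.5 ** (lag / half_life)
    return weight


# Two days of treatment followed by three days without treatment.
doses = [1.0, 1.0, 0.0, 0.0, 0.0]
wce = weighted_cumulative_exposure(doses, exponential_decay(half_life=1.0), window=3)
# wce is [1.0, 1.5, 0.75, 0.25, 0.0]
```

The resulting series would then enter a survival model as a time-varying covariate; the double loop above is what a GPU implementation would parallelise across patients and time points.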
3.3 Axis 3: Data-driven and designs for next generation clinical trials
New model-based fundamental research, conducted from preclinical to clinical stages, could play a pivotal role in screening patients and predicting their responses prior to clinical trials. This is made possible through predictive modelling of patients' responses using biomarkers or electronic health records (EHRs). This is especially important in rare diseases, or in paediatrics, where the patient pool is limited and the use of all acquired knowledge is vital. Moreover, new types of values/responses can nowadays be collected from patients. Indeed, Digital Medical Devices are health technologies that meet the definition of medical devices as outlined in Regulation EU 2017/745, which includes their use for “prevention, diagnosis, monitoring, treatment, or alleviation of disease.” Many of these devices are based on machine learning algorithms. Among them, Software as a Medical Device (SaMD) technologies now hold significant potential for translation into patient care. Consequently, their effectiveness and safety should be rigorously evaluated through clinical trials. However, in contrast to drugs, for which the chemical formula does not change over time, the performance of a predictive model constantly evolves as new observational data are incorporated. The challenge we now face is to be able to ensure, at any time, the safety and efficacy of SaMD and other advances for patient care. The objective of this axis is twofold: how can clinical trials help machine/statistical learning? And conversely, how can machine learning and other information sources help clinical trials? Accordingly, the first objective is to propose clinical trial methods and designs adapted to continuously learning and adapting tools. The second objective is to produce innovative clinical trial designs that draw on all possible patient-related knowledge: disease and translational models, EHRs, clinical trial data and synthetic patients.
3.3.1 Methods
This year we focused more on the second objective while starting to work on the first one. We developed (i) dose-finding methods for first-in-human trials in healthy volunteers using an activity endpoint as a surrogate for efficacy; (ii) a framework to include and select several preclinical sources when moving to clinical trials and selecting the dose range in humans; (iii) methods for analysing clinical trials disrupted by the COVID pandemic; (iv) deep learning models for analysing multistate EHR data.
4 Application domains
4.1 Multimodal approaches generalizable for several diseases
4.1.1 PEPR Digital Health
The PEPR ("Programmes et équipements prioritaires de recherche") Digital Health, started in September 2023, aims at gathering the national multidisciplinary community active in digital health for the development and exploitation of the concept of digital twin in health.
HeKA’s involvement in this PEPR is as follows:
(i) within project ShareFAIR, to learn protocols from clinical data collected during healthcare activity in Electronic Health Records (EHRs), in order to make explicit the medical decision processes (steps to reach a particular diagnosis or therapeutic choice) and the management of particular conditions (steps in the management of a particular condition). Protocols extracted from EHRs provide a view on real-world clinical practice and may then be compared with one another or with CPG (clinical practice guidelines), which can be seen as more theoretical protocols in that they provide recommendations, or clinical pathways (CP), to standardize clinical practice. This will be applied within NEUROVASC, on reducing the impact of intracranial aneurysm and stroke, in which we will extract the proposed clinical pathways;
(ii) within REWIND, to develop new mathematical and statistical approaches for the analysis of multimodal multiscale longitudinal data to predict patients' responses. These models will be designed, implemented as prototypes and then transferred to an easy-to-use, well-documented platform where people from diverse communities, in particular physicians, will be able to use them on their own data sets;
(iii) within DIGPHAT, to develop Bayesian modelling of meta-model pathways for the development of a digital pharmacological twin; this consists in analysing data from omics experiments, selecting relevant covariates, and combining meta-models in pathways to select the most reliable twin model;
(iv) within project M4DI, to develop a generic method for identifying subgroups of patients with the same phenotype from health databases, using variable correlations and expert data jointly, and to implement it within a software package;
(v) within SMATCH, to evaluate the outputs of the previous projects. All of them will propose models or Clinical Decision Support Systems (CDSS) to be translated to clinical practice; however, proof based on data only is not sufficient, and these tools should be evaluated in real life through prospective and interventional clinical trials or studies. In this project we will propose new methodological paradigms for the clinical evaluation of Digital Medical Devices (DMD), including CDSS and AI-based models and algorithms.
4.1.2 SurvivalGPU – “Using Graphics Processing Units (GPUs) to scale up survival analysis to nation-wide cohorts”
The recent availability of health insurance databases such as the SNDS opens the door to the detection of adverse drug reactions in the general population. The aim is to generalize the survival analyses usually carried out during clinical trials on cohorts of N=1k to 10k patients to the full French population. This line of research is appealing but poses real methodological challenges. Notably, it requires the development of statistical analysis models that meet the robustness and interpretability requirements of public health physicians while taking full advantage of recent hardware accelerators to scale up to millions of patients per cohort. In this context, our team has been working since 2022 on an efficient re-implementation, on Graphics Processing Units (GPUs), of the standard software tool in the field: the R package "survival". The new "survivalGPU" library leverages recent software tools (PyTorch, PyG, KeOps, reticulate) to bridge the gap between high-performance computing and traditional survival analysis. It now provides a complete re-implementation of the Cox proportional hazards model that is around 100 times faster on GPU than the survival package on CPU. Going further, it supports time-varying drug exposures via the Weighted Cumulative Exposure model and is accessible via an R interface which is fully backward-compatible with that of the survival package. We now intend to perform extensive validation and comparison with other models, prior to pharmaco-epidemiological studies on the SNDS data via the Health Data Hub platform.
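For readers unfamiliar with the Cox proportional hazards model, its core computation is the partial log-likelihood: for each observed event, the linear predictor of the patient concerned minus the logarithm of the summed risk scores of all patients still at risk at that time. The following minimal sketch in plain Python illustrates this quantity only; it does not use or reproduce the survivalGPU or survival interfaces, the function name and argument layout are assumptions, and tied event times are handled with the Breslow convention.

```python
import math


def cox_partial_log_likelihood(beta, times, events, covariates):
    """Cox partial log-likelihood (Breslow convention for tied event times).

    beta:       list of regression coefficients
    times:      follow-up time of each patient
    events:     1 if the event was observed, 0 if the patient was censored
    covariates: one list of covariate values per patient
    """
    # Linear predictor x_i . beta for every patient.
    linear = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients only contribute through risk sets
        # Risk set: patients whose follow-up time is at least t_i.
        risk = sum(math.exp(linear[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += linear[i] - math.log(risk)
    return loglik


# Toy cohort: three patients, one binary covariate, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
covariates = [[1.0], [0.0], [0.0]]
null_loglik = cox_partial_log_likelihood([0.0], times, events, covariates)
```

This naive version costs a number of operations quadratic in the cohort size; sorting patients by time and using cumulative sums removes that cost, and it is this kind of reduction that lends itself to parallel execution on accelerators.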
4.1.3 Messidore-Inserm BEEP - “Bayesian methods for Early Enriched Platform trials”
The recent pandemic has shown the need to speed up the clinical trial development of novel or repurposed therapies. Indeed, following the usual drug development paradigm, where clinical trial phases are performed sequentially and separately, the time required for the full process easily exceeds a decade. Our objective is to propose innovative Bayesian enriched “platform” designs for early phase trials, which are adapted to the clinical context and move towards precision medicine. Since we focus on the early phases of clinical trials, “platform” cannot be linked to the classical RCT in this setting. Thus, we aim at defining how “platform” trials should be translated into these early phases. As in the original definition, early platform phases will allow for flexibility, such as adding new arms or stopping treatments for futility (and/or safety in our case). The word enriched refers to the use of new information, or at least information not usually used in such early trials, such as positron emission tomography (PET) scans, pharmacokinetics/pharmacodynamics (PK/PD) modelling, and mathematical modelling of immune responses, and to the enrichment of the enrolled patients based on their biomarkers. The project is built around work packages (WPs). In WP1, we develop platform trials in phase 0/I, based on PET scans; microdosing on several (preclinical) animal species and humans will be adaptively compared, added or deleted to better characterize the extrapolation to humans. In WP2, we develop phase I/II dose-finding trials using PK/PD or mechanistic PD models. In WP3, enrichment designs for phase I/II, in survival settings, are proposed when selected biomarkers are available, and the design will be extended to the case of combination therapies.
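As a generic illustration of what stopping an arm for futility can mean in a Bayesian design, the sketch below uses a simple Beta-Binomial model on a binary response: the arm is dropped when the posterior probability that its response rate exceeds a target falls below a cut-off. This is a textbook rule and not the design developed in the project; the uniform Beta(1, 1) prior, the target rate of 0.3 and the cut-off of 0.1 are hypothetical values, and the closed-form CDF used here is valid only for integer Beta parameters.

```python
from math import comb


def beta_cdf_integer(x, a, b):
    """CDF of a Beta(a, b) distribution at x, for positive integers a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def prob_response_above(successes, failures, target, prior_a=1, prior_b=1):
    """Posterior probability that the response rate exceeds `target`."""
    a = prior_a + successes  # Beta posterior parameters after observing the data
    b = prior_b + failures
    return 1.0 - beta_cdf_integer(target, a, b)


def stop_for_futility(successes, failures, target=0.3, cutoff=0.1):
    """Hypothetical rule: stop the arm if P(rate > target | data) < cutoff."""
    return prob_response_above(successes, failures, target) < cutoff


# An arm with no responder among 10 patients is stopped,
# whereas an arm with 5 responders among 10 continues.
decision_weak_arm = stop_for_futility(successes=0, failures=10)
decision_good_arm = stop_for_futility(successes=5, failures=5)
```

In an actual platform trial such a rule would be calibrated by simulation and evaluated at pre-specified interim analyses, possibly with informative priors built from the enrichment sources mentioned above.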
4.1.4 ANR AT2TA - “Analogies: from Theory to Tools and Applications”
Analogical reasoning is a kind of reasoning that is based on finding a common relational system between two situations, exemplars, or domains. In computer science, analogical reasoning can be supported by two main axes of artificial intelligence: knowledge representation and reasoning, and machine learning. The AT2TA project particularly aims at studying the role that machine learning can play in analogical reasoning, and the HeKA team is in charge of exploring the application of their interplay in the healthcare domain. A PhD student, co-supervised with Inria Paris, IHU Imagine and Université de Lorraine, is learning representations of patients from clinical texts, and studying how these representations can first compose analogical propositions, and second serve as building blocks for a machine learning architecture for analogical reasoning.
4.1.5 HDH BOAS ADHERENCE - “Phenotyping algorithms for the assessment of therapeutic adherence and its clinical consequences in the management of chronic diseases”
Therapeutic adherence is a complex and multifactorial phenomenon that qualifies the degree to which the patient conforms to medical prescriptions. Poor adherence has both detrimental clinical consequences for the patient (in terms of prognosis and quality of life) and negative economic consequences for society. One difficulty in the study of adherence is that its measurement is mainly indirect and tainted with many biases. Another difficulty is that adherence is not an end in itself, but should be considered through the lens of patient outcomes. The ADHERENCE project, funded under the Health Data Hub BOAS program (Bibliothèque Ouverte d’Algorithmes en Santé), aims at proposing clearly documented measurements of the level of patient adherence and of its potential clinical outcomes in the context of three chronic conditions (cancer, hypertension and transplantation). One challenge here is to enable these measurements in both EHRs from hospitals and claims from the national insurance database (SNDS).
4.1.6 iDEMO Meditwin - Dassault Systems - "Virtual twin for personalised medicine" (started in 2024)
Meditwin is a collaborative project funded by BPI ("Banque Publique d'Investissement") with Dassault Systems (leader of the project), Inria, IHU institutes across France and Medtech startups. The aim of the project is to provide a digital platform relying on virtual twins of individuals that faithfully reproduce their state of health and make it possible to test different therapeutic options. It will promote interdisciplinarity by facilitating interoperability of multimodal medical data. Our team will use AI approaches to propose Clinical Decision Support Systems (CDSS) in cardiovascular diseases and cancer. We will also develop the clinical trial methodology for evaluating these CDSS. In particular, HeKA will develop stratification and classification algorithms, synthetic patient generators, statistical and mathematical models for multi-modal and multidimensional health data, and clinical evaluation methods for the resulting CDSS as Digital Medical Devices.
4.1.7 RHU ReBone - “Surgery planning for multiple fractures”
The RHU ReBone is a French consortium led by the orthopedic surgery unit of the Nice hospital. It is funded by the ANR from 2024 to 2029, and aims at producing robust anatomical software to automate the planning of complex fracture reductions. Jean Feydy and Stéphanie Allassonnière work on the image pre-processing and analysis, in close collaboration with Hervé Delingette from the Epione team at Inria Sophia.
4.2 Cancer
4.2.1 SIRIC InsiTu - “Insights into cancer: from inflammation to Tumor”
To turn scientific knowledge into sustainable healthcare, cancer research must identify who is at risk of cancer, when and in whom a new cancer arises, and how best to treat it and gauge response. Aligned with Europe's Beating Cancer Plan, InsiTu takes on the three challenges of cancer prevention, interception, and treatment in digestive, lung and skin cancers and heme malignancies. Chronic inflammation is a key cancer niche fostering tumor initiation. Leveraging a transformative Tissue-Hub interfacing diagnostics and research, our program ‘From inflammation to clonal emergence and cancer’ will unite experts in chronic disease damage to monitor patients with chronic tissue inflammation and cancer predisposition, mirrored by animal modelling, to understand the critical transition from chronic tissue damage to cancer progression, opening opportunities for prevention and interception. Such longitudinal (and sometimes invasive) interactions between patients and healthcare practitioners can be improved by empowering patients, taking psychological, social and ethical dimensions into consideration. Our program ‘Imaging cancer and its environment’ will take a different approach to this challenge. Through synergetic interactions with mathematicians and physicists, it will provide novel frameworks for multiscale integration of molecular alterations, cellular processes, and tissue complexity. This effort will result in image-based, non-invasive ‘virtual biopsies’ as proxies of key biological processes underlying tumor heterogeneity and drug resistance. Along with novel biomarkers such as circulating extracellular vesicles, these virtual biopsies will gauge responses to new therapeutic approaches developed in our third program ‘From new targets to new trials’.
There, experts in leukemias and skin cancers will use cutting-edge in vivo functional screens and multi-omic interrogation of Tissue-Hub samples to identify new targetable vulnerabilities and develop next-generation cell-based immunotherapies. To speed up the transfer of these innovations into care, new adaptive clinical trial designs will be engineered.
4.2.2 Combo - Sanofi - "Evaluating drug combinations in oncology with Real-World Data and state-of-the-art knowledge"
Combo is a collaborative project with industry, national health data platforms and a cancer institute: Sanofi Pharma, the Health Data Hub, Centre Léon Bérard and Inria-Inserm-HeKA. The objective of the project is to identify promising families of drug combinations in oncology using multisource and multi-modal data modelling and prediction, including RWD (cancer patients’ care data from the CLCC cancer centre), public genomic databases, literature, clinical trial repositories and expert opinion. Once these combinations are identified, mechanistic models will be used to determine the dose regimen and to build dose-finding trial designs for the combinations to be evaluated through formal clinical trials. In this project we lead the following WPs: (1) AI-based analysis of the multimodal RWD and subgroup discovery for the identification of relevant combinations, (2) Bayesian multi-modal analysis accounting for RWD modelling as well as expert opinion, literature and the AI analysis of public clinical platforms from Sanofi, and (3) proposing candidates and designs for future phase I trials, specifying the dose, regimen and associated molecule for the combinations within the selected family, using preclinical PK/PD models.
4.2.3 RHU OPERANDI - “Optimisation and imProved Efficacy of targeted RAdioNuclide therapy in Digestive cancers by Imagomics"
Advanced-stage hepatocellular carcinoma (HCC) and gastroenteropancreatic neuroendocrine tumours (GEP-NET) are currently treated with targeted radionuclide therapy (TRT), a highly advanced method that consists of either intra-arterial injections of radioactive microspheres (transarterial radioembolisation - TARE) or targeted peptides radioactively labelled and administered systemically (Peptide Receptor Radionuclide Therapy - PRRT). While TRT is highly effective, patient stratification and early identification of responders are currently managed insufficiently due to the lack of pertinent imaging biomarkers, either non-invasive or invasive. Furthermore, therapy-induced DNA damage leads to tumour resistance, reducing TRT efficacy. We aim to overcome these limits through the OPERANDI project via innovative approaches in engineering, novel imaging biomarkers, and new concepts for DNA repair mechanisms, combined with a fundamental understanding of causal links. Our ambitions go beyond the current state of the art, embracing even new combinations of drugs and -emitters to enhance dose localization and efficiency. Methodologically, we will investigate whether current patient management using CT/PET/MRI allows response and survival to be predicted using cutting-edge imaging-based artificial intelligence (AI) approaches in combination with data augmentation techniques to reach statistical significance.
4.3 Rare diseases and pediatrics
4.3.1 EU INVENTS Horizon project - “Innovative designs, extrapolation, simulation methods and evidence-tools for rare diseases addressing regulatory needs" (started in 2024)
The evaluation of new medicines for rare diseases (RD), including rare paediatric diseases, is challenging for several reasons, among which are the small patient sample sizes, heterogeneity of patients and diseases, and heterogeneity in disease knowledge. Due to these difficulties, access to effective treatments and the number of treatment options are often limited in RDs. INVENTS aims to provide clinical trialists, researchers and regulators with a global framework encompassing methods, workflows and evidence assessment tools to be implemented in orphan and paediatric drug development. Our ambition is to significantly improve the evaluation of evidence and regulatory decision-making through the development and validation of: refined longitudinal model-based disease trajectories and treatment effects, improved extrapolation models, in silico trials (e.g., virtual patient cohorts), optimised model-based clinical trial designs and evidence synthesis methods. These will be evaluated through simulation studies and tested on extensive data from a range of use cases provided by our industrial partners Roche and Novartis and on Real World data (RWD) from RD registries. The INVENTS framework will improve the consistency and efficiency of the drug evaluation process for RD by augmenting clinical evidence without compromising its scientific integrity and by providing regulators with credibility assessment criteria. At the end of this 5-year project, European industry will be able to exploit novel and improved clinical trial designs, in silico trials and RWD analysis approaches supporting drug development in RD. The European Medicines Agency and European national regulators (including Health Technology Assessment bodies) will be supplied with a general framework allowing better informed decision making. Most importantly, RD patients will benefit from increased and faster access to efficacious and safe treatments.
4.3.2 EU MSCA Doctoral Networks Orgestra project (started 2024)
Organoid experimental models are in vitro 3D cell cultures which can be generated from embryonic stem cells, induced pluripotent stem cells or adult stem cells, and can replicate organs functionally and structurally. Their physiological resemblance to target organs and their ability to be cryopreserved make organoids a powerful tool for biomedical research and for advancing understanding of the mechanisms underlying certain disorders, including rare diseases. The ORGESTRA Joint Doctoral Network will propose innovative organoid technologies for two genetic disorders, i.e., cystic fibrosis and cystinosis. In this project we will supervise two PhD projects that will propose statistical developments for: (1) Linking in silico trials to organoid data and innovative trial design. These designs will incorporate biomarker-based findings, such as those from organoids, that reduce unnecessary exposure of patients (screening) or allow drugs to be screened more effectively for non-effectiveness before embarking on human trials. This will be done via a joint doctoral degree with the University of Utrecht. (2) An estimand framework involving Bayesian principles on organoid data for clinical trial outcomes and models. The estimand framework will be based on expert elicitation to understand which questions are most relevant in terms of clinical efficacy/toxicity, to select the proper outcomes, to identify the possible intercurrent events and to provide a robust statistical model whose parameters will be estimated in a Bayesian setting. This will be done via a joint doctoral degree with Katholieke Universiteit Leuven.
4.3.3 CIL LICO- Ciliopathies: group of disorders associated with genetic mutation leading to rare and severe genetic diseases - Projet RHU - Recherche Hospitalo-Universitaire- 2018-24
Ciliopathies are a large group of rare and severe genetic diseases caused by ciliary dysfunction associated with clinical and genetic heterogeneity, as well as a lack of knowledge on patients' natural history (i.e., the evolution of the disease). The aim of this RHU-3 project, funded through the Investissements d'avenir program, is to develop innovative diagnostic, prognostic and tailored therapeutic approaches for patients suffering from ciliopathies to prevent them from developing renal failure. Following this aim, we will develop a mechanistic stratification of ciliopathies, in order to group suspected and already diagnosed ciliopathies in a treatment-oriented classification. One goal is to engineer a ready-to-use bio-kit for assessing both diagnosis and prognosis of developing renal alteration in ciliopathy patients. Finally, personalized therapeutic approaches will be proposed for patients using predictive approaches. We will use methods developed in Axis 1 as well as methods proposed in Axis 2 to model patient trajectory. This project is in collaboration with Imagine Institute, AP-HP, the Medetia company and Ecole Polytechnique.
4.3.4 BNDMR- Banque Nationale des Maladies Rares
The French National Registry for Rare Diseases (BNDMR) is a national tool for epidemiology and public health purposes in the field of rare diseases. In line with the objectives defined by the 2nd and 3rd French National Plans for Rare Diseases, the BNDMR team develops a secure national information system which gathers anonymized clinical data of patients affected by rare diseases in its BNDMR data warehouse. As medical head of the BNDMR, AS Jannot has several research projects strongly connected with the HeKA team, including the CDE.AI and DROMOS projects. CDE.AI aims to create a set of natural language processing algorithms that will allow the semi-automatic completion of the rare disease minimal data set that is currently completed manually for all patients followed up in the rare disease expert centres. In this project, we will use methods developed in Axis 1 (collaboration with N Garcelon). The DROMOS project uses the National Data Bank for Rare Diseases by linking it to health insurance data. This matching will allow the description of the care of rare disease patients at the national level, including the characteristic care of the most frequent rare diseases. We will use methods developed in Axis 2 to model these longitudinal data.
4.4 Other diseases
4.4.1 Antibiotic resistance – FAIR project EU Horizon (ongoing)
The aim of the FAIR project is to evaluate flagellin aerosol therapy for stimulation of immunity as an alternative treatment against pneumonia with multidrug-resistant bacteria. In this project, we are developing a full model using pharmacometrics expertise as well as statistical designs for extrapolation purposes and the design of a dose-finding study in healthy volunteers. As written above, in this project, S Zohar co-leads, along with C Kloft (Freie Universitaet Berlin), the WP entitled “Development of a translational modelling and simulation platform for flagellin PK/PD”. The aim of the WP is to propose an optimal design for the first-in-man clinical trials, maximizing knowledge gained from in vitro experimentation, expert knowledge and pre-clinical experiments along the way. By incorporating mechanistic approaches earlier in the development process along with continuous learning modelling under Bayesian inference, we hope to increase the probability of success of the translation process to the clinical setting, thus optimizing the statistical design and sample size. This project relates to Axis 3.
4.4.2 Virtual reality (Ongoing)
Several projects led at the HEGP are currently ongoing to evaluate the analgesia provided by the use of Virtual Reality in different care settings (extracorporeal lithotripsy, after colorectal cancer surgery, and fiberoptic bronchoscopy in critical care). In these projects, which are loosely related to Axis 2, we will provide a methodological approach and use statistical methods to answer the clinical questions, working closely with all Coordinating Investigators (Prof. D Clausse, G Manceau and A Rastello).
5 Social and environmental responsibility
5.1 Impact of research results
Our methods and designs are applied in collaboration with medical research team members at HEGP and Necker hospitals (among others). During the last four years, and as a consequence of the COVID-19 pandemic, we developed and deployed tools for knowledge extraction from clinical text to the central clinical data warehouse of the AP-HP (Entrepôt de Données de Santé, AP-HP).
Among other things, these tools enable the recognition of named entities and of their context (e.g., negation, personal or family history, hypothesis). They have been widely reused for the development of clinical studies, which led to 13 international publications of clinical epidemiology related to COVID-19. They also serve as a basis for our involvement in the 4CE international consortium, which led to 15 publications. These tools have been refactored and their use is facilitated within the medkit library.
6 Highlights of the year
- Nominations by the IA Cluster Pr[AI]rie-PSAI: Stéphanie Allassonnière holds a chair, Jean Feydy and Guillaume Chassagnon hold tremplin chairs, and Anne-Sophie Jannot is a fellow.
- EU Horizon project funding obtained by Sarah Zohar, as a PI.
7 New software, platforms, open data
7.1 New software
7.1.1 medkit
-
Name:
a toolkit for a learning health system
-
Keywords:
Learning health system, Biomedical data, Decision support, Python, Information extraction, Natural language processing, Audio signal processing, Machine learning
-
Scientific Description:
medkit aims to facilitate information and knowledge extraction from data of various modalities by providing software modules for data preparation or analysis and facilitating their chaining. The development of new modules is motivated either by needs within the core of the library or by application projects. The initial projects of the team that motivated developments were related to knowledge extraction from healthcare data warehouses, particularly their textual content.
-
Functional Description:
This library aims at (1) facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features and (2) developing supervised models from these various modalities for decision support in healthcare.
-
Release Contributions:
## Fixed
- Use ISO 8601 timestamp for model checkpoint paths
- Fix test of iamsystem matcher on Python 3.12
- URL:
-
Contact:
Adrien Coulet
-
Participants:
Deycy Camila Arias Villamil, Olivier Birot, Kim-Tam Huynh, Antoine Neuraz, Ivan Lerner, Bastien Rance, Adrien Coulet, Ghislain Vaillant
7.1.2 Pythae
-
Keywords:
Generative Models, Benchmarking, Reproducibility
-
Functional Description:
This library implements some of the most common (Variational) Autoencoder models under a unified implementation. In particular, it makes it possible to perform benchmark experiments and comparisons by training the models with the same autoencoding neural network architecture. The "make your own autoencoder" feature allows you to train any of these models with your own data and your own Encoder and Decoder neural networks. It integrates experiment monitoring tools such as wandb, mlflow or comet-ml and allows model sharing and loading from the HuggingFace Hub in a few lines of code.
- URL:
-
Contact:
Clément Chadebec
7.1.3 Pyraug
-
Keywords:
Generative Models, Data augmentation
-
Functional Description:
This library provides a way to perform Data Augmentation using Variational Autoencoders in a reliable way even in challenging contexts such as high dimensional and low sample size data.
- URL:
-
Contact:
Clément Chadebec
7.1.4 MultiVae
-
Keywords:
Multimodality, Variational Autoencoder
-
Functional Description:
This library gathers some of the most common multi-modal Variational AutoEncoder (VAE) implementations in PyTorch as well as benchmarking tools (datasets, metrics...).
-
Contact:
Agathe Senellart
7.2 Open data
7.2.1 ChemoKG
We published ChemoKG-open, a knowledge graph in RDF (Resource Description Framework) of chemotherapy protocols. It encompasses 513 protocols and provides the necessary classes and relations to represent the various dimensions of protocols. The protocols themselves have been extracted from a local database of the Pharmacy Service of the European Georges Pompidou Hospital of the AP-HP, Paris. A description of ChemoKG-open and its implementation is available in 64. We implemented this knowledge graph by reusing, as much as possible, entities defined elsewhere and by respecting the FAIR principles of Open Science.
8 New results
The team generated many results over the past year; here are a few illustrations for each axis.
8.1 Axis 1
We developed methods and tools for the extraction of information from Electronic Health Records, the support of clinical decisions, and patient representations.
Neuraz A, Vaillant G, Arias C, Birot O, Huynh KT, Fabacher T, Rogier A, Garcelon N, Lerner I, Rance B, Coulet A. Facilitating phenotyping from clinical texts: the medkit library. Bioinformatics. 2024 Dec;40(12):btae681. doi: 10.1093/bioinformatics/btae681.
To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we already developed and invite the phenotyping community for their reuse and enrichment.
Participants: Antoine Neuraz, Ghislain Vaillant, Kim Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet.
Muyama L, Neuraz A, Coulet A. Deep Reinforcement Learning for Personalized Diagnostic Decision Pathways Using Electronic Health Records: A Comparative Study on Anemia and Systemic Lupus Erythematosus. (157)102994. 2024. doi: 10.1016/j.jbi.2024.104746.
Inspired by diagnosis guidelines, we formulate the task of diagnosis as a sequential decision-making problem and study the use of Deep Reinforcement Learning (DRL) algorithms to learn the optimal sequence of actions to perform in order to obtain a correct diagnosis from Electronic Health Records (EHRs), which we name a diagnostic decision pathway. We apply DRL to synthetic yet realistic EHRs and develop two clinical use cases: Anemia diagnosis, where the decision pathways follow a decision tree schema, and Systemic Lupus Erythematosus (SLE) diagnosis, which follows a weighted criteria score. We particularly evaluate the robustness of our approaches to noise and missing data, as these frequently occur in EHRs. In both use cases, even with imperfect data, our best DRL algorithms exhibit competitive performance compared to traditional classifiers, with the added advantage of progressively generating a pathway to the suggested diagnosis, which can both guide and explain the decision-making process.
Participants: Lillian Muyama, Antoine Neuraz, Adrien Coulet.
Jhee JH, Rogier A, Giraud D, Pinet E, Sabatier B, Rance B, Coulet A. Representation and comparison of chemotherapy protocols with ChemoKG and graph embeddings. In SWAT4HCLS 2024 - 15th International Semantic Web Applications and Tools for Health Care and Life Sciences Conference, Feb 2024, Leiden, Netherlands. CEUR-WS: Vol-3890/paper-3.pdf.
We proposed ChemoKG, a knowledge graph for chemotherapy protocols that encompasses, first, administration programs (drugs, dosages, treatment durations) and, second, drug properties and classes imported from ChEBI, DrugBank and the ATC classification. These three resources on drugs provide complementary hierarchies and chemical properties that help to better identify similar chemotherapy protocols. To this aim, we tested on ChemoKG a novel graph embedding method employing graph neural networks (GNNs) to compare nodes in the graph that represent protocols. Unlike previous approaches that focus on triple-based embeddings, the proposed method captures subgraph structures inherited from the aggregation scheme in GNNs. We demonstrated that this facilitates the comparison of chemotherapy protocols themselves and, by extension, of their potential effectiveness.
Participants: Jong Ho Jhee, Alice Rogier, Brigitte Sabatier, Bastien Rance, Adrien Coulet.
Gonsard, Apolline, Martin Genet, and David Drummond. "Digital twins for chronic lung diseases." European Respiratory Review 33.174 (2024). doi: 10.1183/16000617.0159-2024.
This year, a part of our research has focused on defining and conceptualizing patient digital twins, synthesizing existing literature, and proposing their application to respiratory medicine. Based on a systematic review of 80 claimed digital twins of patients, we proposed to define a patient digital twin as “a viewable digital replica of a patient, organ, or biological system that contains multidimensional, patient-specific information and informs decisions”, and distinguished the two types of patient digital twins currently available: monitoring twins, which integrate real-time health data for continuous feedback and risk prediction, and simulation twins, which use advanced modeling to predict disease progression and therapy outcomes 22. We then applied these concepts in respiratory medicine. Our proposed definitions and subtypes offer a framework to guide research into realizing the potential of these personalized, integrative technologies to advance clinical care.
Participants: David Drummond.
8.2 Axis 2
We adapted and developed innovative methods based on both machine and deep learning for longitudinal health data in high-dimensional settings in different contexts, including cluster detection using geographic data, image classification, cut-point detection of prognostic factors, and drug side effect detection.
Fraikin AF, Bennetot A, Allassonniere S. T-Rep: Representation Learning for Time Series using Time-Embeddings. In The Twelfth International Conference on Learning Representations (ICLR). Poster. 2024. online: 3y2TfP966N.
Multivariate time series present challenges to standard machine learning techniques, as they are often unlabeled, high dimensional, noisy, and contain missing data. To address this, we propose T-Rep, a self-supervised method to learn time series representations at a timestep granularity. T-Rep learns vector embeddings of time alongside its feature extractor, to extract temporal features such as trend, periodicity, or distribution shifts from the signal. These time-embeddings are leveraged in pretext tasks, to incorporate smooth and fine-grained temporal dependencies in the representations, as well as reinforce robustness to missing data. We evaluate T-Rep on downstream classification, forecasting, and anomaly detection tasks. It is compared to existing self-supervised algorithms for time series, which it outperforms in all three tasks. We test T-Rep in missing data regimes, where it proves more resilient than its counterparts. Finally, we provide latent space visualisation experiments, highlighting the interpretability of the learned representations.
Participants: Franklin Fraikin, Stéphanie Allassonnière.
Bleistein L, Fermanian A, Jannot AS, Guilloux A. "Learning the dynamics of sparsely observed interacting systems." International Conference on Machine Learning. PMLR 202:2603-2640, 2023. online: v202/bleistein23a.
We consider the task of learning individual-specific intensities of counting processes from a set of static variables and irregularly sampled time series. We introduce a novel modeling approach in which the intensity is the solution to a controlled differential equation. We first design a neural estimator by building on neural controlled differential equations. Second, we show that our model can be linearized in the signature space under sufficient regularity conditions, yielding a signature-based estimator which we call CoxSig. We provide theoretical learning guarantees for both estimators, before showcasing the performance of our models on a vast array of simulated and real-world datasets from finance, predictive maintenance and food supply chain management.
Participants: Linus Bleistein, Anne-Sophie Jannot, Agathe Guilloux.
Do MH, Feydy J, Mula O, "Sparse Wasserstein barycenters and application to reduced order modeling", Journal of Scientific Computing 102, 64 (2025). online: 10.1007/s10915-024-02766-0.
We develop a general theoretical and algorithmic framework for sparse approximation and structured prediction in spaces of probability measures with Wasserstein barycenters. The barycenters are sparse in the sense that they are computed from an available dictionary of measures, but the approximations only involve a reduced number of atoms. We show that the best reconstruction from the class of sparse barycenters is characterized by a notion of best n-term barycenter which we introduce, and which can be understood as a natural extension of the classical concept of best n-term approximation in Banach spaces. We show that the best n-term barycenter is the minimizer of a highly non-convex, bi-level optimization problem, and we develop algorithmic strategies for practical numerical computation. We next leverage this approximation tool to build interpolation strategies that involve a reduced computational cost, and that can be used for structured prediction, and metamodeling of parametrized families of measures. We illustrate the potential of the method through the specific problem of Model Order Reduction (MOR) of parametrized PDEs. Since our approach is sparse, adaptive and preserves mass by construction, it has potential to overcome known bottlenecks of classical linear methods in hyperbolic conservation laws transporting discontinuities. It also paves the way towards MOR for measure-valued PDE problems such as gradient flows.
Participants: Jean Feydy.
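To make the notion of a best n-term barycenter more concrete, here is a minimal sketch restricted to one-dimensional empirical measures with equal sample counts, where the Wasserstein-2 barycenter is obtained by averaging sorted samples (quantile averaging). The function names, the toy dictionary and the brute-force search over atom pairs and a weight grid are illustrative assumptions; this is not the algorithmic strategy developed in the paper, which targets a much harder non-convex bi-level problem.

```python
import itertools
import numpy as np

def barycenter_1d(atoms, weights):
    """W2 barycenter of 1D empirical measures with equal sample counts.

    In 1D, the barycenter's quantile function is the weighted average of the
    atoms' quantile functions, i.e. a weighted average of sorted samples.
    """
    sorted_atoms = np.stack([np.sort(a) for a in atoms])  # shape (k, n)
    return np.asarray(weights, dtype=float) @ sorted_atoms

def w2_1d(x, y):
    """Wasserstein-2 distance between two 1D samples of equal size."""
    return float(np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2)))

def best_two_term_barycenter(target, dictionary, n_grid=21):
    """Toy brute-force search (hypothetical helper): best pair of atoms and weights."""
    best_err, best_atoms, best_weights = np.inf, None, None
    for i, j in itertools.combinations(range(len(dictionary)), 2):
        for t in np.linspace(0.0, 1.0, n_grid):
            bary = barycenter_1d([dictionary[i], dictionary[j]], [1.0 - t, t])
            err = w2_1d(bary, target)
            if err < best_err:
                best_err, best_atoms, best_weights = err, (i, j), (1.0 - t, t)
    return best_err, best_atoms, best_weights

# Illustrative data: a dictionary of shifted copies of one sample.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
dictionary = [base + shift for shift in (-2.0, 0.0, 3.0, 7.0)]
target = base + 1.2

err, atoms, weights = best_two_term_barycenter(target, dictionary)
print(atoms, weights, err)
```

On this toy example only two of the four atoms are needed to reconstruct the target, which is the sense in which the approximation is sparse.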
8.3 Axis 3
Boulet S, Ursino M, Michelet R, Aulin LB, Kloft C, Comets E, Zohar S. (2024). Bayesian framework for multi-source data integration-Application to human extrapolation from preclinical studies. Statistical Methods in Medical Research, 33(4), 574-588. doi: 10.1177/09622802241231493.
In preclinical studies, pharmacokinetic, pharmacodynamic, and toxicological data are often analyzed independently, limiting their integration for human dose prediction. To address this, we propose a customizable Bayesian framework for multi-source data integration, enabling precise extrapolation of preclinical results to humans. Our four-step approach includes sequential parameter estimation, human extrapolation, commensurability checks, and information merging, reducing uncertainty and improving dose prediction accuracy. Evaluated through simulations based on an oncology case, this framework enhances data utilization, potentially leading to more efficient and reliable dose selection.
Participants: Sandrine Boulet, Moreno Ursino, Sarah Zohar.
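The last two steps of the approach (commensurability checks and information merging) can be illustrated with a deliberately simplified sketch. It assumes that each source yields a normal approximation (mean, standard deviation) of the same extrapolated human parameter, uses the Hellinger distance as the commensurability measure, and merges the retained sources by precision weighting. The threshold value, the function names and the numbers are hypothetical and are not taken from the paper.

```python
import math

def hellinger_normal(mu1, sd1, mu2, sd2):
    """Closed-form Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2)."""
    var_sum = sd1 ** 2 + sd2 ** 2
    bc = math.sqrt(2.0 * sd1 * sd2 / var_sum) * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(0.0, 1.0 - bc))

def merge_if_commensurable(sources, threshold=0.5):
    """Keep sources close to the first (reference) one, then pool them.

    sources: list of (mean, sd) tuples. Pooling is precision-weighted, which
    assumes independent normal estimates of the same quantity.
    Returns (pooled_mean, pooled_sd, number_of_sources_kept).
    """
    ref_mu, ref_sd = sources[0]
    kept = [(mu, sd) for mu, sd in sources
            if hellinger_normal(ref_mu, ref_sd, mu, sd) <= threshold]
    precisions = [1.0 / sd ** 2 for _, sd in kept]
    total = sum(precisions)
    pooled_mean = sum(p * mu for p, (mu, _) in zip(precisions, kept)) / total
    return pooled_mean, math.sqrt(1.0 / total), len(kept)

# Hypothetical extrapolated estimates from three preclinical sources.
sources = [(1.0, 0.5), (1.2, 0.5), (4.0, 0.5)]
pooled_mean, pooled_sd, n_kept = merge_if_commensurable(sources)
print(pooled_mean, pooled_sd, n_kept)
```

The third source is far from the reference and is left out of the pooling, while merging the two commensurable sources yields a smaller standard deviation than either source alone, which is the uncertainty reduction the paragraph above refers to.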
Cottin A, Zulian M, Pécuchet N, Guilloux A, Katsahian S. MS-CPFI: A model-agnostic Counterfactual Perturbation Feature Importance algorithm for interpreting black-box Multi-State models. Artificial Intelligence in Medicine. 2024 Jan 1;147:102741. doi: 10.1016/j.artmed.2023.102741.
Multi-state processes are commonly used to model the complex clinical evolution of diseases where patients progress through different states. However, acceptability of machine learning and deep learning models by patients and clinicians, as well as regulatory compliance, requires interpretability of these algorithms' predictions. Existing methods, such as the Permutation Feature Importance algorithm, have been adapted for interpreting predictions in black-box models for 2-state processes (corresponding to survival analysis). To generalize these methods to multi-state models, we introduced in this work a novel model-agnostic interpretability algorithm called Multi-State Counterfactual Perturbation Feature Importance (MS-CPFI) that computes feature importance scores for each transition of a general multi-state model, including survival, competing-risks, and illness-death models. MS-CPFI uses a new counterfactual perturbation method that allows interpreting feature effects while capturing non-linear effects and potentially time-dependent effects.
Participants: Aziliz Cottin, Agathe Guilloux, Sandrine Katsahian.
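For readers unfamiliar with the baseline mentioned above, the following sketch shows plain Permutation Feature Importance for a generic black-box predictor: each feature column is shuffled in turn and the resulting increase in loss is recorded. It does not reproduce MS-CPFI, whose counterfactual perturbation scheme and per-transition scores are specific to the paper; the function name, the toy model and the data are assumptions made for illustration.

```python
import numpy as np

def permutation_feature_importance(predict, X, y, loss, n_repeats=10, seed=0):
    """Mean increase in loss when each feature column is randomly permuted."""
    rng = np.random.default_rng(seed)
    baseline = loss(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with the outcome.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(loss(y, predict(X_perm)) - baseline)
        importances[j] = np.mean(increases)
    return importances

def toy_model(X):
    """Stand-in black box: uses features 0 and 1, ignores feature 2."""
    return 2.0 * X[:, 0] + 0.1 * X[:, 1]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = toy_model(X)

scores = permutation_feature_importance(toy_model, X, y, mse)
print(scores)
```

The unused feature receives a score of zero, and the strongly weighted feature dominates the ranking.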
Guénégou-Arnoux A, Murris J, Bechet S, Jung C, Auchabie J, Dupeyrat J, et al. (2024). Protocol for fever control using external cooling in mechanically ventilated patients with septic shock: SEPSISCOOL II randomised controlled trial. BMJ open, 14(1), e069430. doi: 10.1136/bmjopen-2022-069430.
SEPSISCOOL II is a pragmatic, investigator-initiated, adaptive, multicentre, open-label, randomised controlled, superiority trial in patients admitted to the intensive care unit with febrile septic shock. After stratification based on the acute respiratory distress syndrome status, patients will be randomised between two arms: (1) cooling and (2) no cooling. The primary endpoint is mortality at day 60 after randomisation. The secondary endpoints include the evolution of organ failure, early mortality and tolerance. The target sample size is 820 patients.
9 Bilateral contracts and grants with industry
9.1 Bilateral Grants with Industry
- CEMKA - Cifre : Anne-Sophie Jannot is supervising Corentin Faujour on the project: Development of classifiers based on data of the National Health Insurance Database (SNDS),
- Dassault Systèmes - Cifre : Sandrine Katsahian and Agathe Guilloux are supervising Aziliz Cottin on the project: Survival dynamic prediction for personalization of cancer patients’ follow-up,
- Dassault Systèmes - Cifre : Jean Feydy is supervising Louis Goldenberg on the project: Deep learning for statistical modeling of vascular networks,
- Epiconcept - Cifre : Sandrine Katsahian and Agathe Guilloux are co-supervising Emilien Jemelen on the project: Evaluation of the contribution of the use of artificial intelligence in the French breast cancer screening program.
- GE Healthcare - Cifre : Stéphanie Allassonnière is supervising Théau Blanchard on the project: Virtual liver tumor pathology using self-supervised learning and multimodal data integration.
- Exystat - Cifre : Sarah Zohar and Moreno Ursino are co-supervising Benjamin Duputel on the project: Platform Phase II/III seamless clinical trials Bayesian designs,
- Implicity - Cifre : Stéphanie Allassonnière is supervising Louis Vincent on latent modelling of cardiac time series including external information. A special focus is on missing data and heterogeneous populations.
- MyFit Solutions - Cifre : Jean Feydy and Stéphanie Allassonnière are supervising Hadrien Bigo-Balland on the project: Tuning and automation on 3D models of human body parts, reconstructed via smartphone,
- Pierre Fabre - Cifre : Sandrine Katsahian is supervising Juliette Murris on the project: prediction of large-scale recurrent events in digestive cancers,
- Sanofi - Cifre : Sandrine Katsahian and Agathe Guilloux are co-supervising Jean-Baptiste Baitairian on the project: Quantitative Bias Assessment for causal inference.
- Sanofi - CombO project : "Evaluating drug combinations in oncology with Real-World Data and state-of-the-art knowledge". Adrien Coulet, Moreno Ursino and Sarah Zohar are part of this project. Please see section 4 for further details.
- BOTdesign - Software exploitation : This involves the Pythae and MultiVae software. The aim is to generate artificial data to complete cohorts for clinical trials with difficult recruitment, or learning digital EHR to increase statistical power.
Participants: Stéphanie Allassonnière, Jean-Baptiste Baitairian, Nadim Ballout, Hadrien Bigo-Balland, Théo Blanchard, Aziliz Cottin, Adrien Coulet, Benjamin Duputel, Corentin Faujour, Jean Feydy, Louis Goldenberg, Agathe Guilloux, Anne-Sophie Jannot, Emilien Jemelen, Jong Ho Jhee, Sandrine Katsahian, Juliette Murris, Agathe Senellart, Moreno Ursino, Louis Vincent, Sarah Zohar.
10 Partnerships and cooperations
Participants: Stéphanie Allassonnière, Guillaume Chassagnon, Adrien Coulet, Jean Feydy, Agathe Guilloux, Anne-Sophie Jannot, Moreno Ursino, Sarah Zohar.
10.1 European initiatives
10.1.1 Horizon Europe
INVENTS - Sarah Zohar is the PI of this EU Horizon project. Please see section 4 for further details.
10.1.2 H2020 projects
FAIR - Sarah Zohar is a WP leader in this H2020 project. Please see section 4 for further details.
10.1.3 Other european programs/initiatives
The European taskforce led by EIT Health and the French Ministry of Solidarity and Health, for the “harmonization of clinical studies criteria and methodologies in Europe for the evaluation of digital medical devices”. In this taskforce, Sarah Zohar co-leads the WP2 on “Evidence in clinical evaluation” with Corinne Collignon (Head of the Digital Mission at HAS, France) and Barbara Höfgen (Head of the Unit DiGA-Fast-Track at BfArM, Germany).
EU MSCA Doctoral Networks - Project ORGESTRA "Organoid technologies for disease modeling, drug discovery and development for rare diseases". Sarah Zohar is the WP leader of methodological design for using organoid outcomes and transferring them to the clinic.
10.2 National initiatives
Stéphanie Allassonnière directed the writing of a White Book on Artificial Healthcare Data.
RECaP - Moreno Ursino is the referent of the « Early trials » group. The national network RECaP (Research in Clinical Epidemiology and in Public Health) aims to pool original research projects and to produce innovations in clinical epidemiology and public health.
10.3 Regional initiatives
In the renewal of the IA Cluster Prairie-PSAI, Stéphanie Allassonnière holds a chair, Jean Feydy and Guillaume Chassagnon hold tremplin chairs, and Anne-Sophie Jannot is a fellow.
Adrien Coulet, along with Mehwish Alam from TELECOM Paris, has been awarded a PhD grant from the Ile-de-France DIM (Domaine de recherche et d’Innovation Majeurs) named AI4IDF, Human-centered artificial intelligence in the Ile-de-France region to study
10.4 Public policy support
Stéphanie Allassonnière and Anne-Sophie Jannot were two of the four experts commissioned by the Ministry of Health and the Ministry of Research to write a National Report on the Secondary Use of HealthCare Data.
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Scientific events: organisation and selection
- Stéphanie Allassonnière served as a member of the SAB of the AI for Health Summit.
- Adrien Coulet co-organized PFIA journée IA & Santé (1 day, 70 participants), which brings together researchers interested in healthcare applications of knowledge representation and machine learning.
Members of the group are involved in the program committees of several events, such as ML4H, MedInfo, SWAT4HCLS, and the OHDSI Symposium.
11.1.2 Journal
Member of the editorial boards
Sarah Zohar is Associate Editor of two scientific journals: "Biometrics" and "Statistics in Biopharmaceutical Research". Bastien Rance is a member of the editorial board of the "International Journal of Medical Informatics". Adrien Coulet served as a guest editor of the “Annual Review of Biomedical Data Science”.
Reviewer - reviewing activities
Members of HeKA are regular reviewers for the following journals: Journal of Biomedical Informatics, Artificial Intelligence in Medicine, Statistics in Medicine, Biometrics, Statistics in Medical Research, and others.
11.1.3 Invited talks
HeKA members are regularly invited to present their research to both scientific audiences and the general public.
11.1.4 Leadership within the scientific community
Anita Burgun serves as the Representative of the French medical informatics community at the IMIA (International Medical Informatics Association). She also serves as a member of the Executive Board of the Imagine Institute.
Stéphanie Allassonnière is the vice-president for the economic transfer of research and industrial partnerships at Université Paris Cité.
11.1.5 Scientific expertise
Sarah Zohar is a voting member of the CNEDiMTS (“Commission nationale d’évaluation des dispositifs médicaux et des technologies de santé”) at HAS (“Haute Autorité de Santé”). Sarah Zohar was on the review committee for the BpiFrance grant evaluation "évaluation DM à base d’IA et de numérique". Members of the HeKA team are involved in several ethics committees, such as the Comité d’Ethique de la Recherche APHP.Centre and the EDS AP-HP Comité Scientifique et Ethique. Members of HeKA sit on the scientific board of the ANR generic call Axe H.14 : Interfaces : mathématiques, sciences du numérique – biologie, santé and of the FC3R (French Center for the 3R).
11.1.6 Research administration
Sandrine Katsahian is responsible for the PRIME department, dedicated to research, innovation, and digital medicine and located at the HEGP in APHP.Centre.
David Drummond serves as Secretary of the mHealth/eHealth group at the European Respiratory Society.
11.2 Teaching - Supervision - Juries
Anne-Sophie Jannot and Sandrine Katsahian lead the speciality “Big Data in Healthcare” in the Master of Science of Public Health at Université Paris Cité.
Anita Burgun and Bastien Rance lead the speciality “Biomedical Informatics” in the same Master at Université Paris Cité.
Adrien Coulet, Jean Feydy, Agathe Guilloux, and Sarah Zohar teach in this same Master of Science of Public Health at Université Paris Cité.
Stéphanie Allassonnière coordinates the SMPS Bioentrepreneur (Université Paris Cité).
Anne-Sophie Jannot co-leads the quantitative biomedicine course, which is part of the medical degree course. Anne-Sophie Jannot leads a professional degree in “health data reuse” at Université Paris Cité in collaboration with Marseille and Bordeaux Universities. Moreno Ursino co-leads the course “Science des données” in the L2SIAS at Université Paris Cité. Stéphanie Allassonnière runs a course focusing on the analysis of real-life health data within the M2 MVA (Mathematics, Vision, Learning). Jean Feydy runs a class on geometric data analysis in the same program, while also teaching linear statistical models to 2nd year medical students at Université Paris Cité.
Rosy Tsopra co-leads the DigiHealth Paris Cité teaching project, which aims to develop digital health knowledge and skills across all health curricula and some science programs at Université Paris Cité (Faculties of Health and Sciences).
11.3 Popularization
Members of the team are regularly invited to present their research activities or give their point of view at events aimed at various audiences. For instance, Stéphanie Allassonnière took part in roundtables at the Convention on Health Analysis and Management (CHAM) and at the annual colloquium of the AFCDP on the European Health Data Space; Agathe Guilloux took part in a roundtable at Futurapolis Santé.
12 Scientific production
12.1 Major publications
- 1 article: Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence, June 2022. HAL
- 2 article: IDNetwork: A deep illness‐death network based on multi‐state event history process for disease prognostication. Statistics in Medicine 41(9), April 2022, 1573-1598. HAL, DOI
- 3 article: Bayesian dose‐regimen assessment in early phase oncology incorporating pharmacokinetics and pharmacodynamics. Biometrics, February 2022. HAL, DOI
- 4 article: Mining Electronic Health Records for Drugs Associated With 28-day Mortality in COVID-19: Pharmacopoeia-wide Association Study (PharmWAS). JMIR Medical Informatics 10(3), 2022, e35190. HAL, DOI
- 5 article: Spatio-temporal mixture process estimation to detect population dynamical changes. Artificial Intelligence in Medicine 126, 2022, 102258. HAL, DOI
- 6 inproceedings: Using an ontological representation of chemotherapy toxicities for guiding information extraction and integration from EHRs. Medinfo 2021 - 18th World Congress on Medical and Health Informatics, Virtual conference, Australia, October 2021. HAL
12.2 Publications of the year
International journals
International peer-reviewed conferences
Conferences without proceedings
Doctoral dissertations and habilitation theses
Reports & preprints
Other scientific publications
Patents