2023 Activity report
Project-Team HEKA
RNSR: 202124127N
- Research center: Inria Paris Centre
- In partnership with: INSERM, Université Paris Cité
- Team name: Health data- and model- driven approaches for Knowledge Acquisition
- In collaboration with: CENTRE DE RECHERCHE DES CORDELIERS
- Domain: Digital Health, Biology and Earth
- Theme: Computational Neuroscience and Medicine
Keywords
Computer Science and Digital Science
- A3.3. Data and knowledge analysis
- A3.4. Machine learning and statistics
- A6.1. Methods in mathematical modeling
- A6.2. Scientific computing, Numerical Analysis & Optimization
- A9.1. Knowledge
- A9.2. Machine learning
- A9.4. Natural language processing
- A9.6. Decision support
Other Research Topics and Application Domains
- B2.2. Physiology and diseases
- B2.3. Epidemiology
- B2.6. Biological and medical imaging
1 Team members, visitors, external collaborators
Research Scientists
- Sarah Zohar [Team leader, INSERM, Senior Researcher, HDR]
- Adrien Coulet [INRIA, Associate Professor Detachement, HDR]
- Jean Feydy [INRIA, Researcher]
- Nicolas Garcelon [Fondation Imagine]
- Agathe Guilloux [INRIA, Professor Detachement, HDR]
- Claire Leconte Rives-Lange [INRIA, Advanced Research Position, from Oct 2023]
- Moreno Ursino [INSERM, Researcher]
Faculty Members
- Stephanie Allassonnière [UNIV PARIS CITE, Professor, HDR]
- François Angoulvant [Détachement Univ de Lausanne, Professor, HDR]
- Anita Burgun [UNIV PARIS CITE - APHP, Professor, HDR]
- David Drummond [UNIV PARIS CITE - APHP, Associate Professor]
- Anne-Sophie Jannot [UNIV PARIS CITE - APHP, HDR]
- Sandrine Katsahian [UNIV PARIS CITE - APHP, Professor, HDR]
- Antoine Neuraz [UNIV PARIS CITE - APHP until 31/12/2023]
- Bastien Rance [UNIV PARIS CITE - APHP, Associate Professor]
- Brigitte Sabatier [APHP, Professor, HDR]
Post-Doctoral Fellows
- Nadim Ballout [INRIA, Post-Doctoral Fellow]
- Sarah Berdot [APHP, Post-Doctoral Fellow]
- Sandrine Boulet [INSERM, Post-Doctoral Fellow]
- Xiaoyi Chen [Institut Imagine, until Aug 2023]
- Jong Ho Jhee [INRIA, Post-Doctoral Fellow, from Aug 2023]
- Germain Perrin [APHP, Post-Doctoral Fellow]
- Pierre Sabatier [APHP, until Aug 2023]
- Rosy Tsopra [UNIV PARIS CITE - APHP]
PhD Students
- Safa Alsaidi [INRIA]
- Jean-Baptiste Baitairian [SANOFI, CIFRE, from Apr 2023]
- Nesrine Bannour [Univ Saclay]
- Linus Bleistein [ENS PARIS]
- Tom Boeken [UNIV PARIS CITE - APHP]
- Clément Chadebec [UNIV PARIS CITE, until Sep 2023]
- Pierre Clavier [LIX]
- Lea Comin [CEA, from Oct 2023]
- Aziliz Cottin [DASSAULT SYSTEMES, CIFRE]
- Charles De Ponthaud [UNIV PARIS - CITE, from Nov 2023]
- Benjamin Duputel [EXYSTAT, CIFRE, until Mar 2023]
- Thibaut Fabacher [UNIV STRASBOURG]
- Fabrice Gambaraza [APHP]
- Romain Jaquet [GHNE]
- Emilien Jemelen [Epiconcept, CIFRE, from Nov 2023]
- Enora Laas-Faron [INSTITUT CURIE]
- Judith Lambert [INSERM]
- Ivan Lerner [INSERM]
- Fabien Maury [INSERM, from Oct 2023]
- Juliette Murris [PIERRE FABRE, CIFRE]
- Lillian Muyama [INRIA]
- Sophie Quennelle [UNIV PARIS, until Nov 2023]
- Alice Rogier [INSERM, until 30/11/2023, then INRIA from 01/12/2023, until Nov 2023]
- Agathe Senellart [UNIV PARIS - CITE]
- Guillaume Serieys [UNIV PARIS - CITE]
- Stylianos Tzedakis [UNIV PARIS - CITE]
- Alexis Van Straaten [UNIV PARIS - CITE]
- Louis Vincent [IMPLICITY, CIFRE]
- Axel Vuorinen [INSERM, from Dec 2023]
Technical Staff
- Deycy Camila Arias Villamil [INRIA, until 31/12/2023, Engineer]
- Armelle Arnoux [AP/HP]
- Olivier Birot [INRIA, until 31/12/2023, Engineer]
- Kim Tam Huynh [Inria, until 31/08/2023, Engineer]
- Louis Pujol [UNIV PARIS - CITE, Engineer]
Interns and Apprentices
- Killian Bakong Epoune [INSERM, Intern, until Jul 2023]
- Lea Comin [AMU, Intern, until Jul 2023]
- Naomi Daval-Pommier [UNIV PARIS - CITE, Intern, from Feb 2023 until Aug 2023]
- Maïsen Hassani [INRIA, Intern, from Apr 2023 until Sep 2023]
- Pierre Olivo-Marin [INSERM, Intern, from Feb 2023 until Aug 2023]
- Mohamed Raissi [INRIA, Intern, from Jun 2023 until Jul 2023]
- Axel Vuorinen [ENSAI, Intern, from Apr 2023 until Sep 2023]
Administrative Assistant
- Meriem Guemair [INRIA]
2 Overall objectives
2.1 Context
Clinicians routinely have to make decisions about the diagnosis and treatment of complex patients for whom clinical guidelines either do not exist or do not provide clear recommendations. This is particularly the case for very heterogeneous diseases (e.g., rare diseases or cancers, in which clinical manifestations or response to treatment frequently differ from one individual to another) or when patients present with a newly emerging disease for which no recommendation has been established yet, as was the case in March 2020 for the management of COVID-19 infections. Similar situations, leading to delicate decisions, have happened in the past but, unfortunately, this experience is hardly taken into account or rationalized for clinicians. Indeed, data related to these past experiences are captured but, until now, they have been neither accessible to clinicians nor transformed into high-level evidence. This level of evidence is currently only reached by highly controlled analyses (such as controlled clinical trials), for which patients might differ strongly from those treated in routine care, or might never have been seen before in the case of new conditions. Besides, hospital information systems are used at every step of patient care, continuously collecting longitudinal data, both unstructured and structured, including clinical reports, drug prescriptions, physiology, laboratory results, imaging and omics data. These data may even be enriched with data from medical wearable devices or with large-scale claims data, such as those of the French Health Data Hub, which inform on the global clinical course of patients. Likewise, progress in artificial intelligence approaches, such as supervised machine learning, has found applications in healthcare, enabling for instance clinical decision support systems, patient prioritization, drug repurposing or drug safety monitoring.
However, many particularities of health data (e.g., their sensitivity, noisiness, incompleteness, heterogeneity and small volume when one very specific feature or several time points are required) and of health data analysis (e.g., the risk of confounding, the need for explanations and fairness) make it challenging to develop tools that are both reliable and usable within hospital workflows and patient flows. For instance, precision medicine requires stratifying smaller and smaller groups of patients, which may seem at odds with the general strategy of deep learning, which requires large amounts of data to be effective. Another challenging task is the development of tools that enable gaining knowledge from data in an agile manner, i.e., continuously updating the knowledge gained (without compromising on reliability). In summary, methodological developments are required at each step of the health data chain, including: (1) data access, (2) data transformation, e.g., via representation learning, (3) data analysis, predictive modelling and knowledge discovery with data- and model-driven approaches, and (4) agile, fast and reliable access to data, and implementation of these approaches through applications such as decision support systems, medical devices and next generation clinical trials for the assessment of medical knowledge.
2.2 General aim
The main objective of HeKA is to develop methodologies, tools and their applications in clinics towards a learning health system, i.e., a health system that leverages clinical data collected to extract agilely and reliably novel medical knowledge that, in turn, continuously improves healthcare. Indeed, the availability of EHRs (Electronic Health Records), cohorts and other linked data such as the national Health Data Hub, offers the opportunity to develop models for stratification and prediction with the potential of improving the precision and the personalization of treatments, and thus the quality of healthcare.
3 Research program
The HeKA project-team follows three interdependent axes: (1) knowledge extraction from clinical data, (2) stochastic and data-driven predictive modelling of health data, and (3) data-driven and designs for next generation clinical trials. These axes can be viewed and interpreted from a patient care point of view as (1) from patient data to patient representations, (2) from patient data to prediction and decision, and (3) from models to improved patient-related knowledge, respectively. All these axes participate in the development of a learning health system. Axes 1 and 2 can be related to observational studies (either retrospective or prospective) and Axis 3 is related to interventional studies. As a reminder, in observational studies the investigator does not act upon study participants, but only observes relationships between factors and outcomes, while in interventional studies (i.e., clinical trials) the investigator intervenes as part of the study design.
3.1 Axis 1 - Knowledge extraction from clinical data
The development of clinical decision support and statistical predictive models has historically relied on manually selecting and tuning sets of predictive variables. This is a task-dependent and time-consuming operation that neglects most of the available data. Real-world data, such as EHRs or cohorts, offer access to many variables, even those not initially thought of as predictive. For instance, EHRs consist of structured data, such as demographics, diagnoses, procedures, biological laboratory results, and medication exposures, which can be associated with unstructured data, such as clinical notes, discharge summaries, pathology and imaging reports. In addition, this core EHR data may be complemented with other data, including images, omics data, patient-reported outcomes, or conversation transcriptions. However, the use of EHR data for any precision medicine application represents an initial and significant information extraction challenge because of their heterogeneity, incompleteness, and dynamic nature. The aim of this axis is to develop methods and tools for leveraging patients’ data in their wide variety and complexity. This encompasses the extraction and transformation of raw data into engineered, featured and learned representations of good quality that will enable or facilitate the development of further clinical decision support and knowledge discovery approaches, such as those presented in Axes 2 and 3.
3.1.1 Methods
Methods developed in Axis 1 can be associated with three types of tasks: (i) deep phenotyping, (ii) patient representation and (iii) reasoning with clinical knowledge. (i) Deep phenotyping consists in defining algorithms that make it possible to identify patients with a particular and potentially complex profile within large healthcare databases. It encompasses the development of natural language processing tools capable of extracting complex features and their context from clinical texts; it also includes the ability to consider simultaneously the structured and unstructured data of these databases to identify relevant patients. To this end, our methods rely on expert rules, distant supervision and deep learning language representations. (ii) Regarding patient representation, we focus on two distinct kinds of representations. The first is an explicit representation of patients in the form of knowledge graphs, using Semantic Web standards and tools. The second is a representation of patients within a latent space, using representation learning methods largely inspired by the results obtained by deep learning models for learning language representations. (iii) Tasks concerned with reasoning on clinical knowledge are multiple. They encompass methods to measure patient similarity between elaborate patient representations (either in the form of knowledge graphs or embeddings), hybrid approaches for analogical reasoning, and logical and statistical inference.
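As an illustration of measuring patient similarity between embedding-based representations, the following minimal Python sketch computes the cosine similarity between two patient vectors. Cosine similarity is only one common choice and is an assumption here, not necessarily the measure used by the team; the function name `cosine_similarity` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional patient embeddings (illustrative values only).
patient_a = [0.2, 0.7, 0.1]
patient_b = [0.4, 1.4, 0.2]   # same direction as patient_a
patient_c = [0.7, -0.2, 0.0]  # orthogonal to patient_a

print(cosine_similarity(patient_a, patient_b))  # close to 1: very similar
print(cosine_similarity(patient_a, patient_c))  # close to 0: unrelated
```

Because cosine similarity ignores vector length, two patients whose embeddings point in the same direction are considered similar even if the magnitudes differ; whether that is desirable depends on how the representations were learned.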
3.2 Axis 2 - Stochastic and data-driven predictive modelling of health trajectories
The recent availability of high dimensional health data enables the emergence of data-driven models for description and analysis, with the further possibility of guiding clinical decisions. In this high dimensional setting, machine learning-based prediction has been demonstrated to be efficient, although it may not be the best option in every setting. We are interested in the borders between settings where deep learning approaches fail but alternatives succeed, and vice versa. In particular, we consider the borders found in temporal modelling and small-sample settings. Health data provided by EHRs have several specificities, among which: (i) patient care trajectories are high dimensional, and (ii) they are censored, i.e., data are observed only until a certain timepoint. Current models do not succeed in tackling these two concerns simultaneously. For example, patient trajectories include comorbidity pathways where one disease may impact the others. Embedding patient trajectories in prognostic models remains a challenge, especially in the low-sample-size, high-dimension setting. Machine learning in such a setting should be adapted and compared with efficient statistical learning models.
3.2.1 Methods
During this year, we focused on adapting machine learning-based methods to the following challenges: signal detection using spatio-temporal data in order to build a geographical health vigilance system, medical image classification in the low sample size setting, and pharmacovigilance signal detection using large databases. We proposed a method to model and monitor population distributions over space and time, in order to build an alert system for spatio-temporal data changes based on a new version of the Expectation-Maximization (EM) algorithm that better estimates the number of clusters and their parameters at the same time. We validated this approach on a real data set of patients diagnosed positive for coronavirus disease 2019, and showed that our pipeline correctly models evolving real data and detects epidemic changes. We proposed a new method to perform data augmentation in a reliable way in the High Dimensional Low Sample Size setting using a geometry-based variational autoencoder. We validated this approach on a medical imaging classification task where a small number of 3D brain magnetic resonance images are considered and augmented using the proposed framework. We adapted a very classical pharmacoepidemiological method, the Weighted Cumulative Exposure statistical model, which makes it possible to model the temporal relationship between the prescription of a drug and a side effect, to the high dimensional setting by implementing it with Graphics Processing Unit (GPU) programming. We analysed several real-life datasets using this implementation and showed that this adaptation makes it possible to apply the method to the National Health Insurance Database (SNDS).
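The core idea of the Weighted Cumulative Exposure model can be sketched as follows: the exposure at time u is a weighted sum of past doses, WCE(u) = sum over t <= u of w(u - t) X(t), where X(t) is the dose at time t and w gives the importance of a dose as a function of the time elapsed since it was taken. The Python sketch below assumes discrete time steps and a fixed, hand-picked weight function, whereas in practice the weights are estimated from the data. It is not the team's GPU implementation, and all names and values are hypothetical.

```python
def weighted_cumulative_exposure(doses, weights):
    """Return WCE(u) for every time step u.

    doses[t]   : dose received at time step t (0 if not exposed)
    weights[k] : weight given to a dose taken k time steps ago
    Doses older than len(weights) - 1 steps are assumed to have no effect.
    """
    wce = []
    for u in range(len(doses)):
        total = 0.0
        for lag, w in enumerate(weights):
            if u - lag < 0:
                break  # no dose history before the start of follow-up
            total += w * doses[u - lag]
        wce.append(total)
    return wce


# Hypothetical example: two days of exposure, with an effect fading over 3 days.
doses = [1, 1, 0, 0]
weights = [0.5, 0.3, 0.2]
print(weighted_cumulative_exposure(doses, weights))
```

In a survival model, this time-varying quantity would then enter the hazard as a covariate, so that both recent and past prescriptions contribute to the estimated risk of a side effect.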
3.3 Axis 3: Data-driven and designs for next generation clinical trials
New model-based fundamental research could help in screening patients and predicting response prior to clinical trials, thanks to predictive modelling of patients’ response based on biomarkers or EHRs. Digital Medical Devices (DMDs) are health technologies that fall within the definition of medical devices intended for the “Prevention, diagnosis, monitoring, treatment or alleviation of disease” (Regulation EU 2017/745) and are mostly based on machine learning algorithms. Among them, learning software as a medical device (SaMD) technologies now have the potential to be translated to patient care. As such, they should be evaluated in clinical trials. However, in contrast to drugs, for which the chemical formula does not change over time, the performance of a predictive model is constantly enriched by new observational data. The challenge we now face is to be able to ensure, at any time, the safety and efficacy of SaMD and other advances for patient care. The objective of this axis is twofold: how can clinical trials help machine learning? And conversely, how can machine learning and other information sources help clinical trials? Accordingly, the first objective is to propose clinical trial methods and designs adapted to continuously learning and adapting tools (DMDs). The second objective is to produce innovative clinical trial designs that draw on all possible patient-related knowledge: disease and translational models, EHRs, clinical trial data and synthetic patients.
3.3.1 Methods
This year we focused more on the second objective. We developed (i) dose-finding methods that include pharmacokinetics and pharmacodynamics modelling to better choose the appropriate dose regimen in oncology; (ii) methods based on logistic regression and random effects meta-analysis to synthesise information from multiple early phase trials with several doses; (iii) methods for analysing clinical trials disrupted by the COVID pandemic; (iv) deep learning models for analysing multistate EHR ...
4 Application domains
4.1 Multimodal approaches generalizable for several diseases
4.1.1 PEPR Digital Health
The PEPR ("Programmes et équipements prioritaires de recherche") Digital Health, which started in September 2023, aims at gathering the national multidisciplinary community active in digital health around the development and exploitation of the concept of digital twin in health.
HeKA’s involvement in this PEPR is the following:
(i) within project ShareFAIR, to learn protocols from clinical data collected during healthcare activity in Electronic Health Records (EHRs), in order to make explicit the medical decision processes (steps to reach a particular diagnosis or therapeutic choice) and the management of particular conditions (steps in the management of a particular condition). Protocols extracted from EHRs provide a view of real-world clinical practice and may then be compared with one another or with CPG (clinical practice guidelines), which can be seen as more theoretical protocols in that they provide recommendations, or clinical pathways (CP), to standardize clinical practice. This will be applied within NEUROVASC, on the impact reduction of intracranial aneurysm and stroke, in which we will extract the proposed clinical pathways;
(ii) within REWIND, to develop new mathematical and statistical approaches for the analysis of multimodal, multiscale longitudinal data to predict patients’ response. These models will be designed, implemented as prototypes and then transferred to an easy-to-use, well-documented platform where people from diverse communities, in particular physicians, will be able to use them on their own data sets;
(iii) within DIGPHAT, to develop Bayesian modelling of meta-model pathways for the development of digital pharmacological twins; this consists in analysing data from omics experiments, selecting relevant covariates and combining meta-models in pathways to select the most reliable twin model;
(iv) within project M4DI, to develop a generic method for identifying subgroups of patients with the same phenotype from health databases, using variable correlations and expert data jointly, and to implement it within a software package;
(v) within SMATCH, to address clinical evaluation: all the previous projects will propose models or Clinical Decision Support Systems (CDSS) to be translated to clinical practice; however, proof based on data alone is not sufficient, and these tools should be evaluated in real life through prospective and interventional clinical trials or studies. In this project we will propose new methodological paradigms for the clinical evaluation of Digital Medical Devices (DMD), including CDSS and AI-based models and algorithms.
4.1.2 SurvivalGPU – “Using Graphics Processing Units (GPUs) to scale up survival analysis to nation-wide cohorts"
The recent availability of health insurance databases such as the SNDS opens the door to the detection of adverse drug reactions in the general population. The aim is to generalize the survival analyses usually carried out during clinical trials, on cohorts of N=1k to 10k patients, to the full French population. This line of research is appealing but poses real methodological challenges. Notably, it requires the development of statistical analysis models that meet the robustness and interpretability requirements of public health physicians while taking full advantage of recent hardware accelerators to scale up to millions of patients per cohort. In this context, our team has been working since 2022 on an efficient re-implementation, on Graphics Processing Units (GPUs), of the standard software tool in the field: the R package "survival". The new "survivalGPU" library (https://github.com/jeanfeydy/survivalGPU) leverages recent software tools (PyTorch, PyG, KeOps, reticulate) to bridge the gap between high-performance computing and traditional survival analysis. It now provides a complete re-implementation of the Cox proportional hazards model that is around 100 times faster on GPU than the survival package on CPU. Going further, it supports time-varying drug exposures via the Weighted Cumulative Exposure model and is accessible via an R interface that is fully backward-compatible with that of the survival package. We now intend to perform extensive validation and comparison with other models, prior to pharmaco-epidemiological studies on the SNDS data via the Health Data Hub platform.
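For readers unfamiliar with the Cox proportional hazards model, the quantity that fitting software maximizes is the partial log-likelihood: each observed event contributes its own linear score minus the log of the summed exponentiated scores of all patients still at risk at that time. The pure-Python sketch below evaluates it directly, using Breslow's convention for tied event times. It is a didactic illustration only and does not use or reproduce the survivalGPU or survival APIs; the function name and the toy data are hypothetical.

```python
import math


def cox_partial_log_likelihood(times, events, covariates, beta):
    """Cox partial log-likelihood (Breslow convention for ties).

    times[i]      : follow-up time of patient i
    events[i]     : 1 if the event was observed, 0 if censored
    covariates[i] : list of covariate values for patient i
    beta          : list of regression coefficients
    """
    # Linear predictor x_i . beta for every patient.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients only appear in risk sets
        # Risk set: everyone whose follow-up lasts at least until t_i.
        risk = sum(math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += scores[i] - math.log(risk)
    return loglik


# Hypothetical toy cohort: 3 patients, 1 covariate, second patient censored.
times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
covariates = [[0.5], [1.0], [-0.3]]
print(cox_partial_log_likelihood(times, events, covariates, beta=[0.0]))
```

This naive version recomputes every risk set from scratch, so its cost grows quadratically with the number of patients; avoiding that cost is one reason why scaling to nation-wide cohorts calls for more careful implementations and hardware acceleration.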
4.1.3 Messidore-Inserm BEEP - “Bayesian methods for Early Enriched Platform trials"
The recent pandemic has shown the need to speed up the clinical development of novel or repurposed therapies. Indeed, following the usual drug development paradigm, where clinical trial phases are performed sequentially and separately, the time required for the full process easily exceeds a decade. Our objective is to propose innovative Bayesian enriched “platform” designs for early phase trials, which are adapted to the clinical context and move towards precision medicine. Since we are focusing on the early phases of clinical trials, “platform” cannot be linked to classical randomized controlled trials (RCTs) in this setting. Thus, we aim at defining how “platform” trials should be translated into these early phases. As in the original definition, early phase platforms will allow for flexibility, such as adding new arms or stopping treatments for futility (and/or safety in our case). The word “enriched” refers to the use of information that is new, or at least not usually used in such early trials, such as positron emission tomography (PET) scans, pharmacokinetics/pharmacodynamics (PK/PD) modelling and mathematical modelling of immune responses, and to the enrichment of the enrolled patients based on their biomarkers. The project is built around work packages (WPs). In WP1, we develop platform trials in phase 0/I, based on PET scans; microdosing on several (preclinical) animal species and humans will be adaptively compared, added or deleted to better characterize the extrapolation to humans. In WP2, we develop phase I/II dose-finding trials using PK/PD or mechanistic PD models. In WP3, enrichment designs for phase I/II, in survival settings, are proposed when selected biomarkers are available, and the design will be extended to the case of combination therapies.
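To make the idea of stopping an arm for futility concrete, the following sketch applies a simple Bayesian rule to a binary response endpoint: under a uniform Beta(1, 1) prior, the arm is dropped when the posterior probability that its response rate exceeds a target `p0` falls below a threshold. This is a generic textbook-style illustration, not the BEEP design; the target rate, the threshold and the function names are assumptions.

```python
from math import comb


def posterior_prob_exceeds(successes, n, p0):
    """P(response rate > p0 | data) under a uniform Beta(1, 1) prior.

    The posterior is Beta(1 + successes, 1 + n - successes). For integer
    parameters its upper tail equals a binomial probability:
    P(p > p0) = P(Binomial(n + 1, p0) <= successes).
    """
    m = n + 1
    return sum(comb(m, j) * p0**j * (1 - p0) ** (m - j) for j in range(successes + 1))


def drop_for_futility(successes, n, p0=0.3, threshold=0.05):
    """Hypothetical rule: drop the arm if it is unlikely to reach the target rate."""
    return posterior_prob_exceeds(successes, n, p0) < threshold


# Hypothetical interim looks on two arms of 10 patients each.
print(drop_for_futility(successes=0, n=10))  # no responders: arm is dropped
print(drop_for_futility(successes=5, n=10))  # 5 responders: arm continues
```

In a platform setting, such a rule would be evaluated at each interim analysis for every active arm, and the threshold would normally be calibrated by simulation to control the operating characteristics of the whole design.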
4.1.4 ANR AT2TA - “Analogies: from Theory to Tools and Applications"
Analogical reasoning is a kind of reasoning that is based on finding a common relational system between two situations, exemplars, or domains. In computer science, analogical reasoning can be supported by two main axes of artificial intelligence: knowledge representation and reasoning, and machine learning. The AT2TA project particularly aims at studying the role that machine learning can play in analogical reasoning, and the HeKA team is in charge of exploring the application of their interplay in the healthcare domain. A PhD student, co-supervised with Inria Paris, IHU Imagine and Université de Lorraine, is learning representations of patients, relying on clinical texts, and studying how these representations can first compose analogical propositions, and second serve as building blocks for a machine learning architecture for analogical reasoning.
4.1.5 HDH BOAS ADHERENCE - “Phenotyping algorithms for the assessment of therapeutic adherence and its clinical consequences in the management of chronic diseases"
Therapeutic adherence is a complex and multifactorial phenomenon that qualifies the degree to which the patient conforms to medical prescriptions. Poor adherence has both detrimental clinical consequences for the patient (in terms of prognosis and quality of life) and negative economic consequences for society. One difficulty in the study of adherence is that its measurement is mainly indirect and subject to many biases. Another difficulty is that adherence is not an end in itself, but should be considered through the lens of patient outcomes. The ADHERENCE project, funded under the Health Data Hub BOAS program (Bibliothèque Ouverte d’Algorithmes en Santé), aims at proposing clearly documented measurements of the level of patient adherence and of its potential clinical outcomes in the context of three chronic diseases (cancer, hypertension and transplantation). One challenge here is to enable these measurements in both EHRs from hospitals and claims from the national insurance database (SNDS).
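As one hedged illustration of an indirect, claims-based adherence measurement, the sketch below computes the proportion of days covered (PDC): the fraction of days in an observation window on which the patient had medication available according to dispensing records. PDC is a commonly used measure and is chosen here as an assumption; the source does not state which measurements the ADHERENCE project documents. The function name, the input format and the toy data are hypothetical.

```python
def proportion_of_days_covered(dispensings, window_days):
    """Fraction of days in [0, window_days) covered by at least one dispensing.

    dispensings : list of (start_day, days_supply) pairs, with days counted
                  from the start of the observation window (day 0)
    window_days : length of the observation window, in days
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    covered = set()
    for start, supply in dispensings:
        for day in range(start, start + supply):
            if 0 <= day < window_days:
                covered.add(day)  # overlapping supplies are counted once
    return len(covered) / window_days


# Hypothetical patient: two 30-day dispensings with a 15-day gap, over 90 days.
print(proportion_of_days_covered([(0, 30), (45, 30)], window_days=90))
```

The example also hints at why such measures are biased: a dispensing record shows that the drug was collected, not that it was taken, and overlapping or early refills have to be handled by an explicit convention (here, a day is counted at most once).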
4.1.6 iDEMO Meditwin - Dassault Systems - "Virtual twin for personalised medicine" (starting in 2024)
Meditwin is a collaborative project funded by BPI ("Banque Publique d'Investissement") with Dassault Systems (leader of the project), Inria, IHU institutes across France and Medtech startups. The aim of the project is to provide a digital platform relying on virtual twins of individuals that faithfully reproduce their state of health and make it possible to test different therapeutic options. It will promote interdisciplinarity by facilitating the interoperability of multimodal medical data. Our team will use AI approaches to propose Clinical Decision Support Systems (CDSS) in cardiovascular diseases and cancer. We will also develop the clinical trial methodology for evaluating these CDSS. In particular, HeKA will develop stratification and classification algorithms, synthetic patient generators, statistical and mathematical models for multi-modal and multidimensional health data, and clinical evaluation methods for the resulting CDSS as Digital Medical Devices.
4.2 Cancer
4.2.1 EraPerMed PeCAN (Personalized medicine in CANcer) (EU funding 2020-2023)
Breast cancer is the most common cancer in women worldwide. The most aggressive type is triple negative breast cancer (TNBC), characterized by a very poor prognosis. New targeted and personalized therapies are urgently required. In this context, the European project PeCAN aims at developing and assessing new "AI algorithms" able to predict treatment response and personalize therapy for TNBC patients. It will propose new approaches for the personalization of drug therapy based on a deep molecular characterization of individual tumors and patients, and on the establishment and use of digital medicine approaches to model the effects and side effects of all therapy options. In this project, our team is in charge of developing multi-scale models that integrate more clinical data (such as biomarkers, patients' characteristics, chemotherapy and targeted molecule treatments, biology, clinical outcomes, etc.). We are designing a framework for assessing the clinical performance and safety of the AI algorithms themselves, by using real-world data collected in clinical data warehouses and biobanks. This project is associated with Axes 1-3.
4.2.2 SIRIC InsiTu - “Insights into cancer: from inflammation to Tumor"
To turn scientific knowledge into sustainable healthcare, cancer research must identify who is at risk of cancer, when and in whom a new cancer arises, and how best to treat it and gauge response. Aligned with Europe's Beating Cancer Plan, InsiTu takes on the three challenges of cancer prevention, interception, and treatment in digestive, lung and skin cancers, and heme malignancies. Chronic inflammation is a key cancer niche fostering tumor initiation. Leveraging a transformative Tissue-Hub interfacing diagnostics and research, our program ‘From inflammation to clonal emergence and cancer’ will unite experts in chronic disease damage to monitor patients with chronic tissue inflammation and cancer predisposition, mirrored by animal modelling, to understand the critical transition from chronic tissue damage to cancer progression, opening opportunities for prevention and interception. Such longitudinal (and sometimes invasive) interactions between patients and healthcare practitioners can be improved by empowering patients and taking psychological, social and ethical dimensions into consideration. Our program ‘Imaging cancer and its environment’ will take a different approach to this challenge. Through synergetic interactions with mathematicians and physicists, it will provide novel frameworks for multiscale integration of molecular alterations, cellular processes, and tissue complexity. This effort will result in image-based, non-invasive ‘virtual biopsies’ as proxies of key biological processes underlying tumor heterogeneity and drug resistance. Along with novel biomarkers such as circulating extracellular vesicles, these virtual biopsies will gauge responses to new therapeutic approaches developed in our third program ‘From new targets to new trials’.
There, experts in leukemias and skin cancers will use cutting-edge in vivo functional screens and multi-omic interrogation of Tissue-Hub samples to identify new targetable vulnerabilities and develop next-generation cell-based immunotherapies. To accelerate the transfer of these innovations into care, new adaptive clinical trial designs will be engineered.
4.2.3 Combo - Sanofi - "Evaluating drug combinations in oncology with Real-World Data and state-of-the-art knowledge"
Combo is a collaborative project with industry, national health data platforms and a cancer institute: Sanofi Pharma, the Health Data Hub, Centre Léon Berard and Inria-Inserm-HeKA. The objective of the project is to identify promising families of drug combinations in oncology using multi-source and multi-modal data modelling and prediction, including RWD (cancer patients’ care data from a CLCC cancer centre), public genomic databases, literature, clinical trial repositories and expert opinion. Once these combinations have been identified, mechanistic models will be used to determine dose regimens and build dose-finding trial designs for the combinations to be evaluated through formal clinical trials. In this project we lead the following WPs: (1) AI-based analysis of the multimodal RWD and subgroup discovery for the identification of relevant combinations, (2) Bayesian multi-modal analysis accounting for RWD modelling as well as expert opinion and the AI analysis of literature and public clinical platforms from Sanofi, and (3) proposing candidates and designs for future phase I trials, specifying the dose, regimen and associated molecule for the selected families of combinations, using preclinical PK/PD models.
4.2.4 RHU OPERANDI - “Optimisation and imProved Efficacy of targeted RAdioNuclide therapy in Digestive cancers by Imagomics"
Advanced stage hepatocellular carcinoma (HCC) and gastroenteropancreatic neuroendocrine tumours (GEP-NET) are currently treated with targeted radionuclide therapy (TRT), a highly advanced method that consists of either intra-arterial injections of radioactive microspheres (transarterial radioembolisation - TARE) or targeted peptides radioactively labelled and administered systemically (Peptide Receptor Radionuclide Therapy - PRRT). While TRT is highly effective, patient stratification and early identification of responders are currently managed insufficiently due to the lack of pertinent imaging biomarkers, either non-invasive or invasive. Furthermore, therapy-induced DNA damage leads to tumour resistance, reducing TRT efficacy. We aim to overcome these current limits through the OPERANDI project via innovative approaches in engineering, novel imaging biomarkers, and new concepts for DNA repair mechanisms, combined with a fundamental understanding of causal links. Our ambitions go beyond the current state of the art, embracing even new combinations of drugs and -emitters to enhance dose localization and efficiency. The methodology will seek to understand fundamentally whether current patient management using CT/PET/MRI makes it possible to predict response and survival, using cutting-edge imaging-based artificial intelligence (AI) approaches in combination with data augmentation techniques to reach statistical significance.
4.3 Rare diseases and pediatrics
4.3.1 EU INVENTS Horizon project - “Innovative designs, extrapolation, simulation methods and evidence-tools for rare diseases addressing regulatory needs" (starting in 2024)
The evaluation of new medicines for rare diseases (RDs), including rare paediatric diseases, is challenging for several reasons, among which are the small patient sample sizes, the heterogeneity of patients and diseases, and the heterogeneity in disease knowledge. Due to these difficulties, access to effective treatments and the number of treatment options are often limited in RDs. INVENTS aims to provide clinical trialists, researchers and regulators with a global framework encompassing methods, workflows and evidence assessment tools to be implemented in orphan and paediatric drug development. Our ambition is to significantly improve the evaluation of evidence and regulatory decision-making through the development and validation of: refined longitudinal model-based disease trajectories and treatment effects, improved extrapolation models, in silico trials (e.g., virtual patient cohorts), optimised model-based clinical trial designs and evidence synthesis methods. These will be evaluated through simulation studies and tested on extensive data from a range of use cases provided by our industrial partners Roche and Novartis, and on real-world data (RWD) from an RD registry. The INVENTS framework will improve the consistency and efficiency of the drug evaluation process for RDs by augmenting clinical evidence without compromising its scientific integrity and by providing regulators with credibility criteria for assessment. At the end of this 5-year project, European industry will be able to exploit novel and improved clinical trial designs, in silico trials and RWD analysis approaches supporting drug development in RDs. The European Medicines Agency and European national regulators (including Health Technology Assessment bodies) will be supplied with a general framework allowing better-informed decision making. Most importantly, RD patients will benefit from increased and faster access to efficacious and safe treatments.
4.3.2 EU MSCA Doctoral Networks Orgestra project (starting 2024)
Organoid experimental models are in vitro 3D cell cultures which can be generated from embryonic stem cells, induced pluripotent stem cells or adult stem cells, and can replicate organs functionally and structurally. Their physiological resemblance to target organs and their ability to be cryopreserved make organoids a powerful tool for biomedical research and for advancing understanding of the mechanisms underlying certain disorders, including rare diseases. The ORGESTRA Joint Doctoral Network will propose innovative organoid technologies for two genetic disorders, i.e., cystic fibrosis and cystinosis. In this project we will supervise two PhD projects that will propose statistical developments for: (1) Linking in silico trials to organoid data and innovative trial designs. These designs will incorporate biomarker-based findings, such as organoids, that reduce unnecessary exposure of patients (screening) or allow drugs to be screened more effectively for non-effectiveness before embarking on human trials. This will be done via a joint doctoral degree with the University of Utrecht. (2) An estimand framework involving Bayesian principles on organoid data for clinical trial outcomes and models. The estimand framework will be based on expert elicitation to understand which questions are most relevant in terms of clinical efficacy/toxicity, to select the proper outcomes, to identify the possible intercurrent events and to provide a robust statistical model whose parameters will be estimated in a Bayesian setting. This will be done via a joint doctoral degree with Katholieke Universiteit Leuven.
4.3.3 CIL LICO- Ciliopathies: group of disorders associated with genetic mutation leading to rare and severe genetic diseases - Projet RHU - Recherche Hospitalo-Universitaire- 20118-22
Ciliopathies are a large group of rare and severe genetic diseases caused by ciliary dysfunction associated with clinical and genetic heterogeneity, as well as a lack of knowledge on patients' natural history (i.e., the evolution of the disease). The aim of this RHU-3 project, funded through the Investissements d'avenir program, is to develop innovative, diagnostic, prognostic and tailored therapeutic approaches for patients suffering from ciliopathies to prevent them from developing renal failure. Following this aim, we will develop a mechanistic stratification of ciliopathies, in order to regroup suspected and already diagnosed ciliopathies in a treatment-orientated classification. One goal is to engineer a ready-to-use bio-kit for assessing both diagnosis and prognosis of developing renal alteration, in ciliopathy patients. Finally, personalized therapeutic approaches will be proposed for patients using predictive approaches. We will use methods developed in Axis 1 as well as methods proposed in Axis 2 to model patient trajectory. This project is in collaboration with Imagine Institute, AP-HP, the Medetia company and Ecole Polytechnique.
4.3.4 BNDMR- Banque Nationale des Maladies Rares
The French National Registry for Rare Diseases (BNDMR) is a national tool for epidemiology and public health purposes in the field of rare diseases. In line with the objectives defined by the 2nd and 3rd French National Plans for Rare Diseases, the BNDMR team develops a secure national information system which gathers anonymized clinical data of patients affected by rare diseases in its BNDMR data warehouse. As medical head of the BNDMR, AS Jannot has several research projects strongly connected with the HeKA team, including the CDE.AI and DROMOS projects. CDE.AI aims to create a set of natural language processing algorithms that will allow the semi-automatic completion of the rare disease minimal data set, which is currently completed manually for all patients followed up in the rare disease expert centres. In this project, we will use methods developed in Axis 1 (collaboration with N Garcelon). The DROMOS project uses the National Data Bank for Rare Diseases by linking it to health insurance data. This matching will allow the care of rare disease patients to be described at the national level, including the characteristic care of the most frequent rare diseases. We will use methods developed in the frame of Axis 2 to model these longitudinal data.
4.3.5 Genomic variability transversal program (2018-2023)
The objective of the INSERM Genomic Variability cross-cutting programme is to understand the role played by genes and their variants in the development of pathologies. This programme, which is based on the longitudinal follow-up of cohorts of individuals and their phenotyping, aims to promote the development of new methods for the analysis of longitudinal data. In this programme, we develop, in the frame of Axis 2, longitudinal approaches to define patient trajectories from truncated data such as those in the French National Health Database (SNDS).
4.4 Other diseases
4.4.1 Antibiotic resistance – FAIR project EU Horizon (on going)
The aim of the FAIR project is to evaluate flagellin aerosol therapy for stimulation of immunity as an alternative treatment against pneumonia caused by multidrug-resistant bacteria. In this project, we are developing a full model using pharmacometrics expertise, as well as statistical designs for extrapolation purposes and for the design of a dose-finding study in healthy volunteers. As written above, in this project, S Zohar co-leads, along with C Kloft (Freie Universitaet Berlin), the WP entitled “Development of a translational modelling and simulation platform for flagellin PK/PD”. The aim of the WP is to propose an optimal design for the first-in-man clinical trials, maximizing the knowledge gained from in vitro experimentation, expert knowledge and pre-clinical experiments along the way. By incorporating mechanistic approaches earlier in the development process, along with continuous learning modelling under Bayesian inference, we hope to increase the probability of success of the translation to the clinical setting and thus to optimize the statistical design and sample size. This project relates to Axis 3.
4.4.2 Virtual reality (Ongoing)
Several projects led at the HEGP are currently ongoing to evaluate the analgesia provided by the use of Virtual Reality in different care settings (extracorporeal lithotripsy, after colorectal cancer surgery, and fiberoptic bronchoscopy in critical care). In these projects, which are more loosely related to Axis 2, we will provide the methodological approach and use statistical methods to answer the clinical questions, working closely with all Coordinating Investigators (Prof. D Clausse, G Manceau and A Rastello).
5 Social and environmental responsibility
5.1 Impact of research results
Our methods and designs are applied in collaboration with medical research team members at HEGP and Necker hospitals (among others). During the last three years, as a consequence of the COVID-19 pandemic, we developed and deployed tools for knowledge extraction from clinical text to the central clinical data warehouse of the AP-HP (Entrepôt de Données de Santé, AP-HP). Among other things, these tools enable the recognition of named entities and of their context (e.g., negation, personal or family history, hypothesis). They have been widely reused for the development of clinical studies, which led to 13 international publications in clinical epidemiology related to COVID-19. They also serve as a basis for our involvement in the 4CE international consortium (https://www.covidclinical.net/), which led to 15 publications. These tools have been refactored and their use is facilitated within the medkit library.
6 Highlights of the year
- David Drummond: appointed MCU-PH in September 2023.
- Stéphanie Allassonnière and Anne-Sophie Jannot: appointed to the "Mission intergouvernemental sur les données de santé".
- New funding: PEPR Digital Health (2 projects as PI and 3 as WP leaders), EU Horizon cluster Health as PI, EU MSCA as WP leader, idemo MEDITWIN (please see the Application domains section for details).
6.1 Awards
Jean Feydy: "prix science ouverte du logiciel libre de recherche 2023", awarded by the French Ministry of Higher Education and Research, in the category "espoir - documentation".
7 New software, platforms, open data
7.1 New software
7.1.1 medkit
- Name: a toolkit for a learning health system
- Keywords: Learning health system, Biomedical data, Decision support, Python, Information extraction, Natural language processing, Audio signal processing, Machine learning
- Functional Description: This library aims at (1) facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features and (2) developing supervised models from these various modalities for decision support in healthcare.
- Release Contributions: Release 0.3.1 fixes a pip install issue found in release 0.3.0.
- URL:
- Contact: Adrien Coulet
- Participants: Deycy Camila Arias Villamil, Olivier Birot, Kim-Tam Huynh, Antoine Neuraz, Ivan Lerner, Bastien Rance, Adrien Coulet
7.1.2 Pythae
- Keywords: Generative Models, Benchmarking, Reproducibility
- Functional Description: This library implements some of the most common (Variational) Autoencoder models under a unified implementation. In particular, it makes it possible to perform benchmark experiments and comparisons by training the models with the same autoencoding neural network architecture. The "make your own autoencoder" feature allows you to train any of these models with your own data and your own Encoder and Decoder neural networks. It integrates experiment monitoring tools such as wandb, mlflow or comet-ml, and allows model sharing and loading from the HuggingFace Hub in a few lines of code.
- URL:
- Contact: Clement Chadebec
7.1.3 Pyraug
- Keywords: Generative Models, Data augmentation
- Functional Description: This library provides a way to perform Data Augmentation using Variational Autoencoders in a reliable way even in challenging contexts such as high dimensional and low sample size data.
- URL:
- Contact: Clement Chadebec
8 New results
The team has generated many results in the last year. Selected illustrations are given below for each axis.
8.1 Axis 1
Neuraz, A., Lerner, I., Birot, O., Arias, C., Han, L., Bonzel, C.L., Cai, T., Huynh K.T., Coulet, A. (2023). TAXN: Translate Align Extract Normalize, a multilingual extraction tool for clinical texts. Studies in health technology and informatics, To appear.
Several studies have shown that about 80% of the medical information in an electronic health record is only available through unstructured data. Resources such as medical terminologies in languages other than English are limited, which restricts NLP tools. We propose here to leverage English-based resources in other languages using a combination of translation, word alignment, entity extraction and term normalization (TAXN). We implement this extraction pipeline in an open-source library called “medkit”. We demonstrate the interest of this approach through a specific use case: enriching a phenotypic dictionary for post-acute sequelae of COVID-19 (PASC). TAXN proved efficient at proposing new synonyms of UMLS terms from a corpus of 70 articles in French, with 356 terms enriched with at least one validated new synonym. This study was based on freely available deep learning models.
Participants: Neuraz A., Lerner I., Birot O., Arias C., Huynh KT., Coulet A.
8.2 Axis 2
We adapted and developed innovative methods based on both machine learning and deep learning for longitudinal health data in high-dimensional settings, in different contexts including cluster detection using geographic data, image classification, cut-point detection of prognostic factors and drug side-effect detection.
Digital Respiratory Healthcare. Edited by Hilary Pinnock, Vitalii Poberezhets and David Drummond. Book | Published in 2023 DOI: 10.1183/2312508X.erm10223 ISBN (electronic): 978-1-84984-173-3
Respiratory care is undergoing a period of major change as it cautiously begins to embrace digital transformation. Catalysed by the need for remote consultation in the pandemic, time-honoured approaches to delivering care are now being challenged by technology-based initiatives. This Monograph deftly guides the reader through the potential benefits and pitfalls of such change, breaking the discussion down into three areas: technological opportunities and regulatory challenges ; social benefits, challenges and implications; exemplars of digital healthcare. Each chapter reviews contemporary literature and considers not ‘if’ but ‘how’ a digital respiratory future can provide optimal care. The result is an authoritative, balanced guide to developing digital respiratory health.
Participants: Drummond D.
Lambert J, Leutenegger AL, Jannot AS, Baudot A. Tracking clusters of patients over time enables extracting information from medico-administrative databases. J Biomed Inform. 2023 Mar;139:104309. doi: 10.1016/j.jbi.2023.104309.
Identifying clusters (i.e., subgroups) of patients from the analysis of medico-administrative databases is particularly important to better understand disease heterogeneity. However, these databases contain different types of longitudinal variables which are measured over different follow-up periods, generating truncated data. It is therefore fundamental to develop clustering approaches that can handle this type of data. We propose here cluster-tracking approaches to identify clusters of patients from truncated longitudinal data contained in medico-administrative databases. We first cluster patients at each age. We then track the identified clusters over ages to construct cluster-trajectories. We compared our novel approaches with three classical longitudinal clustering approaches by calculating the silhouette score. As a use-case, we analyzed antithrombotic drugs used from 2008 to 2018 contained in the Échantillon Généraliste des Bénéficiaires (EGB), a French national cohort. Our cluster-tracking approaches allow us to identify several cluster-trajectories with clinical significance without any imputation of data. The comparison of the silhouette scores obtained with the different approaches highlights the better performances of the cluster-tracking approaches. The cluster-tracking approaches are a novel and efficient alternative to identify patient clusters from medico-administrative databases by taking into account their specificities.
Participants: Lambert J., Jannot AS.
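The two steps described in this abstract (cluster patients at each age, then track clusters over ages) can be illustrated with a minimal sketch. It assumes the per-age clustering has already been done and only shows the tracking step. The linking rule used here (Jaccard overlap of patient sets with a `min_overlap` threshold), the function names and the toy data are all assumptions made for illustration, not the rule used in the published approach.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: for one age, cluster label -> patient identifiers.
Clusters = Dict[str, Set[str]]
Trajectory = List[Tuple[int, str]]  # sequence of (age, cluster label) steps


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_clusters(by_age: Dict[int, Clusters], min_overlap: float = 0.5) -> List[Trajectory]:
    """Chain per-age clusters into cluster-trajectories (assumed linking rule)."""
    if not by_age:
        return []
    ages = sorted(by_age)
    trajectories: List[Trajectory] = [[(ages[0], lab)] for lab in sorted(by_age[ages[0]])]
    for prev_age, age in zip(ages, ages[1:]):
        claimed = set()
        for traj in trajectories:
            last_age, last_label = traj[-1]
            if last_age != prev_age:
                continue  # this trajectory ended at an earlier age
            previous = by_age[prev_age][last_label]
            # Candidate successor: the cluster at the next age sharing the most patients.
            best = max(
                sorted(by_age[age]),
                key=lambda lab: jaccard(previous, by_age[age][lab]),
                default=None,
            )
            if best is not None and jaccard(previous, by_age[age][best]) >= min_overlap:
                traj.append((age, best))
                claimed.add(best)
        # Clusters that continue no existing trajectory start a new one.
        for lab in sorted(by_age[age]):
            if lab not in claimed:
                trajectories.append([(age, lab)])
    return trajectories


# Toy truncated data: p3-p5 are no longer observed at age 62, p6-p7 enter at 62.
example = {
    60: {"A": {"p1", "p2", "p3"}, "B": {"p4", "p5"}},
    61: {"A": {"p1", "p2"}, "B": {"p3", "p4", "p5"}},
    62: {"A": {"p1", "p2"}, "C": {"p6", "p7"}},
}
trajectories = track_clusters(example)
```

Because each age is clustered separately, patients observed only over part of the follow-up contribute to the ages where they are present, so the sketch needs no imputation. Note that with this simple rule two trajectories may link to the same cluster, which corresponds to a merge.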
Korb-Savoldelli V, Tran Y, Perrin G, Touchard J, Pastre J, Borowik A, Schwartz C, Chastel A, Thervet E, Azizi M, Amar L, Kably B, Arnoux A, Sabatier B. Psychometric Properties of a Machine Learning-Based Patient-Reported Outcome Measure on Medication Adherence: Single-Center, Cross-Sectional, Observational Study. J Med Internet Res. 2023.
This study aimed to create and validate a PROM on medication adherence interpreted using an ML approach. This cross-sectional, single-center, observational study was carried out at a French teaching hospital between 2021 and 2022. Eligible patients must have had at least 1 long-term treatment, medication adherence evaluation other than a questionnaire, the ability to read or understand French, an age older than 18 years, and provided their nonopposition. Included adults responded to an initial version of the PROM composed of 11 items, each item being presented using a 4-point Likert scale. The initial set of items was obtained using a Delphi consensus process. Patients were classified as poorly, moderately, or highly adherent based on the results of a medication adherence assessment standard used in the daily practice of each outpatient unit. An ML-derived decision tree was built by combining the medication adherence status and PROM responses. Sensitivity, specificity, positive and negative predictive values (NPVs), and global accuracy of the final 5-item PROM were evaluated. We created an initial 11-item PROM with a 4-point Likert scale using the Delphi process. After item reduction, a decision tree derived from 218 patients including data obtained from the final 5-item PROM allowed patient classification into poorly, moderately, or highly adherent based on item responses. We developed a medication adherence tool based on ML with an excellent NPV. This could allow prioritization processes to avoid referring highly adherent patients to time- and resource-consuming interventions. The decision tree can be easily implemented in computerized prescriber order-entry systems and digital tools in smartphones. External validation of this tool in a study including a larger number of patients with diseases associated with low medication adherence is required to confirm its use in analyzing and assessing the complexity of medication adherence.
Participants: Sabatier B., Perrin G., Arnoux A.
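For readers less familiar with the evaluation metrics named in this abstract, the sketch below shows how sensitivity, specificity, positive and negative predictive values and global accuracy follow from a 2x2 confusion table. It assumes a binary framing (for example, one adherence class against the others), which is an assumption made here for illustration. The function name and the counts are hypothetical and are not results from the study.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion table.

    tp/fn: patients with the condition classified positive/negative.
    tn/fp: patients without the condition classified negative/positive.
    Every margin of the table is assumed to be non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among those with the condition
        "specificity": tn / (tn + fp),  # true negatives among those without it
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts, for illustration only (not data from the study).
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
```

A high NPV means that a negative classification can be trusted, which is the property the abstract relies on to avoid referring highly adherent patients to resource-consuming interventions.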
8.3 Axis 3
Duputel B, Stallard N, Montestruc F, Zohar S, Ursino M. Using dichotomized survival data to construct a prior distribution for a Bayesian seamless Phase II/III clinical trial. Stat Methods Med Res. 2023 May;32(5):963-977. doi: 10.1177/09622802231160554.
Master protocol designs allow for simultaneous comparison of multiple treatments or disease subgroups. Master protocols can also be designed as seamless studies, in which two or more clinical phases are considered within the same trial. They can be divided into two categories: operationally seamless, in which the two phases are separated into two independent studies, and inferentially seamless, in which the interim analysis is considered an adaptation of the study. Bayesian designs are scarcely studied. Our aim is to propose and compare Bayesian operationally seamless Phase II/III designs using a binary endpoint for the first stage and a time-to-event endpoint for the second stage. At the end of Phase II, arm selection is based on posterior (futility) and predictive (selection) probabilities. The results of the first phase are then incorporated into prior distributions of a time-to-event model. Simulation studies showed that Bayesian operationally seamless designs can approach the inferentially seamless counterpart, allowing for an increasing simulated power with respect to the operationally frequentist design.
Participants: Duputel B., Zohar S., Ursino M.
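The posterior futility check on a binary first-stage endpoint can be illustrated with a minimal Beta-Binomial sketch. It is not the design proposed in the paper. The Beta(1, 1) prior, the response-rate threshold of 0.3, the futility cutoff of 0.10 and both function names are assumptions chosen for illustration. Neither the predictive selection step nor the time-to-event second stage is shown.

```python
from math import comb


def posterior_prob_exceeds(responders: int, n_patients: int, threshold: float,
                           prior_a: int = 1, prior_b: int = 1) -> float:
    """P(response rate > threshold | data) under a Beta prior with integer parameters.

    The posterior is Beta(a, b) with a = prior_a + responders and
    b = prior_b + non-responders. For integer a and b, the identity
    P(Beta(a, b) > x) = P(Binomial(a + b - 1, x) <= a - 1) gives an exact value.
    """
    a = prior_a + responders
    b = prior_b + (n_patients - responders)
    n = a + b - 1
    return sum(comb(n, j) * threshold**j * (1 - threshold)**(n - j) for j in range(a))


def stop_for_futility(responders: int, n_patients: int,
                      threshold: float = 0.3, cutoff: float = 0.10) -> bool:
    """Flag an arm as futile when the posterior probability of a response rate
    above `threshold` falls below `cutoff` (both values are hypothetical)."""
    return posterior_prob_exceeds(responders, n_patients, threshold) < cutoff


# Illustrative calls with made-up counts for two arms of 20 patients each.
arm_low = stop_for_futility(responders=2, n_patients=20)    # few responders
arm_high = stop_for_futility(responders=12, n_patients=20)  # many responders
```

With no data, the function returns the prior probability (0.7 for a uniform prior and a threshold of 0.3), which is a convenient sanity check on the identity used.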
Yap C, Rekowski J, Ursino M, Solovyeva O, Patel D, Dimairo M, Weir CJ, Chan AW, Jaki T, Mander A, Evans TRJ, Peck R, Hayward KS, Calvert M, Rantell KR, Lee S, Kightley A, Hopewell S, Ashby D, Garrett-Mayer E, Isaacs J, Golub R, Kholmanskikh O, Richards DP, Boix O, Matcham J, Seymour L, Ivy SP, Marshall LV, Hommais A, Liu R, Tanaka Y, Berlin J, Espinasse A, de Bono J. Enhancing quality and impact of early phase dose-finding clinical trial protocols: SPIRIT Dose-finding Extension (SPIRIT-DEFINE) guidance. BMJ. 2023 Oct 20;383:e076386. doi: 10.1136/bmj-2023-076386. PMID: 37863491.
SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) 2013 provides guidance for clinical trial protocol writing. However, neither the original guidance nor its extensions adequately cover the features of early phase dose-finding trials. The SPIRIT Dose-finding Extension (DEFINE) statement is a new guideline that provides recommendations for essential items that should be provided in the protocols of these trials. It details extensions to the SPIRIT 2013 guidance, incorporating 17 new items and modifying 15 existing items. The purpose of this guideline is to promote transparency, completeness, reproducibility of methods, and interpretation of early phase dose-finding trial protocols. It is envisioned that the resulting improvements in the design and conduct of early phase clinical trials will ultimately reduce research inefficiencies and inconsistencies, driving transformational advances in clinical care.
Participants: Ursino M.
Hua, W., Mei, H., Zohar, S., Giral, M., and Xu, Y. (2022). Personalized dynamic treatment regimes in continuous time: a Bayesian approach for optimizing clinical decisions with timing. Bayesian Analysis, 17(3), 849-878.
We developed a two-step Bayesian approach to optimize clinical decisions, estimating personalized optimal dynamic treatment regimes (DTRs), taking into account intervention timing. In the first step, we build a generative model for a sequence of medical interventions with a marked temporal point process (MTPP) where the mark is the assigned treatment or dosage. Then this clinical action model is embedded into a Bayesian joint framework where the other components model clinical observations including longitudinal medical measurements and time-to-event data conditional on treatment histories. In the second step, we proposed a policy gradient method to learn the personalized optimal clinical decision that maximizes the patient survival by interacting the MTPP with the model on clinical observations while accounting for uncertainties in clinical observations learned from the posterior inference of the Bayesian joint model in the first step.
9 Bilateral contracts and grants with industry
9.1 Bilateral Grants with Industry
- Dassault Systèmes - Cifre: S. Katsahian and A. Guilloux are supervising Aziliz Cottin on the project: Survival dynamic prediction for personalization of cancer patients’ follow-up.
- Exystat - Cifre: S. Zohar and M. Ursino are co-supervising Benjamin Duputel on the project: Platform Phase II/III seamless clinical trials Bayesian designs.
- Pierre Fabre - Cifre: S. Katsahian is supervising Juliette Murris on the project: prediction of large-scale recurrent events in digestive cancers.
- Implicity - Cifre: S. Allassonnière is supervising Louis Vincent on latent modelling of cardiac time series including external information. A special focus is on missing data and heterogeneous populations.
- Sanofi - Cifre: S. Katsahian and A. Guilloux are co-supervising Jean-Baptiste Baitairian on the project: Quantitative Bias Assessment for causal inference.
- Epiconcept - Cifre: S. Katsahian and A. Guilloux are co-supervising Emilien Jemelen on the project: Evaluation of the contribution of the use of artificial intelligence in the French breast cancer screening program.
- Combo - Sanofi: "Evaluating drug combinations in oncology with Real-World Data and state-of-the-art knowledge". Adrien Coulet, Moreno Ursino and Sarah Zohar are part of this project. Please see section 4 for further details.
10 Partnerships and cooperations
10.1 European initiatives
10.1.1 Horizon Europe
INVENTS - Sarah Zohar is the PI of this EU Horizon project. Please see section 4 for further details.
10.1.2 H2020 projects
FAIR - Sarah Zohar is a WP leader in this H2020 project. Please see section 4 for further details.
10.1.3 Other European programs/initiatives
The European taskforce led by EIT Health and the French Ministry of Solidarity and Health for the “harmonization of clinical studies criteria and methodologies in Europe for the evaluation of digital medical devices”. In this taskforce, Sarah Zohar co-leads WP2 on “Evidence in clinical evaluation” with Corinne Collignon (Head of the Digital Mission at HAS, France) and Barbara Höfgen (Head of the Unit DiGA-Fast-Track at BfArM, Germany).
EU MSCA Doctoral Networks - Project ORGESTRA "Organoid technologies for disease modeling, drug discovery and development for rare diseases". Sarah Zohar is the leader of the WP on methodological design for using organoid outcomes and transferring them to the clinic.
10.2 National initiatives
RECaP - Moreno Ursino is the referent of the « Early trials » group. The purpose of RECaP, the national network for Research in Clinical Epidemiology and in Public Health, is to pool original research projects and to produce innovations in clinical epidemiology and public health.
11 Dissemination
11.1 Promoting scientific activities
11.1.1 Journal
Member of the editorial boards
Sarah Zohar is Associate Editor of two scientific journals: "Biometrics" and "Statistics in Biopharmaceutical Research".
Bastien Rance is a member of the editorial board of the "International Journal of Medical Informatics".
Reviewer - reviewing activities
Members of HeKA are regular reviewers for the following journals: Scientific Data, Scientific Reports, Semantic Web Journal, Journal of the Semantic Web, Artificial Intelligence in Medicine, Statistics in Medicine, Biometrics, statistics in medical research, etc.
11.1.2 Leadership within the scientific community
Anita Burgun serves as the Representative of the French medical informatics community at the IMIA (International Medical Informatics Association). She also serves as a member of the Executive Board of the Imagine Institute.
11.1.3 Scientific expertise
Sarah Zohar is a voting member at the Cnedimts (“Commission nationale d’évaluation des dispositifs médicaux et des technologies de santé”) at HAS (“Haute Autorité de Santé”).
Sarah Zohar sat on the review committee for the BpiFrance grant evaluation "évaluation DM à base d’IA et de numérique".
Members of the HeKA team are involved in several ethics committees, such as the Comité d’Ethique de la Recherche APHP.Centre and the EDS AP-HP Comité Scientifique et Ethique. Members of HeKA sit on the scientific board of the ANR generic call Axe H.14 (Interfaces : mathématiques, sciences du numérique – biologie, santé) and of the FC3R (French Center for the 3R).
Stéphanie Allassonnière and Anita Burgun hold the Health Chair at the PRAIRIE institute.
Stéphanie Allassonnière is the vice-president for valorization of Université Paris Cité.
11.2 Teaching - Supervision - Juries
Anne-Sophie Jannot and Sandrine Katsahian lead the speciality “Big Data in Health” in the Master of Science of Public Health at the Paris Cité University.
Anne-Sophie Jannot co-leads the quantitative biomedicine course, which is part of the medical degree course.
Anne-Sophie Jannot leads a professional degree in “health data reuse” at the Paris Cité University in collaboration with Marseille and Bordeaux Universities.
Sandrine Katsahian is responsible for the PRIME department, dedicated to Research, Innovation and Digital Medicine, located at the HEGP in APHP.Centre.
Moreno Ursino co-leads the course “Science des données” in the L2SIAS at the Paris Cité University.
Stéphanie Allassonnière runs a course within the MVA (Mathematics, Vision, Learning) M2 focusing on the analysis of real-life health data.
Stéphanie Allassonnière coordinates the SMPS Bioentrepreneur (Université Paris Cité).
Jean Feydy is involved in the digital health track of the EPITA school of software enginee
Members of the HeKA team are in charge of two majors of the Master in Public Health of the Université Paris Cité: one on Biomedical Informatics (IBM) and one on Data Science in Healthcare (DMS).
11.2.1 Internal or external Inria responsibilities
Stéphanie Allassonnière was the chair of the CRCN-ISFP INRIA Lyon committee.
11.2.2 Interventions
HeKA members are regularly invited to present their research to the general public.
12 Scientific production
12.1 Major publications
- 1 article: Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder. IEEE Transactions on Pattern Analysis and Machine Intelligence, June 2022.
- 2 article: IDNetwork: A deep illness‐death network based on multi‐state event history process for disease prognostication. Statistics in Medicine 41(9), April 2022, 1573-1598.
- 3 article: Bayesian dose‐regimen assessment in early phase oncology incorporating pharmacokinetics and pharmacodynamics. Biometrics, February 2022.
- 4 article: Mining Electronic Health Records for Drugs Associated With 28-day Mortality in COVID-19: Pharmacopoeia-wide Association Study (PharmWAS). JMIR Medical Informatics 10(3), 2022, e35190.
- 5 article: Spatio-temporal mixture process estimation to detect population dynamical changes. Artificial Intelligence in Medicine 126, 2022, 102258.
- 6 inproceedings: Using an ontological representation of chemotherapy toxicities for guiding information extraction and integration from EHRs. Medinfo 2021 - 18th World Congress on Medical and Health Informatics, Virtual conference, Australia, October 2021.
12.2 Publications of the year
International journals
International peer-reviewed conferences
National peer-reviewed Conferences
Scientific books
Scientific book chapters
Edition (books, proceedings, special issue of a journal)
Reports & preprints