This project aims at developing formal methods for understanding the cell machinery and establishing computational paradigms in cell biology. It is based on the vision of

cells as machines,

chemical reaction networks as programs,

and on the use of concepts from computer science to master the complexity of cell biochemical processes.

We contribute to the development of a computational theory of chemical reaction networks (CRNs), by addressing fundamental research issues in computer science on the concepts of analog computation and analog computational complexity in biochemistry, and on the interplay between structure and dynamics in CRNs.

Since 2002, we have been developing a software platform, called the Biochemical Abstract Machine (BIOCHAM), for modeling, analyzing and now synthesizing CRNs, with some unique algorithmic contributions. The reaction rule-based language of BIOCHAM allows us to reason about CRNs at different levels of abstraction in the hierarchy of their stochastic, differential, Boolean and hybrid semantics. Various static analysis methods, most of them based on constraint solving or graph theory, provide useful information before going to simulations and dynamical analyses, for which quantitative temporal logic is used to formalize cell behaviors with imprecise data, and to constrain model building.

A tight integration between dry lab and wet lab efforts is also essential for the success of the project. This is achieved through collaborations with biologists and experimentalists, including partners from the pharmaceutical industry, on concrete biological and biomedical questions.

Because of both the importance of constraint solving and optimization techniques in our approach, and the need for rapid prototyping in our software developments, we keep some research and teaching activity in constraint logic programming, as a general paradigm for computing with partial information systems and solving practical NP-hard problems.

Originally, Feinberg's CRN theory and Thomas's influence network analyses were created to provide sufficient and/or necessary structural conditions for the existence of multiple steady states and oscillations in complex gene networks. Those conditions can be verified by static analyzers without knowing kinetic parameter values or running any simulation.

In this approach, most of our work consists in analyzing the interplay between the structure (Petri net invariants, influence graph, reductions by subgraph epimorphisms)
and the dynamics (Boolean, CTMC, ODE, time scale separations) of CRNs in their different interpretations.
In particular, our study of the influence graphs of reaction systems led to the non-trivial
generalization of Thomas' conditions of multi-stationarity, and of Soulé's proof for influence systems,
to reaction systems.
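As a minimal illustration of the kind of structural condition involved, the sketch below checks Thomas's necessary condition for multi-stationarity, namely the existence of a positive circuit, on a small signed influence graph. It is written in Python for illustration only: the function names and the two toy graphs are hypothetical, and this is not the BIOCHAM implementation.

```python
from typing import Dict, List, Tuple

# Signed influence graph: (source, target) -> +1 (activation) or -1 (inhibition).
Edges = Dict[Tuple[str, str], int]


def simple_circuits(edges: Edges) -> List[List[str]]:
    """Enumerate simple circuits, each listed once from its smallest node."""
    successors: Dict[str, List[str]] = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    circuits: List[List[str]] = []

    def extend(start: str, path: List[str]) -> None:
        for nxt in successors.get(path[-1], []):
            if nxt == start:
                circuits.append(list(path))
            elif nxt > start and nxt not in path:
                # Only visit nodes larger than the start to avoid duplicates.
                extend(start, path + [nxt])

    for start in sorted(successors):
        extend(start, [start])
    return circuits


def circuit_sign(edges: Edges, circuit: List[str]) -> int:
    """Product of the edge signs along a circuit."""
    sign = 1
    for i, source in enumerate(circuit):
        target = circuit[(i + 1) % len(circuit)]
        sign *= edges[(source, target)]
    return sign


def has_positive_circuit(edges: Edges) -> bool:
    """Necessary (not sufficient) structural condition for multi-stationarity."""
    return any(circuit_sign(edges, c) > 0 for c in simple_circuits(edges))


# Hypothetical toy graphs: mutual inhibition versus a negative feedback loop.
toggle_switch: Edges = {("A", "B"): -1, ("B", "A"): -1}
negative_loop: Edges = {("A", "B"): +1, ("B", "A"): -1}
```

The mutual inhibition graph has a positive circuit and may therefore exhibit several steady states, whereas the negative feedback loop fails the condition. The exhaustive enumeration is only suitable for small graphs.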

Furthermore, our original method to infer CRNs from ODEs showed the generality of CRNs, and led us to prove the Turing-completeness of continuous CRNs over a finite set of molecular species: any computable function over the real numbers (i.e. computable to arbitrary precision by a Turing machine) can be computed by a CRN over a finite set of molecular species (Best paper award CMSB 2017, Prix La Recherche 2019). This result opens a whole research avenue on CRN design by compilation of mathematical functions, with innovative applications in synthetic biology.

Our group was among the first ones in 2002 to apply model-checking methods to systems biology
in order to reason on large molecular interaction networks, such as Kohn's map of the mammalian cell cycle (800 reactions over 500 molecules).
The logical paradigm for systems biology that we have subsequently developed for quantitative models can be summarized by the following identifications:

biological model = transition system

dynamical behavior specification = temporal logic formula

model validation = model-checking

model reduction = sub-model-checking

model prediction = formula enumeration

static experiment design = symbolic model-checking

model synthesis = constraint solving

dynamic experiment design = constraint solving

In particular, the definition of a continuous satisfaction degree for first-order temporal logic formulae with constraints over the reals
was the key to generalizing this approach to quantitative models,
opening up the field of model-checking to model optimization.
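To make the idea of a continuous satisfaction degree concrete, here is a minimal sketch for a single formula pattern, F(x >= v) ("x eventually reaches the threshold v"), evaluated on a finite numerical trace. The function names are hypothetical, and the normalisation 1/(1+d) of the distance d to the validity domain is an assumption made for this illustration; the general definition covers arbitrary first-order temporal logic formulae.

```python
from typing import Sequence


def violation_degree(trace: Sequence[float], objective: float) -> float:
    """Distance from the objective threshold to the validity domain of v.

    On a finite trace, F(x >= v) holds exactly for v <= max(trace), so the
    validity domain of the free variable v is the interval (-inf, max(trace)].
    """
    return max(0.0, objective - max(trace))


def satisfaction_degree(trace: Sequence[float], objective: float) -> float:
    """Map the violation degree into (0, 1]; 1.0 means the formula holds."""
    return 1.0 / (1.0 + violation_degree(trace, objective))
```

For a trace peaking at 7, the objective v = 6 is satisfied (degree 1.0), while v = 10 is missed by 3 and gets the degree 0.25. Unlike a Boolean verdict, this value tells an optimizer how far a candidate model is from satisfying the specification.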

This line of research continues with the development of temporal logic constraint patterns with efficient solvers, and their experimentation for fast model building, in partnership with biologists to answer concrete questions in the biomedical domain and the pharmaceutical industry.

Constraint solving and optimization algorithms are important in our research. On the one hand, static analyses of CRNs often involve solving hard combinatorial optimization problems, for which we have shown that constraint logic programming techniques are particularly successful, often beating dedicated algorithms on real-size instances from model repositories.

On the other hand, parameter search problems involve solving hard continuous optimization problems, for which evolutionary algorithms, and especially the covariance matrix adaptation evolution strategy (CMA-ES), have been shown to provide the best results in our context. Constraint-based models and efficient constraint solvers are thus instrumental in our approach for building challenging quantitative models, gaining model-based insights, revisiting admitted assumptions, and contributing to biological knowledge.

Our collaborative work on biological applications is expected to serve as a basis for groundbreaking advances in the understanding of cell functioning, cell monitoring and control, and novel therapy design and optimization. Our collaborations with biologists focus on concrete biological questions, and on building predictive computational models of biological systems to answer them. Furthermore, one important application of our research is the development and distribution of modeling software for computational systems biology and synthetic biology.

Since 2002, we have been developing an open-source software environment for modeling and analyzing biochemical reaction systems. This software, called the Biochemical Abstract Machine (BIOCHAM), is compatible with SBML for importing and exporting models from repositories such as BioModels.
It can perform a variety of static analyses, specify behaviors in Boolean or quantitative temporal logics, search parameter values satisfying temporal constraints, and make various simulations.
While the primary reason for this development effort is to be able to implement our ideas and experiment with them quickly on a large scale, using rapid prototyping techniques based on constraint logic programming,
BIOCHAM is distributed and used by other groups worldwide, for building CRN models, for comparing CRN analysis/synthesis techniques, and for teaching computational systems biology.
A Jupyter BIOCHAM kernel has been created to make it possible to use BIOCHAM on our web server without any installation.
We plan to continue developing BIOCHAM for these different purposes, and to improve both the software quality and the animation of the user community with the recruitment of a research engineer.

Since 2020, we have participated in the , which provides an integrated collection of software tools for the analysis of qualitative models, including CaSQ. This platform encourages the reproducibility of analyses by combining Docker images (reproducible software environment) and Jupyter notebooks (reproducible and shareable workflows).

These two last efforts play a central role in the global project, see Section .

In collaboration with Franck Molina (prix de l'innovation CNRS 2020), Lab. Sys2Diag ALCEN-CNRS, Montpellier, and Jie-Hong Jiang, NTU, Taiwan, we aim at applying our original CRN design pipeline to the implementation of high-level functions with enzymatic reactions in DNA-free non-living vesicles. The targeted medical applications concern the design of biosensors for medical diagnosis, and the design of artificial red blood protocells (Inria action AEx GRAM).

Our approach is based on analog computation theory, and on our capability to compile mathematical functions and controllers into DNA-free, RNA-free biochemical reactions with abstract proteins. The concrete realizations are tested using microfluidic devices at the Sys2Diag Lab. in order to precisely control both the size of the vesicles and the concentrations of the injected chemical compounds. It is worth noting that the choice of a non-living chassis, in contrast to the living cells widely used in synthetic biology, is particularly appealing for many innovative applications involving security considerations and compliance with EU regulation.

The unique features of BIOCHAM have been used in several success stories in the analysis, or now synthesis, of cell signaling networks. This was for instance the case for the deciphering of the complex dynamics of G protein-coupled receptors, using original parameter optimization techniques with temporal logic constraints.

More recently, using our pipeline for compiling mathematical functions into CRNs, we could generate synthetic analogs of the natural and ubiquitous MAPK cell signaling network, by compilation of similar sigmoid input/output functions.

Recent advances in cancer chronotherapies support the evidence that there exist important links between the cell cycle and the circadian clock genes. One purpose for modeling these links is to better understand how to efficiently target malignant cells depending on the phase of the day and patient characteristics. These questions have been at the heart of a series of collaborative projects with Franck Delaunay (CNRS Nice) and Francis Lévi (INSERM Hopital Paul Brousse, Villejuif) and now Annabelle Ballesta (INSERM, Institut Curie Paris), with active participation in ANR, EU EraNet Sysbio C5Sys and FP6 TEMPO projects.

In the past, we developed a coupled model of the Cell Cycle, Circadian Clock, DNA Repair System, Irinotecan Metabolism and Exposure Control under Temporal Logic Constraints, and a bidirectional coupled model of the cell cycle and the circadian clock to explain unexpected observations in single-cell experiments in fibroblasts.

This year, several new results were obtained in the thesis of Julien Martinelli, see Section .

In synthetic biology, our approach based on analog computation targets enzymatic reactions with proteins in artificial vesicles, which are thus DNA-free and RNA-free.

Our multidisciplinary research rooted in fundamental computer science aims at contributing to biology and medicine by going quite far in the applications.

We present a polynomialization algorithm of quadratic time complexity to transform a system of elementary differential equations into polynomial ordinary differential equations (PODE). This algorithm is used as a front-end transformation in the compilation pipeline of BIOCHAM, to compile any elementary mathematical function, either of time or of some input species, into a finite CRN (see Fig. ). For instance, the CRN synthesized for the Hill sigmoid function of order 5 provides a synthetic analog of the ubiquitous MAPK signalling CRN.

This work has been presented at CMSB 2021, CASC 2021 and in an invited talk given at SIAM AG 2021. It is a follow-up of our previous results on the Turing-completeness of continuous CRNs over a finite set of molecular species, and on the complexity of the PODE quadratization problem.

The disease map of Covid-19, mentioned in last year's highlights, has been published this year in Mol. Sys. Biol.

The Turing completeness result for continuous chemical reaction networks (CRNs) shows that any computable function over the real numbers can be computed by a CRN over a finite set of formal molecular species using at most bimolecular reactions with mass action law kinetics. The proof uses a previous result of Turing completeness for functions defined by polynomial ordinary differential equations (PODE), the dual-rail encoding of real variables by the difference in concentration between two molecular species, and a back-end quadratization transformation to restrict to elementary reactions with at most two reactants. The presentation of a mathematical function as the solution of a PODE is however not always evident.
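The dual-rail encoding can be illustrated on the cosine function, which is the solution of the PODE x' = -y, y' = x with x(0) = 1, y(0) = 0. The sketch below is a hand-written example under stated assumptions, not the output of BIOCHAM: each real variable is split into two non-negative concentrations (x = xp - xm, y = yp - ym), positive and negative monomials become catalytic syntheses of the corresponding species, and annihilation reactions with a hypothetical rate constant k keep both concentrations bounded. The species names, the value of k and the Runge-Kutta integrator are choices made for this illustration.

```python
import math
from typing import Tuple

State = Tuple[float, float, float, float]  # concentrations (xp, xm, yp, ym)


def derivative(state: State, k: float) -> State:
    """Mass action ODEs of the dual-rail CRN:

    ym -> ym + xp,  yp -> yp + xm   (from x' = -y)
    xp -> xp + yp,  xm -> xm + ym   (from y' = x)
    xp + xm -> _,   yp + ym -> _    (annihilation, rate constant k)
    """
    xp, xm, yp, ym = state
    return (
        ym - k * xp * xm,
        yp - k * xp * xm,
        xp - k * yp * ym,
        xm - k * yp * ym,
    )


def rk4_step(state: State, dt: float, k: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    def shifted(slope: State, h: float) -> State:
        return tuple(s + h * d for s, d in zip(state, slope))

    k1 = derivative(state, k)
    k2 = derivative(shifted(k1, dt / 2), k)
    k3 = derivative(shifted(k2, dt / 2), k)
    k4 = derivative(shifted(k3, dt), k)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def dual_rail_cosine(t_end: float, dt: float = 1e-3, k: float = 10.0) -> float:
    """Simulate the CRN from xp = 1 and read the result as xp - xm."""
    state: State = (1.0, 0.0, 0.0, 0.0)
    for _ in range(round(t_end / dt)):
        state = rk4_step(state, dt, k)
    xp, xm, _, _ = state
    return xp - xm
```

The annihilation terms cancel in the differences xp - xm and yp - ym, so the difference follows the original PODE, and `dual_rail_cosine(1.0)` can be compared with `math.cos(1.0)`. All reactions have at most two reactants, as required by the result.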

We present a polynomialization algorithm of quadratic time complexity to transform a system of elementary differential equations into a PODE. This algorithm is used as a front-end transformation to compile any elementary mathematical function, either of time or of some input species, into a finite CRN (see Fig. ). We illustrate the performance of our compiler on a benchmark of elementary functions relevant to CRN design problems in synthetic biology specified by mathematical functions. In particular, the abstract CRN obtained by compilation of the Hill function of order 5 is compared to the natural ubiquitous CRN structure of MAPK signalling networks.
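As a worked example of what presenting a function as the solution of a PODE means, consider the Hill function of order n with unit half-saturation constant, H(x) = x^n / (1 + x^n). Since 1 - H = 1 / (1 + x^n), its derivative n x^(n-1) / (1 + x^n)^2 can be rewritten as n x^(n-1) (1 - H)^2, which is polynomial in x and H. The sketch below checks this hand-derived PODE numerically; it is not claimed to be the system produced by the polynomialization algorithm, and the function names and integrator are assumptions made for this illustration.

```python
from typing import Callable, Tuple

State = Tuple[float, float]  # (x, h): the input variable and the Hill value


def hill_pode(state: State, n: int) -> State:
    """Polynomial right-hand side: x' = 1, h' = n * x^(n-1) * (1 - h)^2."""
    x, h = state
    return (1.0, n * x ** (n - 1) * (1.0 - h) ** 2)


def rk4_step(f: Callable[[State], State], state: State, dt: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    def shifted(slope: State, step: float) -> State:
        return tuple(s + step * d for s, d in zip(state, slope))

    k1 = f(state)
    k2 = f(shifted(k1, dt / 2))
    k3 = f(shifted(k2, dt / 2))
    k4 = f(shifted(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def hill_by_integration(x_end: float, n: int = 5, dt: float = 1e-3) -> float:
    """Integrate the PODE from x = 0, h = 0 up to x_end and return h."""
    state: State = (0.0, 0.0)
    for _ in range(round(x_end / dt)):
        state = rk4_step(lambda s: hill_pode(s, n), state, dt)
    return state[1]


def hill_closed_form(x: float, n: int = 5) -> float:
    """Reference value computed directly from the rational expression."""
    return x ** n / (1.0 + x ** n)
```

Comparing `hill_by_integration(2.0)` with `hill_closed_form(2.0)` (that is, 32/33) confirms that the polynomial system has the Hill function as its solution.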

With the automation of biological experiments and the increase of quality of single cell data that can now be obtained by phosphoproteomic and time lapse videomicroscopy, automating the building of mechanistic models from these data time series becomes conceivable and a necessity for many new applications. While learning numerical parameters to fit a given model structure to observed data is now a quite well understood subject, learning the structure of the model is a more challenging problem that previous attempts failed to solve without relying quite heavily on prior knowledge about that structure.

In a paper in preparation, we present an unsupervised statistical learning algorithm, based on heuristic search, to infer reaction rules. Inferring one reaction takes time linear in the number of observed transitions in the traces, and quadratic in the number of observed molecular species.

In the thesis of Marine Collery at IBM France, in the domain of fraud detection from bank transaction traces, we have evaluated a somewhat similar rule inference algorithm.

We investigated at the molecular scale the influence of systemic regulators (e.g. temperature, hormones) on peripheral clocks, through a model learning approach involving systems biology models based on ordinary differential equations. To that end, we rely on prior knowledge encompassed in a circadian clock model that was also the subject of a publication. The latter allowed us to derive an approximation for the action of systemic regulators on the expression of three core-clock genes: Bmal1, Per2 and Rev-Erbα. The analysis pointed to a regulation of Bmal1 or Per2 transcription, most likely by temperature or nutrient exposure cycles. This agreed with biological knowledge on temperature-dependent control of Per2 transcription. The strengths of systemic regulations were found to be significantly different according to mouse sex and genetic background.

In the framework of the ANR-DFG SYMBIONT project, we investigate the mathematical justification of SEPI (subgraph epimorphism) reductions based on Tikhonov's theorem, and their computation using tropical algebra methods and constraint programming techniques.

This year we contributed to deliverable D2.1 on performance evaluation comparing polyhedral, SAT modulo theories (Z3) and constraint programming (BIOCHAM) methods.

In collaboration with colleagues at FU Berlin and at Université de Nice, we study extended Boolean models with intermediate variables representing unknown delays on the interactions. This approach eliminates implicit kinetic assumptions in the classical updating modes of these models, not only to recover relevant dynamical behaviours, but also to provide a framework for reasoning on kinetic constraints which are sufficient to enable these behaviours.

This year, preliminary results have been presented in two public talks and the results have been generalized. An article is in preparation.

Molecular interaction maps have emerged as a meaningful way of representing biological mechanisms in a comprehensive and systematic manner. However, their static nature provides limited insights into the emerging behavior of the described biological system under different conditions. Computational modelling provides the means to study dynamic properties through in silico simulations and perturbations. Last year we described how to bridge the gap between static and dynamic representations of biological systems with CaSQ, a software tool that infers Boolean rules based on the topology and semantics of molecular interaction maps built with CellDesigner (see Section ). We developed CaSQ by defining conversion rules and logical formulas for inferred Boolean models according to the topology and the annotations of the starting molecular interaction maps. We used CaSQ to produce executable files of existing molecular maps that differ in size, complexity and the use of SBGN standards. We also compared, where possible, the manually built logical models corresponding to a molecular map to the ones inferred by CaSQ. The tool is able to process large and complex maps built with CellDesigner (either following SBGN standards or not) and produce Boolean models in a standard output format, SBML-qual, that can be further analyzed using popular modelling tools. References, annotations and layout of the CellDesigner molecular map are retained in the obtained model, facilitating interoperability and model reusability. This year CaSQ was used in the context of the Covid-19 Disease Map project as a fundamental part of the pipeline.
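To give a feel for what inferring a Boolean rule from map topology can look like, the sketch below builds a logical formula for a species from the reactions that produce it. The rule scheme used here is an assumption made for illustration (a reaction is active when all its reactants are present, at least one catalyst is present if any is drawn, and no inhibitor is present; a species is active when at least one producing reaction is active). The function names and the textual syntax are hypothetical and are not the CaSQ API; the tool's own documentation defines its actual conversion rules.

```python
from typing import Dict, List, Sequence


def reaction_formula(
    reactants: Sequence[str],
    catalysts: Sequence[str] = (),
    inhibitors: Sequence[str] = (),
) -> str:
    """Conjunction of reactants, one-of catalysts and negated inhibitors."""
    parts: List[str] = list(reactants)
    if len(catalysts) == 1:
        parts.append(catalysts[0])
    elif len(catalysts) > 1:
        parts.append("(" + " | ".join(catalysts) + ")")
    parts.extend("!" + name for name in inhibitors)
    return " & ".join(parts)


def species_rule(producing_reactions: Sequence[Dict[str, Sequence[str]]]) -> str:
    """Disjunction over all the reactions that produce the species."""
    formulas = [reaction_formula(**reaction) for reaction in producing_reactions]
    if len(formulas) == 1:
        return formulas[0]
    return " | ".join("(" + formula + ")" for formula in formulas)
```

For a hypothetical species produced by one reaction with reactants A and B, catalysts E1 and E2 and inhibitor I, `species_rule` returns `A & B & (E1 | E2) & !I`.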

The fast accumulation of biological data calls for their integration, analysis and exploitation through more systematic approaches. The generation of novel, relevant hypotheses from this enormous quantity of data remains challenging. Logical models have long been used to answer a variety of questions regarding the dynamical behaviours of regulatory networks. As the number of published logical models increases, there is a pressing need for systematic model annotation, referencing and curation in community-supported and standardised formats. We summarise the key topics and future directions of a meeting entitled ‘Annotation and curation of computational models in biology’, organised as part of the 2019 [BC]2 conference. The purpose of the meeting was to develop and drive forward a plan towards the standardised annotation of logical models, and to review and connect various ongoing projects of experts from different communities involved in the modelling and annotation of molecular biological entities, interactions, pathways and models. This article defines a roadmap towards the annotation and curation of logical models, including milestones for best practices and minimum standard requirements. In September 2021, a second workshop took place in Basel under the auspices of the [BC]2 conference, to discuss the advancements in the logical modelling community and exchange with other modelling communities of systems biology toward a common framework for annotated, accessible, reproducible and interoperable computational models in biology. Sylvain Soliman and Anna Niarakis were co-organizers and a proceedings paper is in preparation.

In this work, we make use of the type I IFN graphical model developed during the COVID19 Disease Map project and available in the C19DM repository, and of the map-to-model translation framework (see Section ), to obtain an executable, dynamic model of type I IFN signalling for in silico experimentation. Our goal is to understand how we can maximize the antiviral immune response (production of ISGs), limit inflammation, and amplify the inhibition of viral replication through the production of the IFNs. Starting from the CellDesigner map of type I interferon signalling, we use the tool CaSQ to obtain an SBML qual file with preliminary Boolean rules. To cope with specificities of the COVID19 Disease Map repository, CaSQ was adapted (version 1.0.0) to include: a) automatic merging of identical SBML species that can appear for visual purposes on the XML CellDesigner graph, b) the active suffix in the species name when the dotted circle was used to denote activation of a certain species. The SBML qual file was processed to ensure compatibility regarding name display in GINsim, and control nodes were added in an attempt to reduce the number of inputs (from 37 to 7) and ease the computational cost. Given the size of the model (number of nodes and edges), an exhaustive attractor search would not be feasible. Therefore, we searched for attractors for a given set of initial conditions that cover different biological scenarios of the type I IFN pathway with or without the infection, and in the presence or absence of drugs. For the analysis we used the CoLoMoTo notebook implementation. Besides the dynamical analysis of the model, which is based on literature mining, we are also using the model to assess the impact of target nodes prioritized by omic data analysis and drug target enrichment, performing single and combined perturbations. An article is in preparation.
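Searching for the attractor reached from a given initial condition, rather than enumerating all attractors, can be sketched as follows. The example assumes a synchronous updating scheme and a deliberately tiny, hypothetical four-node network (virus, IFN, ISG, replication) that only mimics the flavour of the scenario; it is neither the type I IFN model nor the CoLoMoTo tooling used for the analysis.

```python
from typing import Callable, Dict, List

BoolState = Dict[str, bool]
Rules = Dict[str, Callable[[BoolState], bool]]

# Hypothetical toy rules: the virus is an input that keeps its value, it
# induces IFN, IFN induces ISGs, and ISGs block viral replication.
toy_rules: Rules = {
    "virus": lambda s: s["virus"],
    "IFN": lambda s: s["virus"],
    "ISG": lambda s: s["IFN"],
    "replication": lambda s: s["virus"] and not s["ISG"],
}


def find_attractor(rules: Rules, initial: BoolState) -> List[BoolState]:
    """Iterate synchronous updates until a state repeats; return the cycle."""
    names = sorted(rules)
    state = tuple(initial[name] for name in names)
    first_seen: Dict[tuple, int] = {}
    trajectory: List[tuple] = []
    while state not in first_seen:
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        current = dict(zip(names, state))
        # All nodes are updated at once from the same current state.
        state = tuple(rules[name](current) for name in names)
    cycle = trajectory[first_seen[state]:]
    return [dict(zip(names, s)) for s in cycle]


infected: BoolState = {"virus": True, "IFN": False, "ISG": False, "replication": False}
attractor = find_attractor(toy_rules, infected)
```

Starting from the infected state, the trajectory settles in a single fixed point where IFN and ISG are active and replication is off. A returned list with more than one state would denote a cyclic attractor. The cost is bounded by the length of the trajectory, not by the size of the full state space.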

Fibroblasts are critical regulators of several physiological processes linked to extracellular matrix regulation. Under certain conditions, fibroblasts can also transform into more aggressive phenotypes and contribute to disease pathophysiology. In this work, we highlight metabolic reprogramming as a critical event in the transition of fibroblasts from quiescent to activated and aggressive cells, in rheumatoid arthritis and cancer. We draw parallels and discuss how systems biology approaches and computational modeling could be employed to highlight targets of metabolic reprogramming and support the discovery of new lines of therapy. The objectives of this project are: a) First, to study similarities in the molecular and signalling pathways of rheumatoid arthritis synovial fibroblasts (RASFs) and cancer-associated fibroblasts (CAFs), which are involved in the regulation of glucose metabolism. b) Second, to create dynamic models that could be used for in silico perturbations in order to study the cells’ response to different conditions such as inflammation, hypoxia, and elevated ROS levels, to name a few. Dynamical modelling is indispensable if we want to understand the emergent behaviour of the cells when complex and intertwined pathways are involved. c) Lastly, to identify and propose a possible mechanism (common or not) that could explain the fibroblasts’ metabolic reprogramming regarding glucose in disease-specific conditions. An in-depth review focusing on RASFs and CAFs was recently published in Cancers.

Rheumatoid arthritis fibroblast-like synoviocytes (RA FLS) are central players in the disease pathogenesis, as they are involved in the secretion of cytokines and proteolytic enzymes, and exhibit invasive traits, a high rate of self-proliferation and an apoptosis-resistant phenotype. Using the RA map and the tool CaSQ, we build a large-scale dynamic model of RA FLS to study apoptosis, cell proliferation and growth, osteoclastogenesis and bone erosion, matrix degradation and cartilage destruction, and inflammation outcomes. The RA FLS network was created by selecting fibroblast-relevant sub-parts of the state-of-the-art RA map, and the Boolean rules were added using the tool CaSQ, which infers logical formulae based on the topology and the semantics of the network (see Section ). To cope with complexity, we also employ a “divide and conquer” strategy by creating separate executable sub-modules that each comprise only one phenotype. We also present reduced model versions that facilitate downstream analysis. Systematic testing of different initial conditions could further lead to predictions regarding the outcomes of specific perturbations, such as single or combined effects, simulated with the model by systematically forcing or suppressing the activity of various factors of interest. The goal is to gain a better understanding of the mechanisms that drive inflammation, resistance to apoptosis, high proliferation rate, and cartilage and bone degradation, and of their coordination, in order to delineate and gain control of these outcomes (paper in preparation).

Spleen tyrosine kinase (SYK) can behave as an oncogene or a tumor suppressor, depending on the cell and tissue type. As pharmacological SYK inhibitors are currently evaluated in clinical trials, it is important to gain more information on the molecular mechanisms underpinning these opposite roles. We reconstructed and compared data-enriched signaling networks using phosphoproteomic data from breast cancer and Burkitt lymphoma cell lines where SYK plays opposite roles. Our analysis shows that in breast cancer cells, the SYK target-enriched signaling pathways included intercellular adhesion and Hippo signaling components that are often linked to tumor suppression. In Burkitt lymphoma cells, the SYK target-enriched signaling pathways included molecules that could play a role in the pro-oncogenic function of SYK in B-cell lymphomas. These data demonstrate that proteomic profiling combined with mathematical network modeling allows untangling complex pathway interplays and revealing difficult-to-discern interactions among the SYK pathways that positively and negatively affect tumor formation and progression.

Microtubules and their post-translational modifications are involved in major cellular processes such as mitosis, cardiomyocyte contraction, and neuronal differentiation. More precisely, in neurons, the post-translational modifications of detyrosination and tyrosination are crucial for neuronal plasticity, axon regeneration, recruitment and transport of proteins, and correct neuronal wiring. We hypothesize that the decrease in density and length of microtubules and the loss of neuronal structures such as synapses, dendritic spines and growth cones, which are correlated with progressive cognitive decline, may be the consequence of the dysregulation of the detyrosination/tyrosination cycle in neurodegenerative disorders. This hypothesis is investigated in collaboration with Servier by combining experimental approaches with mathematical modelling in a paper currently under review.

Skin protects the body against external agents, for instance pathogens, irritants, or UV radiation, that can trigger inflammation. Inflammation is a complex phenomenon that is classified in two main types, acute and chronic. They are distinguished by different parameters such as the duration, the underlying mechanisms, the components involved, like the type of immune cells, and the nature and intensity of the associated clinical signs. The computational models developed in collaboration with Johnson&Johnson France combine mathematical and multi-agent modelling using the BIOCHAM and EPISIM modelling tools.

We study the effect of pH on skin inflammation using a multi-agent model.

In the framework of the Cifre PhD thesis of Jeremy Grignard at Servier, we work on the coupling between computational modeling and biological experiment design, and on chemical reaction network inference methods from data time series.

In the framework of the Cifre PhD thesis of Eléa Greugny at Johnson&Johnson Santé Beauté France, we work on the computational modeling of inflammatory processes in the skin, using multi-scale modeling and multi-agent simulation.

In the framework of the PhD thesis of Marine Collery at IBM France, we work on rule learning from time series data in the context of learning fraud detection rules in bank transactions.

