Section: New Results

Structural identification of gene regulatory networks

In general, structural identification of gene regulatory networks involves fitting appropriate network structures and parameters to the data. While modern measurement techniques such as reporter gene systems provide data of ever-increasing quality, the problem remains challenging because exhaustively exploring all possible network structures in search of the best-fitting model is computationally prohibitive.

To address the structural identification problem, Eugenio Cinquemani proposed, in collaboration with the Automatic Control Lab at ETH Zürich (Switzerland) and the Computer Engineering & Systems Science Department of the University of Pavia (Italy), an ODE modeling framework based on what we refer to as models with unate-like structure. In Boolean network modeling, unate functions (Boolean functions that are monotone, either increasing or decreasing, in each of their arguments) are argued to capture virtually all observable interactions in gene regulatory networks. In our quantitative framework, unate logics are encoded in the structure of the nonlinear synthesis rates of the network proteins. This framework allows us to integrate a priori information on the most likely network structures, and the models enjoy monotonicity properties that can be exploited to simplify the identification task.
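As an illustration of how a unate logic can be encoded in a synthesis rate, the following minimal sketch uses a product of Hill-type sigmoids for the logic "activator AND NOT repressor". The Hill form, the function names and all parameter values are assumptions made for this example and are not taken from the framework itself; the point is only that the resulting rate is monotone in each regulator concentration.

```python
import numpy as np

def hill_act(x, theta, n):
    """Increasing sigmoid in x (activation); hypothetical choice of shape."""
    return x**n / (x**n + theta**n)

def hill_rep(x, theta, n):
    """Decreasing sigmoid in x (repression); hypothetical choice of shape."""
    return theta**n / (x**n + theta**n)

def synthesis_rate(x_a, x_r, kappa=1.0, theta_a=1.0, theta_r=1.0, n=2):
    """Illustrative rate for the unate logic 'activator AND NOT repressor'."""
    return kappa * hill_act(x_a, theta_a, n) * hill_rep(x_r, theta_r, n)

# Evaluate the rate on a grid of (activator, repressor) concentrations.
grid = np.linspace(0.0, 5.0, 51)
rates = synthesis_rate(grid[:, None], grid[None, :])

# The rate never decreases with the activator and never increases with the repressor.
increasing_in_activator = bool(np.all(np.diff(rates, axis=0) >= 0))
decreasing_in_repressor = bool(np.all(np.diff(rates, axis=1) <= 0))
print(increasing_in_activator, decreasing_in_repressor)
```

Each candidate network structure would correspond to a different sign pattern of this kind, which is what makes monotonicity usable as a test on data.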

As described in previous work published in Bioinformatics in 2010, the key idea is to divide the identification process into two steps. In the first step, the monotonicity properties of the different model structures are exploited to discard those structures whose properties are falsified by the observed data points (time-lapse protein concentrations and synthesis rates). In the second step, the parameters of the remaining model structures are fitted to the data, in search of the simplest structure that explains the data with sufficient accuracy. The procedure was validated on challenging data from the literature.
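The first, falsification step can be sketched as a pairwise check on the data points. The code below is a simplified illustration, not the published procedure: the function name, the tolerance `tol` used to absorb measurement noise, and the toy data are all assumptions. A candidate structure is represented only by its sign pattern (+1 for a regulator in which the rate should be non-decreasing, -1 for one in which it should be non-increasing).

```python
import numpy as np

def falsifies_monotonicity(x, rates, signs, tol=0.0):
    """Return True if some pair of data points contradicts the sign pattern.

    x:     (m, d) regulator concentrations
    rates: (m,)   observed synthesis rates
    signs: (d,)   +1 or -1 per regulator (hypothetical encoding of a structure)
    """
    # Flip the sign of repressors so the rate should be non-decreasing in z.
    z = np.asarray(x, dtype=float) * np.asarray(signs, dtype=float)
    r = np.asarray(rates, dtype=float)
    # dominated[i, j] is True when z[i] <= z[j] in every coordinate.
    dominated = np.all(z[:, None, :] <= z[None, :, :], axis=2)
    # Violation: point j dominates point i, yet its rate is lower by more than tol.
    violation = dominated & (r[:, None] > r[None, :] + tol)
    return bool(violation.any())

# Toy data: the rate grows with the single regulator.
x = np.array([[0.1], [0.5], [1.0], [2.0]])
rates = np.array([0.01, 0.2, 0.5, 0.8])

candidates = {"activation": [+1], "repression": [-1]}
surviving = [name for name, s in candidates.items()
             if not falsifies_monotonicity(x, rates, s)]
print(surviving)
```

Only the structures that survive this check would then go on to the second step, where their parameters are fitted to the data.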

On the methodological side, the identification approach has been further developed within the same international collaboration. For important subclasses of unate models, larger sets of network structures can now be discarded in the hypothesis falsification step, based on properties other than monotonicity (namely quasi-convexity). These improvements were presented at the 18th IFAC World Congress held in Milan, and are reported in a journal paper that has been accepted for publication in the International Journal of Robust and Nonlinear Control, in a special issue on system identification for biological systems. In the framework of the PhD thesis of Diana Stefan, in collaboration with Eugenio Cinquemani, Stephan Lacour and Omaya Dudin, the method is now being applied to experimental data produced within IBIS for the study of the gene network regulating motility of E. coli bacteria.

Woei-Fuh Wang, who in December 2011 defended her PhD thesis, carried out under the supervision of Johannes Geiselmann and Chung-Ming Chen, addressed a different problem in the structural identification of gene regulatory networks. The inference of the network topology and regulatory mechanisms is complicated by the fact that we usually do not know all the relevant genes that need to be taken into account to explain the observed expression patterns. The aim of the thesis was to detect the presence of such "missing genes", as well as their regulatory roles and expression patterns. Inference algorithms were developed using a well-known class of simplified kinetic models, based on power-law approximations of synthesis rate functions. The algorithms rely on factor analysis, a well-established multivariate statistical approach used to uncover unknown, underlying features of a dataset, and on independent component analysis. The proposed method of inferring the expression profile of a missing gene and connecting it to a known network structure has been applied to artificial networks, as well as to a real network studied within IBIS: the acs regulatory network in Escherichia coli.
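The intuition behind detecting a missing gene with power-law models can be illustrated on synthetic data. A power-law rate v = k * x^g * h^c is linear in log space, so fitting each target gene with the known regulator x alone leaves residuals that share a common component driven by the hidden regulator h. The sketch below extracts that component with a plain singular value decomposition, used here as a simple stand-in for the factor analysis and independent component analysis of the thesis; all profiles, exponents and noise levels are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)
log_x = 0.5 * np.sin(t)         # log-concentration of the known regulator (synthetic)
log_h = 0.5 * np.cos(1.7 * t)   # log-concentration of the hidden regulator (synthetic)

# Power-law rates v_i = k_i * x^g_i * h^c_i for three target genes, in log space.
log_k = np.array([0.2, -0.1, 0.4])
g = np.array([1.0, -0.7, 0.5])
c = np.array([0.8, -0.5, 1.2])
log_v = log_k + np.outer(log_x, g) + np.outer(log_h, c)
log_v += 0.02 * rng.standard_normal(log_v.shape)  # small measurement noise

# Fit every target gene using the known regulator only (log-linear least squares).
design = np.column_stack([np.ones_like(t), log_x])
coef, *_ = np.linalg.lstsq(design, log_v, rcond=None)
residuals = log_v - design @ coef

# A dominant component shared by the residuals hints at a missing regulator.
u, s, vt = np.linalg.svd(residuals, full_matrices=False)
explained = s[0] ** 2 / np.sum(s ** 2)   # share of residual variance in one component
profile = u[:, 0] * s[0]                 # inferred profile, up to sign and scale
corr = abs(np.corrcoef(profile, log_h)[0, 1])
print(explained > 0.9, corr > 0.9)
```

In this toy setting the loadings in `vt[0]` are proportional, up to a common sign, to the exponents `c`, which is one way the regulatory role of the inferred gene on each target could be read off.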