The research of the Dream team aims at supporting the monitoring and diagnosis of time-evolving systems. The main goal is to help the person in charge of the system by analyzing the observations provided by sensors and by providing information about diagnosis hypotheses (potential anomalies or failures) and recommended actions. Qualitative model-based approaches are advocated for at least two main reasons:
they are "white-box" approaches and consequently diagnoses and recommended actions can be explained to the user in an explicit and adequate language,
they are flexible and therefore well suited to quickly evolving systems such as technological systems (for instance, telecommunication components).
We use a model-based approach relying on normal and faulty behavioral models. These models are discrete-event models such as (temporal) communicating automata, temporal causal graphs or chronicles.
In this context, two main research themes are developed:
Classical model-based diagnosis methodologies cannot be directly used for complex systems because of the intractable size of the model and the computational complexity of the process. This is especially true when on-line diagnosis is considered. Two solutions are investigated:
We propose a decentralized approach which relies on combining local diagnoses built from local models (or local diagnosers). Three problems are currently investigated: Which strategy should be used for an optimal merge of the local diagnoses in order to preserve the efficiency and the completeness of the process? How can the incrementality of the process be ensured in an on-line diagnosis context where observations are collected incrementally? How can reconfigurable systems, whose topology can change at run time, be dealt with?
We propose to use model-checking techniques in order to improve the efficiency of the computation and to limit the combinatorial state explosion. This means using adequate symbolic representations, such as binary decision diagrams (BDDs), and partial-order reduction techniques that take advantage of the inherent concurrency of the system.
It is well recognized that model-based approaches suffer from the difficulty of acquiring the model. This is why we focus on automatically acquiring models from data with symbolic learning methods coupled with data mining methods. One of the challenges we tackle is to extend existing inductive logic programming (ILP) methods to temporal data in order to deal with data coming from signals (such as electrocardiograms in the medical domain) or alarm logs (in the telecommunication domain). Two problems are currently investigated: How can the learning process be adapted to deal with multiple sources of information (multi-sensor learning)? How can signal processing algorithms be integrated into the learning or diagnosis task when the latter relies on a qualitative description of signals?
Our application domains are the following:
industrial applications with a focus on telecommunication networks;
medical applications and especially cardiac monitoring, i.e., the on-line analysis of cardiac signals to detect arrhythmias and the development of "intelligent" cardiac devices (pacemakers and defibrillators) having some signal analysis and diagnosis capabilities;
environmental protection, and more precisely the development of decision support systems to help the management of agricultural plots with the objective of preserving water quality threatened by pesticide pollution.
Our work on monitoring and diagnosis relies on model-based approaches developed by the Artificial Intelligence community since the founding studies by R. Reiter and J. de Kleer. Our project investigates the on-line monitoring and diagnosis of systems modeled as discrete-event systems, focusing more precisely on monitoring by alarm management. Computational efficiency is a crucial issue for real-size problems. We are developing two approaches. The first one relies on diagnoser techniques, for which we have proposed a decentralized and generic approach. The second one uses chronicle recognition techniques, focusing on learning chronicles.
Early work on model-based diagnosis dates back to the 1970s and 1980s, with R. Reiter, who wrote the reference paper on the logical theory of diagnosis. The community known as DX, named after the workshop on the principles of diagnosis, was formed in the same years. Research in these areas is still very active and the DX workshop gathers about fifty people in the field every year. Contrary to the expert system approach, which was the leading approach for diagnosis (medical diagnosis for instance) before 1990, the model-based approach relies on a deep model representing the expected correct behavior of the system to be supervised, or on a fault model. Instead of acquiring and representing expertise from experts, the model-based approach uses the design models of industrial systems. The approach was initially developed for electronic circuit repair, focusing on off-line diagnosis of so-called static systems. Two main approaches were then proposed: (i) the consistency-based approach, relying on a model of the expected correct behavior, which aims at detecting the components responsible for a difference between the expected observations and the actual ones; (ii) the abductive approach, which relies on a model of the failures that can affect the system, and which identifies the failures or the faulty behavior explaining the anomalous observations. See the references for a detailed exposition of these investigations.
Since 1990, researchers in the field have studied the monitoring and diagnosis of dynamic systems, which has brought them closer to researchers in control theory. What characterizes the AI approach is the use of qualitative models instead of quantitative ones and the importance given to the search for the real origin of the faulty behavior. Model-based diagnosis approaches rely on qualitative simulation or on causal graphs in order to look for the causes of the observed deviations. The links between the two communities have been strengthened, in particular concerning the work on discrete-event systems and hybrid systems. The formalisms used are often similar (automata, Petri nets, etc.).
Our team focuses on the monitoring and on-line diagnosis of discrete-event systems, and in particular on monitoring by alarm management. In this context, a human operator is generally in charge of monitoring the system and receives events (the alarms) which are time-stamped and emitted by the components themselves, in reaction to external events. These observations of the system are discrete pieces of information, corresponding to an instantaneous event or to a property associated with a time interval. The main difficulties in analyzing this flow of alarms are the following:
the huge number of received alarms: the supervisor may receive up to several hundred messages per second, many of which are insignificant,
the alarm overlapping: the order in which alarms are received may be different from the order in which alarms were emitted. Moreover, various sequences of alarms resulting from concurrent failures may overlap. The propagation delays, and sometimes the ways the alarms are transmitted, must be taken into account, not only for event reordering, but also to decide at what time all the useful messages can be considered as received,
the redundancy of received alarms: some alarms are only routine consequences of other alarms. This can provoke a phenomenon known as cascading alarms,
the alarm loss or alarm masking: some alarms can be lost or masked from the supervisor when an intermediate component in charge of the transmission is faulty. The absence of an alarm must be taken into account, since it can give useful information about the state of the system.
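The reordering issue raised in the list above can be illustrated by a minimal sketch. It assumes that every alarm carries its emission timestamp and that an upper bound `max_delay` on the propagation delay is known; the class name `ReorderBuffer`, the alarm labels and the delay value are hypothetical and only serve the illustration. An alarm is released only once no alarm emitted earlier can still arrive.

```python
import heapq
from typing import List, Tuple

Alarm = Tuple[float, str]  # (emission timestamp, alarm label)


class ReorderBuffer:
    """Releases alarms in emission order once no earlier alarm can still arrive."""

    def __init__(self, max_delay: float) -> None:
        # Assumed upper bound on the propagation delay of any alarm.
        self.max_delay = max_delay
        self._heap: List[Alarm] = []

    def receive(self, alarm: Alarm) -> None:
        """Store an alarm in whatever order it reaches the supervisor."""
        heapq.heappush(self._heap, alarm)

    def release(self, now: float) -> List[Alarm]:
        """Return, in emission order, the alarms emitted at or before now - max_delay."""
        horizon = now - self.max_delay
        released = []
        while self._heap and self._heap[0][0] <= horizon:
            released.append(heapq.heappop(self._heap))
        return released


buf = ReorderBuffer(max_delay=2.0)
for alarm in [(3.0, "link_down"), (1.0, "power_loss"), (2.5, "reroute")]:
    buf.receive(alarm)  # received out of emission order

first = buf.release(now=4.5)   # horizon 2.5: the alarm emitted at 3.0 is held back
second = buf.release(now=5.0)  # horizon 3.0: the last alarm can now be released
```

The bound on the delay is what allows the supervisor to decide when all useful messages for a given period have been received; without such a bound, the buffer could never safely release anything.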
There are two cases, which raise very different issues. In the first one, the alarms must be dealt with on-line by the operator. In this case, alarm analysis must be done in real time. The operator must react within a very short period of time to keep the system working as well as possible in spite of the variability of the inputs and the natural evolution of the processes. Consequently, the natural degradation of the system (component wear, slow modification of component properties, etc.) is not directly taken into account but is corrected by tuning some parameters.
This reactive treatment contrasts with the treatment of maintenance alarms. In this second case, a deeper off-line analysis of the system is performed, by anticipating possible difficulties and by planning maintenance operations in order to significantly reduce the failures and interruptions of the system.
Most of our work focuses on aiding on-line monitoring, and it is assumed that the correct behavior model or the fault models of the supervised systems are available. However, an on-line use of the models is rarely possible because of their complexity with respect to real-time constraints. This is especially true when temporal models are concerned. A way to tackle this problem is to make an off-line transformation (or compilation) of the models and to extract, in an adapted way, the elements useful for diagnosis.
We study two different methods:
In the first method, the automaton used as a model is transformed off-line into an automaton adapted to diagnosis. This automaton is called a diagnoser. The transitions of the automaton are only triggered by observable events and the states contain only information on the failures that happened in the system. Diagnosing the system consists in going through all the different states of the diagnoser as observable events become available. This method has been proposed by M. Sampath and colleagues . We have extended this method to the communicating automata formalism (see also ). We have also developed a more generic method which takes advantage of the symmetries in the architecture of the system .
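The diagnoser principle can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction of M. Sampath and colleagues nor our own implementation: the toy model `TRANSITIONS`, the event names and the function names are hypothetical, and for brevity the diagnoser states are computed on the fly, whereas an off-line compilation would enumerate them ahead of time using the same `step` function. Each diagnoser state is a set of pairs (model state, set of faults that have occurred).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy model: (state, event) -> possible next states.
TRANSITIONS: Dict[Tuple[str, str], List[str]] = {
    ("ok", "f1"): ["broken"],        # f1 is an unobservable fault event
    ("ok", "ping"): ["ok"],
    ("broken", "ping"): ["broken"],
    ("broken", "alarm"): ["broken"],
}
OBSERVABLE = {"ping", "alarm"}
FAULTS = {"f1"}

Pair = Tuple[str, FrozenSet[str]]  # (model state, faults that have occurred)


def closure(pairs: Set[Pair]) -> FrozenSet[Pair]:
    """Extend a set of pairs along unobservable transitions, recording faults."""
    seen = set(pairs)
    stack = list(pairs)
    while stack:
        state, faults = stack.pop()
        for (src, event), targets in TRANSITIONS.items():
            if src != state or event in OBSERVABLE:
                continue
            new_faults = faults | {event} if event in FAULTS else faults
            for target in targets:
                pair = (target, frozenset(new_faults))
                if pair not in seen:
                    seen.add(pair)
                    stack.append(pair)
    return frozenset(seen)


def step(belief: FrozenSet[Pair], observation: str) -> FrozenSet[Pair]:
    """Diagnoser transition: triggered by one observable event only."""
    moved = set()
    for state, faults in belief:
        for target in TRANSITIONS.get((state, observation), []):
            moved.add((target, faults))
    return closure(moved)


def diagnose(initial_state: str, observations: List[str]) -> Set[FrozenSet[str]]:
    """Return the candidate fault sets consistent with the observations."""
    belief = closure({(initial_state, frozenset())})
    for observation in observations:
        belief = step(belief, observation)
    return {faults for _, faults in belief}


ambiguous = diagnose("ok", ["ping"])          # fault f1 may or may not have occurred
certain = diagnose("ok", ["ping", "alarm"])   # only the faulty behavior explains "alarm"
```

In this toy model the observation of `ping` alone leaves two hypotheses open, whereas `alarm` can only be emitted from the faulty state and therefore makes the diagnosis certain.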
The main drawback of centralized approaches is that they require the global model of the system to be built explicitly, which is unrealistic for large and complex systems such as telecommunication networks. This is why our more recent work deals with a decentralized approach. This approach can be compared to that of R. Debouk and colleagues and also to that of P. Baroni and colleagues. Our method, unlike that of R. Debouk et al., relies on local models. We do not need to construct a global model, whose size would be too large in our applications. Even though the methods are very close, P. Baroni et al. are concerned with an a posteriori (off-line) diagnosis whereas we propose an on-line diagnosis: each time an alarm arrives, it is analyzed and the diagnosis hypotheses are incrementally computed and given to the operator. Our main theme of study is close to that of E. Fabre and colleagues. The main difference is that they propose a multi-agent approach where the diagnoses are computed locally at the component level using message exchanges, whereas we construct a global diagnosis which is given to the operator at the supervisor level.
In the second method, the idea is to associate each failure that we want to detect with a chronicle (or scenario), i.e., a set of observable events interlinked by time constraints. One way to supervise dynamic systems is to recognize those chronicles on-line. The principle is to follow the possible chronicles corresponding to a set of received failure messages until one or several chronicles that satisfy all the constraints are found. To perform this task, we have to create a chronicle base that contains all the possible chronicles. This base must be updated each time the supervised system evolves physically or structurally. Relying on an expert to create the chronicle base makes the maintenance of the base very expensive. This is why we prefer to use an automatic method to learn the base. Most of the studies on chronicle recognition are French and are based on C. Dousson's thesis. Applications generally deal with system monitoring (telecommunication networks) and video surveillance (underground, banks, etc.). Our research does not focus directly on the development of chronicle recognition systems but on the automatic acquisition of the chronicle base. This idea is developed in the next paragraph.
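To make the notion of chronicle concrete, here is a minimal sketch, assuming that each event type appears at most once in a chronicle and that a time constraint `(a, b, lo, hi)` means `lo <= t_b - t_a <= hi`. The function `recognize`, the event names and the delays are hypothetical. The search is exhaustive and works on a recorded log; it is not the incremental on-line algorithm of C. Dousson's thesis.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, float]                    # (event type, timestamp)
Constraint = Tuple[str, str, float, float]   # (a, b, lo, hi): lo <= t_b - t_a <= hi


def recognize(
    event_types: List[str],
    constraints: List[Constraint],
    log: List[Event],
) -> Optional[Dict[str, float]]:
    """Return one instance of the chronicle found in the log, or None."""
    occurrences = {e: [t for (typ, t) in log if typ == e] for e in event_types}
    # Try every way of picking one occurrence per event type of the chronicle.
    for times in product(*(occurrences[e] for e in event_types)):
        instance = dict(zip(event_types, times))
        if all(lo <= instance[b] - instance[a] <= hi for a, b, lo, hi in constraints):
            return instance
    return None


# Hypothetical chronicle associated with a link failure.
LINK_FAILURE_EVENTS = ["link_down", "reroute", "timeout"]
LINK_FAILURE_CONSTRAINTS = [
    ("link_down", "reroute", 0.0, 2.0),
    ("reroute", "timeout", 1.0, 5.0),
]

log = [("link_down", 10.0), ("reroute", 11.5), ("timeout", 14.0), ("link_down", 20.0)]
match = recognize(LINK_FAILURE_EVENTS, LINK_FAILURE_CONSTRAINTS, log)
no_match = recognize(LINK_FAILURE_EVENTS, LINK_FAILURE_CONSTRAINTS, log[:2])
```

A chronicle base is then simply a collection of such (events, constraints) pairs, one or several per failure to be detected.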
The techniques investigated in the group aim at acquiring and improving models automatically. They belong to the field of machine learning (or artificial learning). In this domain, the goal is the induction or the discovery of characterizations of objects from their descriptions by a set of features or attributes. Our work is grounded on Inductive Logic Programming (ILP).
A learning method is supervised if samples of objects to be classified are available and labeled by the class they belong to. Such samples are often called learning examples. If the examples cannot be classified a priori, the learning method is unsupervised. Kohonen maps, induction of association rules in data mining or reinforcement learning are typical unsupervised learning methods. From another point of view, learning methods can be symbolic, such as inductive rule or decision tree learning, or numerical, such as artificial neural networks.
We are especially interested in structural learning which aims at making explicit relations among data where such links are not known. The temporal dimension is of particular importance in applications we are dealing with, such as process monitoring in health-care, environment or telecommunications. Additionally, we consider that the comprehensibility of the learned results is of crucial importance as domain experts must be able to evaluate and assess these results. ILP is the learning technique that best meets these requirements. We use a supervised version of this technique but also intend to use the unsupervised version which is called Relational Data Mining.
ILP began in the early 1980s, though not under this name, when knowledge representation paradigms coming from logic programming began to be used in the field of machine learning. Such a high-level language meets the needs of relational representations for the description of structured objects or of true relations between objects.
During the 1990s, ILP became a research topic in its own right at the intersection of domains such as machine learning, logic programming and automated deduction. The main goal of ILP is the induction of classification or prediction rules from examples and from domain knowledge. The ILP research field has been extended to data mining, enabling the discovery of association rules describing the correlations between data descriptors. As ILP relies on first-order logic, it provides a very expressive and powerful language for representing hypotheses as well as domain knowledge; this is its major feature.
Formally, ILP can be described as follows: given a set of positive examples P and a set of negative examples N of some concept to be learned, a logical theory B called the background knowledge and a language LH specifying which clauses are syntactically and semantically acceptable, the goal is to discover a hypothesis H in the form of a logic program belonging to LH such that
$\forall p \in P:\ B \wedge H \models p$ and $\forall n \in N:\ B \wedge H \not\models n$.
This definition can be extended to multi-class learning. From a computational point of view, the learning process consists in searching the hypothesis space, either top-down by refining clauses that are too general (that cover negative examples) by adding literals to clause body or bottom-up by generalizing clauses that are too specific (that do not cover enough positive examples) by deleting literals or transforming constants into variables in literals. An interesting property is that the clause space has a lattice structure which enables an efficient search.
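The top-down search can be illustrated by a minimal sketch, assuming a strong simplification: examples are sets of ground facts and a clause body is a set of literals without variables, so that the covering test reduces to set inclusion instead of logical entailment with background knowledge. The function names, the greedy scoring heuristic and the cardiac-flavored literals are hypothetical. Starting from the most general clause (empty body), literals are added until no negative example is covered.

```python
from typing import FrozenSet, List, Set

Example = FrozenSet[str]  # an example described by the ground facts that hold for it


def covers(body: Set[str], example: Example) -> bool:
    """Simplified covering test: every literal of the body holds in the example."""
    return body <= example


def learn_clause(
    positives: List[Example], negatives: List[Example], literals: Set[str]
) -> Set[str]:
    """Greedy top-down refinement: specialize the body until no negative is covered."""
    body: Set[str] = set()          # empty body = most general clause
    pos, neg = list(positives), list(negatives)
    while neg:
        best, best_score = None, None
        for lit in sorted(literals - body):
            kept_pos = sum(covers(body | {lit}, p) for p in pos)
            kept_neg = sum(covers(body | {lit}, n) for n in neg)
            # Skip refinements that lose all positives or exclude no negative.
            if kept_pos == 0 or kept_neg == len(neg):
                continue
            score = kept_pos - kept_neg
            if best_score is None or score > best_score:
                best, best_score = lit, score
        if best is None:
            break                   # no useful refinement is left
        body.add(best)
        pos = [p for p in pos if covers(body, p)]
        neg = [n for n in neg if covers(body, n)]
    return body


positives = [
    frozenset({"wide_qrs", "fast_rate"}),
    frozenset({"wide_qrs", "fast_rate", "p_absent"}),
]
negatives = [frozenset({"fast_rate"}), frozenset({"wide_qrs"})]
body = learn_clause(positives, negatives, {"wide_qrs", "fast_rate", "p_absent"})
```

Real ILP systems search a space of first-order clauses ordered by a generality relation such as subsumption; the set-inclusion order used here plays the same role in this reduced setting.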
ILP is mainly used for learning classification rules. Similar techniques can also be used for inducing decision trees as well as for first order regression. The goal of regression is to predict the value of a real variable instead of a class value. Some more recent extensions deal with learning dynamic models: one such extension uses a representation coming from the qualitative simulator QSIM, another enables the discovery of differential equations from examples describing the behavior of a dynamic system .
Nowadays, work in ILP is mainly concerned with improving learning robustness (dealing with noisy or incomplete data) or efficiency (improving the exploration of the search space by taking structural properties into account, by stochastic techniques or by parallelizing algorithms for massively parallel computers). Another research direction investigates how to associate ILP with other learning methods which are more efficient for particular kinds of data, or how to associate different learning strategies during ILP search. Extending the language to full first-order logic is also investigated. In this direction, learning from temporal data is of major interest because many application domains, such as telecommunications, health-care or environment, provide huge amounts of such data. This is why we have chosen to rely upon work by C. Rouveirol and M. Sebag, who have shown the value of associating ILP with CLP (Constraint Logic Programming) in order to compute numerical values efficiently. D. Page wrote that a final challenge for ILP is to elaborate tight collaboration schemes between experts and ILP systems for knowledge discovery in order to avoid their complexity i) by enabling the evaluation of alternative hypotheses and not only those that maximize some heuristic function, ii) by devising tests and experiments for choosing among several hypotheses, iii) by providing non-numerical justifications of the hypotheses such as belief measures or illustrative examples, iv) by consulting the expert when anomalies are detected in the data.
Our work is more concerned with the application of ILP than with developing or improving the techniques. Nevertheless, as noted by Page and Srinivasan, the target application domains (such as signal processing in health-care) can benefit from the adaptation of ILP to the particular features of the application data. Thus, we investigate how to associate temporal abstraction methods with learning and with chronicle recognition. We are also interested in constraint clause induction, particularly for managing temporal aspects. In this setting, some variables are devoted to the representation of temporal phenomena and are managed by a constraint system in order to deal efficiently with the associated computations (such as the covering tests, for example).
The following application domains are concerned by our work: telecommunication networks, medicine and environment.
Monitoring telecommunication networks is an important task and is one of the conditions to ensure a good quality of service. Given a monitoring system continuously receiving observations (alarms) sent by the system components, our purpose is to help operators to identify failures.
In this context, we developed a decentralized component-oriented approach able to incrementally compute on-line diagnoses. The efficiency of the algorithm is increased by the use of model-checking techniques such as partial-order reduction and BDDs. Currently, we are extending our research to reconfigurable systems, i.e., systems whose topology changes over time, due for instance to reconfiguration actions decided to remedy overload problems.
Another important challenge for telecommunication networks is predicting the subjective quality of proposed services (as perceived by the user) from collected technical data. Mixing data mining and symbolic learning techniques is the way we chose to acquire this predictive knowledge.
A last issue is the security of these networks and we are starting a joint work with Lande (M. Ducassé) and France-Telecom R&D funded by a CRE (external research contract). The main idea is to use the chronicle acquisition techniques developed in the cardiac domain in order to acquire automatically detection patterns.
All this research work on telecommunication networks is done in collaboration and with the support of France-Telecom R&D.
Since the development of expert systems in the 1970s, decision aiding tools have been widely studied and used in medicine and health-care. The ultimate goal is to help a physician establish a diagnosis or prognosis from observations delivered by sensors and from the individual patient's data. This involves at least three tasks:
patient monitoring: processing and abstracting signals recorded by sensors placed on patients, in order to generate alarms when a particular situation has occurred, or is about to occur. The standard context is intensive care units in hospitals where an alarm must be treated within a very short time. With the advent of telemedicine similar situations arise, but the delay to treat an alarm may be much longer. For example, a cardiac or diabetic patient may be monitored at home and the recorded data sent every day at some fixed hour to the care unit. If some problem is detected, the patient is urged to consult a doctor, but a long delay may occur between the time at which the problem occurred and the treatment. Time is a major feature of medical data, thus temporal abstraction associated with signal processing techniques must be used for filtering and pre-processing the raw data;
diagnostic and prognostic reasoning: models, such as causal or probabilistic models, have supplanted expert systems for diagnosis. As the course and outcome of a disease process are dynamic, time also plays an important role in diagnostic and prognostic models. Also, treatment planning and/or the clinical context may interact with these two basic reasoning processes, and particular methods have to be studied and implemented to integrate these aspects;
modeling: though some particular parts of the human body are known very well (e.g. the heart), deep models are generally difficult to build in medicine because of incomplete or too complex knowledge (e.g. the brain). Fortunately, huge amounts of data have been recorded and stored in medical databases. These data can be analyzed in order to discover new knowledge that may be used to construct abstract models or behavioral models, very similar to the old expert systems, but avoiding the bottleneck of expert knowledge acquisition. Processing medical data is a specific research area known as "intelligent data analysis (IDA) in medicine". An essential feature of the techniques used in IDA is that most are knowledge-based: they can use knowledge about the problem domain. Thus, a learning approach such as inductive logic programming is a tool of choice.
These three points are studied in projects involving industrial (ELA medical), medical (University Hospital of Rennes) and academic (LTSI - University of Rennes) partners, especially in the field of cardiology. Particularly, new cardiac devices and monitoring systems are investigated.
The need for decision support systems in the environmental domain is now well recognized. This is especially true in the domain of water quality, and a program named Bretagne Eau Pure (http://www.bretagne-eau-pure.org) was launched a few years ago in order to help regional managers protect this important resource. The challenge is to protect water quality from pollutants such as nitrates and herbicides, while these products are massively used by farmers to weed their agricultural plots and to improve the quality and quantity of their crops. The difficulty is then to find solutions which satisfy conflicting interests and, first of all, to gain better knowledge of pollutant transfer. For instance, pesticide transfer through catchments is still insufficiently analyzed and poorly understood.
In this context, we are developing decision support systems to help regional managers preserve river water quality. Two main artificial intelligence techniques are used in this area: multi-agent systems, which are suited to modeling multi-expert cooperation, and qualitative modeling, to model biophysical processes in an explanatory and understandable way. The approach we advocate is the coupling of a qualitative biophysical model, able to simulate the biophysical process, and a management model, able to simulate the farmers' decisions.
Two main research themes are investigated in this framework: the use of qualitative spatial modeling to simulate the pollutant transfer through agricultural catchments and the use of learning/data mining techniques to discover, from model simulation results, the discriminant variables and acquire rules relating these variables. In both cases, one of the main challenges is that we are faced with spatio-temporal data.
Our partners are mainly the SAS Inra research group, located in Rennes, and other Inra research groups such as the BIA group in Toulouse and the LASB group in Montpellier.
The problem we deal with is the monitoring of large and complex discrete-event systems (DES) such as telecommunication networks. Diagnosing dynamical systems represented as DES consists in finding what happened to the system from existing observations. Different terms can be found in the literature, such as histories, scenarios, narratives or consistent paths. They all rely on the idea that the diagnostic task consists in determining the trajectories (sequences of states and events) compatible with the sequence of observations. From these trajectories, it is then easy to determine (identify and localize) the possible faults. The two main difficulties are i) the intractable size of the model and the huge number of states and trajectories to be explored; ii) the on-line change in the system topology and behavior.
To cope with the first difficulty, we proposed the use of decentralized diagnosis. The decentralized approach enables an on-line diagnosis without requiring the computation of the global model. Given a decentralized model of the system and a flow of observations, the program computes the diagnosis by combining local diagnoses built from local models (or local diagnosers). Two main problems have been investigated in the last three years: Which strategy should be used for an optimal merge of the local diagnoses in order to preserve the efficiency and the completeness of the process? How can the incrementality of the process be ensured in an on-line diagnosis context where observations are collected incrementally? A paper describing the formal framework and its experimentation on telecommunication networks has been submitted to the Artificial Intelligence Journal. It was updated in 2004 according to the reviewers' comments and is now in the final review step (it should be published in early 2005).
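The combination of local diagnoses can be pictured with the following minimal sketch. It rests on assumptions that are not part of the formal framework mentioned above: a local hypothesis is reduced to a set of faults plus the set of interface events it assumes were exchanged with neighboring components, and two hypotheses are compatible when they agree on the events of their common interface. The names `LocalHypothesis`, `merge` and the component and event names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, NamedTuple


class LocalHypothesis(NamedTuple):
    faults: FrozenSet[str]
    exchanged: FrozenSet[str]  # interface events assumed to have occurred


def merge(
    diag_a: List[LocalHypothesis],
    iface_a: FrozenSet[str],
    diag_b: List[LocalHypothesis],
    iface_b: FrozenSet[str],
) -> List[LocalHypothesis]:
    """Keep the pairs of local hypotheses that agree on the shared interface events."""
    common = iface_a & iface_b
    merged = []
    for ha, hb in product(diag_a, diag_b):
        if ha.exchanged & common == hb.exchanged & common:
            merged.append(
                LocalHypothesis(ha.faults | hb.faults, ha.exchanged | hb.exchanged)
            )
    return merged


iface = frozenset({"reset_req"})
# Local diagnosis of a switch: either nothing happened, or it failed and asked for a reset.
switch = [
    LocalHypothesis(frozenset(), frozenset()),
    LocalHypothesis(frozenset({"sw_fail"}), frozenset({"reset_req"})),
]
# Local diagnosis of a controller: either it received a reset request, or it failed on its own.
controller = [
    LocalHypothesis(frozenset(), frozenset({"reset_req"})),
    LocalHypothesis(frozenset({"ctrl_fail"}), frozenset()),
]
global_diag = merge(switch, iface, controller, iface)
global_faults = [h.faults for h in global_diag]
```

Only two of the four combinations survive the merge. The order in which pairs of components are merged affects the size of the intermediate results, which is one aspect of the merge strategy question raised above.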
Our current work (Alban Grastien's thesis) concerns the diagnosis of reconfigurable systems. By reconfigurable systems, we mean systems in which the addition, removal or modification of components is allowed. These changes result from (on-line) reconfiguration actions. This is for instance the case of telecommunication networks whose topology can be changed to solve overload problems, and whose components can be replaced by new ones when they are defective. In a first step we have studied a formal characterization of reconfigurable systems and of their diagnosis. In a second step we have extended the existing decentralized diagnosis algorithms to make them able to cope with reconfigurable systems. Currently, we are implementing these ideas and experimenting with them.
Together with the LTSI (Signal Processing Lab - INSERM, University of Rennes 1), we are studying how to use chronicle recognition techniques for cardiac monitoring and diagnosis. Our goal is to analyze the signals coming from several sensors in order to detect and characterize the cardiac arrhythmias a monitored patient is subject to. The nature of the arrhythmias, their features and their frequency can be used to propose convenient therapies, such as specific drugs or cardiac devices (pacemaker or defibrillator).
We are particularly working on two aspects: discovering chronicles by machine learning and improving event detection on signals. Concerning chronicle discovery, we are studying how to adapt machine learning techniques in order to deal with multi-channel aspects. Various control policies are implemented and assessed: global learning and recognition from information provided by all the channels, independent learning on each channel followed by symbolic knowledge fusion for global recognition, or independent recognition on each channel followed by merging of all the results. We are particularly working on symbolic knowledge fusion, which appears to be a hot topic in the knowledge acquisition community. This is the subject of Élisa Fromont's thesis, which is supported by a grant from the RNTS Cepica project.
This year we have investigated different methods for learning from multi-channel data. The two main methods are: either learn from each source separately and then use a voting procedure to select the relevant rules according to the state of the signal (this method can, in particular, improve results in case of a noisy signal), or learn rules globally, directly from all the data coming from the different sources. Because voting methods, which take advantage of the information from different channels, appeared to be better suited to numerical than to symbolic learning, we decided to focus on the "global" method. Coping with multiple data sources adds complexity to the learning process as the amount of data and the complexity of the hypothesis language increase. Indeed, very high computation times were observed in practice. We have proposed a "divide and conquer" approach to multi-source learning which consists in learning independently on each data source and then using the partial results to bias the global learning. Very accurate rules were obtained and, as expected, computation times were far better than with the "global" method (a factor of ten on average).
In order to improve the quality of signal processing and event detection, we are studying techniques that could implement a tight collaboration between low-level signal processing algorithms on the one hand, and high-level recognition algorithms such as chronicle recognition on the other hand. This is the subject of François Portet's thesis. The main idea is to use the recognition context to choose the best signal processing algorithm as well as to tune its execution parameters.
We have studied extensively signal processing algorithms used for ECG analysis in order to assess and to model their performance in different contexts (noise type and level, shape of ECG beats, etc.). These results have been submitted for publication to Medical & Biological Engineering & Computing (the article is in the final review process). From this study we have devised rules in order to choose the most relevant algorithms according to the current recognition context (noise, predicted arrhythmias, type of cardiac devices - pacemakers or monitors -, etc.).
A second idea is to adapt the recognition task to the context. In our chronicle recognition context, this means being able to refine or to abstract the analysis by navigating in a hierarchy of chronicle bases according to the difficulty of the current recognition task. A more abstract chronicle base needs less low-level computation than a more refined one. Using this kind of chronicle hierarchy leads to a smarter use of computational resources.
Magda2 (Modélisation et Apprentissage pour une Gestion Distribuée des Alarmes De bout En boUt - Modeling and Machine Learning for Distributed Management of Alarms from End to End) is an RNRT project which aims at providing advanced solutions for managing heterogeneous networks, and for taking into account events involving an interaction between the network and the service layers. The project ended at the beginning of 2004. We have been working on learning to predict the quality of service (QoS) of an application (TV on demand) running on a telecommunication network from information provided by equipment (routers) of this network. More precisely, we have been working on the abstraction of time series into time-stamped symbolic event sequences. These abstracted time series are then processed by an inductive logic programming technique in order to learn diagnostic temporal rules predicting the QoS at the end-user side. The learning data were recorded from a monitored network on which different faults were actually injected. Though the results are not as rich as expected, because the recorded data were not of sufficient quality, we demonstrated the feasibility of our approach to temporal learning and data mining.
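The abstraction of a time series into time-stamped symbolic events can be sketched as follows. This is a minimal illustration assuming a simple two-threshold discretization; the thresholds, the measured quantity and the function name `abstract` are hypothetical and do not describe the abstraction actually used in the project. An event is emitted only when the qualitative level of the signal changes.

```python
from typing import List, Tuple

Sample = Tuple[float, float]         # (timestamp, measured value)
SymbolicEvent = Tuple[float, str]    # (timestamp, qualitative level)


def abstract(samples: List[Sample], low: float, high: float) -> List[SymbolicEvent]:
    """Turn a numerical time series into a sequence of time-stamped level changes."""
    events: List[SymbolicEvent] = []
    previous = None
    for t, value in samples:
        if value < low:
            level = "low"
        elif value > high:
            level = "high"
        else:
            level = "normal"
        if level != previous:        # keep only the changes of qualitative level
            events.append((t, level))
            previous = level
    return events


# Hypothetical router load measurements (percent), one sample per second.
load = [(0.0, 10.0), (1.0, 12.0), (2.0, 95.0), (3.0, 97.0), (4.0, 40.0)]
events = abstract(load, low=20.0, high=80.0)
```

The resulting event sequence is much shorter than the raw series and can be expressed directly as ground facts for a relational learner.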
In collaboration with LTSI (Laboratoire de Traitement du Signal et Image), we started to analyse behavioral data using data mining techniques. The data were obtained by recording cardiac signals from new intraventricular devices and were then aggregated in order to classify each patient's activity into classes such as resting, training, etc., each associated with the total daily duration of that kind of activity. Data mining aims at discovering interesting patterns indicating that the state of the patient is evolving positively or negatively.
The concept of inductive databases is an attempt to formalize the mining process from a database containing the original data as well as knowledge induced from these data and represented as patterns. We are investigating how to enrich the representation of patterns so that an inductive database can cope with numerical temporal information. Until now, only the sequential aspect of data has been taken into account, but more precise information is often needed for supervising patients, plants or networks, for instance. In our approach, temporal patterns are represented by chronicles. We have devised a complex generality relation over chronicles which relies on the inclusion of sets of events (itemsets) as well as on the inclusion of temporal constraints denoting minimal and maximal delays between events. The version space algorithm at the heart of the inductive database formulation has been modified to take the proposed generality relation, especially the notion of maximally specific chronicle, into account. We are beginning a collaboration with France Telecom R & D and the Lande project on the use of chronicles for network security, where this approach will be applied to the discovery of scenarios describing network attacks (cf. section ).
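The two ingredients of the generality relation, inclusion of event sets and inclusion of delay intervals, can be sketched as follows. This is a simplified reading under our own assumptions: events are treated as a plain set rather than a multiset, no constraint propagation is performed, and the names `Chronicle` and `is_more_general` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Chronicle:
    events: set                                       # event types
    constraints: dict = field(default_factory=dict)   # {(a, b): (min, max)}

def is_more_general(general, specific):
    """True if `general` is at least as general as `specific`.

    Simplified test: every event of `general` appears in `specific`, and
    every delay interval of `general` contains the corresponding interval
    of `specific`.
    """
    if not general.events <= specific.events:
        return False
    for pair, (lo, hi) in general.constraints.items():
        if pair not in specific.constraints:
            return False
        s_lo, s_hi = specific.constraints[pair]
        if s_lo < lo or s_hi > hi:
            return False
    return True

general = Chronicle({"a", "b"}, {("a", "b"): (0, 10)})
specific = Chronicle({"a", "b", "c"},
                     {("a", "b"): (2, 5), ("b", "c"): (1, 3)})
```

Under this test, `general` covers `specific` but not the reverse, which is the kind of ordering a version space algorithm needs in order to maintain its boundary sets.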
In the framework of the Sacadeau project, our aim is to build decision support systems to help catchment managers to preserve streamwater quality.
In collaboration with INRA researchers, two actions are conducted in parallel.
The first one consists in building a qualitative model to simulate the transfer of pesticides through the catchment, from the time of their application by the farmers to their arrival at the stream. The model architecture relies on the coupling of two models: a biophysical transfer model and a management model which simulates the farmers' decisions on herbicide application, depending on the climate and the weeding strategy, to cite only some of the decision criteria. Given data on the climate over the year, on the catchment topology and on the farmer strategy, the model outputs the pesticide concentration in the stream throughout the year. Though INRA is the main contributor, we actively participate in its development. The biophysical model is now implemented and is currently in a validation phase. The management model is nearly completed.
The second action consists in identifying some of the input variables as main pollution factors and in learning rules relating these pollution factors to the temporal distribution of the stream pesticide concentration. We chose to use Inductive Logic Programming (ILP) techniques to get easy-to-read and explanatory rules. After collecting representative scenarios, we obtained significant sets of simulated data corresponding to situations identified as interesting. These data constitute the learning database from which ILP tools can infer rules relating pollution factors to the temporal distribution of the pollution. We used the ICL software and obtained interesting first results. Most of the learned rules express influences between parameters which were already well known. Interestingly, however, some of them highlight influences and cause-effect relations the experts were not aware of, and in some cases even influences that partly contradict what the experts believed. We are currently discussing these results with our experts in order to evaluate them. In parallel, a study on the use of classification techniques has been started in order to get a first synthetic view of the main parameters. This study focuses mainly on the climate, which appears to be an important variable in this pollution process.
The Soleil project aims to build a third-generation synchrotron radiation center located close to the Orsay University Campus. We have been contacted by Soleil project researchers working in biocrystallography (which aims to identify the 3-D structure of proteins). The objective is to supervise the operation of beamline PROXIMA 1 (under construction) so that fault conditions are diagnosed and, if necessary, corrected by automatic realignment procedures. As a first step, we decided to use causal graphs to link the defects observed at the sample position to possible causes of misalignment. Several temporal causal graphs describing specific parts of the beamline have been designed. Designing causal graphs turned out to be difficult, especially where temporal reasoning is concerned, so we began to implement a causal graph editing tool. For instance, this tool can graphically display the abductive diagnosis and the temporal constraint propagation on the causal graph. Its main parts were implemented by a trainee student during summer 2004.
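The following minimal sketch illustrates abductive diagnosis with temporal constraint propagation on a small temporal causal graph. It is not the editing tool mentioned above: the node names, the delays and the function `explain` are hypothetical, the graph is assumed to be acyclic, and windows reached along several paths are merged into their hull, which is an over-approximation.

```python
def explain(graph, symptom, t_obs):
    """Walk a temporal causal graph backwards from an observed symptom.

    graph   -- {cause: [(effect, min_delay, max_delay), ...]}
    Returns -- {root_cause: (earliest, latest)}: for each node without a
               known cause, the time window in which it may have occurred.
    """
    # Reverse index: effect -> [(cause, min_delay, max_delay)]
    parents = {}
    for cause, edges in graph.items():
        for effect, dmin, dmax in edges:
            parents.setdefault(effect, []).append((cause, dmin, dmax))

    result = {}
    stack = [(symptom, t_obs, t_obs)]
    while stack:
        node, earliest, latest = stack.pop()
        if node not in parents:          # no known cause: candidate fault
            if node in result:           # merge windows (hull)
                lo, hi = result[node]
                earliest, latest = min(lo, earliest), max(hi, latest)
            result[node] = (earliest, latest)
            continue
        for cause, dmin, dmax in parents[node]:
            # The cause precedes its effect by between dmin and dmax.
            stack.append((cause, earliest - dmax, latest - dmin))
    return result

# Hypothetical fragment of a beamline model.
graph = {
    "mirror_drift": [("beam_offset", 1, 3)],
    "beam_offset": [("low_flux_at_sample", 0, 2)],
    "slit_misaligned": [("low_flux_at_sample", 0, 1)],
}
diagnosis = explain(graph, "low_flux_at_sample", t_obs=10)
```

Each candidate cause comes with the time window in which it must have occurred to explain the observation, which is the information a graphical tool can overlay on the graph.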
In addition, during a visit to the European Synchrotron Radiation Facility (ESRF) in Grenoble, we had the opportunity to observe a beamline similar to the future beamline PROXIMA 1. It appears that several beamlines are built along the same scheme. We therefore decided to build a generic static beamline model from which specific instances, such as PROXIMA 1, can be derived, as well as more abstract causal models which are better suited to abductive diagnosis.
Let us first remark that this piece of work is not strongly related to the main research stream of our group. Note, however, that joint work involving P. Besnard, M.-O. Cordier and Y. Moinard has begun on designing a logic for causality. This work stems from, and is clearly related to, diagnosis, where observed symptoms have to be explained by faults. Notice that the problems encountered by researchers working on diagnosis were one of the main motivations for introducing default logics, and that an important part of the currently active work on causation illustrates this long-lasting close relationship. A paper has been submitted on this preliminary work, which will be described in more detail in the next activity report.
We have continued our work on the inference by plausibility of Friedman and Halpern. Default reasoning allows one to draw tentative conclusions, which are the best conclusions that can be drawn from the present state of knowledge. This concerns rules with exceptions, which are implicit in various domains. For instance, in order to compute the result of an action, many hidden hypotheses must be made. The best conclusion is to consider that, except for the exceptional cases which can be deduced from the given knowledge, everything is normal, since without this assumption either no conclusion could be drawn or no real situation could be formalized. Inference by plausibility is an interesting recent proposal which can formalize default reasoning. In particular, it is well suited to the methods of knowledge compilation: a plausibility is assigned to each formula, a task which can be done "off-line", and deductions then consist only in comparing these plausibilities.
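The "compare plausibilities" step can be sketched as follows. The sketch uses one particular kind of plausibility measure, a possibility-style measure in which the plausibility of a formula is the maximum plausibility of the worlds satisfying it, and the usual test that "A normally implies B" holds when A has bottom plausibility or when A-and-B is strictly more plausible than A-and-not-B. It does not reproduce the modified version discussed in this report; the worlds and their numerical values are hypothetical.

```python
# Each world is the set of atoms true in it, with an illustrative plausibility.
# Worlds that are not listed are given the bottom plausibility 0.
WORLDS = {
    frozenset(): 3,
    frozenset({"bird", "flies"}): 3,
    frozenset({"bird", "penguin"}): 2,
    frozenset({"bird"}): 1,
    frozenset({"bird", "penguin", "flies"}): 1,
}

def plausibility(formula):
    """Plausibility of a formula: the best world satisfying it (0 if none).
    In a compiled setting these values would be computed off-line."""
    return max((p for w, p in WORLDS.items() if formula(w)), default=0)

def entails_by_default(a, b):
    """A normally implies B: Pl(A) is bottom, or Pl(A and B) > Pl(A and not B)."""
    if plausibility(a) == 0:
        return True
    return (plausibility(lambda w: a(w) and b(w))
            > plausibility(lambda w: a(w) and not b(w)))

bird = lambda w: "bird" in w
penguin = lambda w: "penguin" in w
flies = lambda w: "flies" in w
not_flies = lambda w: "flies" not in w
```

With these values, birds fly by default, while penguins, although they are birds, do not: the more specific rule wins without any explicit priority between rules.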
We have completed our proposal for a modified version of the original inference by plausibility, which was not fully satisfactory. In particular, we have shown that our modified version fits well with a general and promising proposal for default reasoning, which is not the case for the original proposal. All our new results have reinforced our claim that the new proposal is superior to the original one. We are also investigating a characterization of our version of inference by plausibility in terms of "reasoning properties", an important result for any potential user. Although our characterization is not yet complete (it contains, together with simple and natural properties, one property too technical to be of real use), it is a step towards the final result. Note that this characterization can easily be modified to apply to the original version of inference by plausibility.
This CRE no 171978 (External Research Contract) is a focused collaboration between the project Dream, the project Lande (M. Ducassé) and France Telecom R & D on the problem of detecting specific network attacks. This study is planned to last three years. The first objective is to evaluate the use of chronicles, patterns of temporally constrained events, for representing and detecting attack scenarios on telecommunication networks. The second objective is to learn or discover automatically such attack scenarios from network logs, either generated by a simulation process or really observed on active networks.
The project Sacadeau (Système d'Acquisition de Connaissances pour l'Aide à la Décision pour la qualité de l'EAU - Knowledge Acquisition System for Decision-Aid to Improve Streamwater Quality) began in October 2002. It is funded by INRA (French institute for agronomy research) and will last three years. The project involves the following partners: three INRA research groups (SAS from Rennes, LASB from Montpellier and BIA from Toulouse) and Irisa. It also involves experts belonging to the regional administrative entities. The project aims at building a decision-aid tool to help the specialists in charge of catchment management preserve streamwater quality. The proposal relies on building two coupled qualitative models: a transfer model to simulate the pesticide transfer through the catchment and a management model to simulate the farmers' decisions concerning the application of pesticides and the weeding strategy. The final objective is to analyze simulation results using learning and data mining techniques, to discover the discriminant variables and to acquire rules relating the climate, the farmer strategy and the catchment topology to the pesticide concentration in the stream.
This RNTS (Réseau National Technologies pour la Santé) project began at the end of 2003 and will last three years. The partners are ELA-Medical, the department of cardiology of the Rennes University Hospital, the LTSI-University of Rennes 1 and IRISA. The project is concerned with the design of new cardiac devices, the study of which began during the instigative concerted action PISE. Its main concerns are: to propose and evaluate new sensors able to assess the hemodynamic effects of a stimulation; to develop signal processing methods devoted to the specific signals measured by the new sensors and to refine, by using machine learning methods and chronicle recognition, the scenarios that may present some risk for an individual patient; to study different stimulation protocols taking into account the device specificities and constraints; to validate these concepts in clinical situations.
Members of the Dream team are involved in the following national collaboration programs:
Imalaia (joint working group of the GdR Automatique, GdR-PRC I3 and Afia), which brings together researchers from the automatic control and artificial intelligence fields on the subject of dynamic system monitoring. M.-O. Cordier is co-chair with L. Travé-Massuyès and F. Lévy.
RTP "information and intelligence: reasoning and decision" set up by the department STIC of the CNRS (M.-O. Cordier is a member of the steering committee).
GdR I3 working group GT 3.4 (machine-learning, knowledge discovery in databases, data mining - R. Quiniou, A. Salleb, A. Vautier).
Monet2 (European network of excellence on Model-based and Qualitative reasoning). Dream is particularly involved in the Bridge task group, which attempts to integrate AI and automatic control methods for diagnosis and monitoring, and in the biomedical task group, which attempts to forge links between the fields of biological research and medical qualitative reasoning research (M.-O. Cordier and R. Quiniou).
Gianluca Torta, PhD student at the University of Turin, visited us from May to July 2004. He gave a seminar on his joint work with P. Torasso. We started joint work (A. Grastien, M.-O. Cordier) on the use of temporal information in encoding observations in a diagnosis context.
AAI: Applied Artificial Intelligence (M.-O. Cordier).
AICOMs: Artificial Intelligence Communications (M.-O. Cordier).
JEDAI: Journal Electronique d'Intelligence Artificielle (M.-O. Cordier).
ARIMA: Revue Africaine de la Recherche en Informatique et en Mathématiques Appliquées (M.-O. Cordier).
Revue I3 (M.-O. Cordier).
RFIA'04, KR'04, DX'04, CARI'04 (M.-O. Cordier).
Co-chairs of the workshop "Temporal pattern extraction for on line detection of critical situations" at EGC 2005 (M.-O. Cordier, R. Quiniou).
ECCAI board member (in charge of the bimonthly bulletin): M.-O. Cordier.
Many members of the Dream team are also faculty members and are actively involved in computer science teaching programs at Ifsic, INSA and ENSAR. Besides these usual teaching duties, Dream is involved in the following programs:
Master in computer science (Ifsic): RATS module: temporal and spatial reasoning (M.-O. Cordier, Y. Moinard, R. Quiniou).
Master in computer science (Ifsic): DIAG module: diagnosis (M.-O. Cordier, S. Robin, L. Rozé).
A. Vautier gave a talk on "An extension of inductive databases" at the meeting of the GDR I3 working group 3.4, November 15th, Lyon.
A. Salleb gave a talk on "Quantitative association rule extraction" at the meeting of the GDR I3 working group 3.4, November 15th, Lyon.