Pulsar is focused on cognitive vision systems for Activity Recognition. We are particularly interested in the real-time semantic interpretation of dynamic scenes observed by sensors. We thus study spatio-temporal activities performed by human beings, animals and/or vehicles in the physical world.
Our objective is to propose new techniques in the field of cognitive vision and cognitive systems for mobile object perception, behavior understanding, activity model learning, and dependable activity recognition system design and evaluation. More precisely, Pulsar proposes new computer vision techniques for mobile object perception, with a focus on real-time algorithms and 4D analysis (e.g. 3D models and long-term tracking). Our research work includes knowledge representation and symbolic reasoning for behavior understanding. We also study how statistical techniques and machine learning in general can complement a priori knowledge models for activity model learning. Our research work on software engineering consists in designing and evaluating effective and efficient activity recognition systems. Pulsar takes a pragmatic approach, working on concrete activity recognition problems to propose new cognitive system techniques inspired by and validated on applications, in a virtuous cycle.
Within Pulsar we focus on two main application domains: safety/security and healthcare. There is an increasing need to automate the recognition of activities observed by sensors (usually CCD cameras, omnidirectional cameras, infrared cameras), but also microphones and other sensors (e.g. optical cells, physiological sensors). The safety/security application domain is a strong basis which ensures both a precise view of the research topics to develop and a network of industrial partners, ranging from end-users to integrators and software editors, who provide data, problems and funding. Pulsar is also interested in developing activity monitoring applications for healthcare (in particular assistance for the elderly).
Our work has been applied in the context of more than 7 European projects such as AVITRACK, SERKET, CARETAKER and COFRIEND. We have industrial collaborations in several domains: transportation (CCI Airport Toulouse Blagnac, SNCF, INRETS, ALSTOM, RATP, Rome ATAC Transport Agency (Italy), Turin GTT (Italy)), banking (Crédit Agricole Bank Corporation, Eurotelis and Ciel), security (THALES R&T FR, THALES Security Syst, INDRA (Spain), EADS, Sagem, Bertin, Alcatel, Keeneo, ACIC, BARCO, VUB-STRO and VUB-ETRO (Belgium)), multimedia (Multitel (Belgium), Thales Communications, IDIAP (Switzerland), SOLID, a software editor for multimedia databases (Finland)), the civil engineering sector (Centre Scientifique et Technique du Bâtiment (CSTB)), the computer industry (BULL), the software industry (SOLID (Finland), AKKA) and the hardware industry (ST-Microelectronics). We have international cooperation with research centers such as Reading University (UK), ARC Seibersdorf research GmbH (Vienna, Austria), ENSI Tunis (Tunisia), National Cheng Kung University (Taiwan), National Taiwan University (Taiwan), MICA (Vietnam), IPAL (Singapore), I2R (Singapore), NUS (Singapore), University of Southern California (USC), University of South Florida (USF), and the University of Maryland.
Pulsar is a project-team created in January 2008 in the continuation of the Orion project. François Brémond has led the Pulsar team since September 2009, and Monique Thonnat, the former leader, has taken on responsibilities at INRIA at the national level: she is now both Deputy Scientific Director at INRIA and a researcher in the Pulsar team. We have a new software platform called SUP (Scene Understanding Platform) for activity recognition, based on sound software engineering paradigms. We have also continued original work on learning techniques such as data mining in large multimedia databases. For instance, we have been able to learn the behavior profiles of nine elderly people living in the Gerhome laboratory by processing more than 9 x 4 hours of video and sensor (e.g. pressure and contact sensor) recordings. We have also learned gesture descriptors to recognize actions such as walking and jumping.
Pulsar pursues two main research axes: scene understanding for activity recognition and software engineering for activity recognition.
Scene understanding is an ambitious research topic which aims at solving the complete interpretation problem, ranging from low-level signal analysis up to the semantic description of what is happening in a scene viewed by video cameras and possibly other sensors. Solving this problem involves several issues, grouped in three major categories: perception, understanding and learning.
Software engineering methods allow us to ensure genericity, modularity, reusability, extensibility, dependability, and maintainability. To tackle this challenge, we rely on the sound theoretical foundations of our models and on state-of-the-art software engineering practices such as components, frameworks, (meta-)modeling, and model-driven engineering.
Our goal is to design a framework for the easy generation of autonomous and effective scene understanding systems for activity recognition. Scene understanding is a complex process where information is abstracted through four levels: signal (e.g. pixel, sound), perceptual features, physical objects and events. The signal level is characterized by strong noise and by ambiguous, corrupted and missing data. Thus, to reach a semantic abstraction level, models and invariants are the crucial points. A still open issue is whether these models and invariants should be given a priori or learned. The whole challenge consists in organizing all this knowledge in order to capitalize experience, share it with others and update it along with experimentation. More precisely, we work in the following research axes: perception (how to extract perceptual features from signal), understanding (how to recognize a priori models of physical object activities from perceptual features) and learning (how to learn models for activity recognition).
We are proposing computer vision techniques for physical object detection and control techniques for supervision of a library of video processing programs.
First, for the real-time detection of physical objects from perceptual features, we design methods either by adapting existing algorithms or by proposing new ones. In particular, we work on information fusion to handle perceptual features coming from various sensors (several cameras covering a large-scale area, or heterogeneous sensors capturing more or less precise and rich information). Also, to guarantee the long-term coherence of tracked objects, we are adding a reasoning layer to a classical Bayesian framework modeling the uncertainty of the tracked objects. This reasoning layer takes into account the a priori knowledge of the scene for outlier elimination and long-term coherency checking. Moreover, we are working on fine and accurate models of human shape and gesture, extending the work we have done on human posture recognition by matching 3D models with 2D silhouettes. We are also working on gesture recognition based on 2D feature point tracking and clustering.
A second research direction is to manage a library of video processing programs. We are building a perception library by selecting robust algorithms for feature extraction, by ensuring they work efficiently under real-time constraints, and by formalizing their conditions of use within a program supervision model. In the case of video cameras, at least two problems are still open: robust image segmentation and meaningful feature extraction. For these issues, we are developing new learning techniques.
A second research axis is to recognize subjective activities of physical objects (i.e. human beings, animals, vehicles) based on a priori models and objective perceptual measures (e.g. robust and coherent object tracks).
To reach this goal, we have defined original activity recognition algorithms and activity models. Activity recognition algorithms include the computation of spatio-temporal relationships between physical objects. All the possible relationships may correspond to activities of interest and all have to be explored in an efficient way. The variety of these activities, generally called video events, is huge and depends on their spatial and temporal granularity, on the number of physical objects involved in the events, and on the event complexity (number of components constituting the event).
Concerning the modeling of activities, we are working in two directions: uncertainty management for expressing probability distributions, and knowledge acquisition facilities based on ontological engineering techniques. For the first direction, we are investigating classical statistical techniques and logical approaches. For example, we have built a language for video event modeling and a visual concept ontology (including color, texture and spatial concepts), to be extended with temporal concepts (motion, trajectories, events ...) and other perceptual concepts (physiological sensor concepts ...).
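As a purely illustrative sketch, a composite event of the kind handled by such a modeling language could be encoded as the following Python structure; all names and the structure itself are hypothetical and do not reproduce our actual event-description language:

```python
# Hypothetical encoding of a composite video event model; names and structure
# are illustrative, not our actual event-description language.
composite_event = {
    "name": "Bank_Attack",
    "physical_objects": ["person_1", "person_2"],
    "components": [                              # primitive sub-events
        {"name": "enters_zone", "actor": "person_1", "zone": "counter"},
        {"name": "enters_zone", "actor": "person_2", "zone": "safe"},
    ],
    "constraints": [
        ("before", 0, 1),                        # component 0 precedes component 1
        ("duration_less_than", 1, 60.0),         # seconds
    ],
}
```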
Given the difficulty of building an activity recognition system with a priori knowledge for a new application, we study how machine learning techniques can automate building or completing models at the perception level and at the understanding level.
At the perception level, to improve image segmentation, we use program supervision techniques combined with learning techniques. For instance, given an image sampling set associated with ground truth data (manual region boundaries and semantic labels), an evaluation metric together with an optimization scheme (e.g. a simplex algorithm or a genetic algorithm) are applied to select an image segmentation method and to tune its parameters. Another example, for handling illumination changes, consists in applying clustering techniques to intensity histograms to learn the different classes of illumination context for dynamic parameter setting.
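For illustration, the parameter-tuning loop can be sketched as follows; the thresholding segmenter, the Jaccard metric and the grid search are stand-ins for any parameterized segmentation algorithm, evaluation metric and optimization scheme (a simplex or genetic algorithm would replace the grid search in practice):

```python
import numpy as np

def jaccard(pred, gt):
    """Region-overlap evaluation metric between binary masks (1 = perfect)."""
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 1.0

def segment(image, threshold):
    """Stand-in for any parameterized segmentation algorithm."""
    return image > threshold

def tune(images, ground_truths, thresholds):
    """Return the parameter value maximizing the mean metric on the samples."""
    scores = [np.mean([jaccard(segment(im, t), gt)
                       for im, gt in zip(images, ground_truths)])
              for t in thresholds]
    return thresholds[int(np.argmax(scores))]

images = [np.random.rand(64, 64)]          # toy image sampling set
ground_truths = [images[0] > 0.5]          # toy ground-truth masks
best = tune(images, ground_truths, np.linspace(0.1, 0.9, 17))
```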
At the understanding level, we are learning primitive event detectors. This can be done, for example, by learning visual concept detectors using SVMs (Support Vector Machines) with perceptual feature samples. An open question is how far we can go in weakly supervised learning for each type of perceptual concept (i.e. alleviating the human annotation task). A second direction is the learning of typical composite event models for frequent activities using trajectory clustering or data mining techniques. We call a composite event a particular combination of several primitive events.
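A minimal sketch of such a visual concept detector, assuming feature vectors have already been extracted and labeled (toy data below; scikit-learn is used for convenience, not as a statement about our implementation):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy perceptual-feature samples (e.g. color/texture vectors) with labels
# marking the presence (1) or absence (0) of a visual concept.
X = np.random.rand(200, 16)
y = np.random.randint(0, 2, size=200)

detector = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
detector.fit(X, y)
concept_prob = detector.predict_proba(np.random.rand(1, 16))[0, 1]
```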
Coupling learning techniques with a priori knowledge techniques is promising for recognizing meaningful semantic activities.
The new techniques proposed for activity recognition systems (first research axis) in turn contribute to specifying the needs for new software architectures (second research axis).
The aim of this research axis is to build general solutions and tools to develop systems dedicated to activity recognition. For this, we rely on state-of-the-art software engineering practices to ensure both sound design and easy use, providing genericity, modularity, adaptability, reusability, extensibility, dependability, and maintainability.
This year we focused on four aspects: the definition of a joint software platform with KEENEO, the study of model-driven engineering approaches to facilitate platform usage, the extension of behavioral models, and formal verification techniques to design dependable systems.
In the former Orion project-team, we developed two platforms: VSIP, a library of real-time video understanding modules, and Lama, a software platform for designing not only knowledge bases, but also inference engines and additional tools. Lama offers toolkits to build and to adapt all the software elements that compose a knowledge-based system or a cognitive system.
Pulsar will continue to study generic systems and object-oriented frameworks to elaborate a methodology for the design of activity recognition systems. We want to broaden the approach that led to Lama and to apply it to the other components of the activity recognition platform, in particular the image processing ones. We also wish to contribute to setting up, in the long term, a complete software engineering methodology for developing activity recognition systems. This methodology should be based on model engineering and formal techniques.
To this end, Pulsar plans to develop a new platform (see figure ) which integrates all the necessary modules for the creation of real-time activity recognition systems. Software generators provide designers with perception, software engineering and knowledge frameworks. Designers will use these frameworks to create both dedicated activity recognition engines and interactive tools. The perception and evaluation interactive tools enable a perception expert to create a dedicated perception library. The knowledge acquisition, learning and evaluation tools enable a domain expert to create a new dedicated knowledge base.
This platform will rely on the Lama experiment, but necessitates some architectural changes and model extensions. We plan to work in the following three research directions: models (adapted to the activity recognition domain), platform architecture (to cope with deployment constraints such as real time or distribution), and system safeness (to generate dependable systems). For all these tasks we shall follow state-of-the-art software engineering practice and, if needed, we shall attempt to set up new practices.
The new platform should be easy to use. We should thus define and implement tools to support modeling, design and verification inside the framework. Another important issue concerns graphical user interfaces: it should be possible to plug existing (domain- or application-dependent) graphical interfaces into the platform. This requires defining a generic layer to accommodate various sorts of interfaces. This is clearly a medium/long-term goal, in its full generality at least.
Developing integrated platforms such as SUP is a current trend in video surveillance. It is also a challenge, since these platforms are complex and difficult to understand, to use, to validate, and to maintain. The situation gets worse when considering the huge number of choices and options, both at the application and platform levels. Dealing with such variability requires formal modeling approaches for the task specification as well as for the software component description.
Model Driven Engineering (MDE) is a recent line of research that appears as an excellent candidate to support this modeling effort while providing means to make models operational and even executable. Our goal is to explore and enrich MDE techniques and model transformations to support the development of product lines for domains presenting multiple variability factors such as video surveillance.
The long-term scientific objective concerns research activities both on Model Driven Engineering and on video surveillance. On the MDE side, we wish to identify the limits of current techniques when applied to real-scale complex tasks. On the video surveillance side, the trend toward integrated software platforms calls for the formal modeling approaches mentioned above, for task specification as well as software component description.
This MDE approach is complementary to the Program Supervision one, which has been studied by Orion for a long time . Program Supervision focuses on programs, their models and the control of their execution. MDE also covers task specification and transformations to a design and implementation.
Pursuing the work done in Orion, we need to consider other models to express knowledge about activities, their actors, their relations, and their behaviors.
The evolution toward activity recognition requires various theoretical studies. We need to complement the knowledge representation model with a first-class notion of relations. The incorporation of a model of time, both physical and logical, is mandatory to deal with temporal activity recognition, especially in real time. A fundamental concern is to define an abstract model of scenarios to describe and recognize activities. Supporting distributed systems is unavoidable for current software systems, and it requires a model of distribution. These conceptual models will lead us to define corresponding ontologies.
Finally, handling uncertainty is a major theme of Pulsar and we want to introduce it into our platform; this requires deep theoretical studies and is a long term goal.
Another aim is to build dependable systems. Since traditional testing is not sufficient, it is important to rely on formal verification techniques and to adapt them to our component models.
In most activity recognition systems, safeness is a crucial issue. It is a very general notion dealing with the protection of people and goods, respect of privacy, or even legal constraints; when designing software systems, it ultimately translates into software security. In Orion, we already provided toolkits to ensure validation and verification of systems built with Lama. First, we offered a knowledge base verification toolkit to verify the consistency and completeness of a base, as well as the adequacy of the knowledge with regard to the way an engine is going to use it. Second, we provided an engine verification toolkit that relies on model-checking techniques to verify that the Blocks library has been used in a safe way when designing knowledge-based system engines.
The generation of dependable systems for activity recognition is an important challenge. System validation is a crucial phase in any development cycle. Partial validation by tests, although required in the first phase of validation, is too weak for the system to be completely trusted; an exhaustive validation approach using formal methods is clearly needed. Formal methods help to produce code that has been formally proved, and whose size and frequency can be estimated. Consistently with our component approach, it appears natural to rely on component modeling to perform a verification phase in order to build safe systems. Thus we study how to ensure safeness for components whose models take time and uncertainty into account.
Nevertheless, software dependability cannot be proved by relying on a single technique. Some properties are decidable and can be checked using formal methods at the model level. By contrast, other properties are not decidable and require non-exhaustive methods such as abstract interpretation at the code level. Thus, a verification method ensuring generic component dependability must combine several complementary verification techniques.
While the focus of our research is to develop techniques, models and platforms that are generic and reusable, we also put effort into the development of real applications. The motivation is twofold: first, to validate the new ideas and approaches we have introduced; second, to demonstrate how to build working systems for real applications in various domains based on the techniques and tools developed. Indeed, our applications cover a wide variety of domains: intelligent visual surveillance in the transport domain, and applications in the biology and medical domains.
The growing feeling of insecurity among the population has led private companies as well as public authorities to deploy more and more security systems. For the safety of public places, video-camera-based surveillance techniques are commonly used, but the multiplication of cameras saturates transmission and analysis means (it is difficult to supervise hundreds of screens simultaneously). For example, 1000 cameras are viewed by two security operators for monitoring the Brussels subway network. In the framework of our work on automatic video interpretation, we have studied the design of an automatic platform to assist video surveillance operators.
The aim of this platform is to act as a filter, selecting the scenes that may be interesting for a human operator. The platform is based on the cooperation between an image processing component and an interpretation component using artificial intelligence techniques. Thanks to this cooperation, the platform automatically recognizes different scenarios of interest in order to alert the operators. This work has been carried out with academic and industrial partners through the European projects PASSWORDS, AVS-PV, AVS-RTPW, ADVISOR, AVITRACK, CARETAKER, SERKET and CANTATA and, more recently, the European projects VICoMo and COFRIEND, the national projects SIC and VIDEOID, and industrial projects with RATP, CASSIOPEE, ALSTOM and SNCF. A first set of very simple applications for the indoor night surveillance of supermarkets (AUCHAN) showed the feasibility of this approach. A second range of applications investigated parking lot monitoring, where the rather large viewing angle makes it possible to see many different objects (cars, pedestrians, trolleys) in a changing environment (illumination, parked cars, trees shaken by the wind, etc.). This set of applications allowed us to test various methods for tracking, trajectory analysis and recognition of typical cases (occlusion, creation and separation of groups, etc.).
We have studied and developed video surveillance techniques in the transport domain, which requires the analysis and recognition of groups of persons observed from a lateral, low-position viewing angle in subway stations (the subways of Nuremberg, Brussels, Charleroi, Barcelona, Rome and Turin). We have worked with industrial companies (Bull, Vigitec, Keeneo) on the design of an intelligent video surveillance platform that is independent of any particular application. The principal constraints are the use of fixed cameras and the ability to specify the scenarios to be recognized, which depend on the particular application, based on scenario models that are independent of the recognition system.
In parallel with the video surveillance of subway stations, projects based on the video understanding platform have started for bank agency monitoring, train car surveillance and aircraft activity monitoring, to manage complex interactions between different types of objects (vehicles, persons, aircraft). A new challenge consists in combining video understanding with learning techniques (e.g. data mining), as is done in the CARETAKER and COFRIEND projects, to infer new knowledge about observed scenes.
In the environmental domain, Pulsar is interested in automating the early detection of bioaggressors, especially in greenhouse crops, in order to reduce pesticide use. Attacks (from insects or fungi) call for almost immediate decision-making to prevent irreversible proliferation. The goal of this work is to define innovative decision support methods for in situ early pest detection based on video analysis and scene interpretation from multi-camera data. We promote a non-destructive and non-invasive approach that allows rapid remedial decisions by producers. The major issue is to reach a level of robustness sufficient for continuous surveillance.
During the last decade, most studies of video applications for the surveillance of biological organisms were limited to constrained environments where imaging conditions are controlled. By contrast, we aim at monitoring pests in their natural environment (greenhouses). We thus intend to automate pest detection, in the same way as the management of climate, fertilization and irrigation is already carried out by a control/command computer system. To this end, vision algorithms (segmentation, classification, tracking) must be adapted to cope with illumination changes, plant movements, and insect characteristics.
Traditional manual counting is tedious, time-consuming and subjective. We have developed a generic approach based on a priori knowledge and adaptive methods for vision tasks. This approach can be applied to insect images in order, first, to automate the identification and counting of bio-aggressors, and ultimately to analyze insect behaviors. Our work takes place within the framework of cognitive vision . We propose to combine image processing, neural learning, and a priori knowledge to design a complete system, from video acquisition to behavior analysis. The ultimate goal is to integrate a module for insect behavior analysis. Indeed, the recognition of some characteristic behaviors is often closely related to epicenters of infestation. Coupled with an optimized spatial sampling of the video cameras, it can be of crucial help for rapid decision support.
Most studies on behavior analysis have concentrated on human beings. We intend to extend cognitive vision systems to monitor non-human activities. We will define scenario models based on the concepts of states and events related to the objects of interest, to describe the scenarios relative to white insect behaviors. We shall also rely on ontologies (such as a video event ontology). Finally, in the long term, we want to investigate data mining for biological research. Indeed, biologists require new knowledge to analyze bioaggressor behaviors. A key step will be to match numerical features (based, for instance, on trajectories and density distributions) with their biological interpretations (e.g., predation or center of infestation).
This work takes place within a two-year collaboration (ARC BioSERRE), started in 2008, between Pulsar (INRIA Sophia Antipolis - Méditerranée), Vista (INRIA Rennes - Bretagne Atlantique), INRA Avignon UR407 Pathologie Végétale (Institut National de Recherche Agronomique) and the CREAT Research Center (Chambre d'Agriculture des Alpes Maritimes).
In the medical domain, Pulsar is interested in the long-term monitoring of people at home, which aims at supporting caregivers by providing information about the occurrence of worrying changes in a person's behavior. We are especially involved in the Gerhome project, funded by the PACA region and the Conseil Général (CG06), in collaboration with two local partners: CSTB and the Nice City hospital. In this project, an experimental home integrating new information and communication technologies has been built in Sophia Antipolis. The project addresses the issue of monitoring and learning about people's activities at home, using autonomous and non-intrusive sensors. The goal is to detect worrying situations, whether sudden events or slow changes in a person's frailty. We have also started a collaboration with the Nice hospital to monitor Alzheimer patients with the help of geriatric doctors. The aim of this project is to design an experimental platform providing services and allowing us to test their efficiency.
Since September 1996, the Orion team (and now the Pulsar team) has distributed the program supervision engine Pegase, based on the Lama platform. The Lisp version has been used at the University of Maryland and at Genset (Paris). The C++ version (Pegase+) is now available and operational at ENSI Tunis (Tunisia) and at CEMAGREF, Lyon (France).
SUP is the Scene Understanding Software Platform, written in C and C++ (see figure ). SUP is the continuation of the VSIP platform. SUP splits the video processing workflow into several modules, from acquisition and segmentation up to scenario recognition. Each module has a precise interface, and different plugins implementing these interfaces can be used for each step of the processing. This generic architecture (a minimal sketch of the plugin mechanism is given after the list below) is designed to facilitate:
integration of new algorithms in SUP;
sharing of the algorithms among the team.
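The following minimal Python sketch illustrates the plugin mechanism described above; SUP itself is written in C and C++, and the module names and frame-state dictionary are illustrative:

```python
from abc import ABC, abstractmethod

class Module(ABC):
    """Common plugin interface: each step consumes and enriches a frame state."""
    @abstractmethod
    def process(self, state: dict) -> dict: ...

class Acquisition(Module):
    def process(self, state):
        state["frame"] = None  # placeholder: grab the next frame from a sensor
        return state

class Segmentation(Module):
    def process(self, state):
        state["blobs"] = []    # placeholder: foreground blobs from state["frame"]
        return state

class Pipeline:
    """Chains interchangeable plugins implementing the Module interface."""
    def __init__(self, modules):
        self.modules = modules

    def run(self, state=None):
        state = state or {}
        for module in self.modules:
            state = module.process(state)
        return state

# Plugins are interchangeable as long as they respect the step's interface.
pipeline = Pipeline([Acquisition(), Segmentation()])  # ... tracking, recognition
result = pipeline.run()
```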
Currently, 15 plugins are available, covering the whole processing chain. Several plugins use the Genius platform, an industrial platform based on VSIP and exploited by Keeneo, the start-up created by the research team in July 2005.
The goals of SUP are twofold:
From a video understanding point of view, the goal of SUP is to let all the researchers of the Pulsar team share the implementations of their research through this platform;
From a software engineering point of view, the goal is to integrate the results of the dynamic management of the applications when applied to video surveillance.
The Clem Toolkit is a set of tools to design, simulate, verify and generate code for le programs; le is a synchronous language supporting modular compilation. The language also supports automata, possibly designed with a dedicated graphical editor. The Clem toolkit comes with a simulation tool. Hardware descriptions (Vhdl) and software code (C) are generated for le programs. Moreover, we also generate files to feed the NuSMV model checker in order to validate program behaviors.
This year, Pulsar has tackled several scene understanding issues and proposed new algorithms in the three following research axes:
Perception: people detection and human gesture recognition;
Understanding: multi-sensor activity recognition and gesture recognition using learned local motion descriptors;
Learning: online and offline trajectory clustering.
In the framework of the BioSerre project, we investigate a video surveillance solution for the early detection of pest attacks, as part of pest management methods. Our system is to be used in a greenhouse endowed with a (WiFi) network of video cameras. This year we presented this work in June 2009 in Paris, during the Salon Européen de la Recherche et de l'Innovation (SERI'09); in October 2009 in Bordeaux, during the Journées des ARCs de l'INRIA; and on December 10-11 in Sophia Antipolis, at the 2009 INRA-INRIA Seminar.
On top of the classical challenges in video-surveillance (lighting changes, shadows, etc.), we have to face specific challenges:
The high resolution of the video frames needed by the application (about 1.3 megapixels per frame, at about 2 frames every 1.5 s), which is necessary to visualize the insects of interest but constitutes a serious challenge for quasi-real-time processing;
The very low spatial resolution and color contrast of the harmful insects of interest in the videos;
The lack of powerful discriminative features in the insect species of interest, because their low spatial resolution in the videos does not allow us to see their detailed shapes.
The application is divided into different modules. A video acquisition module acquires videos from the remote cameras and stores them locally on a PC where the core of the application runs. Thanks to the trap extraction module, only the region of interest in a video (the trap area) is processed. Then, a background subtraction module maintains a statistical model of the background and detects pixels which deviate significantly from the learned background model. The detected pixels are then processed by an insect presence detection module (IPDM) to decide whether they are likely to be insect pixels or not (e.g. due to illumination changes). Each video frame being divided into many patches (to speed up the subsequent image processing), a mere counting of the pixels classified as insect pixels by the IPDM allows the system to vote for the patches to be processed by the next module, the insect detection module (IDM). The IDM consists of low-level image processing operations (RGB to gray-scale transformation, image convolution, image differentiation, local maxima extraction, perceptual grouping, etc.) and relies on a rough prior geometric model of the insects of interest (i.e., a salient rectangular intensity profile) to extract the patterns likely to correspond to insects of interest. The insect classification module involves some additional processing in order to classify the extracted patterns as insects or as fake patterns generated either by noise or by illumination reflections. In order not to redo the detection of a previously detected insect, a (cheap) insect tracking module maintains a list of the insects already detected in the previous frames and updates it whenever a new insect is detected by the IDM and confirmed by insect classification. All these routines are repeated continuously during scheduled daytime, and the counting results are stored and analyzed in quasi-real time.
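The patch-voting step between the IPDM and the IDM can be sketched as follows; the patch size and the pixel-count threshold are illustrative values:

```python
import numpy as np

def vote_patches(insect_mask, patch_size=32, min_pixels=20):
    """Count insect-classified pixels per patch and return the patches that
    deserve processing by the insect detection module (IDM).

    insect_mask: boolean image output by the insect presence detection module.
    """
    h, w = insect_mask.shape
    selected = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            if insect_mask[y:y + patch_size, x:x + patch_size].sum() >= min_pixels:
                selected.append((y, x))
    return selected

patches = vote_patches(np.random.rand(480, 640) > 0.99)  # toy mask
```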
We have shown the feasibility of a video surveillance system for pest management in a greenhouse and managed to overcome most of the image processing challenges initially posed. We are currently testing and refining our algorithms. The prototype should be deployed soon in actual greenhouse sites of our INRA partners (Avignon) for further testing and validation.
This project aims at developing an intelligent system for the real-time surveillance of the plasma evolving in Tore Supra or other devices. The first goal is to improve the reliability of, and upgrade, the current real-time control system operating at Tore Supra. The ultimate goal is to integrate such a system into the future ITER imaging diagnosis. In this context, a first collaboration has recently started between the Plasma Facing Component group of CEA Cadarache and the Pulsar project-team. The goal is to detect events (expected or not) in real time, in order to control the power injection for the protection of the Plasma Facing Components (PFC). In the case of a known event, the detection must lead to the identification of this event; otherwise, a learning process is proposed to assimilate this new type of event. In this way, the objective of the project is twofold: machine protection and thermal event understanding. The system may take multimodal data as input: plasma parameters (plasma density, power injected by the heating antennas, plasma position...), infrared and visible images, and signals coming from other sensors such as spectrometers and bolometers. Recognized events are returned as output with their characteristics for further physical analysis. In this application, we benefit from the large amount of data accumulated over thousands of pulses and several devices. We rely on an ontology-based representation of the a priori domain expert knowledge of thermal events. This thermal event ontology is based on visual concepts and video event concepts useful to describe and recognize a large variety of events occurring in thermal videos.
New results in thermal event detection
This year, we have focused on the improvement and assessment of the thermal event detection system. As seen in Table , the proposed approach outperforms the previous system, which was based on the detection of threshold overruns in specified regions of interest. Improvements are visible both in terms of sensitivity (fewer false negatives than the previous system) and precision (no false positives). This work has been published in . An extended journal version has been accepted and will be published next year.
Antenna | no. of pulses | Annotated arcing events | TP (CS) | TP (PA) | FN (CS) | FN (PA) | FP (CS) | FP (PA)
C2 | 11 | 73 | 68 | 70 | 5 | 3 | 51 | 0
C3 | 7 | 17 | 11 | 13 | 6 | 4 | 7 | 0
C2 + C3 | 18 | 90 | 79 | 83 | 11 | 7 | 58 | 0
New results in thermal event understanding
In the case of a complex thermal event, we have studied learning techniques for modeling temporal behaviors. Our case study focuses on B4C flakes on the vertical edge of the Faraday screen, due to the flaking of the B4C coating as a consequence of the heating caused by fast ion losses. The temperature may exceed the acceptable threshold without apparent risk of damage. The recognition of a B4C flake is not evident and relies on a fine physical analysis of the hot spot's temporal behavior. Our goal is to build statistical models from training samples composed of positive and negative examples of B4C flakes. We use Hidden Markov Models trained with the temperature of detected hot spots and the injected power as inputs. Preliminary results are convincing, and further evaluation of the proposed approach needs to be pursued. Finally, this approach may be extended to the modeling and automatic recognition of other complex thermal events.
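A minimal sketch of this classification scheme, assuming the hmmlearn library and toy (temperature, injected power) sequences; the number of hidden states and the use of one HMM per class are illustrative choices, not necessarily ours:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_hmm(sequences, n_states=3):
    """Train one HMM on (temperature, injected power) sequences of one class."""
    X = np.vstack(sequences)
    lengths = [len(s) for s in sequences]
    return GaussianHMM(n_components=n_states, covariance_type="diag",
                       n_iter=50).fit(X, lengths)

# One model per class (positive/negative B4C-flake examples); a new hot-spot
# sequence is classified by comparing log-likelihoods. Toy data below.
pos = fit_hmm([np.random.rand(40, 2) for _ in range(5)])
neg = fit_hmm([np.random.rand(40, 2) for _ in range(5)])
sequence = np.random.rand(40, 2)
is_flake = pos.score(sequence) > neg.score(sequence)
```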
New software development
We are currently developing the Plasma Imaging data Understanding Platform (PInUP), dedicated to thermal event recognition and understanding. This platform is inspired by VSUP and is composed of several modules and knowledge bases. PInUP also embeds a dedicated tool for video annotation. The goal of this tool is twofold: first, to build an annotation base of observed thermal events with precise spatio-temporal information, and second, to retrieve from the resulting base useful information on thermal events, for instance for further PFC aging analysis (see Figure ).
This platform is going to be deployed at Tore Supra and will be used by physicists and the persons in charge of the infrared imaging diagnostics. Concerning the real-time detection of thermal events, we are currently implementing the most costly algorithms on an FPGA to meet real-time constraints. The real-time monitoring system should be operational in April 2010 and will work in parallel with the existing system at Tore Supra.
Human activity recognition requires the detection and tracking of people in often congested scenes captured by surveillance cameras. The common strategies used to detect objects at high frame rates rely on segmenting and grouping foreground pixels against a background scene captured by static cameras. The objects detected in a 3D calibrated environment are then classified according to predefined 3D models such as persons, luggage or vehicles. However, whenever occlusion occurs, objects are no longer classified as single individuals but are associated with a group of objects. Hence, standard tracking systems fail to differentiate objects within a group and often lose their tracks.
One way to handle occlusion issues is to use multiple cameras viewing the same scene from different locations, in order to cover the fields of view where occlusion occurs: when one camera fails to track a person, another camera takes over the tracking. People detection in difficult scenarios can also be improved by extracting local descriptors. In the Pulsar team, we have used Histograms of Oriented Gradients (HOG) to model human appearance and faces. The aim is to model body parts and poses in order to better define people's shapes. Figures show results obtained by the HOG face detector and tracking results from people detected by the HOG human detector.
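For illustration, a HOG descriptor for a candidate window can be computed with scikit-image as follows; the window size and HOG parameters are typical values, not necessarily the ones we use:

```python
import numpy as np
from skimage.feature import hog

window = np.random.rand(128, 64)  # toy grayscale candidate window (person-sized)
descriptor = hog(window,
                 orientations=9,           # gradient orientation bins
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
# `descriptor` would feed a person/non-person classifier (e.g. a linear SVM);
# sliding the window over the frame then yields detections.
```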
We are not only adding visual signatures to better track people in independent cameras, but also using visual signatures to allow people tracking across different cameras. These visual signatures would also allow us to re-identify people in more complex camera networks where fields of view do not overlap, e.g. in underground stations or airports. It is thus desirable to determine whether a given person of interest has previously been observed by other cameras in such a network. This constitutes the person re-identification issue. Our first re-acquisition algorithm combines Haar-like features with dominant colors extracted on mobile objects by the HOG-based human detector described above.
People's visual signatures are described by the Haar-like descriptors shown in figure and by dominant colors extracted from two regions, the upper and lower body parts, as shown in figures . The AdaBoost algorithm is adapted to take these visual signatures as input and to construct a model for each individual. We have tested our algorithm in a two non-overlapping camera scenario: 10 people from the CAVIAR database and 40 people from the TRECVid database.
Detecting mobile objects is an important task in many video analysis applications such as video surveillance, people monitoring, and video indexing for multimedia. Among the various object detection methods, those based on adaptive background subtraction, such as the Gaussian mixture model, the kernel density estimation method, and the codebook model, are the most popular. However, a background subtraction algorithm alone cannot easily handle various problems such as adapting to changes of environment, removing noise, or detecting ghosts. To help background subtraction algorithms deal with these problems, we have constructed a controller for managing object detection algorithms. Being independent from any particular background subtraction algorithm, this controller has two main tasks:
Supervising background subtraction algorithms to update their background representation.
Adapting parameter values of background subtraction algorithms to be suitable for the current conditions of the scene.
To supervise background subtraction algorithms in updating their background representation, the controller employs feedback from the classification and tracking tasks. With this feedback, the controller can ask the background subtraction algorithms to apply appropriate updating strategies for different blob types. For example, if the feedback from the classification and tracking tasks identifies a noise region, the controller asks the background subtraction algorithms to update the corresponding region quickly so that this noise does not occur again in the detection results. With the updating supervision of the controller, background subtraction algorithms can handle various problems such as removing noise, keeping track of objects of interest, managing stationary objects, and removing ghosts.
To adapt the parameter values of background subtraction algorithms, the controller first needs to evaluate the foreground detection results. This evaluation is realized with the help of the feedback from the classification and tracking tasks. Based on this evaluation, the background subtraction algorithm may change its parameter values to achieve better performance.
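An illustrative sketch of such a feedback-driven parameter update follows; the feedback counts, the single threshold parameter and the update rules are hypothetical, as the real criteria depend on the scene and the algorithm:

```python
def adapt_parameters(params, feedback, step=0.05):
    """Adjust a background-subtraction parameter from tracking/classification
    feedback; the counts and update rules are purely illustrative."""
    if feedback["noise_blobs"] > 10:     # many spurious blobs: be stricter
        params["threshold"] = min(1.0, params["threshold"] + step)
    if feedback["lost_objects"] > 0:     # objects of interest vanish: be looser
        params["threshold"] = max(0.0, params["threshold"] - step)
    return params

params = adapt_parameters({"threshold": 0.5},
                          {"noise_blobs": 12, "lost_objects": 0})
```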
The notion of shape is important in many fields of computer vision, from tracking to scene understanding. As with usual object features, it can be used as a prior, as in image segmentation, or as a source of information, as in gesture classification. When image classification or segmentation tasks require high discriminative power or precision, the shape of objects naturally appears relevant to our human minds. However, shape is a complex notion which cannot be dealt with directly like a simple parameter in R^n. Modeling shape manually is tedious, and one arising question is that of learning shapes automatically.
Shape evolutions, as well as shape matchings or image segmentation with a shape prior, involve the preliminary choice of a suitable metric in the space of shapes. Instead of choosing a particular one, we propose a framework to learn shape metrics from a set of example shapes, designed to handle sparse sets of highly varying shapes, since typical shape datasets, like human silhouettes, are intrinsically high-dimensional and non-dense. We formulate the task of finding the optimal metric on an empirical manifold of shapes as a classical minimization problem ensuring smoothness, and compute its global optimum quickly.
To achieve this, we design a criterion to compute point-to-point matchings between shapes which deals with topological changes. Then, given a training set of shapes, we use these matchings to transport deformations observed on any shape to any other one. Finally, we estimate the metric in the tangent space of any shape, based on the transported deformations, weighted by their reliability. We performed successful experiments on difficult sets; in particular, we considered the case of a girl dancing fast (figure ). For each shape from the training set, we estimate the most probable deformations (see figure ) that it can undergo. More precisely, we estimate the shape metric (deformation costs) that fits the training set best, which leads to a shape prior. We also proposed applications.
The novelty of this work is both theoretical and practical. On the theoretical side, usual approaches consist either in estimating a mean shape pattern and characteristic deformations, or in using kernels based on distances between shapes, whereas our framework is based on reliable deformations and transport, and we provide a criterion on metrics to be minimized. On the practical side, usual approaches require either low shape variability in the training set or a high sampling density, neither of which is affordable in practice. Our assumptions are much weaker, so we can deal with much more general datasets; for example, the framework is well suited to videos. This work was published in . The links between texture and shape are also being studied, following an approach developed in .
A tracking algorithm can provide satisfying results in some scenes and poor results in other real-world scenes. A performance evaluation measure is necessary to quantify how reliable a tracking algorithm is in a particular scene. Many types of metrics have been proposed to address this issue, but most of them depend on ground truth data to compare tracking results against. We propose in this work a new online evaluation method that is independent from ground truth data. We compute the quality (i.e. coherence) of the obtained trajectories based on a set of seven features. Based on how often each feature can be computed for a mobile object, these features are divided into two groups: "one-time features" and "every-time features". While one-time features can be computed only once for a mobile object (e.g. the temporal length of its trajectory, the zone where the object leaves the scene), every-time features can be computed at each frame during the tracked duration (e.g. color, speed, direction, area and shape ratio).
For each feature, we define a local score in the interval [0, 1] to determine whether the mobile object is correctly tracked. The quality of a trajectory is estimated by the combination of the local scores computed from the extracted features. The score decreases when the system detects a tracking error and increases otherwise.
Using the seven features, a global score, also in the interval [0, 1], is defined to evaluate online the quality of a tracking algorithm at each frame. When the global score is greater than 0.5, the tracking algorithm performance can be considered rather good; when it is lower than 0.5, the tracker generally fails to track the detected objects accurately.
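A minimal sketch of the score combination, assuming a weighted mean as the combination rule (one plausible reading of the summation of local scores described above) and the 0.5 threshold from the text; the feature names are illustrative:

```python
def global_score(local_scores, weights=None):
    """Weighted mean of per-feature local scores (each in [0, 1]); the weighted
    mean is one plausible reading of the combination described above."""
    if weights is None:
        weights = [1.0] * len(local_scores)
    return sum(w * s for w, s in zip(weights, local_scores)) / sum(weights)

scores = {"trajectory_length": 0.9, "exit_zone": 1.0, "color": 0.7,
          "speed": 0.8, "direction": 0.6, "area": 0.9, "shape_ratio": 0.8}
tracking_ok = global_score(list(scores.values())) > 0.5  # threshold from the text
```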
We have tested our approach on the video sequences of the CARETAKER project.
We propose a multiple object tracking algorithm that works under occlusions. First, for each detected object we compute feature points using the FAST algorithm . Second, for each feature point we build a descriptor based on Histograms of Oriented Gradients (HOG) . Third, we track feature points using these descriptors. Object tracking is possible even if objects are partially occluded. If several objects are merged and detected as a single one, we assign newly detected feature points in this single object to one of the occluded objects. We apply a probabilistic method for this task, using information from the previous frames such as object size and motion information (i.e. speed and orientation). We use multi-resolution images to decrease the processing time. Our approach has been tested on a synthetic video sequence and the public datasets KTH and CAVIAR. The preliminary tests confirm the effectiveness of our approach. This work has been published in .
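A rough sketch of the first three steps (FAST corners, HOG descriptors around each corner, nearest-descriptor matching between consecutive frames), using OpenCV and scikit-image; the patch size and matching threshold are illustrative assumptions:

```python
import cv2
import numpy as np
from skimage.feature import hog

fast = cv2.FastFeatureDetector_create(threshold=25)

def describe(gray, half=16):
    """FAST corners plus a HOG descriptor on the patch around each corner.

    gray: uint8 grayscale frame.
    """
    out = []
    for kp in fast.detect(gray, None):
        x, y = map(int, kp.pt)
        if half <= x < gray.shape[1] - half and half <= y < gray.shape[0] - half:
            patch = gray[y - half:y + half, x - half:x + half]
            d = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                    cells_per_block=(2, 2))
            out.append(((x, y), d))
    return out

def match(prev_desc, curr_desc, max_dist=0.5):
    """Greedy nearest-descriptor matching between consecutive frames."""
    pairs = []
    for p_pt, p_d in prev_desc:
        best = min(curr_desc, key=lambda c: np.linalg.norm(p_d - c[1]), default=None)
        if best is not None and np.linalg.norm(p_d - best[1]) < max_dist:
            pairs.append((p_pt, best[0]))
    return pairs
```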
An extension of this tracker has been proposed for crowd analysis. The real-time recognition of crowd dynamics in public places is becoming essential to avoid crowd-related disasters and ensure the safety of people. We introduce a new approach for crowd event recognition. Our study begins with the previous tracking method, based on HOG descriptors, and then uses pre-defined models (i.e. crowd scenarios) to recognize crowd events. We define these scenarios using statistical analysis of the datasets used in the experimentation. The approach is characterized by combining a local analysis with a global analysis for crowd behavior recognition: the local analysis is enabled by a robust tracking method, and the global analysis is done by a scenario modeling stage. This work has been published in .
We aim at recognizing gestures (e.g. hand raising) and, more generally, short actions (e.g. falling, bending) accomplished by an individual in a video sequence. Many techniques have already been proposed for gesture recognition in specific environments (e.g. a laboratory), using the cooperation of several sensors (e.g. a camera network, individuals equipped with markers). Despite these strong hypotheses, gesture recognition is still brittle and often depends on the position of the individual relative to the cameras. We propose to relax these hypotheses in order to conceive a general algorithm enabling the recognition of the gestures of an individual acting in an unconstrained environment and observed through a limited number of cameras. The goal is to estimate the likelihood of gesture recognition as a function of the observation conditions.
We propose a gesture recognition method based on local motion learning. First, for a given individual in a scene, we track feature points over the whole body to extract the motion of the body parts. We expect the feature points to be sufficiently distributed over the body to capture fine gestures. We have chosen corner points as feature points to improve the detection stage, and HOG (Histograms of Oriented Gradients) as descriptors to increase the reliability of the tracking stage. Thus, we track the HOG descriptors in order to extract the local motion of the feature points.
To recognize gestures, we propose to learn and classify gestures based on the k-means clustering algorithm and the k-nearest-neighbor classifier. For each video in a training dataset, we generate all local motion descriptors and annotate them with the associated gesture. Then, for each training video taken separately, the descriptors are clustered into k clusters using the k-means algorithm; the parameter k is set empirically. Each cluster is associated with its corresponding gesture, so similar clusters can be labeled with different gestures. Finally, with all generated clusters as a database, the k-nearest-neighbor classifier is used to classify the gestures occurring in the test dataset. A video is classified according to the number of neighbors which have voted for a given gesture, providing the likelihood of the recognition.
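A minimal sketch of this learning and classification scheme with scikit-learn; the descriptor dimension, k values and toy data are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def build_cluster_database(videos, k=10):
    """Cluster each training video's motion descriptors; every cluster inherits
    the video's gesture label (similar clusters may carry different labels)."""
    centers, labels = [], []
    for descriptors, gesture in videos:            # descriptors: (n, d) array
        km = KMeans(n_clusters=k, n_init=10).fit(descriptors)
        centers.append(km.cluster_centers_)
        labels.extend([gesture] * k)
    return np.vstack(centers), np.array(labels)

def classify(test_descriptors, centers, labels, n_neighbors=5):
    """Each test descriptor votes via its nearest clusters; the majority wins
    and the vote share gives a recognition likelihood."""
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(centers, labels)
    votes = knn.predict(test_descriptors)
    gestures, counts = np.unique(votes, return_counts=True)
    return gestures[np.argmax(counts)], counts.max() / counts.sum()

videos = [(np.random.rand(200, 32), "walk"), (np.random.rand(200, 32), "jump")]
centers, labels = build_cluster_database(videos)
gesture, likelihood = classify(np.random.rand(50, 32), centers, labels)
```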
We demonstrate the effectiveness of our motion descriptors by recognizing the actions of KTH and IXMAS public databases. This work has been published in , .
Participants: Nadia Zouba, Valery Valentin, Bernard Boulay, François Brémond, Monique Thonnat
In the framework of monitoring elderly activities at home, we have proposed an approach that combines heterogeneous sensor data: data provided by video cameras is combined with data provided by environmental sensors to monitor the interaction of people with their environment.
In this work we have put a strong effort into event modeling. The result is 100 models representing our knowledge base of events for home care applications. This knowledge base can be reused in other applications in the same domain.
In this approach we have also proposed a sensor model able to give a coherent representation of the information provided by various types of physical sensors. This sensor model includes the uncertainty of sensor measurements.
The approach is used to define a behavioral profile for each person and to compare these behavioral profiles. The first step in establishing the behavioral profile of an observed person is to determine his/her daily activities. This behavioral profile is defined as the set of the most frequent and characteristic (i.e. interesting) activities of an observed person. The basic goal of determining a behavioral profile is to measure variables of persons during their daily activities in order to capture deviations in activity and posture, so as to facilitate timely intervention or provide automatic alerts in emergency cases.
In order to evaluate the whole proposed activity monitoring framework, several experiments have been performed. The main objectives of these experiments are to validate the different phases of the activity monitoring framework, to highlight interesting characteristics of the approach, and to evaluate the potential of the framework for real world applications.
The results of this approach are shown for the recognition of Activities of Daily Living (ADLs) of real elderly people living in an experimental apartment (Gerhome laboratory) equipped with video sensors and environmental sensors.
Results comparing volunteer 1 (a 64-year-old male) and volunteer 2 (an 85-year-old female), observed during 4 hours, show the greater ADL ability of the 64-year-old adult compared to that of the 85-year-old:
Volunteer 1 (64 years) changed zones more often than volunteer 2 (85 years) (20 vs. 13 for "entering livingroom"), and did so at a quicker pace (00:01:15 vs. 00:02:42), showing a greater ability to walk.
Volunteer 1 was more often seen "sitting on chair" (15 vs. 4), but volunteer 2 was "sitting on chair" for a longer duration (03:30:29 vs. 01:36:43), also showing a greater ability of volunteer 1 to move around the apartment.
Volunteer 1 used the "upper cupboard" more than volunteer 2 (22 vs. 9), and in a quicker way (00:00:57 vs. 00:04:43).
Volunteer 1 was more able to use the stove (fewer trials for "using stove": 35 vs. 106).
Similarly, volunteer 1 was "bending" twice as much as volunteer 2 (30 vs. 15), and in a quicker way (00:00:03 vs. 00:00:12), showing the greater dynamism of the younger volunteer.
More details about the proposed approach and the obtained results are described in , and in .
In the current work, the proposed activity recognition approach was evaluated in the experimental laboratory (Gerhome) with fourteen real elderly people. The next step of this work is to test the approach in a hospital environment (see the next section), involving more people with different wellness and health statuses (e.g. Alzheimer patients).
Participants: Rim Romdhame, Daniel Zullo, Nadia Zouba, Bernard Boulay, François Brémond, Monique Thonnat
We propose to develop a framework for monitoring Alzheimer patients as a continuation of the ADL monitoring framework. The basic goal is to determine the behavioral profiles of Alzheimer patients and to evaluate these profiles. With the help of doctors, a specific scenario has been established to evaluate the behaviors of Alzheimer patients.
Some experiments have been performed in a room of the CHU of Nice equipped with video cameras, where elderly people and medical volunteers have spent between 15 minutes and 1 hour:
1 Alzheimer Volunteer (80 years old)
3 Elderly Volunteers (64-85 years old)
5 Young Volunteers (20-30 years old)
1 medical staff Volunteer (25-30 years old)
The second goal of this framework is to handle the uncertainty of event recognition. Most previous approaches able to recognize events while handling uncertainty are 2D approaches, which model an activity as a set of pixel motion vectors. These 2D approaches can only recognize short, primitive events and cannot address composite events. We propose a video interpretation approach based on uncertainty handling. The main goal is to improve the techniques of automatic video data interpretation by taking into account the imprecision of the recognition. To attain this goal, we have extended the event recognition approach described by T. Vu by modeling the scenario recognition uncertainty and computing the precision of the 3D information characterizing the mobile objects moving in the scene. We use the 3D information to compute the spatial probability of an event, and we compute the temporal probability of an event based on its spatial probability at the previous instant. This approach is validated on a homecare application which tracks elderly people living at home and recognizes events of interest specified by gerontologists.
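For illustration only, one plausible instantiation of the temporal update described above; the linear blend and the weight alpha are assumptions, not the actual formula used:

```python
def event_probability(spatial_prob, prob_prev, alpha=0.7):
    """Blend the spatial probability at time t with the event probability at
    t-1; the linear blend and the weight alpha are illustrative assumptions."""
    return alpha * spatial_prob + (1.0 - alpha) * prob_prev

prob = 0.0
for spatial in [0.2, 0.6, 0.8, 0.9]:   # toy per-frame spatial probabilities
    prob = event_probability(spatial, prob)
recognized = prob > 0.5
```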
Trajectory information is a rich descriptor which can produce essential information for activity learning and understanding. Our work on the analysis of underground stations with trajectory clustering has shown that the trajectory patterns associated with the clusters are indicative of specific behaviors and activities. Our new studies explore the application of trajectory clustering to two new domains (with different behavior types): 1) monitoring of elderly people at home; 2) monitoring the ground activities at an airport dock station.
1) Monitoring of elderly people at home.
In this work we propose a framework to recognize and classify loosely constrained activities with minimal supervision. The framework takes basic trajectory information as input and goes up to video interpretation. The work reduces the gap between low-level information and semantic interpretation by building an intermediate layer of Primitive Events. The proposed representation for primitive events aims at capturing small coherent units of motion over the scene, with the advantage of being learnt in an unsupervised manner. We propose to model an activity using Primitive Events as the main descriptors. The activity model is built in a semi-supervised way using only real tracking data.
The approach is composed of five steps. First, people are detected and tracked in the scene using a region-based tracking algorithm, and their trajectories are stored in a database. Second, the topology of the scene is learnt, as the regions (Slow Regions) where the person usually stands or stops to interact with fixed objects in the scene. Third, the transitions between these Slow Regions are computed by cutting the observed person's trajectory. These transitions correspond to short units of motion and can be seen as basic elements constituting more complex activities. Fourth, a coarse activity ground-truth is manually annotated on a reference video corresponding to the first monitored person; Primitive Event histograms are computed and labelled by matching them against this ground-truth. Fifth, using these labelled Primitive Event histograms, the activities of a second monitored person can be automatically recognized.
We validate the approach by recognizing and labelling modeled activities in a homecare application (Gerhome project). The video datasets capture the living room and kitchen of an apartment; each contains an elderly person performing activities such as "the person is eating". These activities are learnt and discovered using different video datasets. This work has been published in .
2) Monitoring the ground activities at an airport dockstation.
In this work we employ trajectory-based analysis for activity extraction from apron monitoring at the Toulouse airport in France. We aim at helping infrastructure managers: for everyday operation we provide environmental figures, which include the location and number of people in the monitored areas (occupancy map), as well as the activities themselves. We have thus built a system to 1) learn which monitored areas are normally occupied, and then 2) perform activity pattern discovery with interpretable semantics.
Trajectory clustering is employed mainly to discover the points of entry and exit of the mobile objects appearing in the scene. Proximity relations between the resulting clusters of detected mobile objects, as well as between clusters and contextual elements of the scene, are used to build the occupancy zones and to characterise the different ongoing activities of the scene. We study the scene activity at different granularities, which describe the activity either in broad terms or with detailed information, thus managing different information levels. By including temporal information we are able to find spatio-temporal patterns of activity. Thanks to an incremental learning procedure, the system is capable of handling large amounts of data. We have applied our algorithm to five video datasets corresponding to different monitoring instances of an aircraft in the airport docking area, i.e. about five hours of analysed video. Figure shows occupancy zones at the system-selected information levels for scene activity reporting. We elaborate activity maps with a semantic description of the discovered zones (and thus of the associated activities). From the analysed sequences, we were able to recognize activities such as 'GPU arrival', 'Loading' and 'Unloading'.
This year Pulsar has developed a new software platform, the SUP platform. It is the backbone of the team's experiments, implementing the new algorithms proposed by the team in perception, understanding and learning. We study a meta-modeling approach to support the development of video surveillance applications based on SUP. We also introduce the notion of relation in our knowledge description language. We study the development of a scenario recognition module relying on formal methods to support activity recognition in the SUP platform. We have begun to study the definition of multiple services for a device-adaptive platform for scenario recognition. We continue to develop the Clem toolkit around a synchronous language dedicated to activity recognition applications.
SUP Software Platform
SUP is designed as a framework in which several video surveillance workflows can be implemented. Currently, the workflow is static for a given application. A given workflow is the composition of several plugins, each of them implementing an algorithmic step of the video processing (e.g. image segmentation, object classification).
The design of SUP allows the selected plugins to be executed at runtime. Currently 15 plugins are available (a minimal sketch of such a plugin composition follows the list):
6 plugins are wrappers around industrial implementations of algorithms (made available by Keeneo). They allow a quick deployment of a video processing chain encompassing image acquisition, segmentation, short-term and long-term tracking. These algorithms are robust and efficient, with the drawback that some of them can lack accuracy.
9 are implementations by team members, covering the following fields:
one segmentation algorithm removing shadows;
two classifiers, one based on postures and one on people detection;
four frame-to-frame trackers, based respectively on (i) simple tracking by overlap, (ii) neural networks, (iii) tracking of feature points, and (iv) tracking specialized for persons in a crowd;
two scenario recognizers, one generic, allowing probabilities to be attached to the recognized events, and the other one focused on the recognition of posture-based events.
From a software engineering point of view, the goal is to obtain a dynamically reconfigurable platform, as described in the next section.
This year we have explored how model-driven engineering techniques can support the configuration and dynamic adaptation of the video surveillance systems designed with our SUP platform. This work is done in collaboration with the MODALIS team of UNSA/CNRS I3S laboratory.
In the video surveillance community, the focus has moved from individual vision algorithms to integrated and generic software platforms (such as SUP), and now to the security, scalability, evolution, and ease of use of these platforms. These latest trends require a modeling effort for video surveillance component platforms as well as for application specification. Our approach is to apply modeling techniques to the application specification (describing the video surveillance task and its context) as well as to the implementation (assembling the software components). Model-driven engineering (MDE) uses models to represent partial or complete views of an application or a domain, possibly at different abstraction levels, and offers techniques to transform a source model into a target one.
The number of different tasks (such as detection, counting, or tracking), the complexity of contextual information, and the relationships among them induce many possible variants. The first activity of a video surveillance application designer is to sort out these variants to precisely specify the function to realize and its context. In our case study, the underlying software architecture is component-based (SUP). The processing chain consists of components that transform data before passing it to other components. As a result, the designer has to map this specification to software components that implement the needed algorithms.
The challenge is to cope with the many causes of variability, functional as well as nonfunctional, on both sides (specification and implementation). Hence, we first decided to separate these two concerns. We then applied domain engineering to identify the reusable elements on both sides. This led to two models: a generic model of video surveillance applications (for short, application model) and a model of video processing components and chains (component platform configuration model, for short, platform model). Both of them are feature models expressing variability factors. Feature models are a popular formalism used to model the commonalities and variabilities of software product lines. They compactly define all the features of a product line and their valid combinations. A feature model is basically an AND-OR graph with constraints which hierarchically organizes a set of features while making the variability explicit. Our models are also enriched with intra- and inter-model constraints. Inter-model constraints specify how the system should adapt to changes in its environment. It is convenient to use the same kind of models on both sides, leading to a uniform syntax. Feature models are appropriate to describe variants; they are simple enough for video surveillance experts to express their requirements, yet powerful enough to be amenable to static analysis . In particular, the inter-feature constraints can be analyzed as a SAT problem.
The application model describes the relevant concepts and features from the stakeholders' point of view, in a way that is natural in the video surveillance domain: characteristics and position of sensors, context of use (day/night, in/outdoors, target task), etc. It also includes notions of quality of service (performance, response time, detection robustness, configuration cost...). Such a model provides a user-friendly way to specify a problem for SUP. The platform model describes the different software components and their assembly constraints (ordering, alternative algorithms...). Ultimately, we wish to automatically generate a SUP component assembly from an application specification, using model-to-model transformations. The first model can be transformed into one or several valid component configurations, in order to map the specification onto software components that implement the needed algorithms. Due to the multiple causes of variability, the result of the transformation is usually a set of possible component assemblies fulfilling the task and context specification; hence the designer has to manually fine-tune the assembly. This approach allows designers to define the static initial configuration of a video surveillance system .
Concretely, we have developed a generic feature diagram editor to manipulate the models, using ECLIPSE meta-modeling facilities (EMF, ECORE, GMF...). At this time, we have a first prototype that allows us to represent both models, although it only supports natural-language constraints. We have experimented with the KERMETA workbench to implement some model-to-model transformations, and we have developed an interface with SAT tools to verify the generated configurations.
An additional challenge is to manage the dynamic variability of the context, to cope with possible run-time changes of implementation triggered by context variations (e.g. lighting conditions, changes in the reference scene, etc.). Video surveillance systems are indeed a good example of dynamically adaptive systems, i.e. software systems which have to adapt dynamically in order to cope with a changing environment. Such systems must be able to sense their environment, to autonomously select an appropriate configuration and to efficiently migrate to this configuration. Handling these issues at the programming level proves challenging due to the large number of contexts and software configurations. The use of models at run-time is an alternative solution. In our approach, the adaptation logic is defined by adaptation rules, corresponding to the dependency constraints between specification elements in one model and software variants in the other . A context change at run-time corresponds to a new configuration of the application model and must lead to a new configuration of the platform model. Adaptation rules use the constraint language of feature models (i.e. a propositional-logic-based language). Their conditions address features possibly connected with “and”, “or”, “not”; their actions correspond to changes in the processing chain. As an example, the following adaptation rule:
Night and HeadLight implies HeadLightDetection
states that if the context changes (from day) to night and if headlights (e.g. of vehicles) must be taken into account, a new component from the platform (the HeadLightDetection module) must be integrated into the running processing chain. To ensure its usability, the proposed approach has been built on top of the Domain Specific Modeling Language (DSML ) for adaptive systems.
This research axis contributes to the study and the development of a module to analyze scenarios.
This year we have studied models of scenarios dealing with both real time (to be realistic and efficient in the analysis phase) and logical time (to benefit from well-known mathematical models allowing re-usability, easy extension and verification). Scenarios are mostly used to specify the way a system may react to sensor inputs. Therefore, models of scenarios must also take into account the uncertainty of sensor results. To address these needs (logical time, real time and uncertainty) we have defined a language to express scenarios as compositions of sub-scenarios. Basic scenarios are composed of events, while general scenarios are expressed as compositions of temporal relations (before, during, overlap) between sub-scenarios. Temporal constraints can also be expressed. Moreover, the language supports the definition of external types and functions, so that scenarios can handle events of different kinds.
For behavior recognition, as for all automated systems, validation is a crucial phase, and an exhaustive validation approach is clearly needed. To be trusted, behavior analysis must rely on formal methods from the very beginning of its design. Formal methods help to produce sound code whose size and frequency can be estimated. Hence, we defined a synchronous model for our scenario language. This modeling represents scenarios as equation systems which compute two Boolean variables for each scenario: beginning, true when the first event(s) or sub-scenario(s) of the scenario is recognized, and termination, true when the whole scenario is recognized. This theoretical approach leads to the definition of a scenario analysis module (SAM). This module provides users with (1) a simulation tool to test scenario behaviors; (2) a recognition program in C for each scenario, to be completed by the user's definition of external types and functions in a C or C++ environment; (3) an exhaustive verification of safety properties, relying on the model-checking techniques our approach allows. The latter also offers the possibility to define the safety properties to be proved as “observers” expressed in the scenario language.
Activity recognition and monitoring systems based on multi-sensor and multi-device approaches are more and more popular to enhance event production for scenario analysis. The underlying software and hardware infrastructures can be considered as static (no changes during the overall recognition process), quasi-static (no changes between two reconfigurations of the process) or truly dynamic (depending on the dynamic appearance and disappearance of numerous sensors and devices in the scene, communicating with the system during the recognition process).
In this last case, we need to partially and reactively adapt the application to the evolution of the environment, while preserving invariants required for the validity of the recognition process.
In order to address such a challenge, our research tries to federate the inherent constraints of platforms devoted to activity recognition, like SUP, with a service-oriented middleware approach, to deal with dynamic evolutions of the system infrastructure. Recent results, using a Service Lightweight Component Architecture (SLCA) to compose services for devices and Aspects of Assembly (AA) to adapt them in a reactive way , present interesting prospects for dealing with multi-device and variable systems. They provide a user-friendly, separated description of the adaptations, which are applied and composed at runtime as soon as the corresponding required devices are present (for example in a context-sensitive security middleware layer ). They also exhibit performance and response times that allow reactive adaptation upon appearance and disappearance of devices. However, although the composition of these adaptations can verify proved properties, the use of black-box components in the SLCA composition model does not allow a model of their behavior to be extracted. Thus, existing approaches do not really ensure that the usage contract of these components is not violated during application adaptation. Only a formal analysis of the component behavior models, together with a sound modeling of the composition operation, will allow us to guarantee that these usage contracts are respected.
In this axis, we propose to rely on a synchronous modeling of component behavior and component assembly to allow the usage of model checking techniques to formally validate services composition.
We began to consider this topic in 2008 through a collaborative action (SynComp) between the Rainbow team at the University of Nice Sophia Antipolis and the INRIA Pulsar team. Within the Rainbow team, the SLCA/AA experimental platform, called WComp, is dedicated to the reactive adaptation of applications in the domain of ubiquitous computing. During this collaboration, the management of concurrent accesses in WComp has been studied as a main source of disturbance of the invariant properties, and a synchronous model of the behavior of components and of their accesses has been defined in WComp.
This approach allows us to benefit from model-checking techniques to ensure that WComp components have no unpredictable states under concurrent access. This year, during his training period, Vivien Fighiera (already involved in the SynComp action) completed the theoretical work done in SynComp: he studied how to prove safety properties of WComp component models relying on the NuSMV model checker. The collaboration between Rainbow and Pulsar has also been strengthened, since Jean-Yves Tigli has been a full-time researcher in the Pulsar team since September, on a sabbatical year sponsored by INRIA. We now plan to model the overall assembly of WComp components with a synchronous approach, to allow the use of model-checking techniques to formally validate application design. In order to obtain results based on experimental scenarios for evaluating the SynComp improvements to adaptive recognition processes, we plan to integrate the SUP platform as a software service provider.
The SUP platform gathers a set of modules devoted to designing applications in the domain of activity recognition, while WComp is aimed at assembling services which evolve in a dynamic and heterogeneous environment. Indeed, the services provided by SUP can be seen as complex high-level services whose functionalities depend on the SUP processing, the latter dealing with the dynamic changes of the environment. Thus, considering SUP services as web services for devices, for example, the devices associated with SUP services will be discovered dynamically by WComp and used together with other heterogeneous devices.
This research axis concerns the theoretical study of a synchronous language le with modular compilation, and the development of a toolkit around the language to design, simulate, verify and generate code for programs.
The le language agrees with the Model-Driven Software Development philosophy, which is now well known as a way to manage complexity, to achieve a high level of re-use, and to significantly reduce the development effort. We therefore benefit from a formal framework well suited to compilation and formal validation. In practice, we defined two semantics for le: a behavioral semantics, defining a program by the set of its behaviors and thus avoiding ambiguities in program interpretations; and an equational semantics, allowing the modular compilation of programs into software and hardware targets (C code, VHDL code, FPGA synthesis, observers...). Our approach fulfills two main requirements of critical realistic applications: modular compilation to deal with large systems, and a model-based approach to perform formal validation.
The main originality of this work is its ability to manage both modularity and causality. Indeed, only a few approaches consider modular compilation, because there is a deep incompatibility between causality and modularity. Causality means that for each event generated in a reaction, there is a causal chain of events leading to this generation; no causal loop may occur. Program causality is a well-known problem with synchronous languages and must therefore be checked carefully. Relying on semantics to compile a language ensures a modular approach, but requires the compilation process to be completed with a global causality check. To tackle this problem, we introduced a new way to check the causality of a program from its already-checked sub-programs, from which we derive our modular approach.
This year we focused on the algorithms to check causality. To compile le programs we rely on an equational semantics that translates each program into an equation system. A program is causal (i.e. it has no causality cycle) if its associated equation system has no dependency cycle between its variables. Thus, checking causality amounts to finding an evaluation order for equation systems. Usually, several total orders are valid when sorting an equation system, and choosing one particular order prevents modularity: starting from two totally ordered equation systems may result in a falsely cyclic equation system when merging them. To ensure modularity, we defined a sorting algorithm that computes all the valid partial orders of an equation system, and we completed our approach by also defining an algorithm to merge two previously sorted equation systems. These algorithms are the cornerstone of our approach. This year we defined the merge algorithm and proved (1) that our sorting algorithm computes the greatest fixed point of a function defined on the dependency graph of equation systems; (2) that the merge algorithm is correct (i.e. we get the same partial orders whether we apply the merge algorithm to two previously sorted equation systems or sort the union of the two equation systems).
On the other hand, this year we have extended the language to support data handling. According to the le principle, only signals carry data. Data belong to predefined types (the usual programming language types) or to some external types. We improved the syntax to allow data definition for signals; we extended the internal exchange format (lec), defined to support modularity, to take values into account; we improved the compiler of the language to deal with signal data; and finally, we improved the code generation. The integration of data must now be achieved in the simulator and in the code we generate to feed the NuSMV model checker.
The Pulsar team has strong collaborations with industrial partners through European projects and national grants. In particular with STMicroelectronics, Bull, Thales, Sagem, Alcatel, Bertin, AKKA, Metro of Turin (GTT), Metro of Paris (RATP) and Keeneo.
The Pulsar team has been involved this year in two European projects: a project on multimedia information processing (ViCoMo) and a project on machine learning and activity monitoring (COFRIEND).
ViCoMo is an ITEA 2 European project which started on the 1st of October 2009 and will last 36 months. This project concerns advanced video-interpretation algorithms on video data typically acquired with multiple cameras. ViCoMo focuses on the construction of realistic context models to improve the decision making of complex vision systems and to produce faithful and meaningful behavior. The context of an event (a crime, a group activity, or a computer-aided diagnosis) can be found with multiple sensors, such as a multi-camera set-up in a surveillance system. The general goal of the ViCoMo project is thus to find the context of events captured by the cameras or image sensors, and to model this context such that reliable reasoning about an event can be established. Hence, modeling events and their 3D surroundings helps to recognize the behavior of persons, objects and events in a 3D view.
The project is executed by a strong international consortium including large high-tech companies (e.g. Philips, Acciona, Thales) and smaller innovative SMEs (CycloMedia, VDG Security), complemented with relevant research groups and departments from well-known universities (TU Eindhoven, University of Catalonia, Free University of Brussels) and research institutes (INRIA, CEA List, Multitel). The participating countries are France, Spain, Finland, Turkey, Belgium and the Netherlands.
COFRIEND is a European project in collaboration with Akka, the University of Hamburg (Cognitive Systems Laboratory), the University of Leeds, the University of Reading (Computational Vision Group) and Toulouse-Blagnac Airport. It began in February 2008 and will last 3 years. The main objective of this project is to develop techniques to automatically recognise and learn all servicing operations around aircraft parked on aprons.
Pulsar is involved in an academic collaboration with MICA and University of Hanoi in Vietnam (a joint PhD has been defended at the beginning of 2009).
Pulsar has been cooperating with MICA, the Multimedia Research Center in Hanoi, on semantics extraction from multimedia data, notably through the joint supervision, by A. Boucher and M. Thonnat, of Thi Lan Le's PhD on video retrieval (funded by an AUF grant). Thi Lan Le defended her PhD in February 2009 on the topic Semantic-based Approach for Image Indexing and Retrieval , .
Pulsar is collaborating with Guillermo Sapiro's team at the University of Minnesota on the design of new shape distances and shape statistics. Guillaume Charpiat visited Sapiro's team in Minneapolis for three weeks in July.
The Pulsar team has six national grants: the first two concern the team's involvement in a “pôle de compétitivité” and an ANR project on video surveillance; two projects concern the long-term monitoring of people at home. We also continue our collaborations with INRA, with STMicroelectronics and with Ecole des Mines de Paris.
Pulsar is strongly involved in the SYSTEM@TIC “pôle de compétitivité”, and in particular in the SIC project (Sécurité des Infrastructures Critiques), a strategic initiative in perimeter security. The SIC project is funded for 42 months, with industrial partners including Thales, EADS, BULL, SAGEM, Bertin and Trusted Logic.
Pulsar is participating in an ANR research project on intelligent video surveillance and people biometrics. The project lasts 3 years and will end in February 2011. The partners involved are: Thales-TSS, EURECOM, TELECOM and Management SudParis, UIC, Metro of Paris (RATP), DGA, STSI.
Pulsar cooperates with Vista (INRIA Rennes - Bretagne Atlantique), INRA Avignon UR407 Pathologie Végétale and CREAT Research Center (Chambre d'Agriculture des Alpes Maritimes) in an ARC project (BioSERRE) for early detection of crop pests, based on video analysis and interpretation.
Pulsar is involved in an Exploratory Action called MONITORE for the real-time monitoring of imaging diagnostics to detect thermal events in torus plasmas. This work prepares the design of the future ITER nuclear reactor and is done in partnership with the Plasma Facing Component group of the IRFM/CEA at Cadarache. This action is supported by EFDA (European Fusion Development Agreement) through a research fellowship attributed to V. Martin. Through this action, Pulsar is also a member of the FR-FCM (French federation for research on controlled magnetic fusion).
Pulsar has a collaboration with CSTB (Centre Scientifique et Technique du Bâtiment) and the Nice City Hospital (Groupe de Recherche sur la Trophicité et le Vieillissement) in the CIU Santé project, funded by DGCIS. The CIU Santé project is devoted to experimenting with and developing techniques for the long-term monitoring of elderly people at home. In this project an experimental room has been set up in the Nice hospital, relying on the Pulsar team's research on event recognition.
We have started the SWEET-HOME project, an ANR TECSAN French project running for 3 years from November 1st, 2009 to 2012, on the long-term monitoring of elderly people at hospital, with Nice City Hospital, Actis Ingenierie, the MICA Center (CNRS unit UMI 2954) in Hanoi, Vietnam, the SMILE Lab at National Cheng Kung University, Taiwan, and National Cheng Kung University Hospital. The INRIA grant is 240 k€ out of 689 k€ for the whole project. The SWEET-HOME project aims at building an innovative framework for modeling activities of daily living (ADLs) at home. These activities can help assess the evolution of an elderly person's disease (e.g. Alzheimer, depression, apathy) or detect precursors such as unbalanced walking, reduced speed and walked distance, psychomotor slowness, frequent sighing and frowning, or social withdrawal resulting in increased indoor hours. The project focuses on two aspects related to Alzheimer's disease: (1) assessing the patient's ability to take initiative and whether the patient is involved in goal-directed behaviors; (2) assessing walking disorders and the potential risk of falls. With this focus, the goal is to collect and combine multi-sensor (audio-video) information to detect activities and assess behavioral trends, in order to provide user services at different levels. In this project, experimental rooms at the Nice-Cimiez Hospital are used for monitoring Alzheimer patients.
This year Pulsar has completed a cooperation with STMicroelectronics: a PhD thesis on the design of intelligent cameras including gesture recognition algorithms was successfully defended on the 28th of October 2009 (Mohammed Bécha Kaâniche).
A cooperation took place with IFP (French Petrol Institute) and Ecole des Mines de Paris in the framework of the joint supervision, by M. Thonnat and M. Perrin, of Philippe Verney's PhD at IFP. This thesis, defended in September 2009, is entitled Interprétation géologique de données sismiques par une méthode supervisée basée sur la vision cognitive (geological interpretation of seismic data by a supervised method based on cognitive vision).
Another collaboration with Ecole des Mines de Paris started in October 2006 through the joint supervision, by A. Ressouche and V. Roy, of Lionel Daniel's PhD at CMA. The topic is: “Principled paraconsistent probabilistic reasoning - applied to scenarios recognition and voting theory”.
Keeneo (http://
Monique Thonnat is a member of the editorial board of the journal Image and Vision Computing (IVC), and is co-editor of a special issue of the journal Computer Vision and Image Understanding (CVIU).
Monique Thonnat is a reviewer for the journals Computer Vision and Image Understanding (CVIU) and Machine Vision and Applications (MVA).
Monique Thonnat is a Program Committee member for the following conferences: CVPR09 (22nd IEEE International Conference on Computer Vision and Pattern Recognition), ICCV09 (12th IEEE International Conference on Computer Vision), TAIMA09 (6th workshop on information processing and analysis: methods and applications), and CVPR10 (23rd IEEE International Conference on Computer Vision and Pattern Recognition).
Monique Thonnat is an expert reviewer of the IST Cognitive Systems project eTRIMS for the European Commission, and for the CIBLE 2009 programme of the Région Rhône-Alpes.
Monique Thonnat is a reviewer for the following theses: Robert Lundh (PhD, Univ. Orebro, Sweden), Wolfgang Ponweiser (PhD, TU Wien, Austria), Regis Clouard (HDR, Univ. Caen), Sebastien Derivaux (Univ. Strasbourg).
Monique Thonnat had an invited talk at the Technical University of Vienna, Austria, on the 23rd of November, on Semantic Activity Recognition.
Monique Thonnat had an invited talk at CEA in Cadarache on the 4th of April 2009, on activity recognition based on vision systems.
Monique Thonnat had an invited talk at the INTECH seminar "Santé, Maintien à Domicile" in Grenoble on April the 28th.
Monique Thonnat has been a member of the scientific board of ENPC (Ecole Nationale des Ponts et Chaussées) since June 2008.
Monique Thonnat was a member of the scientific board of INRIA Sophia Antipolis (bureau du comité des projets) from September 2005 to September 2009.
Monique Thonnat was president of the hiring committee for junior scientists (CR1 and CR2) at INRIA Sophia-Méditerranée.
Monique Thonnat has been deputy scientific director in charge of the domain Perception, Cognition and Interaction at INRIA since September 2009.
Monique Thonnat and F. Brémond are co-founders and scientific advisors of Keeneo, the videosurveillance start-up created to exploit Pulsar research results on the VSIP/SUP software.
François Brémond is reviewer for the journals: IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Intelligent Systems, Pattern Recognition Letters, IEEE Transactions on Multimedia, Computer Vision and Image Understanding, Artificial Intelligence Journal, Transactions on Systems, Man, and Cybernetics, Sensors journal, Machine Vision and Applications Journal.
François Brémond is Program Committee member of ACM Multimedia 2009 Workshop Multimedia in Forensics, International Conference on Imaging for Crime Detection and Prevention (ICDP09), Tracking Humans for the Evaluation of their Motion in Image Sequences (THEMIS 2009), Pattern Recognition and Artificial Intelligence for Human Behaviour Analysis (PRAI4HBA), International Conference on Computer Vision Systems (ICVS 2009), Visual Surveillance (VS2009).
François Brémond is a reviewer for the conferences and workshops: International Conference on Computer Vision (ICCV09), British Machine Vision Conference (BMVC09), Asian Conference on Computer Vision (ACCV 2009), Computer Vision and Pattern Recognition CVPR09.
François Brémond had an invited talk at the “Etats généraux personnes âgées” in October 2009 in Marseille.
François Brémond was a reviewer for the PhD defenses of Fida El Baf (La Rochelle University), Pau Baiget (Computer Vision Center, Autonomous University of Barcelona) and Baptiste Hemery (ENSICAEN - GREYC).
François Brémond is an ANR reviewer for the 2009 edition of Project Call: "Concepts, Systèmes et Outils pour la Sécurité Globale".
François Brémond is an Expert for EC INFSO in the framework of Ambient Assisted Living FP7.
François Brémond was an area chair at the IEEE Advanced Video and Signal Based Surveillance (AVSS 2009) conference, Genova, Italy, 2-4 September 2009.
François Brémond is the organizer of the “Journée SOOS de la SFO: Concepts Avancés de Réseaux de Caméras Optroniques pour la Surveillance Automatisée”, 15 October 2009, Paris.
François Brémond is a member of the 2nd European Network for the Advancement of Artificial Cognitive Systems, Interaction and Robotics (EUCogII network).
Sabine Moisan is a member of the Scientific Council of INRA for Applied Computing and Mathematics (MIA Department).
Jean-Paul Rigault is a member of AITO, the steering committee for several international conferences including in particular ECOOP. He is also a member of the Administration Board of the Polytechnic Institute of Nice University.
Annie Ressouche is a member of the INRIA Coopérations Locales de Recherche (COLOR) committee.
Guillaume Charpiat is organizing a working group about shapes between several teams in INRIA Sophia-Antipolis.
Guillaume Charpiat is a reviewer for the journals: the International Journal of Computer Vision (IJCV), Transactions on Pattern Analysis and Machine Intelligence (TPAMI), the Journal of Mathematical Imaging and Vision (JMIV), Computer Vision and Image Understanding (CVIU) and Medical Image Analysis.
Guillaume Charpiat's work on image colorization is the subject of a popular science article published by Interstices.
Vincent Martin is a reviewer for the journals: Transactions on Instrumentation & Measurement (TIM), Computers & Electronics in Agriculture (COMPAG).
Jean-Yves Tigli is an ACM member and program chair of the ACM International Mobility Conference 2009.
Jean-Yves Tigli is member of the expert committee of the Sectoral consulting group (GCS3) of the Ministry for Higher Education and Research, DGRI A3, on "Ambient Intelligence" since 2009.
Jean-Yves Tigli had an invited talk on "Middleware for Ubiquitous Computing" at DAAD (German Academic Exchange Service) CTDS' 09, Summer School on Current Trends in Distributed Systems, 24 - 26 September, Gammarth, Tunisia.
Pulsar is a hosting team for the Master of Computer Science of UNSA.
Teaching in the EURECOM Master program, course on Video Understanding (3 h, F. Brémond).
Teaching in the Master of Computer Science at the Polytechnic School of Nice Sophia Antipolis University, course on Synchronous Languages and Verification (24 h, A. Ressouche).
Teaching in the “Module Général d'Intégration” at Ecole des Mines de Paris, course on the SCADE tool and verification (6 h, A. Ressouche).
Jean-Paul Rigault is full professor at the Polytechnic School of Nice Sophia Antipolis University (Computer Science Department).
Jean-Yves Tigli is Assistant Professor at Polytechnic Institute of Nice University.
Sabine Moisan and Jean-Paul Rigault obtained the best paper award of the Educator's Symposium at MODELS'09 .
Slawek Bak: People detection in temporal video sequences by defining a generic visual signature of individuals, Nice Sophia-Antipolis University.
Duc Phu Chau: Object Tracking for Activity Recognition, Nice Sophia-Antipolis University.
Guido-Tomas Pusiol: Learning Techniques for Video Understanding, Nice Sophia-Antipolis University.
Rim Romdham: Event Recognition in Video Scenes with Uncertain Knowledge, Nice Sophia-Antipolis University.
Anh Tuan Nghiem: Learning Techniques for the Configuration of the Scene Understanding Process, Nice Sophia-Antipolis University.
Nadia Zouba: Multi Sensor Analysis for Homecare Monitoring, Nice Sophia-Antipolis University.
Lan Le Thi: Semantic-based Approach for Image Indexing and Retrieval, Nice Sophia-Antipolis University and Hanoi University (Vietnam).
Mohammed-Bécha Kaâniche: Human Gesture Recognition from video sequences, Nice Sophia-Antipolis University.
Philippe Verney: Interprétation géologique de données sismiques par une méthode supervisée basée sur la vision cognitive (geological interpretation of seismic data by a supervised method based on cognitive vision), IFP (French Petrol Institute).