Section: Overall Objectives

Perception, Recognition and Multimodal Interaction for Smart Spaces.

The objective of Project PRIMA is to develop the scientific and technological foundations for human environments that are capable of perceiving, acting, communicating, and interacting with people in order to provide services. The construction of such environments offers a rich set of problems related to interpretation of sensor information, learning, machine understanding, dynamic composition of components and man-machine interaction. Our goal is to make progress on the theoretical foundations for perception and cognition, as well as to develop new forms of man-machine interaction, by using interactive environments as a source of example problems.

An environment is a connected volume of space. An environment is said to be “perceptive” when it is capable of recognizing and describing things, people and activities within its volume. Simple forms of application-specific perception may be constructed using a single sensor. However, to be general-purpose and robust, perception must integrate information from multiple sensors and multiple modalities. Project PRIMA creates and develops machine perception techniques that fuse computer vision, acoustic perception, range sensing and mechanical sensors to enable environments to perceive and understand humans and human activities.

An environment is said to be “active” when it is capable of changing its internal state. Common forms of state change include regulating ambient temperature, acoustic level and illumination. More innovative forms include context-aware presentation of information and communications, as well as services for cleaning, materials organisation and logistics. The use of multiple display surfaces coupled with location awareness offers the possibility of automatically adapting information display to fit the current activity of groups. The use of activity recognition and acoustic topic spotting offers the possibility of recording a log of human-to-human interaction, as well as of providing relevant information without disruption. The use of steerable video projectors (with integrated visual sensing) offers the possibility of using any surface for presentation, interaction and communication.

An environment may be considered “interactive” when it is capable of interacting with humans using tightly coupled perception and action. Simple forms of interaction may be based on observing the manipulation of physical objects, or on visual sensing of fingers, hands or arms. Richer forms of interaction require perception and understanding of human activity and context. PRIMA has developed a novel theory of situation modeling for machine understanding of human activity, based on techniques used in Cognitive Psychology [44]. PRIMA explores multiple forms of interaction, including projected interaction widgets, observation of the manipulation of objects, fusion of acoustic and visual information, and systems that model interaction context in order to predict appropriate actions and services by the environment.

For the design and integration of systems for perception of humans and their actions, PRIMA has developed:

  • A theoretical foundation for machine understanding of human activity using situation models.

  • Robust, view-invariant methods for computer vision systems using local appearance.

  • A software architecture model for reactive control of multimodal perceptual systems.

The experiments in Project PRIMA are oriented towards developing interactive services for smart environments. Application domains include health and activity monitoring services for assisted living, smart habitats for smart energy, context-aware video recording for lectures, meetings and collaborative work, context-aware services for commercial environments, new forms of man-machine interaction based on perception, and new forms of interactive services for education, research and entertainment. Creating interactive services requires scientific progress on a number of fundamental problems, including:

  • Situation models for observing and understanding human-to-human interaction.

  • Lifelong interactive learning.

  • Robust, view-invariant image description for embedded services based on computer vision.

  • New forms of multimodal human-computer interaction.

  • Component-based software architectures for multimodal perception and action.

  • Service-oriented software architectures for smart environments.