Magrite is an INRIA research team that grew out of the INRIA ISA project.
Augmented reality (AR) is a field of computer research that deals with combining real-world and computer-generated data in order to provide users with a better understanding of their surrounding environment. Usually this refers to a system in which computer graphics are overlaid onto a live video picture or projected onto a transparent screen, as in a head-up display.
Though there exist a few commercial examples demonstrating the effectiveness of the AR concept for certain applications, the state of the art in AR today is comparable to the early years of Virtual Reality. Many research ideas have been demonstrated but few have matured beyond lab-based prototypes.
Computer vision plays an important role in AR applications. Indeed, seamlessly integrating computer-generated objects at the right place according to the motion of the user requires automatic real-time detection and tracking. In addition, a 3D reconstruction of the scene is needed to handle occlusions and light inter-reflections between objects, and to ease the user's interactions with the augmented scene. For fifteen years, much work has been successfully devoted to the structure-and-motion problem, but it is often formulated as an off-line algorithm requiring batch processing of several images acquired in a sequence. The challenge is now to design robust solutions to these problems that leave users free to move during AR applications and that widen the range of AR applications to large and/or unstructured environments. More specifically, the Magrite team aims at addressing the following problems:
On-line pose computation for structured and unstructured environments: this problem is the cornerstone of AR systems and must be solved in real time with good accuracy.
Long-term management of AR applications: a key problem of numerous algorithms is the gradual drift of the localization over time. One of our aims is to develop methods that improve the accuracy and the repeatability of the pose over arbitrarily long periods of motion.
3D modeling for AR applications: this problem is fundamental to managing light interactions between real and virtual objects, handling occlusions and obtaining realistic fused images.
Software applications are developed in many domains, notably e-commerce and medical imaging.
The aim of the Magrite project is to develop vision-based methods that allow significant progress of AR technologies in terms of ease of implementation, usability, reliability and robustness, in order to widen the current application field of AR and to improve the freedom of the user during applications. Our main research directions concern two crucial issues: camera tracking and scene modeling. Methods are developed with a view to meeting the expected robustness and to providing the user with a good perception of the augmented scene.
One of the most basic problems currently limiting Augmented Reality applications is the registration problem. The objects in the real and virtual worlds must be properly aligned with respect to each other, or the illusion that the two worlds coexist will be compromised.
As a large number of potential AR applications are interactive, real-time pose computation is required. Although the registration problem has received a lot of attention in the computer vision community, the problem of real-time registration is still far from solved, especially for unstructured environments. Ideally, an AR system should work in all environments, without the need to prepare the scene ahead of time, and the user should be able to walk anywhere they please.
For several years, the ISA project has aimed at developing on-line and markerless methods for camera pose computation. Within the European project ARIS, we proposed a real-time camera tracking system designed for indoor scenes. The main difficulty with online tracking is ensuring the robustness of the process. For off-line processes, robustness is achieved by exploiting the spatial and temporal coherence of the considered sequence through move-matching techniques. To obtain robustness in open-loop systems, we have developed a method that combines the advantages of move-matching methods and model-based methods by using a piecewise-planar model of the environment. This methodology can then be used in a wide variety of environments: indoor scenes, urban scenes, etc. We are also concerned with the development of methods for camera stabilization. Indeed, statistical fluctuations in the viewpoint computations lead to unpleasant jittering or sliding effects, especially when the camera motion is small. We have shown that the use of model selection noticeably improves the visual impression and reduces drift over time.
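The piecewise-planar assumption is commonly exploited by relating two views of the same plane through a homography. As an illustrative sketch only (plain numpy, not the team's actual tracker), the standard DLT estimation of a homography from matched points looks like this:

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT estimate of the 3x3 homography H such that dst ~ H @ src.

    src, dst: (N, 2) arrays of matched image points, N >= 4.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography vector spans the null space of A: take the
    # singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

# Sanity check on synthetic data: points mapped by a known homography.
H_true = np.array([[1.2, 0.1, 5.0],
                   [-0.05, 0.9, -3.0],
                   [1e-4, 2e-4, 1.0]])
src = np.array([[0., 0.], [100., 0.], [100., 80.], [0., 80.], [50., 40.]])
h = (H_true @ np.column_stack([src, np.ones(len(src))]).T).T
dst = h[:, :2] / h[:, 2:]
H = estimate_homography(src, dst)
```

In a tracker, such a homography relates the current frame to a reference view of each planar facet, from which the camera pose can then be recovered.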
An important way to improve the reliability and robustness of pose algorithms is to combine the camera with another kind of sensor, so that each technology compensates for the shortcomings of the other. Indeed, each approach has limitations: on the one hand, rapid head motions cause image features to undergo large displacements between frames, which can make visual tracking fail. On the other hand, the response of inertial sensors is largely independent of the user's motion, but their accuracy is poor and their response is sensitive to metallic objects in the scene. We recently proposed a system that makes an inertial sensor (Xsens MT9) cooperate with the camera-based system in order to improve the robustness of the AR system to abrupt motions of the user, especially head motions. This work contributes to reducing the constraints on the users and the need to carefully control the environment during an AR application. This research area will be continued in the near future within the ASPI project in order to build a dynamic articulatory model from various image modalities and sensor data.
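As an illustration of the general principle of camera/inertial cooperation (a hypothetical complementary filter on a single angle, not the actual fusion scheme used with the MT9): the gyroscope is integrated at high rate, and its drift is corrected by the slower but drift-free visual estimate, with dead reckoning taking over when visual tracking fails.

```python
import numpy as np

def complementary_filter(gyro_rates, vision_angles, dt, alpha=0.98):
    """Fuse gyroscope rates with vision-based absolute angles.

    gyro_rates: angular rates in rad/s, one per time step.
    vision_angles: absolute angles from the camera tracker; an entry
    may be None when visual tracking has failed for that frame.
    alpha close to 1 trusts the gyro short-term, vision long-term.
    """
    angle = vision_angles[0]
    estimates = []
    for rate, vis in zip(gyro_rates, vision_angles):
        predicted = angle + rate * dt          # integrate the gyro
        if vis is None:
            angle = predicted                  # dead reckoning only
        else:
            angle = alpha * predicted + (1 - alpha) * vis
        estimates.append(angle)
    return np.array(estimates)

# Synthetic check: constant rotation at 0.5 rad/s, gyro with a
# +0.05 rad/s bias.  Pure integration would drift by 0.25 rad over
# 5 s; the vision measurements keep the drift bounded.
dt = 0.01
t = np.arange(0, 5, dt)
true_angle = 0.5 * t
gyro = np.full_like(t, 0.5) + 0.05
est = complementary_filter(gyro, list(true_angle), dt)
```

The same structure explains the robustness to abrupt motions: during fast head movements the gyro term dominates, while the vision term removes the accumulated inertial drift whenever tracking succeeds again.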
It must be noted that the registration problem must be addressed from the rather specific point of view of augmented reality: the success and acceptance of an AR application depend not only on the accuracy of the pose computation but also on the visual impression of the augmented scene. The search for the best compromise between accuracy and perception is therefore an important issue in this project. This research topic is currently addressed in our project both in classical AR and in medical imaging, in order to choose the camera model, including intrinsic parameters, which best describes the considered camera.
Finally, camera tracking largely depends on the quality of the matching stage, which detects and matches features over the sequence. Ongoing research in our team addresses the problem of establishing robust correspondences of features over time. The use of an a contrario decision is currently under study to achieve this aim.
Modeling the scene is a fundamental issue in AR for many reasons. First, pose computation algorithms often use a model of the scene, or at least some 3D knowledge of the scene. Second, effective AR systems require a model of the scene to handle occlusion and to compute light reflections between the real and virtual objects. Unlike pose computation, which must be performed sequentially, scene modeling can be considered as an off-line or an on-line problem, depending on the application.
Currently, we are mainly concerned with interactive scene modeling from various image modalities. This concerns our medical work as well as the ASPI project, where a complete dynamic articulatory model of a speaker must be designed from various image modalities (ultrasound, MRI, video) and magnetic sensors.
For the last ten years, we have been working in close collaboration with the neuroradiology laboratory (CHU, University Hospital of Nancy) and GE Healthcare. As several imaging modalities are now available in an intra-operative context (2D and 3D angiography, MRI, ...), our aim is to develop a multi-modality framework to help therapeutic decision-making.
In , we proposed an efficient solution to the registration of 2D/3D angiographic images and of 3DXA/MRI images. Since then, we have mainly been interested in the effective use of a multi-modality framework in the treatment of arteriovenous malformations (AVM). The treatment of an AVM is classically a two-stage process: embolization, or endovascular treatment, is performed first; this step is then followed by a stereotactic irradiation of the remnant. An accurate definition of the target is therefore of great importance for the treatment. Our short-term aim is to perform an accurate detection of the AVM shape within a multi-modality framework. Our long-term aim is to develop multi-modality and augmented reality tools that make various image modalities (2D and 3D angiography, fluoroscopic images, MRI, ...) cooperate in order to help and guide physicians in clinical routine. From a practical point of view, we are involved in two research areas. First, S. Gorges began his PhD this year with the aim of defining augmented reality tools for neuronavigation that make real-time imagery (fluoroscopy) cooperate with pre-operative imagery (3D angiography). Second, we are currently involved in a urology project: using contrast-enhanced CT scans of the same patient acquired at different times, our aim is to reconstruct the vascular system of the kidney, which will then be used by the physician for planning laparoscopic surgery.
Besides interactive modeling, research on on-line reconstruction is conducted in our team. The sequential reconstruction of the scene structure needed by pose or occlusion algorithms is highly desirable for the numerous AR applications in which instrumenting the scene is not conceivable. Hence, structure and pose must be estimated sequentially over time. We are currently studying this problem for multi-planar scenes.
We have significant experience in the AR field, especially through the European project ARIS (2001–2004), which aimed at developing effective and realistic AR systems for e-commerce and especially for interior design. Beyond this restrictive application field, this project allowed us to develop nearly real-time camera tracking methods for multi-planar environments. We are continuing and amplifying our research on multi-planar environments in order to obtain effective and robust AR systems in such environments.
For ten years, we have been working in close collaboration with the University Hospital and GE Healthcare in interventional neuroradiology, with the aim of developing tools that allow physicians to take advantage, in their clinical practice, of the various existing imaging modalities of the brain. As several imaging modalities that bring complementary information on the various brain pathologies are now available in a pre-operative context (2D and 3D subtracted angiography, fluoroscopy, MRI, ...), our aim is to develop a multi-modality framework to help therapeutic decisions. In addition, we now investigate the use of AR tools for neuronavigation. The PhD thesis of Sébastien Gorges started in February 2004 in collaboration with GE Healthcare. Its aim is to design tools for neuronavigation that take advantage of both real-time imagery (fluoroscopy) and pre-operative imagery (3D angiography).
We also have ongoing and promising research contacts with the urology department of Nancy, whose long-term aim is to extract pre-operative information from contrast-enhanced CT scans and to overlay it onto the endoscopic view in the operating theater. Preliminary results are described in .
We are involved in the FET-STREP European project ASPI, which started in November 2005. There is strong evidence that visual information on the speaker, especially the jaw and lips, noticeably improves speech intelligibility. Hence, a realistic talking head could help language-learning technology by giving the student feedback on how to change articulation in order to achieve a correct pronunciation. This task is complex and necessitates a multidisciplinary effort involving speech production modeling and image analysis. The long-term aim of the ASPI project is the design of a 3D articulatory model to be used for the realistic animation of a talking head. Within this project, we will especially work on the tracking of the visible articulators using stereo-vision techniques, and we intend to supplement the model with the internal articulators (tongue, larynx) obtained from medical imaging (ultrasound images for tongue tracking and MRI for the global model).
Our software efforts are integrated in a library called RAlib, which contains our research developments on image processing, 2D and 3D registration, and visualization. This library is registered with the APP (the French agency for software protection).
Our collaboration with GE Healthcare has given rise to several patent disclosures on specific calibration, registration and localization processes. The list of patents is given in the bibliography of the team.
In order to guide tools during the procedure, the interventional radiologist uses a vascular C-arm to acquire 2D fluoroscopy images in real time. Today, 3D X-ray images (3DXA) are also available on modern vascular C-arms. There is now a large consensus that one important next step should be to leverage the high-resolution volumetric information provided by 3DXA to complement 2D fluoroscopy images and make tool guidance easier. Making this step requires registering 3DXA volumes with fluoroscopy images, that is, estimating the acquisition geometry assumed by the C-arm when acquiring the fluoroscopy image.
During this year, various models of a vascular C-arm have been studied in order to generate 3D augmented fluoroscopic images in an interventional radiology context. A methodology based on multi-image calibration has been proposed to assess the physical behavior of the C-arm. From the knowledge of the main characteristics of the C-arm, realistic models of the acquisition geometry have been proposed. Experiments showed that any projection matrix can be predicted with a mean 2D reprojection error below 0.5 mm. The application to 3D augmented fluoroscopy was successfully evaluated on a phantom and on clinical data.
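For illustration, the mean 2D reprojection error used to evaluate a predicted projection matrix can be sketched as follows (toy pinhole geometry and values; not the actual C-arm models or calibration code):

```python
import numpy as np

def reprojection_error(P, points_3d, points_2d):
    """Mean 2D distance between observations and projections P @ X.

    P: 3x4 projection matrix; points_3d: (N, 3); points_2d: (N, 2).
    """
    X = np.column_stack([points_3d, np.ones(len(points_3d))])
    proj = (P @ X.T).T
    proj = proj[:, :2] / proj[:, 2:]           # perspective division
    return np.linalg.norm(proj - points_2d, axis=1).mean()

# Toy pinhole geometry: focal length 1000, principal point (256, 256),
# camera looking down the z axis, scene 10 units away.
K = np.array([[1000., 0., 256.],
              [0., 1000., 256.],
              [0., 0., 1.]])
P = K @ np.column_stack([np.eye(3), [0., 0., 10.]])
pts3d = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 1.], [1., 1., 2.]])
obs = (P @ np.column_stack([pts3d, np.ones(4)]).T).T
obs = obs[:, :2] / obs[:, 2:]
err_exact = reprojection_error(P, pts3d, obs)          # exact data
err_shifted = reprojection_error(P, pts3d, obs + 1.0)  # 1-unit offset
```

With exact data the error is zero; a uniform (1, 1) offset of the observations yields a mean error of sqrt(2), which is how a predicted acquisition geometry is scored against measured marker positions.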
A clear identification of arteries, veins and urinary tracts, together with their simultaneous visualization in 3D, is required when planning laparoscopic surgery on kidneys. To this aim, multiple contrast-enhanced CT scans are acquired at different times during blood circulation, but kidney motion, among other causes, seriously hampers their direct fusion. This study aims at developing algorithms to represent the kidney surroundings in 3D, from such images, as close as possible to the surgical reality. A full image processing protocol was proposed that combines non-rigid 3D image registration and image segmentation.
The success and acceptance of an AR application depend not only on the accuracy of the pose computation but also on the visual impression of the augmented scene. Though the registration error may be small for each view, unpleasant jittering effects may appear. These problems often originate in the fact that the pose is computed from theoretical camera models that are not suited to the considered camera.
In , we confronted some theoretical camera models with reality and evaluated the suitability of these models for effective augmented reality (AR). We especially analyzed what level of accuracy can be expected in real situations using a particular camera model, and how robust the results are against realistic calibration errors. The experimental protocol consisted of taking images of a particular scene with various cameras mounted on a 4-DOF micro-controlled device. This protocol enabled us to consider assessment criteria specific to AR, such as alignment error and visual impression, in addition to the classical camera positioning error.
A 3D acquisition infrastructure has been developed for building a talking head and studying some aspects of visual speech. Our short-term aim is to study coarticulation for the French language and to develop a model that respects a real talker's articulation. One key factor is the ability to acquire a large amount of 3D data with a low-cost system more flexible than existing motion capture systems (which use infrared cameras and glued markers). Our system only uses two standard cameras, a PC and painted markers that do not alter speech articulation, and provides a sufficiently fast acquisition rate to enable an efficient temporal tracking of 3D points. The obtained data have been used to study strategies of labial coarticulation.
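With two calibrated standard cameras, the 3D position of each painted marker is classically obtained by linear triangulation. A minimal sketch of the textbook DLT triangulation (synthetic cameras and point; not the actual acquisition software):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: 2D observations (x, y).
    """
    A = np.array([x1[0] * P1[2] - P1[0],
                  x1[1] * P1[2] - P1[1],
                  x2[0] * P2[2] - P2[0],
                  x2[1] * P2[2] - P2[1]])
    # The homogeneous 3D point spans the null space of A.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    p = P @ np.append(X, 1.0)
    return p[:2] / p[2]

# Two synthetic calibrated cameras observing the same marker.
K = np.array([[800., 0., 320.],
              [0., 800., 320.],
              [0., 0., 1.]])
P1 = K @ np.column_stack([np.eye(3), [0., 0., 5.]])
R = np.array([[0.96, 0., 0.28],
              [0., 1., 0.],
              [-0.28, 0., 0.96]])      # rotation about the y axis
P2 = K @ np.column_stack([R, [-1., 0., 5.]])

X_true = np.array([0.3, -0.2, 1.0])
X_rec = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

Running this per marker and per frame is what yields the temporal sequences of 3D points mentioned above.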
Shape recognition is the field of computer vision that addresses the problem of deciding whether a query shape lies in a shape database, up to a certain invariance. Most shape recognition methods simply sort the shapes of the database along some (dis-)similarity measure to the query shape. Their Achilles' heel is the decision stage, which should give a clear-cut answer to the question: ``do these two shapes look alike?'' In , the proposed solution consists in bounding the number of false correspondences of the query shape among the database shapes, ensuring that the obtained matches are not likely to occur ``by chance''. As an application, one can decide with a parameterless method whether any two digital images share some shapes or not. In a paper submitted to a conference, we propose to apply the above a contrario methodology to shapes described by size functions, in order to design a perceptual matching algorithm.
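The a contrario principle can be illustrated on a toy example: a match is accepted only if the expected number of database shapes matching the query at least as well purely by chance (the number of false alarms, NFA) is below a threshold epsilon. The sketch below uses hypothetical binary descriptors under a uniform background model, not the actual shape features of the cited work:

```python
import math

def binomial_tail(n, k, p=0.5):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def nfa(n_tests, n_bits, n_agree):
    """Expected number of database shapes matching the query at least
    this well by pure chance (uniform background model)."""
    return n_tests * binomial_tail(n_bits, n_agree)

def a_contrario_matches(query, database, eps=1.0):
    """Indices of database descriptors whose NFA is at most eps."""
    found = []
    for idx, desc in enumerate(database):
        agree = sum(q == d for q, d in zip(query, desc))
        if nfa(len(database), len(query), agree) <= eps:
            found.append(idx)
    return found

query = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]
database = [query[:],                  # identical shape: NFA ~ 5e-5
            [1 - b for b in query],    # complement: NFA = 3
            [1, 0, 1, 0] * 4]          # unrelated pattern
matches = a_contrario_matches(query, database)
```

The method is parameterless in the sense that eps = 1 (at most one false alarm on average) is a natural, universal choice.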
A further step consists in grouping matching shapes that share the same respective positions in two corresponding images. In , we intend to form spatially coherent groups of shapes. Each pair of matching shape elements indeed leads to a unique transformation (a similarity or an affine map). A unified a contrario detection method is proposed to solve three classical problems in cluster analysis. The first one is to evaluate the validity of a cluster candidate. The second problem is that meaningful clusters can contain, or be contained in, other meaningful clusters; a rule is needed to define locally optimal clusters by inclusion. The third problem is the definition of a correct merging rule between meaningful clusters, making it possible to decide whether they should remain separate or be merged. As an application, this theory of the choice of the right clusters is used to group shapes by detecting clusters in the transformation space.
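To illustrate why each pair of matched elements yields a point in transformation space, here is a sketch (hypothetical point matches, complex-number parameterization) computing the unique similarity defined by two point correspondences; matches belonging to a coherent group land close together in this space, while spurious matches fall elsewhere:

```python
import numpy as np

def similarity_from_pairs(a1, a2, b1, b2):
    """The unique similarity z -> alpha * z + t (alpha = s * e^{i*theta})
    mapping a1 to b1 and a2 to b2; points handled as complex numbers.
    Returns (scale, angle, translation)."""
    a1, a2, b1, b2 = (complex(*p) for p in (a1, a2, b1, b2))
    alpha = (b2 - b1) / (a2 - a1)
    t = b1 - alpha * a1
    return abs(alpha), float(np.angle(alpha)), t

# Three matches consistent with one similarity (scale 1.5, angle 0.3)
# plus one spurious match: the consistent ones produce nearly identical
# transformation parameters, the spurious one lands far away.
s_true, theta, trans = 1.5, 0.3, complex(2.0, -1.0)

def warp(p):
    z = complex(*p) * s_true * np.exp(1j * theta) + trans
    return (z.real, z.imag)

pts = [(0., 0.), (1., 0.), (0.5, 1.0), (2., 1.)]
img = [warp(p) for p in pts]
img[3] = (10., 10.)                    # corrupt the last match

transforms = [similarity_from_pairs(pts[0], pts[i], img[0], img[i])
              for i in (1, 2, 3)]
```

Detecting meaningful clusters among such (scale, angle, translation) points is exactly the grouping problem the a contrario method addresses.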
The partnership with GE Healthcare (formerly GE Medical Systems) started in 1995. In the past few years, it bore on the supervision of CIFRE PhD fellows on the topic of using a multi-modal framework in interventional neuroradiology. A new PhD started in January 2004 on the design of augmented reality tools for neuronavigation.
This work is developed in close collaboration with Nancy Hospital. The aim of the CPRC (Contrat de Recherche Clinique) is to develop a multi-modality framework to help therapeutic decisions for brain pathologies.
ASPI is about audiovisual-to-articulatory inversion. Participants in this project are INRIA Lorraine, ENST (Paris), KTH (Stockholm), the University Research Institute of the National Technical University of Athens and the University of Brussels. Audiovisual-to-articulatory inversion consists in recovering the vocal tract shape dynamics (from the vocal folds to the lips) from the acoustic speech signal, supplemented by image analysis of the speaker's face. Being able to recover this information automatically would be a major breakthrough in speech research and technology, as a vocal tract representation of a speech signal would be both beneficial from a theoretical point of view and practically useful in many speech processing applications (language learning, automatic speech processing, speech coding, speech therapy, the film industry...). The Magrite team is involved in the development of articulatory models from various image modalities (ultrasound, video, MRI) and electromagnetic sensors.
M.-O. Berger was a member of the program committee of RFIA'06, MICCAI 05, ISMAR 05.
E. Kerrien was a member of the program committee of MICCAI 05.
G. Simon was a member of the program committee of BMVC 05, ISMAR 05.
Several members of the group, in particular assistant professors and PhD students, actively teach at the Henri Poincaré Nancy 1 and Nancy 2 universities and at INPL.
Other members of the group also teach in the computer science Master's program of Nancy and in the ``Master en sciences de la vie et de la santé'' (SVS).
Frédéric Sur has been a member of the board of the "Banque PT" entrance examination in mathematics ("concours d'entrée aux Grandes Écoles").
Members of the group participated in the following events: International Symposium on Mixed and Augmented Reality (ISMAR'05, Vienna, Austria), International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 05, Palm Springs, USA), Computer Assisted Radiology and Surgery (CARS 2005, Berlin, Germany), Conference on Auditory-Visual Speech Processing (AVSP 2005, Vancouver, Canada).