Section: New Results

Matching and localization

Participants : Marie-Odile Berger, Vincent Gaudilliere, Antoine Fond, Gilles Simon.

Vanishing point detection

Accurate detection of vanishing points (VPs) is a prerequisite for many computer vision problems, including camera self-calibration, single-view structure recovery, video compass, robot navigation and augmented reality. More specifically, knowing three orthogonal VPs aligned with the buildings of a scene (the Manhattan directions) allows computing the intrinsic parameters of the camera, as well as warped images in which the buildings’ facades are orthorectified, which facilitates their detection and registration. VPs are also used in our work on epipolar geometry estimation to help match line segments, a particularly difficult task in low-textured environments.
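To make the link between orthogonal VPs and intrinsic parameters concrete, here is a minimal sketch of the classical construction. It is an illustration, not the implementation used in this work. It assumes square pixels, zero skew and three finite VPs: the principal point is then the orthocenter of the VP triangle, and the focal length f satisfies f² = -(v1 - p)·(v2 - p). The function name `calibrate_from_vps` and the numeric values are hypothetical.

```python
import math

def calibrate_from_vps(v1, v2, v3):
    """Return (f, (u0, v0)) from three finite, mutually orthogonal VPs in pixels.

    Assumes square pixels and zero skew (an assumption of this sketch).
    """
    # The principal point p is the orthocenter of the triangle (v1, v2, v3):
    #   (p - v1).(v2 - v3) = 0   and   (p - v2).(v1 - v3) = 0
    a11, a12 = v2[0] - v3[0], v2[1] - v3[1]
    a21, a22 = v1[0] - v3[0], v1[1] - v3[1]
    b1 = v1[0] * a11 + v1[1] * a12
    b2 = v2[0] * a21 + v2[1] * a22
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("vanishing points are collinear")
    # Cramer's rule for the 2x2 system.
    u0 = (b1 * a22 - a12 * b2) / det
    v0 = (a11 * b2 - a21 * b1) / det
    # Orthogonality of two back-projected directions gives f^2.
    f_squared = -((v1[0] - u0) * (v2[0] - u0) + (v1[1] - v0) * (v2[1] - v0))
    if f_squared <= 0:
        raise ValueError("VPs are not consistent with orthogonal directions")
    return math.sqrt(f_squared), (u0, v0)

# Synthetic VPs generated from f = 800 and principal point (320, 240).
f, (u0, v0) = calibrate_from_vps((-1280, -1360), (-80, 1040), (1120, -160))
```

When one or two VPs lie at infinity, as happens for a camera facing a facade frontally, this construction degenerates and additional assumptions (for instance a known principal point) are needed.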

We introduced an a-contrario method to solve this problem. Our key contribution was to show that, as soon as the horizon line (HL) is inside the image boundaries, this line can usually be detected as an alignment of oriented line segments. This follows from a simple geometric property: any horizontal line segment at the height of the camera’s optical center projects onto the HL, regardless of its 3-D direction. This property generally yields statistically meaningful events, detectable by a-contrario analysis. Additional candidate HLs are sampled around these events using a Gaussian Mixture Model (GMM), and scored according to the strongest of the VPs hypothesized along them. VP hypotheses are also obtained with an a-contrario method, using integral geometry to accurately model the background noise. Experiments on three urban datasets showed that our method not only achieves state-of-the-art performance in terms of computation time and HL accuracy, but also yields far fewer spurious VPs than the previous top-ranked methods. This work was published at ECCV 2018 [23], and an article is in preparation for submission to a peer-reviewed journal. In this article, we show that our method also outperforms state-of-the-art methods on a new industrial dataset that we built and will make publicly available. We also establish a relation between the Number of False Alarms (NFA) obtained for the meaningful events and the spreads of the GMM. In addition, the Matlab code implementing our method has been made publicly available.
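The geometric property underlying the HL detection can be checked numerically. The sketch below is an illustration only, not the published Matlab code. It assumes a pinhole camera with square pixels and zero skew, and a unit vertical direction n expressed in the camera frame. It builds horizontal segments at the height of the optical center with several 3-D directions and measures how far their projected endpoints fall from the horizon line. All function names and numeric values are hypothetical.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    norm = math.sqrt(sum(c * c for c in a))
    return tuple(c / norm for c in a)

def horizon_line(n, f, u0, v0):
    """Coefficients (a, b, c) of the HL a*u + b*v + c = 0, i.e. K^-T n up to scale."""
    return (n[0], n[1], f * n[2] - u0 * n[0] - v0 * n[1])

def project(X, f, u0, v0):
    """Pinhole projection of a 3-D point given in the camera frame."""
    return (u0 + f * X[0] / X[2], v0 + f * X[1] / X[2])

def eye_level_segment(n, angle_deg, depth=10.0, half_length=2.0):
    """Endpoints of a horizontal segment lying in the plane n.X = 0."""
    e1 = normalize(cross(n, (0.0, 0.0, 1.0)))  # horizontal, parallel to the image plane
    e2 = cross(e1, n)                          # horizontal, pointing in front of the camera
    a = math.radians(angle_deg)
    d = tuple(math.cos(a) * p + math.sin(a) * q for p, q in zip(e1, e2))
    center = tuple(depth * q for q in e2)
    return (tuple(c - half_length * k for c, k in zip(center, d)),
            tuple(c + half_length * k for c, k in zip(center, d)))

def distance_to_line(line, point):
    a, b, c = line
    return abs(a * point[0] + b * point[1] + c) / math.hypot(a, b)

f, u0, v0 = 800.0, 320.0, 240.0
n = normalize((0.1, 0.97, 0.2))  # vertical direction of a slightly tilted and rolled camera
hl = horizon_line(n, f, u0, v0)
residuals = [distance_to_line(hl, project(X, f, u0, v0))
             for angle in (0, 30, 60, 90, 135)
             for X in eye_level_segment(n, angle)]
```

Every residual is zero up to floating-point error whatever the segment direction, which is why such segments accumulate along the HL in real images.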

Urban AR

Urban localization plays a major role in many applications, including navigation aids, labeling of local tourist landmarks, and robot localization. The outdoor accuracy of mobile phone GPS is only 12.5 meters and can be worse in urban areas where the street is flanked by buildings on both sides. By contrast, buildings’ facades are meaningful landmarks to rely on for large-scale localization. Last year, we proposed a method to automatically detect facades in an image, based on image cues that measure facade characteristics such as shape, color, contours, semantic structure and symmetry. The detected facade is matched against a facade database using a metric learned through a siamese neural network. This match allowed us to obtain an initial estimate of the registration parameters by solving the least-squares problem that maps the four transformed corners of the reference to the four corners of the detection.
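The corner-based initialization can be sketched as follows, under an assumption the text does not state: that the registration is parameterized as a planar homography with its last entry fixed to 1. With exactly four corner correspondences, the linear least-squares problem has eight equations and eight unknowns, so it reduces to an exact linear solve. The function names and corner coordinates are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography_from_corners(src, dst):
    """3x3 homography (h33 = 1) mapping each src corner to the matching dst corner."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h1 x + h2 y + h3) / (h7 x + h8 y + 1), and similarly for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, p):
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

reference = [(0, 0), (1, 0), (1, 1), (0, 1)]                # reference facade corners
detection = [(100, 50), (400, 80), (380, 300), (90, 260)]   # detected corners (pixels)
H = homography_from_corners(reference, detection)
```

With more than four correspondences the same rows would form an overdetermined system, which would call for a proper least-squares solver and, ideally, coordinate normalization.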

This year, we relied on semantic segmentation to improve the accuracy of that initial registration [11]. Simultaneously, we aimed to iteratively improve the quality of the semantic segmentation through registration. Registration and semantic segmentation were solved jointly in an Expectation-Maximization framework. In particular, we introduced a Bayesian model that uses a prior semantic segmentation as well as the geometric structure of the reference facade, modeled by Generalized Gaussian Mixtures. We showed the advantages of our method in terms of robustness to clutter and illumination changes on urban images from various databases. We are currently assessing the relevance of the method on the large-scale dataset SFM Aachen, in order to compare it with state-of-the-art SFM-based localization.

AR in industrial environments

Industrial environments typically abound in textureless objects, specular surfaces, repetitive objects and artificial lights, which may cause traditional 2D/3D matching-based approaches to fail. Line segments are numerous in industrial environments but, contrary to what happens in urban scenes, matching them is difficult since most segments are silhouette contours whose appearance is viewpoint-dependent. The number of possible segment match combinations is thus very high, which makes RANSAC algorithms impractical for pose computation.

Within V. Gaudilliere's PhD thesis [21], [25], we took advantage of global properties of the environment, both geometric (such as the presence of numerous vertical planes) and contextual, to guide matching. First, sub-image correspondences based on high-level ConvNet features are used as a prior for vertical plane detection and matching. Then, local homographies are detected between matched regions. To ensure efficient estimation, we have developed a dedicated RANSAC framework in which model hypotheses are first generated from vanishing point and visual keypoint correspondences, and then validated on keypoints and line segments. The resulting set of candidate feature matches is finally filtered with a robust fundamental matrix estimation. This scheme enables us to circumvent the problems encountered in poorly textured images (sparsity of visual keypoints and difficulty of matching segments) while taking advantage of the abundance of segments and vanishing points characteristic of industrial environments.