Section: New Results

Representation and compression of large volumes of visual data

Sparse representations, data dimensionality reduction, compression, scalability, perceptual coding, rate-distortion theory

Cloud-based image and video compression

Participants : Jean Begaint, Christine Guillemot.

The emergence of cloud applications and web services has led to an increasing use of online resources for storing and exchanging images and videos. Billions of images are already stored in the cloud, and hundreds of millions are uploaded every day. The redundancy between images stored in the cloud can be leveraged to compress them efficiently by exploiting inter-image correlation. We have developed a region-based prediction scheme to exploit the correlation between images in the cloud. In order to compensate for the deformations between correlated images, the reference image from the cloud is first segmented into multiple regions determined from matched local features and aggregated super-pixels. We then estimate a photometric and geometric deformation model between the matched regions of the reference frame and of the frame to be coded. Multiple references are then generated by applying the estimated deformation models to the reference frame, and organized in a pseudo-sequence to be differentially encoded with classical video coding tools. Experimental results demonstrate that the proposed approach yields significant rate-distortion performance improvements compared to current coding solutions such as HEVC.
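
The sketch below illustrates the alignment step on a single region pair, under simplifying assumptions of our own (ORB features, one global homography per region, and a gain/offset photometric model); it is not the exact deformation model of the scheme described above.

    # Hypothetical sketch: estimate one geometric + photometric deformation model
    # between a cloud reference image and the image to be coded, and generate a
    # compensated reference (one region shown; the actual scheme works per
    # segmented region and builds several such references).
    import cv2
    import numpy as np

    def build_prediction(ref_gray, cur_gray):
        orb = cv2.ORB_create(2000)                       # local features (assumption: ORB)
        kp_r, des_r = orb.detectAndCompute(ref_gray, None)
        kp_c, des_c = orb.detectAndCompute(cur_gray, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des_r, des_c)

        pts_r = np.float32([kp_r[m.queryIdx].pt for m in matches])
        pts_c = np.float32([kp_c[m.trainIdx].pt for m in matches])

        # Geometric model: homography estimated robustly from the matched features.
        H, _ = cv2.findHomography(pts_r, pts_c, cv2.RANSAC, 3.0)
        h, w = cur_gray.shape
        warped = cv2.warpPerspective(ref_gray, H, (w, h))

        # Photometric model: global gain/offset fitted by least squares.
        gain, offset = np.polyfit(warped.ravel().astype(float),
                                  cur_gray.ravel().astype(float), 1)
        return np.clip(gain * warped + offset, 0, 255).astype(np.uint8)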

Rate-distortion optimized tone curves for HDR video compression

Participants : David Gommelet, Christine Guillemot, Aline Roumy.

High Dynamic Range (HDR) images contain more intensity levels than traditional image formats. Instead of 8- or 10-bit integers, floating-point values requiring much higher precision are used to represent the pixel data. These data thus need specific compression algorithms. The goal of the collaboration with Ericsson is to develop novel compression algorithms that remain compatible with the existing Low Dynamic Range (LDR) broadcast architecture in terms of display, compression algorithm and data rate, while delivering full HDR data to users equipped with HDR displays. In 2016, a scalable video compression scheme was developed, offering a base layer that corresponds to the LDR data and an enhancement layer which, together with the base layer, corresponds to the HDR data. In 2017, we instead developed a backward-compatible compression algorithm for HDR images, where only the LDR data are sent [14]. The novelty of the approach lies in the optimization of an invertible mapping called a Tone Mapping Operator (TMO) that efficiently maps the HDR data to LDR data. Two optimizations have been carried out in a rate-distortion sense: in the first problem, the distortion of the HDR data is minimized under a constraint on the LDR data rate, while in the second problem, an additional constraint is added to ensure that the LDR data remain close to some “aesthetic” a priori. Taking the aesthetics of the scene into account in video compression is indeed novel, since video compression is traditionally optimized to deliver the smallest distortion with respect to the input data at the minimum data rate. Moreover, we provided new statistical models for estimating the distortions and the rate, and demonstrated their accuracy on real data. Finally, a new line of work is currently being pursued to efficiently exploit the temporal redundancy in HDR videos.
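
As a rough illustration of the kind of tone curve being optimized, the sketch below builds an invertible, piecewise-linear TMO from the HDR luminance histogram using a simple distortion-motivated slope rule; the actual optimizations of [14] are rate-distortion Lagrangians (with the optional aesthetic proximity constraint), and the bin count and slope rule below are assumptions of this illustration only.

    # Hypothetical sketch of a piecewise-linear tone mapping operator.
    import numpy as np

    def piecewise_linear_tmo(hdr_log_lum, n_bins=32, ldr_max=255.0):
        lo, hi = hdr_log_lum.min(), hdr_log_lum.max()
        p, edges = np.histogram(hdr_log_lum, bins=n_bins, range=(lo, hi))
        p = p / p.sum()
        # Slopes derived from the bin probabilities (distortion-motivated heuristic),
        # normalized so that the curve spans the full LDR range.
        s = np.maximum(p, 1e-8) ** (1.0 / 3.0)
        s *= ldr_max / (s.sum() * (edges[1] - edges[0]))
        # Integrate the slopes to obtain a non-decreasing, invertible tone curve.
        curve = np.concatenate(([0.0], np.cumsum(s * np.diff(edges))))
        return np.interp(hdr_log_lum, edges, curve)      # LDR values in [0, ldr_max]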

Sparse image representation and deep learning for compression

Participants : Thierry Dumas, Christine Guillemot, Aline Roumy.

Deep learning is a novel research area that attempts to extract high-level abstractions from data by using a graph with multiple layers. One could therefore expect that deep learning might allow efficient image compression based on these high-level features. However, many issues make the learning task difficult in the context of image compression. First, learning a transform is equivalent to learning an autoencoder, which is inherently unsupervised and therefore more difficult than classical supervised learning, where deep learning has shown tremendous results. Second, the learning has to be performed under a rate-distortion criterion, and not only a distortion criterion, as is classically done in machine learning. Last but not least, deep learning, like classical machine learning, consists of two phases: (i) building a graph that can provide a good representation of the data (i.e., finding an architecture, usually built from neural networks), and (ii) learning the parameters of this architecture from large-scale data. As a consequence, neural networks are well suited to a specific task (text or image recognition) and require one training per task. The difficulty in applying machine learning approaches to image compression is that they must handle a large variety of patches as well as various compression rates. Different architectures have been proposed to design a single neural network that can work efficiently at any coding rate, either through a Winner-Take-All approach [28] or through an adaptation to the quantization noise during training [40].
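
The sketch below illustrates the quantization-noise adaptation idea in its simplest form: during training, the latent representation of an autoencoder is perturbed by uniform noise standing in for the quantization error, with a quantization step drawn at random so that a single network covers a range of rates. The architecture, layer sizes and loss below are illustrative assumptions, not the networks of [28] or [40].

    # Hypothetical PyTorch sketch: autoencoder trained with simulated quantization noise.
    import torch
    import torch.nn as nn

    class NoisyAutoencoder(nn.Module):
        def __init__(self, n_latent=64):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv2d(1, 64, 5, stride=2, padding=2), nn.ReLU(),
                nn.Conv2d(64, n_latent, 5, stride=2, padding=2))
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(n_latent, 64, 5, stride=2, padding=2, output_padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(64, 1, 5, stride=2, padding=2, output_padding=1))

        def forward(self, x, q_step):
            y = self.enc(x)
            if self.training:                      # simulate quantization with uniform noise
                y = y + (torch.rand_like(y) - 0.5) * q_step
            else:                                  # real quantization at test time
                y = torch.round(y / q_step) * q_step
            return self.dec(y)

    # One training step (sketch): random quantization step, distortion loss only
    # (a rate term would be added in a complete rate-distortion training objective).
    model = NoisyAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.rand(8, 1, 64, 64)                      # dummy batch of luminance patches
    q = torch.empty(1).uniform_(0.5, 8.0).item()      # draw a quantization step
    loss = nn.functional.mse_loss(model(x, q), x)
    opt.zero_grad(); loss.backward(); opt.step()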

Graph-based multi-view video representation

Participants : Christine Guillemot, Thomas Maugey, Mira Rizkallah, Xin Su.

One of the main open questions in multiview data processing is the design of representation methods for multiview data, where the challenge is to describe the scene content in a compact form that is robust to lossy data compression. Many approaches have been studied in the literature, such as the multiview and multiview plus depth formats, point clouds or mesh-based techniques. All these representations contain two types of data: i) the color or luminance information, which is classically described by 2D images; ii) the geometry information that describes the scene's 3D characteristics, represented by 3D coordinates, depth maps or disparity vectors. Effective representation, coding and processing of multiview data partly rely on a proper representation of the geometry information. The multiview plus depth (MVD) format has become very popular in recent years for 3D data representation. However, this format induces very large volumes of data, hence the need for efficient compression schemes. Moreover, lossy compression of depth information generally leads to annoying rendering artefacts, especially along the contours of objects in the scene. Instead of lossy compression of depth maps, we consider the lossless transmission of a geometry representation that captures only the information needed for the required view reconstructions. Our goal is to transmit “just enough” geometry information for an accurate representation of a given set of views, and hence to better control the effect of lossy geometry compression.

In 2016, we developed a graph-based representation for complex camera configurations. In particular, a generalized Graph-Based Representation was developed that handles two views with complex translations and rotations between them. The proposed approach uses epipolar segments to obtain a row-wise description of the geometry that is as simple as in the rectified case. In 2017, the Graph-Based Representation was extended to build a rate-distortion optimized description of the geometry of multi-view images [22]. This work brings two major novelties. First, the graph can now handle multiple views (more than two) thanks to a recursive construction of the geometry across the views. Second, the number of edges describing the geometry information is carefully chosen with respect to a rate-distortion criterion evaluated on the reconstructed views.
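
In its simplest form, the edge selection amounts to a Lagrangian comparison of candidate geometry descriptions, as sketched below; the candidate generation, rate measure and distortion metric are placeholders, not the actual procedure of [22].

    # Minimal sketch of rate-distortion selection of a geometry description:
    # keep the candidate edge set with the smallest cost D + lambda * R, where D is
    # measured on the views reconstructed from the graph and R is the rate in bits
    # needed to code the edges.
    def select_geometry(candidates, lam):
        """candidates: iterable of (edge_set, rate_bits, distortion) triples."""
        return min(candidates, key=lambda c: c[2] + lam * c[1])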

An adaptation of the graph-based representations (GBR) has been proposed in [38] to describe the color and geometry information of light fields (LF). Graph connections describing the scene geometry capture inter-view dependencies. They are used as the support of a weighted Graph Fourier Transform (wGFT) to encode disoccluded pixels. The quality of the LF reconstructed from the graph is enhanced by adding extra color information to the representation for a subset of sub-aperture images. Experiments show that the proposed scheme yields rate-distortion gains compared with HEVC-based compression (directly compressing the LF as a video sequence with HEVC).
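
For readers unfamiliar with the transform itself, the sketch below shows a weighted Graph Fourier Transform in its generic form (Laplacian eigenbasis of a weighted graph); the toy weights are arbitrary illustrations, not the geometry-driven connections of [38].

    # Minimal numerical sketch of a weighted Graph Fourier Transform (wGFT).
    import numpy as np

    def wgft(signal, W):
        """Forward wGFT of a signal living on the graph with weight matrix W."""
        L = np.diag(W.sum(axis=1)) - W            # combinatorial graph Laplacian
        eigval, U = np.linalg.eigh(L)             # graph Fourier basis (eigenvectors)
        return U.T @ signal, U                    # spectral coefficients + basis

    # Toy example: 4 connected pixels with hypothetical edge weights.
    W = np.array([[0.0, 1.0, 0.2, 0.0],
                  [1.0, 0.0, 0.8, 0.0],
                  [0.2, 0.8, 0.0, 0.5],
                  [0.0, 0.0, 0.5, 0.0]])
    coeffs, U = wgft(np.array([10., 12., 11., 40.]), W)
    reconstructed = U @ coeffs                    # inverse transform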

Light fields compression using sparse reconstruction

Participants : Fatma Hawary, Christine Guillemot.

Light field data represent very large volumes of information, which poses challenging problems in terms of storage capacity, hence the need for efficient compression schemes. In collaboration with Technicolor (Dominique Thoreau and Guillaume Boisson), we have developed a scalable coding method for light field data based on the sparsity of light fields in the angular (view) domain. A selected set of the light field sub-aperture images is encoded as a video sequence in a base layer and transmitted to the decoder. The remaining light field views are then reconstructed from the decoded subset of views by exploiting the light field sparsity in the angular continuous Fourier domain. The reconstructed light field is enhanced using a patch-based restoration method which further exploits the light field's angular redundancy.
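
A hypothetical one-dimensional illustration of the angular sparsity principle is sketched below: a line of views is recovered from a few known angular samples by greedy sparse approximation over a discrete Fourier dictionary. The actual scheme operates in the continuous Fourier domain on full 4D light fields, so this is only a didactic analogue.

    # Sketch: recover all views along one angular dimension from a sparse subset,
    # via orthogonal matching pursuit (OMP) over a DFT dictionary.
    import numpy as np

    def recover_angular_line(samples, sample_pos, n_views, sparsity=2):
        F = np.exp(2j * np.pi * np.outer(np.arange(n_views), np.arange(n_views)) / n_views)
        A = F[sample_pos, :]                              # dictionary at known positions
        y = samples.astype(complex)
        residual, support = y.copy(), []
        for _ in range(sparsity):                         # greedy OMP iterations
            k = int(np.argmax(np.abs(A.conj().T @ residual)))
            if k not in support:
                support.append(k)
            coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
            residual = y - A[:, support] @ coef
        x = np.zeros(n_views, dtype=complex)
        x[support] = coef
        return (F @ x).real                               # all n_views reconstructed

    # Example: 4 known views out of 8 along one angular dimension.
    views = np.cos(2 * np.pi * np.arange(8) / 8)          # toy angular signal
    known_pos = np.array([0, 1, 3, 6])
    print(recover_angular_line(views[known_pos], known_pos, n_views=8))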

Light fields dimensionality reduction and compression

Participants : Elian Dib, Christine Guillemot, Xiaoran Jiang, Mikael Le Pendu.

We have investigated low-rank approximation methods exploiting data geometry for the dimensionality reduction of light fields. We have developed an approximation method in which homographies and the low-rank approximation model are jointly optimized [32]. The homographies are searched for so as to align linearly correlated sub-aperture images in such a way that the set of views can be approximated by a low-rank model. The light field views are aligned using either one global homography or multiple homographies, depending on how much the disparity across views varies from one depth plane to another. The rank constraint is expressed as a product of two matrices, one containing basis vectors and the other weighting coefficients. The basis vectors and weighting coefficients can be compressed separately, exploiting their respective characteristics. The optimization hence proceeds by iteratively searching for the homographies and the factored model of the input set of sub-aperture images (views) that minimize the approximation error.
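
The sketch below shows the factored low-rank step for a fixed set of homographies (aligned views stacked as columns, truncated SVD giving the basis and weighting factors); function names and the use of OpenCV warping are our own illustration, and in [32] the homographies and the factorization are optimized jointly and iteratively.

    # Hypothetical sketch: low-rank factorization of homography-aligned views.
    import numpy as np
    import cv2

    def low_rank_model(views, homographies, rank):
        h, w = views[0].shape
        aligned = [cv2.warpPerspective(v, H, (w, h)) for v, H in zip(views, homographies)]
        M = np.stack([a.ravel() for a in aligned], axis=1)     # one column per aligned view
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        B = U[:, :rank] * s[:rank]                             # basis vectors
        C = Vt[:rank, :]                                       # weighting coefficients
        return B, C, np.linalg.norm(M - B @ C)                 # factors + approximation error

    # B and C can be encoded separately; the approximation error drives the search
    # for better homographies in the joint optimization loop.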

A light field compression algorithm based on a low-rank approximation exploiting scene and data geometry has then been developed [18]. The best pair of key parameters of the algorithm (approximation rank and quantization step size), in terms of rate-distortion performance, is predicted using a model learned from a set of training light fields. The model is learned as a function of several input light field features: disparity indicators defined as a function of the decay rate of the SVD values of the original and registered view matrices, as well as texture indicators defined in terms of the decay rate of the SVD values computed on the central view. The parameter prediction problem is cast as a multi-output classification problem solved with a decision tree ensemble method, namely the Random Forest method. The approximation method is currently being extended to local super-ray based low-rank models.
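
The prediction stage can be sketched as below with a Random Forest classifier mapping the light field features to an index in a table of (rank, quantization step) pairs; the feature values, labels and class encoding are placeholders, not the actual training data of [18].

    # Hypothetical sketch of the parameter-prediction stage.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each row: [svd_decay_original, svd_decay_registered, texture_decay_central_view]
    features = np.array([[0.80, 0.35, 0.50],
                         [0.60, 0.55, 0.70],
                         [0.90, 0.20, 0.30],
                         [0.65, 0.60, 0.75]])
    best_pair_index = np.array([2, 0, 3, 0])      # index into a table of (rank, q_step) pairs

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(features, best_pair_index)
    print(clf.predict([[0.85, 0.30, 0.40]]))      # predicted (rank, q_step) index for a new LF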