Vision is a key function for sensing our world and performing complex tasks. It has high sensitivity and strong reliability, even though most of its input is noisy, changing, and ambiguous. A better understanding of how biological vision works opens up scientific challenges as well as promising technological, medical and societal breakthroughs. Fundamental questions, such as how a visual scene is encoded by the retina into spike trains, transmitted to the visual cortex via the optic nerve through the thalamus, decoded in a fast and efficient way, and finally turned into perception, offer perspectives for research and technological developments for current and future generations.
Vision is not always functional, though. Sometimes, "something" goes wrong. Although many visual impairments such as myopia, hypermetropia and cataract can be corrected by glasses, contact lenses, or other means such as medication or surgery, pathologies impairing the retina such as Age-Related Macular Degeneration (AMD) and Retinitis Pigmentosa (RP) cannot be fixed with these standard treatments 33. They result in a progressive degradation of vision (Figure 1), down to low vision (visual acuity of less than 6/18 to light perception, or a visual field of less than 10 degrees from the point of fixation) and, eventually, blindness. Thus, a person with low vision must learn to adjust to their pathology. Progress in research and technology can help them. Considering the aging of the population in developed countries and its strong correlation with the prevalence of eye diseases, low vision has already become a major societal problem.
In this context, the Biovision team's research revolves around the central theme of biological vision and perception, and the impact of low vision conditions. Our strategy is based upon four cornerstones: to model, to assist diagnosis, to aid visual activities such as reading, and to enable personalized content creation. We aim to develop fundamental research as well as technology transfer along three entangled axes of research:
These axes form a stable, three-pillared basis for our research activities, giving our team an original combination of expertise: neuroscience modelling, computer vision, Virtual and Augmented Reality (XR), and media analysis and creation. Our research themes require strong interactions with experimental neuroscientists, modellers, ophthalmologists and patients, constituting a large network of national and international collaborators. Biovision is therefore a strongly multi-disciplinary team. We publish in international journals and conferences in several fields including neuroscience, low vision, mathematics, physics, computer vision, multimedia, computer graphics, and human-computer interaction.
In collaboration with neuroscience labs, we derive phenomenological equations and analyse them mathematically by adopting methods from theoretical physics or mathematics (Figure 2). We also develop simulation platforms like Pranas or Macular, which help us confront theoretical predictions with numerical simulations, or allow researchers to perform in silico experiments under conditions rarely accessible to experimentalists (such as simultaneously recording the retina layers and the primary visual cortex1 (V1)). Specifically, our research focuses on the modelling and mathematical study of:
In collaboration with low vision clinical centers and cognitive science labs, we develop computer science methods, open software and toolboxes to assist low vision patients, with a particular focus on Age-Related Macular Degeneration2. As AMD patients still have plastic and functional vision in their peripheral visual field 38, they must develop efficient “Eccentric Viewing" (EV) to adapt to the central blind zone (scotoma) and to direct their gaze away from the object they want to identify 43. Commonly proposed assistance tools involve visual rehabilitation methods 40 and visual aids that usually consist of magnifiers 34.
Our main research goals are:
We investigate the impact of visual media design on user experience and perception, and propose assisted creativity tools for creating personalized and adapted media content. We employ computer vision and deep learning techniques for media understanding in film and in complex documents like newspapers. We deploy this understanding in new media platforms such as virtual and augmented reality for applications in low-vision training, accessible media design, and generation of 3D visual stimuli:
Neuroscience research. Performing in silico experiments is a way to reduce experimental costs, to test hypotheses and design models, and to test algorithms. Our goal is to develop a large-scale simulation platform of the normal and impaired retina. This platform, called Macular, makes it possible to test hypotheses on retinal functions in normal vision (such as the role of amacrine cells in motion anticipation 50, or the expected effects of pharmacology on retinal dynamics 11). It can also mimic specific degeneracies or pharmacologically induced impairments 25, as well as emulate electric stimulation by prostheses.
In addition, the platform provides realistic input to models or simulators of the thalamus or the visual cortex, in contrast to the inputs usually considered in modeling studies.
The research themes of the Biovision team have direct societal impact on two fronts:
At the heart of Macular is an object called a "Cell". These "cells" are inspired by biological cells, but the concept is more general: a cell can also be a group of cells of the same type, a field generated by a large number of cells (for example a cortical column), or an electrode in a retinal prosthesis. A cell is defined by internal variables (evolving over time), internal parameters (adjusted by cursors), a dynamic evolution (described by a set of differential equations) and inputs. Inputs can come from an external visual scene or from other synaptically connected cells. Synapses are also Macular objects, defined by specific variables, parameters, and equations. Cells of the same type are connected in layers according to a graph with a specific type of synapses (intra-layer connectivity). Cells of different types can also be connected via synapses (inter-layer connectivity).
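To make this architecture concrete, here is a minimal Python sketch (with hypothetical names and a toy dynamics, not the actual Macular code) of how a cell with internal variables, parameters, an evolution rule and synaptic inputs could be represented:

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Synapse:
    pre: "Cell"
    weight: float                                   # synaptic parameter
    def current(self) -> float:
        return self.weight * self.pre.variables["V"]

@dataclass
class Cell:
    variables: Dict[str, float]                     # internal variables, evolve over time
    parameters: Dict[str, float]                    # internal parameters, set by cursors
    dynamics: Callable[["Cell", float, float], Dict[str, float]]  # d(variables)/dt
    synapses: List[Synapse] = field(default_factory=list)

    def step(self, external_input: float, dt: float) -> None:
        total_input = external_input + sum(s.current() for s in self.synapses)
        derivatives = self.dynamics(self, total_input, dt)
        for name, dv in derivatives.items():        # simple forward Euler update
            self.variables[name] += dt * dv

# Example dynamics: leaky integration of the input, a stand-in for a real cell model.
leak = lambda cell, inp, dt: {"V": (-cell.variables["V"] + inp) / cell.parameters["tau"]}
bipolar = Cell({"V": 0.0}, {"tau": 0.05}, leak)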
All the information concerning the types of cells, their inputs, their synapses and the organization of the layers is stored in a file of type .mac (for "macular"), defining what we call a "scenario". Different types of scenarios are offered to the user, who can load and play them while modifying the parameters and viewing the variables (see technical section).
Macular is built around a central idea: its use and its graphical interface can evolve according to the user's objectives. It can therefore be used in user-designed scenarios, such as simulation of retinal waves, simulation of retinal and cortical responses to prosthetic stimulation, study of pharmacological impact on retinal response, etc. The user can design their own scenarios using the Macular Template Engine (see technical section).
We present the GUsT-3D framework for designing Guided User Tasks in embodied VR experiences, i.e., tasks that require the user to carry out a series of interactions guided by the constraints of the 3D scene. GUsT-3D is implemented as a set of tools that support a 4-step workflow to:
(1) annotate entities in the scene with names, navigation, and interaction possibilities, (2) define user tasks with interactive and timing constraints, (3) manage scene changes, task progress, and user behavior logging in real-time, and (4) conduct post-scenario analysis through spatio-temporal queries on user logs, and visualize scene entity relations through a scene graph.
Members of Biovision are marked with a
As part of ANR CREATTIVE3D, the Biovision team has established a technological platform in the Kahn immersive space including:
We present here the new scientific results of the team over the course of the year. For each entry, members of Biovision are marked with a
1 Institut de la Vision, Sorbonne Université, Paris, France.
Description:
A long-standing hypothesis is that retinal ganglion cells, the retinal output, do not signal the visual scene per se, but rather surprising events, e.g., mismatches between observation and the expectation formed by previous inputs.
A striking example of this is the Omitted Stimulus Response (OSR): when a regular sequence of flashes suddenly ends, the retina emits a large response signaling the missing stimulus, and the latency of this response shifts with the period of the flash sequence to respond to the omitted stimulus. However, the mechanism behind this predictive latency shift remains unclear.
Here we show that inhibition is necessary for this latency shift. Using a combination of modeling and experiments, we show that the latency shift of the OSR in ganglion cells is produced by amacrine cells acting through depressing inhibitory synapses. Inhibition delays the response at the end of the sequence, and the depressing synapse shifts this delay as a function of the frequency of the flash sequence. High-frequency sequences induce a strong depression of the inhibitory synapse, a weak inhibitory input and thus a small increase in latency, while low-frequency inputs induce a small depression of the inhibitory synapse, a strong inhibitory input and therefore a large increase in latency.
We build a circuit model that reproduces our experimental findings and generates new predictions, that we confirm by further experiments.
Since depressing inhibitory synapses are ubiquitous in sensory circuits, our results suggest they could be a key component to generate the predictive responses that have been observed in several brain areas.
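As an illustration of the mechanism, the following minimal Python sketch (a toy model with assumed parameters, not the circuit model of the paper) shows how a depressing inhibitory synapse retains fewer resources after a high-frequency flash train than after a low-frequency one, leaving less inhibition available at the omitted flash and hence a smaller latency shift:

import numpy as np

def depression_level(period_s, n_flashes=10, tau_rec=1.0, use_fraction=0.4):
    """Fraction of synaptic resources left after a train of flashes.

    period_s is the inter-flash interval; tau_rec and use_fraction are
    hypothetical recovery time constant and release fraction.
    """
    resources = 1.0
    for _ in range(n_flashes):
        resources -= use_fraction * resources                                   # depletion at each flash
        resources += (1.0 - resources) * (1.0 - np.exp(-period_s / tau_rec))    # recovery between flashes
    return resources

for period in (0.1, 0.5, 1.0):   # high to low flash frequency
    print(f"period {period:.1f} s -> remaining inhibitory resources {depression_level(period):.2f}")
# More remaining resources -> stronger inhibition at the omitted flash -> larger latency shift.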
This work has been presented at the conferences ICMNS 2022 – 2022 International Conference on Mathematical Neuroscience (online) 27, NEUROMOD Meeting 2022 (Antibes) 32, AREADNE 2022 (Santorini, Greece) 31, and Dendrites 2022 – Dendritic anatomy, molecules and functions – EMBO workshop (Heraklion, Greece) 30, and will soon be submitted to the journal eLife.
1 Ecole Nationale Supérieure de Techniques Avancées, Institut Polytechnique de Paris, France
2 Institute of Neuroscience (ION), United Kingdom
3 Institute for Adaptive and Neural Computation, University of Edinburgh, United Kingdom
Description: Computing the Spike-Triggered Average (STA) is a simple method to estimate linear receptive fields (RFs) in sensory neurons. For random, uncorrelated stimuli the STA provides an unbiased RF estimate, but in practice, white noise at high resolution is not an optimal stimulus choice as it usually evokes only weak responses. Therefore, for a visual stimulus, images of randomly modulated blocks of pixels are often used. This solution naturally limits the resolution at which an RF can be measured. Here we present a simple super-resolution technique that can overcome these limitations. We define a novel stimulus type, the shifted white noise (SWN), by introducing random spatial shifts in the usual stimulus in order to increase the resolution of the measurements. In simulated data we show that the average error using the SWN was 1.7 times smaller than when using the classical stimulus, with successful mapping of 2.3 times more neurons, covering a broader range of RF sizes. Moreover, successful RF mapping was achieved with brief recordings of light responses, lasting only about one minute of activity, which is more than 10 times more efficient than the classical white noise stimulus. In recordings from mouse retinal ganglion cells with large-scale multi-electrode arrays, we successfully mapped 21 times more RFs than when using the traditional white noise stimuli. In summary, randomly shifting the usual white noise stimulus significantly improves RF estimation, and requires only short recordings.
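The following minimal Python sketch (an assumed illustration, not the published analysis code) shows the SWN idea: block-based white-noise frames receive random sub-block spatial shifts, and the spike-triggered average is then computed over these shifted frames:

import numpy as np

rng = np.random.default_rng(0)
n_frames, grid, block = 2000, 8, 4           # 8x8 blocks of 4x4 pixels -> 32x32 images

def swn_frame():
    """One checkerboard frame with a random sub-block spatial shift."""
    coarse = rng.choice([-1.0, 1.0], size=(grid, grid))
    frame = np.kron(coarse, np.ones((block, block)))        # upsample blocks to pixels
    dy, dx = rng.integers(0, block, size=2)                 # random shift smaller than a block
    return np.roll(np.roll(frame, dy, axis=0), dx, axis=1)

stimulus = np.stack([swn_frame() for _ in range(n_frames)])
spikes = rng.poisson(0.5, size=n_frames)                    # placeholder spike counts per frame

# STA: spike-weighted average frame; the random shifts let the average resolve
# spatial structure finer than the block size.
sta = (spikes[:, None, None] * stimulus).sum(0) / spikes.sum()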
Figure 6 illustrates our method. This paper has been published in the Journal of Neurophysiology 14.
Description: The retina is the entrance of the visual system. Although based on common biophysical principles, the dynamics of retinal neurons are quite different from those of their cortical counterparts, raising interesting problems for modellers. In this work we have addressed mathematically formulated questions in this spirit.
Figure 7 illustrates the retina model used to develop our results. For more details, see our paper published in J. Imaging, special issue "Mathematical Modeling of Human Vision and its Application to Image Processing", 2021 12.
1 EURECOM, Sophia Antipolis, France.
Description: We investigate the dynamics of stage II retinal waves via a dynamical system model, grounded in biophysics and analyzed with bifurcation theory. We model in detail the mutual cholinergic coupling between Starburst Amacrine Cells (SACs).
We show how the nonlinear coupling between cells and the bifurcation structure explain how waves start, propagate, interact and stop. We argue that the dynamics of SAC waves is essentially controlled by two parameters slowly evolving in time: one,
In addition, this scenario holds on an interval of acetylcholine coupling compatible with the variations observed in experimental studies. We derive transport equations for
This paper has been published in Physica D, 11.
1 Biosciences Institute, Newcastle University, Newcastle upon Tyne, UK
2 Health & Life Sciences, Applied Sciences, Northumbria University, Newcastle upon Tyne UK
3 Ludwig Boltzmann Institute for Rare and Undiagnosed Diseases (LBI-RUD), 1090 Vienna, Austria
4 Research Centre for Molecular Medicine (CeMM) of the Austrian Academy of Sciences, 1090 Vienna, Austria
5 SED INRIA Sophia-Antipolis
Description: Retinal neurons come in remarkable diversity based on structure, function and genetic identity. Classifying these cells is a challenging task, requiring a multimodal methodology. Here, we introduce a novel approach for retinal ganglion cell (RGC) classification, based on pharmacogenetics combined with immunohistochemistry and large-scale retinal electrophysiology. Our novel strategy allows grouping of cells sharing gene expression and understanding how these cell classes respond to basic and complex visual scenes. Our approach consists of increasing the firing level of RGCs co-expressing a certain gene (Scnn1a or Grik4) using excitatory DREADDs (Designer Receptors Exclusively Activated by Designer Drugs) and then correlating the location of these cells with post hoc immunostaining, to unequivocally characterize anatomical and functional features of these two groups. We grouped these isolated RGC responses into multiple clusters based on the similarity of their spike trains. With our approach, combined with immunohistochemistry, we were able to extend the pre-existing list of Grik4-expressing RGC types to a total of 8 and, for the first time, we provide a phenotypical description of 14 Scnn1a-expressing RGCs. The insights and methods gained here can guide not only RGC classification but also neuronal classification challenges in other brain regions.
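As an illustration of the response-clustering step, here is a minimal Python sketch (an assumed implementation, not the published pipeline) that groups cells by the similarity of their smoothed, normalized spike-train profiles using hierarchical clustering:

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_responses(spike_counts, smooth_bins=3, n_clusters=8):
    """spike_counts: array (n_cells, n_time_bins) of binned spike counts per cell.
    Returns a cluster label per cell."""
    rates = gaussian_filter1d(spike_counts.astype(float), smooth_bins, axis=1)      # smooth each train
    rates /= np.linalg.norm(rates, axis=1, keepdims=True) + 1e-9                     # compare shapes, not firing level
    z = linkage(rates, method="ward")                                                # hierarchical clustering
    return fcluster(z, t=n_clusters, criterion="maxclust")

# Toy usage with random spike counts standing in for recorded RGC responses.
rng = np.random.default_rng(0)
labels = cluster_responses(rng.poisson(2.0, size=(40, 200)))
print(labels)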
Figure 9 illustrates our results.
This paper has been published in Royal Society Open Biology 13.
1 Aix-Marseille Université (CNRS, Laboratoire de Psychologie Cognitive, Marseille, France)
2 Université Côte d'Azur (France), I3S, Constraints and Application Lab
Description:
Measuring reading performance is one of the most widely used methods in ophthalmology clinics to judge the effectiveness of treatments, surgical procedures, or rehabilitation techniques 48. However, reading tests are limited by the small number of standardized texts available. For the MNREAD test 45, which is one of the reference tests used as an example in this paper, there are only two sets of 19 sentences in French. These sentences are challenging to write because they have to respect rules of different kinds (e.g., related to grammar, length, lexicon, and display). They are also tricky to find: out of a sample of more than three million sentences from children's literature, only four satisfy the criteria of the MNREAD reading test. To obtain more sentences, we propose an original approach to text generation that takes all the rules into account at the generation stage. Our approach is based on Multi-valued Decision Diagrams (MDDs). First, we represent the corpus by n-grams and the different rules by MDDs, and then we combine them using operators, notably intersections. The results obtained show that this approach is promising, even if some problems remain, such as memory consumption or a posteriori validation of the meaning of sentences. Using 5-grams, we generated more than 4000 sentences that meet the MNREAD criteria, thus easily providing an extension of the 19-sentence sets of the MNREAD test.
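The following simplified Python sketch (illustrative only; the actual work relies on a dedicated MDD library) conveys the spirit of constrained generation, enforcing a bigram model and a length rule during generation rather than by filtering afterwards:

corpus = ["the cat sees the dog", "the dog sees the small cat", "the small dog runs"]
bigrams = set()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]
    bigrams.update(zip(words, words[1:]))

def generate(max_words=6, min_chars=12, max_chars=26):
    """Enumerate sentences compatible with both the bigram model and the length rule."""
    results = []
    def extend(path, chars):
        last = path[-1]
        if last == "</s>":
            if min_chars <= chars <= max_chars:
                results.append(" ".join(path[1:-1]))
            return
        if len(path) - 1 > max_words or chars > max_chars:   # prune: rule already violated
            return
        for (a, b) in bigrams:
            if a != last:
                continue
            if b == "</s>":
                extend(path + [b], chars)
            else:
                extend(path + [b], chars + len(b) + (1 if chars else 0))
    extend(["<s>"], 0)
    return results

print(generate())   # includes novel sentences, e.g. "the small cat sees the dog"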
This work has been presented at the JFPC 2022 conference 20 and at the MOMI 2022 workshop 26.
1 Aix-Marseille Université (CNRS, Laboratoire de Psychologie Cognitive, Marseille, France)
2 Institut d’Education Sensoriel (IES) Arc-en-Ciel
3 Centre Monticelli Paradis d'Ophtalmologie
Context: This contribution is part of a larger initiative in the scope of ANR DEVISE. We aim at measuring and analyzing the 2D geometry of each patient's "visual field", notably the characteristics of their scotoma (e.g., shape, location w.r.t. the fovea, absolute vs. relative) and gaze fixation data. This work is based on data acquired from a Nidek MP3 micro-perimeter installed at the Centre Monticelli Paradis d'Ophtalmologie. In 2021, the focus was on the estimation of the fovea position from perimetric images (see below) and on the development of a first graphical user interface to manipulate MP3 data.
Description: In the presence of maculopathies, due to structural changes in the macula region, the fovea is usually located in pathological fundus images using normative anatomical measures (NAM). This simple method relies on two conditions: that images are acquired under standard testing conditions (primary head position and central fixation) and that the optic disk is entirely visible on the image. However, these two conditions are not always met in the case of maculopathies, in particular during fixation tasks.
Here, we propose a Vessel-Based Fovea Localization (VBFL) approach that relies on the retina's vessel structure to make predictions. The spatial relationship between fovea location and vessel characteristics is learnt from healthy fundus images and then used to predict fovea location in new images. We evaluate the VBFL method on three categories of fundus images: healthy images acquired with different head orientations and fixation locations, healthy images with simulated macular lesions and pathological images from AMD.
For healthy images taken with the head tilted to the side, NAM estimation error increases significantly by a factor of 4, while VBFL shows no significant increase, representing a 73% reduction in prediction error. With simulated lesions, VBFL performance decreases significantly as lesion size increases, yet remains better than NAM until lesion size reaches 200 deg². For pathological images, the average prediction error was 2.8 degrees, with 64% of the images yielding an error of 2.5 degrees or less. VBFL was not robust for images showing darker regions and/or an incomplete representation of the optic disk.
In conclusion, the vascular structure provides enough information to precisely locate the fovea in fundus images in a way that is robust to head tilt, eccentric fixation location, missing vessels and actual macular lesions.
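The following minimal Python sketch (an assumed pipeline with toy features, not the published VBFL implementation) illustrates the idea of learning fovea position from vessel-structure descriptors on healthy images and applying the learned model to new images:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def vessel_features(vessel_mask):
    """Toy descriptors of a binary vessel map: centroid, spread and density of vessel pixels."""
    ys, xs = np.nonzero(vessel_mask)
    return np.array([xs.mean(), ys.mean(), xs.std(), ys.std(), len(xs) / vessel_mask.size])

def train_vbfl(vessel_masks, fovea_xy):
    """Learn the fovea position from vessel features of healthy fundus images."""
    X = np.stack([vessel_features(m) for m in vessel_masks])
    return RandomForestRegressor(n_estimators=100, random_state=0).fit(X, fovea_xy)

def predict_fovea(model, vessel_mask):
    return model.predict(vessel_features(vessel_mask)[None, :])[0]

# Toy usage with random masks standing in for segmented vessel maps.
rng = np.random.default_rng(0)
masks = [rng.random((64, 64)) > 0.9 for _ in range(20)]
fovea = rng.uniform(20, 44, size=(20, 2))
model = train_vbfl(masks, fovea)
print(predict_fovea(model, masks[0]))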
More information is available in 41. This work was presented at ARVO 2022 28 and an extended journal version has been submitted to Translational Vision Science & Technology.
1 Universidad Técnica Federico Santa María, Valparaíso, Chile
2 Université Côte d'Azur (France), Inria, ABS Team
Context: The digital era is transforming the newspaper industry, offering new digital user experiences to all readers. However, to be successful, newspaper designers face a tricky design challenge: combining the design and aesthetics of the printed edition (which remains a reference for many readers) with the functionalities of the online edition (continuous updates, responsiveness) to create the e-newspaper of the future, a synthesis based on usability, reading comfort, and engagement. In this spirit, our project aims to develop a novel inclusive digital news reading experience that will benefit all readers: you, me, and low vision people, for whom newspapers are a way to remain part of society.
Description:
In this work we have focused on how to comfortably read newspapers on a small display. Simply transposing print newspapers to digital media cannot be satisfactory because they were not designed for small displays. One key feature lost is the notion of entry points, which are essential for navigation. By focusing on headlines as entry points, we show how to produce alternative layouts for small displays that preserve entry-point quality (readability and usability) while optimizing aesthetics and style. Our method relayouts the page via a genetic-inspired optimization. We tested it on realistic newspaper pages. For the case discussed here, we obtained more than 2000 different layouts where the font was increased by a factor of two. We show that the quality of headlines is globally much better with the new layouts than with the original layout. Future work will aim to generalize this promising approach, accounting for the complexity of real newspapers, with user experience quality as the primary goal.
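The following minimal Python sketch (illustrative only, not the published system) conveys the genetic-inspired search: candidate layouts are scored on entry-point quality and aesthetics, and the best candidates are kept and mutated over generations:

import random

random.seed(0)
ARTICLES = ["lead", "politics", "sports", "culture", "weather"]   # hypothetical page content

def random_layout():
    order = ARTICLES[:]
    random.shuffle(order)
    return {"order": order, "headline_scale": random.uniform(1.0, 2.0)}

def score(layout):
    """Toy objective: larger headlines improve entry points; keeping the lead article
    near the top stands in for style/aesthetics constraints."""
    entry_points = layout["headline_scale"] / 2.0
    aesthetics = 1.0 - layout["order"].index("lead") / len(ARTICLES)
    return 0.6 * entry_points + 0.4 * aesthetics

def mutate(layout):
    child = {"order": layout["order"][:], "headline_scale": layout["headline_scale"]}
    i, j = random.sample(range(len(ARTICLES)), 2)                 # swap two article positions
    child["order"][i], child["order"][j] = child["order"][j], child["order"][i]
    child["headline_scale"] = min(2.0, max(1.0, child["headline_scale"] + random.uniform(-0.2, 0.2)))
    return child

population = [random_layout() for _ in range(30)]
for _ in range(100):                                              # keep the best, mutate them
    population.sort(key=score, reverse=True)
    population = population[:10] + [mutate(random.choice(population[:10])) for _ in range(20)]
print(max(population, key=score))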
This work was published in the ACM Symposium on Document Engineering (DocEng ’22) 16.
1 Université Côte d'Azur, France
2 CNRS I3S Laboratory, France
3 Institut Universitaire de France, France
4 Centre Inria d'Université Côte d'Azur, CNRS I3S, SPARKS team, France
Context: In the context of ANR CREATTIVE3D, we are tackling the major challenge of designing 3D experiences and user tasks. The challenge lies in bridging the inter-relational gaps of perception between the designer, the user, and the 3D scene. Paul Dourish identified three such gaps: ontology, between the scene representation and the user's and designer's interpretations; intersubjectivity, in the communication of tasks between designer and user; and intentionality, between the user's intentions and the designer's interpretations.
To address this, we developed the GUsT-3D framework for designing Guided User Tasks in embodied VR experiences, i.e., tasks that require the user to carry out a series of interactions guided by the constraints of the 3D scene. GUsT-3D is implemented as a set of tools that support a 4-step workflow to (1) annotate entities in the scene with navigation and interaction possibilities, (2) define user tasks with interactive and timing constraints, (3) manage interactions, task validation, and user logging in real-time, and (4) conduct post-scenario analysis through spatio-temporal queries using ontology definitions.
Description: We propose the GUsT-3D framework for creating 3D embodied experiences, which comprises three components (Figure 12):
We conducted a formative evaluation involving six expert interviews to assess the framework and the implemented workflow. Analysis of the responses shows that the GUsT-3D framework fits well into a designer's creative process, providing a necessary workflow to create, manage, and understand VR embodied experiences.
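As an illustration of step (2) of the workflow, here is a minimal Python sketch (with hypothetical names, not the GUsT-3D API) of how a guided user task with interactive and timing constraints could be represented and validated:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Interaction:
    entity: str                  # annotated scene entity, e.g. "crosswalk_button"
    action: str                  # e.g. "press", "grab", "look_at"

@dataclass
class GuidedTask:
    name: str
    steps: List[Interaction]                 # interactions to carry out, in order
    time_limit_s: Optional[float] = None     # timing constraint, None if unconstrained
    completed_steps: List[int] = field(default_factory=list)

    def validate(self, step_index: int, elapsed_s: float) -> bool:
        """A step is valid if it is the next expected one and within the time limit."""
        in_order = step_index == len(self.completed_steps)
        in_time = self.time_limit_s is None or elapsed_s <= self.time_limit_s
        if in_order and in_time:
            self.completed_steps.append(step_index)
        return in_order and in_time

task = GuidedTask("cross_the_street",
                  steps=[Interaction("crosswalk_button", "press"),
                         Interaction("crosswalk", "walk_across")],
                  time_limit_s=120.0)
print(task.validate(0, elapsed_s=12.0))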
The results of this work were published in the Proceedings of the ACM on Human-Computer Interaction (PACMHCI) and presented at the ACM Symposium on Engineering Interactive Computing Systems (EICS) 9. F. Robert presented his thesis work on the project at the doctoral consortium of the ACM Conference on Multimedia Systems (ACM MMSys) 18. The technical platform has been registered under a CeCILL licence (IDDN.FR.001.160035.000.S.P.2022.000.31235).
Other news and developments on the project can be followed on the official website.
1 Université Côte d'Azur, France
2 CNRS I3S Laboratory, France
3 Institut Universitaire de France, France
4 Centre Inria d'Université Côte d'Azur, CNRS I3S, SPARKS team, France
5 Université Côte d'Azur, CHU, CoBTEK, France
Context: While immersive media have been shown to generate more intense emotions, saliency information has been shown to be a key component for the assessment of their quality, owing to the various portions of the sphere (viewports) a user can attend to. In this work we investigated the tri-partite connection between user attention, user emotion and visual content in immersive environments. To do so, we present a new dataset enabling the analysis of different types of saliency, both low-level and high-level, in connection with the user's state in 360° videos.
Description: This work comprises two principal parts: the release of an open dataset of synchronized gaze and emotion for 360° videos (PEM360), and the study of the tri-partite connection between user attention, user emotion and visual content.
The PEM360 dataset contains user head movements and gaze recordings in 360° videos, together with self-reported emotional ratings and continuous physiological measurements.
Tri-partite connection between user attention, user emotion and visual content in immersive environments. Using the PEM360 dataset, we then study how the accuracy of saliency estimators in predicting user attention depends on user-reported and physiologically-sensed emotional perceptions. Our results show that high-level saliency better predicts user attention for higher levels of arousal. In Figure 13 we can visualize the level of emotional arousal as a saliency heatmap on the video.
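The following minimal Python sketch (an assumed analysis, not the released PEM360 tools) illustrates how one can compare the ability of a saliency map to predict gaze between high- and low-arousal segments:

import numpy as np

def attention_score(saliency_map, gaze_points):
    """Toy score: mean saliency sampled at gaze locations, relative to the map's mean."""
    vals = np.array([saliency_map[y, x] for (y, x) in gaze_points])
    return vals.mean() / (saliency_map.mean() + 1e-9)

def compare_by_arousal(saliency_maps, gaze_traces, arousal, threshold=0.5):
    """Average attention-prediction score for high- vs low-arousal segments."""
    scores = np.array([attention_score(s, g) for s, g in zip(saliency_maps, gaze_traces)])
    arousal = np.asarray(arousal)
    return scores[arousal >= threshold].mean(), scores[arousal < threshold].mean()

# Toy usage with random maps and gaze points standing in for real segments.
rng = np.random.default_rng(1)
sal = [rng.random((32, 64)) for _ in range(6)]
gaze = [[(rng.integers(32), rng.integers(64)) for _ in range(50)] for _ in range(6)]
print(compare_by_arousal(sal, gaze, [0.2, 0.8, 0.3, 0.9, 0.1, 0.7]))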
The results of this work were published to the ACM Conference on Multimedia Systems Open Dataset and Software Track (ACM MMSys) and the IEEE International Conference on Image Processing (ICIP)17, 24.
1 Université Côte d'Azur, France
2 CNRS I3S Laboratory, France
3 Institut Universitaire de France, France
Context: Identifying human characters and how they are portrayed on-screen is inherently linked to how we perceive and interpret the story and artistic value of visual media. Building computational models sensitive to story will thus require a formal representation of the character. Yet this kind of data is complex and tedious to annotate on a large scale. Human pose estimation (HPE) can facilitate this task by identifying features such as position, size, and movement that can be transformed into input for machine learning models, enabling higher-level artistic and storytelling interpretation. However, current HPE methods operate mainly on non-professional image content, with no comprehensive evaluation of their performance on artistic film.
We thus took a first step towards evaluating the performance of HPE methods on artistic film content. We begin by proposing a formal representation of the character based on cinematography theory, then sample and annotate 2700 images from three datasets with this representation, one of which we introduce to the community. An in-depth analysis is then conducted to measure the general performance of two recent HPE methods on metrics of precision and recall for character detection, and to examine the impact of cinematographic style. From these findings, we highlight the advantages of HPE for automated film analysis, and propose future directions to improve performance on artistic film content.
Description: We propose an analysis of human pose estimators based on performance metrics associated with specific frame criteria: (1) precision, recall, and pose keypoint accuracy measures that have been used for pre-existing benchmarks, and (2) a set of cinematographic labels extending Film Editing Patterns 53 focused on character representation, and spanning six large categories: character size, character angle (both pitch and yaw), on-screen position, number of characters, body part visibility, and artistic shots.
Two HPE methods were selected: OpenPose 37, because it is a reference bottom-up approach shown to have reliable and real-time performance, and DEKR 42, because (i) it is a more recent approach shown to outperform existing competitors and (ii) it builds on the features learned by a top-down approach, HRNet, to take the best of both bottom-up and top-down approaches. We benchmarked the performance of these two HPE methods on three datasets:
On all three datasets, we generally observe in Figure 14 that recall decreases with
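The following minimal Python sketch (assumed evaluation logic, not the published benchmark) illustrates how per-frame precision and recall for character detection can be computed by greedily matching predicted to annotated poses via bounding-box overlap:

import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def precision_recall(pred_boxes, gt_boxes, thresh=0.5):
    """Greedy matching: each prediction claims the best unmatched annotation above the threshold."""
    matched_gt, tp = set(), 0
    for p in pred_boxes:
        best = max(((iou(p, g), i) for i, g in enumerate(gt_boxes) if i not in matched_gt),
                   default=(0, None))
        if best[0] >= thresh:
            matched_gt.add(best[1])
            tp += 1
    precision = tp / max(len(pred_boxes), 1)
    recall = tp / max(len(gt_boxes), 1)
    return precision, recall

# Toy usage: two predicted character boxes against two annotated ones.
print(precision_recall([(10, 10, 50, 90), (60, 10, 100, 90)], [(12, 12, 52, 88), (200, 10, 240, 90)]))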
The results of this work were published to the Eurographics Digital Library and presented at the 10th Eurographics Workshop on Intelligent Cinematography and Editing 19.
1 InriaTech, UCA Inria, France
2 EDF, R&D PERICLES – Groupe Réalité Virtuelle et Visualisation Scientifique, France
3 EDF, Lab Paris Saclay, Département SINETICS, France
4 R&DoM
Duration: 3 months
Objective: The objective of this work is to develop a proof-of-concept (PoC) targeting a precise use-case scenario defined by EDF (contract with InriaTech, supervised by Pierre Kornprobst). The use case is that of an employee with a visual impairment who wishes to follow a presentation. The idea of the PoC is a vision-aid system based on a mixed-reality solution. This work aims at (1) estimating the feasibility and interest of such a solution and (2) identifying research questions that could be jointly addressed in a future partnership.
APP Deposit: SlidesWatchAssistant (IDDN.FR.001.080024.000.S.P.2020.000.31235)