WORKSHOP

Computational Imaging with Novel Image Modalities
(Light Fields, Omni-directional Images, Digital Holograms).

September 29-30, 2021

INRIA, RENNES, FRANCE

Keynote Speakers


Prof. Dr. Gerard Pons-Moll, University of Tübingen, Neural Implicits, Learning with Neural Distance Fields, NeRF and all that

The field of 3D shape representation learning and reconstruction has been revolutionised by combinations of neural networks with implicit and field representations. Most neural implicit models classify the occupancy of, or regress the signed distance to, the surface for every point in space. I will first describe IF-Nets, which significantly improve the robustness and accuracy of previous neural implicits. Instead of classifying based on point coordinates alone, IF-Nets extract multi-scale deep features from the input, allowing the network to reason about local and global shape properties. IF-Nets, however, like most other neural implicits, can only reconstruct closed surfaces; open manifolds and shapes with inner structures (for example, garments, 3D scenes, or cars with inner structures) cannot be represented. To address this, I will describe Neural Distance Fields (NDF), which can output any kind of 3D shape, including open manifolds. I will show how NDFs can be used not only for 3D shape representation but, more generally, for multi-modal regression. I will also draw connections to Energy-Based Models. If time allows, I will also describe our recent work on making NeRF dynamic and generalisable to novel scenes.
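To make the core idea concrete, here is a minimal sketch (in Python, assuming PyTorch; deliberately not the actual IF-Net or NDF architectures) of a coordinate-based neural implicit: an MLP maps each 3D query point to an occupancy probability or, in the NDF spirit, to an unsigned distance, which stays well defined for open surfaces because no inside/outside decision is required.

    # Illustrative sketch only; layer sizes and names are assumptions.
    import torch
    import torch.nn as nn

    class NeuralImplicit(nn.Module):
        def __init__(self, hidden=256, mode="occupancy"):
            super().__init__()
            self.mode = mode
            self.net = nn.Sequential(
                nn.Linear(3, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, points):           # points: (N, 3)
            out = self.net(points)
            if self.mode == "occupancy":     # inside/outside classification
                return torch.sigmoid(out)
            # unsigned distance (NDF-style): non-negative, so open
            # manifolds and inner structures can also be represented
            return torch.relu(out)

    points = torch.rand(1024, 3) * 2 - 1     # query points in [-1, 1]^3
    occupancy = NeuralImplicit(mode="occupancy")(points)

The surface is then recovered as the 0.5 level set of the occupancy (or the zero level set of the distance field), which makes the representation resolution-free.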

Prof. Aljosa Smolic, Trinity College Dublin, Perception and Quality of Immersive Media

Interest in immersive media has increased significantly in recent years. Besides applications in entertainment, culture, health, industry, etc., telepresence and remote collaboration have gained importance due to the pandemic and the climate crisis. Immersive media have the potential to increase social integration and to reduce greenhouse gas emissions. As a result, technologies along the whole pipeline from capture to display are maturing and applications are becoming available, creating business opportunities. One aspect of immersive technologies that is still relatively undeveloped is the understanding of perception and quality, including subjective and objective assessment. The interactive nature of immersive media poses new challenges to the estimation of saliency and visual attention, and to the development of quality metrics. The V-SENSE lab of Trinity College Dublin addresses these questions in its current research. This talk will highlight corresponding examples in 360 VR video, light fields, volumetric video and XR.

Prof. Ricardo de Queiroz, Professor of Computer Science, Universidade de Brasilia, What is up with Point Cloud Compression and Representation?

Point clouds are thriving in the age of augmented reality and popular range sensors. They are not only popular for telepresence and augmented reality, but are also vital to autonomous driving and increasingly important in architecture and engineering. The talk will discuss the state-of-the-art methods for point cloud compression. A historical perspective will be given, along with an overview of the two MPEG standards in the area. We will discuss state-of-the-art compression methods, including lossless geometry compression, radial coding of Lidar data, deep-learning autoencoders, and more. We end with a discussion of attributes and point cloud representations.
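As a taste of how lossless geometry coding works, the sketch below illustrates octree occupancy coding, the idea underlying geometry compression in the MPEG G-PCC standard (heavily simplified; a real codec entropy-codes the occupancy bytes with context models, which is where most of the gain comes from):

    # Toy octree occupancy coder; function names are illustrative.
    import numpy as np

    def octree_encode(points, origin, size, depth, codes):
        """Recursively emit one 8-bit occupancy code per occupied node."""
        if depth == 0 or len(points) == 0:
            return
        half = size / 2.0
        occupancy = 0
        children = []
        for i in range(8):
            offset = np.array([(i >> 2) & 1, (i >> 1) & 1, i & 1]) * half
            lo, hi = origin + offset, origin + offset + half
            mask = np.all((points >= lo) & (points < hi), axis=1)
            if mask.any():
                occupancy |= 1 << i
                children.append((points[mask], lo))
        codes.append(occupancy)              # one byte per internal node
        for child_points, child_origin in children:
            octree_encode(child_points, child_origin, half, depth - 1, codes)

    points = np.random.rand(1000, 3)         # toy cloud in the unit cube
    codes = []
    octree_encode(points, np.zeros(3), 1.0, depth=6, codes=codes)
    print(f"{len(codes)} occupancy bytes for {len(points)} points")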

Prof. Pier Luigi Dragotti, Professor of Signal Processing, Imperial College London, Computational light-field microscopy and an application in neuroscience

Understanding how networks of neurons process information is one of the key challenges in modern neuroscience. A necessary step to achieve this goal is to be able to observe the dynamics of large populations of neurons over a large area of the brain. Light-field microscopy (LFM), a type of scanless microscope, is a particularly attractive candidate for high-speed three-dimensional (3D) imaging. It captures volumetric information in a single snapshot, allowing volumetric imaging at video frame-rates.
In this talk, we review fundamental aspects of LFM and describe the wave-optics model which is used in this context. We then present computational methods tailored to improve the performance of light-field microscopy systems. In particular, we discuss sparsity-driven volume reconstruction techniques as well as methods for neuron localization and activity estimation from light-field data. We present methods that leverage the intrinsic sparsity of the data as well as methods based on machine learning and the physics of the acquisition process.
This is joint work with A. Foust, P. Song, C. Howe, H. Verinaz and P. Quicke from Imperial College London.
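For readers unfamiliar with sparsity-driven volume reconstruction, here is a toy sketch of the basic recipe: the microscope is modeled as a linear operator A mapping the volume x to the sensor measurement y, and the sparsity of neural activity is exploited through an l1 penalty, here minimized with plain ISTA. The random A merely stands in for the wave-optics forward model discussed in the talk.

    # Toy sparse reconstruction: min_x 0.5*||Ax - y||^2 + lam*||x||_1
    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, y, lam=0.05, n_iter=200):
        L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - y)
            x = soft_threshold(x - grad / L, lam / L)
        return x

    rng = np.random.default_rng(0)
    A = rng.standard_normal((128, 512))      # measurements x voxels (toy sizes)
    x_true = np.zeros(512)
    x_true[rng.choice(512, 10, replace=False)] = 1.0   # sparse "neurons"
    x_hat = ista(A, A @ x_true)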

Dr. Neus Sabater, Senior researcher, InterDigital, Rennes, Compact and Adaptive Multiplane Images for View Synthesis

Recently, learning methods have been designed to create Multiplane Images (MPIs) for view synthesis. While MPIs are extremely powerful and facilitate high-quality renderings, a great amount of memory is required, making them impractical for many applications. In this talk, we propose a learning method that optimizes the available memory to render compact and adaptive MPIs. Our MPIs avoid redundant information and take into account the scene geometry to determine the depth sampling.
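For context, an MPI represents the scene as a stack of fronto-parallel RGBA planes that are alpha-composited back to front with the "over" operator; novel views are obtained by first warping each plane with a per-plane homography, a step omitted in this toy sketch:

    # Minimal MPI compositing sketch (no homography warping).
    import numpy as np

    def composite_mpi(rgb, alpha):
        """rgb: (D, H, W, 3), alpha: (D, H, W, 1), plane 0 = farthest."""
        out = np.zeros(rgb.shape[1:])
        for d in range(rgb.shape[0]):        # back to front
            out = rgb[d] * alpha[d] + out * (1.0 - alpha[d])
        return out

    D, H, W = 8, 64, 64
    image = composite_mpi(np.random.rand(D, H, W, 3), np.random.rand(D, H, W, 1))

The memory cost grows linearly with the number of planes D, which is exactly why a compact, scene-adaptive depth sampling matters.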

Dr. Pauline Trouvé-Peloux, Research engineer, ONERA, France, End-to-end computational sensor design: from Co-design to Deep Co-design

The increasing interest in the field of computational imaging has naturally led to the question of the joint design of sensor and processing, an approach referred to as end-to-end design or co-design. Many fields of application have been investigated, such as extended depth of field (EDOF), depth estimation and image restoration. In this talk, we will begin with a brief overview of co-design approaches from the literature and at ONERA, based on the definition of a theoretical performance model that takes into account both sensor and processing parameters. We will start with the end-to-end optimization of a single optical element, such as the aperture or a phase mask, then discuss the optimization of the parameters of a whole set of lenses (thickness, radius of curvature, position...), which requires the use of optical design software. The second part of the keynote will focus on more recent work in “Deep Co-design”, i.e. the end-to-end design of a lens together with a neural network. The generic idea is to model the sensor as a convolutional layer within the neural network framework. The kernel of this layer corresponds to the sensor PSF, which is simulated using an optical model that depends on the optical parameters. This so-called "sensor layer" encodes the ideal input image to model the image deformations due to the sensor. A classical neural network then follows the sensor layer to process the deformed image for a given task. If the optical model is differentiable with respect to the optical parameters, the efficient optimization tools of neural networks can be used to jointly optimize the optical and processing parameters. In the literature this approach has been applied to single-element optimization; we will present current work at ONERA on the optimization of a whole set of lenses.
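A minimal sketch of the sensor-layer idea, assuming a toy Gaussian PSF whose width plays the role of the learnable optical parameter (real co-design uses much richer, physically grounded optical models; all names here are illustrative):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SensorLayer(nn.Module):
        """Toy differentiable optics: convolution with a parameterized PSF."""
        def __init__(self, ksize=11):
            super().__init__()
            self.log_sigma = nn.Parameter(torch.tensor(0.0))  # optical parameter
            r = torch.arange(ksize) - ksize // 2
            self.register_buffer("r2", (r[:, None] ** 2 + r[None, :] ** 2).float())

        def forward(self, img):                               # img: (B, 1, H, W)
            sigma = torch.exp(self.log_sigma)
            psf = torch.exp(-self.r2 / (2 * sigma ** 2))
            psf = (psf / psf.sum()).view(1, 1, *self.r2.shape)
            return F.conv2d(img, psf, padding=self.r2.shape[0] // 2)

    model = nn.Sequential(
        SensorLayer(),                       # simulated sensor
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 1, 3, padding=1),      # toy restoration network
    )
    sharp = torch.rand(4, 1, 32, 32)
    loss = F.mse_loss(model(sharp), sharp)   # end-to-end task loss
    loss.backward()                          # gradients reach the PSF parameter

Because the task loss is differentiable with respect to log_sigma, a single optimizer updates the optics and the processing network jointly, which is the essence of deep co-design.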

Prof. Dr.-Ing. Thomas Sikora, Technische Universität Berlin, Gating Networks - Edge-Aware Sparse Representations for Image Processing and Compression

Mixture-of-Experts are stochastic Gating Networks that were invented 25 years ago and have found interesting applications in classification and regression tasks. In the last few years, these networks have gained significant attention through the design of specific gating layers that enable conditional computation in “outrageously” large Deep Neural Networks. Gating Networks can also be used to arrive at novel forms of sparse representations of images and video, suitable for various applications such as compression, denoising and graph processing. One of the most intriguing features of Gating Networks is that they can be designed to model non-stationarities in high-dimensional pixel data. As such, sparse representations can be made edge-aware and allow soft-gated partitioning and reconstruction of pixels in regions of 2D images, or of any N-dimensional signal in general. In contrast, wavelet-type representations such as Steerable Wavelets and beyond struggle to address pixel data beyond 3D efficiently. The unique mathematical framework of Gating Networks admits elegant end-to-end optimisation of network parameters with objective functions that address visual quality criteria such as SSIM as well as coding rate and network size. This makes the approach attractive for many image processing tasks. This talk will give an introduction to the novel field of Gating Networks, with particular emphasis on segmentation and compression of N-dimensional pixel data. We will show that these networks allow the design of powerful and disruptive compression algorithms for images, video and light fields that completely depart from existing JPEG- and MPEG-type approaches with blocks, block transforms and motion vectors.
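To illustrate the soft-gated partitioning, here is a toy mixture-of-experts sketch (illustrative only, not the models from the talk): Gaussian gates softly partition the coordinate domain, linear experts model each region, and the reconstruction is the gate-weighted sum of expert outputs, which naturally adapts to edges.

    import numpy as np

    def moe_reconstruct(x, centers, widths, slopes, offsets):
        """x: (N, d) pixel coordinates; K gates and K linear experts."""
        d2 = ((x[:, None, :] - centers[None]) ** 2).sum(-1)    # (N, K)
        logits = -d2 / (2 * widths ** 2)
        gates = np.exp(logits - logits.max(1, keepdims=True))
        gates /= gates.sum(1, keepdims=True)                   # softmax gating
        experts = x @ slopes.T + offsets                       # (N, K)
        return (gates * experts).sum(1)                        # soft-gated output

    K = 4
    grid = np.linspace(0, 1, 32)
    x = np.stack(np.meshgrid(grid, grid), -1).reshape(-1, 2)
    rng = np.random.default_rng(1)
    y = moe_reconstruct(x, rng.random((K, 2)), np.full(K, 0.2),
                        rng.standard_normal((K, 2)), rng.standard_normal(K))

The same construction extends unchanged to N-dimensional coordinates, and the gate and expert parameters are the sparse representation that gets stored or coded.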

Prof. Peter Lambert, IDLab-MEDIA, Ghent University – IMEC, Belgium, Kernel-based representation and coding of the plenoptic function with real-time 6DoF rendering

The plenoptic function is an idealized 5-dimensional function (without time) that describes the intensity and chromaticity of light as observed from every position and in every direction in 3D space. To realize realistic 6DoF experiences for captured natural scenes, a sufficiently large part of this 5D plenoptic function needs to be captured, sampled and stored in order to render high-quality views at arbitrary locations and in arbitrary directions. However, storing the full 5D plenoptic function for any useful viewing area is still highly impractical. Indeed, 6DoF-like experiences are currently typically realized using lower-dimensional image modalities in combination with traditional MPEG-like coding techniques that reconstruct discrete dense pixel grids, complemented with ad-hoc view interpolation approaches. These traditional representations and coding methods, however, do not scale well to higher dimensions and, more importantly, because of their inherent discrete nature, they often need to rely on additional data (such as depth maps or occlusion maps) to accommodate the continuous nature of view rendering in a 6DoF experience. This talk will discuss how kernel-based methods (focusing on Gaussian kernels as used in Steered Mixture-of-Experts, or SMoE) can be used to represent high-dimensional visual information in a continuous way, exhibiting attractive 6DoF functionality such as inherent view synthesis (i.e., view reconstruction and view interpolation are equivalent) and efficient rendering (real-time, pixel-parallel, and resolution-agnostic).
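As a rough illustration of the kernel-based idea (a toy sketch under simplifying assumptions, not the actual SMoE codec), the model below represents a 4D light field as a set of Gaussian kernels with per-kernel colors; rendering any view at any resolution is just evaluating the same continuous function at new coordinates, which is why view reconstruction and view interpolation coincide.

    import numpy as np

    def smoe_eval(q, centers, inv_cov, colors):
        """q: (N, 4) continuous light-field coordinates, e.g. (s, t, u, v)."""
        diff = q[:, None, :] - centers[None]                   # (N, K, 4)
        logits = -0.5 * np.einsum("nkd,kde,nke->nk", diff, inv_cov, diff)
        w = np.exp(logits - logits.max(1, keepdims=True))
        w /= w.sum(1, keepdims=True)                           # soft gating
        return w @ colors                                      # (N, 3) colors

    K = 16
    rng = np.random.default_rng(2)
    centers = rng.random((K, 4))
    inv_cov = np.broadcast_to(np.eye(4) * 50.0, (K, 4, 4))
    colors = rng.random((K, 3))

    # Same model, two different viewpoints: only the query coordinates change.
    uv = rng.random((100, 2))
    view_a = smoe_eval(np.hstack([np.full((100, 2), 0.2), uv]),
                       centers, inv_cov, colors)
    view_b = smoe_eval(np.hstack([np.full((100, 2), 0.8), uv]),
                       centers, inv_cov, colors)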