Saturday – September 6, 2014

T1 – Understanding the In-Camera Image Processing Pipeline for Computer Vision

Duration: Morning
Organizers: Michael Brown, Seon Joo Kim
Website: [Link]

Image processing and computer vision algorithms often treat a camera as a light-measurement device, where pixel intensities represent meaningful physical measurements of the imaged scene. However, modern digital cameras are anything but light-measuring devices: they apply a wide range of on-board processing, including noise reduction, white balance, and various color rendering options (e.g. landscape, portrait, vivid mode). This on-board processing is often how camera manufacturers distinguish themselves from competitors, with the result that two different cameras can produce noticeably different output images (sRGB) for the same scene. This raises the question of whether meaningful physical values can be recovered from camera outputs. In this tutorial we give an overview of the camera imaging pipeline and discuss various methods for reversing this processing to obtain meaningful physical values from digital photographs.
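To make one pipeline stage concrete: white balance is commonly modeled as a per-channel (von Kries) gain correction. The sketch below is a minimal NumPy illustration of that model, paired with the classic gray-world illuminant estimate; the function names are our own and it is not the tutorial's material.

```python
import numpy as np

def white_balance(raw, illuminant):
    """Von Kries-style white balance: divide each channel by the
    estimated illuminant so a neutral surface maps to gray.
    `raw` is an HxWx3 linear-RGB image, `illuminant` a 3-vector."""
    gains = 1.0 / np.asarray(illuminant, dtype=np.float64)
    gains /= gains[1]  # normalize so the green channel is unchanged
    return np.clip(raw * gains, 0.0, 1.0)

def gray_world(raw):
    """Gray-world estimate: assume the average scene color is achromatic,
    so the mean RGB value is proportional to the illuminant color."""
    return raw.reshape(-1, 3).mean(axis=0)
```

Applying `white_balance(img, gray_world(img))` maps a uniformly tinted scene back to neutral gray; real in-camera pipelines add further nonlinear color rendering on top of this linear step.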

T2 – Capturing 3D Deformable Models from the Real World

Duration: Morning
Organizers: Kiran Varanasi, Edilson de Aguiar
Website: [Link]

Computer vision technologies are yet to make a great impact in performance-critical areas such as graphics production for the entertainment industry and bio-mechanical modelling for medicine and sports. For these purposes, it is necessary to build accurate and editable 3D deformation models. Computer graphics has traditionally approached this requirement from the other side, using detailed hand-crafted models suited to the specific purpose. But data-driven deformation models, inspired partly by advances in computer vision, are becoming increasingly popular due to their greater realism. With relatively cheap consumer-grade capture technology, robust 3D deformation models can now be built from "big data" collections of 3D deformations. This is an exciting opportunity for computer vision researchers to contribute to several new real-world applications.

Robust deformable models are also a powerful tool for solving challenging computer vision problems, as they provide more accurate priors than can be obtained from the images themselves. However, knowledge of 3D surface deformation methods and 3D geometry processing is not as widespread in the computer vision community as it is in computer graphics. In this tutorial, we aim to bridge this gap.
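One common family of data-driven deformation models is a linear (PCA) subspace learned from example shapes, as in morphable models. The sketch below is our own minimal NumPy illustration of that idea, not code from the tutorial: example poses of a mesh are flattened to vectors, and new shapes are synthesized as the mean plus a weighted sum of principal deformation modes.

```python
import numpy as np

def learn_deformation_model(meshes, n_modes):
    """Learn a linear (PCA) deformation model from example meshes.
    `meshes` is an (N, 3V) array: N deformed poses of a mesh with V
    vertices, each flattened to a 3V-vector. Returns the mean shape
    and the top `n_modes` deformation modes."""
    mean = meshes.mean(axis=0)
    centered = meshes - mean
    # SVD of the centered data gives the principal deformation modes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_modes]

def deform(mean, modes, coeffs):
    """Synthesize a new shape as the mean plus a weighted sum of modes."""
    return mean + coeffs @ modes
```

Editing the low-dimensional `coeffs` vector deforms the whole mesh plausibly, which is what makes such models both "accurate and editable" in the sense described above.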

T3 – Theory and Methods of Lightfield Photography

Duration: Afternoon
Organizers: Todor Georgiev, Andrew Lumsdaine
Website: [Link]

Computational photography focuses on capturing and processing discrete representations of all the light rays in the 3D space of a scene. Compared to conventional photography, which captures 2D images, computational photography captures the entire 4D "lightfield," i.e., the full 4D radiance. To multiplex the 4D radiance onto conventional 2D sensors, lightfield photography demands sophisticated optics and imaging technology. Conversely, 2D image creation amounts to forming 2D projections of the 4D radiance.

This course presents lightfield analysis in a rigorous, yet accessible, mathematical way, which often leads to surprisingly direct solutions. The mathematical foundations will be used to develop computational methods for lightfield processing and image rendering, including digital refocusing and perspective viewing. While emphasizing theoretical understanding, we also explain approaches and engineering solutions to practical problems in computational photography.
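The simplest form of digital refocusing is shift-and-add: treat the 4D lightfield as a grid of sub-aperture views, shift each view in proportion to its aperture offset, and average. The NumPy sketch below is our own illustration of that projection (with integer-pixel shifts for simplicity), not the course's code.

```python
import numpy as np

def refocus(lightfield, alpha):
    """Synthetic refocusing by shift-and-add.
    `lightfield` has shape (U, V, H, W): a U x V grid of sub-aperture
    views. Each view is shifted in proportion to its aperture offset,
    scaled by the refocus parameter `alpha`, and the views are averaged,
    which projects the 4D radiance onto a chosen 2D focal plane."""
    U, V, H, W = lightfield.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            du = int(round(alpha * (u - cu)))
            dv = int(round(alpha * (v - cv)))
            # np.roll approximates the sub-pixel shift with an integer shift.
            out += np.roll(lightfield[u, v], shift=(du, dv), axis=(0, 1))
    return out / (U * V)
```

Varying `alpha` sweeps the synthetic focal plane through the scene: scene points whose disparity matches `alpha` align across views and appear sharp, while others are averaged into blur.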

T4 – Higher Order Models and Inference Approaches in Computer Vision

Duration: Full day
Organizers: Vibhav Vineet, Philipp Kraehenbuehl, Lubor Ladicky, Pushmeet Kohli, Phil Torr
Website: [Link]

Probabilistic models such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) have long formed a basis for solving challenging assignment problems encountered in understanding images and scenes. Computational concerns have traditionally limited these models to encoding only unary and/or pairwise terms. Although such models have produced good results, recent studies have shown the importance of incorporating higher-order relations between scene elements. Examples include label consistency over large regions, contextual information, topological constraints, connectivity in 3D, and symmetry priors, all of which can be formulated in MRF/CRF frameworks. The goal is to estimate quantities such as the maximum a posteriori (MAP) solution and marginal distributions to enable learning and inference in these models. Arguably the most popular approaches for solving these problems are graph cuts and filter-based mean-field methods. We will delve deep into the analysis, properties, and comparison of these approaches.
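As a small, self-contained illustration of one of the inference families mentioned above, the sketch below runs naive mean-field updates on a 4-connected grid CRF with unary terms and Potts pairwise terms. It is our own minimal NumPy example (not the dense-CRF filtering variant covered in the tutorial): each update replaces a pixel's label distribution with a softmax over its unary cost plus the expected disagreement penalty from its neighbors.

```python
import numpy as np

def softmax(logits):
    """Per-pixel softmax over the last (label) axis."""
    e = np.exp(logits - logits.max(axis=2, keepdims=True))
    return e / e.sum(axis=2, keepdims=True)

def mean_field(unary, weight, n_iters=20):
    """Mean-field inference for a grid CRF with Potts pairwise terms.
    `unary` has shape (H, W, L): per-pixel costs for L labels.
    `weight` is the Potts penalty for each disagreeing neighbor pair."""
    q = softmax(-unary)
    for _ in range(n_iters):
        # Sum of neighboring label distributions (4-connected grid).
        msg = np.zeros_like(q)
        msg[1:, :] += q[:-1, :]
        msg[:-1, :] += q[1:, :]
        msg[:, 1:] += q[:, :-1]
        msg[:, :-1] += q[:, 1:]
        # Expected Potts cost of label l: weight * sum_j (1 - q_j[l]).
        pairwise = weight * (msg.sum(axis=2, keepdims=True) - msg)
        q = softmax(-(unary + pairwise))
    return q.argmax(axis=2)
```

With a smoothing weight of 2, a single pixel whose unary term favors the "wrong" label is voted down by its neighbors, showing how even pairwise terms enforce local label consistency; higher-order terms extend this to large regions and global constraints.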

Sunday – September 7, 2014

T5 – DIY Deep Learning for Vision: a Hands-On Tutorial

Duration: Morning
Organizers: Evan Shelhamer, Jeff Donahue, Yangqing Jia, Ross Girshick
Website: [Link]

This is a hands-on tutorial intended to present state-of-the-art deep learning models and equip vision researchers with the tools and know-how to incorporate deep learning into their work. Deep learning models and deep features have recently achieved strong results in classification and recognition, detection, and segmentation, but a common framework and shared models are needed to advance further work and reduce the barrier to entry.

To this end we present Caffe (Convolutional Architecture for Fast Feature Embedding), a framework offering an open-source library, public reference models, and worked examples for deep learning in vision. Demos will be given live, and the audience will be able to follow along with the examples (provided they follow the pre-tutorial installation instructions).

T6 – Robust Optimization Techniques in Computer Vision

Duration: Morning
Organizers: Olof Enqvist, Fredrik Kahl, Richard Hartley
Website: [Link]

Many important problems in computer vision, such as structure from motion and image registration, involve model estimation in the presence of a significant number of outliers. Because of the outliers, simple estimation techniques such as least squares perform very poorly. To deal with this issue, vision researchers have developed a number of techniques that are robust to outliers, such as the Hough transform and RANSAC (random sample consensus). These methods will be analyzed with respect to statistical modeling, worst-case and average execution times, and the trade-off between the numbers of outliers and inliers that can be handled. Beyond these classical techniques, we will also describe recent advances in robust model estimation, including sampling-based techniques with guaranteed optimality for low-dimensional problems and optimization of semi-robust norms for high-dimensional problems. We will see how to solve low-dimensional estimation problems with over 99% outliers in a few seconds, as well as how to detect outliers in structure-from-motion problems with thousands of variables.
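The RANSAC idea mentioned above is easy to state in code: repeatedly fit a model to a random minimal sample, score it by its consensus set, and keep the best. The sketch below is a minimal NumPy version for 2D line fitting (a toy stand-in for the geometric models used in structure from motion); the function name and parameters are our own.

```python
import numpy as np

def ransac_line(points, n_iters=500, threshold=0.05, seed=0):
    """RANSAC line fitting: fit a line to a random minimal sample
    (2 points), count inliers within `threshold`, keep the model with
    the largest consensus set, and refit it by least squares.
    `points` is an (N, 2) array; returns (a, b) for y = a*x + b."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # skip degenerate (vertical) minimal samples
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refit on the consensus set: least squares is fine
    # once the outliers have been removed.
    x, y = points[best_inliers].T
    return np.polyfit(x, y, 1)
```

The contrast with plain least squares is the point: a few gross outliers can move the least-squares fit arbitrarily far, while the consensus criterion simply ignores them.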

T7 – Domain Adaptation and Transfer Learning

Duration: Afternoon
Organizers: Tatiana Tommasi, Francesco Orabona
Website: [Link]

A large part of the computer vision literature focuses on obtaining impressive results on large datasets under the main assumption that training and test samples are drawn from the same distribution. However, in several applications this assumption is grossly violated. Think about using algorithms trained on clean Amazon images to annotate objects acquired with a low-resolution cellphone camera, or using an organ detection and segmentation tool trained on CT images to process MRI scans. Other challenging tasks appear across object classes: given the models of a giraffe and a zebra or some of their image patches, can we use them to detect and recognize an okapi?

Despite the wide availability of principled learning methods, it has been shown that they often fail to generalize across domains, preventing reliable automatic labeling and forcing a return to error-prone and time-consuming human annotation for new images. Domain adaptation and transfer learning tackle these problems by proposing methods that bridge the gap between the source training domain and different but related target test domains.
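A minimal baseline conveys the flavor of such methods (this is our own generic illustration, not a method from the tutorial): re-standardize source features so that each dimension matches the target domain's first- and second-order statistics, then train on the aligned source data and test on the target.

```python
import numpy as np

def align_statistics(source, target):
    """A minimal domain-adaptation baseline: shift and rescale each
    source feature dimension to match the target domain's mean and
    standard deviation. A classifier trained on the aligned source
    features is then applied directly to the target domain.
    `source`, `target`: (N, D) and (M, D) feature matrices."""
    s_mu, s_std = source.mean(axis=0), source.std(axis=0) + 1e-8
    t_mu, t_std = target.mean(axis=0), target.std(axis=0)
    return (source - s_mu) / s_std * t_std + t_mu
```

This only corrects simple per-dimension distribution shift; the methods covered in the tutorial go further, aligning richer structure between source and target domains.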

T8 – 3D Scene Understanding

Duration: Afternoon
Organizers: David Fouhey, Abhinav Gupta, Derek Hoiem, Martial Hebert
Website: [Link]

What does it mean to understand an image? The bounding-box or segment-level understanding produced by many current computer vision systems tells us little about where objects are located in 3D and how agents like humans could interact with them. However, recent work has focused on obtaining a complementary, geometric understanding of the scene in terms of the 3D volumes and surfaces that compose it and their interactions. This representation enables reasoning about objects as they exist in a 3D world, rather than simply in the image plane, and has been demonstrated to have a myriad of applications for object detection, human-centric understanding, and graphics. Additionally, recent dataset collection efforts with depth cameras have made large-scale learning of these geometric representations possible and have opened up exciting avenues for research on large-scale learning with RGB-D datasets.

The tutorial organizers will summarize the state of the art in 3D scene understanding in a half-day tutorial. Participants will learn the fundamentals of 3D scene understanding with the aim of enabling its application to traditional 2D image tasks as well as research on the topic itself.