The goal of computer vision, as articulated by David Marr, is to compute what is where by looking. This paradigm successfully guided the geometry-based approaches of the 1980s and 1990s and the appearance-based methods of the past 20 years. Despite remarkable progress in recognizing objects, actions, and scenes through large data sets, better-designed features, and machine learning techniques, machine performance on challenging vision tasks is still far from satisfactory. To move beyond this, we must look at the bigger picture and model and reason about the missing dimensions.
Here we identify functionality, physics, intentionality, and causality (FPIC) as four key domains beyond “what is where”. They are often unobservable, but they play a crucial role in understanding images and videos.
In conjunction with CVPR 2019, the 5th Vision Meets Cognition (VMC) workshop will bring together researchers from computer vision, graphics, robotics, cognitive science, and developmental psychology to advance computer vision systems toward understanding FPIC from visual data. At the same time, we want to emphasize that FPIC is not meant to be an exclusive set of image and scene understanding problems. We welcome any scholars who share this perspective but are working on different problems.
|Jiajun Wu|Yixin Zhu|Siyuan Qi|Yunzhu Li|
|Zhoutong Zhang|Chenfanfu Jiang|
|Song-Chun Zhu|Joshua Tenenbaum|