With the emergence of new technologies such as Kinect, Google Glass, and autonomous driving, computer vision is about to play a key role in daily human life. The massive presence of sensors in a wide range of contexts demands that image understanding expand its scope to answer queries beyond "what is where". The missions of this workshop are therefore to (a) identify the key domains in the new scope; (b) recognize the computational challenges in these domains; and (c) provide promising frameworks for solving these challenges.
Here we propose Functionality, Physics, Intentionality and Causality (FPIC) as four key domains beyond "what is where":
- What can you do with the tree trunk?
- How likely is the stone balancing?
- Why does the guy kick the door?
- Who knocked down the domino?
The combination of these largely orthogonal dimensions can span a large space of image understanding.
Despite their apparent differences, these domains are connected in ways that are theoretically important: (a) they usually don't project onto explicit visual features; (b) existing computer vision algorithms are either not competent in these domains or (in most cases) not applicable at all; and (c) human vision is nevertheless highly efficient in these domains. Studying FPIC should therefore significantly narrow the gap between computer vision and human vision, not only in visual recognition but also in reasoning about visual scenes with common-sense knowledge.
The introduction of FPIC will advance a vision system in three respects: (a) transfer learning. As a higher-level representation, FPIC tends to be globally invariant across the entire human living space, so learning in one type of scene can be transferred to novel situations; (b) small-sample learning. Learning FPIC, which is consistent and noise-free, is possible even without a wealth of previous experience or "big data"; and (c) bidirectional inference. Inference with FPIC requires combining top-down abstract knowledge with bottom-up visual patterns, so the two processes can boost each other.
Several key topics are:
- Representation of visual structure and commonsense knowledge
- Recognition of object function / affordances
- Physically grounded scene interpretation
- 3D scene acquisition, modeling and reconstruction
- Human-object-scene interaction
- Physically plausible pose / action modeling
- Reasoning about goals and intents of the agents in the scenes
- Causal models in vision
- Abstract knowledge learning and transfer
- Top-down and bottom-up inference algorithms
- Related topics in cognitive science and visual perception
In conjunction with CVPR 2014, the first FPIC workshop will bring together researchers from different communities within computer vision, computer graphics, robotics and cognitive science to advance computer vision systems that go beyond labeling "what is where" in an image and build a sophisticated understanding of its Functionality, Physics, Intentionality and Causality (FPIC). In effect, these abilities allow an observer to answer an almost limitless range of questions about an image using a finite, general-purpose model. We also want to emphasize that FPIC is not meant to be an exclusive set of image understanding problems: we welcome any scholar who shares the same perspective but works on different problems.
There are two submission tracks: Full Papers and Extended Abstracts.
Full papers should describe unpublished, original work on the above or closely related topics. In submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another conference or workshop before or during the review period. Submitted papers will receive double-blind reviews from the program committee.
- Any submission which violates the double-blind policy or the dual-submission policy will be rejected without review.
- All papers must be written in English and submitted in PDF format. Each paper must be 6 to 8 pages long.
- Please submit the papers via the full-paper track of the CMT submission system before April 5, 2014 (extended).
We also accept extended abstracts of ongoing or already published work. Authors can take this as a good opportunity to present their work to the right audience at the poster session.
- All extended abstracts must be written in English and submitted in PDF format. Each extended abstract has a maximum length of 2 pages.
- Please submit extended abstracts via the extended-abstract track of the CMT submission system before May 5, 2014.
Submissions must follow the CVPR'2014 formatting instructions: