Section: Research Program
Category-level object and scene recognition
The objective in this core part of our research is to learn and recognize, quickly and accurately, thousands of visual categories, including materials, objects, scenes, and broad classes of temporal events, such as patterns of human activity at picnics, in conversations, etc. The current paradigm in the vision community is to model and learn one object category (read: one 2D aspect) at a time. To achieve our goal, we have to break away from this paradigm and develop models that account for the tremendous variability in object and scene appearance due to texture, material, viewpoint, and illumination changes within each object category, as well as the complex and evolving relationships between scene elements during the course of normal human activities.
Our current work, outlined in detail in Section 7.2, has focused on: (i) learning object representations in a weakly supervised manner using convolutional neural networks [14], (ii) localizing objects and their parts in images and videos with minimal supervision [8], [11], (iii) discovering and analyzing architectural style elements from huge collections of street-level imagery [13], and (iv) developing new approaches to visual correspondence and scene flow using multi-scale region proposals and features [22].