A key goal of computer vision is to interpret complex visual scenes by recognizing visual concepts, localizing them, and understanding their interactions within the scene. To achieve this, we need powerful visual learning techniques to acquire rich models capturing the diversity of the visual world. In this talk I will present a few recent advances in learning visual localization models with minimal human supervision. Reducing supervision is necessary to scale to a large number of concepts and many training samples. I will conclude with an outlook on extending these ideas into a lifelong learning paradigm, where the computer continuously learns new models by building on all the knowledge it acquired in the past. This is the research agenda of the new group I am building at Google Zurich.
More information at http://calvin.inf.ed.ac.uk