Workshop on representation learning with limited images
International Conference on Computer Vision 2023
The International Conference on Computer Vision (ICCV) 2023, one of the top computer vision conferences, was held at the Paris Convention Center in France. Around 7000 researchers from around the world gathered over five days to discuss the newest trends and state-of-the-art research in the field. One of them was our PhD student Patrick Takenaka, who presented the paper “Guiding Video Prediction with Explicit Procedural Knowledge” at the workshop “Representation learning with very limited images: the potential of self-, synthetic- and formula-supervision”.
Companies such as Google, Sony, Baidu, and Meta, among others, were also present and showcased their latest technology. The keynote speakers were influential researchers in the field: Dorsa Sadigh, an assistant professor at Stanford University, highlighted the potential of current natural language foundation models by applying them to novel use cases, and Pushmeet Kohli, VP of Research at Google DeepMind, discussed how current AI models can advance science in other fields.
We propose a general way to integrate procedural knowledge of a domain into deep learning models. We apply it to the case of video prediction, building on top of object-centric deep models, and show that this leads to better performance than using data-driven models alone. We develop an architecture that facilitates latent space disentanglement in order to use the integrated procedural knowledge, and establish a setup that allows the model to learn the procedural interface in the latent space through the downstream task of video prediction. We contrast the performance with a state-of-the-art data-driven approach and show that problems where purely data-driven approaches struggle can be handled by using knowledge about the domain, providing an alternative to simply collecting more data.
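The core idea of combining a disentangled latent state with an explicit procedural model can be illustrated with a minimal sketch. This is not the authors' actual architecture: the time step, the ballistic dynamics, and the split of the latent vector into position, velocity, and a residual part are all assumptions made purely for illustration. The procedural module advances the interpretable slots with known physics, while a (here placeholder) learned component handles the remaining latent dimensions.

```python
import numpy as np

# Illustrative sketch only -- not the paper's actual architecture.
# Assumption: the latent state is disentangled so that its first slots
# hold interpretable quantities (here: 2D position and velocity) that a
# procedural model can act on; the rest stays purely data-driven.

DT = 0.1          # simulation time step (assumed)
GRAVITY = -9.81   # explicit procedural knowledge: ballistic motion

def procedural_step(pos, vel):
    """Advance position/velocity with known physics (Euler step)."""
    new_vel = vel + np.array([0.0, GRAVITY]) * DT
    new_pos = pos + new_vel * DT
    return new_pos, new_vel

def learned_residual(latent):
    """Stand-in for a trained network modeling the remaining latent."""
    return 0.9 * latent  # placeholder dynamics, not a real model

def predict_next(state):
    """One prediction step: split the latent, apply knowledge, recombine."""
    pos, vel, rest = state[:2], state[2:4], state[4:]
    pos, vel = procedural_step(pos, vel)
    return np.concatenate([pos, vel, learned_residual(rest)])

# Roll out a short predicted trajectory from an initial latent state.
state = np.array([0.0, 10.0, 5.0, 0.0, 1.0, -1.0])
trajectory = [state]
for _ in range(3):
    trajectory.append(predict_next(trajectory[-1]))
```

Because the procedural slots follow exact physics regardless of how much training data the residual network has seen, such a hybrid can generalize where a purely data-driven predictor would have to infer the dynamics from examples alone.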
Authors: Patrick Takenaka, Johannes Maucher, Marco F. Huber