01 August, 2019
Marco Rovera
Johannes Theodoridis

Attending the 3rd
International Summer School on Deep Learning


(Image by Tim Adams CC BY 2.0)

The 3rd International Summer School on Deep Learning took place from the 22nd to the 26th of July 2019 in Poland’s beautiful capital, Warsaw. With ~1200 participants from all around the world, the 3rd edition of the summer school has grown quite significantly. This long-lasting interest in Deep Learning was also reflected in a huge diversity of backgrounds: ranging from classical computer vision and data science over mobile communications to biomedicine, many major fields seem to have only just started investigating the potential of these methods.

Over one week, 25 lecturers, including three keynote speakers, presented their talks in parallel sessions. With Alex Smola from Amazon, Tomas Mikolov from Facebook and Aaron Courville from MILA, the summer school was kicked off by three well-known representatives of modern deep learning who went straight to the heart of the matter. Starting with an introductory Dive into Deep Learning and Computer Vision, the pace quickly picked up towards more sophisticated topics, including state-of-the-art methods as well as recent advances in Natural Language Processing and Deep Generative Models.

As members of the recently founded Institute for Applied Artificial Intelligence (IAAI), Marco Rovera and Johannes Theodoridis had the opportunity to attend the summer school and to visit Warsaw.

Natural Language Processing

In his course Using Neural Networks for Modeling and Representing Natural Languages, Facebook’s Tomas Mikolov reviewed machine and deep learning techniques applied to Natural Language Processing, focusing in particular on distributed word representations, which constitute the basis of his word2vec algorithm for creating word embeddings. He also presented the more recent fastText library, and the course concluded with a discussion of neural language modeling techniques. Overall, an enlightening overview of the last years of language modeling and machine learning, along with some glimpses of the near future!
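To make the skip-gram idea behind word2vec concrete, here is a minimal sketch (our own toy example, not Mikolov's code) of how (center, context) training pairs are extracted from a corpus; a shallow network trained to predict the context word from the center word then yields the embeddings as its hidden-layer weights:

```python
# Minimal sketch of skip-gram training-pair extraction: each word is paired
# with every word inside a symmetric context window around it.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

corpus = "the cat sat on the mat".split()
pairs = skipgram_pairs(corpus, window=1)
# e.g. ('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ...
```

Words that appear in similar contexts end up producing similar training pairs, which is exactly why their learned vectors cluster together.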

Deep Generative Models

Aaron Courville from the Montreal Institute for Learning Algorithms, co-author of the well-known Deep Learning book, presented an extensive overview of Deep Generative Models. These kinds of models, and in particular Generative Adversarial Networks, have attracted quite some attention lately, even beyond the scientific community. In general, generative models take training samples from some data distribution and learn a model that represents that distribution. By leveraging deep neural networks, such models have improved the state of the art for a diverse range of tasks such as speech synthesis, image denoising, super-resolution and sample generation. After providing a very helpful taxonomy, Courville contrasted the two basic choices of how to model data in this context. The first part covered autoregressive models such as PixelCNN, WaveNet and Normalizing Flows (which all model data in a sequential manner). The second and third parts covered latent variable models (which assume some hidden variables that cause the visible distribution), with a focus on VAEs and GANs. He then discussed in depth the differences, theoretical hypotheses and motivations, as well as some sophisticated hybrid approaches combining the two model families. After presenting the very recent works StyleGAN, SPADE and HoloGAN, he concluded by sharing his excitement for this stream of research with the open question:
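One small but essential ingredient of VAEs is the reparameterization trick: sampling the latent variable z ~ N(mu, sigma²) is rewritten as a deterministic function of the parameters plus parameter-free noise, so gradients can flow through the sampling step. A minimal numpy sketch (our own illustration, not taken from the lecture):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    # z ~ N(mu, sigma^2) expressed as z = mu + sigma * eps with eps ~ N(0, I):
    # the randomness is moved into eps, so mu and log_var stay differentiable.
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))        # a batch of 4 two-dimensional latent means
log_var = np.zeros((4, 2))   # log-variance 0 -> sigma = 1
z = reparameterize(mu, log_var, rng)
```

In an actual VAE, mu and log_var would come from the encoder network and z would be fed to the decoder; here they are fixed toy values.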

“What other information can we extract (from the world) with unsupervised learning methods?”

In many regards, very interesting work from one of the forefronts of deep learning research.

Towards a unified Theory of Deep Learning

René Vidal from Johns Hopkins University presented recent theoretical work on the Mathematics of Deep Learning. On his path towards a unified theory, he reminded us of the complexity of this endeavor: network architecture, optimization and generalization / regularization are closely interrelated, and it is hard to draw final conclusions by looking at only one part at a time. Nevertheless, he presented some early results (for small networks) and a roadmap towards a more principled way of designing network architectures and regularizers. For instance, one key property (among others) that facilitates optimization requires both the regularizer and the network architecture to be positively homogeneous, which, interestingly, does not hold for the sigmoid activation function. Another key takeaway was the insight that implicit regularization with dropout is actually equivalent to an explicit regularization with a product-of-weights penalty, and that dropout in fact changes the objective function of Stochastic Gradient Descent to a stochastic objective. Very exciting stuff.
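Positive homogeneity (of degree one) simply means f(a·x) = a·f(x) for every a > 0. ReLU satisfies this, sigmoid does not, which can be checked numerically in a few lines (our own sanity check, not from the lecture):

```python
import numpy as np

relu = lambda x: np.maximum(0.0, x)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
a = 2.5  # any positive scaling factor

# ReLU: max(0, a*x) == a * max(0, x) for a > 0, so the check passes.
relu_homogeneous = np.allclose(relu(a * x), a * relu(x))
# Sigmoid saturates, so scaling the input is not the same as scaling the output.
sigmoid_homogeneous = np.allclose(sigmoid(a * x), a * sigmoid(x))
```

At x = 0 alone the difference is already visible: sigmoid(2.5 · 0) = 0.5, while 2.5 · sigmoid(0) = 1.25.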

High-stake applications and the need for Explainable Artificial Intelligence

Another very interesting stream of talks reflected the aforementioned diversity of backgrounds. Three explicitly application-oriented research talks on Deep Learning for Biomedicine (El Naqa), Healthcare (van der Schaar) and Neuroscience (Thirion) underpinned the growing interest in those fields. However, deploying deep learning algorithms, and in particular artificial neural networks, in such high-stake applications raises some serious questions about the reliability and explainability of these black-box models.

Taking this up, two lectures on Explainable Artificial Intelligence and Adversarial Machine Learning (Roli) subsequently discussed those shortcomings in depth and pointed out some directions for future research. The course Explainable Artificial Intelligence, held by Prof. Sargur Srihari from the University at Buffalo, dealt with one of the most current and crucial problems in Artificial Intelligence, especially where deep neural architectures are involved: the explainability of results. Besides addressing fundamental philosophical and ethical problems connected to the notion of explainability in AI, the course went through the different paradigms and models used so far in Artificial Intelligence (rule-based, traditional machine learning, deep neural architectures), analyzing for each its potential in terms of both explainability and performance. Techniques, metrics and evaluation strategies were presented for performing and evaluating explainability in different technological settings and for different tasks.
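To give a flavor of what such model-agnostic explanation techniques look like, here is a sketch of permutation importance (our own example, not necessarily one of the techniques covered in the course): shuffle one feature at a time and measure how much the model's error grows.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
# A stand-in "trained model": depends strongly on feature 0, weakly on
# feature 2, and not at all on feature 1.
predict = lambda X: 3.0 * X[:, 0] + 0.1 * X[:, 2]
y = predict(X)  # noise-free targets keep the toy example simple

mse = lambda Xq: np.mean((predict(Xq) - y) ** 2)
baseline = mse(X)

importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # break the feature-target link
    importance.append(mse(Xp) - baseline)
```

Features the model relies on produce a large error increase when shuffled; unused features produce none, which is exactly the kind of signal practitioners inspect when auditing a black-box model.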

A “must” course on an exciting and multidisciplinary topic that is going to become a key crossroads in the next steps of Artificial Intelligence.

Grounding the current over-excitement around deep learning in a realistic and practical perspective seems necessary and vital for the whole field. When applying deep learning to high-stake applications such as healthcare or self-driving cars, explainability in particular becomes the crucial factor for safety and acceptance. With all the potential and opportunity of these applications in mind, we expect much more to come from this stream of research in the future.

Getting the bigger picture - Machine Learning

Finally, the summer school was complemented by a fair amount of talks which were not focused exclusively on deep learning but rather on machine learning in general.

In addition to these explicitly non-deep-learning sessions, many lecturers included excellent recaps, historical notes and comparative discussions of more traditional machine learning methods in their talks. By covering a wide range of models such as decision trees, kernel machines and probabilistic models and putting them into context, we were able to gain a much deeper understanding of why and how certain research directions evolved over time. Equipped with this broader perspective, we also had the opportunity to finally see and understand the connections between the different subfields. Personally, a very satisfying experience and a great additional takeaway from the summer school.

Clustering Approaches

Maria Florina Balcan from Carnegie Mellon University held the first keynote and talked about open challenges in Data-Driven Clustering. She presented work on a lifelong transfer clustering approach which provides formal guarantees for algorithms that adaptively learn to cluster.

Connecting Deep Learning & Kernel Machines

Johan Suykens from KU Leuven reminded us that “deep” does not necessarily have to refer to the number of layers in a deep neural network. He presented recent work on the so-called Restricted Kernel Machine representation for Least Squares Support Vector Machine models and showed that it is possible to obtain a representation similar to that of Restricted Boltzmann Machines (hence the name). Stacking these models into a deep architecture then yields a Deep Restricted Kernel Machine, which requires us to extend our notion of model depth and to distinguish between depth in a layer sense and depth in a level sense.
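For readers unfamiliar with the underlying model: a classical Least Squares SVM (the building block the Restricted Kernel Machine representation starts from) replaces the SVM's inequality constraints with equalities, so training reduces to solving one linear system. A hedged numpy sketch of LS-SVM regression follows; this is the textbook formulation, not the RKM representation from the talk, and the hyperparameter values are our own choices:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between row-wise sample sets A and B.
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def lssvm_fit(X, y, C=10.0, gamma=1.0):
    n = len(y)
    K = rbf(X, X, gamma)
    # KKT system of LS-SVM regression:
    # [[0, 1^T], [1, K + I/C]] @ [b, alpha] = [0, y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    # Prediction: f(x) = sum_i alpha_i K(x, x_i) + b
    return lambda Xq: rbf(Xq, X, gamma) @ alpha + b

X = np.linspace(0.0, 1.0, 20)[:, None]
y = np.sin(2.0 * np.pi * X[:, 0])
f = lssvm_fit(X, y, C=100.0, gamma=10.0)
```

The appeal is that, unlike a standard SVM, no quadratic program is needed: one dense linear solve gives the dual weights alpha and the bias b.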

Causal Models for Making Sense of Data

Vasant Honavar from Pennsylvania State University gave an enlightening talk on one of the, if not the, hottest topics in AI right now: causality. After some introductory historical notes on the field, he presented work from his early collaboration with none other than Judea Pearl and mapped out precisely the inherent limitations of purely probabilistic models. By introducing Pearl’s do-calculus, Honavar showed how to control for confounders in settings with limited or even missing data (e.g. settings where randomized controlled trials are unethical or simply not possible). While probabilistic models are limited to simple associations on observational data like,

“How would seeing X change my belief in Y?”

the Do-Calculus enables us to climb the ladder of causation and ask interventional questions such as,

“What if I do X?”

or even counterfactual ones such as,

“Was it X that caused Y? What if I had acted differently?”

He concluded by comparing today’s field of causal inference (and in particular the availability of programming tools) to the field of deep learning 20 years ago. Considering what automatic differentiation frameworks did for deep learning, we anticipate a very interesting few years ahead for causal models.
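The gap between "seeing" and "doing" can be made concrete with a toy example of the backdoor adjustment (our own numbers, not from the talk): a confounder Z influences both treatment X and outcome Y, so the observational P(Y=1 | X=1) differs from the interventional P(Y=1 | do(X=1)) = Σ_z P(Y=1 | X=1, z) P(z).

```python
# A three-variable discrete causal model: Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x_given_z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | X = x, Z = z)
                (1, 0): 0.3, (1, 1): 0.7}

# Observational ("seeing"): condition on X = 1, so Z is reweighted by Bayes.
p_x1 = sum(p_x_given_z[z] * p_z[z] for z in p_z)
p_z_given_x1 = {z: p_x_given_z[z] * p_z[z] / p_x1 for z in p_z}
observational = sum(p_y_given_xz[(1, z)] * p_z_given_x1[z] for z in p_z)

# Interventional ("doing"): do(X = 1) cuts the Z -> X edge, so Z keeps
# its marginal distribution -- this is the backdoor adjustment formula.
interventional = sum(p_y_given_xz[(1, z)] * p_z[z] for z in p_z)
```

Here the observational estimate (0.62) overstates the causal effect (0.50) because individuals with Z = 1 are both more likely to receive the treatment and more likely to have a positive outcome anyway.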

Attending the 3rd International Summer School on Deep Learning

In between all those sessions, many coffee breaks and a longer lunch break left enough room to mingle and socialize with the lecturers and other students. While ~1200 participants seemed daunting at first, it allowed us to meet many new people, learn about their backgrounds and listen to their perspectives. With attendees from basically all over Europe and countries as diverse as the US, Korea, Jordan and Sri Lanka (just to name a few), the International Summer School on Deep Learning was really worth its name. All in all, we had a great and very inspirational week in the beautiful city of Warsaw.

About the authors

Marco is currently a postdoctoral researcher at the Institute for Applied Artificial Intelligence of the Hochschule der Medien in Stuttgart. He obtained a PhD in Computer Science at the University of Torino, Italy, defending his thesis Event-based Access to Historical Textual Collections (GitHub) in July 2019. At the IAAI he collaborates on the Judaica Link project, led by Prof. Kai Eckert. His main research areas are Computational Linguistics and Information Extraction, with an application focus on the Digital Humanities. His current work concerns the use of (deep) machine learning for event extraction from text.

Johannes is a first-year PhD student at the Computer Science department of the University of Tübingen (supervisor: Andreas Schilling) and the Institute for Applied Artificial Intelligence at Stuttgart Media University (supervisor: Johannes Maucher). His research interest concerns the question of how to bring deep learning and symbolic AI closer together. In a bigger context, this research is motivated by the idea of using machine learning for visual content generation and aims to integrate AI algorithms into creative end-user applications such as Photoshop or Illustrator.