
Date
11.06.2024 | 15:00 - 16:00 (CET)

AI-Cafe presents: The Information Bottleneck Principle for Analysis and Design of Neural Classifiers

Bernhard C. Geiger (Key Researcher at Know-Center GmbH, Austria)

The information bottleneck principle, a mathematical formulation of Occam's Razor, aims to create latent representations that are sufficient for a task and maximally compressed – a minimal sufficient statistic. In this talk, we first critically reflect on the application of the information bottleneck principle in deep learning, addressing the question of whether and how compression can be connected to generalization performance. We discuss theoretical, experimental, and engineering evidence in the form of non-vacuous generalization bounds, information plane analyses, and neural classifiers successfully trained using the information bottleneck principle. Taken together, these three perspectives suggest that compressed representations help improve generalization and robustness.
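
For reference (not part of the original abstract), the standard information bottleneck objective seeks a representation T of the input X that remains predictive of the target Y while being compressed:

    \min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y), \qquad \beta > 0,

where I(·;·) denotes mutual information and β controls the trade-off between compression (small I(X;T)) and sufficiency (large I(T;Y)).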
In the second, shorter part of the talk, we argue that the (variational) approaches used to implement the intractable information bottleneck objective can also be successfully used to implement other information-theoretic objectives. We make this concrete with the example of invariant representation learning for fair classification. We show that the resulting method has interesting and desirable properties, suggesting that information-theoretic objectives can be useful ingredients for deep learning.
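
To illustrate how such variational objectives are typically implemented in practice, the sketch below shows a deep variational information bottleneck loss in PyTorch. This is a minimal, generic example under common assumptions (a Gaussian encoder, a standard-normal prior, and a softmax classifier head); the names VIBClassifier and vib_loss are hypothetical and this is not the specific method presented in the talk.

    # Minimal sketch of a variational information bottleneck (VIB) loss in PyTorch.
    # Assumptions: Gaussian encoder q(t|x), standard-normal prior r(t), softmax decoder.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VIBClassifier(nn.Module):
        def __init__(self, in_dim, latent_dim, num_classes):
            super().__init__()
            self.encoder = nn.Linear(in_dim, 2 * latent_dim)  # outputs mean and log-variance
            self.decoder = nn.Linear(latent_dim, num_classes)

        def forward(self, x):
            mu, logvar = self.encoder(x).chunk(2, dim=-1)
            # Reparameterization trick: sample t ~ q(t|x)
            t = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            return self.decoder(t), mu, logvar

    def vib_loss(logits, targets, mu, logvar, beta=1e-3):
        # Cross-entropy serves as the "sufficiency" term (a bound related to I(T;Y));
        # the KL term to the standard-normal prior upper-bounds the "compression" term I(X;T).
        ce = F.cross_entropy(logits, targets)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()
        return ce + beta * kl

A training loop would simply minimize vib_loss over mini-batches; beta plays the role of the trade-off parameter in the information bottleneck objective above.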

Speakers

Bernhard C. Geiger

Bernhard C. Geiger received the Dipl.-Ing. degree in Electrical Engineering (with distinction), the Dr. techn. degree in Electrical and Information Engineering (with distinction), and the venia docendi in Theoretical Information Engineering from Graz University of Technology, Austria, in 2009, 2014, and 2023, respectively. He is currently a Key Researcher at Know-Center GmbH, Graz, Austria, where he leads the research area on Methods & Algorithms for Artificial Intelligence. His research interests cover information theory for signal processing and machine learning, theory-assisted machine learning, and information-theoretic model reduction for Markov chains and hidden Markov models.