Unique Concept Vectors through Latent Space Decomposition

This reverse-engineering approach improves interpretability by automating concept identification: it analyzes the latent space of a deep neural network using Singular Value Decomposition (SVD). The framework combines factorization, latent space clustering, and output-sensitivity analysis to isolate directions that correspond to unique concepts.

Jupyter Notebook

github.com

Developed by

HES-SO University of Applied Science Western Switzerland

License

MIT license (MIT)

Main Characteristic

This study introduces a post-hoc, unsupervised method for interpreting deep learning models by automatically uncovering the concepts they learn during training. The latent space of a chosen layer is decomposed and then refined through unsupervised clustering, which identifies concept vectors aligned with directions of high variance. These vectors represent semantically distinct concepts relevant to the model's predictions. Experiments demonstrate the interpretability, coherence, and task relevance of these concepts. The method is also shown to identify outlier training samples affected by various confounding factors, which supports bias detection and the discovery of error sources in training data across different data types and model architectures.
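The decomposition and sensitivity steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the layer activations are available as a NumPy array, uses synthetic data, omits the clustering refinement, and approximates output sensitivity with a central finite difference. The names `candidate_directions`, `output_sensitivity`, and `head` are hypothetical and chosen for illustration only.

```python
import numpy as np


def candidate_directions(activations, k):
    """Return the top-k right singular vectors of the centered activations.

    Each returned row is a unit-length direction in latent space; rows are
    ordered by decreasing singular value, i.e. by decreasing variance.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per direction
    return vt[:k], explained[:k]


def output_sensitivity(head, activations, direction, eps=1e-3):
    """Mean absolute change of the head output per unit step along `direction`,
    estimated with a central finite difference."""
    plus = head(activations + eps * direction)
    minus = head(activations - eps * direction)
    return float(np.mean(np.abs(plus - minus)) / (2 * eps))


# Synthetic stand-in for layer activations (assumption): 500 samples, 8 units,
# with most of the variance concentrated along the first latent axis.
rng = np.random.default_rng(0)
n_samples, n_units = 500, 8
latent = rng.normal(size=(n_samples, n_units)) * 0.1
latent[:, 0] += rng.normal(size=n_samples) * 3.0

# Hypothetical linear output head that only reads the first latent axis.
weights = np.zeros(n_units)
weights[0] = 1.0


def head(z):
    return z @ weights


directions, explained = candidate_directions(latent, k=3)
sensitivities = [output_sensitivity(head, latent, v) for v in directions]

for i, (var, sens) in enumerate(zip(explained, sensitivities)):
    print(f"direction {i}: explained variance {var:.3f}, sensitivity {sens:.3f}")
```

In this toy setting the first singular direction both carries most of the variance and moves the output the most, which is the kind of direction the method would retain as a concept candidate. With a real network, `latent` would be replaced by activations extracted from the chosen layer and `head` by the remaining part of the model.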

Research areas

Explainable AI, Verifiable AI

Technical Categories

Knowledge Representation

Business Categories

Healthcare

Keywords