Unique Concept Vectors through Latent Space Decomposition

To improve interpretability with concept vectors, this reverse-engineering approach automates concept identification by applying Singular Value Decomposition (SVD) to the latent space of a deep neural network. The framework combines factorization, latent space clustering, and output-sensitivity analysis to isolate directions that correspond to unique concepts.
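
As a rough illustration of two of these ingredients, the sketch below factorizes a matrix of latent activations with SVD and then scores each resulting direction by how much a small step along it changes the model output. It is a minimal sketch on synthetic data, not the authors' implementation: the function names `candidate_directions` and `output_sensitivity`, the linear `head`, the perturbation size `eps`, and the toy data are all assumptions made for the example, and the clustering step is left out for brevity.

```python
import numpy as np


def candidate_directions(activations, k):
    """Return the top-k right singular vectors of the centered activations."""
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal latent directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k], singular_values[:k]


def output_sensitivity(head, activations, direction, eps=1e-2):
    """Mean absolute output change per unit step along a latent direction."""
    baseline = head(activations)
    shifted = head(activations + eps * direction)
    return np.abs(shifted - baseline).mean(axis=0) / eps


rng = np.random.default_rng(0)
n, d, n_classes = 500, 16, 3

# Toy latents (assumption): most variance lies along the first coordinate axis.
scales = np.ones(d)
scales[0] = 10.0
latents = rng.normal(size=(n, d)) * scales

# Hypothetical linear head standing in for the layers after the latent space.
W = rng.normal(size=(d, n_classes))


def head(z):
    return z @ W


directions, svals = candidate_directions(latents, k=4)
# One row per direction, one column per output unit.
scores = np.stack([output_sensitivity(head, latents, v) for v in directions])
```

In this toy setup the first recovered direction lines up with the high-variance axis, and because the head is linear each sensitivity score reduces to the absolute projection of the direction onto the head weights. With a real network, `head` would be the nonlinear remainder of the model, and the scores would depend on where in latent space the perturbation is applied.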