Skip to main content

Parts of Speech–Grounded Subspaces in Vision-Language Models

Leverages the association between parts of speech and specific visual modes of variation to better separate representations of style from content in the CLIP representations

Overview of POS