Efficient Training of Visual Transformers with Small Datasets

We present a tool that lets Visual Transformers (VTs) learn spatial relations within an image, which makes VT training much more robust when training data is scarce. The tool can be used jointly with standard (supervised) training and does not depend on specific architectural choices, so it can be easily plugged into existing VTs. Our method can improve the final accuracy of VTs, sometimes dramatically.
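The summary above does not spell out how the spatial-relation task is formulated, so the following is only a minimal sketch of one plausible reading: an auxiliary loss that asks a small predictor to recover the normalized offset between two randomly sampled token positions on the final embedding grid, added to the supervised loss with a weight. Every name here (`sample_pairs`, `relative_targets`, `localization_loss`, `total_loss`, `predictor`, and the weight `lam` with its placeholder value) is hypothetical and chosen for illustration. It is written in plain Python rather than a deep learning framework to keep it self-contained.

```python
import random

def sample_pairs(k, m, rng):
    """Sample m random pairs of (row, col) positions on a k x k token grid."""
    return [((rng.randrange(k), rng.randrange(k)),
             (rng.randrange(k), rng.randrange(k))) for _ in range(m)]

def relative_targets(pairs, k):
    """Row and column offsets of each pair, normalized by the grid size k."""
    return [((i - p) / k, (j - h) / k) for (i, j), (p, h) in pairs]

def localization_loss(grid, pairs, k, predictor):
    """Mean L1 error between predicted and true normalized offsets.

    grid[i][j] is the embedding of the token at row i, column j.
    predictor is any callable mapping two embeddings to (d_row, d_col);
    in a real model it would be a small trainable network (assumption).
    """
    targets = relative_targets(pairs, k)
    total = 0.0
    for ((i, j), (p, h)), (t_row, t_col) in zip(pairs, targets):
        d_row, d_col = predictor(grid[i][j], grid[p][h])
        total += abs(d_row - t_row) + abs(d_col - t_col)
    return total / len(pairs)

def total_loss(supervised_loss, aux_loss, lam=0.1):
    """Supervised loss plus the weighted auxiliary loss (lam is a placeholder)."""
    return supervised_loss + lam * aux_loss

# Toy check: embeddings that literally store their normalized position,
# paired with a predictor that subtracts them, give a near-zero auxiliary loss.
k = 4
rng = random.Random(0)
pairs = sample_pairs(k, m=8, rng=rng)
grid = [[[i / k, j / k] for j in range(k)] for i in range(k)]
predictor = lambda a, b: (a[0] - b[0], a[1] - b[1])
aux = localization_loss(grid, pairs, k, predictor)
total = total_loss(1.0, aux)
```

Because the auxiliary term only reads the final grid of token embeddings, a sketch like this does not touch the backbone, which is consistent with the claim that the tool is independent of specific architectural choices.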