Abstract
Vision foundation models are currently one of the main driving forces in computer vision research. However, transferring these models to new tasks typically involves expensive (full) fine-tuning. An efficient alternative is to cache features by processing a dataset once through a pretrained model and then training a small network on the cached features. How to effectively incorporate data augmentation on top of such cached features remains an open question. In this paper, we extensively study frozen feature augmentation (FroFA) in the few-shot setting. We focus on the low-data regime, where we expect the effects of augmentation to be most pronounced. Our study covers eighteen data augmentations, four network architectures, two large pretraining datasets, and three transfer datasets. Our results indicate that some commonly used image data augmentations also transfer to the feature space.
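To make the cache-then-train recipe concrete, below is a minimal sketch of training a small head on cached frozen features with an augmentation applied directly in feature space. It is illustrative only: the random array stands in for cached backbone outputs, and the additive "brightness-style" jitter and the NumPy linear probe are hypothetical placeholders, not the paper's exact augmentations, architectures, or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cached features: N examples of D-dimensional frozen backbone outputs.
N, D, num_classes = 512, 768, 10
feats = rng.normal(size=(N, D)).astype(np.float32)   # stand-in for cached features
labels = rng.integers(0, num_classes, size=N)

def brightness_like_jitter(x, strength=0.1):
    """Additive shift applied directly to frozen features
    (illustrative stand-in for an image-style 'brightness' augmentation)."""
    return x + strength * rng.normal(size=(1, x.shape[1])).astype(x.dtype)

# Simple linear probe trained with softmax cross-entropy and plain SGD.
W = np.zeros((D, num_classes), dtype=np.float32)
b = np.zeros(num_classes, dtype=np.float32)
lr, epochs, batch = 0.1, 5, 64

for _ in range(epochs):
    perm = rng.permutation(N)
    for i in range(0, N, batch):
        idx = perm[i:i + batch]
        x = brightness_like_jitter(feats[idx])        # augment cached features on the fly
        y = labels[idx]
        logits = x @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0                # d(cross-entropy)/d(logits)
        W -= lr * x.T @ p / len(y)
        b -= lr * p.mean(axis=0)
```

The key point the sketch illustrates is that the expensive backbone forward pass happens only once (when features are cached), while augmentation is applied cheaply to the cached features at every training step.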
Authors
Andreas Bär*, Manoj Kumar, Neil Houlsby, Mostafa Dehghani
Venue
CVPR 2024