Revisiting Dynamic Evaluation: Online Adaptation for LLMs
Abstract
We revisit dynamic evaluation, the idea of adapting a language model's parameters online via gradient descent on a given sequence of test tokens. While it is well known that adapting the parameters at test time improves overall predictive performance, we pay particular attention to the speed of adaptation (in terms of sample efficiency) and to the computational overhead of gradient computation and parameter updates.
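The core idea can be illustrated with a toy sketch (not the paper's implementation): a bigram language model whose logits receive one SGD step per observed test token, so the model adapts to the test stream as it is evaluated. The vocabulary size, learning rate, and test sequence below are illustrative assumptions.

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def evaluate(seq, vocab_size, lr=0.0, adapt=False):
    """Score a token sequence with a bigram LM (a logit table W).

    If adapt=True, perform one SGD step on the cross-entropy loss after
    each test token -- this is the dynamic-evaluation update.
    Returns the total negative log-likelihood of the sequence.
    """
    W = [[0.0] * vocab_size for _ in range(vocab_size)]  # uniform init
    nll = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        probs = softmax(W[prev])
        nll += -math.log(probs[nxt])
        if adapt:
            # gradient of cross-entropy w.r.t. logits: p - one_hot(next)
            for k in range(vocab_size):
                grad = probs[k] - (1.0 if k == nxt else 0.0)
                W[prev][k] -= lr * grad
    return nll

seq = [0, 1, 0, 1] * 25              # repetitive test stream
static = evaluate(seq, vocab_size=3)               # frozen parameters
adapted = evaluate(seq, vocab_size=3, lr=0.5, adapt=True)  # online updates
```

On a repetitive stream like this, the adapted model's total loss falls below the static model's, illustrating the predictive gain; the extra cost per token (here, one gradient pass over a row of `W`) is the computational overhead the abstract refers to.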
Venue
DistShift Workshop, NeurIPS 2023