Abstract
We revisit dynamic evaluation, the idea of adapting the parameters of a language model online, via gradient descent, on a given sequence of test tokens. While it is generally known that adapting the parameters at test time improves overall predictive performance, we pay particular attention to the speed of adaptation (in terms of sample efficiency) and to the computational overhead of gradient computation and parameter updates.
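To make the idea concrete, the following is a minimal sketch of dynamic evaluation, not the authors' implementation: the model takes a gradient step on each chunk of test tokens after scoring it, so later tokens are predicted with parameters already adapted to the earlier part of the sequence. The function name, optimizer choice, chunk size, and learning rate are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch of dynamic evaluation (illustrative; not the paper's setup).
import torch
import torch.nn.functional as F


def dynamic_evaluation(model, token_ids, chunk_size=32, lr=1e-4):
    """Return the average log-loss on `token_ids` while adapting `model` online.

    `token_ids` is a 1-D LongTensor of test tokens; `model` maps a (1, T)
    batch of token ids to (1, T, vocab) logits.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    total_loss, total_tokens = 0.0, 0

    for start in range(0, token_ids.numel() - 1, chunk_size):
        chunk = token_ids[start:start + chunk_size + 1]  # inputs plus next-token targets
        inputs, targets = chunk[:-1].unsqueeze(0), chunk[1:].unsqueeze(0)

        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

        # Accumulate the loss *before* the update: each chunk is scored with
        # parameters adapted only on the preceding chunks.
        total_loss += loss.item() * targets.numel()
        total_tokens += targets.numel()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()  # online parameter update on the test sequence

    return total_loss / max(total_tokens, 1)
```

The chunk-wise update is one common way to trade off adaptation speed against the per-token cost of gradient computation; smaller chunks adapt faster but incur more frequent updates.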
Authors
Amal Rannen-Triki, Jörg Bornschein, Alexandre Galashov, Razvan Pascanu, Michalis Titsias, Marcus Hutter, Andras Gyorgy, Yee Whye Teh
Venue
Workshop on Distribution Shifts (DistShift), NeurIPS 2023