Rethinking Example Selection in the Era of Million-Token Models


Abstract

The advent of long-context large language models (LLMs) has made it possible to use hundreds or thousands of demonstrations for in-context learning (ICL), a previously impractical regime. This paper investigates whether traditional ICL selection strategies, which typically balance the similarity of ICL examples to the test input (using a text retriever) against diversity within the ICL set, still hold in this long-context paradigm. Specifically, we compare the performance of short, long, and extremely long context models (Flan-PaLM 2, Gemini, Gemini 1.5 Pro) on 10+ downstream tasks across diverse retrieval methods. Additionally, we conduct extensive experiments analyzing how downstream performance changes as we scale the number of ICL examples, augment the ICL examples with their zero-shot predictions, dynamically generate new ICL examples, and fix the ICL context window while varying the ICL pool size. Furthermore, we conduct a perturbation analysis to probe the compositional understanding of extremely long context LLMs and to understand the influence of the target text distribution.
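To make the "similarity versus diversity" trade-off the abstract refers to concrete, below is a minimal sketch of an MMR-style greedy selector that scores candidate demonstrations by similarity to the test input while penalizing redundancy with already-chosen examples. It is an illustration of this class of selection strategy, not the paper's actual method; the embed() placeholder, the lam weighting, and all names are assumptions for the sketch.

```python
import numpy as np

def embed(texts):
    # Placeholder embedding; swap in any text retriever / encoder.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_icl_examples(test_input, pool, k=8, lam=0.7):
    """Greedily pick k demonstrations, trading off similarity to the test
    input (weight lam) against similarity to already-selected examples."""
    pool_vecs = embed(pool)
    query_vec = embed([test_input])[0]
    selected, candidates = [], list(range(len(pool)))
    while candidates and len(selected) < k:
        def score(i):
            sim_to_query = cosine(pool_vecs[i], query_vec)
            sim_to_selected = max(
                (cosine(pool_vecs[i], pool_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * sim_to_query - (1 - lam) * sim_to_selected
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [pool[i] for i in selected]
```

With lam close to 1 the selector behaves like a pure retriever (similarity only); lowering lam pushes the ICL set toward diversity, which is exactly the axis the paper revisits once the context window admits hundreds or thousands of demonstrations.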

Authors

Arjun Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky

Venue

arXiv