Rethinking Example Selection in the Era of Million-Token Models


Abstract

The advent of long-context large language models (LLMs) has made it possible to use hundreds or thousands of demonstrations for in-context learning (ICL), a previously impractical regime. This paper investigates whether traditional ICL selection strategies, which typically balance the similarity of ICL examples to the test input (using a text retriever) against diversity within the ICL set, still hold in this long-context paradigm. Specifically, we compare the performance of short, long, and extremely long context models (Flan-PaLM 2, Gemini, Gemini 1.5 Pro) on 10+ downstream tasks across diverse retrieval methods. Additionally, we conduct extensive experiments analyzing how downstream performance changes as we scale the number of ICL examples, augment the ICL examples with their zero-shot predictions, dynamically generate new ICL examples, and fix the ICL context window while varying the ICL pool size. Furthermore, we conduct a perturbation analysis to probe the compositional understanding of extremely long context LLMs and to understand the influence of the target text distribution.
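To make the "similarity versus diversity" trade-off the abstract refers to concrete, below is a minimal sketch of an MMR-style greedy selector that scores candidate demonstrations by similarity to the test input while penalizing redundancy with already-chosen examples. It is an illustration of this class of selection strategy, not the paper's actual method; the embed() placeholder, the lam weighting, and all names are assumptions for the sketch.

```python
import numpy as np

def embed(texts):
    # Placeholder embedding; swap in any text retriever / encoder.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_icl_examples(test_input, pool, k=8, lam=0.7):
    """Greedily pick k demonstrations, trading off similarity to the test
    input (weight lam) against similarity to already-selected examples."""
    pool_vecs = embed(pool)
    query_vec = embed([test_input])[0]
    selected, candidates = [], list(range(len(pool)))
    while candidates and len(selected) < k:
        def score(i):
            sim_to_query = cosine(pool_vecs[i], query_vec)
            sim_to_selected = max(
                (cosine(pool_vecs[i], pool_vecs[j]) for j in selected),
                default=0.0,
            )
            return lam * sim_to_query - (1 - lam) * sim_to_selected
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [pool[i] for i in selected]
```

With lam close to 1 the selector behaves like a pure retriever (similarity only); lowering lam pushes the ICL set toward diversity, which is exactly the axis the paper revisits once the context window admits hundreds or thousands of demonstrations.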

Authors

Arjun Akula, Kazuma Hashimoto, Krishna Srinivasan, Aditi Chaudhary, Karthik Raman, Michael Bendersky

Venue

arXiv