Abstract
Despite recent advances in the multilingual capabilities of Large Language Models (LLMs), LLMs remain limited to textual scripts, hampering knowledge transfer across languages with different writing systems and introducing potential biases from pre-training on specific scripts. As languages are naturally perceived in both writing and speech, phonemic transcriptions could provide essential signals for enhancing multilingual learning in text-based LLMs. In this work, we first conduct a pilot study of the performance discrepancy between languages with different writing scripts across state-of-the-art LLM families, demonstrating the benefits of integrating phonemic signals to enhance overall language representations and facilitate multilingual knowledge transfer. We then explore integrating phonemic signals into existing LLMs via enhanced in-context learning (ICL) retrieval to improve performance on various downstream NLP tasks at inference time.
Authors
Hoang H. Nguyen, Khyati Mahajan, Vikas Yadav, Julian Salazar, Philip S. Yu, Masoud Hashemi, Rishabh Maheshwary
Venue
NAACL 2025