Proactive Agents for Multi-Turn Text-to-Image Generation Under Uncertainty

Abstract

User prompts are often underspecified, leading to unsatisfactory results from generative AI models. We study this underspecification problem in the context of text-to-image (T2I) generation and propose a probabilistic framework for constructing customized T2I agents that can interact with the user: expressing uncertainty about the user's intent, seeking clarification, and iteratively aligning with the image the user has in mind. We present blueprints for agents constructed under this framework, showcasing the dynamic user interactions they enable and the explainability of their state of mind. Through a series of user studies, we demonstrate that our agent's interface is more effective than that of traditional dialogue systems. We further propose an evaluation pipeline, combined with a human study, to verify the agent's performance and user satisfaction.
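
The abstract only sketches the framework at a high level. As a rough illustration of the kind of belief-tracking interaction loop it describes, the Python sketch below maintains a probability distribution over candidate interpretations of an underspecified prompt, asks a clarifying question while its uncertainty (entropy) is high, and commits to the most probable intent once confident. All names (Belief, run_agent, ask_user), the entropy-threshold policy, and the toy candidates are assumptions made for illustration, not the paper's actual method or API.

# Hypothetical sketch of a belief-tracking clarification loop, inspired by
# (but not taken from) the framework described in the abstract.
import math

class Belief:
    """Probability distribution over candidate user intents
    (fully specified prompts)."""

    def __init__(self, candidates):
        # Start from a uniform prior over the candidate intents.
        self.probs = {c: 1.0 / len(candidates) for c in candidates}

    def entropy(self):
        # Shannon entropy: the agent's current uncertainty about the intent.
        return -sum(p * math.log(p) for p in self.probs.values() if p > 0)

    def update(self, answer_likelihood):
        # Bayes rule: reweight each candidate by how consistent it is
        # with the user's clarification answer, then renormalize.
        for c in self.probs:
            self.probs[c] *= answer_likelihood(c)
        z = sum(self.probs.values())
        self.probs = {c: p / z for c, p in self.probs.items()}

    def map_intent(self):
        # Most probable candidate under the current belief.
        return max(self.probs, key=self.probs.get)

def run_agent(candidates, ask_user, threshold=0.5, max_turns=5):
    """Ask clarifying questions while entropy exceeds the threshold,
    then commit to the most probable intent for image generation."""
    belief = Belief(candidates)
    for _ in range(max_turns):
        if belief.entropy() < threshold:
            break
        belief.update(ask_user(belief))  # one clarification turn
    return belief.map_intent()

# Toy usage: a simulated user whose clarification answer is consistent
# only with the true intent "a red vintage car".
candidates = ["a red vintage car", "a blue vintage car", "a red sports car"]

def ask_user(belief):
    return lambda c: 1.0 if c == "a red vintage car" else 0.05

print(run_agent(candidates, ask_user))  # -> "a red vintage car"

In this sketch, the entropy threshold plays the role of the agent's decision to keep asking versus generate; a real agent would derive both the candidate intents and the answer likelihoods from the dialogue itself rather than from a fixed list.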

Authors

Zi Wang, Meera Hahn, Wenjun Zeng, Nithish Kannen, Been Kim, Rich Galt, Kartikeya Badola

Venue

arXiv