Abstract
User prompts are often underspecified, which leads to unsatisfactory results from generative AI models. We study this underspecification problem in the context of text-to-image (T2I) generation, and propose a probabilistic framework for constructing customized T2I agents that can interact with the user by expressing uncertainty about their intent, seeking clarification, and iteratively aligning with the image the user would like to see. We present blueprints for agents constructed under this framework, showcasing the dynamic user interactions they enable and the explainability of their state of mind. Through a series of user studies, we demonstrate the effectiveness of our agent's interface compared with traditional dialogue systems. Empirically, we propose an evaluation pipeline that verifies both the performance of our agent and user satisfaction in a human study.
Authors
Zi Wang, Meera Hahn, Wenjun Zeng, Nithish Kannen, Been Kim, Rich Galt, Kartikeya Badola
Venue
arXiv