Abstract
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without any external feedback. Contrary to popular belief, our research indicates that LLMs might find this task challenging, especially in the context of reasoning, and at times, their performance might even degrade post self-correction. Given our findings, we encourage the community to approach this concept with both prudence and critical consideration.
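To make the setting concrete, here is a minimal sketch of the intrinsic self-correction loop the abstract describes: the model first answers, then is prompted to review and revise its own answer with no external feedback. The `generate` function is a hypothetical placeholder for an LLM call, and the prompt wording is illustrative rather than the paper's exact prompts.

```python
def generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to an LLM; swap in a real client."""
    raise NotImplementedError


def intrinsic_self_correct(question: str, rounds: int = 1) -> str:
    # Initial response from the model.
    answer = generate(f"Q: {question}\nA:")
    for _ in range(rounds):
        # Ask the model to critique and revise its own answer,
        # using only its inherent capabilities (no external feedback).
        revision_prompt = (
            f"Q: {question}\n"
            f"Your previous answer: {answer}\n"
            "Review your previous answer and identify any problems with it. "
            "Then, based on your review, answer the question again."
        )
        answer = generate(revision_prompt)
    return answer
```

Under this setup, the paper's finding is that the revised answer is often no better, and sometimes worse, than the initial one for reasoning tasks.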
Authors
Jie Huang, Xinyun Chen, Swaroop Mishra, Steven Zheng, Adams Yu, Xinying Song, Denny Zhou
Venue
arXiv