糖心TV

Skip to main content Skip to navigation

Calendar

Show all calendar items

DR@W/EBER Seminar: Rafael Jimenez-Duran (Stanford)

- Export as iCalendar
Location: Economics S2.79

Abstract: Large Language Models (LLMs) are said to exhibit sycophancy, a tendency to agree with users irrespective of the truth. We propose an economic framework that defines sycophancy as a preference for user approval, and develop an outcome-based sufficient statistic to detect it. Our identification strategy exploits a key architectural feature of LLMs: they are stateless, and "memory" of past interactions is constructed by summarizing conversations into short profiles appended to each new prompt. Because this memory can be controlled, toggled, and varied experimentally, we can isolate the causal path from user feedback to sycophantic behavior. We instrument the LLM's perceived cost of disagreement with a one-word variation in simulated prior user feedback. In an experiment with leading LLMs across three domains (moral judgments, factual questions, and common misconceptions) we find evidence that LLMs are sycophantic. Sycophancy is larger in subjective domains where baseline accuracy is lower and is heterogeneous across models.

Tags: Draw Forum

Show all calendar items

Let us know you agree to cookies