The Entity-Deduction Arena: A Playground for Probing the Conversational Reasoning and Planning Capabilities of LLMs

Authors: Yizhe Zhang, Jiarui Lu, Navdeep Jaitly

LLMs are currently effective at answering questions that are clearly asked. However, they may encounter difficulties when faced with ambiguous queries. This emphasizes the need for the development of intelligent agents capable of asking clarification questions, which require complex understanding, state tracking, and planning in multi-turn conversations. In this paper, we study a surrogate problem by employing entity-deducing games as evaluation metrics to assess the conversational planning capabilities of different models. We systematically evaluate various LLMs and discover significant performance discrepancies in conversational planning capabilities. Drawing inspiration from Reinforcement Learning from Human Feedback (RLHF), we utilize Reinforcement Learning from Self-Playing (RLSP) on vanilla Vicuna models to enhance planning capacity through self-play in the game. This research offers insights into potential advancements in achieving more intelligent and autonomous agents.
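To make the game setting concrete, here is a minimal sketch of an entity-deduction loop in Python. It is a toy illustration only, not the paper's setup: the entity table, the attribute names, the `judge`, `pick_question`, and `play` functions, and the turn budget are all hypothetical, and a simple split-the-candidates heuristic stands in for the LLM guesser. It shows the two ingredients the abstract names: state tracking (the shrinking candidate list) and planning (choosing the question that best divides the remaining candidates).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy knowledge base: entity -> yes/no attributes.
ENTITIES: Dict[str, Dict[str, bool]] = {
    "cat":    {"is_animal": True,  "can_fly": False, "is_man_made": False},
    "eagle":  {"is_animal": True,  "can_fly": True,  "is_man_made": False},
    "plane":  {"is_animal": False, "can_fly": True,  "is_man_made": True},
    "hammer": {"is_animal": False, "can_fly": False, "is_man_made": True},
}


def judge(secret: str, attribute: str) -> bool:
    """The judge knows the secret entity and answers one yes/no question."""
    return ENTITIES[secret][attribute]


def pick_question(candidates: List[str], asked: Set[str]) -> Optional[str]:
    """Pick the unasked attribute that splits the candidates most evenly."""
    attributes = sorted(next(iter(ENTITIES.values())).keys())
    best, best_gap = None, None
    for attr in attributes:
        if attr in asked:
            continue
        yes = sum(ENTITIES[c][attr] for c in candidates)
        if yes == 0 or yes == len(candidates):
            continue  # every candidate gives the same answer: uninformative
        gap = abs(2 * yes - len(candidates))  # 0 means a perfect half split
        if best_gap is None or gap < best_gap:
            best, best_gap = attr, gap
    return best


def play(secret: str, max_turns: int = 20) -> Tuple[bool, int]:
    """Run one game; return (guessed correctly, turns used).

    The final guess consumes a turn, and max_turns is an assumed budget.
    """
    candidates = list(ENTITIES)
    asked: Set[str] = set()
    for turn in range(1, max_turns + 1):
        if len(candidates) == 1:
            return candidates[0] == secret, turn
        attr = pick_question(candidates, asked)
        if attr is None:
            return False, turn  # no informative question left
        asked.add(attr)
        answer = judge(secret, attr)
        # State tracking: keep only candidates consistent with the answer.
        candidates = [c for c in candidates if ENTITIES[c][attr] == answer]
    return False, max_turns


if __name__ == "__main__":
    for entity in ENTITIES:
        print(entity, play(entity))
```

With four entities and evenly splitting attributes, the guesser needs two questions plus a final guess. Cutting the budget below that makes the game fail, which is the kind of success-within-budget signal such a game can provide.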


