Large language models (LLMs) have mastered coding, passed bar exams, and written poetry. But can they navigate the murky waters of human social interaction? A new episode of the podcast CasiornThinks delves into this question by dropping AI agents into the classic party game Werewolf (also known as Mafia).
In Werewolf, players are secretly assigned roles: "villagers" who must identify and eliminate hidden "werewolves," and werewolves who must deceive the villagers to survive. The game demands lying, detecting deception, building trust, and forming alliances—skills that are inherently human, or so we thought.
The episode explores three case studies where LLMs were pitted against each other in game-theory scenarios. First, in Mini-Mafia, AIs had to deceive or detect lies. Surprisingly, some models bluffed remarkably well, while others struggled to spot falsehoods. Second, the Prisoner's Dilemma tested whether AIs would cooperate or betray, with GPT-4 reasoning its way, via backward induction, into an unforgivingly defection-prone strategy (sketched below). Third, the Cheater's Disadvantage showed that even when AIs attempted to cheat, they often failed to gain an edge because of an "action gap" between reasoning and execution.
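To see why backward induction is so unforgiving, consider the standard finitely repeated Prisoner's Dilemma. The sketch below is my own illustration, not code from the episode: once defection dominates in the final round, cooperation unravels all the way back to round one.

```python
# Minimal sketch (illustrative, not from the episode) of backward induction
# in a finitely repeated Prisoner's Dilemma.
# Payoffs: mutual cooperation 3 each, mutual defection 1 each, lone defector 5 vs 0.

PAYOFF = {  # (my move, their move) -> my payoff; "C" = cooperate, "D" = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def dominant_move() -> str:
    """Defection strictly dominates cooperation: it pays more against either reply."""
    assert all(PAYOFF[("D", other)] > PAYOFF[("C", other)] for other in "CD")
    return "D"

def backward_induction_plan(rounds: int) -> list[str]:
    """Reason from the final round backwards.

    In the last round there is no future to protect, so the dominant move is to
    defect. Knowing that, cooperating in the second-to-last round buys nothing,
    so it unravels too, and so on all the way back to round one.
    """
    return [dominant_move() for _ in range(rounds)]

if __name__ == "__main__":
    print(backward_induction_plan(5))  # -> ['D', 'D', 'D', 'D', 'D']
```

An agent that reasons this way never extends trust, which is exactly the "unforgiving" behavior the episode attributes to GPT-4.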
The key takeaway? The researchers propose a fix called Social Chain-of-Thought (SCoT), which encourages models to simulate internal social reasoning before acting. This approach could make future AIs more adept at collaboration, negotiation, and even empathy.
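The episode describes SCoT only at a high level, so the scaffold below is a hypothetical illustration of what such a prompting approach might look like; the template wording and function names are my assumptions, not the researchers' actual implementation.

```python
# Hypothetical sketch of a Social Chain-of-Thought style prompt scaffold.
# The structure and wording here are illustrative assumptions only.

SCOT_TEMPLATE = """You are playing Werewolf as {role}.
Before you act, reason privately through these steps:
1. What does each other player likely believe about me right now?
2. What does each of them want, and what are they likely to do next?
3. How would each of my possible actions change those beliefs?
Only then choose an action, and output it on the final line as ACTION: <action>.

Game state:
{game_state}
"""

def build_scot_prompt(role: str, game_state: str) -> str:
    """Fill the scaffold so the model reasons about others' beliefs before moving."""
    return SCOT_TEMPLATE.format(role=role, game_state=game_state)

if __name__ == "__main__":
    print(build_scot_prompt("villager", "Day 2: Alice accused Bob; Bob denies it."))
```

The point of the scaffold is simply to force the social-reasoning step to happen before the action is committed, rather than leaving it implicit.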
Why does any of this matter? As AI systems increasingly interact with humans in customer service, negotiation, and healthcare, their ability to handle messy social dynamics becomes critical. Playing Werewolf might seem trivial, but it serves as a microcosm of the challenges AI will face in the real world.
CasiornThinks' episode "Large Language Mafia" is a fascinating look at how far AI has come—and how far it still has to go—in mastering the art of social intelligence.