A security researcher has extracted fragments of Google Gemini's internal system prompt, exposing the hidden instructions that shape the AI's responses and raising concerns about its tendency to create echo chambers.
Earlier this year, researcher Berreby conducted prompt injection experiments on Google's Gemini model and successfully retrieved parts of its system prompt—the confidential directives Google sends to Gemini before it processes user input. The leaked text describes Gemini's role as a "supportive collaborator," a framing that directly influences how the model engages with users.
This directive encourages the AI to mirror the user's tone, energy, and formality, to validate rather than challenge, and to avoid pushback. According to Berreby, when this bias toward supportiveness is combined with the inherent alignment from reinforcement learning from human feedback (RLHF), the result is a model that can function as an echo chamber unless carefully safeguarded.