A new open-source toolkit called Qwen-Scope offers an unprecedented look into the "thoughts" of large language models. Developed for the Qwen3 and Qwen3.5 model families, the suite uses sparse autoencoders (SAEs) to decompose complex neural activations into a set of interpretable features, effectively creating a vocabulary of the concepts the model uses to reason.
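To make the idea concrete, here is a minimal sketch of what a sparse autoencoder does to a single activation vector: a wide encoder with a ReLU turns it into a mostly-zero feature vector, and a decoder reconstructs the original activation from those few active features. The class, dimensions, and variable names below are illustrative assumptions, not Qwen-Scope's actual API.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: maps a d_model activation to a wide, sparse feature vector."""
    def __init__(self, d_model: int = 2048, n_features: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activation: torch.Tensor):
        # ReLU keeps only positively firing features, so most entries are zero.
        features = torch.relu(self.encoder(activation))
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder()
activation = torch.randn(1, 2048)   # stands in for a residual-stream activation
features, recon = sae(activation)
top = features.topk(5)              # the handful of "concepts" active on this input
print(top.indices)
print(torch.nn.functional.mse_loss(recon, activation))
```

In a trained SAE, each of those top feature indices tends to correspond to a human-readable concept, which is what makes the decomposition useful as a vocabulary.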
Rather than treating the AI as a black box, Qwen-Scope allows researchers and developers to trace which internal directions correspond to languages, styles, or specific behaviors. This makes it possible to steer model outputs without modifying weights, analyze benchmark redundancy, classify toxic data, and refine post-training through supervised fine-tuning and reinforcement learning.
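Steering without touching the weights typically means adding a feature direction to the hidden states at inference time. The sketch below shows one common way to do this with a forward hook on a Hugging Face causal LM; the model name, layer index, steering scale, and the random direction are all placeholder assumptions (a real direction would come from a trained SAE's decoder), and this is not presented as Qwen-Scope's own interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"   # any causal LM works; this choice is an assumption
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical steering direction: in practice it would be an SAE decoder column
# for an interpretable feature (e.g. a target language or writing style).
steer_dir = torch.randn(model.config.hidden_size)
steer_dir = steer_dir / steer_dir.norm()
scale = 4.0                      # illustrative strength; real values need tuning

def add_direction(module, inputs, output):
    # Decoder layers usually return a tuple whose first element is the hidden state.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + scale * steer_dir.to(hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

layer_idx = 12                   # which layer to steer is an empirical choice
handle = model.model.layers[layer_idx].register_forward_hook(add_direction)

ids = tok("The weather today is", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=30)
handle.remove()                  # remove the hook so later generations are unsteered
print(tok.decode(out[0], skip_special_tokens=True))
```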
The project addresses a longstanding challenge in AI transparency: understanding why a model produces a particular response. By identifying the internal triggers for issues like repetition or language mixing, developers can now correct them more precisely.
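As a rough illustration of that diagnostic workflow, one can encode per-token activations with a trained SAE and flag the tokens where a feature suspected of driving a failure mode, such as repetition, fires unusually strongly. Everything in this sketch, including the feature index, the threshold rule, and the stand-in encoder, is hypothetical and only demonstrates the pattern.

```python
import torch
import torch.nn as nn

d_model, n_features, seq_len = 1024, 8192, 12
encoder = nn.Linear(d_model, n_features)      # stands in for a trained SAE encoder
activations = torch.randn(seq_len, d_model)   # one residual vector per generated token

features = torch.relu(encoder(activations))   # sparse feature activations per token
repetition_feature = 4071                     # index one would identify by inspection
acts = features[:, repetition_feature]

threshold = acts.mean() + 2 * acts.std()      # simple outlier rule, for illustration
flagged = (acts > threshold).nonzero().flatten().tolist()
print("tokens where the suspected repetition feature spikes:", flagged)
```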
Qwen-Scope is available through the Qwen research group, and the paper describing the approach is linked in the podcast description.