Running an Uncensored Gemma 4 Model Locally: A Guide and a GPU Crash

June 12, 2026 · 5:42 AM

A new AI model called Heretic 12B, based on Google's Gemma 4, is now available for local execution via Ollama. The model is a fine-tuned, uncensored version designed to run without guardrails, entirely on the user's machine—no API calls, no cloud dependency, and no data leaving the system.

In a demonstration video, a user walks through the setup process:

Downloading the GGUF file from HuggingFace.
Building the Modelfile with the correct Gemma 4 chat template.
Fixing a validation error during ollama create using a manual manifest workaround.
Encountering a GPU crash, then falling back to a CPU workaround.
Testing 10 questions that mainstream AI models typically refuse to answer (responses were blurred for YouTube).

The release highlights both the potential for local, unrestricted AI and the hardware challenges that may arise when pushing models to the limit.

Running an Uncensored Gemma 4 Model Locally: A Guide and a GPU Crash

We Care About Your Privacy

How and why we process data