OpenAI has quietly released Privacy Filter, an open-source model designed to detect and redact personally identifiable information (PII) in text. Published on Hugging Face under the Apache 2.0 license, the model is compact enough to run in a web browser or on a laptop, making it suitable for edge deployment.
What It Does
Privacy Filter is a named entity recognition (NER) model specialized for privacy. It identifies eight categories of sensitive data: account numbers, private addresses, emails, person names, phone numbers, URLs, dates, and secrets (including credentials and high-entropy strings). The model card notes known limitations, such as failing to catch novel credential formats or secrets split across surrounding syntax.
The primary use case is development teams that need to sanitize datasets, scrub logs, or preprocess user-generated content before training or storage. Because it runs on-premises, organizations can avoid routing sensitive data to third-party APIs.
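For illustration, a minimal sketch of what that preprocessing step could look like via the Hugging Face transformers token-classification pipeline is shown below. The model ID is a placeholder, and the exact label names come from the model's own taxonomy, so treat both as assumptions rather than values confirmed by the model card.

```python
# Minimal redaction sketch using the transformers token-classification
# pipeline. "openai/privacy-filter" is a placeholder model ID; check the
# actual model card for the published ID and label taxonomy.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="openai/privacy-filter",  # placeholder, not a confirmed ID
    aggregation_strategy="simple",  # merge sub-word tokens into full spans
)

def redact(text: str) -> str:
    """Replace each detected PII span with a [CATEGORY] tag."""
    # Process spans right-to-left so earlier character offsets stay valid.
    for ent in sorted(ner(text), key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text

print(redact("Contact Jane Doe at jane@example.com or +1-555-0100."))
```

Running the model locally like this keeps the raw text on the organization's own hardware, which is the point of the on-premises deployment story.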
Architecture and Efficiency
The model has 1.5 billion total parameters but activates only 50 million per token at inference time, a 30x reduction achieved through a sparse mixture-of-experts (MoE) feed-forward design. It comprises 8 pre-norm transformer blocks with a residual stream width of 640. Attention uses grouped-query attention (GQA) with rotary positional embeddings (RoPE), with 14 query heads sharing 2 KV heads to reduce the memory footprint of the KV cache. The feed-forward layers employ 128 experts with top-4 routing, so each token is processed by only 4 of the 128 experts.
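To make the routing concrete, here is a sketch of a top-4 MoE feed-forward layer in PyTorch using the dimensions above. The softmax router and the expert hidden width (`d_ff`) are assumptions for illustration; the article does not specify either.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Sparse mixture-of-experts feed-forward: each token is routed to
    k of n experts, so only a small slice of the weights runs per token."""

    def __init__(self, d_model=640, n_experts=128, k=4, d_ff=1024):
        super().__init__()  # d_ff is an assumed value, not from the article
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # per-token routing logits
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model). Keep the k largest router logits per
        # token and renormalize them into mixing weights.
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e  # tokens whose slot-th expert is e
                out[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[int(e)](x[sel])
        return out

layer = TopKMoE()
print(layer(torch.randn(16, 640)).shape)  # torch.Size([16, 640])
```

All 128 experts contribute to the parameter count, but each token only pays the compute cost of 4 of them, which is how the total/active parameter gap arises.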
Three-Phase Training Pipeline
The model was trained in a three-phase pipeline:
- Autoregressive pretraining as a standard next-token prediction language model (GPT-style decoder).
- Architectural conversion: The language-model head was replaced with a token-classification head, and causal attention was switched to bidirectional banded attention with a band size of 128, giving each token a 257-token receptive field per layer (128 tokens on each side, plus the token itself); see the mask sketch after this list.
- Post-training on the eight-category privacy label taxonomy described above.
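The banded attention geometry is easy to visualize as a boolean mask that lets each query attend to keys within 128 positions in either direction. The sketch below illustrates that geometry; it is not the released implementation.

```python
import torch

def banded_bidirectional_mask(seq_len: int, band: int = 128) -> torch.Tensor:
    """Boolean attention mask: position i may attend to position j
    whenever |i - j| <= band, with no causal restriction."""
    pos = torch.arange(seq_len)
    return (pos[None, :] - pos[:, None]).abs() <= band

mask = banded_bidirectional_mask(512)
print(mask[256].sum())  # tensor(257): 128 left + self + 128 right
```

An interior token's row sums to 257, matching the per-layer receptive field above; stacking the 8 transformer blocks widens the effective receptive field well beyond a single layer's band.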
This approach combines the benefits of autoregressive pretraining with those of bidirectional fine-tuning for token classification, yielding a model that is both efficient and accurate for PII detection.