diff --git a/zen-guard-gen_whitepaper.pdf b/zen-guard-gen_whitepaper.pdf
index 55e426c..e2c8057 100644
Binary files a/zen-guard-gen_whitepaper.pdf and b/zen-guard-gen_whitepaper.pdf differ
diff --git a/zen-guard-gen_whitepaper.tex b/zen-guard-gen_whitepaper.tex
index 1b2f762..9ba0f61 100644
--- a/zen-guard-gen_whitepaper.tex
+++ b/zen-guard-gen_whitepaper.tex
@@ -14,7 +14,7 @@
 \hypersetup{colorlinks=true,linkcolor=zenblue,urlcolor=zenblue,citecolor=zenblue}
 
 \title{\textbf{Zen-Guard-Gen: A Generative Safety Classifier\\
-Fine-Tuned from Qwen2.5-7B}\\[0.5em]
+Built on Qwen3Guard-Gen-8B}\\[0.5em]
 \large Technical Whitepaper v2025.05}
 \author{Zach Kelling \\ Zen LM Research Team\\
 \texttt{research@zenlm.org}\\
@@ -25,20 +25,24 @@
 \maketitle
 
 \begin{abstract}
-Zen-Guard-Gen is a \emph{generative} safety classifier built by fine-tuning Alibaba's
-\textbf{Qwen2.5-7B} base model~\cite{qwen25}. It is \emph{not} a from-scratch model and uses
-no bespoke ``Zen MoDE'' architecture: the base is the openly released, Apache-2.0 licensed
-\texttt{Qwen/Qwen2.5-7B}, a dense decoder-only transformer (\texttt{Qwen2ForCausalLM};
-7.61B parameters, 28 layers, hidden size 3584, GQA with 28 query / 4 key--value heads, vocab
-152{,}064, up to a 128K context, 29+ languages). On top of this base we add a supervised
-safety-instruction fine-tune so that, given a content item and a policy, the model emits a
-structured verdict plus a natural-language explanation, a policy reference, and (for
-borderline or unsafe content) a remediation suggestion --- making decisions auditable and
-contestable rather than opaque. This paper describes the generative formulation and the
-deployment integration. We do not report safety benchmark numbers: the upstream Qwen2.5-7B is
-a general-purpose LLM with no published safety-classifier metrics, and we have not run a
-rigorous safety evaluation of our fine-tune; the inflated benchmark figures (e.g. ``ToxiGen
-99.1\%'') in earlier revisions were fabricated and have been removed.
+Zen-Guard-Gen is a \emph{generative} safety classifier built on Alibaba's
+\textbf{Qwen3Guard-Gen-8B}~\cite{qwen3guard}, a purpose-built multilingual guardrail model. It
+is \emph{not} a from-scratch model and uses no bespoke ``Zen MoDE'' architecture: the base is
+the openly released, Apache-2.0 licensed \texttt{Qwen/Qwen3Guard-Gen-8B}, itself a safety
+fine-tune of Qwen3-8B (a dense decoder-only transformer, \texttt{Qwen3ForCausalLM};
+$\approx$8.2B parameters, 36 layers, hidden size 4096, GQA with 32 query / 8 key--value heads,
+head dimension 128, vocab 151{,}936). Unlike a general-purpose LLM, Qwen3Guard-Gen is already a
+\emph{generative} safety classifier: it frames moderation as an instruction-following task,
+ingests the full user prompt and model response, and emits a verdict over three severity tiers
+--- \textbf{safe}, \textbf{controversial}, and \textbf{unsafe} --- across 119 languages and
+dialects~\cite{qwen3guard}. On top of this base Zen adds packaging that wraps the upstream
+verdict with a natural-language explanation, a policy reference, and (for controversial or
+unsafe content) a remediation suggestion --- making decisions auditable and contestable rather
+than opaque. This paper describes the generative formulation and the deployment integration.
+Where we cite quantitative results, we attribute them to the upstream Qwen3Guard technical
+report~\cite{qwen3guard}; we have not run an independent safety evaluation of the Zen
+packaging, and the inflated, fabricated figures (e.g. ``ToxiGen 99.1\%'') in earlier revisions
+have been removed.
 \end{abstract}
 
 \tableofcontents
@@ -57,7 +61,7 @@ \section{Introduction}
 Zen-Guard-Gen addresses all three limitations by framing safety classification as a generation task. Given a content item, Zen-Guard-Gen produces:
 
 \begin{enumerate}
-  \item A structured safety verdict (safe / unsafe / borderline).
+  \item A structured safety verdict over Qwen3Guard's three severity tiers (safe / controversial / unsafe).
   \item A primary policy category (hate speech, harassment, CSAM, violence, misinformation, etc.).
   \item A natural language explanation of the reasoning underlying the verdict.
   \item A reference to the applicable policy section.
@@ -69,18 +73,21 @@ \subsection{Model Overview}
 \begin{table}[H]
 \centering
 \caption{Zen-Guard-Gen specification. Architecture and base-model facts are those of the
-upstream Qwen2.5-7B~\cite{qwen25}; Zen-Guard-Gen is a safety-instruction fine-tune of it.}
+upstream Qwen3Guard-Gen-8B / Qwen3-8B~\cite{qwen3guard}; Zen-Guard-Gen wraps it with
+explanation/policy/remediation packaging.}
 \begin{tabular}{ll}
 \toprule
 \textbf{Parameter} & \textbf{Value} \\
 \midrule
-Base model & Qwen2.5-7B (Alibaba), Apache-2.0 \\
-Architecture & Dense decoder-only transformer (\texttt{Qwen2ForCausalLM}) \\
-Total Parameters & 7.61B \\
-Layers / hidden size & 28 / 3584 \\
-Attention heads (Q / KV, GQA) & 28 / 4 \\
-Vocabulary & 152{,}064 \\
-Context length & up to 131{,}072 (128K) \\
+Base model & Qwen3Guard-Gen-8B (Alibaba), Apache-2.0 \\
+Underlying base & Qwen3-8B, dense decoder-only (\texttt{Qwen3ForCausalLM}) \\
+Total Parameters & $\approx$8.2B \\
+Layers / hidden size & 36 / 4096 \\
+Attention heads (Q / KV, GQA) & 32 / 8 \\
+Head dimension & 128 \\
+Vocabulary & 151{,}936 \\
+Severity tiers & safe / controversial / unsafe \\
+Languages & 119 languages and dialects \\
 Output & generative: verdict + explanation + policy ref + remediation \\
 Version & v2025.05 \\
 \bottomrule
@@ -88,8 +95,9 @@ \subsection{Model Overview}
 \end{table}
 
 Note: ``Image captions'' and ``transcribed audio'' are upstream text inputs, not native
-multimodal capabilities; Qwen2.5-7B is a text model. Safety benchmark accuracies are
-deliberately omitted (see abstract).
+multimodal capabilities; Qwen3Guard-Gen-8B is a text model. Quantitative safety results, where
+reported, are attributed to the upstream Qwen3Guard technical report~\cite{qwen3guard}; Zen has
+not run an independent evaluation of its packaging (see abstract).
 
 \section{Safety Taxonomy}
 
@@ -122,17 +130,22 @@ \subsection{Primary Categories}
 \end{tabular}
 \end{table}
 
-\subsection{Severity Levels}
+\subsection{Severity Tiers}
 
-Each category is scored on a severity scale aligned with CVSS-style impact ratings:
+The primary verdict follows the upstream Qwen3Guard three-tier severity
+scheme~\cite{qwen3guard}:
 
 \begin{itemize}
-  \item \textbf{Level 1 (Borderline)}: Content that may violate policy depending on context; requires human review.
-  \item \textbf{Level 2 (Moderate)}: Clear policy violation warranting removal and possible account warning.
-  \item \textbf{Level 3 (Severe)}: Serious violation warranting immediate removal and escalation.
-  \item \textbf{Level 4 (Critical)}: Content requiring immediate removal and law enforcement referral (CSAM, credible threats).
+  \item \textbf{Safe}: Content generally considered safe across most scenarios.
+  \item \textbf{Controversial}: Content whose harmfulness is context-dependent or subject to disagreement across applications; the natural tier to route to human review.
+  \item \textbf{Unsafe}: Content generally considered harmful across most scenarios.
 \end{itemize}
 
+For operators that require finer-grained enforcement, the Zen packaging optionally maps the
+\textbf{unsafe} tier onto an escalation ladder --- e.g. removal, account warning, escalation,
+or law-enforcement referral for CSAM and credible threats --- but this enforcement mapping is a
+deployment policy layered on top of the upstream verdict, not an additional model output.
+
 \section{Architecture}
 
 \subsection{Generative Safety Formulation}
@@ -144,7 +157,10 @@ \subsection{Generative Safety Formulation}
   p(y | x) &= \prod_{t=1}^{|y|} p(y_t | y_{<t}, x)
 \end{align}
 
-The structured output grammar is enforced via constrained decoding: the verdict, category, and severity fields use restricted vocabulary sampling from predefined value sets, while the explanation and remediation fields use unconstrained generation within a length limit.
+The structured output grammar is enforced via constrained decoding: the verdict field is
+restricted to the upstream safe / controversial / unsafe tiers~\cite{qwen3guard}; the category
+and severity fields sample from predefined value sets; and the explanation and remediation
+fields use unconstrained generation within a length limit.
 
 This hybrid approach ensures structural consistency (no malformed outputs) while preserving the expressive flexibility of natural language for the reasoning components.
 
@@ -162,32 +178,36 @@ \subsection{Policy-Conditioned Generation}
 
 \subsection{Calibrated Uncertainty}
 
-For borderline content, Zen-Guard-Gen produces calibrated uncertainty estimates alongside verdicts. A temperature-scaled confidence score $c \in [0,1]$ accompanies each verdict:
+For controversial content, Zen-Guard-Gen produces calibrated uncertainty estimates alongside verdicts. A temperature-scaled confidence score $c \in [0,1]$ accompanies each verdict:
 
 \begin{equation}
   c = \sigma\left(\frac{z_{\text{verdict}}}{T_{\text{cal}}}\right)
 \end{equation}
 
-where $z_{\text{verdict}}$ is the logit for the predicted verdict and $T_{\text{cal}}$ is a calibration temperature estimated on a held-out set. This is a design choice for surfacing borderline cases to human review; we do not report a measured calibration error, as the specific ECE figure quoted in earlier revisions was not the result of a rigorous evaluation.
+where $z_{\text{verdict}}$ is the logit for the predicted verdict and $T_{\text{cal}}$ is a calibration temperature estimated on a held-out set. This is a design choice for surfacing controversial cases to human review; we do not report a measured calibration error, as the specific ECE figure quoted in earlier revisions was not the result of a rigorous evaluation.
 
 \section{Training Methodology}
 
 \subsection{Approach}
 
-Starting from the Qwen2.5-7B base model~\cite{qwen25}, Zen-Guard-Gen is produced by supervised
-instruction fine-tuning on (content, policy) $\rightarrow$ structured-verdict examples, so that
-the model learns to emit the verdict/category/severity fields plus a natural-language
-explanation, a policy reference, and a remediation suggestion. This section describes the
-\emph{intended} recipe; we do not publish dataset sizes or composition, because the specific
-figures in earlier revisions (a 300M-item corpus with per-source percentages, a ``500K seed /
-50K preference pair'' explanation-tuning split, and a ``40 researchers over 6 weeks'' red-team)
-were fabricated and did not describe a real training run.
-
-The honest, defensible statements are: (i) the base weights and their license, training, and
-capabilities are Alibaba's Qwen2.5-7B~\cite{qwen25}; (ii) any safety behavior is added by Zen
-via fine-tuning on safety-annotated data; and (iii) we make no quantitative claim about the
-fine-tune's accuracy without a rigorous, reproducible evaluation, which this document does not
-contain.
+The base, Qwen3Guard-Gen-8B, is \emph{already} a safety classifier: the Qwen team produced it
+by supervised instruction fine-tuning of Qwen3-8B on over 1.19M human-annotated and
+synthetically generated safety samples, framing classification as an
+instruction-following task over the safe / controversial / unsafe tiers~\cite{qwen3guard}. The
+upstream safety capability therefore comes from Alibaba's training run, not from Zen. On top of
+it, the Zen packaging maps the upstream verdict into the (content, policy)
+$\rightarrow$ structured-verdict schema --- verdict/category/severity plus a natural-language
+explanation, a policy reference, and a remediation suggestion. We do not publish a Zen
+training corpus, because the specific figures in earlier revisions (a 300M-item corpus with
+per-source percentages, a ``500K seed / 50K preference pair'' explanation-tuning split, and a
+``40 researchers over 6 weeks'' red-team) were fabricated and did not describe a real training
+run.
+
+The honest, defensible statements are: (i) the base weights, their Apache-2.0 license,
+training data, and safety capability are Alibaba's Qwen3Guard-Gen-8B~\cite{qwen3guard};
+(ii) Zen contributes the explanation/policy/remediation packaging around the upstream verdict;
+and (iii) any quantitative result we cite is attributed to the upstream Qwen3Guard technical
+report --- Zen has not run an independent, reproducible evaluation of its packaging.
 
 \subsection{Known Limitations}
 
@@ -199,19 +219,24 @@ \subsection{Known Limitations}
 
 \section{Evaluation}
 
-We intentionally report no benchmark numbers. The upstream Qwen2.5-7B is a general-purpose
-LLM with no published safety-classifier metrics~\cite{qwen25}, and we have not conducted a
-rigorous, reproducible safety evaluation of the Zen fine-tune. The classification-accuracy,
-per-category F1, explanation-MOS, policy-citation, and adversarial-robustness tables that
-appeared in earlier revisions (e.g. ToxiGen 99.1\%, HatEval 97.4\%, composite MOS 4.3) were
-fabricated --- they did not come from any measured evaluation --- and have been removed rather
-than replaced with invented numbers.
-
-A claim worth keeping qualitatively, without a number attached: a \emph{generative} safety
-classifier that must produce an explanation can be more transparent and auditable than an
-opaque binary classifier, because the rationale is inspectable by a human reviewer. Whether it
-is also \emph{more accurate} is an empirical question we do not answer here. Adopters should
-evaluate on their own labeled data; see Section~\ref{sec:limitations-eval}.
+We report no \emph{Zen-measured} benchmark numbers. The numbers we do quote are the upstream
+Qwen team's, attributed to the Qwen3Guard technical report~\cite{qwen3guard}: on English
+\emph{prompt} classification the 8B generative model attains an average F1 of \textbf{90.0}
+across ToxicChat, OpenAI Moderation, Aegis, Aegis 2.0, SimpleSafetyTests, HarmBench, and
+WildGuardTest, and on English \emph{response} classification an average F1 of \textbf{83.9}
+across HarmBench, SafeRLHF, BeaverTails, XSTest, Aegis 2.0, WildGuardTest, and a reasoning
+(``Think'') benchmark. These describe the upstream base, not the Zen packaging. The
+classification-accuracy, explanation-MOS, policy-citation, and adversarial-robustness tables
+that appeared in earlier revisions (e.g. ToxiGen 99.1\%, HatEval 97.4\%, composite MOS 4.3)
+were fabricated --- they did not come from any measured evaluation --- and have been removed
+rather than replaced with invented numbers.
+
+A claim worth keeping qualitatively, without a Zen-measured number attached: a
+\emph{generative} safety classifier that must produce an explanation can be more transparent
+and auditable than an opaque binary classifier, because the rationale is inspectable by a human
+reviewer. Whether the Zen packaging preserves the upstream's accuracy on a given operator's
+distribution is an empirical question we do not answer here. Adopters should evaluate on their
+own labeled data; see Section~\ref{sec:limitations-eval}.
 
 \subsection{Recommended Evaluation Before Deployment}
 \label{sec:limitations-eval}
@@ -235,8 +260,8 @@ \subsection{Pipeline Integration}
 
 \subsection{Inference Cost}
 
-Because Zen-Guard-Gen is a 7.61B-parameter generative model, its serving cost is that of an
-8B-class LLM and is dominated by the number of output tokens: a verdict-only response is
+Because Zen-Guard-Gen is an $\approx$8.2B-parameter generative model, its serving cost is that
+of an 8B-class LLM and is dominated by the number of output tokens: a verdict-only response is
 cheap, while a full explanation plus remediation generates many more tokens and is
 correspondingly slower. Concrete throughput and latency depend entirely on the operator's
 hardware, batching, and quantization, so we do not publish the specific FP8/H100 figures that
@@ -244,18 +269,18 @@ \subsection{Inference Cost}
 
 \section{Related Work}
 
-Content safety classification has been addressed through fine-tuned BERT-class models \cite{perspective}, LLM-based guardrails such as Llama Guard \cite{llmguard}, and rule-based systems \cite{cld}. Explanation generation for classification decisions has been studied under the framing of rationale extraction \cite{rationale} and chain-of-thought safety reasoning \cite{cot_safety}. Zen-Guard-Gen sits in the LLM-guardrail line: it is a Qwen2.5-7B base~\cite{qwen25} fine-tuned to produce a verdict together with an explanation, a policy reference, and a remediation suggestion.
+Content safety classification has been addressed through fine-tuned BERT-class models \cite{perspective}, LLM-based guardrails such as Llama Guard \cite{llmguard}, and rule-based systems \cite{cld}. Explanation generation for classification decisions has been studied under the framing of rationale extraction \cite{rationale} and chain-of-thought safety reasoning \cite{cot_safety}. Qwen3Guard~\cite{qwen3guard} is itself a recent entry in the LLM-guardrail line, offering generative and streaming variants with three-tier severity over 119 languages. Zen-Guard-Gen sits on top of the Qwen3Guard-Gen-8B base~\cite{qwen3guard}, packaging its verdict together with an explanation, a policy reference, and a remediation suggestion.
 
 \section{Conclusion}
 
-Zen-Guard-Gen is a generative safety classifier built by fine-tuning Alibaba's Apache-2.0 Qwen2.5-7B base model~\cite{qwen25}; it is not a from-scratch model and uses no ``Zen MoDE'' architecture. Its premise is that a safety classifier which must \emph{explain} its verdict (with a policy reference and a remediation suggestion) yields more auditable, contestable decisions than an opaque binary label. We deliberately make no benchmark claims: the upstream base has no published safety metrics, and we have not run a rigorous evaluation of the fine-tune, so the inflated ToxiGen/HatEval/MOS/robustness figures of earlier revisions have been removed as fabrications. Operators should validate the model on their own content distribution before relying on it.
+Zen-Guard-Gen is a generative safety classifier built on Alibaba's Apache-2.0 Qwen3Guard-Gen-8B~\cite{qwen3guard} (itself a safety fine-tune of Qwen3-8B); it is not a from-scratch model and uses no ``Zen MoDE'' architecture --- it is a redistribution of a purpose-built guardrail model with explanation/policy/remediation packaging. Its premise is that a safety classifier which must \emph{explain} its verdict (with a policy reference and a remediation suggestion) yields more auditable, contestable decisions than an opaque binary label. Any quantitative result we cite is the upstream Qwen team's, attributed to the Qwen3Guard report; the inflated ToxiGen/HatEval/MOS/robustness figures of earlier revisions have been removed as fabrications, and Zen has not run an independent evaluation of its packaging. Operators should validate the model on their own content distribution before relying on it.
 
 \section*{Attribution}
 
-The base weights, training, license, and capabilities are Alibaba's Qwen2.5-7B (\texttt{Qwen/Qwen2.5-7B}, Apache-2.0); Zen contributes a safety-instruction fine-tune and packaging. We thank the Qwen team for releasing the base model openly.
+The base weights, training data, license, and safety capability are Alibaba's Qwen3Guard-Gen-8B (\texttt{Qwen/Qwen3Guard-Gen-8B}, Apache-2.0), itself a safety fine-tune of Qwen3-8B; Zen contributes the explanation/policy/remediation packaging. We thank the Qwen team for releasing the base model openly under a permissive license.
 
 \begin{thebibliography}{9}
-\bibitem{qwen25} Qwen Team, Alibaba (2024). Qwen2.5 Technical Report. arXiv:2412.15115. Base model: \texttt{Qwen/Qwen2.5-7B} (Apache-2.0).
+\bibitem{qwen3guard} Qwen Team, Alibaba (2025). Qwen3Guard Technical Report. arXiv:2510.14276. Base model: \texttt{Qwen/Qwen3Guard-Gen-8B} (Apache-2.0), a safety fine-tune of Qwen3-8B.
 \bibitem{perspective} Lees, A. et al. (2022). A New Generation of Perspective API: Efficient Multilingual Character-level Transformers. arXiv:2202.11176.
 \bibitem{llmguard} Inan, H. et al. (2023). Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. arXiv:2312.06674.
 \bibitem{cld} Waseem, Z. et al. (2016). Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter. NAACL 2016.
diff --git a/zen-guard-stream_whitepaper.pdf b/zen-guard-stream_whitepaper.pdf
index c03217c..7135a99 100644
Binary files a/zen-guard-stream_whitepaper.pdf and b/zen-guard-stream_whitepaper.pdf differ
diff --git a/zen-guard-stream_whitepaper.tex b/zen-guard-stream_whitepaper.tex
index d5d841a..3d0f2f3 100644
--- a/zen-guard-stream_whitepaper.tex
+++ b/zen-guard-stream_whitepaper.tex
@@ -14,7 +14,7 @@
 \hypersetup{colorlinks=true,linkcolor=zenblue,urlcolor=zenblue,citecolor=zenblue}
 
 \title{\textbf{Zen-Guard-Stream: A Streaming Safety Classifier\\
-Fine-Tuned from Qwen2.5-3B}\\[0.5em]
+Built on Qwen3Guard-Stream-0.6B}\\[0.5em]
 \large Technical Whitepaper v2025.05}
 \author{Zach Kelling \\ Zen LM Research Team\\
 \texttt{research@zenlm.org}\\
@@ -25,21 +25,26 @@
 \maketitle
 
 \begin{abstract}
-Zen-Guard-Stream is a streaming safety classifier built by fine-tuning Alibaba's
-\textbf{Qwen2.5-3B} base model~\cite{qwen25}. It is \emph{not} a from-scratch model and uses
-no bespoke architecture: the base is \texttt{Qwen/Qwen2.5-3B}, a dense decoder-only
-transformer (\texttt{Qwen2ForCausalLM}; 3.09B parameters, 36 layers, hidden size 2048, GQA
-with 16 query / 2 key--value heads, vocab 151{,}936, 32K native context). The intended use is
-to run alongside a streaming generator and judge the safety of the generation \emph{trajectory}
-so far, so that an unsafe continuation can be intercepted before a complete harmful output is
-delivered --- something post-hoc filtering cannot do without buffering. \textbf{License
-caveat:} unlike most Qwen2.5 sizes, Qwen2.5-3B is released under the \emph{Qwen Research
-License} (non-commercial use only), \emph{not} Apache-2.0; any commercial deployment of a
-Qwen2.5-3B derivative requires a separate license from Alibaba. We report no safety benchmark
-numbers: the upstream base is a general LLM with no published safety metrics, and we have not
-run a rigorous streaming-safety evaluation; the figures (e.g. ``ToxiGen 98.7\%,'' ``1.7\,ms
-overhead,'' ``0.8\% FPR'') and the ``1.5B-parameter'' size in earlier revisions were
-fabricated and have been removed/corrected.
+Zen-Guard-Stream is a streaming safety classifier built on Alibaba's
+\textbf{Qwen3Guard-Stream-0.6B}~\cite{qwen3guard}, a purpose-built token-level guardrail model.
+It is \emph{not} a from-scratch model and uses no bespoke architecture: the base is
+\texttt{Qwen/Qwen3Guard-Stream-0.6B}, a safety variant of Qwen3-0.6B (a dense decoder-only
+transformer, \texttt{Qwen3ForCausalLM}; $\approx$0.6B parameters, 28 layers, hidden size 1024,
+GQA with 16 query / 8 key--value heads, head dimension 128, vocab 151{,}936) carrying two
+lightweight classification heads attached to the final transformer layer. Unlike a
+general-purpose LLM, Qwen3Guard-Stream is engineered for exactly this job: it consumes a
+response token by token \emph{as it is generated} and emits a safety verdict at each step over
+three severity tiers --- \textbf{safe}, \textbf{controversial}, and \textbf{unsafe} --- across
+119 languages and dialects~\cite{qwen3guard}, so that an unsafe continuation can be intercepted
+before a complete harmful output is delivered, which post-hoc filtering cannot do without
+buffering. \textbf{License note (the reason for this rebase):} the previous base, Qwen2.5-3B,
+was released under the non-commercial \emph{Qwen Research License}, which barred commercial
+deployment of a derivative without a separate license from Alibaba. Rebasing onto
+Qwen3Guard-Stream-0.6B --- which Alibaba ships under \textbf{Apache-2.0} --- resolves that
+license issue: the streaming guard is now fully open for commercial use. We report no
+\emph{Zen-measured} benchmark numbers; the fabricated figures (e.g. ``ToxiGen 98.7\%,''
+``1.7\,ms overhead,'' ``0.8\% FPR'') and the ``1.5B-parameter'' size in earlier revisions have
+been removed/corrected.
 \end{abstract}
 
 \tableofcontents
@@ -64,59 +69,81 @@ \subsection{Model Overview}
 \begin{table}[H]
 \centering
 \caption{Zen-Guard-Stream specification. Base-model facts are those of the upstream
-Qwen2.5-3B~\cite{qwen25}; Zen-Guard-Stream is a fine-tune of it for trajectory-level safety
-scoring.}
+Qwen3Guard-Stream-0.6B / Qwen3-0.6B~\cite{qwen3guard}; Zen-Guard-Stream packages it for in-loop
+streaming intervention.}
 \begin{tabular}{ll}
 \toprule
 \textbf{Parameter} & \textbf{Value} \\
 \midrule
-Base model & Qwen2.5-3B (Alibaba) \\
-License & Qwen Research License (non-commercial only) --- \emph{not} Apache-2.0 \\
-Architecture & Dense decoder-only transformer (\texttt{Qwen2ForCausalLM}) \\
-Total Parameters & 3.09B \\
-Layers / hidden size & 36 / 2048 \\
-Attention heads (Q / KV, GQA) & 16 / 2 \\
+Base model & Qwen3Guard-Stream-0.6B (Alibaba) \\
+License & Apache-2.0 (rebase from non-commercial Qwen2.5-3B) \\
+Underlying base & Qwen3-0.6B, dense decoder-only (\texttt{Qwen3ForCausalLM}) \\
+Total Parameters & $\approx$0.6B \\
+Layers / hidden size & 28 / 1024 \\
+Attention heads (Q / KV, GQA) & 16 / 8 \\
+Head dimension & 128 \\
 Vocabulary & 151{,}936 \\
-Native context length & 32{,}768 (32K) \\
-Integration mode & Trajectory scoring alongside a streaming generator \\
+Classification heads & two token-level heads on the final transformer layer \\
+Severity tiers & safe / controversial / unsafe \\
+Languages & 119 languages and dialects \\
+Integration mode & Per-token streaming moderation alongside a generator \\
 Version & v2025.05 \\
 \bottomrule
 \end{tabular}
 \end{table}
 
-Safety benchmark accuracies, false-positive rates, and per-step overheads are deliberately
-omitted (see abstract); the values in earlier revisions were not measured.
+Zen reports no measured safety benchmark accuracies, false-positive rates, or per-step
+overheads (see abstract); the values in earlier revisions were not measured. The
+rebase from the non-commercial Qwen2.5-3B base onto the Apache-2.0 Qwen3Guard-Stream-0.6B
+removes the prior commercial-use restriction.
 
 \section{Architecture}
 
 \subsection{Streaming Safety Formulation}
+\label{sec:formulation}
 
-Zen-Guard-Stream models streaming safety as a sequential decision problem. At each generation step $t$, given the prompt $x$ and the generated sequence so far $y_{<t}$, the model predicts the safety of continuing generation along the current trajectory:
+Zen-Guard-Stream models streaming safety as a sequential decision problem. The upstream
+Qwen3Guard-Stream head emits a three-tier verdict $v_t \in \{\text{safe},
+\text{controversial}, \text{unsafe}\}$ at each generation step~\cite{qwen3guard}. For
+intervention we reduce this to a binary control signal: at each step $t$, given the prompt $x$
+and the generated sequence so far $y_{<t}$,
 
 \begin{equation}
-  s_t = f_\theta(x, y_{<t}) \in \{0, 1\}
+  s_t = f_\theta(x, y_{<t}) \in \{0, 1\},
 \end{equation}
 
-where $s_t = 1$ indicates a safe trajectory and $s_t = 0$ triggers an intervention. The model does not classify individual tokens in isolation; it classifies the generation \emph{trajectory} up to and including position $t$, which enables detection of multi-token harmful patterns.
+where $s_t = 1$ (the \emph{safe}, and optionally \emph{controversial}, tiers) indicates a safe
+trajectory and $s_t = 0$ triggers an intervention. Because the upstream model attends over the
+full prompt-plus-response context at each step, the verdict reflects multi-token harmful
+patterns rather than isolated tokens.
 
-\subsection{Causal Classifier Architecture}
+\subsection{Token-Level Classifier Architecture}
 
-Zen-Guard-Stream is the Qwen2.5-3B causal transformer~\cite{qwen25} with a safety
-classification head, run alongside the main generator:
+Zen-Guard-Stream is the Qwen3Guard-Stream-0.6B token-level guardrail~\cite{qwen3guard}, run
+alongside the main generator:
 
 \begin{itemize}
-  \item 3.09B parameters in a 36-layer causal transformer (hidden size 2048).
-  \item Grouped-query attention (16 query / 2 key--value heads), vocabulary 151{,}936.
-  \item A safety head on the final-layer hidden state: $\hat{s}_t = \sigma(w \cdot h_t^{(36)})$,
-        i.e. a binary safe/unsafe probability for the trajectory through step $t$.
-  \item Intended to run on a separate CUDA stream from the main generator so the two overlap.
+  \item $\approx$0.6B parameters in a 28-layer causal transformer (hidden size 1024),
+        the Qwen3-0.6B configuration.
+  \item Grouped-query attention (16 query / 8 key--value heads), head dimension 128,
+        vocabulary 151{,}936.
+  \item \emph{Two} lightweight classification heads attached to the final transformer layer:
+        one scores the input prompt (nine categories, including jailbreak), the other scores
+        the response as it is generated (eight categories)~\cite{qwen3guard}. Each head emits a
+        three-tier verdict (safe / controversial / unsafe) at every step rather than a single
+        binary trajectory score.
+  \item Designed to receive the response token by token and classify each new token as it arrives,
+        so it can run interleaved with --- or on a separate stream from --- the main generator.
 \end{itemize}
 
-Running the classifier concurrently with the main generator is a design choice to limit the
-marginal latency impact; we do not claim a specific per-step overhead, as it depends on the
-hardware, the generator size, and how much computation actually overlaps. The earlier
-revision's ``1.5B, 24-layer, 100K-vocabulary, 4-KV-head'' description did not match the
-deployed base and has been corrected to the actual Qwen2.5-3B configuration.
+For the deployment in Section~\ref{sec:formulation} we reduce the upstream per-step tier
+verdict to the binary safe/intervene signal $s_t$ via a tier threshold; this is a
+packaging choice on top of the upstream head, not a change to the model. Running the small
+0.6B classifier concurrently with the main generator limits the marginal latency impact, but
+we do not claim a specific per-step overhead, as it depends on the hardware, the generator
+size, and how much computation actually overlaps. The earlier revision's ``1.5B, 24-layer,
+100K-vocabulary, 4-KV-head'' description did not match any deployed base and has been corrected
+to the actual Qwen3Guard-Stream-0.6B / Qwen3-0.6B configuration.
 
 \subsection{Intervention Protocol}
 
@@ -134,7 +161,7 @@ \subsection{Intervention Protocol}
 \subsection{Context Window}
 
 The classifier sees the recent generation context (prompt plus generated tokens) within the
-base model's 32K native context window~\cite{qwen25}, which is more than enough to capture
+base model's native context window~\cite{qwen3guard}, which is more than enough to capture
 multi-turn elicitation patterns in a typical conversation. The ``learned context compression
 module'' producing a $\text{Compress}_\phi(\cdot)$ summary embedding, described in earlier
 revisions, did not exist in the deployed model and has been removed; the model simply uses its
@@ -144,30 +171,36 @@ \section{Training Methodology}
 
 \subsection{Approach}
 
-Starting from the Qwen2.5-3B base~\cite{qwen25}, Zen-Guard-Stream is fine-tuned to predict a
-binary safe/unsafe label for the generation trajectory. The intended objective weights false
-negatives (missing unsafe trajectories) more heavily than false positives (blocking safe
-creative content),
+The base, Qwen3Guard-Stream-0.6B, is \emph{already} a token-level safety classifier: the Qwen
+team trained its streaming heads on over 1.19M human-annotated and synthetically generated
+safety samples (shared across the Qwen3Guard series) to emit per-step safe / controversial /
+unsafe verdicts during incremental generation~\cite{qwen3guard}. The upstream streaming-safety
+capability therefore comes from Alibaba's training run, not from Zen. On top of it, the Zen
+packaging reduces the per-step tier verdict to a binary safe/intervene control signal, biased
+toward recall on genuinely harmful continuations,
 
 \begin{equation}
   \mathcal{L} = w_{\text{FN}} \mathcal{L}_{\text{FN}} + w_{\text{FP}} \mathcal{L}_{\text{FP}} + \lambda \mathcal{L}_{\text{calibration}},
 \end{equation}
 
-so that recall on genuinely harmful trajectories is prioritized. Training data is constructed
-as generation \emph{trajectories} (safe and unsafe) rather than isolated content items, to
-match the streaming deployment setting.
+so that, with $w_{\text{FN}} > w_{\text{FP}}$, false negatives (missing unsafe trajectories) are
+penalized more heavily than false positives (blocking safe creative content).
 
-We do not publish dataset sizes or composition: the specific figures in earlier revisions (a
+We do not publish a Zen training corpus: the specific figures in earlier revisions (a
 500M-trajectory corpus with per-source percentages, and a five-round ``continuous adversarial
 training'' curriculum) were fabricated and did not describe a real training run. The honest
-statements are that the base is Alibaba's Qwen2.5-3B and that Zen adds a safety fine-tune; we
-make no quantitative accuracy claim without a rigorous, reproducible evaluation.
+statements are that the base is Alibaba's Apache-2.0 Qwen3Guard-Stream-0.6B and that Zen adds
+threshold/intervention packaging; we make no quantitative accuracy claim without a rigorous,
+reproducible evaluation.
 
 \section{Evaluation}
 
-We intentionally report no benchmark numbers. The upstream Qwen2.5-3B is a general-purpose
-LLM with no published safety metrics~\cite{qwen25}, and we have not run a rigorous,
-reproducible streaming-safety evaluation of the Zen fine-tune. The classification-accuracy,
+We report no \emph{Zen-measured} benchmark numbers. The upstream Qwen3Guard technical
+report~\cite{qwen3guard} does publish safety metrics for the Qwen3Guard series, and notes that
+even its smallest (0.6B) variants rival or exceed much larger guardrail models. However, the
+report's headline F1 figures are for the \emph{generative} 8B model, so we do not restate them
+here as if they described the streaming 0.6B model or the Zen packaging. We have not run a
+rigorous, reproducible streaming-safety evaluation of our own. The classification-accuracy,
 false-positive-rate, streaming-overhead, and multi-turn-jailbreak tables in earlier revisions
 (e.g. ToxiGen 98.7\%, 0.8\% FPR, 1.7\,ms/step overhead, 90.3\% multi-turn detection) were
 fabricated --- they did not come from any measured evaluation --- and have been removed rather
@@ -234,18 +267,18 @@ \subsection{Deployment Topologies}
 
 \section{Related Work}
 
-Streaming content safety has been addressed through output filtering \cite{perspective}, circuit breaker mechanisms \cite{circuitbreaker}, and classifier guidance \cite{classifierguidance}. Token-level safety has been studied through vocabulary constraints \cite{vocabconstraint} and watermarking \cite{watermark}. Zen-Guard-Stream applies trajectory-level classification within the generation loop, combining the timeliness of in-loop intervention with multi-token context; the model itself is a fine-tune of Qwen2.5-3B~\cite{qwen25}.
+Streaming content safety has been addressed through output filtering~\cite{perspective}, circuit breaker mechanisms~\cite{circuitbreaker}, and classifier guidance~\cite{classifierguidance}. Token-level safety has been studied through vocabulary constraints~\cite{vocabconstraint} and watermarking~\cite{watermark}. Qwen3Guard~\cite{qwen3guard} introduces a dedicated streaming guardrail (Qwen3Guard-Stream) with token-level classification heads that emit a three-tier verdict during incremental generation. Zen-Guard-Stream runs that classifier inside the generation loop and reduces its per-step tier verdict to a binary safe/intervene control signal; the model itself is the Apache-2.0 Qwen3Guard-Stream-0.6B~\cite{qwen3guard}.
 
 \section{Conclusion}
 
-Zen-Guard-Stream is a streaming safety classifier built by fine-tuning Alibaba's Qwen2.5-3B base model~\cite{qwen25}; it is not a from-scratch model and is 3.09B parameters, not the ``1.5B'' of earlier revisions. Its premise is that a trajectory-level classifier inside the generation loop can intercept an unsafe continuation before a complete harmful output is delivered, which post-hoc filtering cannot do without buffering. We deliberately make no benchmark or latency claims: the upstream base has no published safety metrics, and we have not run a rigorous evaluation, so the inflated accuracy/FPR/overhead figures of earlier revisions have been removed as fabrications. \textbf{Adopters must also heed the license:} Qwen2.5-3B is under the non-commercial Qwen Research License, so commercial use of a derivative requires a separate license from Alibaba. Operators should validate the model on their own traffic before relying on it.
+Zen-Guard-Stream is a streaming safety classifier built on Alibaba's Qwen3Guard-Stream-0.6B~\cite{qwen3guard} (itself a token-level safety variant of Qwen3-0.6B). It is not a from-scratch model and has $\approx$0.6B parameters, not the ``1.5B'' of earlier revisions: it is a redistribution of a purpose-built streaming guardrail with threshold/intervention packaging. Its premise is that a token-level classifier inside the generation loop can intercept an unsafe continuation before a complete harmful output is delivered, which post-hoc filtering cannot do without buffering. We deliberately make no \emph{Zen-measured} benchmark or latency claims; the inflated accuracy/FPR/overhead figures of earlier revisions have been removed as fabrications, and the upstream report's headline F1 figures are for the generative 8B model and are not restated here as the 0.6B streaming model's. \textbf{License (the reason for this rebase):} the previous base, Qwen2.5-3B, was under the non-commercial Qwen Research License, which barred commercial use of a derivative. Rebasing onto the \textbf{Apache-2.0} Qwen3Guard-Stream-0.6B removes that restriction: Apache-2.0 permits commercial deployment, subject to its attribution and notice terms. Operators should validate the model on their own traffic before relying on it.
 
 \section*{Attribution and License}
 
-The base weights, training, and capabilities are Alibaba's Qwen2.5-3B (\texttt{Qwen/Qwen2.5-3B}); Zen contributes a safety fine-tune and packaging. Unlike most Qwen2.5 sizes, Qwen2.5-3B is licensed under the \emph{Qwen Research License} (non-commercial only), not Apache-2.0. We thank the Qwen team for releasing the base model.
+The base weights, training data, and token-level safety capability are Alibaba's Qwen3Guard-Stream-0.6B (\texttt{Qwen/Qwen3Guard-Stream-0.6B}), itself a safety variant of Qwen3-0.6B; Zen contributes the threshold/intervention packaging. Qwen3Guard-Stream-0.6B is licensed under \textbf{Apache-2.0}, which is why this model was rebased away from Qwen2.5-3B, whose non-commercial Qwen Research License barred commercial use of derivatives. We thank the Qwen team for releasing the base model openly under a permissive license.
 
 \begin{thebibliography}{9}
-\bibitem{qwen25} Qwen Team, Alibaba (2024). Qwen2.5 Technical Report. arXiv:2412.15115. Base model: \texttt{Qwen/Qwen2.5-3B} (Qwen Research License, non-commercial).
+\bibitem{qwen3guard} Qwen Team, Alibaba (2025). Qwen3Guard Technical Report. arXiv:2510.14276. Base model: \texttt{Qwen/Qwen3Guard-Stream-0.6B} (Apache-2.0), a token-level safety variant of Qwen3-0.6B.
 \bibitem{perspective} Lees, A. et al. (2022). A New Generation of Perspective API. arXiv:2202.11176.
 \bibitem{circuitbreaker} Zou, A. et al. (2024). Improving Alignment and Robustness with Circuit Breakers. arXiv:2406.04313.
 \bibitem{classifierguidance} Dhariwal, P. \& Nichol, A. (2021). Diffusion Models Beat GANs on Image Synthesis. NeurIPS 2021.