diff --git a/docs.json b/docs.json
index 27930c2..a224d34 100644
--- a/docs.json
+++ b/docs.json
@@ -433,6 +433,17 @@
                     "pages": [
                       "/phala-cloud/confidential-ai/confidential-model/confidential-ai-api",
                       "/phala-cloud/confidential-ai/confidential-gpu/model-template",
+                      {
+                        "group": "API Reference",
+                        "pages": [
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/models",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/attestation",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/signature",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models"
+                        ]
+                      },
                       "/phala-cloud/confidential-ai/confidential-model/tool-calling",
                       "/phala-cloud/confidential-ai/confidential-model/images-and-vision",
                       "/phala-cloud/confidential-ai/confidential-model/structured-output",
@@ -2122,4 +2133,4 @@
   "thumbnails": {
     "background": "/images/phala-docs-og.png"
   }
-}
\ No newline at end of file
+}
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/attestation.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/attestation.mdx
new file mode 100644
index 0000000..618dfde
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/attestation.mdx
@@ -0,0 +1,165 @@
+---
+title: Attestation Report
+description: Fetch TEE attestation evidence for a Confidential AI model.
+---
+
+## Endpoint
+
+```bash
+GET https://api.redpill.ai/v1/attestation/report?model={model_id}&nonce={nonce}&signing_address={address}
+```
+
+The attestation report proves a model endpoint is backed by TEE hardware and provides the evidence needed for hardware, software, and signer binding checks.
+
+<Warning>
+Always include a fresh random `nonce` when fetching attestations for security-sensitive verification. A nonce prevents replay of an older valid attestation.
+</Warning>
+
+## Parameters
+
+<ParamField query="model" type="string" required>
+  Model ID to attest.
+
+  Examples: `phala/qwen3.5-27b`, `phala/qwen-2.5-7b-instruct`, `openai/gpt-oss-120b`, `z-ai/glm-5`.
+</ParamField>
+
+<ParamField query="nonce" type="string">
+  Fresh 32-byte random value encoded as 64 hex characters. The nonce is embedded in the TEE report data.
+</ParamField>
+
+<ParamField query="signing_address" type="string">
+  Ethereum address or public key used to filter attestations in multi-server deployments. Use this when binding a response signature to a specific TEE signer.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+NONCE=$(openssl rand -hex 32)
+
+curl "https://api.redpill.ai/v1/attestation/report?model=phala/qwen3.5-27b&nonce=$NONCE" \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```python Python
+import secrets
+import requests
+
+nonce = secrets.token_hex(32)
+
+response = requests.get(
+    "https://api.redpill.ai/v1/attestation/report",
+    params={
+        "model": "phala/qwen3.5-27b",
+        "nonce": nonce,
+    },
+    headers={"Authorization": "Bearer <API_KEY>"},
+)
+
+attestation = response.json()
+```
+</CodeGroup>
+
+## Response Formats
+
+The response format depends on the provider behind the model.
+
+### Phala / NearAI Two-Layer Format
+
+Models may return separate gateway and model attestations:
+
+```json
+{
+  "gateway_attestation": {
+    "signing_address": "0x...",
+    "signing_algo": "ecdsa",
+    "intel_quote": "hex-encoded-tdx-quote",
+    "event_log": [],
+    "report_data": "...",
+    "request_nonce": "...",
+    "info": {
+      "vm_config": "..."
+    }
+  },
+  "model_attestations": [
+    {
+      "model_name": "phala/qwen3.5-27b",
+      "signing_address": "0x...",
+      "signing_algo": "ecdsa",
+      "intel_quote": "hex-encoded-tdx-quote",
+      "nvidia_payload": "{...json gpu attestation...}",
+      "event_log": [],
+      "info": {
+        "tcb_info": "{...app_compose...}",
+        "vm_config": "..."
+      }
+    }
+  ]
+}
+```
+
+### Chutes Format
+
+Some models return Chutes-style instance attestations:
+
+```json
+{
+  "attestation_type": "chutes",
+  "nonce": "...",
+  "all_attestations": [
+    {
+      "instance_id": "uuid",
+      "nonce": "...",
+      "intel_quote": "base64-encoded-tdx-quote",
+      "gpu_evidence": [
+        { "certificate": "...", "evidence": "...", "arch": "HOPPER" }
+      ],
+      "e2e_pubkey": "..."
+    }
+  ]
+}
+```
+
+### Flat Format
+
+Older Phala-native responses may expose fields at the top level:
+
+```json
+{
+  "signing_address": "0x...",
+  "signing_algo": "ecdsa",
+  "request_nonce": "...",
+  "intel_quote": "hex-encoded-tdx-quote",
+  "nvidia_payload": "{...}",
+  "info": {
+    "tcb_info": "{\"app_compose\":\"...\"}"
+  }
+}
+```
+
+## Important Fields
+
+| Field | Description |
+|-------|-------------|
+| `signing_address` | Address or key used by the TEE to sign responses |
+| `signing_algo` | Signature algorithm, commonly `ecdsa` |
+| `request_nonce` / `nonce` | Nonce included in the attestation |
+| `intel_quote` | Intel TDX quote for CPU TEE verification |
+| `nvidia_payload` | NVIDIA GPU attestation payload |
+| `event_log` | Boot event log for software stack verification |
+| `info.vm_config` | VM configuration evidence |
+| `info.tcb_info.app_compose` | Docker Compose application evidence |
+| `gateway_attestation` | Gateway TEE attestation |
+| `model_attestations` | One or more model runtime attestations |
+| `all_attestations` | Provider-specific list of model instance attestations |
+
+## Verification Flow
+
+1. Generate a fresh nonce.
+2. Fetch an attestation report for the exact model.
+3. Verify the Intel TDX quote.
+4. Verify GPU evidence when `nvidia_payload` or `gpu_evidence` is present.
+5. Confirm the report data binds the nonce and expected signing address.
+6. Verify application measurements such as compose hash and image provenance when available.
+
+For a walkthrough, see [Verify Attestation](/phala-cloud/confidential-ai/verify/verify-attestation).
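
For `attestation.mdx`, steps 1 and 5 of the verification flow above could be sketched as follows. This is a minimal sketch, not the verifier's actual logic: the helper names (`new_nonce`, `echoed_nonces`, `nonce_matches`) are hypothetical, and the field locations are taken only from the three example responses in that file. Checking the echoed nonce is a quick sanity check; it does not replace verifying that the TDX quote's report data binds the nonce.

```python
import secrets


def new_nonce() -> str:
    """Return a fresh 32-byte nonce as 64 hex characters."""
    return secrets.token_hex(32)


def echoed_nonces(report: dict) -> list:
    """Collect every nonce field found in the documented response shapes."""
    found = []
    # Flat format uses `request_nonce`; Chutes format uses a top-level `nonce`.
    for key in ("request_nonce", "nonce"):
        if key in report:
            found.append(report[key])
    # Two-layer format: the gateway attestation carries `request_nonce`.
    gateway = report.get("gateway_attestation") or {}
    if "request_nonce" in gateway:
        found.append(gateway["request_nonce"])
    # Chutes format: each instance attestation repeats the nonce.
    for item in report.get("all_attestations") or []:
        if "nonce" in item:
            found.append(item["nonce"])
    return found


def nonce_matches(report: dict, nonce: str) -> bool:
    """True only if at least one nonce was echoed and all of them equal ours."""
    found = echoed_nonces(report)
    return bool(found) and all(n == nonce for n in found)
```

A caller would generate the nonce, pass it as the `nonce` query parameter, and reject the report when `nonce_matches(report, nonce)` is false.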
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions.mdx
new file mode 100644
index 0000000..c910d00
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions.mdx
@@ -0,0 +1,157 @@
+---
+title: Chat Completions
+description: Create OpenAI-compatible chat completion responses with Confidential AI models.
+---
+
+## Endpoint
+
+```bash
+POST https://api.redpill.ai/v1/chat/completions
+```
+
+Creates a response for a chat conversation. Use the same OpenAI-compatible request shape you already use with the OpenAI SDK, then set the base URL to `https://api.redpill.ai/v1`.
+
+## Request Body
+
+<ParamField body="model" type="string" required>
+  Model ID to use for completion.
+
+  Examples: `phala/qwen3.5-27b`, `phala/gemma-3-27b-it`, `z-ai/glm-5`, `openai/gpt-oss-120b`.
+</ParamField>
+
+<ParamField body="messages" type="array" required>
+  Conversation messages. Each message includes `role` and `content`.
+
+  ```json
+  [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Explain GPU TEE in one paragraph."}
+  ]
+  ```
+</ParamField>
+
+<ParamField body="temperature" type="number">
+  Sampling temperature. Typical range is `0` to `2`.
+</ParamField>
+
+<ParamField body="max_tokens" type="integer">
+  Maximum number of output tokens for most open models and GPU TEE models.
+</ParamField>
+
+<ParamField body="max_completion_tokens" type="integer">
+  Maximum output tokens for newer OpenAI reasoning models that do not accept `max_tokens`.
+</ParamField>
+
+<ParamField body="stream" type="boolean">
+  Set to `true` to receive server-sent event chunks.
+</ParamField>
+
+<ParamField body="tools" type="array">
+  Function/tool definitions that supported models can call.
+</ParamField>
+
+<ParamField body="tool_choice" type="string | object">
+  Controls whether the model may call tools. Common values are `auto`, `none`, or a specific tool selection object.
+</ParamField>
+
+<ParamField body="response_format" type="object">
+  Requests structured output from supported models, including JSON schema mode.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+curl https://api.redpill.ai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <API_KEY>" \
+  -d '{
+    "model": "phala/qwen3.5-27b",
+    "messages": [
+      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
+    ]
+  }'
+```
+
+```python Python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
+
+response = client.chat.completions.create(
+    model="phala/qwen3.5-27b",
+    messages=[
+        {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
+    ],
+)
+
+print(response.choices[0].message.content)
+```
+
+```typescript TypeScript
+import OpenAI from "openai";
+
+const client = new OpenAI({
+  apiKey: "<API_KEY>",
+  baseURL: "https://api.redpill.ai/v1",
+});
+
+const response = await client.chat.completions.create({
+  model: "phala/qwen3.5-27b",
+  messages: [
+    { role: "user", content: "What privacy guarantees does GPU TEE provide?" },
+  ],
+});
+
+console.log(response.choices[0].message.content);
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "id": "chatcmpl-123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "model": "phala/qwen3.5-27b",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "GPU TEE protects inference by..."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 16,
+    "completion_tokens": 48,
+    "total_tokens": 64
+  }
+}
+```
+
+The `id` field is the request ID. Use it with [Request Signature](/phala-cloud/confidential-ai/confidential-model/api-reference/signature) when you need cryptographic proof for this specific response.
+
+## Feature Notes
+
+- Streaming uses the same `stream: true` option as the OpenAI API.
+- Vision models accept multimodal `content` arrays with `image_url` entries.
+- Tool calling uses OpenAI-compatible `tools`, `tool_choice`, assistant `tool_calls`, and tool response messages.
+- Structured output uses `response_format` on supported models.
+
+## Next Steps
+
+<CardGroup cols={2}>
+  <Card title="List Models" icon="list" href="/phala-cloud/confidential-ai/confidential-model/api-reference/models">
+    Discover available Confidential AI models and capabilities
+  </Card>
+  <Card title="Verify Responses" icon="signature" href="/phala-cloud/confidential-ai/confidential-model/api-reference/signature">
+    Fetch the signature for a chat completion response
+  </Card>
+</CardGroup>
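
For `chat-completions.mdx`, the `stream` parameter is described but not shown. The sketch below illustrates how the server-sent event lines could be reassembled into text, assuming the gateway emits OpenAI-style chunks (`data: {...}` lines with `choices[].delta.content`, terminated by `data: [DONE]`). The function name `collect_stream_text` is hypothetical, and the sketch assumes a single choice per request.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the `delta.content` pieces from OpenAI-style SSE lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel used by the OpenAI API
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

With the OpenAI SDK this parsing is handled for you; the sketch is mainly useful when reading the raw HTTP response line by line.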
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models.mdx
new file mode 100644
index 0000000..e6bc521
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models.mdx
@@ -0,0 +1,83 @@
+---
+title: List Embedding Models
+description: List embedding-capable models available through the Confidential AI API.
+---
+
+## Endpoint
+
+```bash
+GET https://api.redpill.ai/v1/embeddings/models
+```
+
+Returns models designed for vector embeddings. Use this endpoint when selecting embedding models for retrieval, RAG, clustering, or similarity search.
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+curl https://api.redpill.ai/v1/embeddings/models \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```python Python
+import requests
+
+response = requests.get(
+    "https://api.redpill.ai/v1/embeddings/models",
+    headers={"Authorization": "Bearer <API_KEY>"},
+)
+
+for model in response.json()["data"]:
+    print(model["id"])
+```
+
+```typescript TypeScript
+const response = await fetch("https://api.redpill.ai/v1/embeddings/models", {
+  headers: {
+    Authorization: "Bearer <API_KEY>",
+  },
+});
+
+const models = await response.json();
+for (const model of models.data) {
+  console.log(model.id);
+}
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "object": "list",
+  "data": [
+    {
+      "id": "qwen/qwen3-embedding-8b",
+      "name": "Qwen3 Embedding 8B",
+      "created": 1704067200,
+      "input_modalities": ["text"],
+      "output_modalities": ["embeddings"],
+      "context_length": 32768,
+      "max_output_length": 4096,
+      "pricing": {
+        "prompt": "0.00000001",
+        "completion": "0"
+      },
+      "description": "Embedding model for semantic search and retrieval"
+    }
+  ]
+}
+```
+
+## Response Fields
+
+| Field | Description |
+|-------|-------------|
+| `id` | Model identifier for `POST /v1/embeddings` |
+| `name` | Human-readable model name |
+| `input_modalities` | Input types accepted by the model |
+| `output_modalities` | Output types produced by the model |
+| `context_length` | Maximum input context length |
+| `max_output_length` | Embedding dimensions or max output size |
+| `pricing.prompt` | Input price per token |
+| `description` | Model description and use case |
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings.mdx
new file mode 100644
index 0000000..9ae06fe
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings.mdx
@@ -0,0 +1,108 @@
+---
+title: Embeddings
+description: Create vector embeddings with OpenAI-compatible embedding models.
+---
+
+## Endpoint
+
+```bash
+POST https://api.redpill.ai/v1/embeddings
+```
+
+Generate vector embeddings for retrieval, semantic search, clustering, and similarity workloads.
+
+## Request Body
+
+<ParamField body="model" type="string" required>
+  Embedding model ID.
+
+  Examples: `qwen/qwen3-embedding-8b`, `sentence-transformers/all-minilm-l6-v2`.
+</ParamField>
+
+<ParamField body="input" type="string | array" required>
+  Input text or list of inputs to embed.
+</ParamField>
+
+<ParamField body="encoding_format" type="string">
+  Embedding encoding format. Common values are `float` and `base64`.
+</ParamField>
+
+<ParamField body="dimensions" type="integer">
+  Requested output dimensions, when supported by the selected model.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+curl https://api.redpill.ai/v1/embeddings \
+  -H "Authorization: Bearer <API_KEY>" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "qwen/qwen3-embedding-8b",
+    "input": "Confidential AI keeps inference data private."
+  }'
+```
+
+```python Python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
+
+response = client.embeddings.create(
+    model="qwen/qwen3-embedding-8b",
+    input="Confidential AI keeps inference data private.",
+)
+
+vector = response.data[0].embedding
+print(len(vector))
+```
+
+```typescript TypeScript
+import OpenAI from "openai";
+
+const client = new OpenAI({
+  apiKey: "<API_KEY>",
+  baseURL: "https://api.redpill.ai/v1",
+});
+
+const response = await client.embeddings.create({
+  model: "qwen/qwen3-embedding-8b",
+  input: "Confidential AI keeps inference data private.",
+});
+
+console.log(response.data[0].embedding.length);
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "object": "list",
+  "data": [
+    {
+      "object": "embedding",
+      "index": 0,
+      "embedding": [0.0023, -0.0015, 0.0042]
+    }
+  ],
+  "model": "qwen/qwen3-embedding-8b",
+  "usage": {
+    "prompt_tokens": 8,
+    "total_tokens": 8
+  }
+}
+```
+
+## Common Models
+
+| Model | Dimensions | Context | Notes |
+|-------|------------|---------|-------|
+| `qwen/qwen3-embedding-8b` | 4096 | 32K | Large confidential embedding model |
+| `sentence-transformers/all-minilm-l6-v2` | 384 | 512 | Low-cost compact embedding model |
+
+Use [List Embedding Models](/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models) for the live embedding catalog.
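
For `embeddings.mdx`, the similarity-search use case mentioned above could be illustrated with a plain cosine similarity over the returned `embedding` arrays. This is a minimal sketch using only the standard library; `cosine_similarity` and `rank_by_similarity` are hypothetical helper names, and a real deployment would more likely use a vector database or NumPy.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[int]:
    """Return candidate indices ordered from most to least similar."""
    return sorted(
        range(len(candidates)),
        key=lambda i: cosine_similarity(query, candidates[i]),
        reverse=True,
    )
```

Vectors being compared should come from the same model and the same `dimensions` setting; the length check above catches the most obvious mismatch.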
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/models.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/models.mdx
new file mode 100644
index 0000000..a5b75f6
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/models.mdx
@@ -0,0 +1,98 @@
+---
+title: List Models
+description: List available Confidential AI models, providers, modalities, context windows, and pricing metadata.
+---
+
+## Endpoints
+
+```bash
+GET https://api.redpill.ai/v1/models
+GET https://api.redpill.ai/v1/models/phala
+```
+
+Use the live model catalog before hardcoding model IDs. The catalog returns model IDs, context windows, pricing, providers, modalities, and TEE metadata when available.
+
+## Examples
+
+<CodeGroup>
+```bash All Models
+curl https://api.redpill.ai/v1/models \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```bash Phala Models
+curl https://api.redpill.ai/v1/models/phala \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```python Python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
+
+models = client.models.list()
+for model in models.data:
+    print(model.id)
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "data": [
+    {
+      "id": "phala/qwen3.5-27b",
+      "name": "Qwen3.5 27B",
+      "created": 1677652288,
+      "description": "Qwen model running through Phala GPU TEE infrastructure",
+      "context_length": 262144,
+      "pricing": {
+        "prompt": "0.00000030",
+        "completion": "0.00000240"
+      },
+      "providers": ["phala"],
+      "metadata": {
+        "tee": true,
+        "appid": "..."
+      },
+      "architecture": {
+        "modality": "text->text",
+        "input_modalities": ["text"],
+        "output_modalities": ["text"]
+      }
+    }
+  ]
+}
+```
+
+## Model Object Fields
+
+| Field | Description |
+|-------|-------------|
+| `id` | Model identifier for API calls |
+| `name` | Human-readable model name |
+| `description` | Model or provider description |
+| `context_length` | Maximum context window |
+| `pricing.prompt` | Input token price per token; multiply by 1,000,000 for per-million-token pricing |
+| `pricing.completion` | Output token price per token; multiply by 1,000,000 for per-million-token pricing |
+| `providers` | Infrastructure providers such as `phala`, `near-ai`, `tinfoil`, or `chutes` |
+| `metadata.tee` | Whether the model is marked as a TEE model |
+| `metadata.appid` | Present when the model supports the attestation flow |
+| `architecture.input_modalities` | Supported input types, such as `text` or `image` |
+| `architecture.output_modalities` | Supported output types, such as `text` or `embeddings` |
+
+## Find Verifiable TEE Models
+
+Filter for models that expose TEE provider metadata:
+
+```bash
+curl https://api.redpill.ai/v1/models \
+  -H "Authorization: Bearer <API_KEY>" | \
+  jq '.data[] | select(.metadata.tee == true or any(.providers[]?; test("phala|near-ai|tinfoil|chutes"))) | {id, providers, appid: .metadata.appid}'
+```
+
+For production verification, test [Attestation Report](/phala-cloud/confidential-ai/confidential-model/api-reference/attestation) with the exact model ID before relying on it in your application.
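
For `models.mdx`, the `jq` filter and the per-million pricing rule above could also be expressed in Python. The sketch below is an approximation under stated assumptions: the helper names are hypothetical, and it matches provider names exactly rather than by the substring match that `test(...)` performs in `jq`. `Decimal` is used because prices arrive as strings and float rounding is unwelcome in pricing.

```python
from decimal import Decimal

# Provider names taken from the `providers` field description above.
TEE_PROVIDERS = {"phala", "near-ai", "tinfoil", "chutes"}


def is_tee_candidate(model: dict) -> bool:
    """True if the model is flagged as TEE or served by a listed provider."""
    metadata = model.get("metadata") or {}
    providers = model.get("providers") or []
    return metadata.get("tee") is True or any(p in TEE_PROVIDERS for p in providers)


def price_per_million(per_token: str) -> Decimal:
    """Convert a per-token price string to a per-million-token price."""
    return Decimal(per_token) * 1_000_000


def summarize_tee_models(catalog: dict) -> list:
    """Reduce a `/v1/models` response to the fields used for routing."""
    rows = []
    for model in catalog.get("data", []):
        if not is_tee_candidate(model):
            continue
        pricing = model.get("pricing") or {}
        rows.append({
            "id": model["id"],
            "providers": model.get("providers") or [],
            "appid": (model.get("metadata") or {}).get("appid"),
            "input_per_1m": price_per_million(pricing.get("prompt", "0")),
            "output_per_1m": price_per_million(pricing.get("completion", "0")),
        })
    return rows
```

A model passing this filter is only a candidate; the attestation check on the exact model ID remains the deciding test.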
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/signature.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/signature.mdx
new file mode 100644
index 0000000..c02bbb0
--- /dev/null
+++ b/phala-cloud/confidential-ai/confidential-model/api-reference/signature.mdx
@@ -0,0 +1,109 @@
+---
+title: Request Signature
+description: Fetch a cryptographic signature for a Confidential AI response.
+---
+
+## Endpoint
+
+```bash
+GET https://api.redpill.ai/v1/signature/{request_id}?model={model}&signing_algo={algo}
+```
+
+Use this endpoint after a chat completion request. The signature proves a specific response was signed by a TEE key. Bind that key to fresh attestation evidence before treating the response as fully verified.
+
+## Parameters
+
+<ParamField path="request_id" type="string" required>
+  The `id` returned by `POST /v1/chat/completions`.
+</ParamField>
+
+<ParamField query="model" type="string" required>
+  The model ID used for the original request.
+</ParamField>
+
+<ParamField query="signing_algo" type="string">
+  Signature algorithm. Common values include `ecdsa`, `ecdsa-p256`, and `rsa`; use the algorithm supported by the model response.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+RESPONSE=$(curl -s https://api.redpill.ai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <API_KEY>" \
+  -d '{"model":"phala/qwen3.5-27b","messages":[{"role":"user","content":"hello"}]}')
+
+REQUEST_ID=$(echo "$RESPONSE" | jq -r '.id')
+
+curl "https://api.redpill.ai/v1/signature/$REQUEST_ID?model=phala/qwen3.5-27b" \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```python Python
+import requests
+
+chat_response = requests.post(
+    "https://api.redpill.ai/v1/chat/completions",
+    headers={
+        "Authorization": "Bearer <API_KEY>",
+        "Content-Type": "application/json",
+    },
+    json={
+        "model": "phala/qwen3.5-27b",
+        "messages": [{"role": "user", "content": "hello"}],
+    },
+)
+
+request_id = chat_response.json()["id"]
+
+signature_response = requests.get(
+    f"https://api.redpill.ai/v1/signature/{request_id}",
+    params={"model": "phala/qwen3.5-27b"},
+    headers={"Authorization": "Bearer <API_KEY>"},
+)
+
+signature_data = signature_response.json()
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "text": "phala/qwen3.5-27b:116478638341bd2b...:3d0b2a2df73dc93a...",
+  "signature": "0xee817b30e13ec3c320997ec37076a600e194dc64...",
+  "signing_address": "0x56d070df1c6be444b007839ef9cf67cec7c12b8b",
+  "signing_algo": "ecdsa"
+}
+```
+
+## Response Fields
+
+| Field | Description |
+|-------|-------------|
+| `text` | Signed text. Format is either `request_hash:response_hash` or `model:request_hash:response_hash` |
+| `signature` | Signature over `text` |
+| `signing_address` | TEE signing address or public key |
+| `signing_algo` | Signature algorithm used |
+
+<Note>
+When `text` has three colon-separated parts, the first part is the model name used inside the signing path. It may differ from the alias you sent if the gateway rewrote the model ID internally.
+</Note>
+
+## Bind to Attestation
+
+For production verification, use the returned `signing_address` (exported here as `SIGNING_ADDRESS`) to fetch fresh attestation evidence:
+
+```bash
+NONCE=$(openssl rand -hex 32)
+
+curl "https://api.redpill.ai/v1/attestation/report?model=phala/qwen3.5-27b&nonce=$NONCE&signing_address=$SIGNING_ADDRESS" \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+The response is verified only when:
+
+1. The request and response hashes in `text` match the bytes you sent and received.
+2. The signature is valid for `text`.
+3. The attestation report binds the same `signing_address` to genuine TEE evidence and your fresh nonce.
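
For `signature.mdx`, check 1 in the list above could be sketched as follows. The page does not name the hash function, so this sketch ASSUMES each hash is the lowercase hex SHA-256 digest of the exact request and response bytes; confirm that against the verification guide before relying on it. The helper names are hypothetical. Check 2 (signature validity) needs an ECDSA library and is left out here.

```python
import hashlib
from typing import Optional, Tuple


def split_signed_text(text: str) -> Tuple[Optional[str], str, str]:
    """Split `text` into (model or None, request_hash, response_hash)."""
    # Split from the right so the two hashes are always the last two parts.
    parts = text.rsplit(":", 2)
    if len(parts) == 2:
        return None, parts[0], parts[1]
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError("expected 'request_hash:response_hash' with optional model prefix")


def hashes_match(text: str, request_body: bytes, response_body: bytes) -> bool:
    """Compare the hashes in `text` with digests of the bytes sent and received.

    ASSUMPTION: hashes are hex-encoded SHA-256 digests of the raw bodies.
    """
    _model, request_hash, response_hash = split_signed_text(text)
    return (
        hashlib.sha256(request_body).hexdigest() == request_hash.lower()
        and hashlib.sha256(response_body).hexdigest() == response_hash.lower()
    )
```

Hash the bytes exactly as sent and received; re-serializing the JSON can change whitespace or key order and produce a different digest.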
diff --git a/phala-cloud/confidential-ai/confidential-model/confidential-ai-api.mdx b/phala-cloud/confidential-ai/confidential-model/confidential-ai-api.mdx
index aaf6f38..c1c0459 100644
--- a/phala-cloud/confidential-ai/confidential-model/confidential-ai-api.mdx
+++ b/phala-cloud/confidential-ai/confidential-model/confidential-ai-api.mdx
@@ -24,24 +24,27 @@ Once you get the API Key, you can start making requests to the Confidential AI A
 
 ## Make Your Secure Request
 
-Replace `<API_KEY>` with your actual API key in the examples below. We use DeepSeek V3 0324 model as an example, but you can choose any other available models.
+Replace `<API_KEY>` with your actual API key. The examples below use `phala/qwen3.5-27b`; use [List Models](/phala-cloud/confidential-ai/confidential-model/api-reference/models) to choose a model for your workload.
 
 <CodeGroup>
-```bash Python
+```python Python
 # Install OpenAI SDK: `pip3 install openai`
 
 from openai import OpenAI
 
-client = OpenAI(api_key="<API_KEY>", base_url="https://api.redpill.ai/v1")
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
 
 response = client.chat.completions.create(
-    model="phala/deepseek-chat-v3-0324",
+    model="phala/qwen3.5-27b",
     messages=[
         {"role": "system", "content": "You are a helpful assistant"},
         {"role": "user", "content": "What is your model name?"},
     ],
-    stream=True
 )
+
 print(response.choices[0].message.content)
 ```
 
@@ -49,14 +52,13 @@ print(response.choices[0].message.content)
 import OpenAI from 'openai';
 
 const client = new OpenAI({
-    baseURL: 'https://api.redpill.ai/v1',
-    apiKey: '<API_KEY>',
-  },
+  baseURL: 'https://api.redpill.ai/v1',
+  apiKey: '<API_KEY>',
 });
 
 async function main() {
   const completion = await client.chat.completions.create({
-    model: 'phala/deepseek-chat-v3-0324',
+    model: 'phala/qwen3.5-27b',
     messages: [
       {
         role: 'user',
@@ -87,59 +89,97 @@ curl -X 'POST' \
       "role": "user"
     }
   ],
-  "stream": true,
-  "model": "phala/deepseek-chat-v3-0324"
+  "model": "phala/qwen3.5-27b"
 }'
 ```
 </CodeGroup>
 
 ### Available Models
 
-We support [14+ models](https://redpill.ai/models) running in GPU TEE from multiple providers. Click the **GPU TEE** checkbox to see all options.
+Confidential AI models are available through several GPU TEE providers. The live catalog is authoritative; query it before hardcoding model IDs:
+
+```bash
+curl https://api.redpill.ai/v1/models \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+To list Phala-backed models only:
+
+```bash
+curl https://api.redpill.ai/v1/models/phala \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+The following tables list the model families added in the latest RedPill model catalog update, grouped by provider. Pricing and availability can change; use the API response for production routing.
 
 #### Phala Provider
 
-| Model | Model ID | Context | Pricing (per 1M tokens) |
-|-------|----------|---------|-------------------------|
-| DeepSeek V3 0324 | `deepseek/deepseek-chat-v3-0324` | 163K | $0.28 / $1.14 |
-| Qwen2.5 VL 72B Instruct | `qwen/qwen2.5-vl-72b-instruct` | 65K | $0.59 / $0.59 |
-| Google Gemma 3 27B | `google/gemma-3-27b-it` | 53K | $0.11 / $0.40 |
-| OpenAI GPT OSS 120B | `openai/gpt-oss-120b` | 131K | $0.10 / $0.49 |
-| OpenAI GPT OSS 20B | `openai/gpt-oss-20b` | 131K | $0.04 / $0.15 |
-| Qwen2.5 7B Instruct | `qwen/qwen-2.5-7b-instruct` | 32K | $0.04 / $0.10 |
-| Sentence Transformers all-MiniLM-L6-v2 | `sentence-transformers/all-minilm-l6-v2` | 512 | $0.000005 |
+| Model ID | Context | Modality | Pricing (input/output per 1M tokens) |
+|----------|---------|----------|--------------------------------------|
+| `phala/qwen3.5-27b` | 262K | Text | $0.30 / $2.40 |
+| `phala/qwen3-vl-30b-a3b-instruct` | 128K | Vision + Text | $0.20 / $0.70 |
+| `qwen/qwen3-embedding-8b` | 32K | Embeddings | $0.01 / $0 |
+| `phala/gemma-3-27b-it` | 53K | Vision + Text | $0.11 / $0.40 |
+| `phala/glm-4.7-flash` | 202K | Text | $0.10 / $0.43 |
+| `phala/gpt-oss-20b` | 131K | Text | $0.04 / $0.15 |
+| `phala/qwen-2.5-7b-instruct` | 32K | Text | $0.04 / $0.10 |
+| `phala/qwen2.5-vl-72b-instruct` | 128K | Vision + Text | $0.40 / $1.20 |
+| `phala/uncensored-24b` | 32K | Text | $0.20 / $0.90 |
+| `sentence-transformers/all-minilm-l6-v2` | 512 | Embeddings | $0.005 / $0 |
+
+<Note>
+`phala/qwen2.5-vl-72b-instruct` is a legacy alias that may route to `phala/qwen3-vl-30b-a3b-instruct`. Prefer the canonical ID returned by `/v1/models`.
+</Note>
 
 #### NearAI Provider
 
-| Model | Model ID | Context | Pricing (per 1M tokens) |
-|-------|----------|---------|-------------------------|
-| DeepSeek V3.1 | `deepseek/deepseek-chat-v3.1` | 163K | $1.00 / $2.50 |
-| Qwen3 30B A3B Instruct | `qwen/qwen3-30b-a3b-instruct-2507` | 262K | $0.15 / $0.45 |
-| Z.AI GLM 4.6 | `z-ai/glm-4.6` | 202K | $0.75 / $2.00 |
+| Model ID | Context | Modality | Pricing (input/output per 1M tokens) |
+|----------|---------|----------|--------------------------------------|
+| `z-ai/glm-5` | 203K | Text | $1.20 / $3.50 |
+| `deepseek/deepseek-chat-v3.1` | 164K | Text | $1.05 / $3.10 |
+| `openai/gpt-oss-120b` | 131K | Text | $0.10 / $0.49 |
+| `qwen/qwen3-30b-a3b-instruct-2507` | 262K | Text | $0.15 / $0.55 |
+| `z-ai/glm-4.7` | 131K | Text | $0.85 / $3.30 |
+
+#### Chutes Provider
+
+| Model ID | Context | Modality | Pricing (input/output per 1M tokens) |
+|----------|---------|----------|--------------------------------------|
+| `z-ai/glm-5.1` | 203K | Text | $1.21 / $4.20 |
+| `moonshotai/kimi-k2.6` | 262K | Vision + Text | $1.09 / $4.60 |
+| `qwen/qwen3.5-397b-a17b` | 262K | Text | $0.55 / $3.50 |
+| `qwen/qwen3-coder-next` | 262K | Text | $0.18 / $1.20 |
+| `minimax/minimax-m2.5` | 197K | Text | $0.20 / $1.38 |
+| `xiaomi/mimo-v2-flash` | 262K | Text | $0.10 / $0.30 |
+| `deepseek/deepseek-v3.2` | 164K | Text | $0.32 / $0.48 |
+| `moonshotai/kimi-k2.5` | 262K | Vision + Text | $0.60 / $3.00 |
 
 #### Tinfoil Provider
 
-| Model | Model ID | Context | Pricing (per 1M tokens) |
-|-------|----------|---------|-------------------------|
-| DeepSeek R1 0528 | `deepseek/deepseek-r1-0528` | 163K | $2.00 / $2.00 |
-| Qwen3 Coder 480B A35B | `qwen/qwen3-coder-480b-a35b-instruct` | 262K | $2.00 / $2.00 |
-| Qwen3 VL 30B A3B | `qwen/qwen3-vl-30b-a3b-instruct` | 262K | $2.00 / $2.00 |
-| Meta Llama 3.3 70B Instruct | `meta-llama/llama-3.3-70b-instruct` | 131K | $2.00 / $2.00 |
+| Model ID | Context | Modality | Pricing (input/output per 1M tokens) |
+|----------|---------|----------|--------------------------------------|
+| `qwen/qwen3-coder-480b-a35b-instruct` | 262K | Text | $2.00 / $2.00 |
+| `moonshotai/kimi-k2-thinking` | 262K | Text | $2.00 / $2.00 |
+| `deepseek/deepseek-r1-0528` | 163K | Text | $2.00 / $2.00 |
+| `meta-llama/llama-3.3-70b-instruct` | 131K | Text | $2.00 / $2.00 |
 
 <Note>
-All models run in GPU TEEs with hardware attestation. Pricing shows input/output token costs. Browse the full list at [redpill.ai/models](https://redpill.ai/models).
+TEE provider presence and attestation support are not identical for every provider and model. For production verification, test [Attestation Report](/phala-cloud/confidential-ai/confidential-model/api-reference/attestation) with the exact model ID you plan to use.
 </Note>
 
 ## Verify Your AI is Running Securely
 
-Once you finished your secure request, every response comes with cryptographic proof that it ran in a secure TEE. This proof is generated by the TEE. ensures the response is secure and trustworthy. Click [Verify](/phala-cloud/confidential-ai/verify/overview) to learn how to verify your AI is running securely.
+After you make a request, use [Request Signature](/phala-cloud/confidential-ai/confidential-model/api-reference/signature) to fetch the signature for that response. Then fetch a fresh [Attestation Report](/phala-cloud/confidential-ai/confidential-model/api-reference/attestation) with the returned `signing_address` to bind the response to TEE evidence.
 
 ## Next Steps
 
-There are some advanced features you could use with Confidential AI API.
+Use the API reference and feature guides for the next step:
 
-- [Tool Calling](/phala-cloud/confidential-ai/confidential-model/tool-calling) help you call tools from your AI models.
-- [Images and Vision](/phala-cloud/confidential-ai/confidential-model/images-and-vision) help you use images and vision models in Confidential AI.
-- [Structured Output](/phala-cloud/confidential-ai/confidential-model/structured-output) help you get structured output from your AI models.
-- [Streaming](/phala-cloud/confidential-ai/confidential-model/streaming) help you get streaming response from your AI models.
-- [Playground](/phala-cloud/confidential-ai/confidential-model/playground) help you play with Confidential AI models in a private environment.
+- [Chat Completions](/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions) documents the core request and response shape.
+- [List Models](/phala-cloud/confidential-ai/confidential-model/api-reference/models) shows how to discover models programmatically.
+- [Embeddings](/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings) covers embedding model calls.
+- [Tool Calling](/phala-cloud/confidential-ai/confidential-model/tool-calling) helps you call tools from your AI models.
+- [Images and Vision](/phala-cloud/confidential-ai/confidential-model/images-and-vision) helps you use image-capable models.
+- [Structured Output](/phala-cloud/confidential-ai/confidential-model/structured-output) helps you get JSON responses.
+- [Streaming](/phala-cloud/confidential-ai/confidential-model/streaming) helps you consume streaming responses.
+- [Playground](/phala-cloud/confidential-ai/confidential-model/playground) helps you test models in a private environment.
diff --git a/phala-cloud/confidential-ai/confidential-model/images-and-vision.mdx b/phala-cloud/confidential-ai/confidential-model/images-and-vision.mdx
index 26ec1cf..116c8ab 100644
--- a/phala-cloud/confidential-ai/confidential-model/images-and-vision.mdx
+++ b/phala-cloud/confidential-ai/confidential-model/images-and-vision.mdx
@@ -24,7 +24,7 @@ client = OpenAI(
     api_key="<API_KEY>",
 )
 response = client.chat.completions.create(
-    model="phala/gemma-3-27b-it",
+    model="phala/qwen3-vl-30b-a3b-instruct",
     messages=[{
         "role": "user",
         "content": [
@@ -60,4 +60,5 @@ The overall impression is of a cute and peaceful scene with baby pandas enjoying
 ### Supported Models for Image Analysis
 
 - `phala/gemma-3-27b-it`
-- `phala/qwen2.5-vl-72b-instruct`
+- `phala/qwen3-vl-30b-a3b-instruct`
+- `phala/qwen2.5-vl-72b-instruct` (legacy alias)
diff --git a/phala-cloud/confidential-ai/confidential-model/streaming.mdx b/phala-cloud/confidential-ai/confidential-model/streaming.mdx
index 4e8a9a7..49a93e0 100644
--- a/phala-cloud/confidential-ai/confidential-model/streaming.mdx
+++ b/phala-cloud/confidential-ai/confidential-model/streaming.mdx
@@ -15,16 +15,17 @@ Confidential AI API supports streaming, enabling you to receive responses in a s
 
 Replace `<API_KEY>` with your actual API key in the examples below.
 
-```python
-import OpenAI from 'openai';
-const client = new OpenAI({
-    baseURL: 'https://api.redpill.ai/api/v1',
-    apiKey: '<API_KEY>',
-  },
-});
+<CodeGroup>
+```python Python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
 
 stream = client.chat.completions.create(
-    model="phala/deepseek-chat-v3-0324",
+    model="phala/qwen3.5-27b",
     messages=[
         {
             "role": "user",
@@ -34,21 +35,36 @@ stream = client.chat.completions.create(
     stream=True,
 )
 for chunk in stream:
-    if chunk.choices:
-        print(chunk.choices[0].delta.content)
-        print("---")
+    content = chunk.choices[0].delta.content if chunk.choices else None
+    if content:
+        print(content, end="", flush=True)
 ```
 
+```typescript TypeScript
+import OpenAI from "openai";
 
-<Accordion title="Sample output of structured output">
-```json
----
-Hello
----
-Hello
----
+const client = new OpenAI({
+  apiKey: "<API_KEY>",
+  baseURL: "https://api.redpill.ai/v1",
+});
 
----
+const stream = await client.chat.completions.create({
+  model: "phala/qwen3.5-27b",
+  messages: [
+    { role: "user", content: "say `Hello` 2 times fast, no other output" },
+  ],
+  stream: true,
+});
+
+for await (const chunk of stream) {
+  process.stdout.write(chunk.choices[0]?.delta?.content || "");
+}
+```
+</CodeGroup>
+
+<Accordion title="Sample output of a streaming request">
+```
+HelloHello
 ```
 </Accordion>
 
diff --git a/phala-cloud/confidential-ai/confidential-model/structured-output.mdx b/phala-cloud/confidential-ai/confidential-model/structured-output.mdx
index c2fa2b2..3535365 100644
--- a/phala-cloud/confidential-ai/confidential-model/structured-output.mdx
+++ b/phala-cloud/confidential-ai/confidential-model/structured-output.mdx
@@ -25,7 +25,7 @@ response = requests.post(
         "Content-Type": "application/json",
     },
     json={
-        "model": "phala/deepseek-chat-v3-0324",
+        "model": "phala/gpt-oss-20b",
         "messages": [
             {"role": "user", "content": "What is the weather like in Los Angeles?"},
         ],
@@ -79,8 +79,8 @@ print(info)
 
 Confidential AI supports structured output for the following models:
 
-- `phala/deepseek-chat-v3-0324`
 - `phala/gemma-3-27b-it`
 - `phala/gpt-oss-20b`
 - `phala/gpt-oss-120b`
-- `phala/qwen2.5-vl-72b-instruct`
+- `phala/qwen3.5-27b`
+- `phala/qwen3-vl-30b-a3b-instruct`
diff --git a/phala-cloud/confidential-ai/confidential-model/tool-calling.mdx b/phala-cloud/confidential-ai/confidential-model/tool-calling.mdx
index 2ee98a5..1cc971f 100644
--- a/phala-cloud/confidential-ai/confidential-model/tool-calling.mdx
+++ b/phala-cloud/confidential-ai/confidential-model/tool-calling.mdx
@@ -47,7 +47,7 @@ curl -s -X POST  'https://api.redpill.ai/v1/chat/completions' \
   ],
   "tool_choice": "auto",
   "stream": false,
-  "model": "phala/qwen3-coder"
+  "model": "phala/gpt-oss-20b"
 }'
 ```
 
@@ -57,7 +57,7 @@ curl -s -X POST  'https://api.redpill.ai/v1/chat/completions' \
   "id": "chatcmpl-28f745c2b7ee44f2ba36a8b4b409c74a",
   "object": "chat.completion",
   "created": 1754381277,
-  "model": "qwen/qwen3-coder",
+  "model": "phala/gpt-oss-20b",
   "choices": [
     {
       "index": 0,
@@ -152,7 +152,7 @@ curl -s -X POST  'https://api.redpill.ai/v1/chat/completions' \
     ],
     "tool_choice": "auto",
     "stream": false,
-    "model": "phala/qwen3-coder"
+    "model": "phala/gpt-oss-20b"
   }'
 ```
 
@@ -163,7 +163,7 @@ curl -s -X POST  'https://api.redpill.ai/v1/chat/completions' \
   "id": "chatcmpl-a46eff3d335c42c39bbe4ea69fc97462",
   "object": "chat.completion",
   "created": 1754381325,
-  "model": "qwen/qwen3-coder",
+  "model": "phala/gpt-oss-20b",
   "choices": [
     {
       "index": 0,
@@ -198,6 +198,7 @@ curl -s -X POST  'https://api.redpill.ai/v1/chat/completions' \
 
 ## Supported Models
 
-- phala/deepseek-chat-v3-0324
-- phala/qwen3-coder
-- phala/llama-3.3-70b-instruct
+- `phala/gpt-oss-20b`
+- `phala/qwen3.5-27b`
+- `qwen/qwen3-coder-next`
+- `qwen/qwen3-coder-480b-a35b-instruct`
diff --git a/phala-cloud/confidential-ai/verify/verify-attestation.mdx b/phala-cloud/confidential-ai/verify/verify-attestation.mdx
index ecd5037..7a5fb5f 100644
--- a/phala-cloud/confidential-ai/verify/verify-attestation.mdx
+++ b/phala-cloud/confidential-ai/verify/verify-attestation.mdx
@@ -42,41 +42,73 @@ response = requests.get(
 )
 report = response.json()
 
-# You get key pieces:
-# - nvidia_payload: GPU verification data
-# - intel_quote: CPU verification data
-# - signing_address: For signature verification
-# - signing_algo: "ecdsa" or "ed25519"
+# Response shape depends on the provider backing the model.
 ```
 
-The report gives you NVIDIA's hardware verification data for each GPU, Intel's TEE verification data for the CPU, a signing address you'll use later to verify signatures, and the signing algorithm used by this TEE instance.
+The report gives you Intel TDX evidence, optional NVIDIA GPU evidence, signing key information, and software measurement data. The exact shape depends on the provider:
+
+- **Phala / NearAI two-layer format**: `gateway_attestation` plus `model_attestations`.
+- **Chutes format**: `attestation_type: "chutes"` plus `all_attestations`.
+- **Flat format**: older Phala-native responses expose fields such as `intel_quote`, `nvidia_payload`, and `signing_address` at the top level.
+
+Use [Attestation Report](/phala-cloud/confidential-ai/confidential-model/api-reference/attestation) for the endpoint schema.
+
+### Select the attestation to verify
+
+For Phala and NearAI two-layer responses, verify both the gateway attestation and the model attestation when both are present. The gateway protects routing and request handling; the model attestation protects the inference runtime.
+
+```python
+def get_attestations(report):
+    attestations = []
+
+    if "gateway_attestation" in report:
+        attestations.append(("gateway", report["gateway_attestation"]))
+
+    for item in report.get("model_attestations", []):
+        attestations.append(("model", item))
+
+    for item in report.get("all_attestations", []):
+        attestations.append(("model", item))
+
+    if "intel_quote" in report:
+        attestations.append(("model", report))
+
+    return attestations
+
+attestations = get_attestations(report)
+assert attestations, "No attestation evidence found"
+```
 
 ### Verify NVIDIA GPU attestation
 
-Now let's verify your NVIDIA GPUs are genuine. You'll send the `nvidia_payload` from your report to NVIDIA's own attestation service. Why NVIDIA's service? Because only NVIDIA can confirm their hardware is authentic - they built secret keys into each chip during manufacturing.
+Next, verify the NVIDIA GPUs when GPU evidence is present. Phala-style responses expose `nvidia_payload`; Chutes-style responses expose `gpu_evidence`. The example below handles only `nvidia_payload` and skips attestations without it.
 
 ```python
 import json
 import base64
 
-# Parse and verify GPU payload nonce
-gpu_payload = json.loads(report["nvidia_payload"])
-assert gpu_payload["nonce"].lower() == request_nonce.lower()
+for name, attestation in attestations:
+    if "nvidia_payload" not in attestation:
+        continue
 
-# Send to NVIDIA's Remote Attestation Service
-response = requests.post(
-    "https://nras.attestation.nvidia.com/v3/attest/gpu",
-    json=gpu_payload
-)
-result = response.json()
+    # Parse and verify GPU payload nonce
+    gpu_payload = json.loads(attestation["nvidia_payload"])
+    assert gpu_payload["nonce"].lower() == request_nonce.lower()
 
-# Decode the JWT verdict
-jwt_token = result[0][1]
-payload_b64 = jwt_token.split(".")[1]
-padded = payload_b64 + "=" * ((4 - len(payload_b64) % 4) % 4)
-verdict_data = json.loads(base64.urlsafe_b64decode(padded))
+    # Send to NVIDIA's Remote Attestation Service
+    response = requests.post(
+        "https://nras.attestation.nvidia.com/v3/attest/gpu",
+        json=gpu_payload
+    )
+    result = response.json()
 
-assert verdict_data["x-nvidia-overall-att-result"] == True
+    # Decode the JWT verdict
+    jwt_token = result[0][1]
+    payload_b64 = jwt_token.split(".")[1]
+    padded = payload_b64 + "=" * ((4 - len(payload_b64) % 4) % 4)
+    verdict_data = json.loads(base64.urlsafe_b64decode(padded))
+
+    assert verdict_data["x-nvidia-overall-att-result"] == True
 ```
 
 The GPU payload must use the same nonce you generated. NVIDIA returns a JWT with `x-nvidia-overall-att-result: True` for verified authentic hardware.
@@ -86,14 +118,26 @@ The GPU payload must use the same nonce you generated. NVIDIA returns a JWT with
 For Intel CPUs, you'll verify the TDX quote using Phala's verification service. This service decodes and validates Intel's cryptographic proof.
 
 ```python
-# Verify Intel TDX quote
-response = requests.post(
-    "https://cloud-api.phala.com/api/v1/attestations/verify",
-    json={"hex": report["intel_quote"]}
-)
-intel_result = response.json()
+import base64
+import re
+
+def quote_to_hex(quote):
+    value = quote.removeprefix("0x")
+    if re.fullmatch(r"[0-9a-fA-F]+", value):
+        return value
+    return base64.b64decode(value).hex()
+
+for name, attestation in attestations:
+    if "intel_quote" not in attestation:
+        continue
+
+    response = requests.post(
+        "https://cloud-api.phala.com/api/v1/attestations/verify",
+        json={"hex": quote_to_hex(attestation["intel_quote"])}
+    )
+    intel_result = response.json()
 
-assert intel_result["quote"]["verified"] == True
+    assert intel_result["quote"]["verified"] == True
 ```
 
 This confirms the CPU is genuine Intel hardware running in TDX mode. The `intel_result` contains the decoded quote data we'll use next, including `reportdata` and `mrconfig` fields.
@@ -112,8 +156,8 @@ report_data_hex = intel_result["quote"]["body"]["reportdata"]
 report_data = bytes.fromhex(report_data_hex.removeprefix("0x"))
 
 # Parse signing address based on algorithm
-signing_address = report["signing_address"]
-signing_algo = report.get("signing_algo", "ecdsa")
+signing_address = attestation["signing_address"]
+signing_algo = attestation.get("signing_algo", "ecdsa")
 
 if signing_algo == "ecdsa":
     # ECDSA: 20-byte Ethereum address
@@ -155,7 +199,7 @@ Next, verify your application code hasn't been modified. The TEE measures the en
 from hashlib import sha256
 
 # Extract compose manifest from attestation
-tcb_info = report["info"]["tcb_info"]
+tcb_info = attestation["info"]["tcb_info"]
 if isinstance(tcb_info, str):
     tcb_info = json.loads(tcb_info)
 
diff --git a/phala-cloud/confidential-ai/verify/verify-signature.mdx b/phala-cloud/confidential-ai/verify/verify-signature.mdx
index d64ced5..8f52326 100644
--- a/phala-cloud/confidential-ai/verify/verify-signature.mdx
+++ b/phala-cloud/confidential-ai/verify/verify-signature.mdx
@@ -18,7 +18,7 @@ import requests
 
 # After getting AI response with chat_id
 chat_id = ai_response["id"]
-model = "phala/deepseek-chat-v3-0324"  # or your model
+model = "phala/qwen3.5-27b"  # or your model
 
 # Fetch the signature
 sig_response = requests.get(
@@ -28,16 +28,17 @@ sig_response = requests.get(
 signature_data = sig_response.json()
 
 # signature_data contains:
-# - text: "request_hash:response_hash"
+# - text: "request_hash:response_hash" or "model:request_hash:response_hash"
 # - signature: The ECDSA or Ed25519 signature
 # - signing_address: The address that signed this response
+# - signing_algo: The signature algorithm
 ```
 
-The response gives you everything needed for verification. The `text` field contains hashes of your request and the AI's response, separated by a colon. The `signature` is the cryptographic proof from the TEE. The `signing_address` identifies which TEE instance signed this response.
+The response gives you everything needed for verification. The `text` field contains hashes of your request and the AI's response. Some responses include the model name first, so the format is either `request_hash:response_hash` or `model:request_hash:response_hash`. The `signature` is the cryptographic proof from the TEE. The `signing_address` identifies which TEE instance signed this response.
 
 ## Verify request and response hashes
 
-Confirm the hashes in the `text` field match your actual request and response. The `text` field format is `request_hash:response_hash`.
+Confirm the hashes in the `text` field match your actual request and response. Hashes are byte-sensitive, so production verifiers should hash the exact serialized request body and response body sent over the wire.
 
 ```python
 from hashlib import sha256
@@ -53,9 +54,13 @@ response_body = '{"id": "...", "choices": [...], ...}'     # Full response JSON
 request_hash = sha256_text(request_body_json)
 response_hash = sha256_text(response_body)
 
-# Parse the signed hashes
-hashed_text = signature_data["text"]
-request_hash_server, response_hash_server = hashed_text.split(":")
+# Parse the signed hashes.
+# Format can be either request_hash:response_hash or model:request_hash:response_hash.
+parts = signature_data["text"].split(":")
+if len(parts) == 3:
+    signed_model, request_hash_server, response_hash_server = parts
+else:
+    request_hash_server, response_hash_server = parts
 
 # Verify they match
 assert request_hash == request_hash_server
@@ -109,19 +114,27 @@ attestation_response = requests.get(
 )
 attestation_report = attestation_response.json()
 
-# If using multi-server deployment, filter for matching signing address
-if "all_attestations" in attestation_report:
-    attestation = next(
-        item for item in attestation_report["all_attestations"]
-        if item["signing_address"].lower() == signing_address.lower()
-    )
-else:
-    attestation = attestation_report
+def find_attestation_for_signer(report, signing_address):
+    candidates = []
+    if "gateway_attestation" in report:
+        candidates.append(report["gateway_attestation"])
+    candidates.extend(report.get("model_attestations", []))
+    candidates.extend(report.get("all_attestations", []))
+    if "signing_address" in report:
+        candidates.append(report)
+
+    for item in candidates:
+        if item.get("signing_address", "").lower() == signing_address.lower():
+            return item
+
+    raise ValueError("No attestation found for signing address")
+
+attestation = find_attestation_for_signer(attestation_report, signing_address)
 
 print(f"Found attestation for: {attestation['signing_address']}")
 ```
 
-In multi-server deployments, the response may include `all_attestations` array containing attestations from multiple backend servers. You filter by `signing_address` to find the one matching your signature.
+In multi-server or two-layer deployments, the response may include several attestations. Filter by `signing_address` to find the one matching your response signature.
 
 ### Verify the attestation
 
@@ -142,7 +155,7 @@ This gives you an independent third-party verification that the signature is val
 
 ## Complete example
 
-For a full implementation that verifies both attestation and signatures, see the [signature verifier example](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/signature_verifier.py).
+For a full Python implementation that verifies both attestation and signatures, see the [signature verifier example](https://github.com/Phala-Network/private-ml-sdk/blob/main/vllm-proxy/verifiers/signature_verifier.py).
 
 This script demonstrates the complete flow:
 1. Send chat completion request (streaming or non-streaming)