Phala-Network · Marvin-Cypher · May 15, 2026 · May 15, 2026
diff --git a/docs.json b/docs.json
@@ -433,6 +433,17 @@
                     "pages": [
                       "/phala-cloud/confidential-ai/confidential-model/confidential-ai-api",
                       "/phala-cloud/confidential-ai/confidential-gpu/model-template",
+                      {
+                        "group": "API Reference",
+                        "pages": [
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/models",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/attestation",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/signature",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings",
+                          "/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models"
+                        ]
+                      },
                       "/phala-cloud/confidential-ai/confidential-model/tool-calling",
                       "/phala-cloud/confidential-ai/confidential-model/images-and-vision",
                       "/phala-cloud/confidential-ai/confidential-model/structured-output",
@@ -2122,4 +2133,4 @@
   "thumbnails": {
     "background": "/images/phala-docs-og.png"
   }
-}
+}
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/attestation.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/attestation.mdx
@@ -0,0 +1,165 @@
+---
+title: Attestation Report
+description: Fetch TEE attestation evidence for a Confidential AI model.
+---
+
+## Endpoint
+
+```bash
+GET https://api.redpill.ai/v1/attestation/report?model={model_id}&nonce={nonce}&signing_address={address}
+```
+
+The attestation report proves a model endpoint is backed by TEE hardware and provides the evidence needed for hardware, software, and signer binding checks.
+
+<Warning>
+Always include a fresh random `nonce` when fetching attestations for security-sensitive verification. A nonce prevents replay of an older valid attestation.
+</Warning>
+
+## Parameters
+
+<ParamField query="model" type="string" required>
+  Model ID to attest.
+
+  Examples: `phala/qwen3.5-27b`, `phala/qwen-2.5-7b-instruct`, `openai/gpt-oss-120b`, `z-ai/glm-5`.
+</ParamField>
+
+<ParamField query="nonce" type="string">
+  Fresh 32-byte random value encoded as 64 hex characters. The nonce is embedded in the TEE report data.
+</ParamField>
+
+<ParamField query="signing_address" type="string">
+  Ethereum address or public key used to filter attestations in multi-server deployments. Use this when binding a response signature to a specific TEE signer.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+NONCE=$(openssl rand -hex 32)
+
+curl "https://api.redpill.ai/v1/attestation/report?model=phala/qwen3.5-27b&nonce=$NONCE" \
+  -H "Authorization: Bearer <API_KEY>"
+```
+
+```python Python
+import secrets
+import requests
+
+nonce = secrets.token_hex(32)
+
+response = requests.get(
+    "https://api.redpill.ai/v1/attestation/report",
+    params={"model": "phala/qwen3.5-27b", "nonce": nonce},
+    headers={"Authorization": "Bearer <API_KEY>"},
+    timeout=30,
+)
+# Fail fast on HTTP errors before parsing the body.
+response.raise_for_status()
+
+attestation = response.json()
+```
+</CodeGroup>
+
+## Response Formats
+
+The response format depends on the provider behind the model.
+
+### Phala / NearAI Two-Layer Format
+
+Phala- and NearAI-backed models may return separate gateway and model attestations:
+
+```json
+{
+  "gateway_attestation": {
+    "signing_address": "0x...",
+    "signing_algo": "ecdsa",
+    "intel_quote": "hex-encoded-tdx-quote",
+    "event_log": [],
+    "report_data": "...",
+    "request_nonce": "...",
+    "info": {
+      "vm_config": "..."
+    }
+  },
+  "model_attestations": [
+    {
+      "model_name": "phala/qwen3.5-27b",
+      "signing_address": "0x...",
+      "signing_algo": "ecdsa",
+      "intel_quote": "hex-encoded-tdx-quote",
+      "nvidia_payload": "{...json gpu attestation...}",
+      "event_log": [],
+      "info": {
+        "tcb_info": "{...app_compose...}",
+        "vm_config": "..."
+      }
+    }
+  ]
+}
+```
+
+### Chutes Format
+
+Some models return Chutes-style instance attestations:
+
+```json
+{
+  "attestation_type": "chutes",
+  "nonce": "...",
+  "all_attestations": [
+    {
+      "instance_id": "uuid",
+      "nonce": "...",
+      "intel_quote": "base64-encoded-tdx-quote",
+      "gpu_evidence": [
+        { "certificate": "...", "evidence": "...", "arch": "HOPPER" }
+      ],
+      "e2e_pubkey": "..."
+    }
+  ]
+}
+```
+
+### Flat Format
+
+Older Phala-native responses may expose fields at the top level:
+
+```json
+{
+  "signing_address": "0x...",
+  "signing_algo": "ecdsa",
+  "request_nonce": "...",
+  "intel_quote": "hex-encoded-tdx-quote",
+  "nvidia_payload": "{...}",
+  "info": {
+    "tcb_info": "{\"app_compose\":\"...\"}"
+  }
+}
+```
+
+## Important Fields
+
+| Field | Description |
+|-------|-------------|
+| `signing_address` | Address or key used by the TEE to sign responses |
+| `signing_algo` | Signature algorithm, commonly `ecdsa` |
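+| `request_nonce` / `nonce` | Nonce echoed in the attestation (`request_nonce` in the Phala formats, `nonce` in the Chutes format) |
+| `intel_quote` | Intel TDX quote for CPU TEE verification (hex-encoded in the Phala formats, base64-encoded in the Chutes format) |
+| `nvidia_payload` | NVIDIA GPU attestation payload (the Chutes format carries `gpu_evidence` instead) |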
+| `event_log` | Boot event log for software stack verification |
+| `info.vm_config` | VM configuration evidence |
+| `info.tcb_info.app_compose` | Docker Compose application evidence; `tcb_info` is a JSON-encoded string, so parse it before reading `app_compose` |
+| `gateway_attestation` | Gateway TEE attestation |
+| `model_attestations` | One or more model runtime attestations |
+| `all_attestations` | Provider-specific list of model instance attestations |
+
+## Verification Flow
+
+1. Generate a fresh nonce.
+2. Fetch an attestation report for the exact model.
+3. Verify the Intel TDX quote.
+4. Verify GPU evidence when `nvidia_payload` or `gpu_evidence` is present.
+5. Confirm the report data binds the nonce and expected signing address.
+6. Verify application measurements such as compose hash and image provenance when available.
+
+For a walkthrough, see [Verify Attestation](/phala-cloud/confidential-ai/verify/verify-attestation).
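
The three response formats and the nonce step of the verification flow in `attestation.mdx` can be handled on the client along these lines. This is a minimal sketch and not part of the patch: the helper names `collect_attestations` and `nonce_echo_ok` are hypothetical, and it assumes the service echoes the nonce verbatim as a hex string. It only checks the echoed field. It does not verify the TDX quote, the GPU evidence, or the report-data binding, which still need a real quote verifier.

```python
from typing import Any, Dict, List


def collect_attestations(report: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the documented response shapes into one list of entries."""
    if "all_attestations" in report:  # Chutes format
        return list(report["all_attestations"])
    entries: List[Dict[str, Any]] = []
    if "gateway_attestation" in report:  # two-layer format
        entries.append(report["gateway_attestation"])
    entries.extend(report.get("model_attestations", []))
    if not entries and "intel_quote" in report:  # flat format
        entries.append(report)
    return entries


def nonce_echo_ok(report: Dict[str, Any], nonce: str) -> bool:
    """True if at least one nonce is echoed and every echoed nonce matches.

    Assumption: the nonce comes back unchanged under `request_nonce` or
    `nonce`. Entries that carry neither field are skipped.
    """
    expected = nonce.lower()
    found = []
    for item in [report, *collect_attestations(report)]:
        for key in ("request_nonce", "nonce"):
            value = item.get(key)
            if isinstance(value, str):
                found.append(value.lower())
    return bool(found) and all(value == expected for value in found)
```

A caller would pass the parsed JSON body and the nonce it generated, and treat a `False` result as a reason to discard the report before doing any further verification.
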
diff --git a/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions.mdx b/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions.mdx
@@ -0,0 +1,157 @@
+---
+title: Chat Completions
+description: Create OpenAI-compatible chat completion responses with Confidential AI models.
+---
+
+## Endpoint
+
+```bash
+POST https://api.redpill.ai/v1/chat/completions
+```
+
+Creates a response for a chat conversation. The request shape is OpenAI-compatible, so existing OpenAI SDK code works once you set the base URL to `https://api.redpill.ai/v1`.
+
+## Request Body
+
+<ParamField body="model" type="string" required>
+  Model ID to use for completion.
+
+  Examples: `phala/qwen3.5-27b`, `phala/gemma-3-27b-it`, `z-ai/glm-5`, `openai/gpt-oss-120b`.
+</ParamField>
+
+<ParamField body="messages" type="array" required>
+  Conversation messages. Each message includes `role` and `content`.
+
+  ```json
+  [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "Explain GPU TEE in one paragraph."}
+  ]
+  ```
+</ParamField>
+
+<ParamField body="temperature" type="number">
+  Sampling temperature. Typical range is `0` to `2`.
+</ParamField>
+
+<ParamField body="max_tokens" type="integer">
+  Maximum number of output tokens. Use this parameter with most open models and GPU TEE models.
+</ParamField>
+
+<ParamField body="max_completion_tokens" type="integer">
+  Maximum output tokens for newer OpenAI reasoning models that do not accept `max_tokens`.
+</ParamField>
+
+<ParamField body="stream" type="boolean">
+  Set to `true` to receive server-sent event chunks.
+</ParamField>
+
+<ParamField body="tools" type="array">
+  Function/tool definitions that supported models can call.
+</ParamField>
+
+<ParamField body="tool_choice" type="string | object">
+  Controls whether the model may call tools. Common values are `auto` and `none`; pass a tool selection object to force a specific tool.
+</ParamField>
+
+<ParamField body="response_format" type="object">
+  Requests structured output from supported models, including JSON schema mode.
+</ParamField>
+
+## Examples
+
+<CodeGroup>
+```bash cURL
+curl https://api.redpill.ai/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer <API_KEY>" \
+  -d '{
+    "model": "phala/qwen3.5-27b",
+    "messages": [
+      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
+    ]
+  }'
+```
+
+```python Python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="<API_KEY>",
+    base_url="https://api.redpill.ai/v1",
+)
+
+response = client.chat.completions.create(
+    model="phala/qwen3.5-27b",
+    messages=[
+        {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
+    ],
+)
+
+print(response.choices[0].message.content)
+```
+
+```typescript TypeScript
+import OpenAI from "openai";
+
+const client = new OpenAI({
+  apiKey: "<API_KEY>",
+  baseURL: "https://api.redpill.ai/v1",
+});
+
+const response = await client.chat.completions.create({
+  model: "phala/qwen3.5-27b",
+  messages: [
+    { role: "user", content: "What privacy guarantees does GPU TEE provide?" },
+  ],
+});
+
+console.log(response.choices[0].message.content);
+```
+</CodeGroup>
+
+## Response
+
+```json
+{
+  "id": "chatcmpl-123",
+  "object": "chat.completion",
+  "created": 1677652288,
+  "model": "phala/qwen3.5-27b",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "GPU TEE protects inference by..."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 16,
+    "completion_tokens": 48,
+    "total_tokens": 64
+  }
+}
+```
+
+The `id` field is the request ID. Use it with [Request Signature](/phala-cloud/confidential-ai/confidential-model/api-reference/signature) when you need cryptographic proof for this specific response.
+
+## Feature Notes
+
+- Streaming uses the same `stream: true` option as the OpenAI API.
+- Vision models accept multimodal `content` arrays with `image_url` entries.
+- Tool calling uses OpenAI-compatible `tools`, `tool_choice`, assistant `tool_calls`, and tool response messages.
+- Structured output uses `response_format` on supported models.
+
+## Next Steps
+
+<CardGroup cols={2}>
+  <Card title="List Models" icon="list" href="/phala-cloud/confidential-ai/confidential-model/api-reference/models">
+    Discover available Confidential AI models and capabilities
+  </Card>
+  <Card title="Verify Responses" icon="signature" href="/phala-cloud/confidential-ai/confidential-model/api-reference/signature">
+    Fetch the signature for a chat completion response
+  </Card>
+</CardGroup>
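
The `stream` parameter in `chat-completions.mdx` returns server-sent event chunks. When reading that stream without the OpenAI SDK, the text can be reassembled along these lines. This is a minimal sketch and not part of the patch: `iter_stream_text` is a hypothetical helper, and it assumes the chunks follow the OpenAI streaming shape (`data: {...}` lines carrying `choices[].delta.content`, terminated by `data: [DONE]`), which the page implies but does not spell out.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"GPU "}}]}',
    'data: {"choices":[{"delta":{"content":"TEE"}}]}',
    "data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With the `requests` library, the same function could consume `response.iter_lines(decode_unicode=True)` from a POST made with `stream=True`.
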