13 changes: 12 additions & 1 deletion docs.json
@@ -433,6 +433,17 @@
"pages": [
"/phala-cloud/confidential-ai/confidential-model/confidential-ai-api",
"/phala-cloud/confidential-ai/confidential-gpu/model-template",
{
"group": "API Reference",
"pages": [
"/phala-cloud/confidential-ai/confidential-model/api-reference/chat-completions",
"/phala-cloud/confidential-ai/confidential-model/api-reference/models",
"/phala-cloud/confidential-ai/confidential-model/api-reference/attestation",
"/phala-cloud/confidential-ai/confidential-model/api-reference/signature",
"/phala-cloud/confidential-ai/confidential-model/api-reference/embeddings",
"/phala-cloud/confidential-ai/confidential-model/api-reference/embedding-models"
]
},
"/phala-cloud/confidential-ai/confidential-model/tool-calling",
"/phala-cloud/confidential-ai/confidential-model/images-and-vision",
"/phala-cloud/confidential-ai/confidential-model/structured-output",
@@ -2122,4 +2133,4 @@
"thumbnails": {
"background": "/images/phala-docs-og.png"
}
}
}
@@ -0,0 +1,165 @@
---
title: Attestation Report
description: Fetch TEE attestation evidence for a Confidential AI model.
---

## Endpoint

```bash
GET https://api.redpill.ai/v1/attestation/report?model={model_id}&nonce={nonce}&signing_address={address}
```

The attestation report provides evidence that a model endpoint runs on TEE hardware. It contains the data needed to check the hardware, the software stack, and the binding between the TEE and the response signer.

<Warning>
Always include a fresh random `nonce` when fetching attestations for security-sensitive verification. A nonce prevents replay of an older valid attestation.
</Warning>

## Parameters

<ParamField query="model" type="string" required>
Model ID to attest.

Examples: `phala/qwen3.5-27b`, `phala/qwen-2.5-7b-instruct`, `openai/gpt-oss-120b`, `z-ai/glm-5`.
</ParamField>

<ParamField query="nonce" type="string">
Fresh 32-byte random value encoded as 64 hex characters. The nonce is embedded in the TEE report data.
</ParamField>

<ParamField query="signing_address" type="string">
Ethereum address or public key of a TEE signer. In multi-server deployments it filters the report down to the attestation for that signer. Use this when binding a response signature to a specific TEE signer.
</ParamField>
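
As a minimal sketch of how these parameters combine, the helper below generates a nonce and builds the request URL using only the Python standard library. The names `new_nonce` and `build_report_url` are hypothetical, and the local nonce check is an assumption about what is worth rejecting before a request is sent.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.redpill.ai/v1/attestation/report"


def new_nonce() -> str:
    """Return a fresh 32-byte nonce as 64 hex characters."""
    return secrets.token_hex(32)


def build_report_url(model: str, nonce: str, signing_address: Optional[str] = None) -> str:
    """Build the attestation report URL (hypothetical helper)."""
    # bytes.fromhex raises ValueError on non-hex input; the length checks
    # reject anything that is not exactly 32 bytes.
    if len(nonce) != 64 or len(bytes.fromhex(nonce)) != 32:
        raise ValueError("nonce must be 64 hex characters (32 bytes)")
    params = {"model": model, "nonce": nonce}
    if signing_address is not None:
        # Optional filter, only sent when the caller supplies it.
        params["signing_address"] = signing_address
    return f"{BASE_URL}?{urlencode(params)}"
```

`urlencode` percent-encodes the `/` in model IDs such as `phala/qwen3.5-27b`, which decodes to the same query value as the unencoded form.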

## Examples

<CodeGroup>
```bash cURL
NONCE=$(openssl rand -hex 32)

curl "https://api.redpill.ai/v1/attestation/report?model=phala/qwen3.5-27b&nonce=$NONCE" \
  -H "Authorization: Bearer <API_KEY>"
```

```python Python
import secrets
import requests

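# A fresh 32-byte nonce (64 hex characters) guards against replayed attestations.
nonce = secrets.token_hex(32)

response = requests.get(
    "https://api.redpill.ai/v1/attestation/report",
    params={
        "model": "phala/qwen3.5-27b",
        "nonce": nonce,
    },
    headers={"Authorization": "Bearer <API_KEY>"},
    timeout=30,
)
response.raise_for_status()  # fail on HTTP errors before parsing the body

attestation = response.json()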
```
</CodeGroup>

## Response Formats

The response format depends on the provider behind the model.

### Phala / NearAI Two-Layer Format

Some models return separate attestations for the gateway and for each model runtime:

```json
{
  "gateway_attestation": {
    "signing_address": "0x...",
    "signing_algo": "ecdsa",
    "intel_quote": "hex-encoded-tdx-quote",
    "event_log": [],
    "report_data": "...",
    "request_nonce": "...",
    "info": {
      "vm_config": "..."
    }
  },
  "model_attestations": [
    {
      "model_name": "phala/qwen3.5-27b",
      "signing_address": "0x...",
      "signing_algo": "ecdsa",
      "intel_quote": "hex-encoded-tdx-quote",
      "nvidia_payload": "{...json gpu attestation...}",
      "event_log": [],
      "info": {
        "tcb_info": "{...app_compose...}",
        "vm_config": "..."
      }
    }
  ]
}
```

### Chutes Format

Some models return Chutes-style instance attestations:

```json
{
  "attestation_type": "chutes",
  "nonce": "...",
  "all_attestations": [
    {
      "instance_id": "uuid",
      "nonce": "...",
      "intel_quote": "base64-encoded-tdx-quote",
      "gpu_evidence": [
        { "certificate": "...", "evidence": "...", "arch": "HOPPER" }
      ],
      "e2e_pubkey": "..."
    }
  ]
}
```
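
The placeholders in these examples label `intel_quote` as hex in the two-layer format and base64 in the Chutes format, so a verifier needs to decode accordingly. The sketch below assumes those labels describe the real encodings. `decode_intel_quote` is a hypothetical helper, and the caller picks the encoding from the response format.

```python
import base64


def decode_intel_quote(quote: str, encoding: str) -> bytes:
    """Decode an `intel_quote` string into raw quote bytes.

    `encoding` is "hex" or "base64"; which one applies is an assumption
    taken from the placeholder labels in the example responses.
    """
    if encoding == "hex":
        return bytes.fromhex(quote)
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(quote, validate=True)
    raise ValueError(f"unsupported encoding: {encoding}")
```

The returned bytes are what a TDX quote verifier would consume; this helper performs no verification itself.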

### Flat Format

Older Phala-native responses may expose fields at the top level:

```json
{
  "signing_address": "0x...",
  "signing_algo": "ecdsa",
  "request_nonce": "...",
  "intel_quote": "hex-encoded-tdx-quote",
  "nvidia_payload": "{...}",
  "info": {
    "tcb_info": "{\"app_compose\":\"...\"}"
  }
}
```
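
Because a client cannot always know in advance which shape it will receive, one option is to normalize all three into a single list before verification. The sketch below detects the shape from the top-level keys shown in the examples. `list_attestations` and the added `layer` key are hypothetical, not part of the API.

```python
from typing import Any


def list_attestations(report: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten any of the three documented response shapes into one list.

    Each returned dict is a copy with an extra, hypothetical `layer` key:
    "gateway", "model", "instance", or "flat".
    """
    if "all_attestations" in report:
        # Chutes format: one entry per model instance.
        return [dict(item, layer="instance") for item in report["all_attestations"]]
    if "gateway_attestation" in report or "model_attestations" in report:
        # Two-layer format: gateway first, then each model runtime.
        items = []
        if "gateway_attestation" in report:
            items.append(dict(report["gateway_attestation"], layer="gateway"))
        for item in report.get("model_attestations", []):
            items.append(dict(item, layer="model"))
        return items
    # Flat format: the report itself is the only attestation.
    return [dict(report, layer="flat")]
```

Each entry can then go through the same quote and GPU checks, with `layer` recording where it came from.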

## Important Fields

| Field | Description |
|-------|-------------|
| `signing_address` | Address or key used by the TEE to sign responses |
| `signing_algo` | Signature algorithm, commonly `ecdsa` |
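| `request_nonce` / `nonce` | Nonce echoed in the attestation (`request_nonce` in the two-layer and flat examples, `nonce` in the Chutes example) |
| `intel_quote` | Intel TDX quote for CPU TEE verification, shown hex-encoded or base64-encoded depending on the format |
| `nvidia_payload` | NVIDIA GPU attestation payload |
| `event_log` | Boot event log for software stack verification |
| `info.vm_config` | VM configuration evidence |
| `info.tcb_info.app_compose` | Docker Compose application evidence; the examples show `tcb_info` as a JSON-encoded string, so parse it before reading `app_compose` |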
| `gateway_attestation` | Gateway TEE attestation |
| `model_attestations` | One or more model runtime attestations |
| `all_attestations` | Provider-specific list of model instance attestations |
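
The example responses show `info.tcb_info` as a JSON-encoded string rather than a nested object, so `app_compose` needs a second parse. The following is a minimal sketch under that assumption; `extract_app_compose` is a hypothetical helper that also tolerates an already-parsed object.

```python
import json
from typing import Any, Optional


def extract_app_compose(attestation: dict[str, Any]) -> Optional[str]:
    """Return `info.tcb_info.app_compose`, or None when it is absent."""
    info = attestation.get("info") or {}
    tcb_info = info.get("tcb_info")
    if tcb_info is None:
        return None
    if isinstance(tcb_info, str):
        # Assumption: tcb_info arrives as a JSON string, as in the examples.
        tcb_info = json.loads(tcb_info)
    return tcb_info.get("app_compose")
```

Gateway attestations in the two-layer example carry only `vm_config` under `info`, so a `None` result there does not by itself indicate a problem.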

## Verification Flow

1. Generate a fresh nonce.
2. Fetch an attestation report for the exact model.
3. Verify the Intel TDX quote.
4. Verify GPU evidence when `nvidia_payload` or `gpu_evidence` is present.
5. Confirm the report data binds the nonce and expected signing address.
6. Verify application measurements such as compose hash and image provenance when available.
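
A cheap first check before any cryptographic work is to confirm that the report echoes the nonce that was sent. The sketch below walks the response and compares every `request_nonce` or `nonce` field it finds. Both function names are hypothetical, and the case-insensitive comparison is an assumption. This does not replace step 5: the binding that matters lives in the quote's report data, which only a TDX quote verifier can establish.

```python
from typing import Any, Iterator


def iter_nonces(node: Any) -> Iterator[str]:
    """Yield every string stored under `request_nonce` or `nonce`, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key in ("request_nonce", "nonce") and isinstance(value, str):
                yield value
            else:
                yield from iter_nonces(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nonces(item)


def nonce_is_echoed(report: dict[str, Any], expected: str) -> bool:
    """True when at least one nonce field exists and all of them match `expected`."""
    found = list(iter_nonces(report))
    # Hex is compared case-insensitively; an empty result counts as a failure.
    return bool(found) and all(n.lower() == expected.lower() for n in found)
```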

For a walkthrough, see [Verify Attestation](/phala-cloud/confidential-ai/verify/verify-attestation).
@@ -0,0 +1,157 @@
---
title: Chat Completions
description: Create OpenAI-compatible chat completion responses with Confidential AI models.
---

## Endpoint

```bash
POST https://api.redpill.ai/v1/chat/completions
```

Creates a response for a chat conversation. The request body follows the OpenAI Chat Completions shape, so existing OpenAI SDK code works once the base URL is set to `https://api.redpill.ai/v1`.

## Request Body

<ParamField body="model" type="string" required>
Model ID to use for completion.

Examples: `phala/qwen3.5-27b`, `phala/gemma-3-27b-it`, `z-ai/glm-5`, `openai/gpt-oss-120b`.
</ParamField>

<ParamField body="messages" type="array" required>
Conversation messages. Each message includes `role` and `content`.

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Explain GPU TEE in one paragraph."}
]
```
</ParamField>

<ParamField body="temperature" type="number">
Sampling temperature. Typical range is `0` to `2`.
</ParamField>

<ParamField body="max_tokens" type="integer">
Maximum number of output tokens. Most open models and GPU TEE models accept this field.
</ParamField>

<ParamField body="max_completion_tokens" type="integer">
Maximum output tokens for newer OpenAI reasoning models that do not accept `max_tokens`.
</ParamField>
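
Since the two token-limit fields apply to different model families, a client that targets both may want to pick the field in one place. This is a minimal sketch. `build_chat_body` and its `use_max_completion_tokens` flag are hypothetical, and the caller decides which field a given model expects.

```python
from typing import Any, Optional


def build_chat_body(
    model: str,
    messages: list[dict[str, Any]],
    max_output_tokens: Optional[int] = None,
    use_max_completion_tokens: bool = False,
) -> dict[str, Any]:
    """Build a request body with the token-limit field the model accepts."""
    body: dict[str, Any] = {"model": model, "messages": messages}
    if max_output_tokens is not None:
        # Hypothetical switch: only one of the two fields is ever sent.
        key = "max_completion_tokens" if use_max_completion_tokens else "max_tokens"
        body[key] = max_output_tokens
    return body
```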

<ParamField body="stream" type="boolean">
Set to `true` to receive server-sent event chunks.
</ParamField>
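
For clients that read the stream without an SDK, the sketch below collects the generated text from server-sent event lines. It assumes the OpenAI streaming chunk shape (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`), which this page does not spell out. `collect_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas from OpenAI-style SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)
```

With the OpenAI SDK this parsing is handled for you: passing `stream=True` returns an iterator of chunk objects.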

<ParamField body="tools" type="array">
Function/tool definitions that supported models can call.
</ParamField>

<ParamField body="tool_choice" type="string | object">
Controls whether the model may call tools. Common values are `auto`, `none`, or a specific tool selection object.
</ParamField>
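
To illustrate the two fields together, the sketch below builds one function tool and a `tool_choice` object that forces it, following the OpenAI tool-calling shape. The `get_weather` tool and the `force_tool` helper are hypothetical examples, not part of the API.

```python
import json

# Hypothetical tool definition in the OpenAI function-tool format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def force_tool(name: str) -> dict:
    """Return a tool_choice object that selects one specific function."""
    return {"type": "function", "function": {"name": name}}


request_body = {
    "model": "phala/qwen3.5-27b",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": force_tool("get_weather"),  # or "auto" / "none"
}

payload = json.dumps(request_body)  # ready to send as the POST body
```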

<ParamField body="response_format" type="object">
Requests structured output from supported models, including JSON schema mode.
</ParamField>
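
As a sketch of JSON schema mode, the helper below wraps a schema in the OpenAI `response_format` shape. The `json_schema_format` helper and the `tee_summary` schema are hypothetical, and whether a given model honors `strict` is an assumption to check per model.

```python
from typing import Any


def json_schema_format(name: str, schema: dict[str, Any], strict: bool = True) -> dict[str, Any]:
    """Wrap a JSON Schema in the OpenAI-style `response_format` object."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


# Hypothetical schema for a two-field answer.
response_format = json_schema_format(
    "tee_summary",
    {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["summary", "confidence"],
        "additionalProperties": False,
    },
)
```

The resulting object goes into the request body under the `response_format` key.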

## Examples

<CodeGroup>
```bash cURL
curl https://api.redpill.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <API_KEY>" \
  -d '{
    "model": "phala/qwen3.5-27b",
    "messages": [
      {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ]
  }'
```

```python Python
from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://api.redpill.ai/v1",
)

response = client.chat.completions.create(
    model="phala/qwen3.5-27b",
    messages=[
        {"role": "user", "content": "What privacy guarantees does GPU TEE provide?"}
    ],
)

print(response.choices[0].message.content)
```

```typescript TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "<API_KEY>",
  baseURL: "https://api.redpill.ai/v1",
});

const response = await client.chat.completions.create({
  model: "phala/qwen3.5-27b",
  messages: [
    { role: "user", content: "What privacy guarantees does GPU TEE provide?" },
  ],
});

console.log(response.choices[0].message.content);
```
</CodeGroup>

## Response

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "phala/qwen3.5-27b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "GPU TEE protects inference by..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 16,
    "completion_tokens": 48,
    "total_tokens": 64
  }
}
```

The `id` field is the request ID. Use it with [Request Signature](/phala-cloud/confidential-ai/confidential-model/api-reference/signature) when you need cryptographic proof for this specific response.
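
When working with the raw JSON rather than an SDK object, the fields a client usually keeps can be pulled out as in the sketch below. `summarize_completion` is a hypothetical helper and assumes the response shape shown above.

```python
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the request ID, first choice text, and token usage."""
    choice = response["choices"][0]
    return {
        # Keep the ID: it is the handle for a later signature lookup.
        "request_id": response["id"],
        "text": choice["message"].get("content"),
        "finish_reason": choice.get("finish_reason"),
        "total_tokens": (response.get("usage") or {}).get("total_tokens"),
    }
```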

## Feature Notes

- Streaming uses the same `stream: true` option as the OpenAI API.
- Vision models accept multimodal `content` arrays with `image_url` entries.
- Tool calling uses OpenAI-compatible `tools`, `tool_choice`, assistant `tool_calls`, and tool response messages.
- Structured output uses `response_format` on supported models.
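
As an illustration of the multimodal `content` array mentioned above, the sketch below builds a user message that pairs text with an inline image, using the OpenAI content-part shape. `image_message` is a hypothetical helper, and sending the image as a base64 data URL is an assumption; a regular image URL fits in the same `url` field.

```python
import base64
from typing import Any


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict[str, Any]:
    """Build a user message with one text part and one image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                # Assumption: the endpoint accepts data URLs for inline images.
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict goes into the `messages` array like any other message.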

## Next Steps

<CardGroup cols={2}>
<Card title="List Models" icon="list" href="/phala-cloud/confidential-ai/confidential-model/api-reference/models">
Discover available Confidential AI models and capabilities
</Card>
<Card title="Verify Responses" icon="signature" href="/phala-cloud/confidential-ai/confidential-model/api-reference/signature">
Fetch the signature for a chat completion response
</Card>
</CardGroup>