Chat Completions
POST
OpenAI-compatible chat completions endpoint for confidential AI inference. The Umbra frontend server does not expose this endpoint. Instead, the frontend connects directly to the provider (vLLM) running inside the TEE over authenticated TLS (aTLS) with attestation verification.
This is not a Next.js API route. The frontend uses the confidential-chat.ts library to connect directly to the provider endpoint inside the TEE.
Endpoint
The provider base URL comes from the NEXT_PUBLIC_VLLM_BASE_URL environment variable or is provided by the user in the UI.
Authentication
The aTLS connection is established using the @phala/dcap-qvl-web library, which verifies Intel TDX attestation quotes in the browser. If the provider requires an API key, it is sent as an optional Bearer token.
Security Requirements
- Must use HTTPS (except for localhost/127.0.0.1 in development)
- TDX attestation verification via aTLS
- EKM (Exported Keying Material) channel binding to prevent man-in-the-middle (MITM) attacks
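The HTTPS rule can be expressed as a small guard. This is a minimal sketch, not the library's code: the function name isAllowedProviderUrl and the isDevelopment flag are hypothetical, and it only checks the URL scheme and host. It does not perform the aTLS handshake, attestation, or EKM binding.

```typescript
// Hypothetical guard for "HTTPS only, except localhost/127.0.0.1 in development".
// Not part of the confidential-chat.ts API; names are illustrative.
function isAllowedProviderUrl(rawUrl: string, isDevelopment: boolean): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol === "https:") {
    return true;
  }
  // Plain HTTP is tolerated only for local development hosts.
  const isLocalHost = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  return isDevelopment && url.protocol === "http:" && isLocalHost;
}
```

For example, isAllowedProviderUrl("http://localhost:8000", true) returns true, while the same URL with isDevelopment set to false is rejected.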
Request Parameters
- model (string): Model identifier (e.g., “Qwen/Qwen2.5-32B-Instruct”). Configured via NEXT_PUBLIC_VLLM_MODEL or user settings.
- messages (array): Array of message objects, each with a role (“system”, “user”, or “assistant”) and content (string).
- temperature (number): Sampling temperature (0.0 to 2.0). Defaults to 0.7.
- max_tokens (integer): Maximum tokens to generate. Defaults to 4098.
- stream (boolean): Enable streaming responses. Defaults to true.
- reasoning_effort (string): Reasoning effort level: “low”, “medium”, or “high”. For models that support reasoning.
- cache_salt (string): Cache salt for request deduplication (provider-specific).
Request Body
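A request body using the parameters listed above might look like the following. This is an illustrative sketch: the interface names are hypothetical, the message texts are placeholders, and reasoning_effort and cache_salt are only meaningful if the provider supports them.

```typescript
// Illustrative request shape for an OpenAI-compatible chat completion call.
// Interface names are hypothetical; field names follow the parameter list.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatCompletionRequest {
  model: string;
  messages: ChatMessage[];
  temperature?: number; // 0.0 to 2.0, documented default 0.7
  max_tokens?: number; // documented default 4098
  stream?: boolean; // documented default true
  reasoning_effort?: "low" | "medium" | "high"; // reasoning models only
  cache_salt?: string; // provider-specific
}

// Placeholder conversation; stream is disabled to get a single JSON response.
const requestBody: ChatCompletionRequest = {
  model: "Qwen/Qwen2.5-32B-Instruct",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Summarize what a TEE is in one sentence." },
  ],
  temperature: 0.7,
  max_tokens: 4098,
  stream: false,
};

// The serialized JSON that would be sent as the POST body.
const json: string = JSON.stringify(requestBody);
console.log(json);
```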
Response (Non-Streaming)
- id (string): Completion ID
- choices (array): Array of completion choices. Each choice contains:
  - message: Object with role and content
  - finish_reason: Reason for completion (“stop”, “length”, etc.)
Non-Streaming Response Example
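The sketch below types only the documented response fields (id, choices) and reads the first choice. The names ChatCompletionResponse and readFirstChoice are hypothetical, and the sample object uses placeholder values rather than captured provider output.

```typescript
// Minimal typing of the documented non-streaming fields; real responses may carry more.
interface ChatCompletionResponse {
  id: string;
  choices: Array<{
    message: { role: string; content: string };
    finish_reason: string | null;
  }>;
}

// Hypothetical helper: returns the first choice's text and whether it was cut off.
function readFirstChoice(resp: ChatCompletionResponse): { text: string; truncated: boolean } {
  const choice = resp.choices[0];
  if (choice === undefined) {
    throw new Error(`Completion ${resp.id} returned no choices`);
  }
  return {
    text: choice.message.content,
    // "length" means generation hit the token limit instead of stopping naturally ("stop").
    truncated: choice.finish_reason === "length",
  };
}

// Placeholder response object for illustration.
const sample: ChatCompletionResponse = {
  id: "chatcmpl-example",
  choices: [
    {
      message: { role: "assistant", content: "A TEE is an isolated execution environment." },
      finish_reason: "stop",
    },
  ],
};

const result = readFirstChoice(sample);
console.log(result.text, result.truncated);
```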
Response (Streaming)
When stream: true, the server returns Server-Sent Events (SSE): each event is a line prefixed with data:, and the stream ends with data: [DONE].
Streaming Format
- choices[0].delta.content: Content chunk
- choices[0].delta.reasoning_content: Reasoning chunk (for models that support it)
- choices[0].finish_reason: Present in the final chunk before [DONE]
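The streaming format can be consumed by splitting the payload on data: lines and stopping at the [DONE] sentinel. This is a simplified sketch with a hypothetical function name: it assumes one data: line per event, as in OpenAI-style streams, and does not handle multi-line data fields or events split across network reads.

```typescript
// Shape of one streamed chunk, limited to the documented fields.
interface StreamChunk {
  choices: Array<{
    delta: { content?: string; reasoning_content?: string };
    finish_reason?: string | null;
  }>;
}

// Hypothetical parser: returns parsed chunks and whether [DONE] was seen.
function parseSseChunks(payload: string): { chunks: StreamChunk[]; done: boolean } {
  const chunks: StreamChunk[] = [];
  for (const rawLine of payload.split("\n")) {
    const line = rawLine.trim(); // also strips a trailing "\r"
    if (!line.startsWith("data:")) {
      continue; // blank event separators or other SSE fields
    }
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") {
      return { chunks, done: true }; // end-of-stream sentinel
    }
    chunks.push(JSON.parse(data) as StreamChunk);
  }
  return { chunks, done: false }; // stream has not finished yet
}

const sample = [
  'data: {"choices":[{"delta":{"reasoning_content":"Thinking..."}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"Hello"}}]}',
  "",
  'data: {"choices":[{"delta":{},"finish_reason":"stop"}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const { chunks, done } = parseSseChunks(sample);
console.log(chunks.length, done); // prints: 3 true
```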
Error Responses
400 Bad Request
- Invalid request body
- Missing required parameters
- max_tokens too large or prompt too long (the request exceeds the model's context window)
401 Unauthorized
- Invalid or missing API key
503 Service Unavailable
- Provider unreachable
- Connection timeout
- TLS/certificate error
Example
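A complete non-streaming request can be sketched as follows. This is a hedged illustration, not the confidential-chat.ts API: sendChatCompletion and FetchLike are hypothetical names, the "/chat/completions" path appended to the base URL is an assumption (it presumes the base URL already includes any /v1 prefix), and fetchImpl stands in for the aTLS-capable transport, which is not shown here.

```typescript
// Minimal fetch-like signature so the transport can be swapped (aTLS, mock, etc.).
type FetchInit = { method: string; headers: Record<string, string>; body: string };
type FetchLike = (
  url: string,
  init: FetchInit,
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: sends one user message and returns the assistant text.
async function sendChatCompletion(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string | undefined,
  userText: string,
): Promise<string> {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["Authorization"] = `Bearer ${apiKey}`; // optional Bearer token
  }
  // Assumed path; trailing slashes on the base URL are stripped first.
  const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const response = await fetchImpl(url, {
    method: "POST",
    headers,
    body: JSON.stringify({
      model: "Qwen/Qwen2.5-32B-Instruct",
      messages: [{ role: "user", content: userText }],
      stream: false,
    }),
  });
  if (!response.ok) {
    throw new Error(`Provider returned HTTP ${response.status}`);
  }
  const data = (await response.json()) as { choices: Array<{ message: { content: string } }> };
  return data.choices[0]?.message.content ?? "";
}
```

The global fetch satisfies FetchLike for a plain HTTPS smoke test, but calling the provider that way skips attestation; the aTLS transport is what provides the confidentiality guarantees described on this page.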
Message Validation
The confidential-chat.ts library validates all messages:
- Role must be “system”, “user”, or “assistant”
- Content must be a non-empty string
- System message is automatically prepended if not present
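The three validation rules can be sketched as a single function. This is a hypothetical re-implementation for illustration; validateMessages, its defaultSystemPrompt argument, and the error texts are assumptions, and the real library may differ (for example, in whether whitespace-only content counts as empty).

```typescript
// Hypothetical re-implementation of the documented message validation rules.
type Role = "system" | "user" | "assistant";
interface ChatMessage {
  role: Role;
  content: string;
}

const VALID_ROLES: ReadonlySet<string> = new Set(["system", "user", "assistant"]);

function validateMessages(
  input: Array<{ role: string; content: unknown }>,
  defaultSystemPrompt: string,
): ChatMessage[] {
  const messages: ChatMessage[] = input.map((m, i) => {
    if (!VALID_ROLES.has(m.role)) {
      throw new Error(`Message ${i}: invalid role "${m.role}"`);
    }
    const content = m.content;
    if (typeof content !== "string" || content.length === 0) {
      throw new Error(`Message ${i}: content must be a non-empty string`);
    }
    return { role: m.role as Role, content };
  });
  // Prepend the default system prompt only if no system message is present.
  if (!messages.some((m) => m.role === "system")) {
    messages.unshift({ role: "system", content: defaultSystemPrompt });
  }
  return messages;
}
```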
Provider Configuration
The frontend supports dynamic provider configuration:
- Base URL: NEXT_PUBLIC_VLLM_BASE_URL or user-provided
- Model: NEXT_PUBLIC_VLLM_MODEL or user-provided
- API Key: Optional Bearer token
- System Prompt: NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT or custom
- Temperature: NEXT_PUBLIC_DEFAULT_TEMPERATURE (default: 0.7)
- Max Tokens: NEXT_PUBLIC_DEFAULT_MAX_TOKENS (default: 4098)
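One way to combine these sources is a simple precedence chain: user-provided value, then environment variable, then the documented default. This is a sketch under that assumed precedence; resolveProviderConfig, ProviderConfig, and parseNumber are hypothetical names, and the environment is passed in as a plain object so the function stays testable.

```typescript
// Hypothetical resolved configuration shape.
interface ProviderConfig {
  baseUrl: string;
  model: string;
  apiKey?: string;
  systemPrompt: string;
  temperature: number;
  maxTokens: number;
}

type Env = Record<string, string | undefined>;

// Parses a numeric env value, falling back on missing or non-numeric input.
function parseNumber(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === "") return fallback;
  const n = Number(value);
  return Number.isFinite(n) ? n : fallback;
}

// Assumed precedence: user setting > NEXT_PUBLIC_* variable > documented default.
function resolveProviderConfig(env: Env, user: Partial<ProviderConfig> = {}): ProviderConfig {
  return {
    baseUrl: user.baseUrl ?? env["NEXT_PUBLIC_VLLM_BASE_URL"] ?? "",
    model: user.model ?? env["NEXT_PUBLIC_VLLM_MODEL"] ?? "",
    apiKey: user.apiKey, // optional Bearer token, user-supplied in this sketch
    systemPrompt: user.systemPrompt ?? env["NEXT_PUBLIC_DEFAULT_SYSTEM_PROMPT"] ?? "",
    temperature: user.temperature ?? parseNumber(env["NEXT_PUBLIC_DEFAULT_TEMPERATURE"], 0.7),
    maxTokens: user.maxTokens ?? parseNumber(env["NEXT_PUBLIC_DEFAULT_MAX_TOKENS"], 4098),
  };
}
```

In a Next.js client bundle, NEXT_PUBLIC_* values are inlined only where the code references process.env.NEXT_PUBLIC_... literally, so the env object should be built from those literal references rather than by passing process.env wholesale.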
Reasoning Support
Some models support reasoning traces. The library handles:
- reasoning_content in message objects
- Streaming reasoning deltas via reasoning_delta chunks
- Reasoning effort levels (“low”, “medium”, “high”)
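Reasoning handling can be sketched as two small helpers: one folds streamed deltas into separate reasoning and answer buffers, the other adds reasoning_effort to a request only when an effort level is chosen. Both names are hypothetical, and the delta field names follow the streaming format described earlier on this page.

```typescript
type ReasoningEffort = "low" | "medium" | "high";

interface Delta {
  content?: string;
  reasoning_content?: string;
}

interface Accumulated {
  reasoning: string;
  content: string;
}

// Hypothetical accumulator: keeps the reasoning trace separate from the answer text.
function accumulateDeltas(deltas: Delta[]): Accumulated {
  const result: Accumulated = { reasoning: "", content: "" };
  for (const d of deltas) {
    if (d.reasoning_content) result.reasoning += d.reasoning_content;
    if (d.content) result.content += d.content;
  }
  return result;
}

// Include reasoning_effort only when set, so models without reasoning support
// never receive the field (assumption: such providers may reject or ignore it).
function reasoningParams(effort?: ReasoningEffort): { reasoning_effort?: ReasoningEffort } {
  return effort ? { reasoning_effort: effort } : {};
}
```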
Implementation Details
OpenAI Compatibility: This endpoint follows the OpenAI Chat Completions API specification, so standard OpenAI client libraries can talk to it, provided they are given a custom fetch that performs the aTLS handshake and attestation verification.
Error Interpretation
The library translates common failures into user-facing error messages:
- Max tokens: “This request is larger than the model can process…”
- Auth failure: “Authorization failed. Check the bearer token…”
- Network failure: “Cannot connect to the provider. Please check…”
- CORS: “CORS error: The provider is blocking requests…”
- TLS/SSL: “TLS/SSL certificate error. Please verify…”
- Timeout: “Request timed out. The provider may be overloaded…”
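A classifier that maps failures to these messages could look like the sketch below. Only the message openings come from this page, and they stay truncated as documented; the detection heuristics (status codes and substring checks) and the names classifyError and interpretError are hypothetical. Note that browsers usually surface CORS failures as a generic "Failed to fetch" error, so a substring check for "cors" is a best-effort heuristic.

```typescript
type ErrorKind = "max_tokens" | "auth" | "network" | "cors" | "tls" | "timeout" | "unknown";

// Message openings as listed on this page (truncated in the source).
const MESSAGES: Record<Exclude<ErrorKind, "unknown">, string> = {
  max_tokens: "This request is larger than the model can process…",
  auth: "Authorization failed. Check the bearer token…",
  network: "Cannot connect to the provider. Please check…",
  cors: "CORS error: The provider is blocking requests…",
  tls: "TLS/SSL certificate error. Please verify…",
  timeout: "Request timed out. The provider may be overloaded…",
};

// Hypothetical heuristics; checks run from most to least specific.
function classifyError(status: number | undefined, message: string): ErrorKind {
  const text = message.toLowerCase();
  if (status === 401) return "auth";
  if (status === 400 && (text.includes("max_tokens") || text.includes("context length"))) {
    return "max_tokens";
  }
  if (text.includes("cors")) return "cors";
  if (text.includes("certificate") || text.includes("ssl") || text.includes("tls")) return "tls";
  if (text.includes("timeout") || text.includes("timed out")) return "timeout";
  if (text.includes("failed to fetch") || text.includes("network")) return "network";
  return "unknown";
}

// Falls back to the raw message when no rule matches.
function interpretError(status: number | undefined, message: string): string {
  const kind = classifyError(status, message);
  return kind === "unknown" ? message : MESSAGES[kind];
}
```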
Implementation: frontend/lib/confidential-chat.ts