Errors

This page standardizes the HTTP responses and OpenAI-style error bodies produced by the unified inference pipeline. Use it for frontend/backend integration, monitoring, and alert routing.

1. Error object schema (OpenAI-style)

All errors return an application/json body:

{
  "error": {
    "type": "invalid_request_error | rate_limit_exceeded | service_unavailable | timeout | internal_error | server_error",
    "code": "unsupported_provider | executor_binding_validation_failed | model_not_found | invalid_api_key | permission_denied | rate_limit_exceeded | model_fetch_error | internal_error | service_unavailable | orchestrator_missing | closed_source_service_unavailable | timeout",
    "message": "Human-readable description",
    "param": "optional, when a specific field is invalid",
    "provider": "optional, provider/model involved",
    "request_id": "optional, for tracing"
  }
}
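As a minimal client-side sketch of the schema above, the snippet below parses an error body into a typed object. The names `ApiError` and `parse_error_body` are hypothetical (they do not come from the pipeline), and treating `code` and `message` as possibly absent is a defensive assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Hypothetical client-side view of the `error` object."""
    type: str
    code: Optional[str]
    message: str
    param: Optional[str] = None       # optional per the schema
    provider: Optional[str] = None    # optional per the schema
    request_id: Optional[str] = None  # optional per the schema


def parse_error_body(raw: str) -> ApiError:
    """Parse an OpenAI-style error body; missing optional fields become None."""
    err = json.loads(raw)["error"]
    return ApiError(
        type=err["type"],
        code=err.get("code"),
        message=err.get("message", ""),
        param=err.get("param"),
        provider=err.get("provider"),
        request_id=err.get("request_id"),
    )


# Build a sample body matching the 400 example on this page.
sample = json.dumps({
    "error": {
        "type": "invalid_request_error",
        "code": "unsupported_provider",
        "message": "Provider 'acme-llm' is not supported.",
    }
})
parsed = parse_error_body(sample)
print(parsed.type, parsed.code)
```

Keeping the optional fields as `None` rather than empty strings lets callers distinguish "not sent" from "sent but empty" when logging `request_id` or `param`.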

2. Status code matrix (overview)

| HTTP | Source (entry) | error.type | error.code | Typical trigger | Fallback notes |
|---|---|---|---|---|---|
| 400 Bad Request | UnifiedInferenceService | invalid_request_error | unsupported_provider | Request explicitly selects an unsupported provider | Client error; no fallback |
| 400 Bad Request | ExecutorBindingValidatorService | invalid_request_error | executor_binding_validation_failed | API key is bound to an incompatible executor type | Stop and fix binding |
| 400 / 404 | ExecutorBindingValidatorService | invalid_request_error | model_not_found | Model info not found during validation | Check model config/binding |
| 401 Unauthorized | SimpleErrorClassifier | invalid_request_error | invalid_api_key | Auth failure / invalid API key | Update client credentials |
| 403 Forbidden | SimpleErrorClassifier | invalid_request_error | permission_denied | User binding/permission check failed | No fallback |
| 404 Not Found | SimpleErrorClassifier | invalid_request_error | model_not_found | Model does not exist or is not accessible | Confirm model id |
| 429 Too Many Requests | sendErrorResponse / SimpleErrorClassifier | rate_limit_exceeded | rate_limit_exceeded | Rate limit or quota exceeded | Fallbackable |
| 500 Internal Server Error | UnifiedInferenceService | internal_error | model_fetch_error | Failed to fetch model list from storage | Infra investigation |
| 500 Internal Server Error | sendErrorResponse | server_error | internal_error | Unclassified internal errors | Fallbackable |
| 502 Bad Gateway | sendErrorResponse | service_unavailable | | Executor or downstream failed (keyword matched) | Fallbackable |
| 503 Service Unavailable | UnifiedInferenceService | service_unavailable | orchestrator_missing / closed_source_service_unavailable | Orchestrator not injected / closed-source gRPC client missing | Check dependencies; retry/fallback |
| 503 Service Unavailable | sendErrorResponse | service_unavailable | service_unavailable | Keywords like “no healthy executors”, “service unavailable” | Fallback to other providers |
| 504 Gateway Timeout | sendErrorResponse | timeout | timeout | Executor selection or downstream call timed out | Fallbackable |

3. Primary sources & mapping logic

3.1 UnifiedInferenceService

On HttpException, pass through status and body.
validateOrchestrator finds no injected orchestrator → 503 service_unavailable (orchestrator_missing).
Closed-source gRPC client missing → 503 service_unavailable (closed_source_service_unavailable).
Unsupported provider → 400 invalid_request_error (unsupported_provider).
getModelList storage call failed → 500 internal_error (model_fetch_error).
sendErrorResponse uses mapErrorToHttpStatus (keyword heuristics):
- contains “no healthy executors” / “service unavailable” → 503
- contains “rate limit” / “quota” → 429
- contains “timeout” → 504
- contains “invalid” / “bad request” → 400
- otherwise → 500
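The keyword heuristic above can be sketched as follows. This is an illustrative Python rendering, not the service's actual code: the name `map_error_to_http_status` simply mirrors `mapErrorToHttpStatus`, and both case-insensitive matching and first-match-wins ordering (in the order listed) are assumptions.

```python
def map_error_to_http_status(message: str) -> int:
    """Map an error message to an HTTP status by substring match.

    Assumptions: matching is case-insensitive and the rules are checked
    in the order listed on this page, so the first hit wins.
    """
    text = message.lower()
    if "no healthy executors" in text or "service unavailable" in text:
        return 503
    if "rate limit" in text or "quota" in text:
        return 429
    if "timeout" in text:
        return 504
    if "invalid" in text or "bad request" in text:
        return 400
    return 500  # unclassified


print(map_error_to_http_status("No healthy executors available in region 'us-east'."))
print(map_error_to_http_status("Monthly quota exhausted"))
print(map_error_to_http_status("something unexpected"))
```

One property of plain substring matching worth keeping in mind: in this sketch a message such as "Downstream call timed out" does not contain the literal keyword "timeout" and would fall through to 500, so message wording and keyword lists need to stay aligned.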

3.2 ExecutorBindingValidatorService

On validation failure, constructs an OpenAI-compatible error body and throws HttpException.
Typical cases: missing model binding; executor type mismatch.

3.3 SimpleErrorClassifier & Formatter

For HttpException, preserve original status; map to UnifiedErrorType:
- 401 → invalid_api_key
- 403 → permission_denied
- 404 → model_not_found
- 429 → rate_limit_exceeded
- other 4xx → invalid_request
- any 5xx → internal_error
For generic (non-HttpException) errors, classify by keywords using the same heuristic as in 3.1, then emit an OpenAI-style body.
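The status-to-type mapping for HttpException can be sketched as a small lookup. The function name `classify_status` is hypothetical, and the returned strings are the UnifiedErrorType labels exactly as listed above; rejecting statuses outside 4xx/5xx is an assumption of this sketch.

```python
# Statuses that map to a dedicated type, as listed above.
EXACT_TYPES = {
    401: "invalid_api_key",
    403: "permission_denied",
    404: "model_not_found",
    429: "rate_limit_exceeded",
}


def classify_status(status: int) -> str:
    """Return the unified error type label for an HTTP error status."""
    if status in EXACT_TYPES:
        return EXACT_TYPES[status]
    if 400 <= status < 500:
        return "invalid_request"   # other 4xx
    if 500 <= status < 600:
        return "internal_error"    # any 5xx
    # Assumption: non-error statuses are never classified.
    raise ValueError(f"not an error status: {status}")


for code in (401, 404, 422, 503):
    print(code, classify_status(code))
```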

4. Fallback policy

shouldAttemptFallback treats the following HTTP statuses as fallbackable: 500, 502, 503, 504, 429.

Additionally, if the error message contains any of:

TIMEOUT, CONNECTION_ERROR, SERVICE_UNAVAILABLE,
RATE_LIMITED, EXECUTOR_UNAVAILABLE, LOAD_BALANCING_FAILED

then fallback logic is engaged (try other providers/executors per policy).
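A minimal sketch of this decision, assuming the status check runs first and the marker check is a case-sensitive substring match on the message (both are assumptions; the name `should_attempt_fallback` only mirrors `shouldAttemptFallback`):

```python
from typing import Optional

FALLBACK_STATUSES = frozenset({429, 500, 502, 503, 504})
FALLBACK_MARKERS = (
    "TIMEOUT",
    "CONNECTION_ERROR",
    "SERVICE_UNAVAILABLE",
    "RATE_LIMITED",
    "EXECUTOR_UNAVAILABLE",
    "LOAD_BALANCING_FAILED",
)


def should_attempt_fallback(status: Optional[int], message: str = "") -> bool:
    """Return True when the error is eligible for provider/executor fallback."""
    if status in FALLBACK_STATUSES:
        return True
    # Assumption: markers are matched as exact-case substrings.
    return any(marker in message for marker in FALLBACK_MARKERS)


print(should_attempt_fallback(503))
print(should_attempt_fallback(400, "unsupported provider"))
print(should_attempt_fallback(None, "EXECUTOR_UNAVAILABLE: pool drained"))
```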

Client guidance: Frontends/SDKs should detect these statuses/codes and apply retry with jittered backoff and provider/model fallback, where appropriate.
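One way a client might combine retry and provider fallback is sketched below. Everything here is illustrative: `call_with_fallback`, `backoff_delay`, the `send` callable (which returns a `(status, body)` pair), and the default base, cap, and attempt values are assumptions, not part of any SDK. The delay uses "full jitter": a uniform random value between 0 and the capped exponential bound.

```python
import random
import time
from typing import Any, Callable, Optional, Sequence, Tuple

RETRYABLE = frozenset({429, 500, 502, 503, 504})
Result = Tuple[str, int, Any]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_fallback(send: Callable[[str], Tuple[int, Any]],
                       providers: Sequence[str],
                       max_attempts: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> Optional[Result]:
    """Try each provider in order, retrying fallbackable statuses with backoff.

    Returns (provider, status, body) for the first success, for the first
    non-retryable error, or for the last failure once all options are exhausted.
    """
    last: Optional[Result] = None
    for provider in providers:
        for attempt in range(max_attempts):
            status, body = send(provider)
            if status < 400:
                return provider, status, body
            last = (provider, status, body)
            if status not in RETRYABLE:
                return last  # client error: neither retry nor fallback helps
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    return last


def demo_send(provider: str) -> Tuple[int, Any]:
    """Stand-in transport: 'primary' is always down, 'backup' succeeds."""
    if provider == "primary":
        return 503, {"error": {"type": "service_unavailable"}}
    return 200, {"ok": True}


# Sleep is stubbed out so the demo finishes instantly.
print(call_with_fallback(demo_send, ["primary", "backup"], sleep=lambda _s: None))
```

Injecting `sleep` and `rng` keeps the helper testable without real delays. A production client would typically also honor any server-provided retry hint, which this sketch does not model.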

5. Monitoring & alerting (minimal checklist)

Group by status, error.type, error.code, provider, model, region.
Create SLOs on: success rate, P50/P95 latency, fallback success rate, timeout ratio.
Set high-urgency alerts for sustained spikes in 5xx/service_unavailable/timeout, and for “no healthy executors”.
Track 429 separately (quota/rate policies) and surface actionable guidance to users.
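As a rough illustration of the grouping and the timeout ratio, the sketch below aggregates in-memory events. The event dictionaries, their field names, and the helpers `group_key` and `timeout_ratio` are hypothetical; a real deployment would normally express the same grouping as labels in its metrics system.

```python
from collections import Counter
from typing import Iterable

GROUP_FIELDS = ("status", "type", "code", "provider", "model", "region")


def group_key(event: dict) -> tuple:
    """Build the grouping tuple; missing dimensions become None."""
    return tuple(event.get(field) for field in GROUP_FIELDS)


def timeout_ratio(events: Iterable[dict]) -> float:
    """Share of events whose error type is 'timeout' (0.0 for no events)."""
    items = list(events)
    if not items:
        return 0.0
    return sum(1 for e in items if e.get("type") == "timeout") / len(items)


# Hypothetical sample: two successes, one timeout, one rate limit.
events = [
    {"status": 200, "provider": "p1", "model": "m1", "region": "us-east"},
    {"status": 200, "provider": "p1", "model": "m1", "region": "us-east"},
    {"status": 504, "type": "timeout", "code": "timeout",
     "provider": "p1", "model": "m1", "region": "us-east"},
    {"status": 429, "type": "rate_limit_exceeded", "code": "rate_limit_exceeded",
     "provider": "p2", "model": "m2", "region": "us-east"},
]

counts = Counter(group_key(e) for e in events)
print(counts.most_common(1))
print(timeout_ratio(events))
```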

6. Example error bodies

400 · unsupported_provider

{
  "error": {
    "type": "invalid_request_error",
    "code": "unsupported_provider",
    "message": "Provider 'acme-llm' is not supported."
  }
}

401 · invalid_api_key

{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_api_key",
    "message": "The API key is invalid or expired."
  }
}

429 · rate_limit_exceeded

{
  "error": {
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry later."
  }
}

503 · service_unavailable (no healthy executors)

{
  "error": {
    "type": "service_unavailable",
    "code": "service_unavailable",
    "message": "No healthy executors available in region 'us-east'."
  }
}

504 · timeout

{
  "error": {
    "type": "timeout",
    "code": "timeout",
    "message": "Downstream call timed out after 30s."
  }
}

