Errors
This page standardizes the HTTP responses and OpenAI-style error bodies produced by the unified inference pipeline. Use it for frontend/backend integration, monitoring, and alert routing.
1. Error object schema (OpenAI-style)
All errors return an application/json body:
{
  "error": {
    "type": "invalid_request_error | rate_limit_exceeded | service_unavailable | timeout | internal_error | server_error",
    "code": "unsupported_provider | executor_binding_validation_failed | model_not_found | invalid_api_key | permission_denied | rate_limit_exceeded | model_fetch_error | internal_error | service_unavailable | orchestrator_missing | closed_source_service_unavailable | timeout",
    "message": "Human-readable description",
    "param": "optional, when a specific field is invalid",
    "provider": "optional, provider/model involved",
    "request_id": "optional, for tracing"
  }
}
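For client code, the schema above can be written as types plus a runtime check. This is a minimal sketch: the literal values are copied from the schema, while the names `ErrorType`, `ErrorCode`, `ErrorDetail`, `ErrorBody`, and `isErrorBody` are illustrative, and treating `type`, `code`, and `message` as always present is an assumption (the schema only marks the other three fields as optional).

```typescript
// Literal values come from the schema; all type and function names are illustrative.
type ErrorType =
  | "invalid_request_error"
  | "rate_limit_exceeded"
  | "service_unavailable"
  | "timeout"
  | "internal_error"
  | "server_error";

type ErrorCode =
  | "unsupported_provider"
  | "executor_binding_validation_failed"
  | "model_not_found"
  | "invalid_api_key"
  | "permission_denied"
  | "rate_limit_exceeded"
  | "model_fetch_error"
  | "internal_error"
  | "service_unavailable"
  | "orchestrator_missing"
  | "closed_source_service_unavailable"
  | "timeout";

interface ErrorDetail {
  type: ErrorType;
  code: ErrorCode;
  message: string;
  param?: string;      // set when a specific field is invalid
  provider?: string;   // provider/model involved
  request_id?: string; // for tracing
}

interface ErrorBody {
  error: ErrorDetail;
}

// Structural check only: verifies that the three assumed-required fields are
// strings, without validating them against the literal unions above.
function isErrorBody(value: unknown): value is ErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const err = (value as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const fields = err as Record<string, unknown>;
  return (
    typeof fields.type === "string" &&
    typeof fields.code === "string" &&
    typeof fields.message === "string"
  );
}
```

A caller would typically run `isErrorBody` on the parsed response body before branching on `error.code`.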
2. Status code matrix (overview)
| HTTP | Source (entry) | error.type | error.code | Typical trigger | Fallback notes |
| --- | --- | --- | --- | --- | --- |
| 400 Bad Request | UnifiedInferenceService | invalid_request_error | unsupported_provider | Request explicitly selects an unsupported provider | Client error; no fallback |
| 400 Bad Request | ExecutorBindingValidatorService | invalid_request_error | executor_binding_validation_failed | API key is bound to an incompatible executor type | Stop and fix binding |
| 400 / 404 | ExecutorBindingValidatorService | invalid_request_error | model_not_found | Model info not found during validation | Check model config/binding |
| 401 Unauthorized | SimpleErrorClassifier | invalid_request_error | invalid_api_key | Auth failure / invalid API key | Update client credentials |
| 403 Forbidden | SimpleErrorClassifier | invalid_request_error | permission_denied | User binding/permission check failed | No fallback |
| 404 Not Found | SimpleErrorClassifier | invalid_request_error | model_not_found | Model does not exist or is not accessible | Confirm model id |
| 429 Too Many Requests | sendErrorResponse / SimpleErrorClassifier | rate_limit_exceeded | rate_limit_exceeded | Rate limit or quota exceeded | Fallbackable |
| 500 Internal Server Error | UnifiedInferenceService | internal_error | model_fetch_error | Failed to fetch model list from storage | Infra investigation |
| 500 Internal Server Error | sendErrorResponse | server_error | internal_error | Unclassified internal errors | Fallbackable |
| 502 Bad Gateway | sendErrorResponse | service_unavailable | service_unavailable | Executor or downstream failed (keyword matched) | Fallbackable |
| 503 Service Unavailable | UnifiedInferenceService | service_unavailable | orchestrator_missing / closed_source_service_unavailable | Orchestrator not injected / closed-source gRPC client missing | Check dependencies; retry/fallback |
| 503 Service Unavailable | sendErrorResponse | service_unavailable | service_unavailable | Keywords like “no healthy executors”, “service unavailable” | Fallback to other providers |
| 504 Gateway Timeout | sendErrorResponse | timeout | timeout | Executor selection or downstream call timed out | Fallbackable |
3. Primary sources & mapping logic
3.1 UnifiedInferenceService
On HttpException, pass through status and body.
validateOrchestrator finds that the orchestrator was not injected → 503 service_unavailable (orchestrator_missing).
Closed-source gRPC client missing → 503 service_unavailable (closed_source_service_unavailable).
Unsupported provider → 400 invalid_request_error (unsupported_provider).
getModelList storage call failed → 500 internal_error (model_fetch_error).
sendErrorResponse uses mapErrorToHttpStatus (keyword heuristics):
  contains “no healthy executors” / “service unavailable” → 503
  contains “rate limit” / “quota” → 429
  contains “timeout” → 504
  contains “invalid” / “bad request” → 400
  otherwise → 500
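The keyword heuristic above can be sketched as an ordered rule table. The keywords and status codes are taken from the list; the case-insensitive matching, the top-to-bottom precedence when several keywords appear, and the `STATUS_RULES` name are assumptions, not confirmed details of the real mapErrorToHttpStatus.

```typescript
// Ordered rules: the first matching pattern wins (precedence is an assumption).
const STATUS_RULES: Array<{ pattern: RegExp; status: number }> = [
  { pattern: /no healthy executors|service unavailable/i, status: 503 },
  { pattern: /rate limit|quota/i, status: 429 },
  { pattern: /timeout/i, status: 504 },
  { pattern: /invalid|bad request/i, status: 400 },
];

// Maps an error message to an HTTP status by keyword; falls back to 500.
function mapErrorToHttpStatus(message: string): number {
  for (const rule of STATUS_RULES) {
    if (rule.pattern.test(message)) return rule.status;
  }
  return 500;
}
```

Under these assumptions, a message worded as "timed out" does not contain the keyword "timeout" and would fall through to 500. Whether the real implementation handles that wording is not stated here.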
3.2 ExecutorBindingValidatorService
On validation failure, constructs an OpenAI-compatible error body and throws HttpException.
Typical cases: missing model binding; executor type mismatch.
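As a rough illustration of the executor type mismatch case, the sketch below builds the error body and throws it with a 400 status. The `HttpException` class here is a local stand-in, since this page does not show which framework class is used; `assertExecutorCompatible` and the executor type strings are hypothetical.

```typescript
interface ErrorBody {
  error: { type: string; code: string; message: string };
}

// Local stand-in for the framework exception (assumption: the real class
// carries a numeric status and a response body).
class HttpException extends Error {
  readonly status: number;
  readonly body: ErrorBody;

  constructor(body: ErrorBody, status: number) {
    super(body.error.message);
    this.status = status;
    this.body = body;
  }
}

// Hypothetical validator: throws when the key's executor type does not match
// the type the model requires; returns normally otherwise.
function assertExecutorCompatible(boundType: string, requiredType: string): void {
  if (boundType === requiredType) return;
  throw new HttpException(
    {
      error: {
        type: "invalid_request_error",
        code: "executor_binding_validation_failed",
        message: `API key is bound to executor type '${boundType}', but '${requiredType}' is required.`,
      },
    },
    400,
  );
}
```

Because the body is already in the OpenAI-style shape, an entry point that passes HttpException through unchanged can return it to the client as-is.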
3.3 SimpleErrorClassifier & Formatter
For HttpException, preserve original status; map to UnifiedErrorType:
  401 → invalid_api_key
  403 → permission_denied
  404 → model_not_found
  429 → rate_limit_exceeded
  other 4xx → invalid_request
  any 5xx → internal_error
For generic errors, classify by keywords (same heuristic as above), then emit OpenAI-style body.
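The status-to-type mapping for HttpException can be sketched as a small function. The mapping values are taken from the list above; the function name `classifyHttpStatus`, the string-union form of `UnifiedErrorType`, and the handling of statuses outside 4xx/5xx (which the list does not cover) are assumptions.

```typescript
type UnifiedErrorType =
  | "invalid_api_key"
  | "permission_denied"
  | "model_not_found"
  | "rate_limit_exceeded"
  | "invalid_request"
  | "internal_error";

// Maps the preserved HTTP status of an HttpException to a UnifiedErrorType.
function classifyHttpStatus(status: number): UnifiedErrorType {
  switch (status) {
    case 401:
      return "invalid_api_key";
    case 403:
      return "permission_denied";
    case 404:
      return "model_not_found";
    case 429:
      return "rate_limit_exceeded";
  }
  if (status >= 400 && status < 500) return "invalid_request";
  // 5xx per the list above; anything else also lands here (assumption).
  return "internal_error";
}
```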
4. Fallback policy
shouldAttemptFallback treats the following HTTP statuses as fallbackable: 500, 502, 503, 504, 429.
Fallback is also attempted (other providers/executors are tried per policy) when the error message contains any of the following markers:
TIMEOUT, CONNECTION_ERROR, SERVICE_UNAVAILABLE,
RATE_LIMITED, EXECUTOR_UNAVAILABLE, LOAD_BALANCING_FAILED
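A minimal sketch of this decision follows. The status list and message markers are copied from the policy above; the function signature, the case-sensitive marker match, and the constant names are assumptions rather than the actual shouldAttemptFallback code.

```typescript
// Statuses treated as fallbackable, as listed in the policy.
const FALLBACK_STATUSES: readonly number[] = [500, 502, 503, 504, 429];

// Message markers that also trigger fallback (case-sensitive match is an assumption).
const FALLBACK_MARKERS =
  /TIMEOUT|CONNECTION_ERROR|SERVICE_UNAVAILABLE|RATE_LIMITED|EXECUTOR_UNAVAILABLE|LOAD_BALANCING_FAILED/;

// Returns true when either the status or the message qualifies for fallback.
function shouldAttemptFallback(status: number | undefined, message: string): boolean {
  if (status !== undefined && FALLBACK_STATUSES.indexOf(status) !== -1) return true;
  return FALLBACK_MARKERS.test(message);
}
```

Accepting an undefined status lets the same check cover generic errors that never reached an HTTP layer and carry only a message.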
Client guidance: Frontends/SDKs should detect these statuses/codes and apply retry with jittered backoff and provider/model fallback, where appropriate.
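On the client side, jittered backoff could look like the following. It uses the common "full jitter" scheme, where the delay is drawn uniformly between zero and an exponentially growing ceiling. The function name, the 250 ms base, and the 8 s cap are illustrative defaults, not values prescribed by this pipeline.

```typescript
// Full-jitter backoff: uniform delay in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the function can be tested deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs: number = 250,
  capMs: number = 8000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

A client would wait `backoffDelayMs(attempt)` milliseconds before each retry of a fallbackable status, and switch provider or model once its retry budget is spent.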
5. Monitoring & alerting (minimal checklist)
Group by status, error.type, error.code, provider, model, region.
Create SLOs on: success rate, P50/P95 latency, fallback success rate, timeout ratio.
Set high-urgency alerts for sustained spikes in 5xx/service_unavailable/timeout, and for “no healthy executors”.
Track 429 separately (quota/rate policies) and surface actionable guidance to users.
6. Example error bodies
400 · unsupported_provider
{
  "error": {
    "type": "invalid_request_error",
    "code": "unsupported_provider",
    "message": "Provider 'acme-llm' is not supported."
  }
}
401 · invalid_api_key
{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_api_key",
    "message": "The API key is invalid or expired."
  }
}
429 · rate_limit_exceeded
{
  "error": {
    "type": "rate_limit_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please retry later."
  }
}
503 · no healthy executors
{
  "error": {
    "type": "service_unavailable",
    "code": "service_unavailable",
    "message": "No healthy executors available in region 'us-east'."
  }
}
504 · timeout
{
  "error": {
    "type": "timeout",
    "code": "timeout",
    "message": "Downstream call timed out after 30s."
  }
}