Gateway Strategy & Fallback (Gateway ↔ Executor)
What is a Strategy?
In SightAI Gateway, a strategy defines the mandatory decision path for handling a request after the User API Key has been resolved to its Binding.
Exclusive: each request follows exactly one strategy path.
Specifies:
Routing behavior (which Provider Key / Channel is used).
Failure semantics (what to do on 4xx / 5xx / timeout / rate-limit).
Decision authority (Gateway vs. Executor).
Goals: auditable, predictable handling; fair attribution for Brokers and Users.
Rollout note: Strategy A is the default and must be supported first. Strategies B and C can be enabled progressively behind flags.
Available Strategies
A) Strict Binding, No Fallback (default)
Routing: Always use the bound Provider Key in the Binding.
Failure: Fail immediately — no intra-channel retries, no cross-channel substitution.
Decision: Enforced by Binding policy (default ON).
Use case: Broker protection, clean auditing.
Outcome: STRICT_OK / STRICT_FAIL.
B) Intra-Channel Fallback (Executor-driven)
Routing: The Executor may attempt substitute keys within the same Channel.
Modes:
KEYSET_ONLY (recommended mode when Intra-Channel fallback is enabled; the capability itself defaults to OFF): only keys under the same Provider Account.
CHANNEL_WIDE (optional): any healthy key in the Channel (may change attribution).
Failure: If (limited) retries are exhausted, return failure.
Decision: Requires Provider permission and User opt-in; concrete attempts are performed by the Executor.
Outcome: INTRA_OK / INTRA_FAIL.
C) Cross-Channel Fallback (Gateway-orchestrated)
Routing: If the primary Channel fails, the Gateway may attempt one backup Channel.
Failure: If the backup fails, stop — no second cross-Channel attempt.
Decision: Requires Provider permission and User opt-in.
Constraint: Backup must be in the intersection of Provider and User allow-lists.
Outcome: XCHANNEL_OK / XCHANNEL_FAIL.
Configuration
Provider Config (capability / upper bounds)
# Intra-channel fallback capability
# One of "OFF" | "KEYSET_ONLY" | "CHANNEL_WIDE"; default: "OFF"
# (quoted so YAML 1.1 parsers do not read a bare OFF as a boolean)
intraChannelFallback: "OFF"
# Cross-channel fallback capability
crossChannelFallback:
  enabled: false # default: false
  allowList: [] # e.g., ["azure-openai"]
# Platform caps (non-configurable by tenants)
platformCaps:
  intraMaxRetries: 1 # typical default cap
Notes
Provider grants permission, not obligation. Users still decide whether to use these capabilities.
Platform may cap attempts (e.g., 1 intra-Channel retry) to protect latency.
User Config (choice within Provider’s permission)
strictBinding: true # default: true
allowIntraChannel: false # default: false
allowCrossChannel:
  enabled: false # default: false
  preferredBackup: null # e.g., "azure-openai"
Effective policy = intersection of Provider and User configs.
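The intersection rule can be sketched in a few lines of Python. This is a minimal illustration, not the Gateway's actual code: the class and function names are hypothetical, and because the User config shown above carries only preferredBackup, the sketch assumes that single value acts as the User's allow-list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProviderConfig:
    intra_channel_fallback: str = "OFF"  # OFF | KEYSET_ONLY | CHANNEL_WIDE
    cross_enabled: bool = False
    cross_allow_list: Tuple[str, ...] = ()


@dataclass(frozen=True)
class UserConfig:
    strict_binding: bool = True
    allow_intra_channel: bool = False
    allow_cross_enabled: bool = False
    preferred_backup: Optional[str] = None


@dataclass(frozen=True)
class EffectivePolicy:
    strict: bool                 # True means Strategy A applies
    intra_mode: str              # OFF | KEYSET_ONLY | CHANNEL_WIDE
    cross_backup: Optional[str]  # None means Cross-Channel is not in effect


def effective_policy(provider: ProviderConfig, user: UserConfig) -> EffectivePolicy:
    """Intersect the Provider's permissions with the User's choices."""
    if user.strict_binding:
        return EffectivePolicy(True, "OFF", None)

    # Intra: the User opts in, the Provider decides the mode.
    intra = provider.intra_channel_fallback if user.allow_intra_channel else "OFF"

    # Cross: both sides must enable it and agree on the backup Channel.
    # Assumption: preferredBackup is the only Channel the User allows.
    backup = None
    if provider.cross_enabled and user.allow_cross_enabled:
        if user.preferred_backup in provider.cross_allow_list:
            backup = user.preferred_backup

    # With neither fallback in effect, the request behaves as Strategy A.
    strict = intra == "OFF" and backup is None
    return EffectivePolicy(strict, intra, backup)
```

Note that a User who turns strictBinding off but is granted nothing by the Provider still ends up on Strategy A, which matches the Branch A triggers.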
Request Lifecycle & Control Flow
Entry & Preparation
Gateway receives request (with User API Key).
Binding resolution: Look up Binding from cache (incl. bindingVersion); cache-miss triggers authoritative fetch + fill.
Collect configs:
Provider: intraChannelFallback, crossChannelFallback.enabled/allowList.
User: strictBinding, allowIntraChannel, allowCrossChannel.enabled/preferredBackup.
Compute Effective Strategy:
If strictBinding = ON ⇒ Strategy A directly.
Else:
Effective Intra = Provider permits AND User allows (mode per Provider).
Effective Cross = Provider permits AND User allows AND backup ∈ both allow-lists.
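Attempt budget: at most 2 Gateway-level upstream attempts per request (1 primary + 1 Cross-Channel backup). Intra-Channel substitutions run inside the primary attempt's Executor and are capped separately (platformCaps.intraMaxRetries).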
Time budget: each attempt has its own upstream timeout; intra-Channel retries must finish within Executor’s overall time budget.
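The attempt and time budgets above could be tracked with two small helpers. The sketch below is illustrative only: AttemptBudget, Deadline, and attempt_timeout are hypothetical names, and the timeout values a deployment would pass in are not specified by this document.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class AttemptBudget:
    """Counts Gateway-level attempts: the primary plus one backup."""
    max_attempts: int = 2
    used: int = 0

    def try_consume(self) -> bool:
        """Reserve one attempt; returns False once the budget is spent."""
        if self.used >= self.max_attempts:
            return False
        self.used += 1
        return True


@dataclass(frozen=True)
class Deadline:
    """An overall time budget, e.g. the Executor's budget for one attempt."""
    expires_at: float

    @classmethod
    def after(cls, seconds: float,
              now: Callable[[], float] = time.monotonic) -> "Deadline":
        return cls(now() + seconds)

    def remaining(self, now: Callable[[], float] = time.monotonic) -> float:
        return max(0.0, self.expires_at - now())


def attempt_timeout(per_attempt_s: float, overall: Deadline,
                    now: Callable[[], float] = time.monotonic) -> float:
    """Timeout for the next upstream call: the per-attempt limit, clipped
    so that intra-Channel retries cannot outlive the overall budget."""
    return min(per_attempt_s, overall.remaining(now))
```

Passing the clock as a parameter keeps the helpers testable; a monotonic clock avoids surprises from wall-clock adjustments.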
Step 1 — Primary Attempt
Gateway calls the bound Provider Key / Channel via the Channel’s Executor.
On success → return immediately.
Outcome: STRICT_OK (strategy A) or simply “success” before any fallback is considered.
On failure → branch by effective policy.
Branch A — Strict Binding (default)
Triggers:
strictBinding = ON, or
Effective Intra = false and Effective Cross = false.
Behavior:
Executor MUST NOT swap keys within Channel.
Gateway MUST NOT attempt any backup Channel.
Return failure immediately.
Outcome: STRICT_FAIL (error class e.g., STRICT_KEY_UNAVAILABLE or upstream passthrough).
Branch B — Intra-Channel Fallback (Executor)
Triggers (after primary fails):
Effective Intra = true.
Behavior:
Executor tries limited substitutions within the same Channel:
KEYSET_ONLY (same Provider Account) — recommended to preserve Broker attribution.
CHANNEL_WIDE (any healthy key in Channel) — optional, may affect attribution.
On success → return immediately; Outcome: INTRA_OK.
On failure → return to Gateway; Outcome: INTRA_FAIL (e.g., INTRA_CHANNEL_FALLBACK_EXHAUSTED).
If Intra succeeds, the flow ends. If it fails (or is not enabled), evaluate Cross-Channel (C).
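A minimal sketch of how an Executor might pick substitute keys under the two modes follows. ProviderKey, its healthy flag, and the send callback are hypothetical stand-ins; skipping unhealthy keys in KEYSET_ONLY mode is also an assumption, since the text only requires health for CHANNEL_WIDE.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ProviderKey:
    key_id: str
    account_id: str   # the Provider Account that owns the key
    healthy: bool = True


def substitute_candidates(bound: ProviderKey,
                          channel_keys: Sequence[ProviderKey],
                          mode: str) -> List[ProviderKey]:
    """Keys the Executor may try after the bound key has failed."""
    if mode == "OFF":
        return []
    # Never retry the key that just failed; skip keys marked unhealthy.
    pool = [k for k in channel_keys if k.key_id != bound.key_id and k.healthy]
    if mode == "KEYSET_ONLY":
        # Stay inside the same Provider Account to preserve attribution.
        pool = [k for k in pool if k.account_id == bound.account_id]
    return pool


def intra_channel_fallback(bound: ProviderKey,
                           channel_keys: Sequence[ProviderKey],
                           mode: str,
                           send: Callable[[ProviderKey], bool],
                           max_retries: int = 1
                           ) -> Tuple[str, Optional[ProviderKey]]:
    """Try at most max_retries substitutes; send(key) is a hypothetical
    upstream call that returns True on success."""
    for key in substitute_candidates(bound, channel_keys, mode)[:max_retries]:
        if send(key):
            return "INTRA_OK", key
    return "INTRA_FAIL", None
```

The default of one retry mirrors the platformCaps.intraMaxRetries example value; the key that finally served the request is returned so it can be logged as providerKeyUsed.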
Branch C — Cross-Channel Fallback (Gateway)
Triggers (only if Intra not enabled or failed):
Effective Cross = true, and still within the 2-attempt budget.
Behavior:
Gateway selects one backup Channel from the Provider ∩ User allow-list (considering preferredBackup).
Gateway calls that Channel’s Executor once (no intra retries on the backup attempt).
On success → XCHANNEL_OK.
On failure → XCHANNEL_FAIL (e.g., CROSS_CHANNEL_FAILED).
If policy forbids Cross → POLICY_BLOCKED (CROSS_CHANNEL_FORBIDDEN).
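Branch C can be sketched as a backup selection followed by exactly one call. The function names, the call_executor callback, and the result tuple shape are hypothetical. The sketch also assumes that an empty Provider ∩ User intersection is reported the same way as a policy block, which the text does not state explicitly.

```python
from typing import Callable, Optional, Sequence, Tuple

# (outcome, errorClass, channel that was tried)
Result = Tuple[str, Optional[str], Optional[str]]


def select_backup(provider_allow: Sequence[str],
                  user_allow: Sequence[str],
                  preferred: Optional[str],
                  primary: str) -> Optional[str]:
    """Pick one backup Channel from the Provider/User intersection."""
    eligible = [c for c in provider_allow if c in user_allow and c != primary]
    if preferred in eligible:
        return preferred
    # Assumption: without a usable preference, take the Provider's order.
    return eligible[0] if eligible else None


def cross_channel_fallback(cross_permitted: bool,
                           provider_allow: Sequence[str],
                           user_allow: Sequence[str],
                           preferred: Optional[str],
                           primary: str,
                           call_executor: Callable[[str], bool]) -> Result:
    """Run the single backup attempt; call_executor(channel) is a
    hypothetical call into that Channel's Executor, True on success."""
    backup = None
    if cross_permitted:
        backup = select_backup(provider_allow, user_allow, preferred, primary)
    if backup is None:
        return ("POLICY_BLOCKED", "CROSS_CHANNEL_FORBIDDEN", None)
    # Exactly one call: no intra retries and no second backup Channel.
    if call_executor(backup):
        return ("XCHANNEL_OK", None, backup)
    return ("XCHANNEL_FAIL", "CROSS_CHANNEL_FAILED", backup)
```

Because the function calls the Executor at most once, a primary attempt followed by this fallback stays within the 2-attempt budget by construction.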
Outcomes & Error Taxonomy
Client-visible error classes (stable & readable)
STRICT_KEY_UNAVAILABLE — Strict mode; bound key unavailable; fallback disallowed.
INTRA_CHANNEL_FALLBACK_EXHAUSTED — Allowed intra-Channel substitutions all failed.
CROSS_CHANNEL_FORBIDDEN — Cross-Channel fallback not permitted by policy.
CROSS_CHANNEL_FAILED — One backup Channel was tried and failed.
UPSTREAM_PASSTHROUGH — Upstream 401/403/429/5xx surfaced.
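One way to keep these classes stable in code is a string-valued enum plus a small payload builder. The sketch below is an assumption about shape, not the Gateway's wire format: the client_error function and the upstreamStatus field are hypothetical, while the outcome mapping follows the branch descriptions earlier in this document.

```python
from enum import Enum
from typing import Dict, Optional, Union


class ErrorClass(str, Enum):
    STRICT_KEY_UNAVAILABLE = "STRICT_KEY_UNAVAILABLE"
    INTRA_CHANNEL_FALLBACK_EXHAUSTED = "INTRA_CHANNEL_FALLBACK_EXHAUSTED"
    CROSS_CHANNEL_FORBIDDEN = "CROSS_CHANNEL_FORBIDDEN"
    CROSS_CHANNEL_FAILED = "CROSS_CHANNEL_FAILED"
    UPSTREAM_PASSTHROUGH = "UPSTREAM_PASSTHROUGH"


# Outcome implied by each class. UPSTREAM_PASSTHROUGH is left out because
# its outcome depends on which branch surfaced the upstream error.
OUTCOME_FOR = {
    ErrorClass.STRICT_KEY_UNAVAILABLE: "STRICT_FAIL",
    ErrorClass.INTRA_CHANNEL_FALLBACK_EXHAUSTED: "INTRA_FAIL",
    ErrorClass.CROSS_CHANNEL_FORBIDDEN: "POLICY_BLOCKED",
    ErrorClass.CROSS_CHANNEL_FAILED: "XCHANNEL_FAIL",
}


def is_passthrough_status(status: int) -> bool:
    """Upstream statuses surfaced as-is: 401, 403, 429, and any 5xx."""
    return status in (401, 403, 429) or 500 <= status <= 599


def client_error(error_class: ErrorClass,
                 upstream_status: Optional[int] = None
                 ) -> Dict[str, Union[str, int]]:
    """Build a client-visible error body (hypothetical shape)."""
    body: Dict[str, Union[str, int]] = {"errorClass": error_class.value}
    if error_class is ErrorClass.UPSTREAM_PASSTHROUGH:
        if upstream_status is None or not is_passthrough_status(upstream_status):
            raise ValueError("passthrough needs a 401/403/429/5xx status")
        body["upstreamStatus"] = upstream_status
    return body
```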
Logging & Audit
Each request logs at minimum:
bindingId, bindingVersion
strategyPath (A|B|C)
finalChannel
providerAccountUsed, providerKeyUsed
outcome (STRICT_OK/FAIL, INTRA_OK/FAIL, XCHANNEL_OK/FAIL, POLICY_BLOCKED)
errorClass
latency
These fields form Merkle leaves for epoch snapshots and drive accurate revenue attribution.
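As an illustration of how such a record could become a Merkle leaf, the sketch below serializes the fields deterministically and hashes them. The document does not specify the leaf encoding, so the canonical JSON form and SHA-256 are assumptions, as are the AuditRecord type, the leaf_hash function, and the millisecond unit for latency.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    bindingId: str
    bindingVersion: int
    strategyPath: str            # "A", "B", or "C"
    finalChannel: str
    providerAccountUsed: str
    providerKeyUsed: str
    outcome: str                 # e.g. "STRICT_OK", "INTRA_FAIL"
    errorClass: Optional[str]    # None on success
    latency: int                 # assumed to be milliseconds


def leaf_hash(record: AuditRecord) -> str:
    """Hash a record into a hex leaf value.

    Sorted keys and fixed separators make the bytes independent of
    field order and whitespace, so equal records give equal leaves.
    """
    canonical = json.dumps(asdict(record), sort_keys=True,
                           separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()
```

Deterministic serialization matters here: if two nodes encoded the same record differently, their epoch snapshots would disagree even though the underlying data matched.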
Typical Presets
Broker-First Protection
Provider: Intra=OFF, Cross=OFF
User: Strict=ON
→ Strategy A only. Primary failure returns immediately.
Keyset-Only Resilience
Provider: Intra=KEYSET_ONLY, Cross=OFF
User: Strict=OFF, AllowIntra=ON
→ If primary fails, try one key within the same Provider Account. Success ⇒ INTRA_OK; otherwise INTRA_FAIL.
Cross-Cloud Escape Hatch
Provider: Cross=ON (e.g., allow "azure-openai")
User: Strict=OFF, AllowIntra=ON (optional), AllowCross=ON (prefers "azure-openai")
→ If primary fails: attempt B (if enabled); if B fails or is not enabled, attempt C exactly once. Success on C ⇒ XCHANNEL_OK; failure ⇒ XCHANNEL_FAIL.
Implementation Notes & Guardrails
Keep attempt budget and timeouts conservative to avoid hidden latency tails.
Prefer KEYSET_ONLY for Intra to preserve Broker attribution unless there’s an explicit need for Channel-wide resilience.
Ensure audit fields are written before returning, for both success and failure paths.
Treat Binding cache as authoritative for the request lifetime; include bindingVersion in logs to anchor auditability.
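The Binding handling described in this document (cache lookup, authoritative fetch and fill on a miss, one version pinned for the request lifetime) might look like the sketch below. Binding, BindingCache, and the fetch callback are hypothetical names; eviction and TTL handling are deliberately left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Binding:
    binding_id: str
    binding_version: int
    provider_key: str
    channel: str


class BindingCache:
    """Read-through cache keyed by User API Key."""

    def __init__(self, fetch: Callable[[str], Binding]) -> None:
        self._fetch = fetch                  # authoritative lookup
        self._entries: Dict[str, Binding] = {}

    def resolve(self, user_api_key: str) -> Binding:
        """Return the cached Binding; on a miss, fetch it and fill."""
        binding = self._entries.get(user_api_key)
        if binding is None:
            binding = self._fetch(user_api_key)
            self._entries[user_api_key] = binding
        return binding

    def invalidate(self, user_api_key: str) -> None:
        """Drop an entry so the next request refetches it."""
        self._entries.pop(user_api_key, None)
```

Because Binding is immutable, a request that resolves it once at entry keeps the same bindingId and bindingVersion for routing and for its audit log, even if the cache entry is invalidated while the request is in flight.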