Gateway Strategy & Fallback (Gateway ↔ Executor)

What is a Strategy?

In SightAI Gateway, a strategy defines the mandatory decision path for handling a request after the User API Key has been resolved to its Binding.

  • Exclusive: each request follows exactly one strategy path.

  • Specifies:

    • Routing behavior (which Provider Key / Channel is used).

    • Failure semantics (what to do on 4xx / 5xx / timeout / rate-limit).

    • Decision authority (Gateway vs. Executor).

  • Goals: auditable, predictable handling; fair attribution for Brokers and Users.

Rollout note: Strategy A is the default and must be supported first. Strategies B and C can be enabled progressively behind flags.


Available Strategies

A) Strict Binding, No Fallback (default)

  • Routing: Always use the bound Provider Key in the Binding.

  • Failure: Fail immediately — no intra-channel retries, no cross-channel substitution.

  • Decision: Enforced by Binding policy (default ON).

  • Use case: Broker protection, clean auditing.

  • Outcome: STRICT_OK / STRICT_FAIL.


B) Intra-Channel Fallback (Executor-driven)

  • Routing: The Executor may attempt substitute keys within the same Channel.

  • Modes:

    • KEYSET_ONLY (recommended default): only keys under the same Provider Account.

    • CHANNEL_WIDE (optional): any healthy key in the Channel (may change attribution).

  • Failure: If the limited retries (capped by the platform, e.g., intraMaxRetries) are exhausted, the Executor returns failure to the Gateway.

  • Decision: Requires Provider permission and User opt-in; concrete attempts are performed by the Executor.

  • Outcome: INTRA_OK / INTRA_FAIL.


C) Cross-Channel Fallback (Gateway-orchestrated)

  • Routing: If the primary Channel fails, the Gateway may attempt one backup Channel.

  • Failure: If the backup fails, stop — no second cross-Channel attempt.

  • Decision: Requires Provider permission and User opt-in.

  • Constraint: Backup must be in the intersection of Provider and User allow-lists.

  • Outcome: XCHANNEL_OK / XCHANNEL_FAIL.


Configuration

Provider Config (capability / upper bounds)

# Intra-channel fallback capability
intraChannelFallback: OFF                                # one of: OFF | KEYSET_ONLY | CHANNEL_WIDE (default: OFF)

# Cross-channel fallback capability
crossChannelFallback:
  enabled: false                                         # default: false
  allowList: []                                          # e.g., ["azure-openai"]

# Platform caps (non-configurable by tenants)
platformCaps:
  intraMaxRetries: 1                                     # typical default cap

Notes

  • Provider grants permission, not obligation. Users still decide whether to use these capabilities.

  • Platform may cap attempts (e.g., 1 intra-Channel retry) to protect latency.

User Config (choice within Provider’s permission)

strictBinding: true                                      # default: true

allowIntraChannel: false                                 # default: false

allowCrossChannel:
  enabled: false                                         # default: false
  preferredBackup: null                                  # e.g., "azure-openai"

Effective policy = intersection of Provider and User configs.
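
The intersection can be sketched in Python as follows. This is a minimal illustration, not the Gateway's actual code: the class and function names (ProviderConfig, UserConfig, EffectivePolicy, effective_policy) are hypothetical, and the fields simply mirror the config keys above. It also assumes that the User's preferredBackup acts as a one-entry allow-list, since the User config shown here has no separate allowList field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProviderConfig:
    intra_channel_fallback: str = "OFF"  # OFF | KEYSET_ONLY | CHANNEL_WIDE
    cross_enabled: bool = False
    cross_allow_list: List[str] = field(default_factory=list)


@dataclass
class UserConfig:
    strict_binding: bool = True
    allow_intra_channel: bool = False
    allow_cross_enabled: bool = False
    preferred_backup: Optional[str] = None


@dataclass
class EffectivePolicy:
    strict: bool
    intra_mode: str              # "OFF" when intra-Channel fallback is not in effect
    cross_backup: Optional[str]  # None when cross-Channel fallback is not in effect


def effective_policy(provider: ProviderConfig, user: UserConfig) -> EffectivePolicy:
    # strictBinding short-circuits everything else (Strategy A).
    if user.strict_binding:
        return EffectivePolicy(strict=True, intra_mode="OFF", cross_backup=None)

    # Intra: Provider permits AND User allows; the mode comes from the Provider.
    intra_mode = provider.intra_channel_fallback if user.allow_intra_channel else "OFF"

    # Cross: both sides enabled AND the backup is acceptable to both.
    # Assumption: preferredBackup is treated as the User's one-entry allow-list.
    cross_backup = None
    if (provider.cross_enabled and user.allow_cross_enabled
            and user.preferred_backup in provider.cross_allow_list):
        cross_backup = user.preferred_backup

    # With neither fallback in effect, the request behaves as Strategy A.
    strict = intra_mode == "OFF" and cross_backup is None
    return EffectivePolicy(strict=strict, intra_mode=intra_mode, cross_backup=cross_backup)
```

Note that a User who turns strictBinding off but whose Provider grants nothing still ends up on the strict path, which matches the "permission, not obligation" rule in the notes above.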


Request Lifecycle & Control Flow

Entry & Preparation

  1. Gateway receives request (with User API Key).

  2. Binding resolution: Look up the Binding in the cache (including its bindingVersion); a cache miss triggers an authoritative fetch and a cache fill.

  3. Collect configs:

    • Provider: intraChannelFallback, crossChannelFallback.enabled/allowList.

    • User: strictBinding, allowIntraChannel, allowCrossChannel.enabled/preferredBackup.

  4. Compute Effective Strategy:

    • If strictBinding = ON ⇒ Strategy A directly.

    • Else:

      • Effective Intra = Provider permits AND User allows (mode per Provider).

      • Effective Cross = Provider permits AND User allows AND backup ∈ both allow-lists.

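    • Attempt budget: at most 2 Gateway-level upstream attempts per request (1 primary + 1 Cross-Channel backup). Intra-Channel retries happen inside the Executor and are capped separately (see platformCaps.intraMaxRetries).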

    • Time budget: each attempt has its own upstream timeout; intra-Channel retries must finish within Executor’s overall time budget.
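
The Binding resolution in step 2 is a read-through cache lookup. A minimal Python sketch, assuming a hypothetical BindingCache helper and a caller-supplied fetch function that stands in for the authoritative lookup (neither name comes from the Gateway itself):

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Binding:
    binding_id: str
    binding_version: int
    channel: str
    provider_key: str


class BindingCache:
    """Read-through cache keyed by User API Key (hypothetical helper)."""

    def __init__(self, fetch: Callable[[str], Binding]):
        self._fetch = fetch  # authoritative lookup, supplied by the caller
        self._entries: Dict[str, Binding] = {}

    def resolve(self, user_api_key: str) -> Binding:
        binding = self._entries.get(user_api_key)
        if binding is None:
            # Cache miss: fetch from the authoritative store, then fill.
            binding = self._fetch(user_api_key)
            self._entries[user_api_key] = binding
        return binding
```

Expiry and invalidation on bindingVersion changes are deliberately left out of this sketch; the document does not specify how they work.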

Step 1 — Primary Attempt

  • Gateway calls the bound Provider Key / Channel via the Channel’s Executor.

  • On success → return immediately.

    Outcome: STRICT_OK under Strategy A; under any other policy it is a plain success, because no fallback was considered.

  • On failure → branch by effective policy.


Branch A — Strict Binding (default)

Triggers:

  • strictBinding = ON, or

  • Effective Intra = false and Effective Cross = false.

Behavior:

  • Executor MUST NOT swap keys within Channel.

  • Gateway MUST NOT attempt any backup Channel.

  • Return failure immediately.

Outcome: STRICT_FAIL (error class, e.g., STRICT_KEY_UNAVAILABLE or UPSTREAM_PASSTHROUGH).


Branch B — Intra-Channel Fallback (Executor)

Triggers (after primary fails):

  • Effective Intra = true.

Behavior:

  • Executor tries limited substitutions within the same Channel:

    • KEYSET_ONLY (same Provider Account) — recommended to preserve Broker attribution.

    • CHANNEL_WIDE (any healthy key in Channel) — optional, may affect attribution.

  • On success → return immediately; Outcome: INTRA_OK.

  • On failure → return to Gateway; Outcome: INTRA_FAIL (e.g., INTRA_CHANNEL_FALLBACK_EXHAUSTED).

If Intra succeeds, the flow ends. If it fails (or is not enabled), evaluate Cross-Channel (C).
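
The Executor-side substitution described in Branch B might look like the Python sketch below. The names ProviderKey, substitute_candidates, and intra_channel_fallback are hypothetical, as is the call callback (it stands in for one upstream attempt and returns True on success). The max_retries default of 1 follows the platform cap shown in the Provider config; the Executor's time budget is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ProviderKey:
    key_id: str
    provider_account: str
    healthy: bool = True


def substitute_candidates(bound: ProviderKey, channel_keys: List[ProviderKey],
                          mode: str) -> List[ProviderKey]:
    """Keys the Executor may try after the bound key has failed."""
    if mode == "OFF":
        return []
    others = [k for k in channel_keys if k.key_id != bound.key_id and k.healthy]
    if mode == "KEYSET_ONLY":
        # Stay inside the bound key's Provider Account to preserve attribution.
        return [k for k in others if k.provider_account == bound.provider_account]
    if mode == "CHANNEL_WIDE":
        return others
    raise ValueError(f"unknown intra-channel mode: {mode}")


def intra_channel_fallback(
    bound: ProviderKey,
    channel_keys: List[ProviderKey],
    mode: str,
    call: Callable[[ProviderKey], bool],
    max_retries: int = 1,
) -> Tuple[str, Optional[ProviderKey]]:
    """Return ("INTRA_OK", key_used) or ("INTRA_FAIL", None)."""
    for key in substitute_candidates(bound, channel_keys, mode)[:max_retries]:
        if call(key):
            return "INTRA_OK", key
    return "INTRA_FAIL", None
```

Returning the key that was actually used lets the caller record providerAccountUsed and providerKeyUsed for attribution.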


Branch C — Cross-Channel Fallback (Gateway)

Triggers (only if Intra is not enabled or has failed):

  • Effective Cross = true, and still within the 2-attempt budget.

Behavior:

  • Gateway selects one backup Channel from the Provider ∩ User allow-list (considering preferredBackup).

  • Gateway calls that Channel’s Executor once (no intra retries on the backup attempt).

  • On success → XCHANNEL_OK.

  • On failure → XCHANNEL_FAIL (e.g., CROSS_CHANNEL_FAILED).

  • If policy forbids Cross → POLICY_BLOCKED (CROSS_CHANNEL_FORBIDDEN).
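
Putting Branches A, B, and C together, the Gateway-side control flow can be sketched in Python as below. The function name and the three callbacks are hypothetical stand-ins for the primary attempt, the Executor's intra-Channel fallback, and the single backup-Channel attempt. Two points are assumptions rather than stated behavior: the "SUCCESS" label for a non-strict primary success, and the choice to report INTRA_FAIL (rather than POLICY_BLOCKED) when Intra fails and Cross is not in effect.

```python
from typing import Callable, Optional, Tuple


def handle_request(
    strict: bool,
    intra_enabled: bool,
    cross_backup: Optional[str],
    call_primary: Callable[[], bool],
    call_intra: Callable[[], bool],
    call_backup: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Return (outcome, error_class) for one request."""
    # Branch A applies when strict, or when no fallback is in effect.
    strict_path = strict or (not intra_enabled and cross_backup is None)

    # Step 1: primary attempt (Gateway-level attempt 1 of 2).
    if call_primary():
        return ("STRICT_OK" if strict_path else "SUCCESS", None)

    if strict_path:
        return ("STRICT_FAIL", "STRICT_KEY_UNAVAILABLE")

    # Branch B: Executor-driven substitution within the same Channel.
    if intra_enabled:
        if call_intra():
            return ("INTRA_OK", None)
        if cross_backup is None:
            return ("INTRA_FAIL", "INTRA_CHANNEL_FALLBACK_EXHAUSTED")

    # Branch C: exactly one backup Channel (Gateway-level attempt 2 of 2).
    if call_backup(cross_backup):
        return ("XCHANNEL_OK", None)
    return ("XCHANNEL_FAIL", "CROSS_CHANNEL_FAILED")
```

Because the backup attempt is the last statement on every path that reaches it, the 2-attempt budget holds by construction rather than by a counter.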


Outcomes & Error Taxonomy

Client-visible error classes (stable & readable)

  • STRICT_KEY_UNAVAILABLE — Strict mode; bound key unavailable; fallback disallowed.

  • INTRA_CHANNEL_FALLBACK_EXHAUSTED — Allowed intra-Channel substitutions all failed.

  • CROSS_CHANNEL_FORBIDDEN — Cross-Channel fallback not permitted by policy.

  • CROSS_CHANNEL_FAILED — One backup Channel was tried and failed.

  • UPSTREAM_PASSTHROUGH — Upstream 401/403/429/5xx surfaced.
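
For clients and services that consume these classes, a small Python sketch of the taxonomy as an enum, plus a check for the upstream statuses listed under UPSTREAM_PASSTHROUGH. The enum and helper names are illustrative assumptions; only the string values and status codes come from the list above.

```python
from enum import Enum


class ErrorClass(str, Enum):
    STRICT_KEY_UNAVAILABLE = "STRICT_KEY_UNAVAILABLE"
    INTRA_CHANNEL_FALLBACK_EXHAUSTED = "INTRA_CHANNEL_FALLBACK_EXHAUSTED"
    CROSS_CHANNEL_FORBIDDEN = "CROSS_CHANNEL_FORBIDDEN"
    CROSS_CHANNEL_FAILED = "CROSS_CHANNEL_FAILED"
    UPSTREAM_PASSTHROUGH = "UPSTREAM_PASSTHROUGH"


def is_passthrough_status(status: int) -> bool:
    """True for upstream statuses surfaced as-is: 401, 403, 429, or any 5xx."""
    return status in (401, 403, 429) or 500 <= status <= 599
```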


Logging & Audit

Each request logs at minimum:

  • bindingId, bindingVersion

  • strategyPath (A|B|C)

  • finalChannel

  • providerAccountUsed, providerKeyUsed

  • outcome (STRICT_OK/FAIL, INTRA_OK/FAIL, XCHANNEL_OK/FAIL, POLICY_BLOCKED)

  • errorClass

  • latency

These fields form Merkle leaves for epoch snapshots and drive accurate revenue attribution.
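
One way to turn a log record into a Merkle leaf is to hash a canonical serialization of its fields. The Python sketch below is an assumption throughout: the document does not specify the hash function, the encoding, or the latency unit, so SHA-256 over sorted-key JSON and milliseconds are used purely for illustration, and AuditRecord and merkle_leaf are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    binding_id: str
    binding_version: int
    strategy_path: str          # "A" | "B" | "C"
    final_channel: str
    provider_account_used: str
    provider_key_used: str
    outcome: str
    error_class: Optional[str]  # None on success
    latency_ms: int             # unit is an assumption


def merkle_leaf(record: AuditRecord) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so that equal
    # records always produce the same leaf hash.
    payload = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Whatever scheme is actually used, the serialization must be deterministic; otherwise two verifiers could derive different leaves from the same log entry.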


Typical Presets

  1. Broker-First Protection

  • Provider: Intra=OFF, Cross=OFF

  • User: Strict=ON

    → Strategy A only. Primary failure returns immediately.

  2. Keyset-Only Resilience

  • Provider: Intra=KEYSET_ONLY, Cross=OFF

  • User: Strict=OFF, AllowIntra=ON

    → If primary fails, try one key within the same Provider Account. Success ⇒ INTRA_OK; otherwise INTRA_FAIL.

  3. Cross-Cloud Escape Hatch

  • Provider: Cross=ON (e.g., allow "azure-openai")

  • User: Strict=OFF, AllowIntra=ON (optional), AllowCross=ON (prefers "azure-openai")

    → If primary fails: attempt B if enabled; if B fails or is not enabled, attempt C exactly once. Success on C ⇒ XCHANNEL_OK; failure ⇒ XCHANNEL_FAIL.


Implementation Notes & Guardrails

  • Keep attempt budget and timeouts conservative to avoid hidden latency tails.

  • Prefer KEYSET_ONLY for Intra to preserve Broker attribution unless there’s an explicit need for Channel-wide resilience.

  • Ensure audit fields are written before returning, for both success and failure paths.

  • Treat the Binding resolved at entry as authoritative for the lifetime of the request (do not re-resolve it mid-request); include bindingVersion in logs to anchor auditability.
