Case Lifecycle

A case represents one disrupted flight; a sub-case represents one PNR (Passenger Name Record) within that case — typically one passenger or a family group. The case orchestrator is the central state machine of the platform; everything else exists to advance a sub-case from PENDING to RESOLVED.

Two levels of state

The case-level state is a coarse summary. The interesting machine is at the sub-case level — every PNR moves independently through it.

The case (macro disruption)

A case is created in one of two ways:

  • Auto / event-driven — the flight predictor publishes a FlightDisrupted event after detecting a cancellation or significant delay. The case orchestrator consumes the event and provisions the case + sub-cases automatically.
  • Manual / on-demand — an operations supervisor declares a disruption from the operator console (POST /v1/cases/manual). Used for localized issues the predictor missed.

In both flows the manifest is fetched from the airline's PSS (Amadeus, Sabre, or custom) via the airline adapter, sub-cases are bulk-inserted in PENDING, and the operator sees the case live in their dashboard within seconds.
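The provisioning step described above can be sketched as follows. This is a minimal illustration, not the platform's code: the `Case` and `SubCase` classes, the `provision_case` function, and the manifest row shape (`pnr`, `name`) are all assumptions made for the example. It shows one sub-case being created per PNR, each starting in PENDING.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class SubCase:
    pnr: str
    passengers: list[str]
    status: str = "PENDING"  # every sub-case starts in PENDING


@dataclass
class Case:
    case_id: str
    flight: str
    status: str = "OPEN"  # manifest provisioned, no operator action yet
    sub_cases: list[SubCase] = field(default_factory=list)


def provision_case(flight: str, manifest: list[dict]) -> Case:
    """Group manifest rows by PNR and create one PENDING sub-case per PNR."""
    by_pnr: dict[str, list[str]] = defaultdict(list)
    for row in manifest:  # hypothetical row shape: {"pnr": ..., "name": ...}
        by_pnr[row["pnr"]].append(row["name"])
    case = Case(case_id=str(uuid4()), flight=flight)
    case.sub_cases = [SubCase(pnr, names) for pnr, names in by_pnr.items()]
    return case


# Usage: a family of two on one PNR plus a solo traveller gives two sub-cases.
manifest = [
    {"pnr": "ABC123", "name": "A. Rivera"},
    {"pnr": "ABC123", "name": "B. Rivera"},
    {"pnr": "XYZ789", "name": "C. Okafor"},
]
case = provision_case("XX100", manifest)
```

In a real deployment the manifest would come from the airline adapter and the insert would be a bulk database write; both are left out here.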

| State | Meaning |
|---|---|
| OPEN | Manifest provisioned. Sub-cases are PENDING. Awaiting first operator action. |
| IN_PROGRESS | At least one sub-case has moved past PENDING. |
| CLOSED | Every sub-case is in a terminal state (RESOLVED or REJECTED_BY_PAX after manual handoff). |
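Because the case state is a summary of its sub-cases, it can be derived rather than stored. The sketch below assumes a hypothetical `derive_case_state` function that looks only at current sub-case statuses. The set of terminal statuses is a parameter, because the "REJECTED_BY_PAX after manual handoff" case is not modelled here.

```python
from typing import Iterable


def derive_case_state(
    sub_case_statuses: Iterable[str],
    terminal: frozenset = frozenset({"RESOLVED"}),  # assumption: adjust as needed
) -> str:
    """Summarise sub-case statuses into OPEN, IN_PROGRESS, or CLOSED."""
    statuses = list(sub_case_statuses)
    if statuses and all(s in terminal for s in statuses):
        return "CLOSED"
    if any(s != "PENDING" for s in statuses):
        return "IN_PROGRESS"
    return "OPEN"


# Usage
print(derive_case_state(["PENDING", "PENDING"]))     # OPEN
print(derive_case_state(["PENDING", "PROCESSING"]))  # IN_PROGRESS
print(derive_case_state(["RESOLVED", "RESOLVED"]))   # CLOSED
```

One limitation of deriving from current statuses alone: if every sub-case returns to PENDING after rework, this function reports OPEN again. A system that wants IN_PROGRESS to be sticky would need to persist the case state.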

The sub-case (the saga unit)

Every transition is enforced by a declarative workflow definition. Side effects — audit, snapshot publishing, real-time UI fan-out, saga compensation — are wired as handlers attached to specific transitions, not inline action functions. Conditions (operator lock, concurrency check) are typed and explicit.
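One way to picture "conditions and handlers attached to transitions" is the sketch below. The `Transition`, `Context`, and `fire` names, and the two example handlers, are hypothetical and chosen only to illustrate the shape: the transition itself is data, conditions are checked first, and side effects run only once all conditions pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Context:
    caller: str
    lock_holder: Optional[str]
    expected_version: int
    current_version: int
    log: list = field(default_factory=list)  # stands in for real side effects


Condition = Callable[[Context], bool]
Handler = Callable[[Context], None]


def holds_operator_lock(ctx: Context) -> bool:
    return ctx.lock_holder == ctx.caller


def version_matches(ctx: Context) -> bool:
    return ctx.expected_version == ctx.current_version


def write_audit(ctx: Context) -> None:
    ctx.log.append("audit")


def fan_out_to_ui(ctx: Context) -> None:
    ctx.log.append("ui_fan_out")


@dataclass(frozen=True)
class Transition:
    source: str
    event: str
    target: str
    conditions: tuple = ()
    handlers: tuple = ()


SUBMIT = Transition(
    "PENDING", "SUBMIT", "PROCESSING",
    conditions=(holds_operator_lock, version_matches),
    handlers=(write_audit, fan_out_to_ui),
)


def fire(t: Transition, status: str, ctx: Context) -> str:
    """Check the source state and all conditions, run handlers, return the target."""
    if status != t.source:
        raise ValueError(f"{t.event} is not valid from {status}")
    failed = [c.__name__ for c in t.conditions if not c(ctx)]
    if failed:
        raise PermissionError(f"conditions failed: {failed}")
    for handler in t.handlers:
        handler(ctx)
    return t.target


# Usage: the caller holds the lock and the version matches.
ok_ctx = Context("alice", "alice", expected_version=3, current_version=3)
new_status = fire(SUBMIT, "PENDING", ok_ctx)
```

Keeping conditions and handlers as named functions on the transition makes each one independently testable, which is the main benefit over inline action code.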

States

| State | Meaning |
|---|---|
| PENDING | Initial state on ingestion. Awaiting operator action. Idle. |
| PROCESSING | Operator submitted compensation. The booking + wallet workflow is running. |
| OFFER_READY | Workflow succeeded. The passenger snapshot is updated. The passenger has been notified and can accept or decline. Idle. |
| RESOLVED | Passenger accepted the offer. Terminal. |
| REJECTED_BY_PAX | Passenger declined. Saga rollback succeeded (room released). Awaiting operator rework. Idle. |
| FAILED | Partner returned a permanent failure on initial booking. Awaiting operator fallback. Idle. |
| COMPENSATION_FAILED | Saga rollback could not be completed automatically: the cancel exhausted retries and the partner returned a permanent-failure code (e.g., non-refundable rate). Operator must reconcile manually. Idle. |

Transitions and events

| From | To | Event | Notes |
|---|---|---|---|
| PENDING | PROCESSING | SUBMIT | Operator submits compensation. Conditions: caller holds the operator lock; concurrency check passes. |
| PROCESSING | OFFER_READY | WALLET_ISSUED | Booking + wallet legs both succeeded. Triggers passenger snapshot update. |
| PROCESSING | FAILED | BOOKING_FAILED | Partner returned a permanent failure on the booking leg. |
| OFFER_READY | RESOLVED | OFFER_ACCEPTED | Passenger accepted from the mobile webapp. |
| OFFER_READY | REJECTED_BY_PAX | OFFER_DECLINED | Passenger declined. Saga compensation begins. |
| REJECTED_BY_PAX | PENDING | OPERATOR_REWORK | Operator picks an alternative. |
| REJECTED_BY_PAX | COMPENSATION_FAILED | COMPENSATION_UNRECOVERABLE | The cancel exhausted retries AND the partner returned a permanent-failure code. The case is parked in a dedicated reconciliation queue. |
| COMPENSATION_FAILED | PENDING | OPERATOR_RECONCILED | Operator manually reconciled with the partner (e.g., called the hotel) and marked the reconciliation complete. |
| FAILED | PENDING | OPERATOR_REWORK | Operator retries with different inputs. |

RESOLVED is the only terminal state. PENDING, OFFER_READY, REJECTED_BY_PAX, COMPENSATION_FAILED, and FAILED are idle — they can sit indefinitely while waiting on a passenger response or operator action.
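The transition table above maps directly onto a lookup keyed by (state, event). The sketch below encodes exactly the nine documented transitions; the `next_state` and `replay` helpers and the `InvalidTransition` exception are hypothetical names used for illustration.

```python
TRANSITIONS = {
    ("PENDING", "SUBMIT"): "PROCESSING",
    ("PROCESSING", "WALLET_ISSUED"): "OFFER_READY",
    ("PROCESSING", "BOOKING_FAILED"): "FAILED",
    ("OFFER_READY", "OFFER_ACCEPTED"): "RESOLVED",
    ("OFFER_READY", "OFFER_DECLINED"): "REJECTED_BY_PAX",
    ("REJECTED_BY_PAX", "OPERATOR_REWORK"): "PENDING",
    ("REJECTED_BY_PAX", "COMPENSATION_UNRECOVERABLE"): "COMPENSATION_FAILED",
    ("COMPENSATION_FAILED", "OPERATOR_RECONCILED"): "PENDING",
    ("FAILED", "OPERATOR_REWORK"): "PENDING",
}


class InvalidTransition(Exception):
    """Raised when an event is not defined for the current state."""


def next_state(status: str, event: str) -> str:
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise InvalidTransition(f"{event} is not valid from {status}") from None


def replay(events, start: str = "PENDING") -> str:
    """Apply a sequence of events and return the final state."""
    status = start
    for event in events:
        status = next_state(status, event)
    return status


# States that appear as a target but never as a source have no way out.
sources = {src for src, _ in TRANSITIONS}
terminal_states = set(TRANSITIONS.values()) - sources

# Usage
happy_path = replay(["SUBMIT", "WALLET_ISSUED", "OFFER_ACCEPTED"])
decline_path = replay(["SUBMIT", "WALLET_ISSUED", "OFFER_DECLINED", "OPERATOR_REWORK"])
```

Computing `terminal_states` from the table is a cheap consistency check: with the transitions as listed, it contains only RESOLVED.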

The saga (the happy path, end-to-end)

Key properties for customer confidence:

  • The operator returns immediately. The submit step records the intent and returns; every partner call happens out-of-process.
  • State and event are written together. The handler does both in a single durable step. There is no "we updated the database but lost the next step" failure mode.
  • Idempotent handlers. Every consumer evaluates the current sub-case status before acting — duplicate events are acknowledged and dropped.
  • Per-tenant isolation. Workflow topics, secrets, and data are tenant-scoped end-to-end. Cross-tenant interference is structurally impossible.
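The second and third properties can be illustrated together. The sketch below uses a transactional outbox in SQLite, which is an assumption: the text above does not say how the "single durable step" is implemented. The table names, the `on_wallet_issued` function, and the `OfferReady` event name are hypothetical. The status update is guarded by the expected current status, so a duplicate event changes nothing and emits nothing.

```python
import json
import sqlite3


def setup() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sub_case (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE outbox ("
        "seq INTEGER PRIMARY KEY AUTOINCREMENT, sub_case_id TEXT, event TEXT)"
    )
    db.execute("INSERT INTO sub_case VALUES ('sc-1', 'PROCESSING')")
    db.commit()
    return db


def on_wallet_issued(db: sqlite3.Connection, sub_case_id: str) -> bool:
    """Return True if the event advanced the sub-case, False for a duplicate."""
    with db:  # one transaction: both writes commit, or neither does
        cur = db.execute(
            "UPDATE sub_case SET status = 'OFFER_READY' "
            "WHERE id = ? AND status = 'PROCESSING'",
            (sub_case_id,),
        )
        if cur.rowcount == 0:
            return False  # already advanced: acknowledge and drop
        db.execute(
            "INSERT INTO outbox (sub_case_id, event) VALUES (?, ?)",
            (sub_case_id, json.dumps({"type": "OfferReady"})),
        )
        return True


# Usage: the same event delivered twice advances the sub-case only once.
db = setup()
first = on_wallet_issued(db, "sc-1")
second = on_wallet_issued(db, "sc-1")
```

A separate relay process would read the outbox table and publish the events. That part is omitted here.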

The decline path (saga rollback)

A passenger decline is the most operationally meaningful failure mode the platform has to handle gracefully.

Operators see two distinct queues for the two failure modes:

  • Manual rework (REJECTED_BY_PAX) — the original cancel succeeded; the operator just needs to find a new option.
  • Reconciliation (COMPENSATION_FAILED) — the original cancel failed permanently; the operator must call the partner directly, document, and mark the reconciliation complete before the sub-case re-enters the rework queue.

This distinction prevents the dangerous mistake of treating a "cancel failed" sub-case the same as a "passenger declined" sub-case. They look superficially similar but require radically different operator action.
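The routing between the two queues can be sketched as a retry loop around the partner cancel. Everything named here is an assumption for illustration: the `run_compensation` function, the three result codes, the queue names, and the attempt limit of 3. The source also does not say what happens when retries run out on transient errors only, so the `RETRY_LATER` outcome is a placeholder.

```python
from typing import Callable

OK, TRANSIENT, PERMANENT = "OK", "TRANSIENT", "PERMANENT"


def run_compensation(cancel: Callable[[], str], max_attempts: int = 3) -> str:
    """Try the partner cancel up to max_attempts times and pick a queue."""
    last = TRANSIENT
    for _ in range(max_attempts):
        last = cancel()
        if last == OK:
            return "REWORK_QUEUE"  # cancel succeeded: REJECTED_BY_PAX
    if last == PERMANENT:
        return "RECONCILIATION_QUEUE"  # retries exhausted AND permanent code
    return "RETRY_LATER"  # assumption: not specified in the text above


def scripted(*results: str) -> Callable[[], str]:
    """Build a fake cancel call that returns the given results in order."""
    it = iter(results)
    return lambda: next(it)


# Usage
recovered = run_compensation(scripted(TRANSIENT, OK))
stuck = run_compensation(scripted(PERMANENT, PERMANENT, PERMANENT))
```

The point of the sketch is that the reconciliation queue is reached only when both conditions hold, so a sub-case there always needs direct contact with the partner.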

Concurrency: the operator swarm

When dozens of airport agents open the same disrupted flight to start rebooking, two correctness problems surface immediately:

  1. Two operators editing the same passenger. Solved by the entity lock. When an operator clicks a sub-case, the operator API acquires the lock for that sub-case. The lock is coupled to the operator's live session; the moment the session ends, the lock releases within a fraction of a second. A short wall-clock fallback exists as defense-in-depth. The operator UI shows a lock icon on locked rows in real time, propagated to every operator across the cluster.

  2. Two operators booking the last room of the same hotel. Solved by the inventory soft-hold. Before submitting compensation, the booking engine acquires a per-attempt hold. The losing operator gets an instant "Hotel sold out" notification. The hold has a self-healing expiry — capacity is restored automatically if the operator abandons the flow, with no compensating action required.
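The entity lock from item 1 can be sketched as a lock table keyed by sub-case, with release on session end and a wall-clock expiry as the fallback. This is a single-process illustration under stated assumptions: the `EntityLocks` class, its method names, and the 30-second fallback are hypothetical, and a real cluster would need a shared store rather than an in-memory dict.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Lock:
    operator: str
    session_id: str
    expires_at: float


class EntityLocks:
    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._locks = {}  # sub_case_id -> Lock

    def acquire(self, sub_case_id: str, operator: str, session_id: str) -> bool:
        now = self._clock()
        held = self._locks.get(sub_case_id)
        if held and held.expires_at > now and held.session_id != session_id:
            return False  # someone else holds a live lock
        self._locks[sub_case_id] = Lock(operator, session_id, now + self._ttl)
        return True

    def on_session_end(self, session_id: str) -> None:
        """Drop every lock tied to the ended session."""
        self._locks = {
            k: v for k, v in self._locks.items() if v.session_id != session_id
        }

    def holder(self, sub_case_id: str) -> Optional[str]:
        held = self._locks.get(sub_case_id)
        if held and held.expires_at > self._clock():
            return held.operator
        return None


# Usage with a controllable clock
now = [0.0]
locks = EntityLocks(ttl_seconds=30.0, clock=lambda: now[0])
alice_got_it = locks.acquire("sc-1", "alice", "session-a")
bob_blocked = not locks.acquire("sc-1", "bob", "session-b")
locks.on_session_end("session-a")
bob_got_it = locks.acquire("sc-1", "bob", "session-b")
```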

There is no shared mutable inventory counter anywhere in the system. Inventory is computed from the live set of holds plus confirmed reservations.
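Computing availability from holds and reservations, rather than from a counter, might look like the sketch below. The `Inventory` class, the 120-second hold lifetime, and the method names are assumptions for illustration. It is single-process; a real system would need the check-and-hold step to be atomic in a shared store.

```python
import time
from typing import Callable


class Inventory:
    def __init__(self, capacity: int, hold_ttl: float = 120.0,
                 clock: Callable[[], float] = time.monotonic):
        self._capacity = capacity
        self._hold_ttl = hold_ttl
        self._clock = clock
        self._holds = {}         # attempt_id -> expiry time
        self._confirmed = set()  # attempt_ids with a confirmed reservation

    def available(self) -> int:
        """Capacity minus live holds minus confirmed reservations."""
        now = self._clock()
        live = sum(1 for expires in self._holds.values() if expires > now)
        return self._capacity - live - len(self._confirmed)

    def hold(self, attempt_id: str) -> bool:
        if self.available() <= 0:
            return False  # the losing operator sees "Hotel sold out"
        self._holds[attempt_id] = self._clock() + self._hold_ttl
        return True

    def confirm(self, attempt_id: str) -> bool:
        """Turn a still-live hold into a confirmed reservation."""
        expires = self._holds.pop(attempt_id, None)
        if expires is None or expires <= self._clock():
            return False
        self._confirmed.add(attempt_id)
        return True


# Usage: one room left, two operators, first hold abandoned.
now = [0.0]
hotel = Inventory(capacity=1, hold_ttl=120.0, clock=lambda: now[0])
first_hold = hotel.hold("attempt-1")
second_hold = hotel.hold("attempt-2")
now[0] = 121.0  # attempt-1 is abandoned and its hold lapses
free_again = hotel.available()
```

Expired holds are simply ignored when counting, which is what makes the expiry self-healing: nothing has to run to give the room back.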

Concurrent writes (the sub-case-level guarantee)

Sub-case writes use optimistic concurrency control. Every sub-case carries a version number that is checked on every update. If a concurrent writer already moved the sub-case forward between the operator's read and write, the second write fails with a structured 409 conflict and the operator UI surfaces "this case was just updated" — the operator reloads and sees the current state. No lost writes; no silent overwrites.
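A minimal sketch of the version check follows. The `SubCaseStore` class and the `VersionConflict` exception are hypothetical; the in-process lock stands in for whatever atomic compare-and-set the real datastore provides.

```python
import threading


class VersionConflict(Exception):
    """Maps to a structured HTTP 409 in an API layer."""
    status_code = 409


class SubCaseStore:
    def __init__(self):
        self._rows = {}  # sub_case_id -> (status, version)
        self._mutex = threading.Lock()

    def create(self, sub_case_id: str, status: str = "PENDING") -> None:
        with self._mutex:
            self._rows[sub_case_id] = (status, 1)

    def read(self, sub_case_id: str):
        with self._mutex:
            return self._rows[sub_case_id]

    def update(self, sub_case_id: str, new_status: str, expected_version: int) -> int:
        """Write only if nobody else has written since the caller's read."""
        with self._mutex:
            _, version = self._rows[sub_case_id]
            if version != expected_version:
                raise VersionConflict(
                    f"{sub_case_id}: expected v{expected_version}, found v{version}"
                )
            self._rows[sub_case_id] = (new_status, version + 1)
            return version + 1


# Usage: two operators read the same version, only the first write lands.
store = SubCaseStore()
store.create("sc-1")
_, seen_by_a = store.read("sc-1")
_, seen_by_b = store.read("sc-1")
store.update("sc-1", "PROCESSING", expected_version=seen_by_a)
try:
    store.update("sc-1", "PROCESSING", expected_version=seen_by_b)
    conflict = None
except VersionConflict as exc:
    conflict = exc  # the UI would show "this case was just updated"
```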
