Architecture Overview
Nexa is engineered for operational panic — the microscopic windows during a Tier-1 hub closure when dozens of agents compete for limited hotel inventory while tens of thousands of stranded passengers hit the platform from their phones. The architecture is built from four patterns that compose: event-driven workflows, read-side isolation for passenger traffic, distributed coordination for inventory and operator concurrency, and anti-corruption adapters for every external partner.
Runtime topology
The three API surfaces (BFFs)
Each external audience hits a dedicated Backend-For-Frontend. They share the same identity model and the same platform behind them, but their authorization, rate limits, and traffic patterns differ enough to deserve separate processes.
| Surface | Audience | Auth | Traffic shape |
|---|---|---|---|
| Operator API | Airline operators (ops console) | OIDC + role-based | High-complexity, transactional, low cardinality of actors |
| Passenger API | Passengers (mobile webapp) | Short-lived passenger tokens | Massive read fan-out — passengers refresh repeatedly during a disruption |
| Partner API | B2B integrators | OAuth2 client-credentials, scoped | Bursty, machine-to-machine, per-partner quotas |
See API Reference for endpoints, schemas, and webhook contracts.
Core domains
Each domain owns its data. Cross-domain communication is event-driven over the workflow bus — domains never share a database.
| Domain | Tenant scope | Owns | Partners |
|---|---|---|---|
| Cases | Tenant-scoped | Case + sub-case lifecycle, saga orchestration, durable workflow | None directly |
| Airline adapter | Tenant-scoped | PSS abstraction (Amadeus, Sabre, custom). Zero-persistence — cache only | Amadeus, Sabre, custom PSS |
| Policies | Tenant-scoped | Versioned policy engine, natural-language → structured policy synthesis | AI provider |
| Booking | Tenant-scoped | Inventory search, soft-holds, async fulfillment, saga compensation | Amadeus, Hotelbeds, direct contracts |
| Wallet | Tenant-scoped | Virtual prepaid card issuance, scheduled drops, reconciliation | Pomelo |
| Notifications | Tenant-scoped | Localized voucher delivery (SMS, email, WhatsApp), idempotent send | Twilio, SendGrid, Meta |
| Flight predictor | Shared platform | Dual-head cancel/delay model; ingests public flight + weather data | AeroAPI, AviationStack |
| Audit | Tenant-scoped | Append-only audit log with before/after snapshots | None |
The flight predictor is the only shared-platform domain — it consumes public flight and weather data, has no PII, and serves every tenant from a single deployment. Every other domain is deployed per-tenant for full isolation of data, secrets, and event topics.
The four resilience patterns
1. Event-driven async workflows
Operators never wait on a partner. The booking workflow does.
When an operator clicks "Issue Voucher":
- The case orchestrator records the operator's intent against the sub-case in a single durable step and returns an immediate acknowledgement.
- The booking engine picks up the work asynchronously, gated by per-partner adaptive traffic shaping so the platform never exceeds a partner's published rate limit.
- On success, the saga continues to the next step (wallet card issuance, passenger notification).
- On permanent failure, the saga runs compensation — any partial booking is rolled back automatically.
Two structural properties matter for customer confidence: state and event are committed together (no "we updated state but lost the next step" failure mode), and the operator UI never blocks on a partner.
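The first property — state and event committed together — can be sketched as a transactional outbox. This is a minimal illustration using SQLite as a stand-in for the operational store; the table names, field names, and command payloads are illustrative, not Nexa's actual schema.

```python
import json
import sqlite3

# Stand-in for the operational store. Schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sub_case (urn TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT);
""")
conn.execute("INSERT INTO sub_case VALUES ('urn:sub-case:sc-91a4', 'open')")
conn.commit()

def record_intent(conn, sub_case_urn, command):
    """Update state and enqueue the next workflow step in ONE transaction.

    If the process crashes anywhere inside this block, both writes roll
    back together -- there is no window where state changed but the
    follow-up event was lost.
    """
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE sub_case SET status = ? WHERE urn = ?",
            ("voucher-requested", sub_case_urn),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("booking.commands",
             json.dumps({"command": command, "sub_case": sub_case_urn})),
        )

record_intent(conn, "urn:sub-case:sc-91a4", "issue-voucher")
# A relay process would then drain the outbox rows onto the workflow bus.
```

The acknowledgement returned to the operator depends only on this local commit, which is why the UI never blocks on a partner.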
2. Read-side isolation for passenger traffic
The passenger surface reads from a denormalized snapshot store that is completely separate from the operational case database.
When a sub-case becomes ready for passenger acceptance, the case orchestrator compiles a lightweight passenger document — hotel address and QR code, transport instructions, voucher, card reveal link — and writes it to the snapshot store. Passenger refreshes hit only that store. Tens of thousands of concurrent passenger refreshes cause zero contention with the operator UI.
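The projection step can be sketched as follows. The document shape and field names are assumptions for illustration; the real snapshot store is a separate horizontally scalable service, here mocked as a dict.

```python
# Illustrative projection: the orchestrator flattens the operational
# sub-case into a self-contained document so passenger reads never touch
# the case database. All field names are assumptions.
def compile_passenger_document(sub_case: dict) -> dict:
    reservation = sub_case["reservation"]
    hotel = reservation["hotel"]
    return {
        "sub_case": sub_case["urn"],
        "hotel_name": hotel["name"],
        "hotel_address": hotel["address"],
        "qr_code": reservation["qr_code"],
        "transport": sub_case.get("transport_instructions"),
        "voucher": sub_case["voucher"]["urn"],
        "card_reveal_link": sub_case["wallet"]["reveal_url"],
    }

# Stand-in for the real snapshot store.
snapshot_store: dict[str, dict] = {}

def on_subcase_ready(sub_case: dict) -> None:
    doc = compile_passenger_document(sub_case)
    snapshot_store[doc["sub_case"]] = doc  # passenger refreshes read only this
```

Because the document is fully denormalized at write time, a passenger refresh is a single key lookup with no joins back into operational data.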
3. Distributed coordination
Two flavors of coordination, both invisible to customers:
- Entity locks prevent two operators from editing the same passenger simultaneously. The lock is coupled to the operator's live session — the moment the session ends, the lock releases. A wall-clock fallback exists as defense-in-depth, but the live-session signal is authoritative. The locked-by indicator propagates to every other operator's screen in real time.
- Inventory soft-holds prevent two operators from booking the last room. Available inventory is computed from the live set of holds plus confirmed reservations — there is no shared mutable counter that can drift under crash. Holds are self-healing: an abandoned attempt's capacity is restored automatically.
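The "computed, not counted" property behind soft-holds can be sketched like this. The class, TTL value, and method names are illustrative; the real coordination store is distributed, but the invariant is the same: availability is derived from live holds plus confirmed reservations, never from a mutable counter.

```python
import time
from typing import Optional

HOLD_TTL_SECONDS = 300  # illustrative TTL, not a real platform constant

class InventoryPool:
    def __init__(self, total_rooms: int):
        self.total = total_rooms
        self.holds: dict[str, float] = {}   # hold urn -> expiry timestamp
        self.confirmed: set[str] = set()    # confirmed reservation urns

    def _live_holds(self, now: float) -> int:
        # Expired holds simply stop counting: an abandoned attempt's
        # capacity is restored automatically (self-healing).
        self.holds = {u: exp for u, exp in self.holds.items() if exp > now}
        return len(self.holds)

    def available(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        return self.total - self._live_holds(now) - len(self.confirmed)

    def soft_hold(self, hold_urn: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.available(now) <= 0:
            return False  # another operator already holds the last room
        self.holds[hold_urn] = now + HOLD_TTL_SECONDS
        return True

    def confirm(self, hold_urn: str, reservation_urn: str) -> None:
        self.holds.pop(hold_urn, None)
        self.confirmed.add(reservation_urn)
```

A crash between hold and confirmation leaves nothing to repair: the hold expires and the derived availability is correct again without any compensating write.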
These patterns are described in customer-confidence terms in How Nexa survives operational panic.
4. Anti-corruption adapters
Every external partner sits behind an interface-typed adapter. The orchestration layer never sees vendor-specific JSON; the canonical data model is a Nexa-internal shape that all adapters translate to and from.
Adapters share three engineering primitives:
- Adaptive traffic shaping per partner — enforced platform-wide, so adding worker capacity can never push traffic past a partner's published rate limit.
- Circuit breakers per partner — an outage at one partner does not disable another. Recovery is automatic when the partner's signals return to healthy.
- Canonical translation — adapter responses normalize into the same internal shapes for inventory and fulfillment regardless of which partner produced them.
If a partner changes a contract, the adapter absorbs it; the rest of the platform doesn't notice.
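The canonical-translation primitive can be sketched as an interface plus one adapter. The `CanonicalOffer` fields, the vendor payload shape, and the adapter registry below are all illustrative assumptions, not Nexa's actual types; the point is only that vendor JSON is translated at the boundary and everything inward sees one shape.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class CanonicalOffer:
    """Nexa-internal shape -- the only one the orchestration layer sees."""
    offer_urn: str
    hotel_name: str
    price_cents: int
    currency: str

class HotelAdapter(Protocol):
    def search(self, city: str) -> list[CanonicalOffer]: ...

class AmadeusAdapter:
    def search(self, city: str) -> list[CanonicalOffer]:
        # Imagine this raw payload came back from the vendor API; the
        # field names here are made up for illustration.
        raw = [{"offerId": "8842", "hotelName": "Airport Inn",
                "total": {"amount": "120.00", "currency": "EUR"}}]
        # Translation happens here, at the boundary, and nowhere else.
        return [
            CanonicalOffer(
                offer_urn=f"urn:offer:{o['offerId']}",
                hotel_name=o["hotelName"],
                price_cents=int(float(o["total"]["amount"]) * 100),
                currency=o["total"]["currency"],
            )
            for o in raw
        ]

ADAPTERS: dict[str, HotelAdapter] = {"amadeus": AmadeusAdapter()}
```

A vendor contract change is then a one-file change inside the adapter's translation, which is exactly the absorption property described above.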
Storage shape
| Store | Purpose |
|---|---|
| Operational store | Cases, sub-cases, reservations, policies, wallet, durable workflow record. Sized for transactional consistency. |
| Snapshot store | Read-side store for passenger traffic. Horizontally scalable; isolated from the operational store. |
| Coordination store | Distributed locks, inventory soft-holds, real-time presence. Transient state with self-healing semantics. |
| Workflow bus | Cross-domain events and async commands. All tenant-relevant topics are tenant-scoped; platform-internal observability is separate. |
Tenant isolation
A token issued for one airline cannot read another airline's data. Isolation is layered:
- Identity claims — the tenant identity is stamped at login and forwarded across every internal call.
- Application-level guard — every query is automatically filtered by tenant. CI asserts the guard is wired up at boot for every endpoint.
- Per-tenant data partitioning — the operational store is partitioned per-tenant; analytical reads use row-level security.
- Per-tenant event topics — topic-level access controls prevent a tenant's consumer from reading another tenant's events.
- Per-tenant secrets — partner credentials are per-tenant, sourced at runtime, never logged.
Cross-tenant reads are impossible by construction, even for "platform-wide" partners (regulators, auditors): each receives a separate tenant-scoped registration for every airline it oversees.
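A minimal sketch of the application-level guard, assuming an in-memory case list and a decorator-based wiring; the decorator name, token claim, and data shapes are illustrative, not Nexa's actual middleware.

```python
import functools

# Stand-in data: cases belonging to two different tenants.
CASES = [
    {"urn": "urn:case:c-7f8e1", "tenant": "airline-a"},
    {"urn": "urn:case:c-0001", "tenant": "airline-b"},
]

def tenant_scoped(query_fn):
    """Guard that applies the token's tenant claim to every query result.

    In a real system the filter would be injected into the query itself;
    the invariant shown here is that the caller cannot opt out of it.
    """
    wrapped = functools.wraps(query_fn)(
        lambda token, *a, **kw: [
            row for row in query_fn(token, *a, **kw)
            if row["tenant"] == token["tenant"]
        ]
    )
    wrapped.tenant_guarded = True  # what a boot-time CI assertion could check
    return wrapped

@tenant_scoped
def list_cases(token):
    return CASES  # raw query; the guard applies the tenant filter
```

The `tenant_guarded` marker hints at how a boot-time check could assert that every endpoint's query path passed through the guard.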
Identifiers (URN scheme)
Every identifier — internal or external — is a URN. URNs are structural: the partner and status are part of the identifier, not side fields.
| URN | Example |
|---|---|
| Case | urn:case:c-7f8e1 |
| Sub-case | urn:sub-case:sc-91a4 |
| PNR | urn:pnr:XYZ123:vendor:amadeus |
| Reservation | urn:reservation:r-8842:vendor:amadeus:status:confirmed |
| Hotel offer (ephemeral) | urn:offer:8842 |
| Issued card | urn:issued-card:ic-44b1 |
| Correlation (W3C) | urn:correlation:<trace-id> |
URNs are routable: extracting the partner from a URN tells the booking engine which adapter to dispatch to without an if/else chain. Any URN value in any payload is safe to log and compare across surfaces.
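A parser consistent with the examples in the table above might look like this — segments after the type/id pair read as key/value pairs (`vendor:amadeus`, `status:confirmed`). This is a sketch of the routing idea, not the canonical parser.

```python
def parse_urn(urn: str) -> dict:
    """Parse 'urn:<type>:<id>[:key:value...]' into a flat dict."""
    parts = urn.split(":")
    assert parts[0] == "urn", f"not a URN: {urn!r}"
    out = {"type": parts[1], "id": parts[2]}
    # Remaining segments alternate key, value.
    for key, value in zip(parts[3::2], parts[4::2]):
        out[key] = value
    return out

def adapter_for(urn: str) -> str:
    """Routing: the vendor embedded in the URN selects the adapter."""
    return parse_urn(urn)["vendor"]
```

This is what makes dispatch structural rather than conditional: the booking engine resolves the adapter from the identifier itself instead of branching on side fields.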
See Data model for the full URN registry.
What's next
- Case Lifecycle — the case + sub-case state machine and the saga in detail.
- Data Model — every URN type and the cross-domain references.
- How Nexa survives operational panic — application-level resilience.
- How Nexa stays available — infrastructure-level resilience.
- Public API Reference — operator, passenger, partner, and webhooks.