Architecture Overview
Nexa is engineered for operational panic — the microscopic windows during a Tier-1 hub closure when dozens of agents compete for limited hotel inventory while tens of thousands of stranded passengers hit the platform from their phones. The architecture is built from four patterns that compose: event-driven workflows, read-side isolation for passenger traffic, distributed coordination for inventory and operator concurrency, and anti-corruption adapters for every external partner.
Runtime topology
The three API surfaces (BFFs)
Each external audience hits a dedicated Backend-For-Frontend. They share the same identity model and the same platform behind them, but their authorization, rate limits, and traffic patterns differ enough to deserve separate processes.
| Surface | Audience | Auth | Traffic shape |
|---|---|---|---|
| Operator API | Airline operators (ops console) | OIDC + role-based | High-complexity, transactional, low cardinality of actors |
| Passenger API | Passengers (mobile webapp) | Short-lived passenger tokens | Massive read fan-out — passengers refresh repeatedly during a disruption |
| Partner API | B2B integrators | OAuth2 client-credentials, scoped | Bursty, machine-to-machine, per-partner quotas |
See API Reference for endpoints, schemas, and webhook contracts.
Core domains
Each domain owns its data. Cross-domain communication is event-driven over the workflow bus — domains never share a database.
| Domain | Tenant scope | Owns | Partners |
|---|---|---|---|
| Cases | Tenant-scoped | Case + sub-case lifecycle, saga orchestration, durable workflow | None directly |
| Airline adapter | Tenant-scoped | PSS abstraction (Amadeus, Sabre, custom). Zero-persistence — cache only | Amadeus, Sabre, custom PSS |
| Policies | Tenant-scoped | Versioned policy engine, natural-language → structured policy synthesis | AI provider |
| Booking | Tenant-scoped | Inventory search, soft-holds, async fulfillment, saga compensation | Amadeus, Hotelbeds, direct contracts |
| Wallet | Tenant-scoped | Virtual prepaid card issuance, scheduled drops, reconciliation | Pomelo |
| Notifications | Tenant-scoped | Localized voucher delivery (SMS, email, WhatsApp), idempotent send | Twilio, SendGrid, Meta |
| Flight predictor | Shared platform | Dual-head cancel/delay model; ingests public flight + weather data | AeroAPI, AviationStack |
| Audit | Tenant-scoped | Append-only audit log with before/after snapshots | None |
The flight predictor is the only shared-platform domain — it consumes public flight and weather data, has no PII, and serves every tenant from a single deployment. Every other domain is deployed per-tenant for full isolation of data, secrets, and event topics.
The four resilience patterns
1. Event-driven async workflows
Operators never wait on a partner. The booking workflow does.
When an operator clicks "Issue Voucher":
- The case orchestrator records the operator's intent against the sub-case in a single durable step and returns an immediate acknowledgement.
- The booking engine picks up the work asynchronously, gated by per-partner adaptive traffic shaping so the platform never exceeds a partner's published rate limit.
- On success, the saga continues to the next step (wallet card issuance, passenger notification).
- On permanent failure, the saga runs compensation — any partial booking is rolled back automatically.
Two structural properties matter for customer confidence: state and event are committed together (no "we updated state but lost the next step" failure mode), and the operator UI never blocks on a partner.
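The first property — state and event committed together — can be sketched as a transactional outbox. This is a minimal illustration using SQLite as a stand-in for the operational store; the table names, field names, and command payloads are illustrative, not Nexa's actual schema.

```python
import json
import sqlite3

# Stand-in for the operational store. Schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sub_case (urn TEXT PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT);
""")
conn.execute("INSERT INTO sub_case VALUES ('urn:sub-case:sc-91a4', 'open')")
conn.commit()

def record_intent(conn, sub_case_urn, command):
    """Update state and enqueue the next workflow step in ONE transaction.

    If the process crashes anywhere inside this block, both writes roll
    back together -- there is no window where state changed but the
    follow-up event was lost.
    """
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute(
            "UPDATE sub_case SET status = ? WHERE urn = ?",
            ("voucher-requested", sub_case_urn),
        )
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("booking.commands",
             json.dumps({"command": command, "sub_case": sub_case_urn})),
        )

record_intent(conn, "urn:sub-case:sc-91a4", "issue-voucher")
# A relay process would then drain the outbox rows onto the workflow bus.
```

The acknowledgement returned to the operator depends only on this local commit, which is why the UI never blocks on a partner.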
2. Read-side isolation for passenger traffic
The passenger surface reads from a denormalized snapshot store that is completely separate from the operational case database.
When a sub-case becomes ready for passenger acceptance, the case orchestrator compiles a lightweight passenger document — hotel address and QR code, transport instructions, voucher, card reveal link — and writes it to the snapshot store. Passenger refreshes hit only that store. Tens of thousands of concurrent passenger refreshes cause zero contention with the operator UI.
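The projection step can be sketched as follows. The document shape and field names are assumptions for illustration; the real snapshot store is a separate horizontally scalable service, here mocked as a dict.

```python
# Illustrative projection: the orchestrator flattens the operational
# sub-case into a self-contained document so passenger reads never touch
# the case database. All field names are assumptions.
def compile_passenger_document(sub_case: dict) -> dict:
    reservation = sub_case["reservation"]
    hotel = reservation["hotel"]
    return {
        "sub_case": sub_case["urn"],
        "hotel_name": hotel["name"],
        "hotel_address": hotel["address"],
        "qr_code": reservation["qr_code"],
        "transport": sub_case.get("transport_instructions"),
        "voucher": sub_case["voucher"]["urn"],
        "card_reveal_link": sub_case["wallet"]["reveal_url"],
    }

# Stand-in for the real snapshot store.
snapshot_store: dict[str, dict] = {}

def on_subcase_ready(sub_case: dict) -> None:
    doc = compile_passenger_document(sub_case)
    snapshot_store[doc["sub_case"]] = doc  # passenger refreshes read only this
```

Because the document is fully denormalized at write time, a passenger refresh is a single key lookup with no joins back into operational data.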
3. Distributed coordination
Two flavors of coordination, both invisible to customers:
- Entity locks prevent two operators from editing the same passenger simultaneously. The lock is coupled to the operator's live session — the moment the session ends, the lock releases. A wall-clock fallback exists as defense-in-depth, but the live-session signal is authoritative. The locked-by indicator propagates to every other operator's screen in real time.
- Inventory soft-holds prevent two operators from booking the last room. Available inventory is computed from the live set of holds plus confirmed reservations — there is no shared mutable counter that can drift under crash. Holds are self-healing: an abandoned attempt's capacity is restored automatically.
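The "computed, not counted" property behind soft-holds can be sketched like this. The class, TTL value, and method names are illustrative; the real coordination store is distributed, but the invariant is the same: availability is derived from live holds plus confirmed reservations, never from a mutable counter.

```python
import time
from typing import Optional

HOLD_TTL_SECONDS = 300  # illustrative TTL, not a real platform constant

class InventoryPool:
    def __init__(self, total_rooms: int):
        self.total = total_rooms
        self.holds: dict[str, float] = {}   # hold urn -> expiry timestamp
        self.confirmed: set[str] = set()    # confirmed reservation urns

    def _live_holds(self, now: float) -> int:
        # Expired holds simply stop counting: an abandoned attempt's
        # capacity is restored automatically (self-healing).
        self.holds = {u: exp for u, exp in self.holds.items() if exp > now}
        return len(self.holds)

    def available(self, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        return self.total - self._live_holds(now) - len(self.confirmed)

    def soft_hold(self, hold_urn: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.available(now) <= 0:
            return False  # another operator already holds the last room
        self.holds[hold_urn] = now + HOLD_TTL_SECONDS
        return True

    def confirm(self, hold_urn: str, reservation_urn: str) -> None:
        self.holds.pop(hold_urn, None)
        self.confirmed.add(reservation_urn)
```

A crash between hold and confirmation leaves nothing to repair: the hold expires and the derived availability is correct again without any compensating write.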
These patterns are described in customer-confidence terms in How Nexa survives operational panic.
4. Anti-corruption adapters
Every external partner sits behind an interface-typed adapter. The orchestration layer never sees vendor-specific JSON; the canonical data model is a Nexa-internal shape that all adapters translate to and from.
Adapters share three engineering primitives:
- Adaptive traffic shaping per partner — enforced platform-wide, so adding worker capacity can never push traffic past a partner's published rate limit.
- Circuit breakers per partner — an outage at one partner does not disable another. Recovery is automatic when the partner's signals return to healthy.
- Canonical translation — adapter responses normalize into the same internal shapes for inventory and fulfillment regardless of which partner produced them.
If a partner changes a contract, the adapter absorbs it; the rest of the platform doesn't notice.
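The canonical-translation primitive can be sketched as an interface plus one adapter. The `CanonicalOffer` fields, the vendor payload shape, and the adapter registry below are all illustrative assumptions, not Nexa's actual types; the point is only that vendor JSON is translated at the boundary and everything inward sees one shape.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class CanonicalOffer:
    """Nexa-internal shape -- the only one the orchestration layer sees."""
    offer_urn: str
    hotel_name: str
    price_cents: int
    currency: str

class HotelAdapter(Protocol):
    def search(self, city: str) -> list[CanonicalOffer]: ...

class AmadeusAdapter:
    def search(self, city: str) -> list[CanonicalOffer]:
        # Imagine this raw payload came back from the vendor API; the
        # field names here are made up for illustration.
        raw = [{"offerId": "8842", "hotelName": "Airport Inn",
                "total": {"amount": "120.00", "currency": "EUR"}}]
        # Translation happens here, at the boundary, and nowhere else.
        return [
            CanonicalOffer(
                offer_urn=f"urn:offer:{o['offerId']}",
                hotel_name=o["hotelName"],
                price_cents=int(float(o["total"]["amount"]) * 100),
                currency=o["total"]["currency"],
            )
            for o in raw
        ]

ADAPTERS: dict[str, HotelAdapter] = {"amadeus": AmadeusAdapter()}
```

A vendor contract change is then a one-file change inside the adapter's translation, which is exactly the absorption property described above.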
Storage shape
| Store | Purpose |
|---|---|
| Operational store | Cases, sub-cases, reservations, policies, wallet, durable workflow record. Sized for transactional consistency. |
| Snapshot store | Read-side store for passenger traffic. Horizontally scalable; isolated from the operational store. |
| Coordination store | Distributed locks, inventory soft-holds, real-time presence. Transient state with self-healing semantics. |
| Workflow bus | Cross-domain events and async commands. All tenant-relevant topics are tenant-scoped; platform-internal observability is separate. |
Tenant isolation
A token issued for one airline cannot read another airline's data. Isolation is layered:
- Identity claims — the tenant identity is stamped at login and forwarded across every internal call.
- Application-level guard — every query is automatically filtered by tenant. CI asserts the guard is wired up at boot for every endpoint.
- Per-tenant data partitioning — the operational store is partitioned per-tenant; analytical reads use row-level security.
- Per-tenant event topics — topic-level access controls prevent a tenant's consumer from reading another tenant's events.
- Per-tenant secrets — partner credentials are per-tenant, sourced at runtime, never logged.
Cross-tenant reads are impossible by construction, even for "platform-wide" partners (regulators, auditors): each receives a separate tenant-scoped registration for every airline it oversees.
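A minimal sketch of the application-level guard, assuming an in-memory case list and a decorator-based wiring; the decorator name, token claim, and data shapes are illustrative, not Nexa's actual middleware.

```python
import functools

# Stand-in data: cases belonging to two different tenants.
CASES = [
    {"urn": "urn:case:c-7f8e1", "tenant": "airline-a"},
    {"urn": "urn:case:c-0001", "tenant": "airline-b"},
]

def tenant_scoped(query_fn):
    """Guard that applies the token's tenant claim to every query result.

    In a real system the filter would be injected into the query itself;
    the invariant shown here is that the caller cannot opt out of it.
    """
    wrapped = functools.wraps(query_fn)(
        lambda token, *a, **kw: [
            row for row in query_fn(token, *a, **kw)
            if row["tenant"] == token["tenant"]
        ]
    )
    wrapped.tenant_guarded = True  # what a boot-time CI assertion could check
    return wrapped

@tenant_scoped
def list_cases(token):
    return CASES  # raw query; the guard applies the tenant filter
```

The `tenant_guarded` marker hints at how a boot-time check could assert that every endpoint's query path passed through the guard.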
Identifiers (URN scheme)
Every identifier — internal or external — is a URN. URNs are structural: the partner and status are part of the identifier, not side fields.
| URN | Example |
|---|---|
| Case | urn:case:c-7f8e1 |
| Sub-case | urn:sub-case:sc-91a4 |
| PNR | urn:pnr:XYZ123:vendor:amadeus |
| Reservation | urn:reservation:r-8842:vendor:amadeus:status:confirmed |
| Hotel offer (ephemeral) | urn:offer:8842 |
| Issued card | urn:issued-card:ic-44b1 |
| Correlation (W3C) | urn:correlation:<trace-id> |
URNs are routable: extracting the partner from a URN tells the booking engine which adapter to dispatch to without an if/else chain. Any URN value in any payload is safe to log and compare across surfaces.
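A parser consistent with the examples in the table above might look like this — segments after the type/id pair read as key/value pairs (`vendor:amadeus`, `status:confirmed`). This is a sketch of the routing idea, not the canonical parser.

```python
def parse_urn(urn: str) -> dict:
    """Parse 'urn:<type>:<id>[:key:value...]' into a flat dict."""
    parts = urn.split(":")
    assert parts[0] == "urn", f"not a URN: {urn!r}"
    out = {"type": parts[1], "id": parts[2]}
    # Remaining segments alternate key, value.
    for key, value in zip(parts[3::2], parts[4::2]):
        out[key] = value
    return out

def adapter_for(urn: str) -> str:
    """Routing: the vendor embedded in the URN selects the adapter."""
    return parse_urn(urn)["vendor"]
```

This is what makes dispatch structural rather than conditional: the booking engine resolves the adapter from the identifier itself instead of branching on side fields.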
See Data model for the full URN registry.
What's next
- Case Lifecycle — the case + sub-case state machine and the saga in detail.
- Data Model — every URN type and the cross-domain references.
- How Nexa survives operational panic — application-level resilience.
- How Nexa stays available — infrastructure-level resilience.
- Public API Reference — operator, passenger, partner, and webhooks.