Architecture Overview

Nexa is engineered for operational panic — the brief, high-pressure windows during a Tier-1 hub closure when dozens of agents compete for limited hotel inventory while tens of thousands of stranded passengers hit the platform from their phones. The architecture is built from four patterns that compose: event-driven workflows, read-side isolation for passenger traffic, distributed coordination for inventory and operator concurrency, and anti-corruption adapters for every external partner.

Runtime topology

The three API surfaces (BFFs)

Each external audience hits a dedicated Backend-For-Frontend. They share the same identity model and the same platform behind them, but their authorization, rate limits, and traffic patterns differ enough to deserve separate processes.

| Surface | Audience | Auth | Traffic shape |
| --- | --- | --- | --- |
| Operator API | Airline operators (ops console) | OIDC + role-based | High-complexity, transactional, low cardinality of actors |
| Passenger API | Passengers (mobile webapp) | Short-lived passenger tokens | Massive read fan-out — refreshes during a disruption |
| Partner API | B2B integrators | OAuth2 client-credentials, scoped | Bursty, machine-to-machine, per-partner quotas |

See API Reference for endpoints, schemas, and webhook contracts.

Core domains

Each domain owns its data. Cross-domain communication is event-driven over the workflow bus — domains never share a database.

| Domain | Tenant scope | Owns | Partners |
| --- | --- | --- | --- |
| Cases | Tenant-scoped | Case + sub-case lifecycle, saga orchestration, durable workflow | None directly |
| Airline adapter | Tenant-scoped | PSS abstraction (Amadeus, Sabre, custom). Zero-persistence — cache only | Amadeus, Sabre, custom PSS |
| Policies | Tenant-scoped | Versioned policy engine, natural-language → structured policy synthesis | AI provider |
| Booking | Tenant-scoped | Inventory search, soft-holds, async fulfillment, saga compensation | Amadeus, Hotelbeds, direct contracts |
| Wallet | Tenant-scoped | Virtual prepaid card issuance, scheduled drops, reconciliation | Pomelo |
| Notifications | Tenant-scoped | Localized voucher delivery (SMS, email, WhatsApp), idempotent send | Twilio, SendGrid, Meta |
| Flight predictor | Shared platform | Dual-head cancel/delay model; ingests public flight + weather data | AeroAPI, AviationStack |
| Audit | Tenant-scoped | Append-only audit log with before/after snapshots | None |

The flight predictor is the only shared-platform domain — it consumes public flight and weather data, has no PII, and serves every tenant from a single deployment. Every other domain is deployed per-tenant for full isolation of data, secrets, and event topics.

The four resilience patterns

1. Event-driven async workflows

Operators never wait on a partner. The booking workflow does.

When an operator clicks "Issue Voucher":

  1. The case orchestrator records the operator's intent against the sub-case in a single durable step and returns an immediate acknowledgement.
  2. The booking engine picks up the work asynchronously, gated by per-partner adaptive traffic shaping so the platform never exceeds a partner's published rate limit.
  3. On success, the saga continues to the next step (wallet card issuance, passenger notification).
  4. On permanent failure, the saga runs compensation — any partial booking is rolled back automatically.
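
Steps 3 and 4 form a saga: every forward action is paired with a compensation that undoes it if a later step fails for good. A minimal sketch of that structure, with hypothetical step and function names (the real engine also distinguishes transient from permanent failures, retries, and applies the traffic shaping described above):

```ts
// A saga step pairs a forward action with a compensation. All names here are
// illustrative, not the production API.
type SagaStep = {
  name: string;
  run: () => Promise<void>;
  compensate?: () => Promise<void>;
};

async function runSaga(steps: SagaStep[]): Promise<void> {
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.run();
      completed.push(step);
    } catch (err) {
      // Permanent failure: unwind every completed step in reverse order.
      for (const done of completed.reverse()) {
        await done.compensate?.();
      }
      throw err;
    }
  }
}

// Hypothetical wiring for "Issue Voucher":
declare function bookHotel(): Promise<void>;
declare function cancelHotel(): Promise<void>;
declare function issueCard(): Promise<void>;
declare function voidCard(): Promise<void>;
declare function notifyPassenger(): Promise<void>;

const issueVoucherSaga: SagaStep[] = [
  { name: "book-hotel", run: bookHotel, compensate: cancelHotel },
  { name: "issue-card", run: issueCard, compensate: voidCard },
  { name: "notify-passenger", run: notifyPassenger }, // nothing to undo
];
```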

Two structural properties matter for customer confidence: state and event are committed together (no "we updated state but lost the next step" failure mode), and the operator UI never blocks on a partner.
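
The first property is the classic transactional-outbox shape: the state change and the event that triggers the next step commit in one transaction, and a relay drains the outbox onto the workflow bus. A sketch, assuming a generic `db` handle and illustrative table names:

```ts
// Minimal transactional-outbox sketch. `Db`, `Tx`, and the table names are
// assumptions for illustration, not the production schema.
interface Tx { insert(table: string, row: Record<string, unknown>): Promise<void> }
interface Db { transaction(fn: (tx: Tx) => Promise<void>): Promise<void> }

async function recordIntent(db: Db, subCaseId: string, operatorId: string) {
  await db.transaction(async (tx) => {
    // 1. The durable state change: the operator's intent on the sub-case.
    await tx.insert("sub_case_actions", {
      subCaseId, operatorId, action: "ISSUE_VOUCHER", at: new Date().toISOString(),
    });
    // 2. The event for the next step, written to an outbox in the SAME transaction.
    await tx.insert("outbox", {
      topic: "booking.requested",
      payload: JSON.stringify({ subCaseId }),
    });
    // Either both rows commit or neither does: no state without its event.
  });
  // A relay process drains the outbox onto the workflow bus, at least once.
}
```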

2. Read-side isolation for passenger traffic

The passenger surface reads from a denormalized snapshot store that is completely separate from the operational case database.

When a sub-case becomes ready for passenger acceptance, the case orchestrator compiles a lightweight passenger document — hotel address and QR code, transport instructions, voucher, card reveal link — and writes it to the snapshot store. Passenger refreshes hit only that store. Tens of thousands of concurrent passenger refreshes cause zero contention with the operator UI.
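
As an illustration, the compiled document might look like the following; every field name here is an assumption, not the production schema:

```ts
// Hypothetical shape of the denormalized passenger document: everything the
// passenger page needs, precomputed, so a refresh is one key-value lookup.
interface PassengerSnapshot {
  subCase: string;                                   // urn:sub-case:...
  hotel: { name: string; address: string; qrCode: string };
  transport: { instructions: string };
  voucher: { amount: number; currency: string };
  cardRevealUrl: string;                             // wallet card reveal link
  updatedAt: string;                                 // when it was last compiled
}

interface SnapshotStore {
  put(key: string, doc: PassengerSnapshot): Promise<void>;
}

// Compiled once by the case orchestrator when the sub-case becomes ready,
// then served read-only to every passenger refresh.
async function publishSnapshot(store: SnapshotStore, doc: PassengerSnapshot) {
  await store.put(`snapshot:${doc.subCase}`, doc);   // idempotent overwrite
}
```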

3. Distributed coordination

Two flavors of coordination, both invisible to customers:

  • Entity locks prevent two operators from editing the same passenger simultaneously. The lock is coupled to the operator's live session — the moment the session ends, the lock releases. A wall-clock fallback exists as defense-in-depth, but the live-session signal is authoritative. The locked-by indicator propagates to every other operator's screen in real time.
  • Inventory soft-holds prevent two operators from booking the last room. Available inventory is computed from the live set of holds plus confirmed reservations — there is no shared mutable counter that can drift under crash. Holds are self-healing: an abandoned attempt's capacity is restored automatically.
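
A sketch of the derived-availability idea, with illustrative types (the real holds live in the coordination store and are shared across workers):

```ts
// Availability is derived, not counted: there is no shared mutable counter
// to drift after a crash. Types and names are illustrative.
interface Hold { offerId: string; expiresAt: number }   // epoch millis
interface Reservation { offerId: string }

function availableRooms(
  capacity: number,
  holds: Hold[],
  reservations: Reservation[],
  now: number = Date.now(),
): number {
  // Self-healing: an abandoned hold simply stops counting once it expires,
  // so its capacity is restored without any cleanup job.
  const liveHolds = holds.filter((h) => h.expiresAt > now);
  return capacity - liveHolds.length - reservations.length;
}
```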

These patterns are described in customer-confidence terms in How Nexa survives operational panic.

4. Anti-corruption adapters

Every external partner sits behind an interface-typed adapter. The orchestration layer never sees vendor-specific JSON; the canonical data model is a Nexa-internal shape that all adapters translate to and from.
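
In sketch form, with assumed interface and type names:

```ts
// The orchestration layer depends only on this interface and the canonical
// shapes; each vendor module translates its own wire format internally.
// All names here are illustrative.
interface CanonicalOffer {
  offer: string;                                // urn:offer:...
  rooms: number;
  price: { amount: number; currency: string };
}
interface CanonicalReservation {
  reservation: string;                          // urn:reservation:...:vendor:...:status:...
  status: "held" | "confirmed" | "cancelled";
}

interface HotelAdapter {
  search(city: string, nights: number): Promise<CanonicalOffer[]>;
  book(offerUrn: string): Promise<CanonicalReservation>;
  cancel(reservationUrn: string): Promise<void>;
}

// One implementation per vendor, resolved by partner key:
// const adapters: Record<string, HotelAdapter> = { amadeus: ..., hotelbeds: ... };
```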

Adapters share three engineering primitives (the first two are sketched after the list):

  • Adaptive traffic shaping per partner — platform-wide, so adding worker capacity never bursts a partner's rate limit.
  • Circuit breakers per partner — an outage at one partner does not disable another. Recovery is automatic when the partner's signals return to healthy.
  • Canonical translation — adapter responses normalize into the same internal shapes for inventory and fulfillment regardless of which partner produced them.
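
In miniature, the first two primitives might look like this. These are in-process stand-ins for what is really backed platform-wide by the coordination store, and every name is an assumption:

```ts
// Token bucket per partner: refills at the partner's published rate, so the
// total dispatch speed stays capped no matter how many workers pull jobs.
class TokenBucket {
  private tokens: number;
  private last = Date.now();
  constructor(private ratePerSec: number, private burst: number) {
    this.tokens = burst;
  }
  tryAcquire(): boolean {
    const now = Date.now();
    this.tokens = Math.min(
      this.burst,
      this.tokens + ((now - this.last) / 1000) * this.ratePerSec,
    );
    this.last = now;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Circuit breaker per partner: opens after consecutive failures, then lets a
// probe through after a cooldown; a successful probe closes it again.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  constructor(private threshold = 5, private cooldownMs = 30_000) {}
  allow(): boolean {
    if (this.failures < this.threshold) return true;        // closed
    return Date.now() - this.openedAt > this.cooldownMs;    // half-open probe
  }
  onSuccess() { this.failures = 0; }                        // close again
  onFailure() {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt = Date.now(); // (re)open
  }
}
```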

If a partner changes a contract, the adapter absorbs it; the rest of the platform doesn't notice.

Storage shape

| Store | Purpose |
| --- | --- |
| Operational store | Cases, sub-cases, reservations, policies, wallet, durable workflow record. Sized for transactional consistency. |
| Snapshot store | Read-side store for passenger traffic. Horizontally scalable; isolated from the operational store. |
| Coordination store | Distributed locks, inventory soft-holds, real-time presence. Transient state with self-healing semantics. |
| Workflow bus | Cross-domain events and async commands. All tenant-relevant topics are tenant-scoped; platform-internal observability is separate. |

Tenant isolation

A token issued for one airline cannot read another airline's data. Isolation is layered:

  1. Identity claims — the tenant identity is stamped at login and forwarded across every internal call.
  2. Application-level guard — every query is automatically filtered by tenant (a sketch follows this list). CI asserts the guard is wired up at boot for every endpoint.
  3. Per-tenant data partitioning — the operational store is partitioned per-tenant; analytical reads use row-level security.
  4. Per-tenant event topics — topic-level access controls prevent a tenant's consumer from reading another tenant's events.
  5. Per-tenant secrets — partner credentials are per-tenant, sourced at runtime, never logged.
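
A minimal sketch of the layer-2 guard, assuming the tenant claim was verified upstream. In production this is a WHERE predicate injected by the data layer (plus row-level security at the store); the in-memory filter only illustrates the rule, and all names are assumptions:

```ts
// The tenant comes from verified identity claims, never from request input.
interface TenantContext { tenant: string }  // stamped at login, forwarded on every call

// Every repository read goes through the guard; forgetting to wire it is a
// boot-time CI failure, not a runtime surprise.
function scopedByTenant<T extends { tenant: string }>(
  ctx: TenantContext,
  rows: T[],
): T[] {
  return rows.filter((row) => row.tenant === ctx.tenant);
}
```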

Cross-tenant reads are impossible by construction, even for "platform-wide" partners (regulators, auditors). Each such partner gets a separate tenant-specific registration for every airline it oversees.

Identifiers (URN scheme)

Every identifier — internal or external — is a URN. URNs are structural: the partner and status are part of the identifier, not side fields.

| URN | Example |
| --- | --- |
| Case | urn:case:c-7f8e1 |
| Sub-case | urn:sub-case:sc-91a4 |
| PNR | urn:pnr:XYZ123:vendor:amadeus |
| Reservation | urn:reservation:r-8842:vendor:amadeus:status:confirmed |
| Hotel offer (ephemeral) | urn:offer:8842 |
| Issued card | urn:issued-card:ic-44b1 |
| Correlation (W3C) | urn:correlation:<trace-id> |

URNs are routable: extracting the partner from a URN tells the booking engine which adapter to dispatch to without an if/else chain. Any URN value in any payload is safe to log and compare across surfaces.
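
For example, extracting the vendor from a reservation URN is a pure string operation; the helper name below is an assumption:

```ts
// In the examples above, URN segments after the type and id are key:value
// pairs, so extracting a field is a positional lookup.
function urnField(urn: string, key: string): string | undefined {
  // e.g. "urn:reservation:r-8842:vendor:amadeus:status:confirmed"
  const parts = urn.split(":");
  const i = parts.indexOf(key);
  return i > 0 ? parts[i + 1] : undefined;
}

const vendor = urnField("urn:reservation:r-8842:vendor:amadeus:status:confirmed", "vendor");
// vendor === "amadeus" → dispatch straight to the Amadeus adapter
```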

See Data model for the full URN registry.
