Inside the Flight Predictor

The Nexa Flight Predictor is a Nexa-trained model that lets the platform start sourcing hotel inventory before an airline officially declares a disruption. From a consumer's point of view it's three numbers per flight (cancelProbability, predictedDelayMinutes, confidenceScore) plus a list of human-readable factors. This article describes how those numbers are produced — the architecture, not the implementation.

The customer-facing API surface lives in the Flight Disruption Predictor guide.

What it predicts

For each watched airline, and for every flight scheduled to depart in the next 24 hours, the predictor emits:

  • cancelProbability ∈ [0, 1] — calibrated probability that the flight will be cancelled.
  • predictedDelayMinutes ∈ [0, 720] — expected delay if the flight operates.
  • confidenceScore ∈ [0, 1] — coverage proxy: how much corroborating evidence supports the forecast.

These numbers are produced via two complementary paths:

  • Hot path — when an upstream flight-data event lands (a new flight, a status change), a per-flight forecast runs in seconds.
  • Cold path — every five minutes, a global sweep re-forecasts every non-terminal flight as a safety net for missed events.

A boot-time reconciliation step closes the gap when the service has been unreachable.

The unit of work: airline, not airport

The first design decision that shapes everything else: the predictor is organized around watched airlines, not airports.

When a tenant marks an airline as watched, three things happen:

  1. A push subscription is opened against the upstream flight-data source for each ICAO in the carrier's group. LATAM, for example, expands to its full set of operating ICAOs across the region.
  2. A pull of the next 24 hours of scheduled departures runs every five minutes per ICAO.
  3. A nightly historical backfill is scheduled to feed accuracy reports and retraining data.

A subsidiary roll-up keeps the operator UX consistent ("LATAM" is a single brand) while the platform internally operates per ICAO. Every alert subscription is keyed by ICAO, every data-quota burn is captured per ICAO, and every push event is routed by ICAO.
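The subsidiary roll-up can be sketched as a simple group expansion. The map below is illustrative: the brand names and ICAO codes are placeholders, not Nexa's real group data, and the fallback behavior for an unknown brand is an assumption.

```typescript
// Hypothetical group map: one watched brand expands to the set of
// operating ICAOs the platform actually keys subscriptions and quotas by.
const AIRLINE_GROUPS: Record<string, string[]> = {
  LATAM: ["LAN", "TAM", "LPE", "LXP"], // illustrative subsidiaries
  AVIANCA: ["AVA"],
};

// Expand a watched brand to its ICAO group; an unknown brand is
// treated as a single-member group (an assumption for this sketch).
function expandIcaoGroup(brand: string): string[] {
  return AIRLINE_GROUPS[brand.toUpperCase()] ?? [brand.toUpperCase()];
}
```

Everything downstream (alert subscriptions, quota accounting, push routing) then iterates over the expanded list, never over the brand.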

Ingestion cycle (every five minutes)

The recurring cycle, per watched airline:

  1. Expand the ICAO group for the airline (handle subsidiaries).
  2. For each ICAO, ensure an alert subscription exists upstream (idempotent).
  3. List scheduled departures for the next 24 hours, capturing rate-limit headers so the dashboard can show data-quota burn vs. ceiling per airline.
  4. Bulk-upsert flights with a synthetic key that combines airline, flight number, and scheduled departure to prevent collisions on shared codes.
  5. Fan out per-flight forecasts only for flights that are newly discovered (don't re-forecast a flight whose forecast is minutes old).
  6. Advance a watermark that enables the downtime-recovery flow described below.
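Step 4's synthetic key can be sketched as follows; the exact key shape and delimiter are assumptions, but the ingredients (airline ICAO, flight number, scheduled departure) come from the step above.

```typescript
// Synthetic flight key: the same flight number flown by two carriers,
// or re-used on another day, must never collide in the upsert.
function syntheticFlightKey(
  icao: string,
  flightNumber: string,
  scheduledOut: Date,
): string {
  return `${icao}:${flightNumber}:${scheduledOut.toISOString()}`;
}
```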

A few real operational caveats:

  • The cycle errors loudly when credentials are missing — there is no silent fallback, since ingestion without an upstream is no ingestion.
  • Status normalization is fail-open: unknown upstream statuses collapse to scheduled. A conscious tradeoff — Nexa would rather produce a sub-optimal forecast than drop a flight because the upstream introduced a new vocabulary token.
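The fail-open normalization amounts to a lookup with a `scheduled` default. The upstream vocabulary below is invented for illustration; the real point is the fallback branch.

```typescript
// Canonical status set (names taken from the article's terminal-state list).
type FlightStatus =
  | "scheduled" | "active" | "landed" | "cancelled" | "diverted";

// Hypothetical upstream vocabulary mapping.
const KNOWN: Record<string, FlightStatus> = {
  S: "scheduled", A: "active", L: "landed", C: "cancelled", D: "diverted",
};

// Fail-open: an unrecognized upstream token collapses to "scheduled"
// instead of dropping the flight from ingestion.
function normalizeStatus(upstream: string): FlightStatus {
  return KNOWN[upstream.toUpperCase()] ?? "scheduled";
}
```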

Per-flight forecast (hot path)

When a webhook arrives or the cycle discovers a new flight, a per-flight forecast runs:

  1. Look up the flight by ID.
  2. Pull historical outcomes for the route and a slice for the airline.
  3. Run the decision agent (the orchestrator that combines all the pieces below).
  4. Persist the result, mirror it to long-term storage, and emit a real-time event.

A useful detail: forecasts are bucketed per minute per flight — bursts of change events (typical when a flight is reprogrammed several times) collapse to one prediction per flight per minute. The cost saving on bursty schedules is non-trivial.
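The per-minute bucket reduces to a job key that truncates the timestamp to the minute; the key format is an assumption, the truncation is the mechanism described above.

```typescript
// One forecast job key per flight per minute: a burst of change events
// within the same minute maps to the same key, so the queue collapses
// them into a single prediction.
function forecastBucketKey(flightId: string, at: Date): string {
  const minute = Math.floor(at.getTime() / 60_000);
  return `forecast:${flightId}:${minute}`;
}
```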

Flights in terminal states (landed, cancelled, diverted) are skipped.

The multi-agent signal layer

Before the model runs, an ingestion layer collects external signals from independent sources. Each agent is small, isolated, and runs in parallel — a downed source contributes zero signals but never aborts the prediction.

  • Flight Ops — upstream flight-data source. Honors a manual override on the input; useful for ad-hoc what-ifs.
  • Weather — open meteorological data. Maps weather code to a normalized severity.
  • Labor & Civil Unrest — open geopolitical-events feed. Strikes and civil-unrest events near the airport.
  • News & Traffic — news aggregator. Keyword-matched on closure / crash / strike vocabulary.
  • Natural Hazard — open hazard feed plus a curated overlay. Seismic and volcanic events.
  • Airline Reliability — curated reliability feed, or derived from history. Falls back to historical outcomes when no curated source is available.
  • Aircraft Rotation — internal rotation chain. Carries forward the delay of the previous leg of the same tail.
  • Cancellations Board — live-cancellations crawl. Match by airline; fall back by airport.
  • History Pattern — historical outcomes. Seasonality and day-of-week patterns.

Every signal carries a normalized severity, a confidence, an observed-at timestamp, and source metadata.
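The isolation property ("a downed source contributes zero signals but never aborts the prediction") maps naturally onto settled-promise fan-out. The `Signal` shape below follows the fields listed above; the agent interface itself is an assumption.

```typescript
// Signal shape from the text: normalized severity, confidence,
// observed-at timestamp, and source metadata.
interface Signal {
  source: string;
  severity: number;    // normalized, [0, 1]
  confidence: number;  // [0, 1]
  observedAt: Date;
}

// Each agent is an independent async source (assumed interface).
type Agent = () => Promise<Signal[]>;

// Run all agents in parallel; a rejected agent contributes an empty
// list instead of failing the whole collection.
async function collectSignals(agents: Agent[]): Promise<Signal[]> {
  const results = await Promise.allSettled(agents.map((run) => run()));
  return results.flatMap((r) => (r.status === "fulfilled" ? r.value : []));
}
```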

Attribution: ranking factors for humans

A separate attribution agent ranks signals for the operator UI. It returns the top contributing factors with a human-readable rationale.

Crucially, attribution does not feed the model. It feeds the UI and the topFactors field of the API response. This separation is deliberate — the model should not be biased by the explanation it produces.

The feature snapshot

The model's input is a numeric feature vector covering route history, airline-level history, recent (last-14-day) trends, signal aggregates, seasonality, and rotation-chain pressure. Each prediction is auditable: every snapshot is persisted, so any prediction can be reconstructed to its exact input.

Inference: trained model first, deterministic baseline fallback

prediction = trainedModelForecast(snapshot, topFactors)
?? baselineForecast(snapshot, topFactors)

If the trained model is unreachable for any reason, a deterministic baseline runs locally inside the orchestrator. The platform never blocks on model availability.
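Expanded into runnable form, the `??` rule above is a try-then-fall-back, not a nullish check alone: the baseline also covers a model call that fails mid-flight. The types here are assumptions; the guarantee shown is that a forecast is always produced.

```typescript
interface Forecast {
  cancelProbability: number;
  predictedDelayMinutes: number;
}

// Trained model first; deterministic local baseline whenever the model
// is unset or the call fails. The platform never blocks on the model.
async function forecast(
  trainedModel: (() => Promise<Forecast>) | null,
  baseline: () => Forecast,
): Promise<Forecast> {
  if (trainedModel) {
    try {
      return await trainedModel(); // preferred path
    } catch {
      // fall through to the baseline
    }
  }
  return baseline();
}
```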

The trained model

The trained model is dual-head:

  • Cancel head — binary classifier returning cancelProbability.
  • Delay head — regressor returning predictedDelayMinutes (clamped to [0, 720]).

Why this shape:

  • The dataset is tabular and mixed (rates, severities, normalized counters), with non-linear but smooth relationships between signal severity and outcomes. Tree-based ensembles handle this class of problem well, and they are far easier to explain to operators than neural networks.
  • The two heads are sized differently. The regression head is wider because the delay signal is much noisier than the binary cancel signal.

Training cadence

The trained model is retrained on a managed cadence:

  • Weekly incremental retrain on new outcomes.
  • Quarterly full retrain with a hyperparameter sweep.
  • On demand after material schedule changes (new route footprint, terminal reopen).

Every trained model is registered with a version. Rolling back is a single registry pointer change — no service interruption.

The deterministic baseline

The baseline is a transparent linear combination over the feature snapshot. It runs whenever the trained model is unreachable, the prediction endpoint is unset, or the call fails. Two design rules govern it:

  • No single feature can saturate the result on its own. All weights are bounded so that one extreme value never produces a forecast unsupported by other corroborating evidence.
  • Rotation-chain pressure is first-class. An aircraft arriving very late has a high probability of cancelling its next leg rather than absorbing more delay; the rotation term carries weight comparable to "recent airline cancel rate."

The baseline produces forecasts in the same [0, 1] and [0, 720] ranges as the trained model. From the consumer's perspective the two paths are interchangeable.
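A minimal sketch of the baseline's two design rules, with invented feature names and weights (the real feature set and coefficients are not published): each term is individually capped so no single extreme value can saturate the result, and rotation pressure is weighted on par with the recent cancel rate.

```typescript
// Illustrative bounded linear combination over a tiny snapshot slice.
function baselineCancelProbability(snapshot: {
  recentAirlineCancelRate: number; // [0, 1]
  rotationPressure: number;        // [0, 1], lateness of the inbound leg
  weatherSeverity: number;         // [0, 1]
}): number {
  const clamp = (x: number, hi: number) => Math.min(Math.max(x, 0), hi);
  const p =
    clamp(0.4 * snapshot.recentAirlineCancelRate, 0.4) +
    clamp(0.4 * snapshot.rotationPressure, 0.4) + // first-class, same weight
    clamp(0.2 * snapshot.weatherSeverity, 0.2);
  return Math.min(p, 1); // same [0, 1] range as the trained model
}
```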

Confidence is not probability

After producing cancelProbability and predictedDelayMinutes, the system computes a separate confidenceScore. This number is a coverage proxy — it grows with the number of independent factors supporting the forecast and with the historical track record of the route and airline. It exists so an operator can quickly see whether a forecast is backed by three corroborating factors or just one.

A consumer downstream that needs a calibrated probability of cancellation should use cancelProbability and treat confidenceScore as a coverage proxy — not a calibrated estimate of certainty.
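One way to make the "coverage, not probability" distinction concrete: a score that saturates in the number of corroborating factors and in historical depth. The formula below is entirely an assumption; only the monotonic behavior (more evidence, higher score; never a calibrated probability) reflects the text.

```typescript
// Hypothetical coverage proxy: grows with independent factors and with
// the route/airline track record, saturating toward (but never meaning)
// certainty.
function confidenceScore(factorCount: number, historyDepth: number): number {
  const factorTerm = 1 - Math.exp(-factorCount / 3);    // evidence coverage
  const historyTerm = 1 - Math.exp(-historyDepth / 50); // outcome depth
  return 0.7 * factorTerm + 0.3 * historyTerm;          // stays in [0, 1)
}
```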

Webhook handling

The upstream flight-data source pushes events to a webhook protected by a shared token. For each event, Nexa:

  1. Persists the raw payload to an append-only audit row — webhooks are never dropped, even if processing fails.
  2. Normalizes the kind to one of a small canonical set.
  3. Resolves a stable flight identifier.
  4. Branches by kind: schedule changes upsert and enqueue a forecast; departures mark the flight active and re-forecast; arrivals capture an outcome; cancellations write a terminal outcome; diversions create a synthetic origin signal so neighboring flights pick up the disruption laterally.
  5. Advances the watermark.

Diverted flights are not re-forecast, but they do create a synthetic origin signal. This is how lateral contagion is modelled, without explicitly modelling cross-flight cancellations.
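The kind-to-action branching in step 4 can be written as a plain dispatch table. Handler names are hypothetical stubs; the mapping itself, including the diversion case emitting a synthetic signal instead of a re-forecast, follows the text.

```typescript
type EventKind =
  | "schedule" | "departure" | "arrival" | "cancellation" | "diversion";

// Exhaustive switch: TypeScript flags a missing case at compile time.
function actionsFor(kind: EventKind): string[] {
  switch (kind) {
    case "schedule":     return ["upsert", "enqueueForecast"];
    case "departure":    return ["markActive", "enqueueForecast"];
    case "arrival":      return ["captureOutcome"];
    case "cancellation": return ["writeTerminalOutcome"];
    case "diversion":    return ["emitSyntheticOriginSignal"]; // no re-forecast
  }
}
```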

Downtime recovery

Webhooks are fire-and-forget. If Nexa is unreachable, those events are lost and the flight state goes stale. Recovery is based on a single watermark and a bounded replay.

  • Watermark: bumped on every successful webhook process and on every successful ingestion cycle close.
  • Trigger: at boot, after all dependencies are healthy and the recurring jobs are seeded; or manually via an admin endpoint.
  • Window resolution: explicit override beats watermark-based, beats default; capped at 7 days; skipped entirely if the watermark is fresher than a few minutes.
  • Flow: list historical flights for each watched ICAO in the window, bulk-upsert with the observed status, upsert outcomes, re-enqueue forecasts for non-terminal flights, advance the watermark.
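The window-resolution rules above reduce to a few lines. The 7-day cap and the precedence order come from the text; the freshness cutoff is described only as "a few minutes", so the 5-minute value here is an assumption.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the replay-window start, or null when replay should be skipped.
// Precedence: explicit override, then watermark, then a 1-day default
// (the default span is an assumption); always capped at 7 days back.
function resolveReplayWindowStart(
  now: Date,
  watermark: Date | null,
  override: Date | null,
): Date | null {
  const freshMs = 5 * 60 * 1000; // assumed "a few minutes"
  if (!override && watermark && now.getTime() - watermark.getTime() < freshMs) {
    return null; // watermark is fresh: nothing to replay
  }
  const start = override ?? watermark ?? new Date(now.getTime() - DAY_MS);
  const floor = new Date(now.getTime() - 7 * DAY_MS); // 7-day cap
  return start < floor ? floor : start;
}
```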

The per-minute bucketing on forecast jobs means re-enqueuing a flight that just received a forecast is idempotent — there is no risk of duplicate forecasts during reconciliation.

Outcomes, accuracy, retraining

Outcomes (the ground truth) are captured from three places:

  • A nightly historical backfill.
  • A daily outcomes capture.
  • The boot-time reconcile.

The accuracy report endpoints expose:

  • Cancel (binary): true/false-positive/negative counts, accuracy, precision, recall, F1, Brier score.
  • Delay (regression): MAE, RMSE, bias (positive = over-predicts), median absolute error, banded accuracy at 15/30/60 minutes, IATA on-time accuracy.
  • Daily breakdown and top airports.

Matching is strict on no-look-ahead: for each outcome the most recent prediction with generatedAt < scheduled_out is used. Outcomes without a preceding prediction count toward coverageRate but are excluded from the accuracy metrics.
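The no-look-ahead match is a strict filter followed by a most-recent pick. Record shapes are assumptions; the `generatedAt < scheduled_out` rule and the coverage-only treatment of unmatched outcomes follow the text.

```typescript
interface Prediction {
  generatedAt: Date;
  cancelProbability: number;
}

// Most recent prediction generated strictly before scheduled departure;
// null means the outcome counts toward coverageRate but is excluded
// from accuracy metrics.
function matchPrediction(
  predictions: Prediction[],
  scheduledOut: Date,
): Prediction | null {
  const eligible = predictions.filter((p) => p.generatedAt < scheduledOut);
  if (eligible.length === 0) return null;
  return eligible.reduce((a, b) => (a.generatedAt > b.generatedAt ? a : b));
}
```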

Operational guarantees

A few decisions worth surfacing for a careful reader:

  1. The platform never blocks on model availability. A deterministic baseline runs whenever the trained model is unreachable.
  2. Status normalization is fail-open. Unknown statuses become scheduled; we'd rather forecast in a wrong-but-recoverable state than drop a flight.
  3. Diverted flights propagate via a synthetic signal, not via explicit cross-flight modelling. Lateral contagion is data-driven.
  4. The attribution agent is not in the inference loop. Separation of concerns — the model is never biased by its own explanation.
  5. Confidence is coverage, not calibration. Use cancelProbability for thresholds; use confidenceScore to sort or filter by how much material the model had.

Glossary

  • Hot path / cold path — Hot = per-flight forecast (seconds). Cold = full sweep every 5 min.
  • Watermark — Timestamp of the last successfully processed event. Enables gap detection after downtime.
  • Signal — An external piece of information with normalized severity and confidence, produced by an agent and consumed by the feature builder and the attribution agent.
  • Snapshot — The numeric feature vector that enters the model. Audited: every prediction stores the exact snapshot that produced it.
  • Nexa AI Model — Nexa's trained model. Customers consume it through the predictor API; they do not provision, train, or operate any third-party AI service.
