Will DiMaio
06 · In Progress · 2026 · ML Research & Engineering

Pushing RL into operations research

GroundControl AI

A MaskablePPO policy that dispatches airport ground vehicles — fuelers, baggage tugs, pushback tractors — to service aircraft inside their departure windows, beating first-come-first-served on synthetic schedules with zero conflicts and zero abandonments. Now in retraining for live BTS data.

21%
Win rate vs FCFS · KFIC
337
Observation dims
0
Conflicts · all batteries
Retraining
Stage
01 · Key Insight
WHY ACTION MASKING IS NON-NEGOTIABLE HERE

Most state-action pairs in airport ground ops are physically impossible: a fueler past shift end, a tug sent to a gate type it can't dock at, a pushback tractor already in motion. Vanilla PPO has to learn this from penalties and burns its training budget rejecting illegal moves. MaskablePPO restricts the action distribution to legal moves at decision time, so the policy never samples an illegal choice. That turns an intractable training problem into a tractable one, and the resulting policy is safe by construction with respect to every constraint the mask encodes: it cannot emit an action the mask rules out.
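To make the masking step concrete, here is a minimal stdlib sketch of a masked categorical policy head, assuming the policy outputs one logit per discrete action. Illegal actions have their logit set to negative infinity before the softmax, so they receive probability exactly zero and can never be sampled. The function names are illustrative and are not taken from the project or from any RL library.

```python
import math
import random

def masked_softmax(logits, mask):
    """Action probabilities with every illegal action forced to exactly 0.

    logits: raw policy scores, one per discrete action.
    mask:   True where the action is legal in the current state.
    """
    if not any(mask):
        raise ValueError("mask has no legal action")
    # Illegal actions get -inf, so exp() maps them to exactly 0.0.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    top = max(masked)  # largest legal logit; subtracting it avoids overflow
    exps = [math.exp(x - top) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, mask, rng):
    """Sample an action index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Action 1 scores highest but is illegal (say, a fueler past shift end).
logits = [2.0, 5.0, 1.0, 0.5]
mask = [True, False, True, True]
print(masked_softmax(logits, mask)[1])  # 0.0
```

Because the mask is applied before sampling, no probability mass, and therefore no exploration, is ever spent on an illegal action.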

02 · The Problem

Why this exists.

Airport ground ops is a constraint-satisfaction problem with seconds of slack and hours of consequence. Aircraft arrive, park at gates, and need fuel, baggage handling, and a pushback tractor before their departure window closes. A fleet of ground vehicles has to be dispatched against hundreds of concurrent service requests, subject to type compatibility (only some tugs handle wide-bodies), shift windows, and physical travel time across the apron. A single dispatch decision made in the wrong order ripples into a delay that costs the airline thousands of dollars.

Existing operations-research tooling is fast on toy problems and brittle on real ones. Vanilla PPO is the obvious RL approach, and it doesn't work: in any given state, most actions are physically impossible (assigning a fueler whose shift has ended, sending a tug to a gate it can't dock at), so the policy burns its training budget learning to reject them. The interesting question is whether constrained RL with anticipation can produce a policy that's actually deployable, not just academic.

03 · The Approach

How it works.

GroundControl AI is a discrete-event airport simulator coupled to a MaskablePPO policy. The simulator models a single day at minute granularity: aircraft arrive on a schedule, request services, occupy gates, and depart. A fleet of ground vehicles — fuelers, baggage tugs, pushback tractors — must be dispatched to complete each service before the departure window closes. Vehicle types, shift windows, travel times across the apron, and gate compatibility are all enforced as hard constraints.
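A discrete-event simulator advances its clock by jumping to the next scheduled event rather than ticking every minute. The sketch below shows that core loop with a priority queue keyed on (minute, insertion order). The event kinds, flight IDs, and the five-minute delay before a fuel request are hypothetical placeholders, not details from the project.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    minute: int                        # simulation clock, minutes since midnight
    seq: int                           # tie-breaker: same-minute events keep insertion order
    kind: str = field(compare=False)   # hypothetical labels: "arrive", "fuel_request", "depart"
    flight: str = field(compare=False)

class Simulator:
    """Minimal discrete-event core: pop the earliest event, jump the clock, handle it."""

    def __init__(self):
        self.queue = []
        self.clock = 0
        self.log = []
        self._seq = itertools.count()

    def schedule(self, minute, kind, flight):
        heapq.heappush(self.queue, Event(minute, next(self._seq), kind, flight))

    def run(self, until=24 * 60):
        while self.queue and self.queue[0].minute <= until:
            ev = heapq.heappop(self.queue)
            self.clock = ev.minute  # jump straight to the next event, no idle ticking
            self.log.append((ev.minute, ev.kind, ev.flight))
            if ev.kind == "arrive":
                # Hypothetical rule: a fuel request opens 5 minutes after arrival.
                # In the real system this is where the dispatcher would be consulted.
                self.schedule(self.clock + 5, "fuel_request", ev.flight)

sim = Simulator()
sim.schedule(480, "arrive", "AA100")
sim.schedule(530, "depart", "AA100")
sim.schedule(475, "arrive", "UA200")
sim.run()
```

Events are processed strictly in time order regardless of the order they were scheduled in, which is what lets a whole day run quickly at minute granularity.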

The dispatcher is a MaskablePPO policy with a 337-dimensional observation space encoding aircraft states, vehicle positions, pending tasks, and an anticipation buffer of upcoming work. Action masking is what makes the whole thing tractable: at every decision step, a mask is computed from world state so that only compatible vehicle/aircraft pairings, with the vehicle inside its shift window, are presented to the policy. Vanilla PPO has to learn legality through punishment; MaskablePPO never sees an illegal action in the first place. The policy converges; vanilla PPO does not.
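The mask computation can be sketched as a pure function of world state. This version assumes a flat action space over (vehicle, task) pairs plus one always-legal "wait" action, and checks the shift window, service-type compatibility, vehicle availability, and wide-body capability. The data classes, field names, and action layout are hypothetical; the page does not describe how the real action space is indexed.

```python
from dataclasses import dataclass

@dataclass
class Vehicle:
    kind: str             # hypothetical service types: "fueler", "tug", "pushback"
    shift_start: int      # minutes since midnight
    shift_end: int
    busy: bool = False
    wide_body_ok: bool = True

@dataclass
class Task:
    kind: str             # which vehicle type the task needs
    wide_body: bool       # True if the aircraft is a wide-body

def action_mask(vehicles, tasks, now):
    """Flat mask over (vehicle, task) pairs, index = v * len(tasks) + t,
    plus one trailing "wait" action. Hypothetical layout."""
    mask = []
    for v in vehicles:
        on_shift = v.shift_start <= now < v.shift_end
        for t in tasks:
            mask.append(
                on_shift
                and not v.busy
                and v.kind == t.kind
                and (v.wide_body_ok or not t.wide_body)
            )
    mask.append(True)  # "wait" is always legal, so the mask is never all False
    return mask
```

Keeping a wait action permanently legal guarantees at least one valid choice at every step, which a masked softmax needs in order to produce a valid distribution.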

Evaluation is where the project is strictest. Every result comes from a replayable seed bank: a 50-seed in-distribution hard battery on KFIC (the synthetic 15-node graph the policy was trained on), a 50-seed out-of-distribution battery, and a real-world 40-flight slice of U.S. Bureau of Transportation Statistics (BTS) data on KAUS (Austin-Bergstrom). The KFIC result is reported in full: 21% win rate vs FCFS, mean delay delta of −0.4 min, zero conflicts, zero abandonments. The KAUS gap is reported just as plainly: the policy was trained on a 15-node graph and sees fallback position values for KAUS's 119-node graph, so while it does not fail (zero conflicts), it is overly passive. KAUS retraining is the active next step.
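The per-battery metrics can be computed from paired per-seed runs. This sketch assumes each seed's schedule is replayed once under the policy and once under FCFS, that "delay delta" means policy delay minus FCFS delay (negative favors the policy), and that a win requires strictly lower delay, so ties do not count. The page does not spell out these definitions, so treat them as assumptions.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class SeedResult:
    seed: int
    policy_delay: float   # total departure delay in minutes under the learned policy
    fcfs_delay: float     # same schedule replayed under first-come-first-served
    conflicts: int = 0
    abandonments: int = 0

def summarize(results):
    """Aggregate one battery of paired runs into the reported metrics."""
    if not results:
        raise ValueError("empty battery")
    deltas = [r.policy_delay - r.fcfs_delay for r in results]  # negative = policy better
    wins = sum(1 for d in deltas if d < 0)                     # ties are not wins
    return {
        "win_rate": wins / len(results),
        "mean_delay_delta": mean(deltas),
        "conflicts": sum(r.conflicts for r in results),
        "abandonments": sum(r.abandonments for r in results),
    }
```

Under this definition a low win rate and a negative mean delta are compatible: many seeds can tie while the wins outweigh the losses.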

04 · Training & Evaluation

What it does.

01 / 04 · Simulator

Discrete-event airport day at minute granularity

Aircraft arrive on a schedule, park at gates, request services, and depart. A fleet of fuelers, baggage tugs, and pushback tractors must complete each service before the aircraft's departure window closes. Travel time across the apron, vehicle type compatibility, and shift windows are all hard constraints — not soft penalties.
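Travel time as a hard constraint can be sketched as a shortest-path query over the apron graph followed by a deadline check: a dispatch is feasible only if drive time plus service time fits before departure. The sketch uses Dijkstra's algorithm over a hypothetical adjacency-list graph with edge weights in minutes; node names and durations are made up for illustration.

```python
import heapq

def travel_minutes(graph, src, dst):
    """Dijkstra shortest travel time over an apron graph.

    graph: {node: [(neighbor, minutes), ...]}, directed edges.
    Returns float("inf") if dst is unreachable.
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nxt, w in graph.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

def can_finish_in_time(graph, vehicle_node, gate, now, service_minutes, departure):
    """Hard constraint: drive + service must complete by the departure minute.
    An infeasible dispatch is excluded outright, not penalized."""
    arrival = now + travel_minutes(graph, vehicle_node, gate)
    return arrival + service_minutes <= departure

# Hypothetical apron: the taxi lane makes depot -> A1 faster than the direct edge.
apron = {"depot": [("A1", 4), ("taxi", 2)], "taxi": [("A1", 1), ("B2", 6)]}
```

A check like this can feed straight into the action mask, which is how a travel-time limit becomes a hard constraint rather than a soft penalty.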

02 / 04 · Policy

MaskablePPO with anticipation buffer

337-dimensional observation space encoding aircraft states, vehicle positions, current tasks, and an 8-slot anticipation buffer of upcoming work. The anticipation buffer is what lets the policy pre-position vehicles for predictable future demand, instead of reacting only when a service request arrives.
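A fixed-size anticipation buffer can be sketched as follows: take the next eight pieces of upcoming work, encode each as a small feature slot, and zero-pad empty slots so the observation length never changes. Only the slot count comes from the page; the per-slot features, the 120-minute normalization horizon, and the service-type ordering are assumptions, and this layout is not claimed to reproduce the project's 337 dimensions.

```python
SERVICE_TYPES = ("fuel", "baggage", "pushback")  # hypothetical ordering
N_SLOTS = 8                                      # the 8-slot buffer described above
HORIZON = 120.0                                  # hypothetical look-ahead, minutes

def encode_anticipation(upcoming, now):
    """Flatten the next N_SLOTS tasks into a fixed-length vector.

    upcoming: list of (start_minute, service_type) for work not yet requested.
    Each slot = [present flag, normalized minutes-until, one-hot service type].
    Empty slots are zero-padded so the vector length is constant.
    """
    per_slot = 2 + len(SERVICE_TYPES)
    future = sorted(t for t in upcoming if t[0] >= now)[:N_SLOTS]
    vec = []
    for start, kind in future:
        lead = min(start - now, HORIZON) / HORIZON  # 0 = due now, 1 = at or beyond horizon
        onehot = [1.0 if kind == s else 0.0 for s in SERVICE_TYPES]
        vec.extend([1.0, lead] + onehot)
    vec.extend([0.0] * (per_slot * (N_SLOTS - len(future))))
    return vec
```

The constant length matters because the policy network expects a fixed-width input; the present flag lets it tell an empty slot apart from work that is due right now.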

03 / 04 · Evaluation

Replayable seed bank, KFIC + KAUS

Every result is reproducible from a fixed seed bank: a 50-seed hard battery on KFIC (the synthetic graph the policy trained on), a 50-seed out-of-distribution battery, and a 40-flight real-world BTS slice on KAUS. Win rate, mean delay delta, conflicts, and abandonments are all reported per battery — no cherry-picked numbers.
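Replayability comes down to drawing every random number from an RNG seeded per run. The sketch below builds a synthetic schedule from `random.Random(seed)`, so the same seed always regenerates the same day, and runs an evaluation function over a named battery. The seed values, turnaround range, and battery names are hypothetical; the page does not publish the actual seed bank.

```python
import random

# Hypothetical seed banks; the actual seed values are not published.
SEED_BANK = {
    "kfic_hard": list(range(1000, 1050)),  # 50 in-distribution seeds
    "kfic_ood": list(range(2000, 2050)),   # 50 out-of-distribution seeds
}

def synthetic_schedule(seed, n_flights=20, day_minutes=24 * 60):
    """Generate (flight, arrive, depart) tuples from a private RNG.

    A dedicated random.Random(seed) means nothing else in the process can
    perturb the stream, so a given seed always replays the same day.
    """
    rng = random.Random(seed)
    flights = []
    for i in range(n_flights):
        arrive = rng.randrange(0, day_minutes - 180)
        turn = rng.randint(35, 90)  # hypothetical turnaround range, minutes
        flights.append((f"F{i:03d}", arrive, arrive + turn))
    return sorted(flights, key=lambda f: f[1])

def run_battery(name, evaluate):
    """Apply an evaluation function to every seed in a named battery."""
    return [evaluate(seed) for seed in SEED_BANK[name]]
```

Because both the policy and the FCFS baseline can be run on the schedule regenerated from the same seed, every per-seed comparison is paired and can be rerun exactly.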

04 / 04 · Roadmap

KAUS retraining for real-world transfer

The KAUS evaluation surfaces a real transfer problem: the policy was trained on a 15-node graph and sees fallback position values for the 119-node real airport. It does not fail — zero conflicts, zero abandonments — but is overly passive. The next 90 days are KAUS retraining: rebuild the schedule generator from BTS data, expand the graph encoding, and re-evaluate against the same seed bank.
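The page does not say how node positions are encoded, so the sketch below is only a hypothetical illustration of why fallback values can hurt transfer. Suppose positions are encoded as a normalized index into the training graph's node list, with a single fallback value for unknown nodes. Every node of an unseen 119-node graph then collapses to that one value, and the policy cannot tell any two KAUS positions apart; expanding the encoding to the new node list restores 119 distinct values. The encoder, node names, and fallback value are all assumptions.

```python
def make_position_encoder(known_nodes, fallback=-1.0):
    """Return a function mapping a node name to a scalar.

    Hypothetical scheme: known nodes get their index normalized to [0, 1];
    anything outside the training vocabulary gets `fallback`.
    """
    index = {node: i for i, node in enumerate(known_nodes)}
    scale = max(len(known_nodes) - 1, 1)

    def encode(node):
        i = index.get(node)
        return fallback if i is None else i / scale

    return encode

# Hypothetical node names standing in for the two graphs.
kfic_nodes = [f"KFIC_{i}" for i in range(15)]
kaus_nodes = [f"KAUS_{i}" for i in range(119)]

trained = make_position_encoder(kfic_nodes)
expanded = make_position_encoder(kaus_nodes)

# Under the 15-node encoder every KAUS node maps to the same fallback value.
print(len({trained(n) for n in kaus_nodes}))   # 1
print(len({expanded(n) for n in kaus_nodes}))  # 119
```

If every vehicle appears to sit at the same place, a policy has little basis for choosing one dispatch over another, which is one plausible route to the passive behavior described above.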

05 · Achievements

What's shipped so far.

VS FCFS · KFIC HARD BATTERY
21% win

Across a 50-seed in-distribution battery on synthetic KFIC schedules, the MaskablePPO policy beats first-come-first-served 21% of the time, with mean delay delta of −0.4 minutes and zero conflicts or abandonments.

CONFLICTS / ABANDONMENTS
0 / 0

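Zero illegal dispatch decisions and zero stranded service requests across both KFIC batteries and the KAUS real-world slice. The first number is guaranteed by action masking: the policy cannot select a dispatch the mask rules out. Zero abandonments is an empirical result on these batteries rather than a guarantee.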

KAUS RETRAINING ROADMAP
90 days

Out-of-distribution transfer to live KAUS data is the active next step. Schedule generator rebuild from BTS feed, graph encoding expansion (15 → 119 nodes), and re-evaluation against the same seed bank — honest deltas, no demo varnish.