Executive protection programs have always run on intelligence. Until recently, that intelligence was produced by humans — analysts reading incident reports, building trip briefs, sitting on watch desks. Over the past eighteen months, a meaningful share of that workload has shifted to language-model agents querying crime data APIs. The change is real. The limits are also real, and they are not the ones the marketing materials tend to discuss.
What an executive protection program actually consumes
Before discussing automation, it helps to be specific about what corporate security teams ask of an intelligence function. The work breaks down into roughly four products:
- Pre-trip briefs. A document or dashboard summarizing crime, civil unrest, health, and transportation risk for every city on a principal's itinerary, scoped to the specific hotels, venues, and routes involved.
- Real-time monitoring. A watch function that flags incidents within a defined radius of the principal's current location or planned route while travel is in progress.
- Site assessments. Standing analysis of a residence, office, or recurring venue — typically a block-level look at the prior twelve to thirty-six months of incidents, broken out by category and time of day.
- Pattern-of-life baselining. Slower, longitudinal analysis to understand what “normal” looks like for a given neighborhood, so that the watch function has a benchmark for what counts as elevated.
All four of these products have, historically, been bottlenecked by analyst time. A single intelligence analyst can produce maybe three or four city briefs a day at the level of detail a serious EP program expects. Watch coverage scales linearly with headcount. Site assessments tend to be commissioned once and refreshed irregularly. None of this is a function of any single product's shortcomings — it is the math of human throughput.
The broader operational context
It is worth situating EP automation inside the larger wave moving through public-safety operations over the past two years. Three developments outside the EP space are reshaping the data and intelligence environment that corporate security teams operate inside (3SI Security, May 2026).
First, law enforcement itself has begun deploying AI to handle administrative load. A 2025 U.S. Public Safety Trends Report found that 76% of officers spend more than half of their shifts on paperwork. Agencies are now using language-model tools to draft incident reports, summarize body-camera footage, and triage investigative leads. The privacy and evidentiary defensibility questions are unresolved, but the productivity pressure is real, and it mirrors the same pressure being applied on the corporate side.
Second, Real-Time Crime Centers (RTCCs) are evolving toward what the industry now calls “direct-to-dispatch” models. Verified intelligence from cameras, ALPRs, and analyst review flows directly into dispatch workflows rather than waiting on an analyst to package and forward it. For an EP team watching from outside, this means the time from incident occurrence to public-record signal is contracting — which affects what an agent monitoring a feed can plausibly catch in real time.
Third, public-private intelligence sharing has matured. Several jurisdictions now run formal platforms that let private security teams receive verified video and incident data from law enforcement during active incidents. EP programs that previously had to rely on scanner audio and OSINT now have, at least in some cities, a direct channel. The data quality is better; the responsibility for what is done with it has not changed.
Where AI agents are actually being used
The deployments we have seen and read about in 2025 and 2026 fall into a narrower band than the broader “AI for security” conversation suggests. The high-leverage uses are unglamorous:
- API-fronted brief generation. An agent receives an itinerary, queries a crime data API for each address and city, pulls public events and protest calendars, and assembles a structured brief with citations. A human analyst then reviews and edits before delivery.
- Geofenced watch automation. An agent polls a crime data API on a 5–15 minute cadence for incidents inside a defined polygon around the principal's current location, and surfaces only those that exceed a configured severity threshold. The agent does not page anyone directly; it writes to a queue that a human watch officer triages. A sketch of this loop follows the list.
- Site assessment drafting. An agent pulls 12–36 months of incidents at an address, normalizes the categories, produces a category and time-of-day breakdown, and writes a draft narrative. A human edits and signs the final product.
- Question-answering over historical data. An analyst types “what does the burglary trend at this address look like vs. the surrounding ZIP over the last three years?” and the agent returns a chart and a short paragraph, sourced from the API.
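Of these, the geofenced watch loop is the easiest to make concrete. The sketch below is illustrative only: the endpoint URL, query parameters, the `severity` and `id` fields, and the queue object are assumptions standing in for whatever the deployed API and tooling actually provide.

```python
import time

import requests

API_URL = "https://api.example.com/v1/incidents"  # hypothetical endpoint
SEVERITY_THRESHOLD = 3  # program-configured; scale is illustrative
POLL_SECONDS = 600      # 10-minute cadence, inside the 5-15 minute band


def poll_geofence(lat, lng, radius_m, since_iso, api_key):
    """Fetch incidents inside the geofence reported since the last poll."""
    resp = requests.get(
        API_URL,
        params={"lat": lat, "lng": lng, "radius_m": radius_m, "since": since_iso},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["incidents"]


def watch_loop(queue, lat, lng, radius_m, api_key):
    """Surface above-threshold incidents to a human triage queue."""
    seen = set()
    since = "1970-01-01T00:00:00Z"
    while True:
        for incident in poll_geofence(lat, lng, radius_m, since, api_key):
            if incident["id"] in seen:
                continue
            seen.add(incident["id"])
            if incident["severity"] >= SEVERITY_THRESHOLD:
                queue.put(incident)  # a human watch officer decides escalation
        since = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        time.sleep(POLL_SECONDS)
```

The loop's only side effect is `queue.put`; escalation stays with the watch officer.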
Notice what is not on this list: autonomous decision-making about whether to move a principal, autonomous communication with the principal or their family, and autonomous escalation to local law enforcement. Programs that have deployed agents responsibly have kept all three of those firmly in human hands.
The calibration problem
The most useful thing a recent line of research has done for security buyers is to put numbers on a limitation that practitioners had already suspected. In a 2024 academic post on calibration using NEISS injury data (Circo, 2024), the author found that large language models classifying free-text injury narratives are not well-calibrated — the confidence scores they emit, whether self-reported or extracted from token probabilities, do not match observed accuracy. The models are particularly overconfident on items where the underlying token probabilities indicate uncertainty.
That finding generalizes, and it matters here because EP intelligence work involves a great deal of classification and triage: is this incident a robbery or a theft? Is this protest peaceful or escalating? Is this address part of a hotspot? An agent that returns “high confidence” on most of its outputs is not actually telling the operator anything useful. The signal-to-noise ratio in the confidence channel is low, and the consumer of the brief has no easy way to know that.
For EP buyers, the practical implication is that the “confidence” field on an agent output should be treated as decorative until proven otherwise. Either the deploying team has run a calibration study against a held-out set of incidents with known ground truth, or they have not. If they have not, the field is marketing.
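A calibration study of that kind does not require much machinery. A minimal sketch, assuming a held-out set of (confidence, predicted label, true label) triples with confidence on a 0–1 scale; the bin count and field layout are illustrative:

```python
from collections import defaultdict


def calibration_table(examples, n_bins=10):
    """Bucket outputs by stated confidence and compare to observed accuracy.

    `examples` is an iterable of (confidence, predicted_label, true_label).
    A calibrated agent's 0.9 bucket should be right about 90% of the time;
    large gaps mean the confidence field is decorative.
    """
    bins = defaultdict(list)
    for conf, predicted, actual in examples:
        bucket = min(int(conf * n_bins), n_bins - 1)
        bins[bucket].append(predicted == actual)
    rows = []
    for bucket in sorted(bins):
        hits = bins[bucket]
        rows.append(((bucket + 0.5) / n_bins, sum(hits) / len(hits), len(hits)))
    return rows  # (nominal confidence, observed accuracy, n) per bucket
```

If the 0.85–0.95 bucket shows 0.62 observed accuracy across a few hundred incidents, the agent is overconfident exactly where triage relies on it most, which is the pattern the Circo post describes.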
A cautionary precedent on the public-safety side
On April 13, 2026, an AI alert app called CrimeRadar pushed an active-shooter alert to parents in Mount Vernon, Missouri, based on a misinterpreted police radio transmission. There was no shooting; the system had transcribed and classified a routine call as a school threat (Old Man Trench, April 2026). The product's designers had optimized for speed. The result was a false lockdown.
The relevant quote, from the post-incident coverage: “Systems like this are built to be fast. Safety requires being right. Those two things are not the same.”
The reason this matters for EP is that the structural temptation in the corporate security market is identical. Buyers want the agent to surface the critical incident before the principal arrives at the venue. Vendors want to demonstrate latency. Latency without calibration is a recipe for false positives. False positives degrade trust quickly — an EP watch officer who pages the principal three times for things that turn out to be nothing will be ignored on the fourth page, which is the one that mattered.
What developers building these workflows should design around
For developers building agent integrations against crime data APIs for EP use, a few patterns recur in deployments that have not embarrassed their operators:
- Treat the API as the source of truth and the model as the synthesizer. Every claim in the agent's output should map to a row the API returned. If the brief says “three armed robberies within 500m of the hotel in the last 90 days,” the user should be able to click through to those three rows.
- Make “no incidents found” an explicit and visible state. Hallucination risk is highest when the agent has nothing to work with and reaches for general knowledge. A sparse-data page with a clear empty state is better than a paragraph that sounds confident and is wrong.
- Cap radius and time windows server-side. An agent that decides on its own to expand a query from 500m to 2km when it finds nothing nearby will produce a more “readable” brief and a less accurate one. Boundaries should be enforced outside the agent; a sketch of that enforcement follows the list.
- Log every query and every output for after-action review. EP work has a built-in feedback loop: trips happen, nothing bad happens or something does, and the team reviews. The agent's contribution should be auditable in that review, not opaque.
- Build the watch function around queues, not pages. The agent decides what shows up on a watch officer's queue. The watch officer decides what gets escalated. Compressing that step is where the CrimeRadar pattern lives.
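A minimal sketch of the server-side cap from the list above, with illustrative ceilings. It rejects out-of-bounds queries rather than silently clamping them, so the agent cannot narrate a search area it did not actually cover; whether rejection or clamping is right for a given program is a design choice.

```python
from dataclasses import dataclass

MAX_RADIUS_M = 1000     # illustrative ceilings, set per program
MAX_WINDOW_DAYS = 365


@dataclass(frozen=True)
class BoundedQuery:
    lat: float
    lng: float
    radius_m: int
    window_days: int


def bound_query(lat, lng, radius_m, window_days):
    """Enforce bounds outside the agent: reject rather than silently widen."""
    if radius_m > MAX_RADIUS_M or window_days > MAX_WINDOW_DAYS:
        # Rejecting is deliberate. Silently clamping would let the agent
        # believe it searched a wider area than it did.
        raise ValueError(
            f"query exceeds configured bounds: {radius_m}m / {window_days}d"
        )
    return BoundedQuery(lat, lng, radius_m, window_days)
```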
What the data underneath has to support
A lot of these workflows quietly assume that the API behind them is delivering data that an agent can use without misleading the reader. That assumption is not always safe.
National-rate baselines from official sources are sparse and slow. The FBI's UCR program reports US violent crime at 359 per 100,000 and property crime at 1,760 per 100,000 for 2024, with year-over-year changes of -5.4% and -9.0% respectively, and a 49.1% decline in overall crime since 2001 (USAFacts). State rankings show Alaska at 724/100,000 for violent crime and Maine at 100/100,000 in 2024 (USAFacts state ranking). The Real-Time Crime Index, which updates roughly every 45 days, fills part of the gap (RTCI).
But EP work is rarely about national rates. It is about whether the block in front of the venue had three robberies in the last 60 days. That granularity comes from address-level incident data, not from UCR aggregates, and most of the value an agent can add depends on whether the API can answer those questions in the first place.
A quick gut check for any EP-agent vendor pitch
Ask the vendor to show you, for a real address you choose, the list of underlying incidents the brief is summarizing. If they cannot, the brief is not grounded. If they can, ask how many of those incidents are normalized vs. raw, and how often the source feed updates. Those two answers determine whether the agent has anything real to reason over.
What the agent does well, on the evidence so far
The honest assessment is that agents are doing a better job at the “summarize, narrate, and format” layer than human analysts can sustainably do, and they are not yet doing a better job at the “decide what matters” layer.
A senior analyst who used to spend two hours writing a city brief can now spend twenty minutes reviewing one that an agent wrote, and produce roughly the same artifact. That is a real productivity gain. It compounds for programs that cover dozens of trips a week, where the bottleneck was throughput rather than judgment.
A watch officer who used to scan five tabs of municipal feeds can now triage a queue that an agent has pre-filtered. That is also a real gain, but it is more fragile. The filter introduces a dependency on the agent's recall — the things it failed to surface are, by construction, invisible to the officer. Programs that have done this well are running parallel-eyes audits for some interval after deployment to verify the agent is not silently dropping the same category of incident every time.
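A parallel-eyes audit can be as small as a recall check by category. The sketch below assumes two inputs for the same geofence and window: the incidents that policy says should have surfaced, and the subset the agent actually queued. The field names and the 0.5 recall floor are illustrative.

```python
from collections import Counter


def recall_by_category(should_surface, surfaced, min_recall=0.5):
    """Flag categories the agent is systematically failing to surface.

    `should_surface`: incidents that policy says belonged on the queue
    (same geofence, window, and severity rules the agent was given).
    `surfaced`: what the agent actually queued.
    """
    surfaced_ids = {i["id"] for i in surfaced}
    totals = Counter(i["category"] for i in should_surface)
    caught = Counter(
        i["category"] for i in should_surface if i["id"] in surfaced_ids
    )
    return [
        (category, caught[category] / total, total)
        for category, total in totals.items()
        if caught[category] / total < min_recall
    ]  # e.g. [("vehicle_theft", 0.12, 41)] is a silent drop worth chasing
```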
What the agent does not do well
It does not handle ambiguity in incident classification well, especially when the source feed uses non-standard category labels and the agent has to map them on the fly. It does not handle gaps in the data well — an address that the API has no incidents for is not necessarily a safe address; it might just be outside the agency's coverage. And it does not handle base-rate reasoning well: an agent will commonly describe a neighborhood as “high-crime” based on the absolute count of incidents without normalizing for population, square mileage, or the relevant denominator.
All three of these are addressable in product design. An API that returns coverage metadata alongside incident data lets the agent say “no incidents reported, but this jurisdiction has irregular reporting” rather than “no incidents reported” full stop. A category taxonomy enforced at the API layer eliminates the on-the-fly mapping. And a structured response that includes the surrounding ZIP or city baseline lets the agent put a number in context. Whether any given vendor has done this work is an empirical question, and worth asking before signing.
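Sketched as a hypothetical response shape rather than any particular vendor's API, that product work might look like this:

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Taxonomy enforced at the API layer, removing on-the-fly mapping."""
    ROBBERY = "robbery"
    BURGLARY = "burglary"
    ASSAULT = "assault"
    THEFT = "theft"


@dataclass
class Coverage:
    jurisdiction: str
    reporting_regular: bool   # False means "no incidents" is not "safe"
    last_source_update: str   # ISO timestamp of the newest source record


@dataclass
class Baseline:
    zip_rate_per_100k: float  # surrounding-ZIP rate over the same window
    city_rate_per_100k: float


@dataclass
class IncidentResponse:
    incidents: list           # normalized rows, each tagged with a Category
    coverage: Coverage
    baseline: Baseline
```

With coverage attached, the agent can qualify an empty result; with a baseline attached, it has a denominator before it reaches for “high-crime.”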
Adjacent demand: distributed-location corporate security
The same class of agent is showing up in a related but distinct corner of corporate security: protecting distributed business locations. The drivers there are quantitative, not narrative.
The National Retail Federation's 2025 report found that 46% of retailers reported an increase in violence during criminal incidents and 48% saw a rise in cargo and supply chain theft (cited in 3SI Security, May 2026). ATM jackpotting attacks reached approximately 700 incidents in 2025, with reported losses over $20 million. Organized retail crime rings are running coordinated flash-mob thefts across jurisdictional lines, which makes any single-jurisdiction view of the problem misleading.
For an organization with 800 retail locations or 1,200 ATMs, the individual-principal EP intelligence model does not scale. There is no analyst writing a site brief for each address. The natural fit for an agent is the distributed-location version of what individual EP already does: a standing site assessment for every location, a geofenced watch for elevated incidents, and an alerting threshold tuned to category and severity. The same calibration concerns apply — perhaps more sharply, because the alert volume is higher and the cost of false positives compounds across the fleet.
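At that scale the alerting threshold has to live in declarative configuration rather than per-site analyst judgment. A sketch, with categories, severities, and caps chosen purely for illustration:

```python
ALERT_POLICY = {
    # category: (minimum severity to alert, weekly per-site alert cap)
    "armed_robbery": (1, None),  # always alert, no cap
    "burglary":      (3, 5),     # moderate threshold, capped to control noise
    "vandalism":     (4, 2),     # near-suppressed: false positives compound
}


def should_alert(category, severity, alerts_this_week):
    """Apply the fleet policy; suppressed incidents still land in the log."""
    policy = ALERT_POLICY.get(category)
    if policy is None:
        return False
    min_severity, weekly_cap = policy
    if severity < min_severity:
        return False
    if weekly_cap is not None and alerts_this_week >= weekly_cap:
        return False
    return True
```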
A pragmatic forecast
The likely path over the next eighteen months is not that EP intelligence work goes away. It is that the cost structure shifts. Programs that previously could afford only a single analyst will run agent-assisted workflows that look like a three-person shop. The marginal cost of a trip brief approaches the cost of an API call. The marginal cost of judgment does not move.
That redistribution will create new failure modes — not catastrophic ones, probably, but characteristic ones. Briefs that read well and miss something obvious. Watch queues that drop a category of incident for three months before anyone notices. Site assessments that confuse an under-reported jurisdiction for a safe one. None of these are AI-specific failures; they are automation failures that AI has made cheaper to deploy, which means more programs will encounter them.
The programs that handle the transition well will be the ones that keep the human review step, run periodic calibration audits, and remember that the agent's job is to make the analyst faster, not to replace the moment of judgment that the analyst exists for.