Developer · Data Analysis · Crime Trends

Predictive Policing vs. Descriptive Crime Data: The Distinction Every Developer Should Understand

April 14, 2026 · 11 min read · By SpotCrime

The debate over predictive policing has consumed public safety policy circles for a decade. But for developers building products that depend on crime data, the distinction between predictive and descriptive isn't just philosophical; it's architectural. Choose the wrong foundation and you inherit algorithmic bias, regulatory exposure, and a product that can be challenged in court. Choose the right one and you build on facts.

What Predictive Policing Actually Means

Predictive policing is the use of algorithmic models, fed by historical crime records, demographic data, and geographic patterns, to forecast where crimes are likely to occur before they happen, or to identify individuals deemed statistically likely to commit or become victims of violence. The pitch to police departments was compelling: get ahead of crime instead of just reacting to it.

The most prominent commercial implementations included PredPol (later rebranded as Geolitica), which analyzed incident history to generate patrol deployment recommendations by location and time window. Chicago's Strategic Subject List, also known as the "heat list", assigned individual residents a risk score ranging from 0 to 500 based on prior arrests, associates, and neighborhood data. ShotSpotter deployed acoustic sensors to create real-time gunfire heat maps that fed into predictive deployment models.

The evidence base for these systems is, at best, contested. At worst, it's a case study in how feedback loops corrupt data. When a model predicts high crime in a neighborhood, police deploy there more heavily. More policing means more recorded incidents. More recorded incidents feed the model as confirmation. The neighborhood gets flagged again. The cycle repeats, not because the neighborhood is genuinely more dangerous, but because the algorithm is measuring its own enforcement footprint.

The Predictive Policing Track Record

  • LAPD / PredPol: A 2021 audit by the LAPD Inspector General found PredPol disproportionately targeted minority neighborhoods. The department shut down the program after the findings.
  • Chicago Heat List: A RAND Corporation evaluation found no evidence the Strategic Subject List reduced violence. Chicago aldermen moved to repeal it following civil liberties challenges.
  • Santa Cruz, CA: Became the first US city to ban predictive policing outright in 2020, citing concerns about bias and civil rights violations.
  • New Orleans / Palantir: A secret predictive policing contract with Palantir was exposed by The Verge in 2018; the city had not disclosed the program to the public or city council.

What Descriptive Crime Data Actually Is

Descriptive crime data is a record of what happened: a specific crime type, at a specific location, at a specific time, as reported to law enforcement. No predictions. No individual risk scores. No demographic inference. Just the factual record of reported incidents: normalized, geocoded, and made queryable.

This is the foundation SpotCrime is built on. When a burglary occurs on a residential block in Indianapolis at 11 PM on a Tuesday and the resident files a police report, that incident enters the public record. SpotCrime ingests that record, normalizes it against a standardized taxonomy of 18 crime categories, geocodes it to a precise latitude/longitude, and makes it available via API within minutes. No model is predicting whether the next burglary will happen. The API is reporting that this one did.
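The ingestion step described above can be sketched in a few lines. Everything here is illustrative: the category map, field names, and stub geocoder are hypothetical stand-ins, not SpotCrime's actual pipeline or 18-category taxonomy.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical mapping from raw agency offense labels to a standard taxonomy.
# SpotCrime's actual 18 categories are not reproduced here.
CATEGORY_MAP = {
    "BURGLARY - RESIDENTIAL": "Burglary",
    "THEFT FROM VEHICLE": "Theft",
    "AGG ASSAULT": "Assault",
}

@dataclass
class Incident:
    category: str
    occurred_at: datetime
    lat: float
    lon: float

def normalize(raw: dict, geocode) -> Incident:
    """Map a raw police-report record onto the standardized schema."""
    category = CATEGORY_MAP.get(raw["offense"].upper(), "Other")
    lat, lon = geocode(raw["address"])  # geocoder injected as a dependency
    return Incident(category, datetime.fromisoformat(raw["reported"]), lat, lon)

# Usage with a stub geocoder (coordinates are downtown Indianapolis):
fake_geocode = lambda addr: (39.7684, -86.1581)
incident = normalize(
    {"offense": "Burglary - Residential",
     "reported": "2026-04-07T23:00:00",
     "address": "100 Example St, Indianapolis, IN"},
    fake_geocode,
)
print(incident.category)  # → Burglary
```

The point of the dependency-injected geocoder is that the normalization logic stays testable without a live geocoding service.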

The neighborhood safety rating β€” a 1–100 index for any US address β€” is built entirely on this descriptive foundation. It aggregates verified incident counts across a 36-month window, weights by recency and severity, normalizes for population density, and calculates trend momentum (improving, stable, or declining). Every number in the output traces back to a reported incident in the public record. There is no demographic input. There is no individual profiling. There is no model predicting future crime.
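As a rough illustration of how such a descriptive rating can be assembled, here is a toy scoring function. The severity weights, linear recency decay, and squashing formula are assumptions for the sketch; SpotCrime's actual algorithm is not reproduced here.

```python
from datetime import date

# Illustrative severity weights only; not SpotCrime's actual weighting.
SEVERITY = {"Assault": 5.0, "Burglary": 3.0, "Theft": 1.5, "Other": 1.0}

def safety_rating(incidents, population_density, today=date(2026, 4, 14)):
    """Toy 1-100 rating (higher = safer). incidents = [(category, date), ...]."""
    score = 0.0
    for category, when in incidents:
        months_ago = (today.year - when.year) * 12 + (today.month - when.month)
        if months_ago >= 36:
            continue  # only the 36-month window counts
        recency = 1.0 - months_ago / 36  # linear recency decay (assumption)
        score += SEVERITY.get(category, 1.0) * recency
    per_capita = score / max(population_density, 1.0)
    # Squash into 1-100, inverted so fewer/older/lighter incidents rate higher.
    return max(1, round(100 - min(per_capita * 10, 99)))

quiet = safety_rating([("Theft", date(2024, 1, 5))], population_density=5000)
busy = safety_rating([("Assault", date(2026, 3, 1))] * 40, population_density=500)
print(quiet, busy)  # the quiet block rates higher than the busy one
```

The key property is traceability: every term in the sum corresponds to one reported incident, so the output can be decomposed back into the records that produced it.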

Why This Distinction Is Architectural, Not Academic

Developers who build products on crime data often treat the data layer as a commodity; the interesting work is in the UX, the features, the growth. The data is just plumbing. That framing misses something important: the data layer is also the legal and reputational foundation of your product.

Consider what happens when your product's data source is predictive. You are, in effect, making claims about future crime likelihood. Those claims are probabilistic, model-dependent, and unverifiable against the ground truth. When a user sees your product rate their neighborhood as high-risk and they know the area is safe, they can't trace the rating back to specific incidents, because there may not be any. The model said so. That's the claim.

Now consider the regulatory environment. The EU's AI Act, which entered into force in 2024, classifies systems that profile individuals for law enforcement purposes as high-risk AI, subject to mandatory conformity assessments, bias audits, and human oversight requirements. Several US states have introduced or passed legislation restricting automated decision-making in criminal justice contexts. California's Automated Decision Systems Accountability Act imposes disclosure requirements on tools used in high-stakes decisions.

If your product uses predictive scoring at the individual or micro-neighborhood level to help users make decisions about where to live, whether to hire someone, or how to price insurance, you may be navigating a regulatory minefield you didn't sign up for. Descriptive aggregate data sits in a fundamentally different legal category. You are reporting what happened. Journalism does this. Property records do this. Insurance actuarial tables do this. It is not an area of active legislative hostility.

  • 6,441 shooting incidents tracked by shootingsnear.me across 12 US cities in the last 60 days
  • 36 months of incident history behind every neighborhood safety rating
  • 22,000+ US cities covered by SpotCrime incident data

The Feedback Loop Problem in Practice

The feedback loop that corrupts predictive policing data is worth understanding in detail, because it also illustrates why descriptive data is structurally more reliable for most developer use cases.

In a predictive model trained on historical arrest data, neighborhoods with higher historical policing intensity will produce more arrest records, not necessarily because more crime occurs there, but because more officers are present to make arrests. That data then trains the next iteration of the model. Over time, the model's predictions converge on historical enforcement patterns rather than actual crime incidence. The model is, in effect, predicting where police used to go, then telling police to go there again.
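That convergence can be demonstrated with a toy simulation: two neighborhoods with identical true crime rates, where one starts with slightly more arrest records and deployment follows predicted risk. The winner-take-more allocation rule below is an assumption of the sketch, but the amplification it produces is exactly the dynamic described above.

```python
TRUE_RATE = 0.3          # identical underlying crime rate in both neighborhoods
records = [12.0, 10.0]   # neighborhood A starts with 20% more arrest records

for _ in range(60):
    total = sum(records)
    shares = [r / total for r in records]
    # "Winner-take-more" deployment: allocation is sharper than proportional
    # (an assumption here, but typical of hotspot-style targeting).
    weights = [s ** 2 for s in shares]
    patrol = [w / sum(weights) for w in weights]
    for i in range(2):
        # Recorded arrests scale with officers present, not just with crime.
        records[i] += TRUE_RATE * patrol[i] * 100

share_a = records[0] / sum(records)
print(f"A's share of the arrest record: {share_a:.0%}")  # grows every round
```

Despite equal underlying crime, neighborhood A's share of the record climbs far past its initial 55%: the model is measuring its own deployment decisions, not crime.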

Descriptive data built on reported incidents, rather than arrests, has a different bias profile. Reporting rates vary by neighborhood, crime type, and community trust in police. Under-reporting is a real phenomenon. But it is a known, characterizable limitation with no feedback amplification. The data reports what was reported. It does not compound enforcement decisions back into itself.

For a developer building a product, this distinction is directly relevant to accuracy. A neighborhood safety score derived from predictive, arrest-based models may look confident while actually measuring policing intensity. A score derived from normalized, verified incident reports (what actually happened, as filed) is measuring something closer to actual experienced crime. For a family choosing a neighborhood, a real estate buyer pricing a purchase, or a corporate security team assessing travel risk, the second number is the useful one.

Real-Time Shooting Data: Descriptive at Its Most Immediate

The difference between predictive and descriptive comes into sharpest focus with real-time shooting data. SpotCrime's companion tool shootingsnear.me tracks reported shooting incidents across major US cities, updated daily, with a 60-day rolling window. Right now, that feed shows 6,441 incidents across 12 cities, including 1,008 in San Antonio, 912 in Baltimore, 834 in Indianapolis, 798 in Seattle, and 603 in Las Vegas.

These are reported incidents. Not predictions. Not model outputs. Not algorithmic scores assigned to zip codes based on demographic proxies. Each data point traces back to a specific report: a location, a time, a law enforcement filing. A user in Indianapolis who wants to understand gun violence exposure near their home or office is looking at the factual record of what happened near that address, not a model's guess about what might happen.
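Maintaining a 60-day rolling window over a feed of reported incidents is a simple filter-and-count. The record schema below is hypothetical; the real shootingsnear.me feed format is not documented here.

```python
from datetime import date, timedelta

today = date(2026, 4, 14)
window_start = today - timedelta(days=60)

# Hypothetical incident records standing in for a daily-updated feed.
feed = [
    {"city": "Indianapolis", "date": date(2026, 4, 1)},
    {"city": "Indianapolis", "date": date(2026, 1, 10)},  # outside the window
    {"city": "Baltimore", "date": date(2026, 3, 20)},
]

def rolling_counts(incidents, start):
    """Count reported incidents per city inside the rolling window."""
    counts = {}
    for inc in incidents:
        if inc["date"] >= start:
            counts[inc["city"]] = counts.get(inc["city"], 0) + 1
    return counts

counts = rolling_counts(feed, window_start)
print(counts)  # → {'Incidents by city inside the window'} e.g. Indianapolis: 1
```

Because each count is a sum over dated records, any number in the output can be expanded back into the individual reports behind it.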

For developers building safety-oriented features such as real-time crime alerts, neighborhood safety layers, and executive travel risk assessment, this is the data to build on. The same principle holds at scale: the SpotCrime API surfaces incident-level data refreshed every 15 minutes across 22,000+ US cities, all traceable to source records.

What This Means for AI Agents Consuming Crime Data

As AI agents become active consumers of public safety data for routing decisions, risk assessments, real estate searches, and travel planning, the descriptive vs. predictive distinction takes on a new dimension.

An AI agent reasoning about whether a neighborhood is safe for a client will, ideally, ground that reasoning in verifiable facts. If the agent's data source returns a neighborhood risk score derived from a proprietary predictive model, the agent cannot verify the inputs, assess the bias profile, or explain the output. It is trusting a black box. If the agent queries a descriptive API (here are the 14 reported incidents within 0.5 miles of this address in the last 90 days, here is the 36-month trend, here is the severity breakdown), it has something it can reason about, cite, and qualify.
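A citable summary of that kind can be sketched as a radius query over incident records. The schema, record IDs, and 0.5-mile radius are illustrative assumptions, not any specific API's contract.

```python
from math import radians, sin, cos, asin, sqrt

def miles_between(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 3956 * 2 * asin(sqrt(a))

def summarize(incidents, lat, lon, radius_mi=0.5):
    """A summary an agent can cite and qualify, record by record."""
    nearby = [i for i in incidents
              if miles_between(lat, lon, i["lat"], i["lon"]) <= radius_mi]
    by_type = {}
    for i in nearby:
        by_type[i["category"]] = by_type.get(i["category"], 0) + 1
    return {"count": len(nearby), "by_type": by_type,
            "sources": [i["id"] for i in nearby]}  # traceable to records

# Hypothetical records near a downtown Indianapolis address:
incidents = [
    {"id": "r-101", "category": "Theft", "lat": 39.7685, "lon": -86.1580},
    {"id": "r-102", "category": "Assault", "lat": 39.7690, "lon": -86.1590},
    {"id": "r-103", "category": "Theft", "lat": 39.9000, "lon": -86.3000},  # far
]
summary = summarize(incidents, 39.7684, -86.1581)
print(summary)
```

The `sources` list is the difference between this output and a black-box score: every claim in the summary points back to a specific record an agent (or a user) can inspect.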

The shift toward agentic AI in enterprise security, real estate, and family safety applications makes data provenance more important, not less. Agents that surface unverifiable predictive scores will be challenged by users and regulators. Agents that surface verified incident-based data, and can explain it, will be trusted.

Predictive Crime Data

  • ✗ Model-dependent: outputs vary by training data and algorithm
  • ✗ Feedback loops amplify historical enforcement bias
  • ✗ Individual risk scores create civil rights exposure
  • ✗ High-risk AI classification under the EU AI Act
  • ✗ Outputs are claims, not facts, and challengeable in court
  • ✗ Documented failures: LAPD, Chicago, New Orleans

Descriptive Crime Data

  • ✓ Fact-based: every data point traces to a reported incident
  • ✓ No demographic inputs or individual profiling
  • ✓ Verifiable, auditable, explainable to users
  • ✓ Consistent with journalism, insurance, and property data standards
  • ✓ No AI Act high-risk classification for aggregate scoring
  • ✓ AI agents can cite, qualify, and reason about outputs

Building on the Right Foundation

For developers, the practical guidance is straightforward: know what your crime data vendor is actually selling. Ask whether neighborhood safety scores are derived from reported incident counts or from a predictive model trained on arrest data, demographic proxies, or police deployment patterns. Ask whether individual addresses produce outputs traceable to specific public records. Ask how the vendor handles the bias audit question, because regulators and plaintiffs will ask it eventually.

Descriptive, incident-based data is not a lesser alternative to predictive modeling. For the overwhelming majority of developer use cases (real estate safety layers, family alert systems, corporate security assessment, travel risk scoring), it is the more accurate, more defensible, and more legally sound option. Predictive models are a solution to a problem most product builders don't have. The problem most builders have is: show me what actually happened, near this address, in the recent past, with enough context to make it meaningful.

That is exactly what descriptive crime data APIs provide. SpotCrime's incident feed covers 22,000+ US cities with 15-minute refresh cycles. The neighborhood safety rating aggregates that data into a single, explainable safety index derived entirely from reported incidents: no black boxes, no demographic inference, no contested statistical claims. For developers who want to build products their users and their lawyers can both stand behind, the foundation matters.

The Policy Tide Is Turning Against Prediction

For those watching the regulatory landscape, the direction of travel is clear. Predictive policing programs have been suspended, banned, or audited into obsolescence in an increasing number of jurisdictions. The EU AI Act's high-risk classification for law enforcement AI systems represents the most comprehensive regulatory action to date, but US state-level legislation is accelerating in the same direction.

The main SpotCrime blog has covered this arc in depth, from the March 2026 LAPD crime data lawsuit documenting how public agencies resist transparency to the April 2026 coverage of AI in policing and the growing push for accountability. The broader picture is that public tolerance for opaque algorithmic systems in public safety contexts is declining, while the legal and regulatory infrastructure to challenge those systems is growing.

Products built on transparent, verifiable, fact-based crime data are positioned well for that regulatory environment. Products built on predictive black boxes are positioned for a reckoning.

The choice of data foundation is not a technical footnote. It is a product decision with legal, reputational, and ethical consequences that compound over time. Build on facts. Build on what happened. Build on data that can be explained to a user, a regulator, or a judge β€” because increasingly, all three will ask.

Access Address-Level Crime Data

Real-time incidents · neighborhood safety ratings · 36-month trends · 22,000+ US cities. Normalized and verified, because raw data isn't enough.