A man forces a window, enters a house, and takes a laptop. The responding officer's computer-aided dispatch system records it under a local code. The department's annual FBI submission files it as one thing. The same event, run through the bureau's older counting rules, would have been filed as something else. Three systems, one burglary, three different names — and that is before anyone tries to put it on a map. For developers building on crime data, the collection problem is largely solved. The classification problem is not.
Most people assume that “a crime” is a discrete, countable thing — like a transaction in a ledger. In practice, an incident is a bundle of facts that has to be forced through a classification system before it becomes a number. There are at least three such systems operating in the United States at any given moment, they do not agree with one another, and the disagreements are not random. They are structural, documented, and predictable. Understanding them is the difference between a crime data product that means what it says and one that quietly miscounts.
Three systems, one incident
The first system an incident touches is the local one. When an officer or a records clerk classifies a report, they use the offense codes defined by their own agency's records management system (RMS), which in turn often inherit from the state penal code. These local taxonomies are enormous and idiosyncratic. A mid-sized department may carry several hundred distinct offense codes; a large one, well over a thousand. “Theft” alone can fan out into dozens of codes distinguished by dollar value, location type, and method. None of this is standardized across jurisdictions. A “larceny from a vehicle” in one city is a “theft — auto burglary” in the next county and a numbered statute reference in the third.
The second system is the FBI's Summary Reporting System (SRS), the backbone of the Uniform Crime Reports for nearly a century. SRS collapses the local mess into a small set of categories — the familiar Part I offenses (murder, rape, robbery, aggravated assault, burglary, larceny-theft, motor vehicle theft, arson) and a broader set of Part II offenses. It is simple, comparable across decades, and — by modern standards — lossy in ways that matter.
The third system is the National Incident-Based Reporting System (NIBRS), which the FBI made its standard in 2021. NIBRS records each incident as a structured object: up to ten offenses, with victims, offenders, properties, and relationships attached. It is far richer than SRS, and the transition between the two created one of the largest discontinuities in the history of American crime statistics. Per USAFacts, the FBI's state-level series now runs on SRS data through 2020 and on NIBRS data from 2021 through 2024, because the underlying counting machinery changed midstream.
The core problem in one sentence
The same physical event produces different counts depending on which classification system processes it — so any dataset that blends sources, or spans the 2021 transition, is comparing things that were never measured the same way.
The hierarchy rule: how SRS undercounts by design
The clearest illustration of why classification matters is the SRS hierarchy rule. Under SRS, when a single incident involves multiple offenses, the agency reports only the most serious one. A robbery that also involves an aggravated assault and a stolen vehicle is counted once — as a robbery. The assault and the theft disappear from the national totals.
This was a reasonable simplification in an era of paper forms and adding machines. It is also a systematic undercount of everything below the top of each incident's severity stack. NIBRS removes the hierarchy rule: it records all offenses in an incident, up to ten. The consequence is mechanical and unavoidable — when an agency switches from SRS to NIBRS, some categories appear to rise simply because offenses that were previously absorbed by a more serious co-occurring crime are now counted. The events did not change. The counting rule did.
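To make the difference concrete, here is a minimal Python sketch of the two counting rules applied to the same incidents. The severity ranking simply follows the order in which the Part I offenses are listed above and is an assumption for illustration, not the FBI's official hierarchy table; the function names and sample incidents are hypothetical.

```python
from collections import Counter

# Illustrative severity ranking (lower = more serious). Assumption for this
# sketch only; it mirrors the Part I ordering quoted in the article.
SEVERITY_RANK = {
    "murder": 1,
    "rape": 2,
    "robbery": 3,
    "aggravated_assault": 4,
    "burglary": 5,
    "larceny_theft": 6,
    "motor_vehicle_theft": 7,
}

MAX_OFFENSES_PER_INCIDENT = 10  # NIBRS records up to ten offenses per incident


def count_top_offense_only(incidents):
    """SRS-style: each incident contributes only its most serious offense."""
    counts = Counter()
    for offenses in incidents:
        if offenses:
            top = min(offenses, key=SEVERITY_RANK.__getitem__)
            counts[top] += 1
    return counts


def count_every_offense(incidents):
    """NIBRS-style: every recorded offense is counted, up to the cap."""
    counts = Counter()
    for offenses in incidents:
        counts.update(offenses[:MAX_OFFENSES_PER_INCIDENT])
    return counts


# Made-up incidents: the first is the robbery example from the text.
incidents = [
    ["robbery", "aggravated_assault", "motor_vehicle_theft"],
    ["burglary"],
    ["larceny_theft"],
]

srs_style = count_top_offense_only(incidents)
nibrs_style = count_every_offense(incidents)
```

On these three incidents the top-offense rule yields three counted offenses and the every-offense rule yields five. Aggravated assault and motor vehicle theft appear only under the second rule, even though the underlying events are identical.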
This is why responsible analysts hedge so heavily around the 2021 transition, and why a year-over-year comparison that straddles it can be close to meaningless without adjustment. It is also why “crime went up in category X” claims drawn from raw NIBRS-versus-SRS comparisons deserve immediate skepticism. The first question is always whether you are looking at a change in the world or a change in the ledger.
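One cheap safeguard is to attach a flag to any comparison whose two endpoints were counted under different rules. The sketch below assumes a single national cutover in 2021, which matches the FBI series described above but may not match the year an individual agency actually switched. The function names and the counts are made up for illustration.

```python
NIBRS_TRANSITION_YEAR = 2021  # assumption: one national cutover year


def counting_system(year):
    """Return which counting system a given year's figures were produced under."""
    return "NIBRS" if year >= NIBRS_TRANSITION_YEAR else "SRS"


def compare_years(counts_by_year, start_year, end_year):
    """Percent change between two years, flagged if the counting rules differ."""
    start = counts_by_year[start_year]
    end = counts_by_year[end_year]
    return {
        "pct_change": (end - start) / start * 100,
        "straddles_transition": counting_system(start_year)
        != counting_system(end_year),
    }


# Made-up counts for a single category in a single jurisdiction.
counts_by_year = {2019: 1000, 2020: 980, 2021: 1100, 2022: 1050}

within_srs = compare_years(counts_by_year, 2019, 2020)
across_transition = compare_years(counts_by_year, 2020, 2021)
```

A consumer of `across_transition` sees the flag alongside the percentage and can decide whether to suppress the figure, footnote it, or apply an adjustment.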
What the national numbers look like — and why the shape matters
For grounding, here are the FBI's 2024 figures as compiled by USAFacts. The national violent crime rate was 359 per 100,000 people; the property crime rate was 1,760 per 100,000. Violent crime fell 5.4% year over year and property crime fell 9%. Both continue a long decline: overall crime is down roughly 49% since 2001, according to the same source.
The internal composition is the part developers should internalize, because it determines where classification errors do the most damage:
Violent crime, by share
- Aggravated assault — 71.3%
- Robbery — 16.9%
- Rape — 10.4%
- Murder — 1.4%
Property crime, by share
- Larceny-theft — 72.3%
- Motor vehicle theft — 14.7%
- Burglary — 13.0%
Two observations follow directly. First, the categories that dominate the counts — aggravated assault and larceny-theft — are precisely the ones where local definitions vary most and where the SRS hierarchy rule does the most quiet work. Murder, at 1.4% of violent crime, is the most consistently classified offense in the entire system; almost everything above it in volume is harder to pin down. Second, the categories that fluctuate most in public discourse are often a small slice of the total. A product that surfaces a single “crime” number without exposing this composition is hiding the categories where its own classification choices matter most.
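The shares become easier to reason about when converted into implied per-category rates. The short sketch below multiplies the national rates by the shares quoted above. The resulting figures are derived arithmetic, not numbers published by the source, and the function name is hypothetical.

```python
# Rates per 100,000 people and percentage shares, as quoted in the article.
VIOLENT_RATE = 359
PROPERTY_RATE = 1760

VIOLENT_SHARES = {
    "aggravated_assault": 71.3,
    "robbery": 16.9,
    "rape": 10.4,
    "murder": 1.4,
}
PROPERTY_SHARES = {
    "larceny_theft": 72.3,
    "motor_vehicle_theft": 14.7,
    "burglary": 13.0,
}


def implied_rates(total_rate, shares_pct):
    """Split a total rate per 100,000 into per-category rates by share."""
    return {
        category: round(total_rate * pct / 100, 1)
        for category, pct in shares_pct.items()
    }


violent = implied_rates(VIOLENT_RATE, VIOLENT_SHARES)
property_crime = implied_rates(PROPERTY_RATE, PROPERTY_SHARES)
```

By this arithmetic, aggravated assault works out to roughly 256 per 100,000 and larceny-theft to roughly 1,272 per 100,000, against about 5 per 100,000 for murder. A classification error of a few percent in either of the dominant categories moves more incidents than the entire murder count.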
The normalization problem, stated precisely
A crime data API that covers many jurisdictions does not get to pick one of these systems and stop. Local feeds arrive in local taxonomies. Some agencies publish NIBRS-aligned data; many still publish in their own RMS codes; the daily incident feeds that power real-time products almost never arrive pre-classified into FBI categories at all. The job of normalization is to map all of it onto a single, stable taxonomy that means the same thing in Houston as it does in Hartford — without inventing precision that the source data does not contain.
This decomposes into four concrete engineering problems:
- Taxonomy mapping. Each source's offense codes must be mapped to a common category set. This is mostly a many-to-one problem, but the hard cases are the ones where a single local code spans two normalized categories, or where the local code is genuinely ambiguous. Every map entry is a small judgment call, and the quality of the dataset is the sum of those calls.
- Severity assignment. Once normalized, incidents need a comparable severity so that counts, scores, and heatmaps weight a homicide differently from a noise complaint. This is where the hierarchy-rule lesson applies in reverse: if you collapse multi-offense incidents to their top offense for display, say so; if you count every offense, say that instead. Both are defensible. Doing one while implying the other is not.
- Deduplication. The same incident often appears in more than one feed, or is updated across multiple records as a case develops. Counting it twice inflates exactly the dense urban areas where users scrutinize the data most closely.
- Geocoding and suppression. An incident is only useful at the address level if it is placed correctly, and only publishable if sensitive cases are suppressed at the source. Both happen before classification is meaningful for a map. We covered the suppression layer in detail in our guide to what crime data APIs suppress and why.
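The taxonomy-mapping problem can be sketched as a lookup table that refuses to guess. Everything in the example below is hypothetical: the source names, the local codes, the normalized categories, and the mapping decisions themselves. The point is only the shape: clean many-to-one entries resolve, while unmapped codes and codes that span two categories are flagged rather than silently assigned.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping table: (source, local code) -> candidate categories.
# A real table would be much larger and maintained by a person.
CODE_MAP = {
    ("city_a", "LARCENY FROM VEHICLE"): ["larceny_theft"],
    ("county_b", "THEFT - AUTO BURGLARY"): ["larceny_theft"],
    # A single local code that spans two normalized categories.
    ("city_a", "THEFT/BURGLARY OTHER"): ["larceny_theft", "burglary"],
}


@dataclass
class MappingResult:
    category: Optional[str]
    needs_review: bool
    reason: str


def normalize(source, local_code):
    """Map a local offense code to one normalized category, or flag it."""
    candidates = CODE_MAP.get((source, local_code.strip().upper()))
    if candidates is None:
        return MappingResult(None, True, "unmapped code")
    if len(candidates) > 1:
        return MappingResult(None, True, "code spans multiple categories")
    return MappingResult(candidates[0], False, "mapped")


clean = normalize("city_a", "larceny from vehicle")
ambiguous = normalize("city_a", "THEFT/BURGLARY OTHER")
unknown = normalize("city_c", "PC 459")
```

Returning `None` with a reason, instead of a best guess, keeps the ambiguous entries visible so that someone can make the judgment call explicitly.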
None of these steps is glamorous, and none can be skipped. A raw municipal feed is not a product; it is a liability with a schema. The work that turns it into something a developer can build on is almost entirely this normalization layer. For a hands-on view of the tooling crime analysts use to do adjacent work — pandas, SQL, automated CompStat-style reporting — the practical walkthrough at crime de-coder is a useful reference point.
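Of the four steps above, deduplication is the simplest to illustrate. The sketch below keeps the most recently updated record for each agency and case number. It assumes every feed exposes a stable case number and an ISO 8601 update timestamp, which many real feeds do not; without those, matching has to fall back on time and location, which is not shown here. The field names are hypothetical.

```python
from datetime import datetime


def deduplicate(records):
    """Keep the latest record per (agency, case_number) pair."""
    latest = {}
    for record in records:
        key = (record["agency"], record["case_number"])
        updated = datetime.fromisoformat(record["updated_at"])
        if key not in latest or updated > latest[key][0]:
            latest[key] = (updated, record)
    return [record for _, record in latest.values()]


# Made-up records: the first two describe the same case before and after
# it was reclassified; the third shares a case number but not an agency.
records = [
    {"agency": "city_a", "case_number": "24-001",
     "offense": "burglary", "updated_at": "2024-03-01T10:00:00"},
    {"agency": "city_a", "case_number": "24-001",
     "offense": "robbery", "updated_at": "2024-03-02T09:30:00"},
    {"agency": "county_b", "case_number": "24-001",
     "offense": "larceny_theft", "updated_at": "2024-03-01T11:00:00"},
]

deduped = deduplicate(records)
```

Keying on the agency as well as the case number matters, because case numbers are typically unique only within the agency that issued them.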
Where AI helps, and where it quietly hurts
Taxonomy mapping is an obvious candidate for a language model. Given an unstructured offense description, a model can propose a normalized category, and for the clean majority of records it will be right. The danger is in the tail, and specifically in the confidence the model reports.
There is good evidence that LLM confidence scores are not well calibrated: the probabilities a model attaches to its own outputs are systematically overconfident. gmcirco's calibration walkthrough, an analysis of AI classifier calibration on NEISS injury data, makes the point concretely for injury classification, which is structurally close to offense classification. The practical consequence: a model that says it is 95% sure about an ambiguous offense code may be right far less often than 95% of the time, and if you route records on that number, your error rate will be worse than your dashboard claims.
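Checking calibration does not require anything exotic. Given a sample of model outputs that a person has already labeled as right or wrong, a reliability table compares stated confidence with observed accuracy in each confidence bin. The sketch below is a generic version of that check, not the method used in the walkthrough cited above. The sample data is made up and the function name is hypothetical.

```python
def reliability_table(predictions, n_bins=5):
    """Compare mean stated confidence with observed accuracy per bin.

    predictions: iterable of (confidence in [0, 1], was_correct) pairs.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, was_correct in predictions:
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        bins[index].append((confidence, was_correct))

    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_confidence = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        rows.append({
            "bin": (index / n_bins, (index + 1) / n_bins),
            "n": len(members),
            "mean_confidence": mean_confidence,
            "accuracy": accuracy,
            "gap": mean_confidence - accuracy,  # positive = overconfident
        })
    return rows


# Made-up validation sample of (model confidence, human-verified correctness).
sample = [
    (0.95, True), (0.95, False), (0.96, True), (0.97, False),
    (0.55, True), (0.52, False),
]

table = reliability_table(sample)
```

In this fabricated sample the top bin reports a mean confidence above 0.95 while only half of its records are correct. A gap of that kind is what should stop a pipeline from routing on the raw score.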
The defensible pattern mirrors the one we have argued for repeatedly: deterministic code owns the counts, the model assists with language-shaped subtasks, and a human owns the ambiguous map entries. A model can draft a taxonomy mapping; it should not silently finalize one. The cost of getting this wrong is not abstract. When an alerting system optimized for speed misread a routine police radio transmission as a school shooting — the CrimeRadar incident in Missouri this April, which we examined in detail — the failure was a classification failure at the input layer. As one account put it, systems like that are built to be fast, and safety requires being right, and those two things are not the same.
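As a rough sketch of that division of labor, the function below lets a deterministic table answer first, lets a model only draft a category for anything the table does not cover, and holds the drafted record in a review queue rather than counting it. The `model_suggest` callable is a stand-in for whatever model call a team uses, and every name here is hypothetical.

```python
def route(description, deterministic_map, model_suggest, review_queue):
    """Return a final category, or None if the record awaits human review."""
    key = description.strip().lower()
    if key in deterministic_map:
        # Deterministic code owns the count.
        return deterministic_map[key]
    # The model drafts; it does not finalize.
    review_queue.append({
        "description": description,
        "draft_category": model_suggest(description),
    })
    return None


def fake_model_suggest(description):
    """Stub standing in for a language model call (assumption)."""
    return "aggravated_assault"


deterministic_map = {"larceny from vehicle": "larceny_theft"}
review_queue = []

known = route("Larceny From Vehicle", deterministic_map,
              fake_model_suggest, review_queue)
pending = route("subject struck victim, unclear weapon", deterministic_map,
                fake_model_suggest, review_queue)
```

Records that return `None` stay out of the published counts until a reviewer accepts or corrects the draft, at which point the decision can be added to the deterministic table.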
What this means if you are buying or building
If you are evaluating a crime data API, the taxonomy is where you should push hardest, because it is where vendors are most tempted to imply precision they do not have. A short checklist:
- Ask for the taxonomy, in writing. A serious provider can hand you its normalized category list and explain how it maps local codes onto it. If the answer is vague, the normalization is probably vague too.
- Ask how multi-offense incidents are counted. Top-offense-only or every-offense? There is no wrong answer, only an undocumented one.
- Ask what happens across the 2021 NIBRS transition. Any historical series that spans it should either adjust for the discontinuity or flag it. Silence is a red flag.
- Treat a single composite “safety” number with appropriate suspicion. It can be honest, but only if the methodology behind it is published. We wrote up how we approach this in the SpotScore methodology walkthrough.
- Distinguish the index from the incident. National samples like the Real-Time Crime Index are built to mimic national trends from a sample of agencies. They are excellent for direction and useless for “what happened on this block.” The two jobs require different data, and we compared them directly in our RTCI-versus-UCR breakdown.
The honest version of a number
The recurring theme across all of this is that a crime statistic is not a measurement in the way a temperature is. It is the output of a classification pipeline, and every stage of that pipeline embeds decisions that could reasonably have gone another way. The hierarchy rule was a decision. The NIBRS transition was a decision. Every entry in a taxonomy map is a decision. None of this makes crime data unusable — the long decline documented by USAFacts, roughly 49% since 2001, is real and visible through all three systems. It does mean that the responsible way to ship a crime number is to ship the decisions behind it.
That is the unglamorous core of the work. Anyone can serve a feed. The value is in the normalization layer that makes the feed mean the same thing everywhere it is used — and in the discipline to say so when it does not. Raw data is not enough, precisely because raw data has already been classified three times before it reaches you, and never the same way twice.