How Crime Lands on the Map: The Geocoding Problem Every Crime Data API Has to Solve

A police record arrives as text: an offense code, a timestamp, and a location written the way a human dispatcher wrote it — “1200 BLK MAIN ST” or “MAIN ST / 5TH AVE.” Before that record can be a pin on a map, a heat cell in a hotspot, or an input to a block-level safety score, software has to convert that text into a pair of coordinates. That conversion is the least-discussed and most error-prone step in the entire crime data pipeline. This is a field guide to where the error comes from, how big it is, and what to demand of any API that claims to put crime “at an address.”

The step nobody puts in the marketing copy

Every crime data product makes the same implicit promise: that the dot it draws is where the thing happened. The promise is rarely examined because the underlying step — geocoding — is invisible when it works and silent when it fails. A record that geocodes to the wrong block does not throw an error. It draws a dot. The dot looks exactly as authoritative as a correct one.

This matters more, not less, in a falling-crime environment. The Real-Time Crime Index, drawing on 566 law enforcement agencies covering roughly 118.6 million people, reported violent crime down 6.4% and property crime down 11.4% for January through April 2026 versus the same period in 2025, with murder down 18.7% and motor vehicle theft down 20.3% (crimeindex.org). As counts fall, each individual incident carries more weight in any local rate. When a block has three burglaries instead of thirty, putting one of them on the wrong block is no longer rounding error — it is a third of the signal. Geocoding precision and low base rates compound.

Five ways to turn an address into a point

Geocoders are not interchangeable. They differ in the reference data they consult and in what they do when the address does not match cleanly. In rough order of decreasing precision:

Parcel / rooftop. The geocoder matches the address to a specific tax parcel or building footprint and returns its centroid. Typical error: a few meters. This is the gold standard, and it is only available where parcel data exists and is current.
Address-range interpolation. The dominant method. The geocoder finds the street segment whose address range contains the number (say, 1200–1298 Main St) and estimates a position by linear interpolation along that segment. It assumes addresses are evenly spaced and that even and odd sides are symmetric. Neither assumption holds on cul-de-sacs, on blocks with a single large parcel, or where a vacant lot leaves a gap in the numbering. Typical error: tens of meters, but the tail is long.
Intersection. “MAIN ST / 5TH AVE” resolves to the crossing point of two segments. Usually accurate to the intersection itself, which may be tens of meters from where the incident actually occurred mid-block.
Street-segment centroid. When only a block is known (the common case for published data, discussed below), the geocoder returns the midpoint of the segment. Error is bounded by half the block length, often 50–100 meters in a dense grid and far more in a rural area.
ZIP or place centroid. The fallback of last resort. When nothing else matches, the record lands at the centroid of a ZIP code or municipality. Error: from hundreds of meters to several kilometers. A record geocoded this way is, for block-level purposes, noise wearing the costume of a data point.
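
The address-range interpolation step described above can be sketched in a few lines. This is a minimal illustration, not any particular geocoder's implementation: the `Segment` schema, the planar coordinates in meters, and the sample segment are all assumptions made for the example, and real geocoders also handle odd/even sides and offsets from the centerline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A street segment with its assigned address range (hypothetical schema)."""
    low: int                      # lowest house number on the segment
    high: int                     # highest house number on the segment
    start: tuple[float, float]    # (x, y) of the low-number end, in meters
    end: tuple[float, float]      # (x, y) of the high-number end, in meters


def interpolate(segment: Segment, house_number: int) -> tuple[float, float]:
    """Place a house number along the segment by linear interpolation."""
    if not segment.low <= house_number <= segment.high:
        raise ValueError("house number is outside the segment's address range")
    span = segment.high - segment.low
    # A single-address segment has no range to interpolate; use the midpoint.
    fraction = 0.5 if span == 0 else (house_number - segment.low) / span
    x = segment.start[0] + fraction * (segment.end[0] - segment.start[0])
    y = segment.start[1] + fraction * (segment.end[1] - segment.start[1])
    return (x, y)


# Illustrative segment: 1200-1298 Main St laid out along a 98 m east-west line.
main_st = Segment(low=1200, high=1298, start=(0.0, 0.0), end=(98.0, 0.0))
estimate = interpolate(main_st, 1249)
```

The sketch makes the even-spacing assumption visible: the house number alone determines the position, so a single large parcel or a gap in the numbering shifts every estimate on the segment.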

The cardinal rule: a coordinate is only as meaningful as the method that produced it. A pipeline that does not preserve the geocoding method alongside the coordinates has thrown away the one field that tells you whether to trust the dot. A latitude and longitude with no precision flag is a number pretending to be a fact.
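
One way to keep the method attached to the coordinate is to make it a required field of the record. The sketch below is an assumed schema, not a standard: the `Method` names are hypothetical, and the radii are placeholders loosely based on the typical errors listed above rather than measured values.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    PARCEL = "parcel"
    INTERPOLATION = "interpolation"
    INTERSECTION = "intersection"
    SEGMENT_CENTROID = "segment_centroid"
    ZIP_CENTROID = "zip_centroid"


# Illustrative radii in meters. Placeholders only; calibrate against your own data.
NOMINAL_RADIUS_M = {
    Method.PARCEL: 5.0,
    Method.INTERPOLATION: 50.0,
    Method.INTERSECTION: 50.0,
    Method.SEGMENT_CENTROID: 100.0,
    Method.ZIP_CENTROID: 1000.0,
}


@dataclass(frozen=True)
class GeocodedPoint:
    lat: float
    lon: float
    method: Method  # required: a point cannot be constructed without it

    @property
    def radius_m(self) -> float:
        """Nominal uncertainty radius implied by the geocoding method."""
        return NOMINAL_RADIUS_M[self.method]

    def usable_at_block_level(self, max_radius_m: float = 100.0) -> bool:
        return self.radius_m <= max_radius_m


rooftop = GeocodedPoint(41.8781, -87.6298, Method.PARCEL)
fallback = GeocodedPoint(41.8500, -87.6500, Method.ZIP_CENTROID)
```

With this shape, a block-level analysis can filter on `usable_at_block_level()` instead of trusting every latitude and longitude equally.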

The hundred-block convention is a feature, not a bug

Most US police departments do not publish exact addresses. They publish to the hundred-block: “1200 block of Main St,” a deliberate generalization of the true address to the nearest hundred in the house number. This is the standard privacy control for incident data, and it exists for good reason — it keeps victim and caller addresses out of a public feed while preserving enough geography to be useful. We have written separately about what crime data APIs suppress and why.

The consequence for geocoding is structural, not incidental: when a department publishes to the hundred-block, the best achievable precision is segment-centroid, full stop. No geocoder, however good, can recover an exact rooftop from a location the source deliberately blurred. A vendor that returns rooftop-looking coordinates for hundred-block input is not being more precise; it is fabricating precision the source data never contained. The honest representation of a hundred-block incident is a point at the block midpoint, flagged as block-level, displayed with a radius or a thicker symbol rather than a needle-sharp pin.
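
A parser can carry that ceiling forward by recording the block as a range with an explicit precision flag. The sketch below assumes one common text format ("1200 BLK MAIN ST" or "1200 block of Main St"); the regular expression and the output keys are illustrative, and real feeds vary by department.

```python
import re
from typing import Optional

# Matches strings such as "1200 BLK MAIN ST" or "1200 BLOCK OF MAIN ST".
# The pattern is an assumption about one common format, not a standard.
HUNDRED_BLOCK = re.compile(
    r"^\s*(\d+)\s+(?:BLK|BLOCK)(?:\s+OF)?\s+(.+?)\s*$", re.IGNORECASE
)


def parse_hundred_block(text: str) -> Optional[dict]:
    """Return the block's address range and a block-level flag, or None."""
    match = HUNDRED_BLOCK.match(text)
    if match is None:
        return None
    base = (int(match.group(1)) // 100) * 100
    return {
        "street": match.group(2).upper(),
        "low": base,
        "high": base + 99,
        "precision": "block",  # the ceiling set by the source; never upgraded later
    }


parsed = parse_hundred_block("1200 BLK MAIN ST")
not_a_block = parse_hundred_block("MAIN ST / 5TH AVE")
```

Here `parsed` describes the range 1200 to 1299 on MAIN ST with `precision` set to `"block"`, and the intersection string returns `None` so it can be routed to a different handler.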

The match rate is the number that matters

Every geocoding run produces a match rate: the share of input records that resolved to a usable coordinate. The records that did not match, the unmatched residual, are where the real risk lives, because what a pipeline does with them silently determines the shape of every map downstream. There are three common behaviors, in increasing order of danger:

Drop them. Unmatched records are excluded. This is defensible if you report the drop rate, and indefensible if you do not, because a reader sees a complete map and has no way to know that, say, 12% of incidents are missing, and missing non-randomly. New construction, rural routes, and recently annexed areas geocode worse, so the dropped records cluster geographically.
Fall back silently. Unmatched records are pushed down the precision ladder, from interpolation to ZIP centroid, without a flag. Now a pile of low-precision points masquerades as the real thing, and because ZIP centroids are a single coordinate, those records stack into a phantom hotspot exactly at the ZIP center.
Default to a sentinel. The worst case: unmatched records receive a placeholder such as (0, 0), the geographic center of the city, or the coordinates of police headquarters. Anyone who has worked with raw municipal feeds has seen the telltale spike of incidents at city hall. It is not a crime wave at city hall. It is the geocoder's failure bin rendered as data.
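
Both the match rate and the pile-ups described above can be checked from the coordinates alone. This is a rough diagnostic sketch: the function names, the five-decimal rounding, and the `min_share` threshold are assumptions chosen for illustration, and the sample coordinates are made up.

```python
from collections import Counter
from typing import Optional

Coord = Optional[tuple[float, float]]  # None means the record did not geocode


def match_rate(coords: list[Coord]) -> float:
    """Share of records that resolved to any coordinate."""
    if not coords:
        return 0.0
    return sum(c is not None for c in coords) / len(coords)


def suspected_sentinels(
    coords: list[Coord], min_share: float = 0.25
) -> list[tuple[float, float]]:
    """Coordinates that are (0, 0) or shared by a large share of matched records."""
    matched = [c for c in coords if c is not None]
    if not matched:
        return []
    counts = Counter((round(lat, 5), round(lon, 5)) for lat, lon in matched)
    return [
        xy for xy, n in counts.items()
        if xy == (0.0, 0.0) or n / len(matched) >= min_share
    ]


city_hall = (41.88, -87.63)  # made-up placeholder coordinate
coords = [city_hall] * 4 + [
    (41.9001, -87.6502), (41.9104, -87.6410), (41.8712, -87.6245),
    (0.0, 0.0), None, None,
]
rate = match_rate(coords)
flagged = suspected_sentinels(coords)
```

In this sample, 8 of 10 records matched, and both the repeated placeholder and (0, 0) are flagged. Hundred-block data legitimately stacks several incidents at one block midpoint, so the threshold has to be set with the source's publishing convention in mind, and a flagged coordinate is a prompt for review rather than proof of a sentinel.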

Typical positional error by method: about 5 m for parcel / rooftop, 50–100 m for a block centroid in a dense grid, and more than 1 km for a ZIP-centroid fallback.

The spatial error budget

Borrowing the engineering habit of an explicit error budget, the total displacement between a published dot and the true location is the sum of several independent sources:

Source generalization. The hundred-block convention contributes up to half a block before any software runs. This is the floor, and it is set by the publisher, not the vendor.
Method error. Interpolation versus parcel versus centroid, as above. The same address run through two reputable geocoders routinely disagrees by tens of meters; the disagreement is itself a usable estimate of uncertainty.
Reference-data staleness. Street centerlines and parcel files are snapshots. A subdivision built last year is a blank space in a two-year-old reference file, so its addresses interpolate onto the nearest old segment, sometimes hundreds of meters away.
Parsing error. Before any geocoding, the free-text location has to be parsed into components. Directionals (N vs S Main), suffixes (St vs Ave), and unit numbers are routine failure points, and a parse error upstream produces a confident, precise, wrong coordinate downstream.

These do not cancel. A record can be generalized to the block, interpolated onto a stale segment, and parsed with the wrong directional, and the errors can stack in the same direction. The practical takeaway is that a single point is the wrong mental model. The honest object is a point plus an uncertainty — a radius, a confidence tier, or at minimum the method that produced it.
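
The budget can be written down explicitly for a single record. The magnitudes below are placeholders chosen for illustration, not measurements. The worst-case figure follows the stacking argument above; the root-sum-square figure is an additional assumption (independent, randomly oriented errors) included only for comparison.

```python
import math

# Illustrative magnitudes in meters for one record; placeholders, not measurements.
budget_m = {
    "source_generalization": 50.0,  # half of a 100 m block
    "method_error": 30.0,
    "reference_staleness": 20.0,
    "parsing_error": 0.0,  # zero when the parse is right, very large when it is not
}

# Worst case: every source displaces the point in the same direction.
worst_case_m = sum(budget_m.values())

# If the sources were independent and randomly oriented, a root-sum-square
# figure would be a more typical magnitude. That independence is an assumption.
typical_m = math.sqrt(sum(v * v for v in budget_m.values()))
```

With these placeholder values the worst case is 100 m and the root-sum-square figure is roughly 62 m. Either number is a more honest companion to the coordinate than no number at all.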

Why a hundred meters changes the answer

Tens of meters sounds tolerable until you remember that the consumers of crime coordinates are boundary-sensitive. A census block group, a school attendance zone, a hexagonal heat cell, and the radius of a “crime within 0.5 miles” query are all defined by edges. A point that is 80 meters off does not produce an answer that is 80 meters wrong; it produces an answer that is in the wrong bin — categorically, not proportionally. The incident counts for the right block and the wrong block both change by one. As we have argued in our walkthrough of crime hotspot mapping, the choice of bin and bandwidth can manufacture or erase a hotspot from identical inputs. Geocoding error is the upstream version of the same problem: it moves the point before the binning ever begins.
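
The categorical nature of the error is easy to demonstrate on a grid. The sketch below uses square cells and planar coordinates in meters as a simplifying assumption (hexagonal cells and real boundaries need a geometry library); the function names and the 250 m cell size are illustrative.

```python
import math


def cell_of(x: float, y: float, cell_size: float) -> tuple[int, int]:
    """Index of the square grid cell that contains the point."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def bin_is_ambiguous(x: float, y: float, radius: float, cell_size: float) -> bool:
    """True if the uncertainty circle reaches past the edge of the point's cell."""
    dx = x % cell_size
    dy = y % cell_size
    distance_to_edge = min(dx, cell_size - dx, dy, cell_size - dy)
    return radius > distance_to_edge


cell_size = 250.0
published = (240.0, 120.0)   # where the geocoder put the incident
true_spot = (320.0, 120.0)   # the same incident, 80 m to the east

published_cell = cell_of(*published, cell_size)
true_cell = cell_of(*true_spot, cell_size)
near_edge = bin_is_ambiguous(*published, radius=80.0, cell_size=cell_size)
centered = bin_is_ambiguous(125.0, 125.0, radius=80.0, cell_size=cell_size)
```

The 80 m shift moves the incident from cell (0, 0) to cell (1, 0). A record whose radius crosses a cell edge can be reported as ambiguous instead of being counted with full confidence in one bin.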

This is also where the temptation to over-claim certainty becomes dangerous. Geoff Circo's analysis of classifier calibration makes a point that transfers cleanly from machine-learning confidence scores to geocoding: a system can be confidently wrong, and confidence is not accuracy. A geocoder returns a coordinate with no error bars by default. Treating that coordinate as exact is the spatial equivalent of trusting an overconfident token probability — the number looks precise precisely because the uncertainty has been discarded, not because it was small.

What to demand of a crime data API

If you are evaluating or building on a crime data API — the broader version of this checklist is in our guide to how to evaluate a crime data API — geocoding deserves its own set of questions:

Per-record precision flags. Every coordinate should carry the method that produced it (parcel, interpolation, block centroid, ZIP fallback). If the API cannot tell you how a given dot was geocoded, it cannot tell you whether to trust it.
A published match rate. What share of incidents geocode to street-level or better, and what happens to the rest? “Drop and report” is acceptable; “silent ZIP fallback” is not.
No fabricated precision. Hundred-block input should yield block-flagged output, not invented rooftop coordinates. Precision should never exceed the source.
Reference-data currency. How recent are the street and parcel files, and how are new addresses handled? Stale references fail hardest exactly in the fast-growing areas where demand for the data is highest.
Sentinel detection. The vendor should be able to confirm that (0, 0), city-center, and headquarters pile-ups have been identified and excluded rather than served as incidents.
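
Two of these checks, the per-record flag and the rule that precision never exceeds the source, can be audited mechanically against a sample of records. The sketch assumes each record exposes a `source_tier` and a `method` field; those field names, the tier names, and their ranking are hypothetical and would need to be mapped onto whatever a given API actually returns.

```python
# Lower rank means finer precision. The tier names are illustrative.
PRECISION_RANK = {"parcel": 0, "interpolation": 1, "intersection": 1, "block": 2, "zip": 3}


def fabricated_precision(source_tier: str, output_tier: str) -> bool:
    """True if the output claims finer precision than the source contained."""
    return PRECISION_RANK[output_tier] < PRECISION_RANK[source_tier]


def audit(records: list[dict]) -> dict:
    """Count records with no method flag and records with invented precision."""
    missing_flag = sum(1 for r in records if not r.get("method"))
    fabricated = sum(
        1 for r in records
        if r.get("method") and fabricated_precision(r["source_tier"], r["method"])
    )
    return {
        "records": len(records),
        "missing_method_flag": missing_flag,
        "fabricated_precision": fabricated,
    }


sample = [
    {"source_tier": "block", "method": "block"},
    {"source_tier": "block", "method": "parcel"},  # rooftop-looking output from blurred input
    {"source_tier": "interpolation", "method": None},
]
report = audit(sample)
```

On this sample the audit reports one record without a method flag and one with fabricated precision.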

None of this is exotic. The crime-analysis community has built these checks into routine tooling for years; Crime De-Coder's walkthrough of a practical Python data-science pipeline for crime data treats geocoding diagnostics as a first-class step, not an afterthought. The gap is not knowledge; it is whether the API you consume exposes any of it.

The honest version

Geocoding cannot be made perfect, because the source data is generalized by design and the reference layers are always somewhat stale. That is not a reason to distrust crime maps; it is a reason to insist they carry their uncertainty rather than hide it. The national picture — a US violent crime rate of 359 per 100,000 and a property crime rate of 1,760 per 100,000 in 2024, both down year over year (USAFacts) — is robust to a few mis-geocoded points. A claim about your block is not. The lower the count and the smaller the geography, the more the geocoding step decides whether the answer is right.

So the standard is modest and specific: preserve the method, publish the match rate, never fabricate precision the source did not contain, and represent each incident as a point with a known uncertainty rather than a needle that implies one it does not have. That is the difference between a map that informs a decision and a map that merely looks like it does.