Data Quality

Our data, in plain English

We don't claim to be "most accurate" — that's a marketing word, not an engineering one. We publish every source, every refresh cadence, and every methodology choice so you can evaluate the data yourself.

3,771 cities graded
51 states covered (50 states plus DC)
203M total population across our city dataset

What we publish (so you don't have to take our word for it)

Most crime-data vendors hide behind phrases like "proprietary" and "most accurate." Here is what we put in writing.

Named sources

Every feed we pull from is listed below by name, including who runs it and how often we refresh from it. No fog of "thousands of unnamed sources."

Refresh cadence

Each source's refresh cadence is published in the table below. Not "real-time" as a slogan, but actual numbers per source.

Methodology, in code

Our Crime Grade decile cutoffs are recomputed from the full national FBI pull on every refresh. The cutoffs are published on every page and in our git history. Audit them.

Coverage by city

We list which cities are covered, the population threshold, and what's missing. No vague 'we cover everywhere' claims.

What we don't do

No "predictive crime AI." No proprietary safety algorithms with undisclosed inputs. No "most accurate" superlatives. See the explicit list below.

Changelog via git

Every refresh of our public pages is a git commit. The history is public on GitHub.

Sources we currently ingest

This is the live ingestion roster. New sources are added regularly; this list updates with every refresh.

Source | Type | Refresh | Coverage | Notes
Lexington Police Department | Police feed | 15 minutes | Lexington-Fayette County, KY | Direct CAD-derived feed via SpotCrime's public-records pipeline.
Baltimore County Police Department | Police feed | Twice weekly | Baltimore County, MD (10 precincts) | Scraped from precinct blotter pages with deduplication against historical URLs.
Jonesboro AR Police Department | Police PDF feed | Daily (business days) | Jonesboro, AR | Daily PDF released on the agency's Google Drive; parsed into incidents.
FBI Crime Data Explorer (UCR / NIBRS) | Federal aggregator | Every 6 months (FBI publishes annually each September) | All 50 states + DC, 3,000+ reporting cities | Powers the Crime Grade pages and state/national rate comparisons. Pulled directly from cde.ucr.cjis.gov, with no third-party intermediary.
Bluegrass Crime Stoppers | Tip-line / wanted-suspect feed | Quarterly | Fayette County, KY | Wanted-person and arrested-person snapshots, diffed against previous run to surface adds/removes/status changes.
Google News | News aggregator | Every 5 hours | United States + Canada (shooting incidents) | Filtered, address-validated, and run through a CSV schema validator before ingestion. Source URL preserved on every incident.
SpotCrime.com | Master crime index | Real-time | 22,000+ U.S. cities | The original SpotCrime data lake, aggregating thousands of agency, news, and community sources since 2007.

Plus the underlying SpotCrime.com data lake, which has aggregated police-feed, news, and community-reported incidents since 2007 across 22,000+ U.S. cities.
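The Bluegrass Crime Stoppers feed is described as snapshots diffed against the previous run. A minimal sketch of that kind of diff follows, assuming each snapshot has already been reduced to a mapping from a hypothetical record ID to a status string; the IDs, statuses and function names are illustrative, not the feed's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    """Changes between two snapshots of a wanted/arrested-person list."""
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    # (record_id, old_status, new_status) for records present in both runs
    status_changed: list = field(default_factory=list)


def diff_snapshots(previous: dict, current: dict) -> SnapshotDiff:
    """Compare two {record_id: status} snapshots (hypothetical schema)."""
    result = SnapshotDiff()
    # dict key views support set operations, so adds/removes are set differences
    result.added = sorted(current.keys() - previous.keys())
    result.removed = sorted(previous.keys() - current.keys())
    for record_id in sorted(previous.keys() & current.keys()):
        if previous[record_id] != current[record_id]:
            result.status_changed.append(
                (record_id, previous[record_id], current[record_id])
            )
    return result


if __name__ == "__main__":
    last_run = {"A1": "wanted", "A2": "wanted", "A3": "wanted"}
    this_run = {"A2": "arrested", "A3": "wanted", "A4": "wanted"}
    diff = diff_snapshots(last_run, this_run)
    print(diff.added)           # ['A4']
    print(diff.removed)         # ['A1']
    print(diff.status_changed)  # [('A2', 'wanted', 'arrested')]
```

Keeping the previous snapshot rather than only the latest one is what lets a removal (someone dropping off the list) be surfaced at all.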

What we don't claim — and why

Common crime-data marketing phrases that we deliberately avoid.

"Most accurate"

There is no standardized benchmark for crime-data accuracy. Anyone can claim it; nobody can verify it. We publish sources and cadences so you can compare instead.

"AI-powered predictive crime intelligence"

Predictive policing has well-documented bias and reliability problems. Our Crime Grade is a transparent decile ranking computed from FBI data, not a prediction.

"Proprietary safety algorithm"

Black-box scoring is fine for a startup deck. It is not fine for someone deciding where to live or where to staff a security detail. Our decile cutoffs are published.

"Real-time data" (vague)

We say specifically how often each source refreshes — 15 minutes for police CAD feeds, every 6 months for FBI annual data. There is no single 'real-time' number across all sources.

"22,000+ cities" without qualification

That number refers to the underlying SpotCrime data lake. Our public Crime Grade pages are calibrated against the FBI's 3,000+ reporting city agencies; we cite the two numbers separately.

"Bank-grade security"

Our security posture is described concretely in the API terms. Slogans don't move us.

How we process the data

Ingestion. Each source has its own scraper or API client running on a published cadence. Failures alert; partial runs are logged. The full pipeline runs in version-controlled code.
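Per-source cadences can live as plain data next to the scraper code. The sketch below assumes a hypothetical `SOURCES` registry holding three cadences from the sources table, with "every 6 months" approximated as 182 days, plus a stored timestamp of each source's last successful run. Business-day and twice-weekly schedules don't fit a fixed interval and are left out. This is an illustration, not the pipeline's actual configuration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical registry: source name -> refresh interval, taken from the
# sources table. "Every 6 months" is approximated as 182 days.
SOURCES = {
    "Lexington Police Department": timedelta(minutes=15),
    "Google News": timedelta(hours=5),
    "FBI Crime Data Explorer": timedelta(days=182),
}


def due_sources(last_success: dict, now: datetime) -> list:
    """Return sources whose interval has elapsed since their last *successful* run.

    A source with no recorded successful run is always due, so a failed or
    partial run (which never updates last_success) is retried on the next check.
    """
    due = []
    for name, interval in SOURCES.items():
        last = last_success.get(name)
        if last is None or now - last >= interval:
            due.append(name)
    return due


if __name__ == "__main__":
    now = datetime(2024, 10, 1, 12, 0, tzinfo=timezone.utc)
    last_success = {
        "Lexington Police Department": now - timedelta(minutes=20),
        "Google News": now - timedelta(hours=1),
        # No entry for the FBI source yet, so it is due.
    }
    print(due_sources(last_success, now))
    # ['Lexington Police Department', 'FBI Crime Data Explorer']
```

Tracking only successful runs is one way to make "failures alert; partial runs are logged" compatible with a fixed cadence: a failure leaves the source overdue rather than silently resetting its clock.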

Address parsing. Incidents must include a usable address: a street address with a house number, a block address, or named cross-streets. A bare neighborhood name with no street-level detail is rejected (we flag the rule explicitly on every news scan).
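The address rule can be approximated with a few patterns. The sketch below is deliberately simplified and uses hypothetical regexes and a hypothetical `classify_address` function. Real addresses need far more handling (units, directionals, typos). A pair of neighborhood names joined by "and" would also slip through the cross-street check, so a production version would likely need a street gazetteer.

```python
import re
from typing import Optional

# Hypothetical, simplified patterns for the three accepted address shapes.
BLOCK_ADDRESS = re.compile(r"^\d+\s+block\s+(?:of\s+)?\S+", re.IGNORECASE)
STREET_WITH_NUMBER = re.compile(r"^\d+[A-Za-z]?\s+\S+")
# Two named streets joined by "&", "and" or "/" (whitespace required around the joiner).
CROSS_STREETS = re.compile(r"^\S.*\s+(?:&|and|/)\s+\S.*$", re.IGNORECASE)


def classify_address(text: str) -> Optional[str]:
    """Return which accepted address shape `text` matches, or None to reject it."""
    candidate = text.strip()
    # Check block addresses first: "100 block of ..." also starts with a number.
    if BLOCK_ADDRESS.match(candidate):
        return "block"
    if STREET_WITH_NUMBER.match(candidate):
        return "street_number"
    if CROSS_STREETS.match(candidate):
        return "cross_streets"
    return None  # e.g. a bare neighborhood name


if __name__ == "__main__":
    for raw in ["123 Main St", "100 block of N Limestone", "Main St & Euclid Ave", "Chevy Chase"]:
        print(raw, "->", classify_address(raw))
```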

Classification. Nine canonical categories: theft, burglary, robbery, assault, arson, shooting, vandalism, arrest, other. Each one has a defined operational meaning; we publish the same definitions in our public Crime Categories page and on lexingtoncrime.com.
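In code, the nine categories map naturally onto an enum with a fallback. In this sketch, the category names come from the list above. The raw-label mapping is a hypothetical example of how agency-specific wording might be normalized; the real assignments follow the published category definitions, which are not reproduced here.

```python
from enum import Enum


class Category(str, Enum):
    """The nine canonical incident categories."""
    THEFT = "theft"
    BURGLARY = "burglary"
    ROBBERY = "robbery"
    ASSAULT = "assault"
    ARSON = "arson"
    SHOOTING = "shooting"
    VANDALISM = "vandalism"
    ARREST = "arrest"
    OTHER = "other"


# Hypothetical examples of agency-specific labels mapped to canonical categories.
RAW_LABEL_MAP = {
    "larceny": Category.THEFT,
    "shoplifting": Category.THEFT,
    "breaking and entering": Category.BURGLARY,
    "criminal mischief": Category.VANDALISM,
}


def normalize_category(raw_label: str) -> Category:
    """Map a raw feed label to a canonical category, falling back to OTHER."""
    label = raw_label.strip().lower()
    try:
        return Category(label)  # the label is already a canonical name
    except ValueError:
        return RAW_LABEL_MAP.get(label, Category.OTHER)


if __name__ == "__main__":
    print(normalize_category("Larceny").value)    # theft
    print(normalize_category("ROBBERY").value)    # robbery
    print(normalize_category("loitering").value)  # other
```

An explicit OTHER fallback keeps unmapped labels visible instead of silently forcing them into the nearest category.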

Deduplication. Incidents are deduped by (agency, address, datetime, category) tuples and against historical URLs. We do not silently merge — a duplicate flag remains on the record.
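Flagging rather than merging can be sketched as a single pass that annotates records and keeps all of them. The `Incident` fields, reason strings and example URLs below are hypothetical. Timestamps are compared exactly, which is a simplification; a real key might round or bucket times.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Incident:
    agency: str
    address: str
    occurred_at: datetime
    category: str
    source_url: str
    # Set instead of deleting or merging the record; None means "not a duplicate".
    duplicate_reason: Optional[str] = None


def flag_duplicates(incidents: list, historical_urls: set) -> list:
    """Flag, but never drop or merge, records that repeat a URL or a dedup key.

    The key is the (agency, address, datetime, category) tuple. URLs are checked
    against earlier runs and against earlier records in this batch.
    """
    seen_urls = set(historical_urls)  # copy so the caller's set is not mutated
    seen_keys = set()
    for incident in incidents:
        key = (incident.agency, incident.address, incident.occurred_at, incident.category)
        if incident.source_url in seen_urls:
            incident.duplicate_reason = "url_seen_before"
        elif key in seen_keys:
            incident.duplicate_reason = "same_key"
        seen_urls.add(incident.source_url)
        seen_keys.add(key)
    return incidents  # same length as the input: nothing is removed


if __name__ == "__main__":
    t = datetime(2024, 5, 1, 22, 30)
    batch = [
        Incident("Lexington PD", "100 block of Main St", t, "theft", "https://example.com/a"),
        Incident("Lexington PD", "100 block of Main St", t, "theft", "https://example.com/b"),
        Incident("Lexington PD", "200 Oak Ave", t, "assault", "https://example.com/old"),
    ]
    for item in flag_duplicates(batch, {"https://example.com/old"}):
        print(item.source_url, item.duplicate_reason)
```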

Aggregation for Crime Grade. For the FBI-derived pages, we pull the annual rate per 100,000 residents from the FBI Crime Data Explorer, place each city in a decile among all reporting cities with population ≥ 10,000, and map deciles 1–3 to A, 4–5 to B, 6–7 to C, 8 to D, and 9–10 to F. Decile cutoffs are recomputed on each refresh and published on every page.
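The decile-to-grade step can be sketched in a few lines. This assumes that decile 1 holds the lowest crime rates (so it maps to A), that each city contributes a single rate per 100,000, and that cutoffs come from Python's `statistics.quantiles` with its default "exclusive" method. The percentile method actually used for the published cutoffs may differ, and the function names are hypothetical.

```python
import bisect
import statistics

MIN_POPULATION = 10_000
# Letter grade for each decile (1 = lowest crime rate), per the mapping above.
GRADE_BY_DECILE = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B",
                   6: "C", 7: "C", 8: "D", 9: "F", 10: "F"}


def decile_cutoffs(cities: list) -> list:
    """Nine cutoffs splitting eligible cities' rates into ten groups.

    Each city is a (name, population, rate_per_100k) tuple; cities below
    MIN_POPULATION are excluded before the cutoffs are computed.
    """
    rates = [rate for _, population, rate in cities if population >= MIN_POPULATION]
    return statistics.quantiles(rates, n=10)


def crime_grade(rate_per_100k: float, cutoffs: list) -> tuple:
    """Return (decile, grade). A rate equal to a cutoff falls in the lower decile."""
    decile = bisect.bisect_left(cutoffs, rate_per_100k) + 1
    return decile, GRADE_BY_DECILE[decile]


if __name__ == "__main__":
    # Synthetic data: 100 eligible cities with rates 1..100, plus one small town
    # whose extreme rate must not move the cutoffs.
    cities = [(f"City {r}", 20_000, float(r)) for r in range(1, 101)]
    cities.append(("Small Town", 5_000, 9_999.0))
    cutoffs = decile_cutoffs(cities)
    for rate in (5.0, 75.0, 95.0):
        print(rate, crime_grade(rate, cutoffs))
```

Returning the cutoffs separately from the grading function mirrors the practice of publishing the cutoffs: anyone holding the nine numbers can recompute a city's grade from its rate.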

Refresh. The Crime Grade pages refresh every 6 months (around the FBI's September release). Police-feed sources refresh on their own cadences (see table above). Each refresh writes a git commit; the history is public.

Try the data, then evaluate it

Run a side-by-side test against any vendor that claims to be "most accurate." We will publish the comparison.