Inside SpotScore™: How a Block-Level Neighborhood Safety Rating Is Actually Built

Neighborhood safety scores are everywhere. Real estate listings have them. Family safety apps have them. Insurance underwriting workflows have them. Almost none of them publish a methodology that a working analyst could reproduce. This post walks through how SpotScore™ is actually constructed — what goes in, how it is normalized, how it is weighted, and what the score is and is not claiming to measure.

The motivation for writing this down is simple. A safety score is a derived quantity. Two scores built from the same incident feed can disagree by a wide margin depending on choices the authors made about geography, weighting, and normalization. A developer integrating a third-party score into a product should be able to interrogate those choices. A consumer looking at a score on a listing page deserves to know what they are looking at.

We are going to lean toward methodological transparency over marketing here. Where we make a judgment call, we will name it. Where the literature is unsettled, we will say so. Where SpotScore™ differs from competing approaches, we will say what we do and why.

What a safety score is trying to do

A neighborhood safety score is a forecast dressed up as a description. The score you see on a listing today is built from incidents that already happened. The user reading the score is implicitly asking a forward-looking question: how safe will this address be over the period I plan to live, work, or invest here?

That gap — between retrospective measurement and prospective inference — is the entire methodological problem. Most published scores paper over it. The responsible approach is to acknowledge it and to design the score so that the retrospective signal is as informative about the prospective question as the data allow.

Three design constraints fall out of that framing:

Resolution must match the question. A ZIP code is not a neighborhood. The 11226 ZIP in Brooklyn covers roughly 100,000 people across blocks with materially different incident profiles. A score that averages over that area is, in effect, a score for a different geography than the one the user cares about.
Categories must be weighted, not summed. A burglary and a homicide are not equivalent events. Producing a single number from raw incident counts without weighting categories implies a value judgment — and an unstated one.
Comparison must be honest. A score of 7 in Birmingham, Alabama and a score of 7 in Burlington, Vermont have to mean the same thing, or the score is unusable for anyone making cross-market decisions.

The data layer

SpotScore™ is computed from the same incident corpus that drives the SpotCrime crime map and the SpotCrime API: direct feeds from each covered law enforcement agency, normalized into a consistent incident taxonomy, geocoded to a precision consistent with what the originating agency publishes.

That last point matters and is worth stating plainly. We do not invent precision we do not have. Many agencies publish incidents at the hundred-block level — “100 block of Main St” — rather than at the parcel level, to comply with state public records statutes and to avoid identifying victims of crimes like domestic violence and sexual assault. SpotScore™ inherits that ceiling. A block-level score is what the data support; a parcel-level score generally is not, and platforms claiming otherwise are usually interpolating in a way that the underlying records do not justify.
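To make the hundred-block ceiling concrete, here is a minimal sketch of how a parcel-style street address could be collapsed to the hundred-block label that many agencies publish. The function name `hundred_block_key` and the simple "number + street" address format are assumptions for illustration, not part of the SpotCrime pipeline; real feeds need a proper address parser.

```python
import re

# Hypothetical pattern for records already published at hundred-block level.
BLOCK_RE = re.compile(r"^\s*\d+\s+block\s+of\s+", re.IGNORECASE)
# Hypothetical pattern for "<house number> <street name>", e.g. "1234 Main St".
ADDRESS_RE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")


def hundred_block_key(address: str) -> str:
    """Collapse a street address to a label such as '1200 block of Main St'.

    Records already at hundred-block level, and anything that does not match
    the simple '<number> <street>' form, are returned unchanged rather than
    guessed at, so no precision is invented.
    """
    if BLOCK_RE.match(address):
        return address.strip()
    match = ADDRESS_RE.match(address)
    if match is None:
        return address.strip()
    number = int(match.group(1))
    street = match.group(2)
    block = (number // 100) * 100  # 1234 -> 1200, 57 -> 0
    return f"{block} block of {street}"


print(hundred_block_key("1234 Main St"))          # 1200 block of Main St
print(hundred_block_key("100 block of Main St"))  # 100 block of Main St
```

Returning unmatched input unchanged is a deliberate choice in this sketch: an intersection or free-text location stays as coarse as the agency published it.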

The other data inputs are population, residential and commuter density, and the comparison cohorts we use for normalization. We come back to those below.

Normalization: the part most scores get wrong

A raw count of incidents in a square mile tells you nothing about safety without knowing how many people are in that square mile. Crime rate, not crime count, is what you want.

That sounds obvious. In practice, most published neighborhood scores either skip normalization entirely or use a single national denominator. Both choices are wrong, in different ways. Skipping normalization punishes dense neighborhoods simply for being dense. Using a single national denominator punishes high-density urban areas for not looking like suburban averages.

The 2024 FBI Uniform Crime Reporting data, summarized by USAFacts, put the US violent crime rate at 359 per 100,000 and the property crime rate at 1,760 per 100,000, down 5.4% and 9% year over year. State-level rates vary by roughly a factor of seven for violent crime (Alaska reported 724 per 100,000, Maine 100 per 100,000) and by roughly a factor of four for property crime (New Mexico at 2,751, Idaho at 736). A score that does not adjust for those structural differences is not a safety score; it is largely a proxy for which state the block sits in.

SpotScore™ normalizes incidents against a daytime-adjusted population estimate at the block-group level. Daytime adjustment matters in commercial corridors and transit-dense neighborhoods where the resident population is a small fraction of the population physically present during business hours. A downtown block with 50 residents and 5,000 daily commuters should not be evaluated against the 50-resident denominator.
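A minimal sketch of why the denominator matters, using the downtown example above. It assumes a commuter-adjusted daytime estimate (residents, plus workers commuting in, minus residents commuting out), similar in spirit to the Census Bureau's commuter-adjusted population estimates. The class and function names and the incident count of 40 are hypothetical, and SpotScore™'s actual estimator may differ.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    incidents: float          # incidents in the scoring window (hypothetical count)
    residents: int            # resident population
    inbound_commuters: int    # workers who commute into the block group
    outbound_commuters: int   # residents who commute out of the block group


def daytime_population(bg: BlockGroup) -> int:
    """Commuter-adjusted estimate: residents + inbound - outbound, floored at 1."""
    return max(bg.residents + bg.inbound_commuters - bg.outbound_commuters, 1)


def rate_per_1000(incidents: float, population: int) -> float:
    """Incidents per 1,000 people in the chosen denominator."""
    return 1000.0 * incidents / population


downtown = BlockGroup(incidents=40, residents=50, inbound_commuters=5000, outbound_commuters=20)
resident_rate = rate_per_1000(downtown.incidents, downtown.residents)
daytime_rate = rate_per_1000(downtown.incidents, daytime_population(downtown))
print(f"resident denominator: {resident_rate:.1f} per 1,000")  # 800.0 per 1,000
print(f"daytime denominator:  {daytime_rate:.1f} per 1,000")   # 8.0 per 1,000
```

The same 40 incidents read as a hundredfold difference in rate depending on which population is treated as exposed, which is the distortion daytime adjustment is meant to remove.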

We then express the block's incident rate as a percentile relative to two cohorts: the surrounding metropolitan statistical area, and a national peer cohort of similar block-group types. The metro percentile is the one most users care about; the national percentile is what makes cross-market comparison possible. Both are published.
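The two-cohort comparison can be sketched as a percentile rank computed twice against different reference lists. This sketch assumes a mid-rank convention (ties count as half) and uses invented cohort rates; the function name and the data are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value: float, cohort: list[float]) -> float:
    """Percent of the cohort with a lower rate than `value`, counting ties as half."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    ordered = sorted(cohort)
    below = bisect_left(ordered, value)
    ties = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical incident rates per 1,000 daytime population.
block_rate = 12.0
metro_rates = [4.0, 6.5, 9.0, 12.0, 15.0, 22.0, 30.0, 41.0]
national_peer_rates = [2.0, 3.5, 5.0, 6.0, 8.0, 9.5, 11.0, 14.0, 18.0, 25.0]

metro_pct = percentile_rank(block_rate, metro_rates)               # 43.75
national_pct = percentile_rank(block_rate, national_peer_rates)    # 70.0
print(f"metro: {metro_pct:.2f}, national peer: {national_pct:.2f}")
```

Here the same block sits below the middle of its metro but above 70% of its national peers, which is why both numbers are worth publishing. These are percentiles of incident rate; a display score in which higher means safer would use 100 minus the value.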

Weighting: who decides a burglary is worth less than a robbery?

Crime category weights are where transparency tends to evaporate. We will publish ours.

SpotScore™ weights incidents in three tiers, anchored to the Part I offense categories used in the FBI Uniform Crime Reporting program and extended with several Part II offenses (weapons offenses, vandalism, and disorderly conduct):

Tier 1 (highest weight): homicide, non-negligent manslaughter, rape, robbery, aggravated assault, and confirmed shootings.
Tier 2 (intermediate): burglary, motor vehicle theft, arson, and weapons offenses.
Tier 3 (lowest weight): larceny, vandalism, and order-maintenance offenses such as disorderly conduct.

The relative ratios are 5:2:1. A homicide on a block contributes five times the score impact of a larceny on the same block.

These weights are a deliberate choice, not an empirical derivation. We have anchored them to the National Crime Victimization Survey's severity weights and to the cost-of-crime literature, but the judgment that a homicide is worse than a broken window is a value judgment, and so is the judgment of how much worse. We chose ratios that are defensible from the literature and that avoid the two failure modes we see most often: collapsing all violent crime into a single bucket (which overstates risk for areas with a single tragic incident), and giving property crime so much weight that downtown commercial blocks score as unsafe purely because they have higher larceny counts.
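The tiers and the 5:2:1 ratio above can be sketched as a lookup plus a weighted sum. The category label strings are hypothetical stand-ins, not SpotCrime's taxonomy identifiers, and raising on an unmapped category is a design choice of this sketch (it surfaces taxonomy gaps instead of silently dropping incidents).

```python
# Hypothetical category labels grouped into the three published tiers.
TIER_1 = {"homicide", "non_negligent_manslaughter", "rape", "robbery",
          "aggravated_assault", "shooting"}
TIER_2 = {"burglary", "motor_vehicle_theft", "arson", "weapons"}
TIER_3 = {"larceny", "vandalism", "disorderly_conduct"}

TIER_WEIGHTS = {1: 5.0, 2: 2.0, 3: 1.0}  # the 5:2:1 ratio


def tier_of(category: str) -> int:
    """Map an incident category to its tier; unknown categories are an error."""
    if category in TIER_1:
        return 1
    if category in TIER_2:
        return 2
    if category in TIER_3:
        return 3
    raise KeyError(f"unmapped category: {category}")


def weighted_count(categories: list[str]) -> tuple[float, dict[int, float]]:
    """Return the weighted incident total and its per-tier breakdown."""
    breakdown = {1: 0.0, 2: 0.0, 3: 0.0}
    for category in categories:
        tier = tier_of(category)
        breakdown[tier] += TIER_WEIGHTS[tier]
    return sum(breakdown.values()), breakdown


total, parts = weighted_count(["homicide", "larceny", "larceny", "burglary"])
print(total, parts)  # 9.0 {1: 5.0, 2: 2.0, 3: 2.0}
```

Keeping the per-tier breakdown alongside the total is what makes it possible to show users which kinds of incidents drive a given score.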

Resolution: why block-level and not parcel-level

We publish at the block group and, where the source data support it, the individual block. We do not publish a parcel-level safety score, and we specifically do not produce one by interpolation.

The reason is straightforward: the underlying records do not support that resolution. A score that suggests two addresses 150 feet apart on the same block face have meaningfully different safety profiles is overclaiming what the incident data can say. The geocoding error in the source records — typically tens to hundreds of feet at the block level — exceeds the implied resolution of a parcel score in most cases.

This is the same principle that drives the methodological guidance we laid out in our earlier post on hotspot mapping for developers: the right resolution is the one the data can sustain, and aggressive smoothing into apparent precision is a way of laundering uncertainty into the user's decision.

Time decay and the rolling window

A burglary three years ago is not the same signal as a burglary three weeks ago. SpotScore™ uses a rolling 36-month window with an exponential time decay. The half-life is 18 months for property offenses and 24 months for violent offenses — violent incident patterns are less variable from month to month and more informative about the underlying rate, so they should age out of the score more slowly.
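The decay described above can be sketched as a weight of 0.5 raised to the incident's age divided by its half-life, cut to zero outside the window. Treating incidents exactly 36 months old as still inside the window, and the names used here, are assumptions of this sketch.

```python
WINDOW_MONTHS = 36
HALF_LIFE_MONTHS = {"property": 18.0, "violent": 24.0}


def decay_weight(age_months: float, offense_class: str) -> float:
    """Exponential decay 0.5 ** (age / half-life); zero beyond the 36-month window."""
    if age_months < 0:
        raise ValueError("incident dated in the future")
    if age_months > WINDOW_MONTHS:
        return 0.0
    return 0.5 ** (age_months / HALF_LIFE_MONTHS[offense_class])


for age in (0, 12, 24, 36):
    print(age, round(decay_weight(age, "property"), 3), round(decay_weight(age, "violent"), 3))
```

At the 36-month edge a property incident carries a weight of 0.25 and a violent incident about 0.35, so violent incidents age out more slowly, as intended. One plausible way to combine this with the category tiers is to multiply each incident's tier weight by its decay weight before summing; that composition is our assumption about how the pieces fit.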

The 36-month outer window is a compromise. Too short a window and the score becomes noisy on blocks with low base rates. Too long a window and the score stops responding to genuine changes in neighborhood conditions, and there have been a lot of those over the past five years. US murders declined an estimated 15% in 2024, and preliminary data point to a further decline of roughly 20% in 2025, the steepest sustained drop in a generation. A score built on a ten-year window would be telling users about a city that does not exist anymore.

Calibration: what the number is claiming

We display SpotScore™ on a 1-to-10 integer scale. That is a deliberate compression — a continuous score expressed to two decimal places implies a precision the underlying data do not support. The integer scale is honest about the fact that the difference between a 7.1 and a 7.3 is not meaningful, and the difference between a 4 and an 8 is.
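A minimal sketch of the compression step, assuming the score starts from a 0-100 safety percentile (higher means a lower incident rate than more of the cohort) and that each integer covers one decile. The function name and the decile mapping are assumptions for illustration, not published SpotScore™ parameters.

```python
import math


def to_display_score(safety_percentile: float) -> int:
    """Compress a 0-100 safety percentile to a 1-10 integer, one decile per step."""
    if not 0.0 <= safety_percentile <= 100.0:
        raise ValueError("percentile must be in [0, 100]")
    return min(10, max(1, math.ceil(safety_percentile / 10.0)))


print(to_display_score(61.0), to_display_score(63.0))  # 7 7
print(to_display_score(35.0), to_display_score(75.0))  # 4 8
```

Under this mapping, blocks at the 61st and 63rd percentiles are indistinguishable on the display, while blocks at the 35th and 75th land at 4 and 8, which is the distinction the integer scale is meant to preserve.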

Calibration here is the question of whether a score of 7 actually corresponds to the safety profile a user would infer it does. This is the same calibration problem that, in machine learning, has produced a growing literature on classifier reliability. The calibration work on LLM confidence scores using NEISS injury data is a good example of how confidence outputs that look reasonable can be systematically miscalibrated — a 0.8 confidence score might correspond to an actual hit rate of 0.6. The same risk applies to any composite score, including safety scores.

We calibrate SpotScore™ against two reference distributions. The first is the empirical distribution of incident rates across our covered geography, ranked so that higher scores correspond to lower incident rates: a score of 5 should land near the median, a score of 9 near the 90th percentile of that ranking, and a score of 1 in the bottom 10%. The second is a sanity check against the survey-based perception data we have access to: users in areas scored 1–3 should report materially different perceived safety than users in areas scored 8–10. Where those two cross-checks disagree, we investigate before we ship the change.
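The second cross-check can be sketched as a gap between mean perceived safety in the 8–10 band and the 1–3 band. The survey scale (1 = very unsafe, 5 = very safe), the `MIN_GAP` threshold for "materially different", and the sample responses are all hypothetical.

```python
from statistics import mean


def perception_gap(responses: list[tuple[int, float]]) -> float:
    """Mean perceived safety in areas scored 8-10 minus areas scored 1-3.

    Each response pairs a display score with a survey rating on a
    hypothetical 1 (very unsafe) to 5 (very safe) scale.
    """
    low = [rating for score, rating in responses if 1 <= score <= 3]
    high = [rating for score, rating in responses if 8 <= score <= 10]
    if not low or not high:
        raise ValueError("need responses from both score bands")
    return mean(high) - mean(low)


MIN_GAP = 1.0  # hypothetical threshold for a "material" difference

survey = [(2, 2.0), (1, 1.5), (3, 2.5), (5, 3.0), (8, 4.0), (9, 4.5), (10, 4.0)]
gap = perception_gap(survey)
if gap < MIN_GAP:
    print(f"investigate before shipping: gap is only {gap:.2f}")
else:
    print(f"cross-check passes: gap = {gap:.2f}")
```

A small or negative gap would not prove the score wrong, but it is the kind of disagreement that should block a methodology change until someone understands it.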

What SpotScore™ does not claim

The most important sentence in any methodology document is the list of things the metric is not. For SpotScore™:

It is not a prediction about you specifically. A block-level rate tells you nothing about your personal exposure, which depends on circumstances, hours, behavior, and chance. The score is a property of the block, not of the resident.
It is not a substitute for due diligence. For real estate transactions, insurance underwriting, or executive protection use cases, the score is the starting point of an investigation, not the conclusion of one.
It is not a causal model. Two blocks with the same score can have entirely different underlying dynamics — gang-related violence concentrated on weekend nights versus daytime commercial property crime — and the implications for a resident or developer differ. We expose the category breakdown alongside the score for exactly this reason.
It is not a predictive policing tool. The score is descriptive of past incidents and is not designed to guide enforcement allocation. We have written separately on the distinction between predictive and descriptive crime data and why developers should be careful not to drift across that line.

How developers should display the score

A score is only as honest as the interface that presents it. A few practical guidelines for product teams integrating SpotScore™:

Show the category breakdown. A composite is useful for ranking; the breakdown is what lets a user make sense of it. The same 6 score driven mostly by larceny tells a different story than a 6 driven by aggravated assault.
Show the comparison cohort. Suppose a block in San Francisco and a block in suburban Indiana both score 7 against the national distribution. A user choosing between them should be told that both scores are national comparisons, and should also see each block's metro percentile, which can diverge substantially.
Show the freshness date. An incident feed is only as current as the most recent ingestion from the source agency. A few agencies update daily; others lag by a week or more. The display should show the as-of date, not the date the page was rendered.
Do not use the score as a binary gate. A score below 4 is not a fail; a score above 8 is not a pass. Treating the score as a threshold creates exactly the fair-housing exposure that a descriptive score is meant to avoid, especially in real estate workflows, where HUD's guidance has only recently been clarified.
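The display guidelines above can be sketched as a small view model. The `ScoreDisplay` fields and the `render` function are hypothetical, not the SpotCrime API's response schema; the point is that the category breakdown, both cohort percentiles, and the source as-of date travel with the score, and that nothing in the output turns the score into a pass/fail label.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScoreDisplay:
    score: int                            # 1-10 composite
    metro_percentile: float               # percentile within the surrounding metro area
    national_percentile: float            # percentile within the national peer cohort
    category_breakdown: dict[str, float]  # weighted contribution by category
    data_as_of: date                      # latest ingestion from the source agency, not the render date


def render(d: ScoreDisplay) -> str:
    """Format the score with its context; deliberately emits no pass/fail label."""
    lines = [
        f"SpotScore {d.score}/10",
        f"Metro percentile: {d.metro_percentile:.0f} | National percentile: {d.national_percentile:.0f}",
        "Category breakdown (weighted):",
    ]
    # Largest contributors first, so users see what drives the composite.
    for category, value in sorted(d.category_breakdown.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"  {category}: {value:.1f}")
    lines.append(f"Data as of {d.data_as_of.isoformat()}")
    return "\n".join(lines)


example = ScoreDisplay(
    score=6,
    metro_percentile=48.0,
    national_percentile=57.0,
    category_breakdown={"aggravated_assault": 5.0, "larceny": 14.0},
    data_as_of=date(2025, 1, 15),
)
print(render(example))
```

Tying the as-of date to the source ingestion rather than to page rendering is the detail most likely to be lost in a UI refactor, so it is worth carrying as an explicit field.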

What we will publish next

We are working on a methodology appendix that will publish the exact normalization denominators, the time-decay function, the category weight matrix, and the calibration cross-checks. The current post is the prose version of that document. The intent is that any analyst, using tools like those described in practitioner resources such as the Python data science guide for crime analysts at Crime De-Coder, should be able to reproduce SpotScore™ for a sample of cities from the source feeds and the published parameters.

That is the standard. A safety score that cannot be reproduced from its inputs by a competent third party is not a score; it is a brand. Crime data deserves better.