Almost every crime data API on the market is an aggregator of aggregators. One company scrapes open data portals; another resells the first's feed under a different brand; a third bundles the second into a real estate product. By the time the data reaches a developer it has passed through two or three intermediaries, with each layer adding latency, dropping fields, and quietly inheriting the upstream gaps. SpotCrime is built on a different model. We contacted every police department whose data we publish. We operate under a direct arrangement with each. We are not owned by a real estate platform, an insurance carrier, or a public records reseller, and we do not source any of our incident data through a competing aggregator. This piece is about why the pipeline matters as much as the data — and what an independent, department-direct source actually changes in practice.
The aggregator-of-aggregators problem
The shortest way to start a crime data API in 2026 is to find an existing crime data API, sign up for the developer plan, and resell its feed. This is not hypothetical. A meaningful share of the products marketed as “real-time crime data APIs” are thin wrappers over one of three or four upstream sources, and a meaningful share of those upstream sources are themselves wrappers over open data portals plus a handful of scraped municipal sites.
The pattern has three predictable failure modes.
Coverage inheritance. If the upstream feed does not cover a city, the wrapper does not cover that city. There is no mechanism by which a reseller can fix the gap, because the reseller does not have a relationship with the missing department. The map looks comprehensive until a customer asks about a specific small or mid-sized city and the answer is “not in our coverage area,” with no path to add it.
Silent breakage. When a municipal site changes its schema, a scraping pipeline breaks. The reseller does not know it broke. The customer sees a feed that has gone quiet and assumes the city has had a quiet week. We have seen public crime maps continue to display data for jurisdictions whose underlying source had been offline for weeks.
Field attrition. Each layer of the stack tends to drop fields. The original record may have included a narrative description, an officer disposition, a clearance status, and a precise time-of-call. By the time it has passed through two intermediaries it may have only category, latitude, longitude, and date. Developers building on the downstream wrapper assume that is what the police department published. It is not.
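A developer can check for field attrition directly. Measure, field by field, how often the records actually received carry a value, and compare that against the fields the originating department publishes. The sketch below is a minimal illustration. It assumes records arrive as Python dictionaries. The `SOURCE_FIELDS` reference set, the 50% floor, and the sample records are hypothetical; in practice the reference set would come from the department's own published schema.

```python
# Hypothetical reference schema: fields the originating department publishes.
SOURCE_FIELDS = {
    "category", "latitude", "longitude", "date",
    "narrative", "disposition", "clearance_status", "time_of_call",
}


def field_coverage(records: list, expected: set) -> dict:
    """Share of records carrying a non-empty value for each expected field."""
    if not records:
        return {name: 0.0 for name in expected}
    counts = {name: 0 for name in expected}
    for record in records:
        for name in expected:
            if record.get(name) not in (None, ""):
                counts[name] += 1
    return {name: counts[name] / len(records) for name in expected}


def missing_fields(records: list, expected: set, floor: float = 0.5) -> list:
    """Fields present in fewer than `floor` of the records, sorted by name."""
    coverage = field_coverage(records, expected)
    return sorted(name for name, share in coverage.items() if share < floor)


# Made-up downstream records that kept only the four "map pin" fields.
downstream = [
    {"category": "theft", "latitude": 40.0, "longitude": -75.0, "date": "2026-03-01"},
    {"category": "assault", "latitude": 40.1, "longitude": -75.1, "date": "2026-03-02"},
]
print(missing_fields(downstream, SOURCE_FIELDS))
# ['clearance_status', 'disposition', 'narrative', 'time_of_call']
```

A check like this cannot restore the dropped fields. It only makes visible that the feed is thinner than what the department published.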
What “direct from the department” actually means
Crime data from a US law enforcement agency reaches the public through one of three channels, with very different downstream properties.
Open data portals. Cities like Chicago, Los Angeles, and Seattle publish incident data on Socrata or similar platforms. The data is genuinely public, but the publication cadence, field set, and address precision are controlled entirely by the city. When a city decides to redact, suppress, or delay, every downstream consumer is affected at once. The LAPD case we covered earlier is the clearest recent example.
Public records scraping. For departments without a portal, some aggregators scrape public-facing crime maps, daily blotters, or press releases. This is legal where the underlying records are public, but it is fragile (any layout change breaks the pipeline), it is incomplete (most departments do not publish their full incident log on a public page), and it provides no recourse when the data is wrong.
Direct data-sharing arrangements. The third channel is a formal arrangement with the department in which the department transmits its incident data to a specific recipient on an agreed cadence, under an agreed schema, with the department's approval. The recipient is in a position to ask about gaps, request additional fields, and notify the department when the upstream feed appears to have stalled. The relationship is bidirectional. This is the channel SpotCrime operates on.
To be precise about what that means: SpotCrime contacted every police department whose data appears in our feed. Each department reviewed the use case and approved data sharing in some form — in some cases by directing us to an existing public feed, in others by establishing a direct transmission, in others by clarifying which fields they were and were not comfortable publishing. The approval is not a marketing claim. It is a precondition for being in our coverage area at all.
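The approval model described above can be expressed as a registry that the ingestion step consults before anything is published. This is a minimal sketch under stated assumptions, not SpotCrime's implementation. The `Arrangement` record, the `Channel` values, the field names, and "Example PD" are all hypothetical. The sketch illustrates two rules. A record from a department with no arrangement on file is rejected rather than published. Fields the department did not approve are dropped at ingestion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Channel(Enum):
    """How an approved department's data reaches the pipeline (hypothetical values)."""
    EXISTING_PUBLIC_FEED = "existing_public_feed"
    DIRECT_TRANSMISSION = "direct_transmission"


@dataclass(frozen=True)
class Arrangement:
    department: str
    channel: Channel
    cadence_hours: float          # agreed transmission cadence
    approved_fields: frozenset    # fields the department agreed to publish


def admit(record: dict, registry: Dict[str, Arrangement]) -> Optional[dict]:
    """Keep only approved fields; reject records from departments with no arrangement."""
    arrangement = registry.get(record.get("department"))
    if arrangement is None:
        return None  # no approval on file: outside the coverage area
    return {k: v for k, v in record.items() if k in arrangement.approved_fields}


registry = {
    "Example PD": Arrangement(
        department="Example PD",
        channel=Channel.DIRECT_TRANSMISSION,
        cadence_hours=1.0,
        approved_fields=frozenset({"department", "category", "date", "block_address"}),
    ),
}

# The narrative field was not approved, so it is dropped before publication.
print(admit({"department": "Example PD", "category": "theft",
             "date": "2026-03-01", "narrative": "free text"}, registry))
```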
Why direct contact changes coverage
The most important consequence of department-direct sourcing is that coverage is bounded by what we can negotiate, not by what the open data portal ecosystem happens to include.
The portal ecosystem skews to large cities. Large city departments, especially in jurisdictions with active public records traditions, were the early adopters of Socrata and similar platforms. Small and mid-sized departments — which together account for the majority of US law enforcement agencies and a substantial share of the population — very often do not run portals at all. There are roughly 18,000 state, county, and municipal law enforcement agencies in the United States. The number with a public, queryable incident data portal is in the low hundreds.
A pure-portal aggregator's coverage map is a coverage map of municipal IT budgets, not a coverage map of where crime happens. Direct department outreach is the only mechanism by which a crime data API can extend into jurisdictions that publish nothing on the open web.
Why direct contact changes freshness
Open data portals update on the schedule the city sets. For most large municipal portals, that is a daily or weekly batch, often with a multi-day lag from event to publication. Scraped sources update on whatever schedule the underlying public-facing page updates, which is generally similar. Neither model produces a feed that is meaningfully “real-time” in the sense a developer reading marketing copy would assume.
A direct arrangement allows for a faster transmission cadence where the department is willing to provide one — in some cases hourly, in some cases close to real-time on a push basis, in some cases on the same daily cadence as the portal but without the intermediary publication step. The honest framing is that direct sourcing does not automatically make data real-time; it makes the cadence negotiable rather than externally fixed.
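Because cadence is a claim, a developer can measure it instead of trusting the label. The sketch below assumes each record carries two hypothetical timestamps, `occurred_at` and `received_at`. It compares the median event-to-receipt lag with a declared cadence in hours.

```python
from datetime import datetime, timedelta
from statistics import median


def publication_lags(records: list) -> list:
    """Hours between when each incident occurred and when the record was received."""
    return [(r["received_at"] - r["occurred_at"]) / timedelta(hours=1) for r in records]


def within_declared_lag(records: list, declared_hours: float) -> bool:
    """True if the median observed lag is no worse than the declared cadence."""
    lags = publication_lags(records)
    return bool(lags) and median(lags) <= declared_hours


# Illustrative records: lags of 1, 3, and 48 hours (the last reported late).
sample = [
    {"occurred_at": datetime(2026, 3, 1, 8), "received_at": datetime(2026, 3, 1, 9)},
    {"occurred_at": datetime(2026, 3, 1, 10), "received_at": datetime(2026, 3, 1, 13)},
    {"occurred_at": datetime(2026, 3, 1, 12), "received_at": datetime(2026, 3, 3, 12)},
]
print(within_declared_lag(sample, declared_hours=24.0))  # True
print(within_declared_lag(sample, declared_hours=2.0))   # False
```

The median is a deliberate choice over the mean. Incidents reported to police days after they happen produce long-tail lags that are not a pipeline fault, and those lags would otherwise dominate an average.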
Why direct contact changes normalization
Crime incident taxonomy is not standardized across departments. The Los Angeles Police Department uses several hundred incident codes; smaller departments use a few dozen; some use NIBRS categories directly and some use legacy local codes. Mapping these onto a single normalized taxonomy is the most expensive and error-prone part of running a national crime data feed, and it is the part of the pipeline where aggregator-of-aggregator products most visibly fail.
Direct contact with each department means we can ask. When a Tulsa code translates to two different categories under our taxonomy depending on context, we can ask the department's records manager which interpretation matches their internal usage rather than guessing. When a small department asks us to suppress a specific category — juvenile incidents, certain domestic categories, sex offenses that have not yet been judicially confirmed — we can comply at the source rather than discovering the issue after a complaint. Our taxonomy mapping is reviewed quarterly with input from departments that have flagged misclassifications.
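A mapping table with room for context and for department-requested suppression might look like the following. Everything here is hypothetical: "Example PD", the local codes `T01`, `B12`, and `J07`, and the premises rule are illustrative stand-ins, not real department codes or SpotCrime's actual taxonomy. The sketch shows three behaviors:

- A code whose meaning depends on context is resolved by a rule confirmed with the records manager.
- A suppressed code is dropped before publication.
- An unknown code is routed to review instead of being guessed.

```python
from typing import Callable, Dict, Optional, Set, Tuple, Union

Resolver = Callable[[dict], str]


def resolve_b12(record: dict) -> str:
    # Hypothetical rule, as confirmed with the records manager:
    # the same local code means burglary at a residence, theft elsewhere.
    return "burglary" if record.get("premises") == "residence" else "theft"


# (department, local code) -> normalized category, or a context resolver.
MAPPING: Dict[Tuple[str, str], Union[str, Resolver]] = {
    ("Example PD", "T01"): "theft",
    ("Example PD", "B12"): resolve_b12,
}

# Local codes a department asked to keep out of the feed entirely.
SUPPRESSED_CODES: Dict[str, Set[str]] = {"Example PD": {"J07"}}


def normalize(record: dict) -> Optional[dict]:
    """Return the record with a normalized category, or None if suppressed."""
    dept, code = record["department"], record["code"]
    if code in SUPPRESSED_CODES.get(dept, set()):
        return None  # dropped at the source, never published
    rule = MAPPING.get((dept, code))
    if rule is None:
        # Unknown code: flag for human review rather than guessing a category.
        return {**record, "category": None, "needs_review": True}
    category = rule(record) if callable(rule) else rule
    return {**record, "category": category, "needs_review": False}
```

Keying the table on (department, code) rather than on the code alone reflects the point above: the same code string can mean different things in different departments.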
Why direct contact changes the legal posture
Public crime data sits at the intersection of three legal regimes: state public records laws (which determine what is public), the federal Criminal Justice Information Services policy (which determines what cannot be commingled with restricted criminal history information), and a patchwork of subject-matter privacy statutes covering juveniles, domestic violence, sexual assault, and other categories. We covered the developer-facing implications of those regimes in our piece on what crime APIs suppress and why.
An aggregator-of-aggregator product inherits the legal posture of its upstream sources whether it wants to or not. If an upstream scraper inadvertently pulled juvenile incidents that should have been suppressed under state law, every downstream API publishes them. The reseller has no path to audit the upstream pipeline because they do not own it. A direct relationship with each department, by contrast, includes an explicit conversation about what is and is not in scope. The department's records manager knows what is supposed to be redacted under state law and what is not; the conversation forces those choices to be explicit rather than accidental.
Independence: who owns the API matters
A crime data product's incentive structure is shaped by who owns it. A crime data feed owned by a real estate listing platform has a structural interest in how crime appears next to a specific set of listings. A feed owned by an insurance underwriter has an interest in how it shapes pricing decisions. A feed owned by a real-time location platform has an interest in how it surfaces in a parent product's app. None of those are inherently malign; they are simply alignments that shape the product.
SpotCrime is not owned by a real estate platform, an insurance carrier, a public records reseller, or a consumer location app. The customer base spans real estate, insurance, family safety, executive protection, and journalism, and no single customer category accounts for a controlling share. The result is that taxonomy decisions, geographic coverage decisions, and suppression decisions are made on the basis of what the underlying data and the participating departments support — not on the basis of what any single downstream platform would prefer.
This is also why we are willing to publish posts that argue against the use of our category of product where the data does not support a use case — for example, our piece on the limits of descriptive crime data for predictive applications, or our analysis of the CrimeRadar false alert. A product whose incentive structure aligned with a single downstream buyer would have a harder time publishing either.
The trade-offs of the direct model
None of this is free, and the honest framing requires acknowledging the trade-offs. A department-direct, independent pipeline is slower to expand into new jurisdictions than a scraping pipeline; each new department requires an outreach conversation, a review of the proposed use, and in some cases a written arrangement. It is also more expensive to operate at the margins, because the per-department engineering cost is non-zero. There are coverage gaps in our map where we have not yet completed outreach, and we publish the coverage explicitly rather than implying universal coverage we do not have.
The alternative, claiming universal coverage and silently scraping anything we can find, would widen the marketing surface while weakening the legal posture. We have not found a way to justify that trade for a product whose primary buyers (developers, underwriters, security analysts) need to be able to defend their data source under audit.
What developers should ask any crime data API
Independent of which API a developer chooses, the questions that distinguish a direct source from an aggregator-of-aggregators are not hard to ask. We covered the full framework in our piece on how to evaluate a crime data API. The four highest-signal questions:
1. Where does the data for jurisdiction X come from? A direct source can name the department contact, the transmission method, and the cadence. An aggregator-of-aggregators will name an upstream vendor or wave at “public records.”
2. What happens when a jurisdiction's feed stops updating? A direct source notices because the relationship is bidirectional. A scraping pipeline often does not, because the silence looks the same as a quiet week.
3. How is the category taxonomy maintained? A direct source can describe an active mapping process and a review cadence. An aggregator-of-aggregators will typically describe a taxonomy frozen at the time the upstream feed was integrated, with no mechanism for updating it.
4. Who owns the company and who are its largest customers? Ownership and customer concentration shape every other decision the product makes, and an API operator who cannot answer the question cleanly is not a counterparty to build production infrastructure on.
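Question 2 is the silent-breakage problem, and a consumer can partially guard against it without any relationship to the department. The sketch below asks how likely a run of empty days would be if the jurisdiction were merely quiet. It assumes incident counts are roughly Poisson with a known historical daily rate. That is a simplification, since real counts vary by weekday and season. The 0.001 threshold is an arbitrary illustrative choice.

Two cases show the difference:

- A jurisdiction averaging 12 incidents a day that goes silent for two days is almost certainly a stalled feed.
- A small town averaging 0.3 a day that goes silent for five days is not unusual (about a 22% chance).

```python
import math
from datetime import date


def quiet_probability(daily_rate: float, days_silent: int) -> float:
    """Probability of zero incidents over `days_silent` days under a Poisson model."""
    return math.exp(-daily_rate * days_silent)


def looks_stalled(daily_rate: float, last_record: date, today: date,
                  threshold: float = 0.001) -> bool:
    """Flag a feed whose silence would be too unlikely for a genuinely quiet stretch."""
    days_silent = (today - last_record).days
    if days_silent <= 0:
        return False
    return quiet_probability(daily_rate, days_silent) < threshold


print(looks_stalled(12.0, date(2026, 3, 1), date(2026, 3, 3)))  # True
print(looks_stalled(0.3, date(2026, 3, 1), date(2026, 3, 6)))   # False
```

The check only detects the gap. Closing it still requires someone who can contact the department.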
The case for boring infrastructure
Crime data is a quiet, unglamorous category of infrastructure data. It is most useful when it is correct, complete within its declared coverage, fresh on its declared cadence, and unambiguous in what it represents. The features that make a crime data API useful are not the dashboard, the marketing copy, or the integrations — they are the underlying decisions about where the data comes from, who approved it, and who owns the operator. SpotCrime is built on the assumption that those underlying decisions are the product. The rest is packaging.
Access Address-Level Crime Data
Direct from the departments. Independent of every platform. 22,000+ US cities, normalized and verified — because the pipeline matters as much as the data.