Crime data is, in most US jurisdictions, public record. That does not mean every field in a police report is publishable. The gap between “public record” and “responsibly publishable” is where crime data APIs do most of their work — and the rules vary more than most developers realize.
The legal framework, in three layers
There are three overlapping legal regimes that constrain what a crime data API can publish, and they are commonly conflated in product documentation:
- Federal CJIS Security Policy (FBI). Governs how Criminal Justice Information (CJI) is stored, transmitted, and accessed by agencies that connect to FBI systems. CJIS controls upstream access to NCIC, NICS, III, and similar federal systems. It does not directly govern downstream republication of incident data that an agency has chosen to release.
- State public records laws. This is where most of the operational decisions are made. Every state has a public records statute, and almost every state exempts at least some categories of police records from disclosure. The same incident type may be public in one state and non-disclosable in another.
- Subject-matter privacy statutes. Domestic violence reports, juvenile records, sexual assault cases, and certain medical-related calls are protected by statutes that override the general public records framework. California Penal Code 293, for example, protects sexual assault victim identities; most states have similar provisions for juveniles.
A crime data API has to model all three layers simultaneously. The API does not have the option to publish data that any one of the three would prohibit, and in practice the binding constraint is usually the strictest of the three for a given record type.
What “address-level” actually means
A common point of confusion: “address-level” crime data does not mean the API publishes the victim's address. It means the API publishes the incident location at the precision the agency chose to release, which is almost always one of:
- Hundred-block precision (e.g., “1200 block of Main Street”) — the most common standard for residential incidents in modern computer-aided dispatch (CAD) feeds.
- Intersection-level (e.g., “Main St & 12th Ave”) — common for street incidents, traffic stops, and disturbances on public ways.
- Exact address — typically only used for businesses, public buildings, and incidents on commercial property where the location is itself public.
- Suppressed or generalized — for sensitive categories, the location may be reduced to a neighborhood or police beat, or omitted entirely.
The hundred-block convention is not arbitrary. It is a long-standing journalism and CAD convention designed to identify the area without identifying the dwelling. Knowing only the hundred block, a reader cannot tell which of the dozen or more addresses on that block was involved; that is informative for risk analysis, but generally inadequate to single out a victim.
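The precision tiers above can be modeled explicitly in an ingestion pipeline. A minimal Python sketch — the enum names and the helper are illustrative, not taken from any particular API:

```python
from enum import Enum

class LocationPrecision(Enum):
    """Precision tiers an agency may release (names illustrative)."""
    EXACT = "exact"                    # businesses, public buildings
    HUNDRED_BLOCK = "hundred_block"    # residential default
    INTERSECTION = "intersection"      # street incidents
    GENERALIZED = "generalized"        # neighborhood or police beat
    SUPPRESSED = "suppressed"          # sensitive categories

def to_hundred_block(street_number: int, street_name: str) -> str:
    """Reduce a full street address to the hundred-block convention,
    e.g. (1247, "Main Street") -> "1200 block of Main Street"."""
    return f"{(street_number // 100) * 100} block of {street_name}"
```

Tagging every record with its precision tier, rather than storing bare strings, lets downstream display logic branch on precision instead of guessing from the address format.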
What a responsibly designed API suppresses
Across published US municipal open-data portals (Chicago, Seattle, Los Angeles, Denver, Detroit, San Francisco, and others), the suppression patterns converge on a small set of rules:
- Domestic violence is published as “Disturbance” or “Family Disturbance” with no relationship detail.
- Sexual assault is either suppressed entirely, delayed, or generalized to a larger geographic unit.
- Juvenile suspect or victim names and addresses are stripped.
- Officer-involved incidents are sometimes published with delays pending investigation.
- Mental health calls are generalized, often as “Welfare Check” or “Disturbance.”
- Witness names and phone numbers are not included in public extracts.
A crime data API that aggregates these feeds inherits the upstream suppression — it cannot publish what was not released. But it can compound the suppression in ways the original agency did not, and a well-run API does. The most common forms of secondary suppression are:
- Rounding the incident timestamp to the nearest hour for residential incidents.
- Snapping geocoded points to the centroid of the hundred-block segment rather than passing through raw lat/long values.
- Holding back incident publication for 24–72 hours on certain categories to avoid revealing in-progress investigations.
- Withholding reporting party identifiers even when the source agency includes them.
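The secondary-suppression steps above are cheap to implement. A sketch in Python — the category names, embargo window, and grid size are assumptions for illustration, not a published standard:

```python
import math
from datetime import datetime, timedelta

# Illustrative values -- real rules come from the source agency's policy.
EMBARGOED = {"sexual_assault", "officer_involved"}
EMBARGO = timedelta(hours=48)  # within the 24-72 hour range above

def round_to_hour(ts: datetime) -> datetime:
    """Round a residential incident timestamp down to the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def snap_to_centroid(lat: float, lon: float, cell: float = 0.001) -> tuple:
    """Snap raw coordinates to the center of a coarse grid cell --
    a stand-in for snapping to the true hundred-block segment centroid."""
    def snap(v: float) -> float:
        return math.floor(v / cell) * cell + cell / 2
    return snap(lat), snap(lon)

def is_publishable(category: str, reported_at: datetime, now: datetime) -> bool:
    """Hold back embargoed categories until the delay window has passed."""
    return category not in EMBARGOED or now - reported_at >= EMBARGO
```

The point of the centroid snap is that no published coordinate ever equals a raw geocode, so a consumer cannot recover the original point even by inspecting many records.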
The re-identification problem
Even with hundred-block addresses and generalized categories, a sufficiently determined attacker can sometimes re-identify an incident. The classic vector is cross-referencing the API record against:
- Local news coverage of the same incident, often with neighborhood-level location.
- Social media posts (“there was a robbery on my block last night”).
- Property records (“the only commercial building in that block is...”).
- Court records — charged defendants and public dockets.
This is a version of the re-identification problem that has surfaced in every “anonymized” public dataset since the Netflix Prize. An API cannot eliminate the risk through its own suppression alone; it can only raise the cost of re-identification high enough that casual attackers are deterred.
For developers building on crime APIs, this has a specific implication: do not display incident records alongside other location-correlated data — real estate listings, owner names, demographic overlays — in ways that lower the re-identification cost. A safety score that aggregates incidents to a hex grid is meaningfully different, in privacy terms, from a map pin sitting on top of a single house photograph.
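One way to stay on the safe side of that line is to aggregate before display. A minimal sketch using a plain lat/long grid — production systems often use a hex index such as Uber's H3 instead, and the cell size here is an assumption:

```python
from collections import Counter

def grid_cell(lat: float, lon: float, size: float = 0.01) -> tuple:
    """Assign a point to a coarse grid cell (~1 km at mid-latitudes)."""
    return round(lat / size), round(lon / size)

def aggregate_for_display(incidents: list, size: float = 0.01) -> Counter:
    """Count incidents per cell; the display layer renders counts,
    never the underlying points."""
    return Counter(grid_cell(i["lat"], i["lon"], size) for i in incidents)
```

A map built from these counts carries the risk signal without giving a viewer any single point to cross-reference against a house photograph or an owner record.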
CJIS: what it actually requires
Because “CJIS-compliant” gets used loosely in marketing copy, it is worth being specific. The CJIS Security Policy (current published version 5.9.5, last major revision December 2023) governs entities that handle CJI — fingerprint records, criminal history records, NCIC hot-file data, and similar federal-source information. Its core requirements include:
- Multi-factor authentication for any user accessing CJI.
- Encryption in transit (TLS 1.2 or higher) and at rest, using FIPS 140-2 validated modules.
- Access controls with documented least-privilege provisioning.
- Audit logging of all CJI access events.
- Personnel screening, including fingerprint-based background checks, for anyone with access.
- Physical security controls for facilities housing CJI.
For a public-facing crime data API that publishes incident summaries from already-released agency feeds, CJIS does not directly apply — there is no CJI flowing through the system. Where CJIS does apply is when an API ingests data from an agency's records management system under a data-sharing agreement that grants pre-release access, or when the API serves customers who are themselves CJIS-regulated entities (a police department, a court, a federally regulated firearms dealer). In those cases, the entire pipeline has to meet CJIS requirements end-to-end.
The practical answer for most developers: if you are consuming public incident data through a third-party API to display on a consumer-facing product (a real estate site, a family safety app), CJIS is generally not your obligation. If you are building a tool used by law enforcement agencies to access criminal history information, it is.
Where the rules diverge: state-by-state
A small selection of state divergences that affect what a national crime data API can publish:
| State | Treatment of basic incident data | Notable carve-outs |
|---|---|---|
| California | Public via individual agency portals; no statewide CAD feed | Penal Code 293 protects sexual assault victims; SB 1421 added narrow officer misconduct disclosures |
| Texas | Generally open via Public Information Act | Active investigation exemption interpreted broadly by some agencies |
| New York | FOIL governs; many agencies require formal request | Civil Rights Law 50-a was repealed in 2020, opening prior misconduct records |
| Illinois | Open under state FOIA; Chicago publishes a comprehensive open data feed | Juvenile records sealed under 705 ILCS 405 |
| Florida | “Sunshine” state — strong default toward openness | Marsy's Law (2018) restricts victim identifying information |
Florida's Marsy's Law is worth flagging because it has produced unusual outcomes. The amendment, which establishes victim rights including privacy, has been interpreted in some Florida jurisdictions to redact officer names when the officer is the victim of a crime in the line of duty. That interpretation is contested, but it materially affects what shows up in data feeds from those jurisdictions, and it is the kind of jurisdictional variance an API has to model rather than paper over.
Aggregate vs. incident-level: a recurring debate
There is a recurring debate in the crime data space about whether platforms should publish individual incident records at all, or only aggregate counts. The case for incident-level records is that aggregation is reductive — a single shooting on a block tells you something different than a property crime, and combining them into a generic “crime score” loses critical signal. The case for aggregate-only is that incident records, even with hundred-block precision, carry residual re-identification risk that aggregates do not.
The current consensus in the field, reflected in the practices of the major municipal portals, is that incident-level records are appropriate when the suppression rules described above are followed, and that aggregates are appropriate when the consumer's use case does not require record-level detail. Most production APIs offer both, with aggregates as the default exposure and incident records gated behind authentication and use-case review.
A checklist for developers
A short list for anyone building on a crime data API:
- Confirm what the API suppresses upstream. Ask for documentation of the categories that are filtered, generalized, or delayed. If the API cannot answer this, that itself is a signal.
- Apply secondary suppression at your display layer. Even if the API returns a precise lat/long, snap it to a block centroid before placing it on a public map. Even if the API returns a precise timestamp, round it to a one-hour window for residential incidents.
- Do not display incidents alongside identifying overlays. A pin showing a “domestic disturbance” sitting on top of a single home photograph or a named owner record is a re-identification waiting to happen, even if the API itself behaved correctly.
- Honor takedown requests. Establish a documented process for receiving and acting on requests to remove specific incidents — usually from victims, their attorneys, or victim advocate organizations. This is not a CJIS or statutory requirement in most jurisdictions, but it is a baseline professional expectation.
- Audit your aggregation choices. If you are displaying a neighborhood safety score, document how the underlying incidents are weighted, what the geographic unit is, and what time window is used. Opaque scoring is a liability — reputationally and, in some jurisdictions, legally.
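The last checklist item — documenting how a score is built — starts with an explicit configuration rather than constants buried in code. A hypothetical example of what that can look like; every value below is illustrative:

```python
# All values illustrative -- the point is that weights, geography, and
# time window are explicit, versioned, and reviewable.
SAFETY_SCORE_CONFIG = {
    "version": "2024-06",
    "geographic_unit": "hex_grid_res_8",   # assumed hex grid, ~0.7 km^2 cells
    "time_window_months": 36,
    "category_weights": {
        "violent": 5.0,
        "property": 2.0,
        "quality_of_life": 0.5,
    },
    "recency_half_life_days": 180,         # older incidents count for less
}
```

A configuration like this can be published alongside the score itself, which converts "opaque scoring" into something a transparency advocate or a regulator can actually review.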
Where this goes next
Two trends are worth watching. First, AI-driven re-identification is getting cheaper. Tools that cross-reference incident feeds with social media posts, news articles, and public records can re-identify victims at a cost that was prohibitive five years ago and may be trivial in five more. The suppression rules that deter casual attackers today will work less reliably against automated ones, and the half-life of any given anonymization scheme is shrinking. Adjacent work on AI calibration is relevant here: a recent analysis of LLM classifier confidence on NEISS injury data found that token-probability confidence scores are systematically overconfident relative to actual accuracy. The same pattern likely holds for AI re-identification tools, which means harm will come not only from correct automated identifications but also from confident, incorrect ones.
The Mount Vernon, Missouri incident on April 13, 2026, in which the CrimeRadar AI alert system misinterpreted a routine police radio transmission as an active shooter and triggered a false elementary school lockdown, illustrates a related point about AI in this space. As the post-incident analysis put it: “Systems like this are built to be fast. Safety requires being right. Those two things are not the same.” The same dynamic applies to re-identification: an AI that is fast at guessing victim identity from cross-referenced data does not have to be right to cause harm.
Second, the policy environment is moving in opposite directions in different jurisdictions. Several states have tightened victim privacy protections in the wake of Marsy's Law (Florida, Wisconsin, Ohio). Others have opened officer disciplinary records (California, New York, Illinois). The net effect for a national crime data API is that the same incident type may be published with different fields in different jurisdictions — and the API has to model that variance rather than impose a single national standard.
For developers and crime analysts who want to go deeper on the practical mechanics of working with this data, Andrew Wheeler's walkthrough of crime data science workflows is a useful reference for how the suppression rules above interact with day-to-day pandas/SQL pipelines.
Two sets of good reasons
Crime data is public for good reasons: accountability, situational awareness, research, and the ability of communities to understand what is happening around them. The reasons for suppression are also good: victim safety, investigation integrity, due process for the accused. The work of a responsibly designed crime data API is to honor both. A useful test for any specific publication decision is whether it would survive scrutiny from both a transparency advocate and a victim advocate — not because they will always agree, but because publication decisions that fail one of those tests usually deserve a second look.