How to access Pittsburgh arrest data using Python - Beginner-friendly

Arrest data can help us understand how police are focusing their attention and resources. This is a short explainer on how to access arrest data for the city of Pittsburgh.
Find the data you want to access
Look at the download link for the dataset you want. If the link is
https://data.wprdc.org/datastore/dump/e03a89dd-134a-4ee8-a2bd-62c40aeebc6f
then your resource ID is
e03a89dd-134a-4ee8-a2bd-62c40aeebc6f
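If you'd rather not copy the ID by hand, here is a minimal sketch that pulls it out of a download link. It assumes the resource ID is always the last segment of the link's path, as it is in the example above; the helper name resource_id_from_dump_url is made up for this illustration.

```python
from urllib.parse import urlparse


def resource_id_from_dump_url(dump_url: str) -> str:
    """Return the last path segment of a dump URL (assumed to be the resource ID)."""
    path = urlparse(dump_url).path
    # Drop any trailing slash, then keep only the final segment of the path.
    return path.rstrip("/").split("/")[-1]


dump_url = "https://data.wprdc.org/datastore/dump/e03a89dd-134a-4ee8-a2bd-62c40aeebc6f"
resource_id = resource_id_from_dump_url(dump_url)
print(resource_id)
```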
Build your request URL
WPRDC (the Western Pennsylvania Regional Data Center) uses DataStore to house its data. You can ask WPRDC to send you data by requesting it through its API, which means writing a URL in a format the database understands. To make a request, we’ll use the datastore_search call. Here’s the base of the URL:
Now we need to tell the database which file we’re looking for. Add your resource id:
Since we’re just getting started, let’s add a limit on the amount of data WPRDC will send us at once. That way we don’t accidentally request huge data files, wasting resources for both us and WPRDC. I’ll start with a limit of 5 rows of data. Once we know our program is working, we can come back and remove the limit.
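The three steps above (base URL, resource ID, limit) can also be assembled in Python. This is a sketch, not the only way to do it. It assumes WPRDC serves datastore_search at the standard CKAN action path shown in BASE_URL, so double-check that address against the WPRDC documentation; the function name build_request_url is made up for this example.

```python
from urllib.parse import urlencode

# Assumption: the standard CKAN action API path on the WPRDC site.
BASE_URL = "https://data.wprdc.org/api/3/action/datastore_search"
RESOURCE_ID = "e03a89dd-134a-4ee8-a2bd-62c40aeebc6f"


def build_request_url(resource_id: str, limit: int = 5) -> str:
    """Combine the base URL with the resource_id and limit query parameters."""
    params = {"resource_id": resource_id, "limit": limit}
    # urlencode joins the parameters with "&" and escapes unsafe characters.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_request_url(RESOURCE_ID, limit=5)
print(url)
```

Building the query string with urlencode instead of gluing strings together means you can change the limit later by editing one number.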
There are lots of other helpful tricks you can use when building your URL. For example, you can add filters to reduce the amount of irrelevant data you receive. The datastore_search documentation explains more. For now, just hold on to that URL.
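As a rough illustration of a filter, datastore_search accepts a filters parameter holding a JSON object that maps column names to the values you want. In the sketch below the base URL is assumed to be the standard CKAN action path, and the column name INCIDENTNEIGHBORHOOD and the value "Bloomfield" are assumptions for illustration only; check the dataset's own column names before using them.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard CKAN action API path on the WPRDC site.
BASE_URL = "https://data.wprdc.org/api/3/action/datastore_search"

params = {
    "resource_id": "e03a89dd-134a-4ee8-a2bd-62c40aeebc6f",
    "limit": 5,
    # Hypothetical column and value: replace with real ones from the dataset.
    "filters": json.dumps({"INCIDENTNEIGHBORHOOD": "Bloomfield"}),
}

# urlencode escapes the braces, quotes, and spaces in the JSON for us.
filtered_url = f"{BASE_URL}?{urlencode(params)}"
print(filtered_url)
```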
Install Python and a few libraries
pip install requests
pip install pandas
pip install notebook
jupyter notebook
Request your data using Python
Open a new Python file or notebook, and import your libraries:
import requests
import pandas as pd
Paste in the request URL we built earlier:
Ask (request) WPRDC to send you that data, then parse the JSON response into a Python dictionary you can work with.
resp = requests.get(url).json()
The response contains a bunch of extra metadata that we don’t need right now. So let’s grab the meaty part of the response (“result”) and pull out the actual data (“records”). We’ll use pandas to turn that into a nice spreadsheet table (a “DataFrame”).
data = pd.DataFrame(resp['result']['records'])
data (if you’re using a notebook)
print(data.to_string()) (if you’re not using a notebook)
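To make the shape of the response easier to picture, here is a small sketch that works on a hand-written stand-in for resp rather than a real download. The sample values are placeholders, not actual arrest records. The check on the "success" key assumes the usual CKAN response envelope, and the helper name extract_records is made up for this example.

```python
def extract_records(payload: dict) -> list:
    """Return the list of row dictionaries from a datastore_search response."""
    # Assumption: the API sets "success" to False when a request fails.
    if not payload.get("success", False):
        raise ValueError(f"API reported an error: {payload.get('error')}")
    return payload["result"]["records"]


# Placeholder response with the same nesting as the real one (values are fake).
sample_payload = {
    "help": "...",
    "success": True,
    "result": {
        "records": [
            {"_id": 1, "EXAMPLE_COLUMN": "placeholder"},
            {"_id": 2, "EXAMPLE_COLUMN": "placeholder"},
        ],
    },
}

records = extract_records(sample_payload)
print(len(records))
```

With a real response you could pass the result straight to pandas, for example pd.DataFrame(extract_records(resp)).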
Now you have arrest data that you can start to investigate.
A note on API politeness
Every time you run requests.get(url), you’re connecting to the web, asking WPRDC to send you data, and downloading the response. When you’re using an API, try to be conscientious about the frequency and size of the requests you’re making. Many databases enforce a limit on the number of requests you can make in a day, and some will even ban your IP address if they think you’re trying to abuse their servers with tons of spammy requests. I didn’t find any documentation about the API limits for WPRDC, but it’s still best practice to design your code so that it only requests new data when you actually need it.
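One simple way to avoid repeat requests is to save the response to a file the first time and read from that file afterwards. The sketch below is one possible approach, not a WPRDC recommendation. The function name load_or_fetch and the cache file name are made up, and the fetch argument stands for whatever code actually makes the request.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_path: Path, fetch: Callable[[], dict]) -> dict:
    """Return the cached response if the file exists; otherwise fetch once and save it."""
    if cache_path.exists():
        # Reuse the saved copy instead of contacting the server again.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    payload = fetch()
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return payload


# Possible usage with the tutorial's code (hypothetical file name):
#   resp = load_or_fetch(Path("arrests.json"),
#                        lambda: requests.get(url, timeout=30).json())
```

Delete the cache file whenever you want fresh data; the next run will make one new request and save the result again.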
This tutorial was originally written in August 2023.