The suspension of US Census Bureau field operations due to the COVID-19 pandemic has created vast disparities in census counts between rural and non-rural areas.
The majority of areas in the United States receive mailed invitations to fill out the census online, by phone, or by mail - a process that requires no visit from a census worker (unless a household fails to respond and requires a Non-Response Follow-Up). This makes it easy for these "Self-Response" areas to participate in the census relatively normally, even with visits from census workers suspended.
However, areas without reliable mail service rely on initial contact through door-to-door visits from census workers, who either enumerate households themselves (referred to as "Update Enumerate (UE)" or "Remote Alaska (RA)") or leave a packet with a form or invitation so that households can respond on their own (known as "Update Leave (UL)").
Until this visit happens, households in these areas will not receive a direct invitation to participate in the census.
Households in these rural areas can usually still fill out the census online before they receive an invitation. The issue we are analyzing here is the lack of a direct invitation to participate, not the ability to respond.
A detailed map of these areas can be found here.
Inevitably, the areas relying on the three kinds of in-person initial contact rather than mail invitations (and the rural states that contain many of them) have fallen far behind in census enumeration due to the pandemic - and there is no telling how much (or how little) these disparities will narrow when in-person enumeration begins.
Should these disparities linger even after in-person operations begin, the result could be very low self-response rates, which would hurt these areas' ability to be properly counted.
As in-person operations begin again, these rural areas will face added pressure to actively participate in the census, given how far behind they have fallen (through no fault of their own). This could have a lasting impact on these areas' collective ability to receive federal/state funding and resources, congressional apportionment, and general representation.
This analysis can help any journalist or researcher display just how wide this gap has become, and how much more important proactive census participation has become for these areas.
We will now examine how wide these disparities are by comparing the response rates for tracts that rely heavily on in-person census operations with response rates for tracts that have largely received invitations by mail.
First we will import the data, and then we will visualize the trends.
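Before the import code, a minimal offline sketch of two mechanics it relies on may help: zero-padding state FIPS codes into the request URL, and turning the API's header-first list of rows into records. The helper names `build_tract_url` and `rows_to_records`, the placeholder key, and the sample payload are illustrative assumptions rather than part of the original analysis.

```python
def build_tract_url(state_fips, key):
    """Build the 2020 tract-level response-rate URL for one state (hypothetical helper)."""
    state = str(state_fips).zfill(2)  # state FIPS codes are two digits, e.g. 6 -> "06"
    return (
        "https://api.census.gov/data/2020/dec/responserate"
        f"?get=GEO_ID,CRRALL&key={key}&for=tract:*&in=state:{state}"
    )


def rows_to_records(payload):
    """Convert a header-first list of rows into a list of dicts."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


# Made-up payload that only mimics the header-first shape; the values are not real data.
sample_payload = [
    ["GEO_ID", "CRRALL", "state", "county", "tract"],
    ["1400000US06001400100", "55.5", "06", "001", "400100"],
]

url = build_tract_url(6, key="YOUR_KEY")
records = rows_to_records(sample_payload)
tract_id = records[0]["GEO_ID"].replace("1400000US", "")  # keep the 11-digit tract GEOID
print(url)
print(tract_id)
```

Using `zfill` removes the need for a separate branch for single-digit state codes, though the two-branch version in the code that follows behaves the same way.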
# import packages and establish settings
import pandas as pd
import numpy as np
import requests
import json
import matplotlib.pyplot as plt
import warnings
from matplotlib.ticker import MaxNLocator
from IPython.display import HTML
import statsmodels.api as sm
from patsy import dmatrix
import math as m
%matplotlib inline
warnings.filterwarnings('ignore')
# set API key
key = '2988f01f5e86175bda8beae2b5035e1ccef2d052'
# tested shortcut for obtaining state FIPS codes
url = f"https://api.census.gov/data/2010/dec/responserate?get=GEO_ID,FSRR2010&key={key}&for=state:*"
JSONContent = requests.get(url).json()
states = pd.DataFrame(JSONContent)
states = states.iloc[1:,2]
states = [int(i) for i in states if i !='72']
states = sorted(states)
# data frame to hold tract responses
tract_responses = pd.DataFrame(columns=['GEO_ID','CRRALL'])
# pull tract response data for 2020
for i in states:
    if i < 10:
        url = f"https://api.census.gov/data/2020/dec/responserate?get=GEO_ID,CRRALL&key={key}&for=tract:*&in=state:0"\
        + str(i)
    else:
        url = f"https://api.census.gov/data/2020/dec/responserate?get=GEO_ID,CRRALL&key={key}&for=tract:*&in=state:"\
        + str(i)
    try:
        JSONContent = requests.get(url).json()
        temp = pd.DataFrame(JSONContent)
        temp.columns = temp.iloc[0]
        temp = temp.iloc[1:,:]
        tract_responses = pd.concat([tract_responses,temp],sort=True)
    except json.JSONDecodeError:
        pass
# set index and column title for 2020 response rates
tract_responses['CRRALL'] = tract_responses['CRRALL'].astype('float')
tract_responses.index = tract_responses.GEO_ID.str.replace('1400000US','')
tract_responses = tract_responses.drop(columns = 'GEO_ID')
tract_responses.columns = ['response','county','state','tract']
# pull type of enumeration data
tea = pd.read_excel('https://www2.census.gov/geo/maps/DC2020/TEA/TEA_PCT_Housing_Tract.xlsx')
tea.index = tea.TRACT_GEOID
tea.index.name = 'GEO_ID'
tea.index = tea.index.astype('str').str.zfill(11)
tea['inperson'] = tea.PCT_HU_TEA2 + tea.PCT_HU_TEA4 + tea.PCT_HU_TEA6
tea = tea.drop(columns=['TRACT_GEOID','PCT_HU_TEA2','PCT_HU_TEA3','PCT_HU_TEA4','PCT_HU_TEA6'])
tea.columns=['mail','inperson']
# remove type of enumeration entries with no data
tea = tea[np.sum(tea,axis=1) > 99.5]
# count percentage of tracts in our response dataset for which we have enumeration data
temp = pd.merge(tract_responses,tea,'left','GEO_ID')
print(np.round(100 * np.sum(pd.notnull(temp.mail)) / temp.shape[0],1),'% of tracts with response rate data \
also have enumeration strategy logged and will be a part of this analysis',sep='')
# create dataset and sort by inperson
data = pd.merge(tract_responses,tea,'inner','GEO_ID')
data = data.sort_values('inperson')
With the data imported, we see that approximately three-quarters of the tracts with response rate data also have enumeration strategies logged. Thus, these will be the tracts we use for our analysis (and about a quarter will not be included).
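The coverage figure above comes from a left join followed by an inner join. A dependency-free sketch of the same idea, using plain dictionaries keyed by tract GEOID, might look like this. The function name `match_coverage` and all tract IDs and values are made up for illustration.

```python
def match_coverage(response_rates, enumeration_types):
    """Percentage of tracts in response_rates that also appear in enumeration_types."""
    if not response_rates:
        return 0.0
    matched = sum(1 for geoid in response_rates if geoid in enumeration_types)
    return round(100 * matched / len(response_rates), 1)


# Made-up tract IDs and values, for illustration only.
response_rates = {
    "01001020100": 48.2,
    "01001020200": 51.0,
    "01001020300": 39.7,
    "01001020400": 60.1,
}
enumeration_types = {  # tract -> percent of housing units contacted in person
    "01001020100": 0.0,
    "01001020200": 100.0,
    "01001020300": 12.5,
}

coverage = match_coverage(response_rates, enumeration_types)

# Equivalent of the inner join: keep only tracts present in both sources.
matched_only = {
    geoid: (rate, enumeration_types[geoid])
    for geoid, rate in response_rates.items()
    if geoid in enumeration_types
}
print(f"{coverage}% of tracts have an enumeration strategy logged")
```

Tracts missing from the enumeration table are silently dropped by the inner join, so it is worth reporting the coverage figure alongside any result.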
We start by creating a scatterplot that shows how the percentage of a tract relying on in-person census operations for initial contact relates to response rates, and fitting cubic splines.
Clearly, tracts that rely more heavily on in-person enumeration tend to have vastly lower response rates than areas that mostly received mail invitations.
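The plotting code picks the number of spline degrees of freedom from the data: it takes the reciprocal of the share of tracts with at least 1% in-person contact, rounds up, and never goes below 4. The sketch below restates that heuristic in plain Python as I read it. The function name `spline_df` and the fallback for the case where every tract is below 1% are my own assumptions; the original expression has a zero denominator in that case.

```python
import math


def spline_df(inperson_shares, minimum=4):
    """Degrees of freedom for the spline basis, mirroring the heuristic in the plotting code."""
    shares = list(inperson_shares)
    # Fraction of tracts with less than 1% in-person initial contact.
    low_share = sum(1 for s in shares if s < 1) / len(shares)
    if low_share == 1:
        # Assumption: fall back to the minimum when no tract reaches 1%.
        return minimum
    return max(math.ceil(1 / (1 - low_share)), minimum)


# Made-up shares: 7 of 8 tracts are below 1% in-person.
mostly_mail = spline_df([0, 0, 0, 0, 0, 0, 0, 40])
# Made-up shares: only 1 of 4 tracts is below 1% in-person.
mixed = spline_df([0, 25, 50, 100])
print(mostly_mail, mixed)
```

The effect is that datasets dominated by mail-only tracts get a more flexible curve, presumably so the fit can still bend through the sparse in-person range.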
# make scatterplot of data and add curves
plt.figure(figsize=(15,8))
uniq = np.linspace(0,100,1001)
new_x = dmatrix("bs(train, df=df, degree=3, include_intercept=True)", {"train": data.inperson,\
"df":np.max([m.ceil(1/(1-np.mean(data.inperson<1))),4])},return_type='dataframe')
model = sm.GLM(data.response, new_x)
results = model.fit()
plt.plot(data.inperson,results.predict(new_x),c='orange',linewidth=5)
plt.scatter(data.inperson,data.response,alpha=0.7,label='Areas Relying Heavily On In-Person Operations')
plt.title('Census Response Rates Based On Enumeration Strategies',size=20)
plt.xlim(0,100)
plt.ylim(0,100)
plt.xlabel('Amount of Tract Relying On In-Person Census Operations (%)',size=15)
plt.ylabel('Response Rate For Tract (%)',size=15)
plt.show()
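Next, we will compare the response rates for two types of tracts: those that do not rely on in-person census operations for initial contact, and those that rely on them exclusively.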
# make histograms of data
all_mail = data[data.inperson == 0]
all_inperson = data[data.mail == 0]
fig,ax = plt.subplots(2,1,figsize=(15,9),sharex=True)
plt.xlabel('Response Rate (%)',size=16)
plt.subplots_adjust(top=0.87)
plt.suptitle('Census Response Rate Distribution Based On Enumeration Strategy',size=25)
mai = ax[0]
inp = ax[1]
mai.yaxis.set_major_locator(MaxNLocator(integer=True))
inp.yaxis.set_major_locator(MaxNLocator(integer=True))
mai.set_ylabel('Number of Tracts',size=12)
inp.set_ylabel('Number of Tracts',size=12)
mai.hist(all_mail.response,bins='auto')
inp.hist(all_inperson.response,bins='auto')
mai.set_title('Tracts Not Relying On In-Person Census Operations For Initial Contact',size=18)
inp.set_title('Tracts Relying Exclusively On In-Person Census Operations For Initial Contact',size=18)
plt.show()
As we compare these two types of areas, we can clearly see how large the response rate disparities between them are.
These gaps will presumably narrow to some degree once in-person operations ramp up again - but there is no telling how much (or how little).
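For readers who want a single number to accompany the histograms, one option is the difference in median response rate between the two groups. The sketch below is one way to compute it with the standard library. The function name `group_gap` and the toy tuples are illustrative assumptions, not results from this analysis.

```python
from statistics import median


def group_gap(tracts):
    """Median response rate of all-mail tracts minus that of all-in-person tracts.

    `tracts` is an iterable of (mail_pct, inperson_pct, response_rate) tuples.
    """
    tracts = list(tracts)
    all_mail = [resp for mail, inperson, resp in tracts if inperson == 0]
    all_inperson = [resp for mail, inperson, resp in tracts if mail == 0]
    if not all_mail or not all_inperson:
        return None  # one of the groups is empty, so there is nothing to compare
    return median(all_mail) - median(all_inperson)


# Made-up values, for illustration only.
toy_tracts = [
    (100, 0, 60.0),
    (100, 0, 50.0),
    (100, 0, 55.0),
    (0, 100, 10.0),
    (0, 100, 20.0),
    (40, 60, 30.0),  # mixed tract, excluded from both groups
]
gap = group_gap(toy_tracts)
print(gap)
```

The median is used here rather than the mean because response-rate distributions can be skewed; either statistic would fit the comparison described above.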
We will now conduct this same analysis for each of the fifty states, plus the District of Columbia.
Some of these states rely heavily on in-person enumeration (and thus will provide robust data), while some barely depend on it. The former category of state is more interesting to us, but we will show all of the states in alphabetical order for the sake of being thorough.
For each state, we will show these same graphs, and provide a list of any tracts in the state that rely entirely on in-person enumeration. I recommend CUNY's Hard to Count map for visualizing and analyzing individual tracts.
Keep in mind - we can easily do this analysis for a smaller region within a state or a group of states almost instantly - email me any time at benjamin.livingston@columbia.edu if you'd like to see these results for your region, too.
NewsCounts provides countless data & research resources that make telling these stories easy - just drop us a line and we'll be happy to help.
Note: be sure not to miss our findings and additional information at the end of this guide.
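Narrowing the analysis to a smaller region or a group of states comes down to filtering tracts by the leading digits of their GEOID, which is what the helper function below does with a single prefix. A plain-Python sketch of the multi-prefix case follows. The function name `tracts_in_region` and the GEOIDs are hypothetical.

```python
def tracts_in_region(geoids, prefixes):
    """Return the tract GEOIDs that start with any of the given FIPS prefixes."""
    prefixes = tuple(str(p) for p in prefixes)  # str.startswith accepts a tuple
    return sorted(g for g in geoids if g.startswith(prefixes))


# Made-up 11-digit tract GEOIDs (2-digit state + 3-digit county + 6-digit tract).
geoids = [
    "02013000100",
    "02016000200",
    "30001000100",
    "30003000200",
    "36061000100",
]

one_state = tracts_in_region(geoids, ["02"])
two_states = tracts_in_region(geoids, ["02", "30"])
one_county = tracts_in_region(geoids, ["30003"])
print(one_state, two_states, one_county, sep="\n")
```

Keeping GEOIDs as zero-padded strings matters here: a two-digit state prefix such as "02" would not match if the leading zero had been lost to an integer conversion.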
# get names of areas to aid lookup in helper function below
names = pd.read_excel('https://www2.census.gov/programs-surveys/popest/geographies/2017/state-geocodes-v2017.xlsx',\
skiprows=range(5),index_col=2).iloc[:,2]
names.columns=['name']
# add hyperlinks to each state
html = '<h2>Shortcut to state:</h2><p>'
links = sorted(names[names.index!=0])
for link in links:
    html += "<a href='#" + link + "'>" + link + "<br/></a>"
html += "</p><h3><a href='#Conclusion'>Skip to Conclusion / Additional Information" + "</a></h3>"
display(HTML(html))
# helper function to plot individual areas
def show_area(fips):
    # convert to string
    area = str(fips)
    if len(area) == 1:
        area = area.zfill(2)
    # get name of area
    name = names[fips]
    # show name of area and add anchor
    display(HTML("<center><h1 id='" + name + "'>" + name +"</h1></center>"))
    # trim dataset to include only area in question and sort
    new = data[data.index.str.startswith(area)]
    new = new.sort_values('inperson')
    # if none of area relies on in-person enumeration, note and return
    if sum(new.inperson) == 0:
        display(HTML('<h4>' + name + ' does not rely on in-person enumeration</h4>'))
        return
    # generate separate datasets for areas with all in-person and no in-person enumeration
    all_mail = new[new.inperson == 0]
    all_inperson = new[new.mail == 0]
    # generate list of tracts that rely completely on in-person enumeration
    tracts = sorted(all_inperson.index)
    # make scatterplot of data, and add curves
    plt.figure(figsize=(15,8))
    uniq = np.linspace(0,100,1001)
    new_x = dmatrix("bs(train, df=df, degree=3, include_intercept=True)", {"train": new.inperson,\
    "df":np.max([m.ceil(1/(1-np.mean(new.inperson<1))),4])},return_type='dataframe')
    model = sm.GLM(new.response, new_x)
    results = model.fit()
    plt.plot(new.inperson,results.predict(new_x),c='orange',linewidth=5)
    plt.scatter(new.inperson,new.response,alpha=0.7,label='Areas Relying Heavily On In-Person Operations')
    plt.title('Census Response Rates Based On Enumeration Strategies In ' + name,size=20)
    plt.xlim(0,100)
    plt.ylim(0,100)
    plt.xlabel('Amount of Tract Relying On In-Person Census Operations (%)',size=15)
    plt.ylabel('Response Rate For Tract (%)',size=15)
    plt.show()
    if len(tracts) != 0:
        # make histogram of data
        fig,ax = plt.subplots(2,1,figsize=(15,9),sharex=True)
        plt.xlabel('Response Rate (%)',size=16)
        plt.subplots_adjust(top=0.87)
        plt.suptitle('Census Response Rate Distribution Based On Enumeration Strategy In ' + name,size=20)
        mai = ax[0]
        inp = ax[1]
        mai.yaxis.set_major_locator(MaxNLocator(integer=True))
        inp.yaxis.set_major_locator(MaxNLocator(integer=True))
        mai.set_ylabel('Number of Tracts',size=12)
        inp.set_ylabel('Number of Tracts',size=12)
        mai.hist(all_mail.response,bins='auto')
        inp.hist(all_inperson.response,bins='auto')
        mai.set_title('Tracts Not Relying On In-Person Census Operations For Initial Contact',size=18)
        inp.set_title('Tracts Relying Exclusively On In-Person Census Operations For Initial Contact',size=18)
        plt.show()
    # print total number of tracts examined
    display(HTML('<h4>Total number of tracts examined in ' + name + ': ' + str(new.shape[0]) + '</h4>'))
    # print FIPS codes of tracts that rely on in-person operations
    if len(tracts) == 0:
        display(HTML('<h4>No tracts in ' + name + ' rely entirely on in-person census operations for initial contact</h4>'))
    else:
        display(HTML('<h4>Tracts in ' + name + \
        ' that are known to rely entirely on in-person census operations for initial contact:</h4>'))
        for tract in tracts:
            print(tract)
    # add links and space
    display(HTML("<h4><a href='#State-Case-Studies'> Back to top</a>"+"<br/>" + \
    "<a href='#Conclusion'>Skip to Conclusion / Additional Information" + "</a></h4>"))
    print('\n')
# iterate through all states to display data
states_iter = names[names.index != 0].sort_values()
for place in states_iter.index:
    show_area(place)
Rural areas relying on in-person census operations for initial contact have generally had very low response rates. This is true both nationally and for virtually every state where these areas exist en masse.
The census is a major driver for federal/state funding decisions, resource allocation, congressional apportionment, and general representation over the next ten years. If these rural areas (and the states that contain many of them) do not catch up when in-person operations begin, they could be vastly underrepresented.
The stakes have become very large for these areas, and this analysis can help tell that story.
As mentioned earlier, households in rural areas can typically still fill out the census online even before they receive an invitation. In most cases, you do not need to wait for an invitation to respond to the census.
NewsCounts can help you run this analysis for your region, whether it is a smaller region within a state or a collection of states. It is very easy for us to pop out these same numbers for just about any level of US geography with a FIPS code, as long as it has a reasonable mass of tracts relying on in-person enumeration.
We can do it almost instantaneously - just ask!
Feel free to email me at benjamin.livingston@columbia.edu any time if you'd like us to do this for your area, or if you have any questions.
The Census Bureau is tracking 2020 response rates and provides a wonderful map with up-to-date data. NewsCounts also provides a beta dashboard and API that allows you to grab the daily response data for yourself.
We have also conducted a couple of other analyses that you may find useful for local census reporting:
Please don't hesitate to reach out with any census reporting-related questions. We recognize that 2020 is a challenging time for journalists, and we're here to make covering this pivotal census easier for you.