NYC Restaurant Inspections, Creepy Crawly Edition#

By Sara-Jean Redfield

There are a wide variety of ways to ruin a restaurant, creatures are only a small slice of the unsanitary pie.

Using restaurant inspection data from the New York City Department of Health, I examined the change in violations involving unwelcome live creatures over seasons. I filtered for violation codes citing evidence of rats, mice, flies, and roaches, and I sorted these results by winter, spring, summer, and fall.

I wanted to know if are there more rodent/fly/roach violations at different times of year? I hypothesized that violations for all creatures, especially flies, will be worse in the summer compared to winter.

As you will see, I am only right on the technicality that when summer is compared to winter, there are more flies in summer. Otherwise, I was wrong about the worst season, and I am wrong about the most common creature (it’s mice). My results may indicate that certain creatures follow seasonal trends, but those trends are different from what I had anticipated.

About the Data:#

This dataset contains open and ongoing health inspection violations for New York City Restaurants. This means all these restaurants are still open, and any time a violation is resolved or successfully adjudicated, it is removed. This is not a comprehensive list of infractions. I primarily used ‘VIOLATION CODE’ and ‘INSPECTION DATE’ to filter and organize the data. From the NYC Health Code, I identified the four inspection codes listed below to focus on. There are other codes that indicate a space may be open to creatures, but I filtered for violations that indicated the presence of creatures.

04K: Evidence of rats or live rats in establishment’s food or non-food areas.

04L: Evidence of mice or live mice in establishment’s food or non-food areas.

04M: Live roaches in establishment’s food or non-food areas.

04N: Filth flies or food/refuse/sewage associated (FRSA) flies or other nuisance pests in establishment’s food and/or non-food areas. FRSA flies include house flies, blow flies, bottle flies, flesh flies, drain flies, Phorid flies and fruit flies.

import pandas as pd
import as px
# boilerplate for allowing PDF export
import as pio
pio.renderers.default = "notebook_connected+pdf"
inspection_data = pd.read_csv("DOHMH_New_York_City_Restaurant_Inspection_Results.csv")

Data Cleanup: The Extermination#

Before filtering for creatures, I created some new columns on the dataset to assist in ease of use. I turned the inspection date into a DateTime unit and added a month column to organize violations by season. I created the month column in anticipation of using it tables, but I did not use it in that way.

To focus on creatures only, I filtered for the violation codes of interest concerning evidence of rats, mice, roaches, and flies. I then created a function to rename the violation codes to their corresponding creatures.

Immediately, I was surprised to see mice, not flies, have the highest violation count at 20,742 open cases since 2016.

inspection_data['INSPECTION DATE'] = pd.to_datetime(inspection_data['INSPECTION DATE'], format='%m/%d/%Y')
inspection_data['Month'] = inspection_data['INSPECTION DATE'].dt.month_name()
violation_list = ['04K','04L','04M','04N']
creature_data = inspection_data[inspection_data['VIOLATION CODE'].isin(violation_list)]
def rename_creatures(row):
    if (('04K') in row['VIOLATION CODE']):
        return 'Rats'
    elif (('04L') in row['VIOLATION CODE']):
        return 'Mice'
    elif (('04M') in row['VIOLATION CODE']):
        return 'Roaches'
    elif (('04N') in row['VIOLATION CODE']):
        return 'Flies'
creature_data['Creature'] = creature_data.sort_values('INSPECTION DATE', ascending=False).apply(rename_creatures, axis =1)
/var/folders/z9/98yvk3y51gsg5b12ck5chclm0000gn/T/ipykernel_10944/ SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation:
creature_count = creature_data.groupby('Creature').size().to_frame(name='Creature Count')
Creature Count
Flies 14165
Mice 20724
Rats 2244
Roaches 6158

I added a color map to keep creature colors consistent throughout all visual aids.

color_discrete_map = {'Mice': 'rgb(144, 12, 63)', 'Flies': 'rgb(255, 195, 0)','Roaches': 'rgb(255, 87, 51)', 'Rats': 'rgb(199, 0, 57)'}
fig = px.histogram(creature_data,
                   color = 'Creature',
                   title='Violations per Creature')

I used inpection dates to organize a scatterplot of violations across all years. Because this file only contains unresolved and unadjudicated health code violations, there is nothing to glean from the rise and fall of violations over time other than the obvious lack of inspections from early 2020 to mid 2021 due to the COVID-19 pandemic. While the results by year may not be useful, aggregating months across all years may still provide information about seasonal changes in violations.

scatter_data = creature_data.groupby('Creature').resample('M', on='INSPECTION DATE').size().reset_index(name='Violation Count')

fig = px.scatter(scatter_data,
                 x='INSPECTION DATE',
                 y='Violation Count',
                 trendline= 'ols',
                 title= 'Creatures Over Time: 2016-2022')

Method: Seasons Greetings!#

For my research question, I asked if there are more creature violations at different times of year? To explore this, I created seasons based on months, not based on actual seasons. In New York, December usually feels colder than March. While December is only 1/3 winter, I have determined it to be winter for the sake of trend forecasting. Instead of using inspection dates directly, I filtered seasons by the Month columnn I created.

Seasons Winter: December, January, February

Spring: March, April, May

Summer: June, July, August

Fall: September, October, November

From these seasonal dataframes, I counted instances of each creature per season and total creatures per season.

creature_data['Month'] = creature_data['INSPECTION DATE'].dt.month_name()

winter_list = ('December', 'January', 'February')
winter_data = creature_data[creature_data['Month'].isin(winter_list)]

spring_list = ('March', 'April', 'May')
spring_data = creature_data[creature_data['Month'].isin(spring_list)]

summer_list = ('June', 'July', 'August')
summer_data = creature_data[creature_data['Month'].isin(summer_list)]

fall_list = ('September', 'October', 'November')
fall_data = creature_data[creature_data['Month'].isin(fall_list)]
/var/folders/z9/98yvk3y51gsg5b12ck5chclm0000gn/T/ipykernel_10944/ SettingWithCopyWarning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation:

Winter has the highest mice count and rat counts.

winter_count = winter_data.groupby('Creature').size().to_frame(name='Winter Count').sort_values('Winter Count', ascending=False)
Winter Count
Mice 6265
Flies 2522
Roaches 1648
Rats 735
Winter Count    11170
dtype: int64
 spring_count = spring_data.groupby('Creature').size().to_frame(name='Spring Count').sort_values('Spring Count', ascending=False)
Spring Count
Mice 5858
Flies 1546
Roaches 1208
Rats 649
Spring Count    9261
dtype: int64
summer_count = summer_data.groupby('Creature').size().to_frame(name='Summer Count').sort_values('Summer Count', ascending=False)
Summer Count
Flies 4753
Mice 4068
Roaches 1552
Rats 443
Summer Count    10816
dtype: int64

Fall has the highest total violation count and the highest fly violation count.

fall_count = fall_data.groupby('Creature').size().to_frame(name='Fall Count').sort_values('Fall Count', ascending=False)
Fall Count
Flies 5344
Mice 4533
Roaches 1750
Rats 417
Fall Count    12044
dtype: int64

From the seasonal counts, I noticed some trends among creatures. Mice and rats (especially mice) receive more violations in the winter and spring and insects (especially flies) receive more violations in the summer and fall. Following these trends, I consolidated the seasonal dataframes into a mice season and a fly season to compare creature seasonality.

mice_list = ('December', 'January', 'February','March', 'April', 'May')
mice_data = creature_data[creature_data['Month'].isin(mice_list)]
mice_count = mice_data.groupby('Creature').size().to_frame(name='Mice Season').sort_values('Mice Season', ascending=False)

fly_list = ('June', 'July', 'August', 'September', 'October', 'November')
fly_data = creature_data[creature_data['Month'].isin(fly_list)]
fly_count = fly_data.groupby('Creature').size().to_frame(name='Fly Season').sort_values('Fly Season', ascending=False)
fig = px.histogram(mice_data,
                   color= 'Creature',
                   title='Creature Violations: Mice Season, December - May')
fig = px.histogram(fly_data,
                   color= 'Creature',
                   title='Creature Violations: Fly Season, June - November')
creature_split = pd.merge(left=mice_count, right=fly_count, left_on='Creature', right_on='Creature')
creature_split['pct_change'] = (((creature_split['Mice Season']-creature_split['Fly Season'])/creature_split['Mice Season'])*100)
Mice Season Fly Season pct_change
Mice 12123 8601 29.052215
Flies 4068 10097 -148.205506
Roaches 2856 3302 -15.616246
Rats 1384 860 37.861272

Results: Mouse is King#

As it turns out, mice are a problem all year, albeit slightly worse in colder months. The percent change indicates mice experience the second lowest percent change from season to season. They are likely seeking shelter and quieter businesses means they don’t get caught as quickly. Rats and roaches are likely the victims of the strongest extermination tactics as they are grossest and sometimes most harmful.

This brings us to the true seasonal creature: the fly. The annual fall fly trend made sense once I saw it. This is the time of year when flies swarm our windows seeking warm air. Maybe they seek warm air all year, but people often open their windows in the fall. Luckily they die in winter only to plague us again the following year. All other creatures are winter resistant.

Why Fall?#

According to The New York Times circa 2011, flies in fall are a known problem, and Americans go through harrowing trials and tribulations to conquer flies each year.

Hypothesis Revisited#

Per my hypothesis, I compared winter and summer directly. I thought summer would have more creatures, because the city comes alive in the summer. Everyone is out and about, and I thought that applied to creatures too. Flies were the only creature to make a stronger appearance in summer, but summer is not the worst season for flies! I was wrong on all accounts.

While I was wrong, I believe it is worth noting that flies may be easier to overlook and less tempting cite unless their presence is especially heinous. Mice, roaches, and rats are all likely much less tolerated in New York City restaurants.

solstice_split= pd.merge(left=winter_count, right=summer_count, left_on='Creature', right_on='Creature')
solstice_split['pct_change'] = (((solstice_split['Winter Count']-solstice_split['Summer Count'])/solstice_split['Winter Count'])*100)
Winter Count Summer Count pct_change
Mice 6265 4068 35.067837
Flies 2522 4753 -88.461538
Roaches 1648 1552 5.825243
Rats 735 443 39.727891
from IPython.display import HTML
HTML('<img src="">')
#Credit to Reddit user: santiofalltrades. Please copy/paste code to be delighted.