Investigating Elevated Blood Lead Level Rates in Children in NYC


By Tara Merigan

Introduction

Dataset:
I am using the Children Under 6 yrs with Elevated Blood Lead Levels dataset from NYC Open Data. It contains 576 rows. Each row covers one location (citywide, borough or neighbourhood) and one year (2005 to 2016), and gives the number of children tested, the number with elevated blood lead levels (BLL) at three thresholds (>=5, >=10 and >=15 µg/dL), and the corresponding rate per 1000 tested at each threshold.

Questions:

  1. How has the number of children under 6 with elevated blood lead levels in NYC changed over the years?

    • Does this vary at different concentrations (µg/dL)?

  2. How does the number of children with elevated blood lead levels vary between boroughs?

    • Between neighbourhoods?

    • Does this vary at different concentrations?

  3. How does the change in rate of children with elevated blood lead levels over the years vary between neighbourhoods/boroughs?

Hypothesis:
I hypothesise that the rate of children with elevated blood lead levels will fall over the years, as regulation and infrastructure adapt to growing knowledge about the dangers of lead exposure to children. I predict that some areas (both boroughs and neighbourhoods) will have higher rates of children with elevated blood lead levels, and that these areas are more likely to have more cases at the more severe levels (10 or 15 µg/dL). I expect that in areas with higher rates of elevated blood lead levels, the decrease over the years will be more pronounced than in areas with low rates, as the severity of these cases calls for more urgent government intervention.

Step 1

I began by importing the necessary packages: pandas, for data manipulation, and plotly, a library for creating interactive charts, graphs and other data visualisations.
Then I read my data (Children_Elevated_BLL.csv) into a dataframe and displayed the first few rows to check that it had been read correctly and to see its structure. I then inspected the contents of some columns (namely the columns containing notes) and printed summary information about the dataframe, which will be helpful later.

import pandas as pd
import plotly.express as px
bll_df = pd.read_csv('Children_Elevated_BLL.csv')
bll_df.head()
geo_type geo_area_id geo_area_name borough_id time_period Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL _NOTES Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL _NOTES Children under 6 years with elevated blood lead levels (BLL) Number Tested Children under 6 years with elevated blood lead levels (BLL) Number Tested _NOTES Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested_NOTES Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested_NOTES Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested_NOTES
0 Borough 1 Bronx 1.0 2005 8245 NaN 595 NaN 167 NaN 64500 NaN 127.7 NaN 9.2 NaN 2.6 NaN
1 Borough 1 Bronx 1.0 2006 7272 NaN 474 NaN 144 NaN 67200 NaN 108.2 NaN 7.1 NaN 2.1 NaN
2 Borough 1 Bronx 1.0 2007 6174 NaN 438 NaN 135 NaN 68300 NaN 90.4 NaN 6.4 NaN 2.0 NaN
3 Borough 1 Bronx 1.0 2008 4254 NaN 292 NaN 105 NaN 69800 NaN 60.9 NaN 4.2 NaN 1.5 NaN
4 Borough 1 Bronx 1.0 2009 2742 NaN 278 NaN 103 NaN 70000 NaN 39.2 NaN 4.0 NaN 1.5 NaN
bll_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576 entries, 0 to 575
Data columns (total 19 columns):
 #   Column                                                                                                  Non-Null Count  Dtype  
---  ------                                                                                                  --------------  -----  
 0   geo_type                                                                                                576 non-null    object 
 1   geo_area_id                                                                                             576 non-null    int64  
 2   geo_area_name                                                                                           576 non-null    object 
 3   borough_id                                                                                              564 non-null    float64
 4   time_period                                                                                             576 non-null    int64  
 5   Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL                       576 non-null    int64  
 6   Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL _NOTES                9 non-null      object 
 7   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL                       576 non-null    int64  
 8   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES                139 non-null    object 
 9   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL                       576 non-null    int64  
 10  Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL _NOTES                295 non-null    object 
 11  Children under 6 years with elevated blood lead levels (BLL) Number Tested                              576 non-null    int64  
 12  Children under 6 years with elevated blood lead levels (BLL) Number Tested _NOTES                       0 non-null      float64
 13  Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested        576 non-null    float64
 14  Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested_NOTES  9 non-null      object 
 15  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested        576 non-null    float64
 16  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested_NOTES  139 non-null    object 
 17  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested        576 non-null    float64
 18  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested_NOTES  295 non-null    object 
dtypes: float64(5), int64(6), object(8)
memory usage: 85.6+ KB
bll_df['Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES'].unique()
array([nan,
       '*Estimate is based on small numbers so should be interpreted with caution.'],
      dtype=object)
bll_df['geo_type'].unique()
array(['Borough', 'Neighborhood (UHF 42)', 'Citywide'], dtype=object)
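Rather than inspecting each notes column one at a time, the distinct note texts can be collected in a single pass. The sketch below assumes only that every notes column name ends in `_NOTES`, which holds for all seven notes columns in the `info()` output. The helper `collect_notes` and the small `toy` DataFrame are hypothetical stand-ins for `bll_df`.

```python
import pandas as pd


def collect_notes(df: pd.DataFrame) -> dict:
    """Map each column whose name ends in '_NOTES' to its sorted distinct non-null values."""
    notes_cols = [c for c in df.columns if c.endswith('_NOTES')]
    return {c: sorted(df[c].dropna().unique().tolist()) for c in notes_cols}


# Hypothetical toy frame that mimics the shape of the real columns.
toy = pd.DataFrame({
    'Number BLL>=10 µg/dL': [595, 3, 7],
    'Number BLL>=10 µg/dL _NOTES': [None, '*Estimate is based on small numbers', None],
    'Number Tested _NOTES': [None, None, None],
})

notes = collect_notes(toy)
for col, values in notes.items():
    print(col, '->', values)
```

On the real data, `collect_notes(bll_df)` would show every distinct note at once. An empty list (as for `Number Tested _NOTES` here) marks a column that contains no notes at all.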

Step 2

After looking at the contents of the ‘notes’ columns, I chose to remove them from the dataframe, as they do not contain information useful for the questions I am investigating. The notes (such as the one shown above) warn that some estimates are based on small numbers and should be interpreted with caution. I believe this warning matters mostly for regression analysis or for making inferences about the wider population, rather than for the descriptive comparisons here. I then renamed some of the columns to make the output tables easier to read and to remove extraneous information from the column titles.

bll_df.drop(bll_df.columns[[6, 8, 10, 12, 14, 16, 18]], axis = 1, inplace = True)
bll_df.head()
geo_type geo_area_id geo_area_name borough_id time_period Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL Children under 6 years with elevated blood lead levels (BLL) Number Tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested
0 Borough 1 Bronx 1.0 2005 8245 595 167 64500 127.7 9.2 2.6
1 Borough 1 Bronx 1.0 2006 7272 474 144 67200 108.2 7.1 2.1
2 Borough 1 Bronx 1.0 2007 6174 438 135 68300 90.4 6.4 2.0
3 Borough 1 Bronx 1.0 2008 4254 292 105 69800 60.9 4.2 1.5
4 Borough 1 Bronx 1.0 2009 2742 278 103 70000 39.2 4.0 1.5
bll_df.rename(columns = {'Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL':
                         'Elevated BLL >=5', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL':
                         'Elevated BLL >=10', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL':
                         'Elevated BLL >=15', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number Tested':
                         'Number Tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested':
                         'Rate BLL>=5 per 1000 tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested':
                         'Rate BLL>=10 per 1000 tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested':
                         'Rate BLL>=15 per 1000 tested'}, 
              inplace = True)
bll_df.head()
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
0 Borough 1 Bronx 1.0 2005 8245 595 167 64500 127.7 9.2 2.6
1 Borough 1 Bronx 1.0 2006 7272 474 144 67200 108.2 7.1 2.1
2 Borough 1 Bronx 1.0 2007 6174 438 135 68300 90.4 6.4 2.0
3 Borough 1 Bronx 1.0 2008 4254 292 105 69800 60.9 4.2 1.5
4 Borough 1 Bronx 1.0 2009 2742 278 103 70000 39.2 4.0 1.5
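Dropping columns by position (`[6, 8, 10, ...]`) works, but it breaks silently if the column order ever changes. A sketch of a name-based alternative follows. It assumes that notes columns end in `_NOTES` and that the long descriptive prefix is shared by every data column. The helper `tidy_columns` is hypothetical, and the names it produces are shorter but different from the hand-written ones used above.

```python
import pandas as pd

PREFIX = 'Children under 6 years with elevated blood lead levels (BLL) '


def tidy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop '_NOTES' columns by name, strip the shared prefix and collapse repeated spaces."""
    kept = df.loc[:, ~df.columns.str.endswith('_NOTES')]
    # ' '.join(s.split()) also fixes the double space in 'Rate  BLL>=5 ...'.
    return kept.rename(columns=lambda c: ' '.join(c.replace(PREFIX, '').split()))


# Hypothetical empty frame carrying a few of the real column names.
toy = pd.DataFrame(columns=[
    'geo_area_name',
    PREFIX + 'Number Tested',
    PREFIX + 'Number Tested _NOTES',
    PREFIX + 'Rate  BLL>=5 µg/dL per 1,000 tested',
    PREFIX + 'Rate  BLL>=5 µg/dL per 1,000 tested_NOTES',
])

tidy = tidy_columns(toy)
print(tidy.columns.tolist())
```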

Step 3

I then created a new dataframe containing only the rows where the geographic area name was ‘New York City’, as these give data for the entire city in each year. Next, I reshaped this dataframe from wide to long format using ‘melt’, so that each row holds a year, a BLL rate type (µg/dL threshold) and the number per 1000 children tested. The long format let me create a line chart with a separate line for each BLL rate.

citywide = bll_df[bll_df.geo_area_name == 'New York City']
citywide = citywide.sort_values('time_period')
citywide
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
98 Citywide 1 New York City NaN 2005 37344 3082 1014 310100 120.4 9.9 3.3
345 Citywide 1 New York City NaN 2006 34629 2767 928 313900 110.3 8.8 3.0
327 Citywide 1 New York City NaN 2007 30493 2282 745 318200 95.8 7.2 2.3
65 Citywide 1 New York City NaN 2008 20423 1803 612 328000 62.3 5.5 1.9
212 Citywide 1 New York City NaN 2009 15224 1565 565 331800 45.9 4.7 1.7
204 Citywide 1 New York City NaN 2010 13951 1574 566 340900 40.9 4.6 1.7
90 Citywide 1 New York City NaN 2011 11437 1332 447 342900 33.4 3.9 1.3
259 Citywide 1 New York City NaN 2012 8179 1053 392 328600 24.9 3.2 1.2
313 Citywide 1 New York City NaN 2013 7204 910 325 322900 22.3 2.8 1.0
523 Citywide 1 New York City NaN 2014 6550 959 341 314500 20.8 3.0 1.1
446 Citywide 1 New York City NaN 2015 5371 908 318 311300 17.3 2.9 1.0
319 Citywide 1 New York City NaN 2016 4928 822 300 299000 16.5 2.7 1.0
citywide_conc = citywide.melt(id_vars='time_period',
                              value_vars=['Rate BLL>=5 per 1000 tested',
                                          'Rate BLL>=10 per 1000 tested',
                                          'Rate BLL>=15 per 1000 tested'],
                              var_name='Rates', value_name='Number per 1000 Tested')
citywide_conc.sample(5)
time_period Rates Number per 1000 Tested
0 2005 Rate BLL>=5 per 1000 tested 120.4
20 2013 Rate BLL>=10 per 1000 tested 2.8
7 2012 Rate BLL>=5 per 1000 tested 24.9
35 2016 Rate BLL>=15 per 1000 tested 1.0
17 2010 Rate BLL>=10 per 1000 tested 4.6
fig = px.line(citywide_conc, x= "time_period", y="Number per 1000 Tested", 
              title = "Citywide Elevated BLL Rates per 1000 tested",
              color = "Rates",
             labels={'time_period':'Year'})
fig.show()

Citywide Line Chart
The line chart above shows that elevated BLL rates fell over the years at all three thresholds, consistent with my hypothesis. Because relatively few children per 1000 tested had BLL >=10 or >=15 µg/dL (only 3.3 per 1000 at >=15 µg/dL even in 2005), I carried out most of the remaining analysis using the >=5 µg/dL rate per 1000 tested. The CDC's blood lead reference value is 3.5 µg/dL, so every child counted at >=5 µg/dL is above that reference. This makes >=5 µg/dL a reasonable, if conservative, threshold for analysing elevated BLL in children across the city.
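The effect of `melt` can be seen on a small example: three columns of rates become one `Rates` label column plus one value column, which is the shape plotly needs to draw one line per threshold. The sketch below uses the first three citywide years from the table above. Here `value_vars` is omitted, so pandas melts every column not listed in `id_vars`. The `pivot` call at the end is shown as the inverse operation.

```python
import pandas as pd

# First three citywide years from the table above (rates per 1000 tested).
wide = pd.DataFrame({
    'time_period': [2005, 2006, 2007],
    'Rate BLL>=5 per 1000 tested': [120.4, 110.3, 95.8],
    'Rate BLL>=10 per 1000 tested': [9.9, 8.8, 7.2],
    'Rate BLL>=15 per 1000 tested': [3.3, 3.0, 2.3],
})

# Wide -> long: one row per (year, threshold) pair.
citywide_long = wide.melt(id_vars='time_period', var_name='Rates',
                          value_name='Number per 1000 Tested')
print(citywide_long)

# Long -> wide again: pivot undoes the melt.
back = citywide_long.pivot(index='time_period', columns='Rates',
                           values='Number per 1000 Tested')
print(back)
```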

Step 4

Next, I looked at the elevated BLL rates for each borough, using the >=5 µg/dL concentration. I created a new dataframe by selecting the rows with ‘Borough’ in the geo_type column, then produced a line chart of the rate per 1000 tested for each borough over the years.

boroughs = bll_df[bll_df.geo_type == 'Borough']
boroughs = boroughs.sort_values('time_period')
boroughs.head()
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
0 Borough 1 Bronx 1.0 2005 8245 595 167 64500 127.7 9.2 2.6
24 Borough 3 Manhattan 3.0 2005 4851 324 85 43900 110.6 7.4 1.9
36 Borough 4 Queens 4.0 2005 8238 750 278 80400 102.5 9.3 3.5
12 Borough 2 Brooklyn 2.0 2005 15015 1301 448 106800 140.6 12.2 4.2
48 Borough 5 Staten Island 5.0 2005 990 112 36 14500 68.2 7.7 2.5
fig = px.line(boroughs, x= "time_period", y="Rate BLL>=5 per 1000 tested", color = 'geo_area_name',
             title = "Elevated BLL Rate per 1000 tested",
             labels={'time_period':'Year', "Rate BLL>=5 per 1000 tested":'Rate BLL>=5 µg/dL per 1000 tested',
                    'geo_area_name':'Borough'})
fig.show()

Borough Line Chart
The line chart above displays the rate of children per 1000 tested who had elevated BLL >=5 µg/dL. All five boroughs show a reduction over time. Staten Island's rate rose in some years (2007, 2013 and 2016), and Queens' rose slightly in 2014, though both still trend downward. Brooklyn had the highest rate of all the boroughs in every year, although over time all five converge to relatively low rates per 1000. In 2005 Manhattan had the median rate of the five boroughs, but by 2016 it had the lowest.
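Year-on-year rises like these can be found without reading them off the chart. In the sketch below, `diff()` subtracts each year's rate from the following year's, so any positive difference marks a rise. The values are copied from the borough pivot table in Step 5. `rates` and `rises` are hypothetical names used only for illustration.

```python
import pandas as pd

years = list(range(2005, 2017))
# Borough rates (>=5 µg/dL per 1000 tested), copied from the Step 5 pivot table.
rates = pd.DataFrame({
    'Bronx':         [127.7, 108.2, 90.4, 60.9, 39.2, 37.5, 28.5, 20.9, 20.1, 18.7, 15.7, 15.0],
    'Brooklyn':      [140.6, 136.9, 120.7, 76.6, 60.0, 52.6, 45.8, 35.3, 30.1, 26.8, 22.6, 22.3],
    'Manhattan':     [110.6, 101.7, 88.2, 55.4, 36.4, 27.2, 22.2, 15.3, 15.1, 14.0, 10.6, 8.1],
    'Queens':        [102.5, 92.3, 79.6, 53.7, 40.2, 38.2, 28.5, 20.6, 18.2, 18.6, 15.4, 14.3],
    'Staten Island': [68.2, 62.9, 63.5, 38.2, 33.8, 24.3, 20.6, 16.8, 17.7, 17.0, 11.9, 14.8],
}, index=years)

# diff() is this year's rate minus last year's (NaN for 2005); > 0 means the rate rose.
rises = {b: rates[b][rates[b].diff() > 0].index.tolist() for b in rates}
print(rises)
```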

Step 5

I then pivoted the boroughs dataframe so that each year became a column, which makes it easy to find the relative change in the rate per 1000 children tested (>=5 µg/dL concentration) for each borough. I converted the year column labels from integers to strings so that I could refer to them as '2005', '2016' and so on. Finally, I added a relative change column, calculated from each borough's 2005 and 2016 values as (2016 rate - 2005 rate) / 2005 rate × 100.

boroughs_five = boroughs.pivot(index='geo_area_name', columns='time_period', values='Rate BLL>=5 per 1000 tested')
boroughs_five
time_period 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
geo_area_name
Bronx 127.7 108.2 90.4 60.9 39.2 37.5 28.5 20.9 20.1 18.7 15.7 15.0
Brooklyn 140.6 136.9 120.7 76.6 60.0 52.6 45.8 35.3 30.1 26.8 22.6 22.3
Manhattan 110.6 101.7 88.2 55.4 36.4 27.2 22.2 15.3 15.1 14.0 10.6 8.1
Queens 102.5 92.3 79.6 53.7 40.2 38.2 28.5 20.6 18.2 18.6 15.4 14.3
Staten Island 68.2 62.9 63.5 38.2 33.8 24.3 20.6 16.8 17.7 17.0 11.9 14.8
boroughs_five.info()
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, Bronx to Staten Island
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   2005    5 non-null      float64
 1   2006    5 non-null      float64
 2   2007    5 non-null      float64
 3   2008    5 non-null      float64
 4   2009    5 non-null      float64
 5   2010    5 non-null      float64
 6   2011    5 non-null      float64
 7   2012    5 non-null      float64
 8   2013    5 non-null      float64
 9   2014    5 non-null      float64
 10  2015    5 non-null      float64
 11  2016    5 non-null      float64
dtypes: float64(12)
memory usage: 520.0+ bytes
boroughs_five.columns = boroughs_five.columns.astype(str)
boroughs_five['Relative Change'] = ((boroughs_five['2016'] - boroughs_five['2005']) / boroughs_five['2005'])*100
boroughs_five.head()
time_period 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Relative Change
geo_area_name
Bronx 127.7 108.2 90.4 60.9 39.2 37.5 28.5 20.9 20.1 18.7 15.7 15.0 -88.253720
Brooklyn 140.6 136.9 120.7 76.6 60.0 52.6 45.8 35.3 30.1 26.8 22.6 22.3 -84.139403
Manhattan 110.6 101.7 88.2 55.4 36.4 27.2 22.2 15.3 15.1 14.0 10.6 8.1 -92.676311
Queens 102.5 92.3 79.6 53.7 40.2 38.2 28.5 20.6 18.2 18.6 15.4 14.3 -86.048780
Staten Island 68.2 62.9 63.5 38.2 33.8 24.3 20.6 16.8 17.7 17.0 11.9 14.8 -78.299120

Borough Relative Change
The table above shows the relative change in each borough's elevated BLL rate from 2005 to 2016. All five boroughs show a reduction, and the relative changes are fairly similar (decreases of roughly 78% to 93%). Manhattan had the largest relative decrease (about 92.7%) and Staten Island the smallest (about 78.3%).
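The relative change is (rate_2016 - rate_2005) / rate_2005 × 100. The sketch below writes it as a small vectorised function, applies it to the 2005 and 2016 borough values from the table above, and reproduces the table's figures (rounded). The function name `relative_change` is hypothetical.

```python
import pandas as pd


def relative_change(start: pd.Series, end: pd.Series) -> pd.Series:
    """Percentage change from start to end; negative values are decreases."""
    return (end - start) / start * 100


# 2005 and 2016 borough rates (>=5 µg/dL per 1000 tested) from the table above.
rate_2005 = pd.Series({'Bronx': 127.7, 'Brooklyn': 140.6, 'Manhattan': 110.6,
                       'Queens': 102.5, 'Staten Island': 68.2})
rate_2016 = pd.Series({'Bronx': 15.0, 'Brooklyn': 22.3, 'Manhattan': 8.1,
                       'Queens': 14.3, 'Staten Island': 14.8})

change = relative_change(rate_2005, rate_2016).round(2)
print(change.sort_values())
```

A rate cannot fall below zero, so relative change is bounded below by -100%. For that reason, areas that started with very different rates can end up with similar percentages, and the absolute change (rate_2005 - rate_2016) may be worth looking at alongside it.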

Step 6

Next, I wanted to compare each borough's 2005 rate per 1000 with its relative change, to see whether higher initial rates appeared to go with more urgent intervention (measured by the size of the relative change). I created a new dataframe containing only the 2005 rate and the relative change, reset its index, sorted the boroughs by their 2005 rate and took the absolute value of the relative change. I then made a grouped bar chart from this dataframe to compare the 2005 rates and absolute relative changes across the boroughs visually.

# Keep only the 2005 rate and the relative change for each borough
boroughs_compare = boroughs_five[['2005', 'Relative Change']]
boroughs_compare = boroughs_compare.reset_index()
boroughs_compare = boroughs_compare.sort_values('2005', ascending=False)
boroughs_compare['Relative Change'] = boroughs_compare['Relative Change'].abs()
boroughs_compare.head()
time_period geo_area_name 2005 Relative Change
1 Brooklyn 140.6 84.139403
0 Bronx 127.7 88.253720
2 Manhattan 110.6 92.676311
3 Queens 102.5 86.048780
4 Staten Island 68.2 78.299120
fig = px.bar(boroughs_compare, x='geo_area_name', y=['2005', 'Relative Change'], barmode='group',
            title = 'Relative Change and 2005 Rate per 1000 Comparison by Borough', 
             labels={'geo_area_name':'Borough', 'value':'Value', 'variable': 'Variable'})
fig.show()

Bar Chart
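The relative change (%) and the rate (per 1000 tested) are in different units, so plotting them on one axis only supports a rough visual comparison. With that caveat, the chart shows no clear relationship between a borough's 2005 elevated BLL rate per 1000 tested and its absolute relative change. For example, Brooklyn had the highest 2005 rate but only the fourth-largest decrease.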
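A correlation coefficient gives a unit-free check on the visual impression. The sketch below uses the five values from the `boroughs_compare` table. It computes Pearson's r and Spearman's rho, calculating rho as the Pearson correlation of the ranks, which is valid here because there are no ties. With only five boroughs, either coefficient can swing a lot on a single borough, so treat them as descriptive rather than as evidence of a relationship.

```python
import pandas as pd

# 2005 rate and absolute relative change, copied from the boroughs_compare table.
compare = pd.DataFrame({
    'rate_2005': [140.6, 127.7, 110.6, 102.5, 68.2],
    'abs_rel_change': [84.139403, 88.253720, 92.676311, 86.048780, 78.299120],
}, index=['Brooklyn', 'Bronx', 'Manhattan', 'Queens', 'Staten Island'])

pearson = compare['rate_2005'].corr(compare['abs_rel_change'])
# Spearman's rho = Pearson correlation of the ranks (no ties in this data).
spearman = compare['rate_2005'].rank().corr(compare['abs_rel_change'].rank())
print(f'Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}')
```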

Step 7

Next, I filtered the original dataframe to include only neighbourhoods. I did this by selecting the rows with ‘Neighborhood (UHF 42)’ in the ‘geo_type’ column, then sorted them by ‘geo_area_id’ so that they are grouped by borough. Although I refer to each row as a ‘neighbourhood’, a row can cover several neighbourhoods grouped together, approximating community planning districts.
I then created another dataframe containing only the 2005 values, to get the initial rates, and sorted it by the elevated BLL rate per 1000 tested (>=5 µg/dL). To build a stacked bar chart showing each borough's rates broken down by neighbourhood, I replaced the numeric ‘borough_id’ values with the corresponding borough names, which makes the chart easier to interpret.

neighbourhoods = bll_df[bll_df.geo_type == 'Neighborhood (UHF 42)']
neighbourhoods = neighbourhoods.sort_values('geo_area_id')
neighbourhoods.head()
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
201 Neighborhood (UHF 42) 101 Kingsbridge - Riverdale 1.0 2009 85 7 1 3300 25.5 2.1 0.3
307 Neighborhood (UHF 42) 101 Kingsbridge - Riverdale 1.0 2013 30 4 0 3300 9.0 1.2 0.0
218 Neighborhood (UHF 42) 101 Kingsbridge - Riverdale 1.0 2007 243 13 4 3200 76.6 4.1 1.3
233 Neighborhood (UHF 42) 101 Kingsbridge - Riverdale 1.0 2005 198 22 4 2800 71.9 8.0 1.5
175 Neighborhood (UHF 42) 101 Kingsbridge - Riverdale 1.0 2008 147 3 1 3200 45.7 0.9 0.3
neighbourhoods_initial = neighbourhoods[neighbourhoods.time_period == 2005]
neighbourhoods_initial = neighbourhoods_initial.sort_values('Rate BLL>=5 per 1000 tested', ascending=False)
neighbourhoods_initial.head()
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
339 Neighborhood (UHF 42) 211 Williamsburg - Bushwick 2.0 2005 2072 178 70 11600 178.3 15.3 6.0
360 Neighborhood (UHF 42) 201 Greenpoint 2.0 2005 869 71 22 5100 171.1 14.0 4.3
521 Neighborhood (UHF 42) 204 East New York 2.0 2005 1767 156 47 10800 163.8 14.5 4.4
515 Neighborhood (UHF 42) 303 East Harlem 3.0 2005 778 42 6 4800 161.8 8.7 1.2
520 Neighborhood (UHF 42) 203 Bedford Stuyvesant - Crown Heights 2.0 2005 2528 220 69 16100 156.8 13.6 4.3
neighbourhoods_initial.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42 entries, 339 to 268
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   geo_type                      42 non-null     object 
 1   geo_area_id                   42 non-null     int64  
 2   geo_area_name                 42 non-null     object 
 3   borough_id                    42 non-null     float64
 4   time_period                   42 non-null     int64  
 5   Elevated BLL >=5              42 non-null     int64  
 6   Elevated BLL >=10             42 non-null     int64  
 7   Elevated BLL >=15             42 non-null     int64  
 8   Number Tested                 42 non-null     int64  
 9   Rate BLL>=5 per 1000 tested   42 non-null     float64
 10  Rate BLL>=10 per 1000 tested  42 non-null     float64
 11  Rate BLL>=15 per 1000 tested  42 non-null     float64
dtypes: float64(4), int64(6), object(2)
memory usage: 4.3+ KB
# Map numeric borough IDs (stored as floats, e.g. 1.0) to borough names
neighbourhoods_initial['borough_id'] = neighbourhoods_initial['borough_id'].replace(
    {1: 'Bronx', 2: 'Brooklyn', 3: 'Manhattan', 4: 'Queens', 5: 'Staten Island'})
neighbourhoods_initial.head()
geo_type geo_area_id geo_area_name borough_id time_period Elevated BLL >=5 Elevated BLL >=10 Elevated BLL >=15 Number Tested Rate BLL>=5 per 1000 tested Rate BLL>=10 per 1000 tested Rate BLL>=15 per 1000 tested
339 Neighborhood (UHF 42) 211 Williamsburg - Bushwick Brooklyn 2005 2072 178 70 11600 178.3 15.3 6.0
360 Neighborhood (UHF 42) 201 Greenpoint Brooklyn 2005 869 71 22 5100 171.1 14.0 4.3
521 Neighborhood (UHF 42) 204 East New York Brooklyn 2005 1767 156 47 10800 163.8 14.5 4.4
515 Neighborhood (UHF 42) 303 East Harlem Manhattan 2005 778 42 6 4800 161.8 8.7 1.2
520 Neighborhood (UHF 42) 203 Bedford Stuyvesant - Crown Heights Brooklyn 2005 2528 220 69 16100 156.8 13.6 4.3
fig = px.bar(neighbourhoods_initial,
             x='borough_id',
             y='Rate BLL>=5 per 1000 tested',
             hover_data=['geo_area_name'],
            labels={'borough_id':'Borough'}, title='Rate per 1000 Tested by Borough and Neighbourhood, 2005')

fig.show()

Stacked Bar Chart
The bar chart above shows the 2005 elevated BLL rate per 1000 tested for each neighbourhood, stacked by borough. Only the individual segments are meaningful, since the total height of each stack is a sum of rates. Hovering over a segment shows the neighbourhood name, which reveals where the highest rates in each borough were. Williamsburg/Bushwick (Brooklyn), East Harlem (Manhattan), Hunts Point/Mott Haven (the Bronx), West Queens (Queens) and Port Richmond (Staten Island) had the highest elevated BLL rates in their respective boroughs.
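The highest-rate neighbourhood in each borough can also be pulled out directly rather than by hovering. In the sketch below, `groupby(...).idxmax()` returns the row label of the maximum rate within each borough, and `.loc` then selects those rows. The data are only the five 2005 rows shown above, so just Brooklyn and Manhattan appear. The short column names `borough` and `rate` are hypothetical stand-ins for `borough_id` and `Rate BLL>=5 per 1000 tested`.

```python
import pandas as pd

# The five 2005 rows shown above (a subset of the 42 neighbourhoods).
rows = pd.DataFrame({
    'geo_area_name': ['Williamsburg - Bushwick', 'Greenpoint', 'East New York',
                      'East Harlem', 'Bedford Stuyvesant - Crown Heights'],
    'borough': ['Brooklyn', 'Brooklyn', 'Brooklyn', 'Manhattan', 'Brooklyn'],
    'rate': [178.3, 171.1, 163.8, 161.8, 156.8],
})

# idxmax gives, per borough, the row label holding the highest rate.
top = rows.loc[rows.groupby('borough')['rate'].idxmax(), ['borough', 'geo_area_name', 'rate']]
print(top)
```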

Step 8

Next, I wanted to look at the relative change for each neighbourhood between 2005 and 2016 and compare it with the initial elevated BLL rates per 1000 tested. I began by creating a choropleth map of each neighbourhood's 2005 rate, to see where the highest rates were concentrated across the city. The map matches each row's ‘geo_area_id’ to the ‘uhfcode’ property of a feature in a UHF 42 GeoJSON boundary file.
To find the relative change for each neighbourhood, I pivoted the neighbourhoods dataframe so that each year became a column, and added a relative change column computed from the 2005 and 2016 values. I then kept only the columns needed for the map, reset the index, and used the result to draw a choropleth map of the relative change in rates across the neighbourhoods.

import json
# Load the UHF 42 neighbourhood boundaries; the with-block closes the file afterwards
with open('uhf42.geojson') as f:
    geojson = json.load(f)
geojson['features'][1]['properties']
{'cartodb_id': 5,
 'objectid': 5,
 'borough': 'Bronx',
 'uhf_neigh': 'Pelham - Throgs Neck',
 'shape_area': 386573664.368,
 'shape_leng': 250903.372273,
 'uhfcode': 104}
fig = px.choropleth_mapbox(neighbourhoods_initial,
                           geojson=geojson,
                           locations='geo_area_id',
                           featureidkey='properties.uhfcode',
                           color='Rate BLL>=5 per 1000 tested',
                           hover_data=['geo_area_name'],
                           center = {'lat': 40.73, 'lon': -73.98},
                           zoom=9,
                           mapbox_style='carto-positron',
                          title='Elevated BLL Rate per 1000 Tested, 2005')

fig.update_layout(height=700)
fig.show()

2005 Neighbourhood Rates Choropleth Map
The map above shows the 2005 elevated BLL rate (>=5 µg/dL concentration) for each neighbourhood area. The highest rate per 1000 tested was in Williamsburg/Bushwick, and the other notably high rates were concentrated nearby in other parts of Brooklyn. Staten Island had noticeably low rates per 1000 tested in 2005. The Bronx did not have the highest rates, but several of its neighbourhoods had rates at the higher end. The disparity between neighbouring areas in Manhattan is worth noting: East Harlem and Harlem/Morningside Heights had some of the highest elevated BLL rates in the city, while the adjacent Upper West and Upper East Sides had some of the lowest.
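The choropleth depends on every `geo_area_id` matching some feature's `uhfcode`. A row whose ID has no matching feature can simply be missing from the map rather than raising an error, so checking the join before plotting is cheap insurance. The helper `unmatched_ids` and the two-feature `toy_geojson` below are hypothetical. Real features also carry geometry, which this check ignores.

```python
def unmatched_ids(ids, geojson, key='uhfcode'):
    """Return the ids that have no GeoJSON feature with properties[key] equal to them."""
    feature_ids = {feat['properties'][key] for feat in geojson['features']}
    return sorted(set(ids) - feature_ids)


# Hypothetical miniature GeoJSON (geometry omitted).
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'uhfcode': 101}, 'geometry': None},
        {'type': 'Feature', 'properties': {'uhfcode': 104}, 'geometry': None},
    ],
}

print(unmatched_ids([101, 104, 105], toy_geojson))
```

On the real data, the call would be `unmatched_ids(neighbourhoods_initial['geo_area_id'], geojson)`, and an empty list means every neighbourhood found a boundary.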

neighbourhoods_five = neighbourhoods.pivot(index=['geo_area_id', 'geo_area_name'], 
                                           columns='time_period', values='Rate BLL>=5 per 1000 tested')
neighbourhoods_five.head()
time_period 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
geo_area_id geo_area_name
101 Kingsbridge - Riverdale 71.9 60.8 76.6 45.7 25.5 21.4 18.1 9.8 9.0 10.2 9.5 9.0
102 Northeast Bronx 121.6 103.5 89.5 57.3 36.5 34.7 26.4 19.7 18.6 18.6 16.2 14.3
103 Fordham - Bronx Pk 130.5 109.7 97.0 69.5 46.5 45.1 32.5 27.0 24.2 22.2 17.4 17.5
104 Pelham - Throgs Neck 113.2 108.2 84.6 54.9 37.4 32.8 27.6 16.1 17.2 17.8 18.5 16.9
105 Crotona -Tremont 135.5 111.9 90.5 61.4 40.3 36.5 28.4 22.8 19.0 17.5 15.1 13.4
neighbourhoods_five.columns = neighbourhoods_five.columns.astype(str)
neighbourhoods_five['Relative Change'] = ((neighbourhoods_five['2016'] - neighbourhoods_five['2005']) / neighbourhoods_five['2005'])*100
neighbourhoods_five.head()
time_period 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Relative Change
geo_area_id geo_area_name
101 Kingsbridge - Riverdale 71.9 60.8 76.6 45.7 25.5 21.4 18.1 9.8 9.0 10.2 9.5 9.0 -87.482615
102 Northeast Bronx 121.6 103.5 89.5 57.3 36.5 34.7 26.4 19.7 18.6 18.6 16.2 14.3 -88.240132
103 Fordham - Bronx Pk 130.5 109.7 97.0 69.5 46.5 45.1 32.5 27.0 24.2 22.2 17.4 17.5 -86.590038
104 Pelham - Throgs Neck 113.2 108.2 84.6 54.9 37.4 32.8 27.6 16.1 17.2 17.8 18.5 16.9 -85.070671
105 Crotona -Tremont 135.5 111.9 90.5 61.4 40.3 36.5 28.4 22.8 19.0 17.5 15.1 13.4 -90.110701
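# geo_area_id and geo_area_name are index levels, not columns, so select only
# 'Relative Change' and let reset_index turn both levels back into columns
neighbourhoods_change = neighbourhoods_five[['Relative Change']]
neighbourhoods_change = neighbourhoods_change.reset_index()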
neighbourhoods_change.head()
time_period geo_area_id geo_area_name Relative Change
0 101 Kingsbridge - Riverdale -87.482615
1 102 Northeast Bronx -88.240132
2 103 Fordham - Bronx Pk -86.590038
3 104 Pelham - Throgs Neck -85.070671
4 105 Crotona -Tremont -90.110701
fig = px.choropleth_mapbox(neighbourhoods_change,
                           geojson=geojson,
                           locations='geo_area_id',
                           featureidkey='properties.uhfcode',
                           color='Relative Change',
                           hover_data=['geo_area_name'],
                           center = {'lat': 40.73, 'lon': -73.98},
                           zoom=9,
                           mapbox_style='carto-positron',
                          title='Relative Change in Elevated BLL Rate per 1000 Tested, 2005-2016')

fig.update_layout(height=700)
fig.show()

Neighbourhood Relative Change Choropleth Map
The map above displays the relative change in elevated BLL rates per 1000 tested for each neighbourhood between 2005 and 2016. Staten Island's neighbourhoods had comparatively small relative changes; however, as the previous map shows, they also had low elevated BLL rates in 2005. Greenpoint had the smallest relative change between its 2005 and 2016 elevated BLL rates per 1000 tested. This is notable because Greenpoint also had the second-highest rate in 2005.
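The neighbourhood pipeline (pivot on a two-level index, add a relative change column, flatten for plotting) can be checked on a tiny example. The sketch below uses the 2005 and 2016 rates for two Bronx neighbourhoods from the tables above. It keeps the year columns as integers, so they are selected with `wide[2016]`; converting them to strings, as in the notebook, is optional. Note that the "smallest decrease" is the largest, that is least negative, relative change. The names `long_df`, `wide` and `map_ready` are hypothetical, and passing a list to `pivot(index=...)` assumes pandas 1.1 or later.

```python
import pandas as pd

# Two neighbourhoods, start and end years only (rates from the tables above).
long_df = pd.DataFrame({
    'geo_area_id':   [101, 101, 102, 102],
    'geo_area_name': ['Kingsbridge - Riverdale'] * 2 + ['Northeast Bronx'] * 2,
    'time_period':   [2005, 2016, 2005, 2016],
    'rate':          [71.9, 9.0, 121.6, 14.3],
})

# One row per (id, name) pair, one column per year.
wide = long_df.pivot(index=['geo_area_id', 'geo_area_name'],
                     columns='time_period', values='rate')
wide['Relative Change'] = (wide[2016] - wide[2005]) / wide[2005] * 100

# Select the single column, then move both index levels back into columns.
map_ready = wide[['Relative Change']].reset_index()
print(map_ready)

# Smallest decrease = least negative relative change.
least_decrease = map_ready.loc[map_ready['Relative Change'].idxmax(), 'geo_area_name']
print(least_decrease)
```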

Conclusions

The data are consistent with my hypothesis that the rate of elevated BLL among children under six in NYC would fall over the years. This held at all three concentrations used to identify elevated BLL. Because elevated BLL at the 10 µg/dL and 15 µg/dL concentrations was rare citywide, my analysis dealt mostly with the 5 µg/dL concentration.
In line with my hypothesis, some areas across the city had higher rates than others. Brooklyn had the highest rate per 1000 tested at the 5 µg/dL concentration in every year, although all five boroughs converged towards lower rates over time. A similar pattern appeared in the breakdown by neighbourhood area.
I hypothesised that areas with high concentrations would show larger relative changes between their 2005 and 2016 elevated BLL rates. However, because the more severe cases were rare, I used the 2005 elevated BLL rate per 1000 tested as the measure of severity instead. Notably, there did not appear to be a correlation between the 2005 rates and the relative change for either boroughs or neighbourhoods, at least visually. Manhattan had the largest relative change despite having the median 2005 rate per 1000 tested. Similarly, Greenpoint had the second-highest rate in 2005 but the smallest relative change. This may reflect socio-economic and geographic factors that influence how quickly infrastructure and regulation change and are complied with. Such factors include government priorities, physical or financial barriers to changing infrastructure in areas with high lead exposure risks, and individuals' ability to protect their children from lead exposure.