Investigating Elevated Blood Lead Level Rates in Children in NYC


Introduction

Dataset:
I am using the Children Under 6 yrs with Elevated Blood Lead Levels dataset from NYC Open Data. The dataset contains 576 rows. Each row gives the number of children with elevated blood lead levels (BLL) at three thresholds (>=5, >=10 and >=15 µg/dL), the number of children tested, and the rate per 1000 tested at each threshold. The data is broken down by location (citywide, borough and neighbourhood) and by year.

Questions:

How has the number of children under 6 with elevated blood lead levels in NYC changed over the years?
- Does this vary at different concentrations (µg/dL)?
How does the number of children with elevated blood lead levels vary between boroughs?
- Between neighbourhoods?
- Does this vary at different concentrations?
How does the change in the rate of children with elevated blood lead levels over the years vary between neighbourhoods and boroughs?

Hypothesis:
I hypothesise that the rate of children with elevated blood lead levels will fall over the years, as regulation and infrastructure adapt to knowledge about the dangers of lead exposure to children. I predict that there will be areas (both boroughs and neighbourhoods) with higher rates of children with elevated blood lead levels, and that these areas are more likely to have more cases at the more severe levels (10 or 15 µg/dL). I also expect that in areas with higher rates of elevated blood lead levels the change over the years will be more pronounced (a steeper decrease) than in areas with low rates, as the severity of these cases calls for more urgent government intervention.

Step 1

I began by importing the necessary packages: pandas and plotly. plotly is a package for creating charts, graphs and other data visualisations.
Then I read the CSV file (Children_Elevated_BLL.csv) into a dataframe and displayed the first rows to check that it had been read correctly. I then called a few methods to inspect the dataframe and the contents of various columns (namely the columns which contained notes), which will be helpful later.

import pandas as pd
import plotly.express as px

bll_df = pd.read_csv('Children_Elevated_BLL.csv')
bll_df.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL _NOTES	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL _NOTES	Children under 6 years with elevated blood lead levels (BLL) Number Tested	Children under 6 years with elevated blood lead levels (BLL) Number Tested _NOTES	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested_NOTES	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested_NOTES	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested_NOTES
0	Borough	1	Bronx	1.0	2005	8245	NaN	595	NaN	167	NaN	64500	NaN	127.7	NaN	9.2	NaN	2.6	NaN
1	Borough	1	Bronx	1.0	2006	7272	NaN	474	NaN	144	NaN	67200	NaN	108.2	NaN	7.1	NaN	2.1	NaN
2	Borough	1	Bronx	1.0	2007	6174	NaN	438	NaN	135	NaN	68300	NaN	90.4	NaN	6.4	NaN	2.0	NaN
3	Borough	1	Bronx	1.0	2008	4254	NaN	292	NaN	105	NaN	69800	NaN	60.9	NaN	4.2	NaN	1.5	NaN
4	Borough	1	Bronx	1.0	2009	2742	NaN	278	NaN	103	NaN	70000	NaN	39.2	NaN	4.0	NaN	1.5	NaN

bll_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576 entries, 0 to 575
Data columns (total 19 columns):
 #   Column                                                                                                  Non-Null Count  Dtype  
---  ------                                                                                                  --------------  -----  
 0   geo_type                                                                                                576 non-null    object
 1   geo_area_id                                                                                             576 non-null    int64
 2   geo_area_name                                                                                           576 non-null    object
 3   borough_id                                                                                              564 non-null    float64
 4   time_period                                                                                             576 non-null    int64
 5   Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL                       576 non-null    int64
 6   Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL _NOTES                9 non-null      object
 7   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL                       576 non-null    int64
 8   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES                139 non-null    object
 9   Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL                       576 non-null    int64
 10  Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL _NOTES                295 non-null    object
 11  Children under 6 years with elevated blood lead levels (BLL) Number Tested                              576 non-null    int64
 12  Children under 6 years with elevated blood lead levels (BLL) Number Tested _NOTES                       0 non-null      float64
 13  Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested        576 non-null    float64
 14  Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested_NOTES  9 non-null      object
 15  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested        576 non-null    float64
 16  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested_NOTES  139 non-null    object
 17  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested        576 non-null    float64
 18  Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested_NOTES  295 non-null    object
dtypes: float64(5), int64(6), object(8)
memory usage: 85.6+ KB

bll_df['Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL _NOTES'].unique()

array([nan,
       '*Estimate is based on small numbers so should be interpreted with caution.'],
      dtype=object)

bll_df['geo_type'].unique()

array(['Borough', 'Neighborhood (UHF 42)', 'Citywide'], dtype=object)
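A quick sanity check on the shape of the data is to count the distinct areas in each geo_type and multiply by the number of years. The sketch below uses a small stand-in frame with made-up rows (the name `toy` and its contents are assumptions for illustration, not the real file), but the same two lines should work on bll_df.

```python
import pandas as pd

# Toy stand-in for bll_df: three areas, two years each (hypothetical rows).
toy = pd.DataFrame({
    'geo_type': ['Citywide', 'Citywide', 'Borough', 'Borough',
                 'Neighborhood (UHF 42)', 'Neighborhood (UHF 42)'],
    'geo_area_name': ['New York City', 'New York City', 'Bronx', 'Bronx',
                      'Greenpoint', 'Greenpoint'],
    'time_period': [2005, 2006, 2005, 2006, 2005, 2006],
})

# Number of distinct areas within each geographic level.
areas_per_type = toy.groupby('geo_type')['geo_area_name'].nunique()

# If every area has one row per year, rows = areas x years.
expected_rows = areas_per_type.sum() * toy['time_period'].nunique()
print(areas_per_type)
print(expected_rows == len(toy))
```

For the full dataset, 42 neighbourhoods, 5 boroughs and 1 citywide area over 12 years would give 48 x 12 = 576 rows, which matches the row count reported by info().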

Step 2

After looking at the contents of the ‘notes’ columns, I chose to remove them from the dataframe, as they did not include information that is useful for the questions I am investigating. The only note is a warning that estimates based on small numbers should be interpreted with caution, which I believe matters more for regression analysis or for making inferences about the wider population. I then renamed some of the columns to make the output tables easier to read and to remove extraneous information from the column titles.

bll_df.drop(bll_df.columns[[6, 8, 10, 12, 14, 16, 18]], axis = 1, inplace = True)
bll_df.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL	Children under 6 years with elevated blood lead levels (BLL) Number Tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=5 µg/dL per 1,000 tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested	Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested
0	Borough	1	Bronx	1.0	2005	8245	595	167	64500	127.7	9.2	2.6
1	Borough	1	Bronx	1.0	2006	7272	474	144	67200	108.2	7.1	2.1
2	Borough	1	Bronx	1.0	2007	6174	438	135	68300	90.4	6.4	2.0
3	Borough	1	Bronx	1.0	2008	4254	292	105	69800	60.9	4.2	1.5
4	Borough	1	Bronx	1.0	2009	2742	278	103	70000	39.2	4.0	1.5
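Dropping by position works here, but it depends on the column order staying the same. As a hedged alternative, the notes columns could be selected by their shared `_NOTES` suffix. The sketch below uses a toy frame with shortened, hypothetical column names that only mimic the dataset's naming pattern.

```python
import pandas as pd

# Toy frame: value columns paired with "_NOTES" columns (hypothetical names).
toy = pd.DataFrame({
    'geo_area_name': ['Bronx', 'Bronx'],
    'Number BLL >=5': [8245, 7272],
    'Number BLL >=5 _NOTES': [None, None],
    'Rate BLL>=5 per 1,000 tested': [127.7, 108.2],
    'Rate BLL>=5 per 1,000 tested_NOTES': [None, None],
})

# Collect every column whose name ends in "_NOTES" and drop them together.
notes_cols = [col for col in toy.columns if col.endswith('_NOTES')]
toy_clean = toy.drop(columns=notes_cols)
print(toy_clean.columns.tolist())
```

Selecting by name keeps the step correct even if a later release of the file adds or reorders columns.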

bll_df.rename(columns = {'Children under 6 years with elevated blood lead levels (BLL) Number BLL >=5 µg/dL':
                         'Elevated BLL >=5', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number BLL>=10 µg/dL':
                         'Elevated BLL >=10', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number BLL>=15 µg/dL':
                         'Elevated BLL >=15', 
                         'Children under 6 years with elevated blood lead levels (BLL) Number Tested':
                         'Number Tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate  BLL>=5 µg/dL per 1,000 tested':
                         'Rate BLL>=5 per 1000 tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=10 µg/dL per 1,000 tested':
                         'Rate BLL>=10 per 1000 tested', 
                         'Children under 6 years with elevated blood lead levels (BLL) Rate BLL>=15 µg/dL per 1,000 tested':
                         'Rate BLL>=15 per 1000 tested'}, 
              inplace = True)
bll_df.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
0	Borough	1	Bronx	1.0	2005	8245	595	167	64500	127.7	9.2	2.6
1	Borough	1	Bronx	1.0	2006	7272	474	144	67200	108.2	7.1	2.1
2	Borough	1	Bronx	1.0	2007	6174	438	135	68300	90.4	6.4	2.0
3	Borough	1	Bronx	1.0	2008	4254	292	105	69800	60.9	4.2	1.5
4	Borough	1	Bronx	1.0	2009	2742	278	103	70000	39.2	4.0	1.5
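The original column names are easy to mistype: the `>=5` rate column, for instance, has two spaces after "Rate". By default `rename` silently ignores keys that match no column, so a typo would go unnoticed. A minimal sketch of how `errors='raise'` surfaces this, using a one-column toy frame with a shortened, hypothetical column name:

```python
import pandas as pd

# One column, with two spaces after "Rate" (shortened, hypothetical name).
toy = pd.DataFrame({'Rate  BLL>=5 per 1,000 tested': [127.7]})

# A key with a single space does not match; rename() leaves the frame unchanged.
single_space = {'Rate BLL>=5 per 1,000 tested': 'Rate BLL>=5'}
unchanged = toy.rename(columns=single_space)
print(unchanged.columns.tolist())

# errors='raise' turns the silent miss into a KeyError.
try:
    toy.rename(columns=single_space, errors='raise')
    missed = False
except KeyError:
    missed = True
print(missed)
```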

Step 3

I then created a new dataframe which only included the rows where the geographic area name was ‘New York City’, as these rows give data for the entire city for each year. I then reshaped this new dataframe using the ‘melt’ function into a long format with three columns: the year, the BLL threshold and the number per 1000 children tested. This allowed me to create a line chart with one line per BLL threshold.

citywide = bll_df[bll_df.geo_area_name  == 'New York City']
citywide = citywide.sort_values('time_period')
citywide

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
98	Citywide	1	New York City	NaN	2005	37344	3082	1014	310100	120.4	9.9	3.3
345	Citywide	1	New York City	NaN	2006	34629	2767	928	313900	110.3	8.8	3.0
327	Citywide	1	New York City	NaN	2007	30493	2282	745	318200	95.8	7.2	2.3
65	Citywide	1	New York City	NaN	2008	20423	1803	612	328000	62.3	5.5	1.9
212	Citywide	1	New York City	NaN	2009	15224	1565	565	331800	45.9	4.7	1.7
204	Citywide	1	New York City	NaN	2010	13951	1574	566	340900	40.9	4.6	1.7
90	Citywide	1	New York City	NaN	2011	11437	1332	447	342900	33.4	3.9	1.3
259	Citywide	1	New York City	NaN	2012	8179	1053	392	328600	24.9	3.2	1.2
313	Citywide	1	New York City	NaN	2013	7204	910	325	322900	22.3	2.8	1.0
523	Citywide	1	New York City	NaN	2014	6550	959	341	314500	20.8	3.0	1.1
446	Citywide	1	New York City	NaN	2015	5371	908	318	311300	17.3	2.9	1.0
319	Citywide	1	New York City	NaN	2016	4928	822	300	299000	16.5	2.7	1.0

citywide_conc = citywide.melt(id_vars='time_period', 
                              value_vars=['Rate BLL>=5 per 1000 tested', 
                                          'Rate BLL>=10 per 1000 tested',
                                         'Rate BLL>=15 per 1000 tested'],
                             var_name='Rates', value_name='Number per 1000 Tested')
citywide_conc.sample(5)

	time_period	Rates	Number per 1000 Tested
0	2005	Rate BLL>=5 per 1000 tested	120.4
20	2013	Rate BLL>=10 per 1000 tested	2.8
7	2012	Rate BLL>=5 per 1000 tested	24.9
35	2016	Rate BLL>=15 per 1000 tested	1.0
17	2010	Rate BLL>=10 per 1000 tested	4.6

fig = px.line(citywide_conc, x= "time_period", y="Number per 1000 Tested", 
              title = "Citywide Elevated BLL Rates per 1000 tested",
              color = "Rates",
             labels={'time_period':'Year'})
fig.show()

Citywide Line Chart
The above line chart shows how elevated BLL rates have fallen over the years, consistent with my hypothesis. As relatively few children per 1000 tested have elevated BLL >=10 or >=15 µg/dL, I chose to proceed with the majority of my analysis using the BLL >=5 µg/dL rate per 1000 tested. Additionally, the CDC blood lead reference value is 3.5 µg/dL, so every child counted at >=5 µg/dL is above it, and the >=5 µg/dL threshold is adequate for analysing elevated BLL for children across the city.
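To put a number on how the decline differs by concentration, the relative change between the first and last years can be computed for each threshold. This is a small sketch using the 2005 and 2016 citywide rates copied from the table above; the frame name `citywide_ends` and the short row labels are my own.

```python
import pandas as pd

# Citywide rates per 1000 tested, copied from the 2005 and 2016 rows above.
citywide_ends = pd.DataFrame(
    {'2005': [120.4, 9.9, 3.3], '2016': [16.5, 2.7, 1.0]},
    index=['BLL>=5', 'BLL>=10', 'BLL>=15'],
)

# Percentage change from 2005 to 2016 for each threshold.
citywide_ends['Relative Change (%)'] = (
    (citywide_ends['2016'] - citywide_ends['2005']) / citywide_ends['2005'] * 100
).round(1)
print(citywide_ends)
```

On these figures the >=5 µg/dL rate fell by roughly 86%, compared with roughly 73% and 70% at the >=10 and >=15 µg/dL thresholds, so the decline was steepest at the lowest threshold.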

Step 4

I then looked at the elevated BLL rates for each borough, using the >=5 µg/dL concentration. I began by creating a new dataframe from the rows that had ‘Borough’ in the geo_type column. I then produced a line chart of the rate per 1000 tested for each borough over the years.

boroughs = bll_df[bll_df.geo_type  == 'Borough']
boroughs = boroughs.sort_values('time_period')
boroughs.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
0	Borough	1	Bronx	1.0	2005	8245	595	167	64500	127.7	9.2	2.6
24	Borough	3	Manhattan	3.0	2005	4851	324	85	43900	110.6	7.4	1.9
36	Borough	4	Queens	4.0	2005	8238	750	278	80400	102.5	9.3	3.5
12	Borough	2	Brooklyn	2.0	2005	15015	1301	448	106800	140.6	12.2	4.2
48	Borough	5	Staten Island	5.0	2005	990	112	36	14500	68.2	7.7	2.5

fig = px.line(boroughs, x= "time_period", y="Rate BLL>=5 per 1000 tested", color = 'geo_area_name',
             title = "Elevated BLL Rate per 1000 tested",
             labels={'time_period':'Year', "Rate BLL>=5 per 1000 tested":'Rate BLL>=5 µg/dL per 1000 tested',
                    'geo_area_name':'Borough'})
fig.show()

Borough Line Chart
The above line chart displays the rate of children per 1000 tested who had elevated BLL >=5µg/dL. All five boroughs show a reduction in the rate over time. Staten Island shows an increase in the rate in three years (2007, 2013 and 2016) and Queens a slight increase in 2014, though both still have a downward trend. Brooklyn consistently has the highest rate of all the boroughs across the years, although over time all five converge to relatively low rates per 1000. In 2005, Manhattan had the median rate across the five boroughs, but by 2016 it had the lowest.
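Year-on-year increases are hard to read off a line chart, so one option is to check them directly with `diff()`. The sketch below is self-contained: the rates are copied by hand from the borough pivot table shown in Step 5, and the names `rates` and `increase_years` are my own.

```python
import pandas as pd

years = list(range(2005, 2017))

# Rate BLL>=5 per 1000 tested, one column per borough, one row per year.
rates = pd.DataFrame(
    {
        'Bronx': [127.7, 108.2, 90.4, 60.9, 39.2, 37.5, 28.5, 20.9, 20.1, 18.7, 15.7, 15.0],
        'Brooklyn': [140.6, 136.9, 120.7, 76.6, 60.0, 52.6, 45.8, 35.3, 30.1, 26.8, 22.6, 22.3],
        'Manhattan': [110.6, 101.7, 88.2, 55.4, 36.4, 27.2, 22.2, 15.3, 15.1, 14.0, 10.6, 8.1],
        'Queens': [102.5, 92.3, 79.6, 53.7, 40.2, 38.2, 28.5, 20.6, 18.2, 18.6, 15.4, 14.3],
        'Staten Island': [68.2, 62.9, 63.5, 38.2, 33.8, 24.3, 20.6, 16.8, 17.7, 17.0, 11.9, 14.8],
    },
    index=years,
)

# diff() gives the change from the previous year; positive values are increases.
increases = rates.diff() > 0

# For each borough, list the years in which the rate went up.
increase_years = {
    borough: rates.index[increases[borough].to_numpy()].tolist()
    for borough in rates.columns
}
for borough, yrs in increase_years.items():
    print(borough, yrs)
```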

Step 5

I then pivoted the boroughs dataframe so that each year became a column, allowing me to later find the relative change in the rate per 1000 children tested in each borough (>=5 µg/dL concentration). I converted the year column titles from integers to strings so that I could refer to them by string labels such as '2005'. I then created a column for the relative change, calculated from the 2005 and 2016 values for each borough.

boroughs_five = boroughs.pivot(index='geo_area_name', columns='time_period', values='Rate BLL>=5 per 1000 tested')
boroughs_five

time_period	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
geo_area_name
Bronx	127.7	108.2	90.4	60.9	39.2	37.5	28.5	20.9	20.1	18.7	15.7	15.0
Brooklyn	140.6	136.9	120.7	76.6	60.0	52.6	45.8	35.3	30.1	26.8	22.6	22.3
Manhattan	110.6	101.7	88.2	55.4	36.4	27.2	22.2	15.3	15.1	14.0	10.6	8.1
Queens	102.5	92.3	79.6	53.7	40.2	38.2	28.5	20.6	18.2	18.6	15.4	14.3
Staten Island	68.2	62.9	63.5	38.2	33.8	24.3	20.6	16.8	17.7	17.0	11.9	14.8

boroughs_five.info()

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, Bronx to Staten Island
Data columns (total 12 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   2005    5 non-null      float64
 1   2006    5 non-null      float64
 2   2007    5 non-null      float64
 3   2008    5 non-null      float64
 4   2009    5 non-null      float64
 5   2010    5 non-null      float64
 6   2011    5 non-null      float64
 7   2012    5 non-null      float64
 8   2013    5 non-null      float64
 9   2014    5 non-null      float64
 10  2015    5 non-null      float64
 11  2016    5 non-null      float64
dtypes: float64(12)
memory usage: 520.0+ bytes

boroughs_five.columns = boroughs_five.columns.astype(str)

boroughs_five['Relative Change'] = ((boroughs_five['2016'] - boroughs_five['2005']) / boroughs_five['2005'])*100
boroughs_five.head()

time_period	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	Relative Change
geo_area_name
Bronx	127.7	108.2	90.4	60.9	39.2	37.5	28.5	20.9	20.1	18.7	15.7	15.0	-88.253720
Brooklyn	140.6	136.9	120.7	76.6	60.0	52.6	45.8	35.3	30.1	26.8	22.6	22.3	-84.139403
Manhattan	110.6	101.7	88.2	55.4	36.4	27.2	22.2	15.3	15.1	14.0	10.6	8.1	-92.676311
Queens	102.5	92.3	79.6	53.7	40.2	38.2	28.5	20.6	18.2	18.6	15.4	14.3	-86.048780
Staten Island	68.2	62.9	63.5	38.2	33.8	24.3	20.6	16.8	17.7	17.0	11.9	14.8	-78.299120

Borough Relative Change
The above table shows the relative change in the rate of elevated BLL for each borough from 2005 to 2016. All five show a reduction, and the relative changes are fairly similar, ranging from about -78% to -93%. Manhattan had the largest relative change (-92.7%) and Staten Island the smallest (-78.3%).
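The relative change is (end - start) / start x 100. As a hedged variation, the calculation can be wrapped in a small helper and applied with the year columns left as integers, since pandas selects a column by an integer label in the same way as by a string one. The helper name `relative_change` and the two-row frame are assumptions for illustration; the rates are copied from the table above.

```python
import pandas as pd


def relative_change(wide, start, end):
    """Percentage change from column `start` to column `end` (hypothetical helper)."""
    return (wide[end] - wide[start]) / wide[start] * 100


# Two boroughs from the table above, with integer year labels.
wide = pd.DataFrame(
    {2005: [110.6, 68.2], 2016: [8.1, 14.8]},
    index=['Manhattan', 'Staten Island'],
)

change = relative_change(wide, 2005, 2016).round(1)
print(change)
```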

Step 6

I then wanted to compare the 2005 rate per 1000 with the relative change, to see if there appeared to be a correlation between high rates and more urgent intervention (as measured by relative change). I created a new dataframe, dropped the columns from 2006 to 2016, reset the dataframe's index, sorted the boroughs by the 2005 rate and took the absolute value of the relative change column. I then created a bar chart from this dataframe to visually compare the 2005 rates and absolute relative change across the boroughs.

boroughs_compare = boroughs_five.drop(['2006', '2007', '2008', '2009', '2010', 
                                       '2011', '2012', '2013', '2014', '2015', '2016'], axis=1)
boroughs_compare = boroughs_compare.reset_index()
boroughs_compare = boroughs_compare.sort_values('2005', ascending=False)
boroughs_compare['Relative Change'] = boroughs_compare['Relative Change'].abs()
boroughs_compare.head()

time_period	geo_area_name	2005	Relative Change
1	Brooklyn	140.6	84.139403
0	Bronx	127.7	88.253720
2	Manhattan	110.6	92.676311
3	Queens	102.5	86.048780
4	Staten Island	68.2	78.299120

fig = px.bar(boroughs_compare, x='geo_area_name', y=['2005', 'Relative Change'], barmode='group',
            title = 'Relative Change and 2005 Rate per 1000 Comparison by Borough', 
             labels={'geo_area_name':'Borough', 'value':'Value', 'variable': 'Variable'})
fig.show()

Bar Chart
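Whilst the 2005 rate and the relative change are in different units (cases per 1000 tested and percent), this bar chart suggests that there is no clear correlation between the elevated BLL rates per 1000 tested in 2005 and the absolute relative change for each borough. For example, Brooklyn had the highest 2005 rate but only the fourth-largest relative change.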
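A bar chart only supports a visual judgement, so one possible complement is a correlation coefficient between the two columns. This is a sketch rather than part of the original analysis: the values are copied from the boroughs_compare table above, and the column names are my own.

```python
import pandas as pd

# 2005 rate and absolute relative change for each borough, from the table above.
compare = pd.DataFrame({
    'borough': ['Brooklyn', 'Bronx', 'Manhattan', 'Queens', 'Staten Island'],
    'rate_2005': [140.6, 127.7, 110.6, 102.5, 68.2],
    'abs_relative_change': [84.139403, 88.253720, 92.676311, 86.048780, 78.299120],
})

# Series.corr computes the Pearson correlation coefficient by default.
r = compare['rate_2005'].corr(compare['abs_relative_change'])
print(round(r, 2))
```

With only five boroughs, any coefficient is weak evidence either way. Staten Island, which has both the lowest 2005 rate and the smallest change, has a large influence on the result.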

Step 7

I then filtered the original dataframe to include only neighbourhoods. I did this by selecting the rows with ‘Neighborhood (UHF 42)’ in the ‘geo_type’ column, and sorted them by ‘geo_area_id’ to group them into boroughs. Whilst I will refer to each row in the dataframe as a ‘neighbourhood’, a row can cover multiple neighbourhoods grouped together, and these areas approximate community planning districts.
I then created another dataframe with only the values from 2005 to get the initial rates, and sorted it by the elevated BLL rate per 1000 tested (>=5µg/dL). I then wanted to create a stacked bar chart to show the breakdown of the elevated BLL rate of each neighbourhood by borough. I replaced the ‘borough_id’ values with their corresponding borough names in the dataframe to make the chart easier to interpret.

neighbourhoods =  bll_df[bll_df.geo_type == 'Neighborhood (UHF 42)']
neighbourhoods = neighbourhoods.sort_values('geo_area_id')
neighbourhoods.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
201	Neighborhood (UHF 42)	101	Kingsbridge - Riverdale	1.0	2009	85	7	1	3300	25.5	2.1	0.3
307	Neighborhood (UHF 42)	101	Kingsbridge - Riverdale	1.0	2013	30	4	0	3300	9.0	1.2	0.0
218	Neighborhood (UHF 42)	101	Kingsbridge - Riverdale	1.0	2007	243	13	4	3200	76.6	4.1	1.3
233	Neighborhood (UHF 42)	101	Kingsbridge - Riverdale	1.0	2005	198	22	4	2800	71.9	8.0	1.5
175	Neighborhood (UHF 42)	101	Kingsbridge - Riverdale	1.0	2008	147	3	1	3200	45.7	0.9	0.3

neighbourhoods_initial = neighbourhoods[neighbourhoods.time_period == 2005]
neighbourhoods_initial = neighbourhoods_initial.sort_values('Rate BLL>=5 per 1000 tested', ascending=False)
neighbourhoods_initial.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
339	Neighborhood (UHF 42)	211	Williamsburg - Bushwick	2.0	2005	2072	178	70	11600	178.3	15.3	6.0
360	Neighborhood (UHF 42)	201	Greenpoint	2.0	2005	869	71	22	5100	171.1	14.0	4.3
521	Neighborhood (UHF 42)	204	East New York	2.0	2005	1767	156	47	10800	163.8	14.5	4.4
515	Neighborhood (UHF 42)	303	East Harlem	3.0	2005	778	42	6	4800	161.8	8.7	1.2
520	Neighborhood (UHF 42)	203	Bedford Stuyvesant - Crown Heights	2.0	2005	2528	220	69	16100	156.8	13.6	4.3

neighbourhoods_initial.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 42 entries, 339 to 268
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   geo_type                      42 non-null     object 
 1   geo_area_id                   42 non-null     int64  
 2   geo_area_name                 42 non-null     object 
 3   borough_id                    42 non-null     float64
 4   time_period                   42 non-null     int64  
 5   Elevated BLL >=5              42 non-null     int64  
 6   Elevated BLL >=10             42 non-null     int64  
 7   Elevated BLL >=15             42 non-null     int64  
 8   Number Tested                 42 non-null     int64  
 9   Rate BLL>=5 per 1000 tested   42 non-null     float64
 10  Rate BLL>=10 per 1000 tested  42 non-null     float64
 11  Rate BLL>=15 per 1000 tested  42 non-null     float64
dtypes: float64(4), int64(6), object(2)
memory usage: 4.3+ KB

# Map the numeric borough IDs to borough names and assign the result back to the column.
neighbourhoods_initial['borough_id'] = neighbourhoods_initial['borough_id'].replace(
    {1: 'Bronx', 2: 'Brooklyn', 3: 'Manhattan', 4: 'Queens', 5: 'Staten Island'})
neighbourhoods_initial.head()

	geo_type	geo_area_id	geo_area_name	borough_id	time_period	Elevated BLL >=5	Elevated BLL >=10	Elevated BLL >=15	Number Tested	Rate BLL>=5 per 1000 tested	Rate BLL>=10 per 1000 tested	Rate BLL>=15 per 1000 tested
339	Neighborhood (UHF 42)	211	Williamsburg - Bushwick	Brooklyn	2005	2072	178	70	11600	178.3	15.3	6.0
360	Neighborhood (UHF 42)	201	Greenpoint	Brooklyn	2005	869	71	22	5100	171.1	14.0	4.3
521	Neighborhood (UHF 42)	204	East New York	Brooklyn	2005	1767	156	47	10800	163.8	14.5	4.4
515	Neighborhood (UHF 42)	303	East Harlem	Manhattan	2005	778	42	6	4800	161.8	8.7	1.2
520	Neighborhood (UHF 42)	203	Bedford Stuyvesant - Crown Heights	Brooklyn	2005	2528	220	69	16100	156.8	13.6	4.3

fig = px.bar(neighbourhoods_initial,
             x='borough_id',
             y='Rate BLL>=5 per 1000 tested',
             hover_data=['geo_area_name'],
            labels={'borough_id':'Borough'}, title='Rate per 1000 Tested by Borough and Neighbourhood, 2005')

fig.show()

Stacked Bar Chart
The above bar chart shows the elevated BLL rates per 1000 tested in each neighbourhood by borough in 2005. Hovering over the chart allows one to see where the highest rates are in each borough. Williamsburg/Bushwick (Brooklyn), East Harlem (Manhattan), Hunts Point/Mott Haven (the Bronx), West Queens (Queens) and Port Richmond (Staten Island) each had the highest elevated BLL rate in their respective boroughs.
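The highest-rate neighbourhood in each borough can also be found without hovering, by grouping on the borough and taking the row with the maximum rate. The sketch below uses only the five rows visible in the head() output above, so it covers Brooklyn and Manhattan only; the frame name `subset` and the short column names are assumptions for illustration.

```python
import pandas as pd

# The five highest 2005 rates, copied from the head() output above.
subset = pd.DataFrame({
    'geo_area_name': ['Williamsburg - Bushwick', 'Greenpoint', 'East New York',
                      'East Harlem', 'Bedford Stuyvesant - Crown Heights'],
    'borough': ['Brooklyn', 'Brooklyn', 'Brooklyn', 'Manhattan', 'Brooklyn'],
    'rate': [178.3, 171.1, 163.8, 161.8, 156.8],
})

# idxmax returns, for each borough, the row label of its highest rate.
top_idx = subset.groupby('borough')['rate'].idxmax()
top = subset.loc[top_idx, ['borough', 'geo_area_name', 'rate']]
print(top)
```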

Step 8

I then wanted to look at the relative change for each neighbourhood between 2005 and 2016, and compare this to the initial elevated BLL rates per 1000 tested. I began by creating a choropleth map of the initial rate for each neighbourhood, allowing me to see where the highest rates were concentrated across the city. I then created a new dataframe that could be used to find the relative change for each neighbourhood, by pivoting the neighbourhoods dataframe so that it had a column for each year. I added a new column to this dataframe for the relative change, using the 2005 and 2016 values. Next, I filtered this dataframe to include only the columns relevant to the choropleth map and reset the index. Finally, I used this dataframe to create the choropleth map of relative change in rates across each neighbourhood.

import json

# Load the UHF 42 neighbourhood boundaries; the with-block closes the file afterwards.
with open('uhf42.geojson') as f:
    geojson = json.load(f)

geojson['features'][1]['properties']

{'cartodb_id': 5,
 'objectid': 5,
 'borough': 'Bronx',
 'uhf_neigh': 'Pelham - Throgs Neck',
 'shape_area': 386573664.368,
 'shape_leng': 250903.372273,
 'uhfcode': 104}
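The choropleth joins each dataframe row to a map feature by matching `geo_area_id` against the feature's `uhfcode`, and rows with no match are simply not drawn. Before plotting, it may be worth checking for unmatched IDs. The sketch below uses a hypothetical miniature GeoJSON and a made-up ID list in place of the real file and dataframe.

```python
# Hypothetical miniature GeoJSON with the same 'uhfcode' property as the real file.
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'uhfcode': 101}, 'geometry': None},
        {'type': 'Feature', 'properties': {'uhfcode': 104}, 'geometry': None},
    ],
}

# Stand-in for the geo_area_id values in the dataframe (made-up list).
data_ids = [101, 104, 105]

# Collect the IDs present in the map and list any dataframe IDs without a feature.
geo_ids = {feature['properties']['uhfcode'] for feature in toy_geojson['features']}
unmatched = sorted(set(data_ids) - geo_ids)
print(unmatched)
```

The same comparison would also expose a type mismatch, for example integer IDs in the dataframe against string codes in the GeoJSON, because every ID would then come back as unmatched.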

fig = px.choropleth_mapbox(neighbourhoods_initial,
                           geojson=geojson,
                           locations='geo_area_id',
                           featureidkey='properties.uhfcode',
                           color='Rate BLL>=5 per 1000 tested',
                           hover_data=['geo_area_name'],
                           center = {'lat': 40.73, 'lon': -73.98},
                           zoom=9,
                           mapbox_style='carto-positron',
                          title='Elevated BLL Rate per 1000 Tested, 2005')

fig.update_layout(height=700)
fig.show()

2005 Neighbourhood Rates Choropleth Map
The above map shows the initial elevated BLL rates (>=5µg/dL concentration) for each neighbourhood area. The highest frequency of elevated BLL per 1000 tested was in Williamsburg/Bushwick and the other notably high rates of elevated BLL were concentrated nearby in other parts of Brooklyn. Staten Island had noticeably low rates per 1000 tested in 2005. The Bronx did not have the highest elevated BLL rates, but had multiple neighbourhoods with rates on the higher end. The disparity between rates in neighbouring areas in Manhattan is worth noting - with East Harlem/Harlem/Morningside Heights having some of the highest elevated BLL rates across the city and being right next to the Upper West and East Sides which had some of the lowest.

neighbourhoods_five = neighbourhoods.pivot(index=['geo_area_id', 'geo_area_name'], 
                                           columns='time_period', values='Rate BLL>=5 per 1000 tested')
neighbourhoods_five.head()

	time_period	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016
geo_area_id	geo_area_name
101	Kingsbridge - Riverdale	71.9	60.8	76.6	45.7	25.5	21.4	18.1	9.8	9.0	10.2	9.5	9.0
102	Northeast Bronx	121.6	103.5	89.5	57.3	36.5	34.7	26.4	19.7	18.6	18.6	16.2	14.3
103	Fordham - Bronx Pk	130.5	109.7	97.0	69.5	46.5	45.1	32.5	27.0	24.2	22.2	17.4	17.5
104	Pelham - Throgs Neck	113.2	108.2	84.6	54.9	37.4	32.8	27.6	16.1	17.2	17.8	18.5	16.9
105	Crotona -Tremont	135.5	111.9	90.5	61.4	40.3	36.5	28.4	22.8	19.0	17.5	15.1	13.4

neighbourhoods_five.columns = neighbourhoods_five.columns.astype(str)

neighbourhoods_five['Relative Change'] = ((neighbourhoods_five['2016'] - neighbourhoods_five['2005']) / neighbourhoods_five['2005'])*100
neighbourhoods_five.head()

	time_period	2005	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015	2016	Relative Change
geo_area_id	geo_area_name
101	Kingsbridge - Riverdale	71.9	60.8	76.6	45.7	25.5	21.4	18.1	9.8	9.0	10.2	9.5	9.0	-87.482615
102	Northeast Bronx	121.6	103.5	89.5	57.3	36.5	34.7	26.4	19.7	18.6	18.6	16.2	14.3	-88.240132
103	Fordham - Bronx Pk	130.5	109.7	97.0	69.5	46.5	45.1	32.5	27.0	24.2	22.2	17.4	17.5	-86.590038
104	Pelham - Throgs Neck	113.2	108.2	84.6	54.9	37.4	32.8	27.6	16.1	17.2	17.8	18.5	16.9	-85.070671
105	Crotona -Tremont	135.5	111.9	90.5	61.4	40.3	36.5	28.4	22.8	19.0	17.5	15.1	13.4	-90.110701

# Keep only the 'Relative Change' column, then turn the two index levels
# (geo_area_id and geo_area_name) back into ordinary columns.
neighbourhoods_change = neighbourhoods_five[['Relative Change']]
neighbourhoods_change = neighbourhoods_change.reset_index()
neighbourhoods_change.head()

time_period	geo_area_id	geo_area_name	Relative Change
0	101	Kingsbridge - Riverdale	-87.482615
1	102	Northeast Bronx	-88.240132
2	103	Fordham - Bronx Pk	-86.590038
3	104	Pelham - Throgs Neck	-85.070671
4	105	Crotona -Tremont	-90.110701

fig = px.choropleth_mapbox(neighbourhoods_change,
                           geojson=geojson,
                           locations='geo_area_id',
                           featureidkey='properties.uhfcode',
                           color='Relative Change',
                           hover_data=['geo_area_name'],
                           center = {'lat': 40.73, 'lon': -73.98},
                           zoom=9,
                           mapbox_style='carto-positron',
                          title='Relative Change in Elevated BLL Rate per 1000 Tested, 2005-2016')

fig.update_layout(height=700)
fig.show()

Neighbourhood Relative Change Choropleth Map
The above map displays the relative change in elevated BLL rates per 1000 tested for each neighbourhood between 2005 and 2016. Staten Island’s neighbourhoods had a comparatively small relative change. However, as seen on the previous map, they also had low elevated BLL rates in 2005. Greenpoint had the smallest relative change (that is, the smallest reduction) in elevated BLL rate per 1000 tested between 2005 and 2016. It is worth noting that Greenpoint also had the second highest rate in 2005.
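Because every relative change is negative, the "lowest" change in the sense used here is the value closest to zero. A short sketch of ranking neighbourhoods this way, using only the five Bronx rows shown in the neighbourhoods_change output above (so Greenpoint is not included); the frame name `change` is my own.

```python
import pandas as pd

# Five Bronx neighbourhoods, copied from the neighbourhoods_change output above.
change = pd.DataFrame({
    'geo_area_name': ['Kingsbridge - Riverdale', 'Northeast Bronx', 'Fordham - Bronx Pk',
                      'Pelham - Throgs Neck', 'Crotona -Tremont'],
    'Relative Change': [-87.482615, -88.240132, -86.590038, -85.070671, -90.110701],
})

# The least negative value is the smallest reduction; the most negative is the largest.
smallest_drop = change.nlargest(1, 'Relative Change')
largest_drop = change.nsmallest(1, 'Relative Change')
print(smallest_drop)
print(largest_drop)
```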

Conclusions

The data were consistent with my hypothesis that the rate of elevated BLL in children under six in NYC would fall over the years. This was true at all three concentrations used to identify elevated BLL in children. Due to the low frequency of elevated BLL at the 10µg/dL and 15µg/dL concentrations citywide, my analysis dealt mostly with the 5µg/dL concentration.
In line with my hypothesis, there were areas that had higher rates than others across the city. Brooklyn consistently had the highest rate per 1000 tested at the 5µg/dL concentration over the years, although all five of the boroughs saw some convergence to lower rates over time. This was similarly reflected in the breakdown by neighbourhood areas.
I hypothesised that there would be larger relative changes between the 2005 and 2016 elevated BLL rates for areas that had high concentrations. However, in light of the low frequency of these more severe cases, I chose to use the 2005 elevated BLL rate per 1000 tested as a measure of severity. Notably, judging from the charts and maps, there did not appear to be a correlation between the 2005 rates and the relative change in either boroughs or neighbourhoods. Manhattan had the largest relative change, despite having the median 2005 rate per 1000 tested. In a similar vein, Greenpoint had the second highest rate in 2005 and the smallest relative change. This may speak to socio-economic and geographic factors that influence the speed of infrastructure and regulation change and compliance: government priorities, physical or financial barriers to changing infrastructure in areas with high lead exposure risks, and individuals’ abilities to guard their children against lead exposure.