Class 2: Manipulating and combining data#
How did the homework go?
Various notes#
Written responses
Instructions are specific
Data cleaning thread - we’ll talk more about it shortly
Feeling overwhlemed?#
Reminder that learning to code is like learning a spoken language. It’s not obvious, and people will pick it up at different speeds at different spots. Try:
Taking notes in the lecture notebooks
Using another Python/pandas learning resource
Hear things explained another way
Ask in Ed Discussions if others have recommendations
-
Otherwise, trying to do two steps in your head:
Figuring out the logic
Figuring out the syntax
Small example of comment-driven development:
# find valid ZIP codes
# filter the DataFrame to only invalid ZIP codes
Data cleaning#
Data Cleansing is a process of removing or fixing incorrect, malformed, incomplete, duplicate, or corrupted data
https://hevodata.com/learn/data-cleansing-a-simplified-guide/
Things to check for#
From my workshop on data cleaning:
Missing data
Empty values
Bad (junk) values
Duplicates
Mismatched types/formatting
Categorical data
Unique values (cardinality)
Value counts
Continuous values
Ranges
Spread (distribution)
Notes:
“Values” in this case can be a single cell (in the spreadsheet sense) or a whole row
“Missing” or “duplicates” can be columns (Series), tables (DataFrames), rows, or cells
“Categorical data” have a fixed set of values
This isn’t everything you can check for, but should cover most things
Data cleaning mnemonic#
Empty
Bad
Unique
Spread
Today’s goal: Which Community Districts have the most 311 requests? Why might that be?#
What’s a Community District?#
59 local governance districts each run by an appointed Community Board
Community boards advise on land use and zoning, participate in the city budget process, and address service delivery in their district.
Community boards are each composed of up to 50 volunteer members appointed by the local borough president, half from nominations by the local City Council members.
Setup#
import pandas as pd
# Display more rows and columns in the DataFrames
pd.options.display.max_columns = 100
pd.options.display.max_rows = 100
Read our cleaned 311 Service Requests dataset#
url = "https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample_clean.csv.zip"
requests = pd.read_csv(url)
/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_94558/610958175.py:2: DtypeWarning: Columns (8,20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
requests = pd.read_csv(url)
Dealing with dtypes#
More data cleaning!
DtypeWarning: Columns (8,20,31,34) have mixed types.
requests.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 499958 entries, 0 to 499957
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 499958 non-null int64
1 Created Date 499958 non-null object
2 Closed Date 476140 non-null object
3 Agency 499958 non-null object
4 Agency Name 499958 non-null object
5 Complaint Type 499958 non-null object
6 Descriptor 492496 non-null object
7 Location Type 392573 non-null object
8 Incident Zip 480394 non-null object
9 Incident Address 434529 non-null object
10 Street Name 434504 non-null object
11 Cross Street 1 300825 non-null object
12 Cross Street 2 299624 non-null object
13 Intersection Street 1 107377 non-null object
14 Intersection Street 2 107042 non-null object
15 Address Type 451006 non-null object
16 City 476632 non-null object
17 Landmark 32516 non-null object
18 Facility Type 134918 non-null object
19 Status 499958 non-null object
20 Due Date 171534 non-null object
21 Resolution Description 457354 non-null object
22 Resolution Action Updated Date 488788 non-null object
23 Community Board 499958 non-null object
24 BBL 407338 non-null float64
25 Borough 499958 non-null object
26 X Coordinate (State Plane) 470815 non-null float64
27 Y Coordinate (State Plane) 470815 non-null float64
28 Open Data Channel Type 499958 non-null object
29 Park Facility Name 499931 non-null object
30 Park Borough 499958 non-null object
31 Vehicle Type 37 non-null object
32 Taxi Company Borough 403 non-null object
33 Taxi Pick Up Location 4474 non-null object
34 Bridge Highway Name 696 non-null object
35 Bridge Highway Direction 765 non-null object
36 Road Ramp 759 non-null object
37 Bridge Highway Segment 1007 non-null object
38 Latitude 470815 non-null float64
39 Longitude 470815 non-null float64
40 Location 470815 non-null object
dtypes: float64(5), int64(1), object(35)
memory usage: 156.4+ MB
list(requests["Incident Zip"].unique())
['11235',
'11221',
'11693',
'11216',
'10465',
'11367',
'10459',
'11101',
'11362',
'10014',
'11234',
'11436',
'10305',
'10467',
'11208',
'10451',
'11419',
'11237',
'11220',
'10469',
'11385',
'10470',
'11694',
'10036',
nan,
'10473',
'11435',
'10040',
'10472',
'11225',
'10019',
'11434',
'11226',
'10010',
'11211',
'11421',
'10026',
'10013',
'11423',
'10002',
'10453',
'11213',
'11104',
'11249',
'11361',
'11233',
'11224',
'11374',
'10025',
'10022',
'11214',
'11209',
'11366',
'10304',
'10027',
'11378',
'11206',
'10021',
'11364',
'10065',
'10456',
'10314',
'10312',
'11212',
'11379',
'10462',
'11231',
'10460',
'11416',
'10001',
'11357',
'11413',
'11210',
'11217',
'11223',
'11417',
'11418',
'11218',
'11230',
'11207',
'11691',
'10468',
'10007',
'10310',
'10306',
'11103',
'11105',
'11433',
'11203',
'10307',
'11229',
'11372',
'10032',
'11420',
'10017',
'10301',
'11368',
'11201',
'11365',
'11422',
'10452',
'11377',
'10029',
'10003',
'10075',
'11222',
'10128',
'11415',
'11204',
'10030',
'11432',
'10308',
'11102',
'10016',
'10463',
'11412',
'10011',
'11106',
'10457',
'11375',
'11356',
'11228',
'11369',
'10458',
'11411',
'11215',
'10309',
'11358',
'11355',
'11219',
'10031',
'10303',
'11232',
'10302',
'11238',
'10005',
'11429',
'11205',
'10023',
'11373',
'10033',
'10039',
'10028',
'11363',
'11354',
'11692',
'10455',
'11370',
'10035',
'10012',
'10024',
'10009',
'10034',
'11001',
'10466',
'11427',
'11004',
'10454',
'11414',
'11360',
'10461',
'10018',
'10006',
'11236',
'11428',
'10474',
'10471',
'10037',
'10475',
'11430',
'10119',
'10038',
'11426',
'11239',
'10151',
'10120',
'10112',
'10168',
'10004',
'10464',
'18773-9640',
'10282',
'11109',
'10280',
'07114',
'11040',
'10000',
'07090',
'10020',
'00083',
'10044',
'10069',
'10118',
'07102',
'11359',
'11005',
'10169',
'11697',
'10115',
11226.0,
11236.0,
10466.0,
11210.0,
10452.0,
10457.0,
10304.0,
11416.0,
11205.0,
11214.0,
11692.0,
11101.0,
11201.0,
10470.0,
11375.0,
10305.0,
11420.0,
11421.0,
11419.0,
11412.0,
10309.0,
11385.0,
11379.0,
11356.0,
10021.0,
11417.0,
11377.0,
11423.0,
10001.0,
10469.0,
10463.0,
11372.0,
11422.0,
10017.0,
11368.0,
11213.0,
11220.0,
10312.0,
10025.0,
11357.0,
11230.0,
10456.0,
11104.0,
11232.0,
11208.0,
11206.0,
11229.0,
10465.0,
10472.0,
10039.0,
10016.0,
10009.0,
11355.0,
11211.0,
10040.0,
11366.0,
11234.0,
10013.0,
11207.0,
10032.0,
10036.0,
10029.0,
11204.0,
11435.0,
11373.0,
10007.0,
11216.0,
10306.0,
11378.0,
11414.0,
10314.0,
10128.0,
10027.0,
11221.0,
10024.0,
11364.0,
10468.0,
11233.0,
10014.0,
11238.0,
11374.0,
11105.0,
11219.0,
11106.0,
10003.0,
10026.0,
11411.0,
11223.0,
11693.0,
11365.0,
11209.0,
11103.0,
11361.0,
11427.0,
10467.0,
11203.0,
11102.0,
10455.0,
10454.0,
11225.0,
11212.0,
11434.0,
10461.0,
11369.0,
11432.0,
10031.0,
11217.0,
11235.0,
10458.0,
10453.0,
11215.0,
11367.0,
11231.0,
11413.0,
11358.0,
11694.0,
10462.0,
11218.0,
11430.0,
10473.0,
10011.0,
10460.0,
10019.0,
10310.0,
10037.0,
10451.0,
10301.0,
10010.0,
11354.0,
11222.0,
11691.0,
10034.0,
10308.0,
10012.0,
11237.0,
10801.0,
10030.0,
11428.0,
11418.0,
10033.0,
10459.0,
11429.0,
10065.0,
10022.0,
10005.0,
10002.0,
10035.0,
11370.0,
10004.0,
11415.0,
11426.0,
11362.0,
10006.0,
11360.0,
10038.0,
11249.0,
11001.0,
11224.0,
10075.0,
10018.0,
10302.0,
11436.0,
11433.0,
10474.0,
10023.0,
11004.0,
10028.0,
11228.0,
10475.0,
10069.0,
11363.0,
10303.0,
10282.0,
10307.0,
10103.0,
10280.0,
10471.0,
10271.0,
10000.0,
10464.0,
10107.0,
7078.0,
10044.0,
10020.0,
56303.0,
10119.0,
11040.0,
11386.0,
10952.0,
6811.0,
11239.0,
10121.0,
37214.0,
10538.0,
10112.0,
10279.0,
11109.0,
11371.0,
18017.0,
7115.0,
77036.0,
7114.0,
'10123',
'11030',
'10801',
'92626-1902',
'07302',
'18773',
'12222',
'10710',
'10103',
'07057',
'10162',
'11582',
'10281',
'10271',
'10107',
'HARRISBURG',
'11735',
'07305',
'N5X3A6',
'11746',
'11371',
'23450',
11580.0,
10591.0,
11005.0,
19034.0,
11596.0,
11779.0,
7621.0,
11021.0,
11241.0,
10169.0,
100000.0,
11242.0,
10151.0,
11697.0,
7086.0,
10177.0,
10118.0,
10105.0,
10152.0,
10168.0,
7093.0,
10917.0,
10110.0,
10153.0,
10178.0,
11570.0,
10601.0,
10704.0,
7424.0,
10281.0,
10158.0,
'10158',
'10172',
'10179',
'IDK',
'11801',
'10601',
'11590',
'10155',
'11202',
'1801',
'11581',
11359.0,
11758.0,
10278.0,
43017.0,
10154.0,
11553.0,
10162.0,
11695.0,
10041.0,
11741.0,
98335.0,
14814.0,
10111.0,
'12345',
'11572',
'11520',
'14614-195',
'10121',
'10105',
'10701',
0.0,
17106.0,
979113.0,
10120.0,
12345.0,
11030.0,
11797.0,
100.0,
11710.0,
33624.0,
8682.0,
11747.0,
1757.0,
11561.0,
7304.0,
6851.0,
11590.0,
94267.0,
10167.0,
11749.0,
11756.0,
10174.0,
10550.0,
89119.0,
14068.0,
11722.0,
11520.0,
6460.0,
32255.0,
10173.0,
10165.0,
11946.0,
'29616-0759',
'07032',
'10278',
'11575',
11963.0,
10106.0,
83.0,
11566.0,
6870.0,
7001.0,
10710.0,
11735.0,
11572.0,
'10165',
'10279',
'11251',
'NJ 07114',
'10106',
'07666',
'11516',
'10177',
'10170',
'43215-1441',
'00000',
'08081',
'10803',
11507.0,
11701.0,
11563.0,
7047.0,
3108.0,
'11021',
'07003',
'10152',
'07029',
'10041',
'31093',
'11735-0230',
89118.0,
11803.0,
11559.0,
11565.0,
7080.0,
12601.0,
10155.0,
10171.0,
7208.0,
11757.0,
11042.0,
'1101',
'10111',
'10173',
'10096',
'07087',
'DID N',
'10956',
10048.0,
10123.0,
10122.0,
11251.0]
ZIP codes look numeric, but aren’t really.
Read the ZIP codes in as strings.
requests2 = pd.read_csv(url, dtype={"Incident Zip": "string"})
/var/folders/kr/nx0m1j811kz5vy8c87ffchzr0000gn/T/ipykernel_94558/1354368569.py:1: DtypeWarning: Columns (20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
requests2 = pd.read_csv(url, dtype={"Incident Zip": "string"})
We fixed the dtype warning for column 8 (Incident Zip
).
list(requests2["Incident Zip"].unique())
['11235',
'11221',
'11693',
'11216',
'10465',
'11367',
'10459',
'11101',
'11362',
'10014',
'11234',
'11436',
'10305',
'10467',
'11208',
'10451',
'11419',
'11237',
'11220',
'10469',
'11385',
'10470',
'11694',
'10036',
<NA>,
'10473',
'11435',
'10040',
'10472',
'11225',
'10019',
'11434',
'11226',
'10010',
'11211',
'11421',
'10026',
'10013',
'11423',
'10002',
'10453',
'11213',
'11104',
'11249',
'11361',
'11233',
'11224',
'11374',
'10025',
'10022',
'11214',
'11209',
'11366',
'10304',
'10027',
'11378',
'11206',
'10021',
'11364',
'10065',
'10456',
'10314',
'10312',
'11212',
'11379',
'10462',
'11231',
'10460',
'11416',
'10001',
'11357',
'11413',
'11210',
'11217',
'11223',
'11417',
'11418',
'11218',
'11230',
'11207',
'11691',
'10468',
'10007',
'10310',
'10306',
'11103',
'11105',
'11433',
'11203',
'10307',
'11229',
'11372',
'10032',
'11420',
'10017',
'10301',
'11368',
'11201',
'11365',
'11422',
'10452',
'11377',
'10029',
'10003',
'10075',
'11222',
'10128',
'11415',
'11204',
'10030',
'11432',
'10308',
'11102',
'10016',
'10463',
'11412',
'10011',
'11106',
'10457',
'11375',
'11356',
'11228',
'11369',
'10458',
'11411',
'11215',
'10309',
'11358',
'11355',
'11219',
'10031',
'10303',
'11232',
'10302',
'11238',
'10005',
'11429',
'11205',
'10023',
'11373',
'10033',
'10039',
'10028',
'11363',
'11354',
'11692',
'10455',
'11370',
'10035',
'10012',
'10024',
'10009',
'10034',
'11001',
'10466',
'11427',
'11004',
'10454',
'11414',
'11360',
'10461',
'10018',
'10006',
'11236',
'11428',
'10474',
'10471',
'10037',
'10475',
'11430',
'10119',
'10038',
'11426',
'11239',
'10151',
'10120',
'10112',
'10168',
'10004',
'10464',
'18773-9640',
'10282',
'11109',
'10280',
'07114',
'11040',
'10000',
'07090',
'10020',
'00083',
'10044',
'10069',
'10118',
'07102',
'11359',
'11005',
'10169',
'11697',
'10115',
'10801',
'10103',
'10271',
'10107',
'07078',
'56303',
'11386',
'10952',
'06811',
'10121',
'37214',
'10538',
'10279',
'11371',
'18017',
'07115',
'77036',
'10123',
'11030',
'92626-1902',
'07302',
'18773',
'12222',
'10710',
'07057',
'10162',
'11582',
'10281',
'HARRISBURG',
'11735',
'07305',
'N5X3A6',
'11746',
'23450',
'11580',
'10591',
'19034',
'11596',
'11779',
'07621',
'11021',
'11241',
'100000',
'11242',
'07086',
'10177',
'10105',
'10152',
'07093',
'10917',
'10110',
'10153',
'10178',
'11570',
'10601',
'10704',
'07424',
'10158',
'10172',
'10179',
'IDK',
'11801',
'11590',
'10155',
'11202',
'1801',
'11581',
'11758',
'10278',
'43017',
'10154',
'11553',
'11695',
'10041',
'11741',
'98335',
'14814',
'10111',
'12345',
'11572',
'11520',
'14614-195',
'10701',
'00000',
'17106',
'979113',
'11797',
'100',
'11710',
'33624',
'8682',
'11747',
'01757',
'11561',
'07304',
'000000',
'06851',
'94267',
'10167',
'11749',
'11756',
'10174',
'10550',
'89119',
'14068',
'11722',
'06460',
'32255',
'10173',
'10165',
'11946',
'29616-0759',
'07032',
'11575',
'11963',
'10106',
'11566',
'06870',
'07001',
'11251',
'NJ 07114',
'07666',
'11516',
'10170',
'43215-1441',
'08081',
'10803',
'11507',
'11701',
'11563',
'07047',
'03108',
'07003',
'07029',
'31093',
'11735-0230',
'89118',
'11803',
'11559',
'11565',
'07080',
'12601',
'10171',
'07208',
'11757',
'11042',
'1101',
'10096',
'07087',
'DID N',
'10956',
'10048',
'10122']
Find invalid ZIP codes#
Use a regular expression (regex) to find strings that match a pattern:
^\d{5}(?:-\d{4})?$
│ │ │ │ │└─ end of string
│ │ │ │ └─ optional
│ │ │ └─ capture group
│ │ └─ count
│ └─ numeric/digit character
└─ start of string
regex101 is useful for testing them.
# find valid ZIP codes
valid_zips = requests2["Incident Zip"].str.contains(r"^\d{5}(?:-\d{4})?$")
# filter the DataFrame to only invalid ZIP codes
invalid_zips = valid_zips == False
requests_with_invalid_zips = requests2[invalid_zips]
requests_with_invalid_zips["Incident Zip"]
55017 HARRISBURG
58100 N5X3A6
80798 100000
120304 IDK
123304 1801
173518 14614-195
192034 979113
201463 100
207158 8682
216745 000000
325071 NJ 07114
425985 1101
441166 DID N
Name: Incident Zip, dtype: string
Clear any invalid ZIP codes:
requests2.loc[invalid_zips, "Incident Zip"] = None
Additonal data cleaning tips:
Hard part is finding what needs to be done
Will be specific to your use case
Document what you did, since it will affect your results
In-class exercise#
Skipping to save time. While not graded, I encourage you to do it on your own, as it will help prepare you for the Final Project.
View the contents of the community_board
column in our 311 data#
requests["Community Board"].unique()
array(['15 BROOKLYN', '03 BROOKLYN', '14 QUEENS', '10 BRONX', '08 QUEENS',
'02 BRONX', '01 QUEENS', '11 QUEENS', '02 MANHATTAN',
'18 BROOKLYN', '12 QUEENS', '01 STATEN ISLAND', '12 BRONX',
'05 BROOKLYN', '01 BRONX', '09 QUEENS', '04 BROOKLYN',
'10 BROOKLYN', '02 STATEN ISLAND', '05 QUEENS', '04 MANHATTAN',
'11 BRONX', 'Unspecified BROOKLYN', '09 BRONX', '12 MANHATTAN',
'09 BROOKLYN', '14 BROOKLYN', '06 MANHATTAN', '10 MANHATTAN',
'Unspecified QUEENS', '01 MANHATTAN', '03 MANHATTAN', '05 BRONX',
'08 BROOKLYN', '02 QUEENS', '12 BROOKLYN', '01 BROOKLYN',
'16 BROOKLYN', '13 BROOKLYN', '06 QUEENS', '07 MANHATTAN',
'11 BROOKLYN', 'Unspecified BRONX', '08 MANHATTAN',
'03 STATEN ISLAND', '06 BROOKLYN', '03 BRONX', '05 MANHATTAN',
'07 QUEENS', '13 QUEENS', '17 BROOKLYN', '06 BRONX', '02 BROOKLYN',
'10 QUEENS', 'Unspecified MANHATTAN', '03 QUEENS', '04 BRONX',
'11 MANHATTAN', '08 BRONX', '07 BROOKLYN', '07 BRONX',
'0 Unspecified', '09 MANHATTAN', '04 QUEENS', '83 QUEENS',
'80 QUEENS', 'Unspecified STATEN ISLAND', '55 BROOKLYN',
'82 QUEENS', '64 MANHATTAN', '28 BRONX', '95 STATEN ISLAND',
'81 QUEENS', '27 BRONX', '26 BRONX', '56 BROOKLYN', '84 QUEENS'],
dtype=object)
Get the count of 311 requests per Community District#
cb_counts = requests.groupby("Community Board").size().reset_index(name="num_311_requests")
cb_counts = cb_counts.sort_values("num_311_requests", ascending=False)
cb_counts
Community Board | num_311_requests | |
---|---|---|
50 | 12 MANHATTAN | 14110 |
23 | 05 QUEENS | 12487 |
51 | 12 QUEENS | 12228 |
2 | 01 BROOKLYN | 11863 |
12 | 03 BROOKLYN | 11615 |
5 | 01 STATEN ISLAND | 11438 |
31 | 07 QUEENS | 11210 |
21 | 05 BROOKLYN | 10862 |
16 | 04 BRONX | 10628 |
4 | 01 QUEENS | 10410 |
58 | 17 BROOKLYN | 10208 |
54 | 14 BROOKLYN | 10179 |
28 | 07 BRONX | 9841 |
59 | 18 BROOKLYN | 9717 |
39 | 09 QUEENS | 9599 |
53 | 13 QUEENS | 9459 |
48 | 12 BRONX | 9412 |
45 | 11 BROOKLYN | 9309 |
20 | 05 BRONX | 9094 |
13 | 03 MANHATTAN | 8905 |
35 | 08 QUEENS | 8661 |
17 | 04 BROOKLYN | 8639 |
42 | 10 MANHATTAN | 8592 |
15 | 03 STATEN ISLAND | 8524 |
56 | 15 BROOKLYN | 8508 |
43 | 10 QUEENS | 8333 |
49 | 12 BROOKLYN | 8214 |
30 | 07 MANHATTAN | 8141 |
36 | 09 BRONX | 8092 |
41 | 10 BROOKLYN | 7808 |
33 | 08 BROOKLYN | 7797 |
7 | 02 BROOKLYN | 7747 |
37 | 09 BROOKLYN | 7571 |
25 | 06 BROOKLYN | 7373 |
10 | 02 STATEN ISLAND | 7059 |
0 | 0 Unspecified | 6882 |
14 | 03 QUEENS | 6799 |
38 | 09 MANHATTAN | 6792 |
73 | Unspecified BROOKLYN | 6771 |
18 | 04 MANHATTAN | 6765 |
22 | 05 MANHATTAN | 6599 |
34 | 08 MANHATTAN | 6579 |
8 | 02 MANHATTAN | 6514 |
44 | 11 BRONX | 6448 |
74 | Unspecified MANHATTAN | 6427 |
29 | 07 BROOKLYN | 6364 |
9 | 02 QUEENS | 6142 |
19 | 04 QUEENS | 5824 |
32 | 08 BRONX | 5753 |
47 | 11 QUEENS | 5748 |
24 | 06 BRONX | 5746 |
40 | 10 BRONX | 5719 |
46 | 11 MANHATTAN | 5714 |
72 | Unspecified BRONX | 5349 |
55 | 14 QUEENS | 5284 |
11 | 03 BRONX | 5262 |
26 | 06 MANHATTAN | 5158 |
57 | 16 BROOKLYN | 5100 |
1 | 01 BRONX | 4915 |
75 | Unspecified QUEENS | 4870 |
27 | 06 QUEENS | 4607 |
52 | 13 BROOKLYN | 3840 |
3 | 01 MANHATTAN | 3623 |
6 | 02 BRONX | 3470 |
76 | Unspecified STATEN ISLAND | 541 |
69 | 83 QUEENS | 193 |
65 | 64 MANHATTAN | 126 |
66 | 80 QUEENS | 102 |
68 | 82 QUEENS | 72 |
67 | 81 QUEENS | 54 |
63 | 55 BROOKLYN | 53 |
62 | 28 BRONX | 26 |
60 | 26 BRONX | 25 |
71 | 95 STATEN ISLAND | 21 |
70 | 84 QUEENS | 18 |
61 | 27 BRONX | 17 |
64 | 56 BROOKLYN | 13 |
Research Question: What may account for the variance in count of requests per community district?#
Hypothesis: Population size may help explain the variance.#
We can combine the counts per community district dataset with population data for each community district.
We’ll use pandas’ .merge()
, comparable to:
In general, called “record linkage” or “entity resolution”.
Let’s load the population dataset and check out its contents#
Data source for population by Community District
population = pd.read_csv("https://data.cityofnewyork.us/api/views/xi7c-iiu2/rows.csv")
population.head()
Borough | CD Number | CD Name | 1970 Population | 1980 Population | 1990 Population | 2000 Population | 2010 Population | |
---|---|---|---|---|---|---|---|---|
0 | Bronx | 1 | Melrose, Mott Haven, Port Morris | 138557 | 78441 | 77214 | 82159 | 91497 |
1 | Bronx | 2 | Hunts Point, Longwood | 99493 | 34399 | 39443 | 46824 | 52246 |
2 | Bronx | 3 | Morrisania, Crotona Park East | 150636 | 53635 | 57162 | 68574 | 79762 |
3 | Bronx | 4 | Highbridge, Concourse Village | 144207 | 114312 | 119962 | 139563 | 146441 |
4 | Bronx | 5 | University Hts., Fordham, Mt. Hope | 121807 | 107995 | 118435 | 128313 | 128200 |
In order to join the two dataframes, we need to create a common ID in each.#
BORO CODE
(a.k.a. BoroCode
, borocd
, and boro_cd
) is a commonly-used a unique ID for community districts. Let’s create functions that create that unique ID in our datasets.
BoroCD is a 3 digit integer that captures the borough and district number. The borough is represented by the first digit. The district number is padded with zeros so it’s always two digits long.
Boroughs are recoded into the following numbers:
1: Manhattan
2: Bronx
3: Brooklyn
4: Queens
5: Staten Island
Ex:
Manhattan 12 –> 112
Brooklyn 6 –> 306
First, let’s create a borocd
column in cb_counts
dataframe#
cb_counts.head()
Community Board | num_311_requests | |
---|---|---|
50 | 12 MANHATTAN | 14110 |
23 | 05 QUEENS | 12487 |
51 | 12 QUEENS | 12228 |
2 | 01 BROOKLYN | 11863 |
12 | 03 BROOKLYN | 11615 |
apply()
can be used for transforming data with a custom function. How does it work?
def my_function(row):
# do stuff
return some_value
new_values = dataframe.apply(my_function, axis=1)
While pandas generally operates on an entire column at once, apply()
is similar to working with CSVs in pure Python in that you are operating row by row.
Let’s create a function called recode_borocd_counts
that takes a row
and converts the Community Board
value into a borocd
value.
def recode_borocd_counts(row):
if "MANHATTAN" in row["Community Board"]:
return "1" + row["Community Board"][0:2]
# [0:2] provides the first 2 characters, i.e. characters at indexes 0 and 1.
# you could also use [:2] without the zero.
elif "BRONX" in row["Community Board"]:
return "2" + row["Community Board"][0:2]
elif "BROOKLYN" in row["Community Board"]:
return "3" + row["Community Board"][0:2]
elif "QUEENS" in row["Community Board"]:
return "4" + row["Community Board"][0:2]
elif "STATEN ISLAND" in row["Community Board"]:
return "5" + row["Community Board"][0:2]
else:
return "Invalid BoroCD"
Let’s test out that function in isolation. We’ll grab one of the rows and pass it into the function.
sample_row = cb_counts.iloc[0]
sample_row
Community Board 12 MANHATTAN
num_311_requests 14110
Name: 50, dtype: object
recode_borocd_counts(sample_row)
'112'
Now we use apply()
to do that across all the rows.
cb_counts["boro_cd"] = cb_counts.apply(recode_borocd_counts, axis=1)
apply()
(the way we’re using it) takes a function and runs it against each row of a DataFrame, returning the results as a Seriesaxis=1
specifies that you want to apply the function across the rows instead of columnscb_counts['borocd'] = …
creates a new column in the DataFrame calledborocd
cb_counts
Community Board | num_311_requests | boro_cd | |
---|---|---|---|
50 | 12 MANHATTAN | 14110 | 112 |
23 | 05 QUEENS | 12487 | 405 |
51 | 12 QUEENS | 12228 | 412 |
2 | 01 BROOKLYN | 11863 | 301 |
12 | 03 BROOKLYN | 11615 | 303 |
5 | 01 STATEN ISLAND | 11438 | 501 |
31 | 07 QUEENS | 11210 | 407 |
21 | 05 BROOKLYN | 10862 | 305 |
16 | 04 BRONX | 10628 | 204 |
4 | 01 QUEENS | 10410 | 401 |
58 | 17 BROOKLYN | 10208 | 317 |
54 | 14 BROOKLYN | 10179 | 314 |
28 | 07 BRONX | 9841 | 207 |
59 | 18 BROOKLYN | 9717 | 318 |
39 | 09 QUEENS | 9599 | 409 |
53 | 13 QUEENS | 9459 | 413 |
48 | 12 BRONX | 9412 | 212 |
45 | 11 BROOKLYN | 9309 | 311 |
20 | 05 BRONX | 9094 | 205 |
13 | 03 MANHATTAN | 8905 | 103 |
35 | 08 QUEENS | 8661 | 408 |
17 | 04 BROOKLYN | 8639 | 304 |
42 | 10 MANHATTAN | 8592 | 110 |
15 | 03 STATEN ISLAND | 8524 | 503 |
56 | 15 BROOKLYN | 8508 | 315 |
43 | 10 QUEENS | 8333 | 410 |
49 | 12 BROOKLYN | 8214 | 312 |
30 | 07 MANHATTAN | 8141 | 107 |
36 | 09 BRONX | 8092 | 209 |
41 | 10 BROOKLYN | 7808 | 310 |
33 | 08 BROOKLYN | 7797 | 308 |
7 | 02 BROOKLYN | 7747 | 302 |
37 | 09 BROOKLYN | 7571 | 309 |
25 | 06 BROOKLYN | 7373 | 306 |
10 | 02 STATEN ISLAND | 7059 | 502 |
0 | 0 Unspecified | 6882 | Invalid BoroCD |
14 | 03 QUEENS | 6799 | 403 |
38 | 09 MANHATTAN | 6792 | 109 |
73 | Unspecified BROOKLYN | 6771 | 3Un |
18 | 04 MANHATTAN | 6765 | 104 |
22 | 05 MANHATTAN | 6599 | 105 |
34 | 08 MANHATTAN | 6579 | 108 |
8 | 02 MANHATTAN | 6514 | 102 |
44 | 11 BRONX | 6448 | 211 |
74 | Unspecified MANHATTAN | 6427 | 1Un |
29 | 07 BROOKLYN | 6364 | 307 |
9 | 02 QUEENS | 6142 | 402 |
19 | 04 QUEENS | 5824 | 404 |
32 | 08 BRONX | 5753 | 208 |
47 | 11 QUEENS | 5748 | 411 |
24 | 06 BRONX | 5746 | 206 |
40 | 10 BRONX | 5719 | 210 |
46 | 11 MANHATTAN | 5714 | 111 |
72 | Unspecified BRONX | 5349 | 2Un |
55 | 14 QUEENS | 5284 | 414 |
11 | 03 BRONX | 5262 | 203 |
26 | 06 MANHATTAN | 5158 | 106 |
57 | 16 BROOKLYN | 5100 | 316 |
1 | 01 BRONX | 4915 | 201 |
75 | Unspecified QUEENS | 4870 | 4Un |
27 | 06 QUEENS | 4607 | 406 |
52 | 13 BROOKLYN | 3840 | 313 |
3 | 01 MANHATTAN | 3623 | 101 |
6 | 02 BRONX | 3470 | 202 |
76 | Unspecified STATEN ISLAND | 541 | 5Un |
69 | 83 QUEENS | 193 | 483 |
65 | 64 MANHATTAN | 126 | 164 |
66 | 80 QUEENS | 102 | 480 |
68 | 82 QUEENS | 72 | 482 |
67 | 81 QUEENS | 54 | 481 |
63 | 55 BROOKLYN | 53 | 355 |
62 | 28 BRONX | 26 | 228 |
60 | 26 BRONX | 25 | 226 |
71 | 95 STATEN ISLAND | 21 | 595 |
70 | 84 QUEENS | 18 | 484 |
61 | 27 BRONX | 17 | 227 |
64 | 56 BROOKLYN | 13 | 356 |
Uh oh, there are some unexpected Unspecified
values in here - how can we get around them?
Let’s only recode records that don’t start with “U”.
def recode_borocd_counts(row):
if "MANHATTAN" in row["Community Board"] and row["Community Board"][0] != "U":
return "1" + row["Community Board"][0:2]
elif "BRONX" in row["Community Board"] and row["Community Board"][0] != "U":
return "2" + row["Community Board"][0:2]
elif "BROOKLYN" in row["Community Board"] and row["Community Board"][0] != "U":
return "3" + row["Community Board"][0:2]
elif "QUEENS" in row["Community Board"] and row["Community Board"][0] != "U":
return "4" + row["Community Board"][0:2]
elif "STATEN ISLAND" in row["Community Board"] and row["Community Board"][0] != "U":
return "5" + row["Community Board"][0:2]
else:
return "Invalid BoroCD"
cb_counts["boro_cd"] = cb_counts.apply(recode_borocd_counts, axis=1)
cb_counts
Community Board | num_311_requests | boro_cd | |
---|---|---|---|
50 | 12 MANHATTAN | 14110 | 112 |
23 | 05 QUEENS | 12487 | 405 |
51 | 12 QUEENS | 12228 | 412 |
2 | 01 BROOKLYN | 11863 | 301 |
12 | 03 BROOKLYN | 11615 | 303 |
5 | 01 STATEN ISLAND | 11438 | 501 |
31 | 07 QUEENS | 11210 | 407 |
21 | 05 BROOKLYN | 10862 | 305 |
16 | 04 BRONX | 10628 | 204 |
4 | 01 QUEENS | 10410 | 401 |
58 | 17 BROOKLYN | 10208 | 317 |
54 | 14 BROOKLYN | 10179 | 314 |
28 | 07 BRONX | 9841 | 207 |
59 | 18 BROOKLYN | 9717 | 318 |
39 | 09 QUEENS | 9599 | 409 |
53 | 13 QUEENS | 9459 | 413 |
48 | 12 BRONX | 9412 | 212 |
45 | 11 BROOKLYN | 9309 | 311 |
20 | 05 BRONX | 9094 | 205 |
13 | 03 MANHATTAN | 8905 | 103 |
35 | 08 QUEENS | 8661 | 408 |
17 | 04 BROOKLYN | 8639 | 304 |
42 | 10 MANHATTAN | 8592 | 110 |
15 | 03 STATEN ISLAND | 8524 | 503 |
56 | 15 BROOKLYN | 8508 | 315 |
43 | 10 QUEENS | 8333 | 410 |
49 | 12 BROOKLYN | 8214 | 312 |
30 | 07 MANHATTAN | 8141 | 107 |
36 | 09 BRONX | 8092 | 209 |
41 | 10 BROOKLYN | 7808 | 310 |
33 | 08 BROOKLYN | 7797 | 308 |
7 | 02 BROOKLYN | 7747 | 302 |
37 | 09 BROOKLYN | 7571 | 309 |
25 | 06 BROOKLYN | 7373 | 306 |
10 | 02 STATEN ISLAND | 7059 | 502 |
0 | 0 Unspecified | 6882 | Invalid BoroCD |
14 | 03 QUEENS | 6799 | 403 |
38 | 09 MANHATTAN | 6792 | 109 |
73 | Unspecified BROOKLYN | 6771 | Invalid BoroCD |
18 | 04 MANHATTAN | 6765 | 104 |
22 | 05 MANHATTAN | 6599 | 105 |
34 | 08 MANHATTAN | 6579 | 108 |
8 | 02 MANHATTAN | 6514 | 102 |
44 | 11 BRONX | 6448 | 211 |
74 | Unspecified MANHATTAN | 6427 | Invalid BoroCD |
29 | 07 BROOKLYN | 6364 | 307 |
9 | 02 QUEENS | 6142 | 402 |
19 | 04 QUEENS | 5824 | 404 |
32 | 08 BRONX | 5753 | 208 |
47 | 11 QUEENS | 5748 | 411 |
24 | 06 BRONX | 5746 | 206 |
40 | 10 BRONX | 5719 | 210 |
46 | 11 MANHATTAN | 5714 | 111 |
72 | Unspecified BRONX | 5349 | Invalid BoroCD |
55 | 14 QUEENS | 5284 | 414 |
11 | 03 BRONX | 5262 | 203 |
26 | 06 MANHATTAN | 5158 | 106 |
57 | 16 BROOKLYN | 5100 | 316 |
1 | 01 BRONX | 4915 | 201 |
75 | Unspecified QUEENS | 4870 | Invalid BoroCD |
27 | 06 QUEENS | 4607 | 406 |
52 | 13 BROOKLYN | 3840 | 313 |
3 | 01 MANHATTAN | 3623 | 101 |
6 | 02 BRONX | 3470 | 202 |
76 | Unspecified STATEN ISLAND | 541 | Invalid BoroCD |
69 | 83 QUEENS | 193 | 483 |
65 | 64 MANHATTAN | 126 | 164 |
66 | 80 QUEENS | 102 | 480 |
68 | 82 QUEENS | 72 | 482 |
67 | 81 QUEENS | 54 | 481 |
63 | 55 BROOKLYN | 53 | 355 |
62 | 28 BRONX | 26 | 228 |
60 | 26 BRONX | 25 | 226 |
71 | 95 STATEN ISLAND | 21 | 595 |
70 | 84 QUEENS | 18 | 484 |
61 | 27 BRONX | 17 | 227 |
64 | 56 BROOKLYN | 13 | 356 |
We can make this function easier to read by isolating the logic that applies to all the conditions. This is called “refactoring”.
def recode_borocd_counts(row):
board = row["Community Board"]
# doing a check and then returning from a function early is known as a "guard clause"
if board.startswith("U"):
return "Invalid BoroCD"
num = board[0:2]
if "MANHATTAN" in board:
return "1" + num
elif "BRONX" in board:
return "2" + num
elif "BROOKLYN" in board:
return "3" + num
elif "QUEENS" in board:
return "4" + num
elif "STATEN ISLAND" in board:
return "5" + num
else:
return "Invalid BoroCD"
cb_counts["boro_cd"] = cb_counts.apply(recode_borocd_counts, axis=1)
cb_counts
Community Board | num_311_requests | boro_cd | |
---|---|---|---|
50 | 12 MANHATTAN | 14110 | 112 |
23 | 05 QUEENS | 12487 | 405 |
51 | 12 QUEENS | 12228 | 412 |
2 | 01 BROOKLYN | 11863 | 301 |
12 | 03 BROOKLYN | 11615 | 303 |
5 | 01 STATEN ISLAND | 11438 | 501 |
31 | 07 QUEENS | 11210 | 407 |
21 | 05 BROOKLYN | 10862 | 305 |
16 | 04 BRONX | 10628 | 204 |
4 | 01 QUEENS | 10410 | 401 |
58 | 17 BROOKLYN | 10208 | 317 |
54 | 14 BROOKLYN | 10179 | 314 |
28 | 07 BRONX | 9841 | 207 |
59 | 18 BROOKLYN | 9717 | 318 |
39 | 09 QUEENS | 9599 | 409 |
53 | 13 QUEENS | 9459 | 413 |
48 | 12 BRONX | 9412 | 212 |
45 | 11 BROOKLYN | 9309 | 311 |
20 | 05 BRONX | 9094 | 205 |
13 | 03 MANHATTAN | 8905 | 103 |
35 | 08 QUEENS | 8661 | 408 |
17 | 04 BROOKLYN | 8639 | 304 |
42 | 10 MANHATTAN | 8592 | 110 |
15 | 03 STATEN ISLAND | 8524 | 503 |
56 | 15 BROOKLYN | 8508 | 315 |
43 | 10 QUEENS | 8333 | 410 |
49 | 12 BROOKLYN | 8214 | 312 |
30 | 07 MANHATTAN | 8141 | 107 |
36 | 09 BRONX | 8092 | 209 |
41 | 10 BROOKLYN | 7808 | 310 |
33 | 08 BROOKLYN | 7797 | 308 |
7 | 02 BROOKLYN | 7747 | 302 |
37 | 09 BROOKLYN | 7571 | 309 |
25 | 06 BROOKLYN | 7373 | 306 |
10 | 02 STATEN ISLAND | 7059 | 502 |
0 | 0 Unspecified | 6882 | Invalid BoroCD |
14 | 03 QUEENS | 6799 | 403 |
38 | 09 MANHATTAN | 6792 | 109 |
73 | Unspecified BROOKLYN | 6771 | Invalid BoroCD |
18 | 04 MANHATTAN | 6765 | 104 |
22 | 05 MANHATTAN | 6599 | 105 |
34 | 08 MANHATTAN | 6579 | 108 |
8 | 02 MANHATTAN | 6514 | 102 |
44 | 11 BRONX | 6448 | 211 |
74 | Unspecified MANHATTAN | 6427 | Invalid BoroCD |
29 | 07 BROOKLYN | 6364 | 307 |
9 | 02 QUEENS | 6142 | 402 |
19 | 04 QUEENS | 5824 | 404 |
32 | 08 BRONX | 5753 | 208 |
47 | 11 QUEENS | 5748 | 411 |
24 | 06 BRONX | 5746 | 206 |
40 | 10 BRONX | 5719 | 210 |
46 | 11 MANHATTAN | 5714 | 111 |
72 | Unspecified BRONX | 5349 | Invalid BoroCD |
55 | 14 QUEENS | 5284 | 414 |
11 | 03 BRONX | 5262 | 203 |
26 | 06 MANHATTAN | 5158 | 106 |
57 | 16 BROOKLYN | 5100 | 316 |
1 | 01 BRONX | 4915 | 201 |
75 | Unspecified QUEENS | 4870 | Invalid BoroCD |
27 | 06 QUEENS | 4607 | 406 |
52 | 13 BROOKLYN | 3840 | 313 |
3 | 01 MANHATTAN | 3623 | 101 |
6 | 02 BRONX | 3470 | 202 |
76 | Unspecified STATEN ISLAND | 541 | Invalid BoroCD |
69 | 83 QUEENS | 193 | 483 |
65 | 64 MANHATTAN | 126 | 164 |
66 | 80 QUEENS | 102 | 480 |
68 | 82 QUEENS | 72 | 482 |
67 | 81 QUEENS | 54 | 481 |
63 | 55 BROOKLYN | 53 | 355 |
62 | 28 BRONX | 26 | 228 |
60 | 26 BRONX | 25 | 226 |
71 | 95 STATEN ISLAND | 21 | 595 |
70 | 84 QUEENS | 18 | 484 |
61 | 27 BRONX | 17 | 227 |
64 | 56 BROOKLYN | 13 | 356 |
Next, let’s create the borocd
column in the population dataset#
population.head()
Borough | CD Number | CD Name | 1970 Population | 1980 Population | 1990 Population | 2000 Population | 2010 Population | |
---|---|---|---|---|---|---|---|---|
0 | Bronx | 1 | Melrose, Mott Haven, Port Morris | 138557 | 78441 | 77214 | 82159 | 91497 |
1 | Bronx | 2 | Hunts Point, Longwood | 99493 | 34399 | 39443 | 46824 | 52246 |
2 | Bronx | 3 | Morrisania, Crotona Park East | 150636 | 53635 | 57162 | 68574 | 79762 |
3 | Bronx | 4 | Highbridge, Concourse Village | 144207 | 114312 | 119962 | 139563 | 146441 |
4 | Bronx | 5 | University Hts., Fordham, Mt. Hope | 121807 | 107995 | 118435 | 128313 | 128200 |
population.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 59 entries, 0 to 58
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Borough 59 non-null object
1 CD Number 59 non-null int64
2 CD Name 59 non-null object
3 1970 Population 59 non-null int64
4 1980 Population 59 non-null int64
5 1990 Population 59 non-null int64
6 2000 Population 59 non-null int64
7 2010 Population 59 non-null int64
dtypes: int64(6), object(2)
memory usage: 3.8+ KB
Create a function recode_borocd_pop
that combines and recodes the Borough and CD Number values to create a BoroCD unique ID.
def recode_borocd_pop(row):
if row.Borough == "Manhattan":
return str(100 + row["CD Number"])
elif row.Borough == "Bronx":
return str(200 + row["CD Number"])
elif row.Borough == "Brooklyn":
return str(300 + row["CD Number"])
elif row.Borough == "Queens":
return str(400 + row["CD Number"])
elif row.Borough == "Staten Island":
return str(500 + row["CD Number"])
else:
return "Invalid BoroCD"
This is different than recode_borocd_counts()
because:
The
Borough
andCD Number
are seprate columns in thepopulation
DataFrame, rather than combined in one like the 311 dataWe are working with the
CD Number
as an integer rather than a string
population["borocd"] = population.apply(recode_borocd_pop, axis=1)
population
Borough | CD Number | CD Name | 1970 Population | 1980 Population | 1990 Population | 2000 Population | 2010 Population | borocd | |
---|---|---|---|---|---|---|---|---|---|
0 | Bronx | 1 | Melrose, Mott Haven, Port Morris | 138557 | 78441 | 77214 | 82159 | 91497 | 201 |
1 | Bronx | 2 | Hunts Point, Longwood | 99493 | 34399 | 39443 | 46824 | 52246 | 202 |
2 | Bronx | 3 | Morrisania, Crotona Park East | 150636 | 53635 | 57162 | 68574 | 79762 | 203 |
3 | Bronx | 4 | Highbridge, Concourse Village | 144207 | 114312 | 119962 | 139563 | 146441 | 204 |
4 | Bronx | 5 | University Hts., Fordham, Mt. Hope | 121807 | 107995 | 118435 | 128313 | 128200 | 205 |
5 | Bronx | 6 | East Tremont, Belmont | 114137 | 65016 | 68061 | 75688 | 83268 | 206 |
6 | Bronx | 7 | Bedford Park, Norwood, Fordham | 113764 | 116827 | 128588 | 141411 | 139286 | 207 |
7 | Bronx | 8 | Riverdale, Kingsbridge, Marble Hill | 103543 | 98275 | 97030 | 101332 | 101731 | 208 |
8 | Bronx | 9 | Soundview, Parkchester | 166442 | 167627 | 155970 | 167859 | 172298 | 209 |
9 | Bronx | 10 | Throgs Nk., Co-op City, Pelham Bay | 84948 | 106516 | 108093 | 115948 | 120392 | 210 |
10 | Bronx | 11 | Pelham Pkwy, Morris Park, Laconia | 105980 | 99080 | 97842 | 110706 | 113232 | 211 |
11 | Bronx | 12 | Wakefield, Williamsbridge | 135010 | 128226 | 129620 | 149077 | 152344 | 212 |
12 | Brooklyn | 1 | Williamsburg, Greenpoint | 179390 | 142942 | 155972 | 160338 | 173083 | 301 |
13 | Brooklyn | 2 | Brooklyn Heights, Fort Greene | 110221 | 92732 | 94534 | 98620 | 99617 | 302 |
14 | Brooklyn | 3 | Bedford Stuyvesant | 203380 | 133379 | 138696 | 143867 | 152985 | 303 |
15 | Brooklyn | 4 | Bushwick | 137902 | 92497 | 102572 | 104358 | 112634 | 304 |
16 | Brooklyn | 5 | East New York, Starrett City | 170791 | 154931 | 161350 | 173198 | 182896 | 305 |
17 | Brooklyn | 6 | Park Slope, Carroll Gardens | 138933 | 110228 | 102724 | 104054 | 104709 | 306 |
18 | Brooklyn | 7 | Sunset Park, Windsor Terrace | 111607 | 98567 | 102553 | 120063 | 126230 | 307 |
19 | Brooklyn | 8 | Crown Heights North | 121821 | 88796 | 96400 | 96076 | 96317 | 308 |
20 | Brooklyn | 9 | Crown Heights South, Wingate | 101047 | 96669 | 110715 | 104014 | 98429 | 309 |
21 | Brooklyn | 10 | Bay Ridge, Dyker Heights | 129822 | 118187 | 110612 | 122542 | 124491 | 310 |
22 | Brooklyn | 11 | Bensonhurst, Bath Beach | 170119 | 155072 | 149994 | 172129 | 181981 | 311 |
23 | Brooklyn | 12 | Borough Park, Ocean Parkway | 166301 | 155899 | 160018 | 185046 | 191382 | 312 |
24 | Brooklyn | 13 | Coney Island, Brighton Beach | 97750 | 100030 | 102596 | 106120 | 104278 | 313 |
25 | Brooklyn | 14 | Flatbush, Midwood | 137041 | 143859 | 159825 | 168806 | 160664 | 314 |
26 | Brooklyn | 15 | Sheepshead Bay, Gerritsen Beach | 164815 | 149572 | 143477 | 160319 | 159650 | 315 |
27 | Brooklyn | 16 | Brownsville, Ocean Hill | 122589 | 73801 | 84923 | 85343 | 86468 | 316 |
28 | Brooklyn | 17 | East Flatbush, Rugby, Farragut | 149496 | 154596 | 161261 | 165753 | 155252 | 317 |
29 | Brooklyn | 18 | Canarsie, Flatlands | 188643 | 169092 | 162428 | 194653 | 193543 | 318 |
30 | Manhattan | 1 | Battery Park City, Tribeca | 7706 | 15918 | 25366 | 34420 | 60978 | 101 |
31 | Manhattan | 2 | Greenwich Village, Soho | 84337 | 87069 | 94105 | 93119 | 90016 | 102 |
32 | Manhattan | 3 | Lower East Side, Chinatown | 181845 | 154848 | 161617 | 164407 | 163277 | 103 |
33 | Manhattan | 4 | Chelsea, Clinton | 83601 | 82164 | 84431 | 87479 | 103245 | 104 |
34 | Manhattan | 5 | Midtown Business District | 31076 | 39544 | 43507 | 44028 | 51673 | 105 |
35 | Manhattan | 6 | Stuyvesant Town, Turtle Bay | 122465 | 127554 | 133748 | 136152 | 142745 | 106 |
36 | Manhattan | 7 | West Side, Upper West Side | 212422 | 206669 | 210993 | 207699 | 209084 | 107 |
37 | Manhattan | 8 | Upper East Side | 200851 | 204305 | 210880 | 217063 | 219920 | 108 |
38 | Manhattan | 9 | Manhattanville, Hamilton Heights | 113606 | 103038 | 106978 | 111724 | 110193 | 109 |
39 | Manhattan | 10 | Central Harlem | 159267 | 105641 | 99519 | 107109 | 115723 | 110 |
40 | Manhattan | 11 | East Harlem | 154662 | 114569 | 110508 | 117743 | 120511 | 111 |
41 | Manhattan | 12 | Washington Heights, Inwood | 180561 | 179941 | 198192 | 208414 | 190020 | 112 |
42 | Queens | 1 | Astoria, Long Island City | 185925 | 185198 | 188549 | 211220 | 191105 | 401 |
43 | Queens | 2 | Sunnyside, Woodside | 95073 | 88927 | 94845 | 109920 | 113200 | 402 |
44 | Queens | 3 | Jackson Heights, North Corona | 123635 | 122090 | 128924 | 169083 | 171576 | 403 |
45 | Queens | 4 | Elmhurst, South Corona | 108233 | 118430 | 137023 | 167005 | 172598 | 404 |
46 | Queens | 5 | Ridgewood, Glendale, Maspeth | 161022 | 150142 | 149126 | 165911 | 169190 | 405 |
47 | Queens | 6 | Forest Hills, Rego Park | 120429 | 112245 | 106996 | 115967 | 113257 | 406 |
48 | Queens | 7 | Flushing, Bay Terrace | 207589 | 204785 | 220508 | 242952 | 247354 | 407 |
49 | Queens | 8 | Fresh Meadows, Briarwood | 142468 | 125312 | 132101 | 146594 | 151107 | 408 |
50 | Queens | 9 | Woodhaven, Richmond Hill | 110367 | 109505 | 112151 | 141608 | 143317 | 409 |
51 | Queens | 10 | Ozone Park, Howard Beach | 113857 | 105651 | 107768 | 127274 | 122396 | 410 |
52 | Queens | 11 | Bayside, Douglaston, Little Neck | 127883 | 110963 | 108056 | 116404 | 116431 | 411 |
53 | Queens | 12 | Jamaica, St. Albans, Hollis | 206639 | 189383 | 201293 | 223602 | 225919 | 412 |
54 | Queens | 13 | Queens Village, Rosedale | 184647 | 173178 | 177535 | 196284 | 188593 | 413 |
55 | Queens | 14 | The Rockaways, Broad Channel | 98228 | 100592 | 100596 | 106686 | 114978 | 414 |
56 | Staten Island | 1 | Stapleton, Port Richmond | 135875 | 138489 | 137806 | 162609 | 175756 | 501 |
57 | Staten Island | 2 | New Springville, South Beach | 85985 | 105128 | 113944 | 127071 | 132003 | 502 |
58 | Staten Island | 3 | Tottenville, Woodrow, Great Kills | 72815 | 108249 | 126956 | 152908 | 160209 | 503 |