Lecture 1 demo solutions#
# a bit of setup just for the demo; ignore me
import numpy as np
np.random.seed(5)
Working with files/CSVs in pure Python#
Plain text#
# set up the file object
with open("moby_dick.txt") as file:
for line in file:
print(line)
CSVs#
import csv
# set up the file object
with open("people.csv") as csvfile:
# set up the reader
reader = csv.DictReader(csvfile)
# loop through the rows
for row in reader:
# access the data in various columns
first = row["first_name"]
last = row["last_name"]
print(f"{first} {last}")
pandas#
Load data#
import pandas as pd
requests = pd.read_csv(
"https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip"
)
/var/folders/j4/5js_l3j97rjclwsfww_grsb40000gn/T/ipykernel_75011/3437185872.py:3: DtypeWarning: Columns (8,20,31,34) have mixed types. Specify dtype option on import or set low_memory=False.
requests = pd.read_csv(
Preview the data#
requests
| Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | ... | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Latitude | Longitude | Location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39885889 | 08/01/2018 12:05:13 AM | 08/01/2018 12:05:13 AM | DOT | Department of Transportation | Street Condition | Pothole | NaN | 11235 | 3143 SHORE PARKWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.585156 | -73.959119 | (40.585155533520144, -73.95911915841708) |
| 1 | 39886470 | 08/01/2018 12:06:05 AM | 08/01/2018 12:06:05 AM | DOT | Department of Transportation | Street Condition | Pothole | NaN | 11235 | 3153 SHORE PARKWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.585218 | -73.958608 | (40.58521848090658, -73.95860788382927) |
| 2 | 39893543 | 08/01/2018 12:06:16 AM | 08/03/2018 02:03:55 PM | HPD | Department of Housing Preservation and Develop... | HEAT/HOT WATER | ENTIRE BUILDING | RESIDENTIAL BUILDING | 11221 | 729 LAFAYETTE AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.690733 | -73.943964 | (40.69073285353906, -73.943963521266) |
| 3 | 39886233 | 08/01/2018 12:06:29 AM | 08/01/2018 02:54:24 AM | NYPD | New York City Police Department | Noise - Residential | Banging/Pounding | Residential Building/House | 11693 | 82-01 BEACH CHANNEL DRIVE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.589931 | -73.808896 | (40.58993080750793, -73.80889570815852) |
| 4 | 39880309 | 08/01/2018 12:06:51 AM | 08/01/2018 04:54:26 AM | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 11216 | 64 HERKIMER STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.679716 | -73.951234 | (40.67971590505359, -73.95123396494363) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 499995 | 43622302 | 08/24/2019 01:46:09 AM | NaN | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10009.0 | 431 EAST 9 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.727536 | -73.983295 | (40.72753608835362, -73.98329522742081) |
| 499996 | 43619709 | 08/24/2019 01:49:49 AM | NaN | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10304.0 | 191 BROAD STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.624157 | -74.081006 | (40.62415703282506, -74.08100614362155) |
| 499997 | 43623124 | 08/24/2019 01:56:35 AM | NaN | NYPD | New York City Police Department | Noise - Street/Sidewalk | Loud Music/Party | Street/Sidewalk | 10031.0 | 534 WEST 153 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.830718 | -73.945006 | (40.83071800761314, -73.94500557250639) |
| 499998 | 43625595 | 08/24/2019 01:56:40 AM | NaN | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Club/Bar/Restaurant | 10452.0 | EAST 170 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.839882 | -73.916783 | (40.839882158779105, -73.91678321635897) |
| 499999 | 43622817 | 08/24/2019 01:57:58 AM | NaN | NYPD | New York City Police Department | Noise - Commercial | Loud Music/Party | Store/Commercial | 10033.0 | 247 AUDUBON AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.846376 | -73.934048 | (40.84637632367179, -73.93404825809533) |
500000 rows × 41 columns
.head() defaults to providing the first 5 if you don’t specify a number. Can also use .tail().
requests.head(10)
| Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | ... | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Latitude | Longitude | Location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39885889 | 08/01/2018 12:05:13 AM | 08/01/2018 12:05:13 AM | DOT | Department of Transportation | Street Condition | Pothole | NaN | 11235 | 3143 SHORE PARKWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.585156 | -73.959119 | (40.585155533520144, -73.95911915841708) |
| 1 | 39886470 | 08/01/2018 12:06:05 AM | 08/01/2018 12:06:05 AM | DOT | Department of Transportation | Street Condition | Pothole | NaN | 11235 | 3153 SHORE PARKWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.585218 | -73.958608 | (40.58521848090658, -73.95860788382927) |
| 2 | 39893543 | 08/01/2018 12:06:16 AM | 08/03/2018 02:03:55 PM | HPD | Department of Housing Preservation and Develop... | HEAT/HOT WATER | ENTIRE BUILDING | RESIDENTIAL BUILDING | 11221 | 729 LAFAYETTE AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.690733 | -73.943964 | (40.69073285353906, -73.943963521266) |
| 3 | 39886233 | 08/01/2018 12:06:29 AM | 08/01/2018 02:54:24 AM | NYPD | New York City Police Department | Noise - Residential | Banging/Pounding | Residential Building/House | 11693 | 82-01 BEACH CHANNEL DRIVE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.589931 | -73.808896 | (40.58993080750793, -73.80889570815852) |
| 4 | 39880309 | 08/01/2018 12:06:51 AM | 08/01/2018 04:54:26 AM | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 11216 | 64 HERKIMER STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.679716 | -73.951234 | (40.67971590505359, -73.95123396494363) |
| 5 | 39878305 | 08/01/2018 12:07:41 AM | 08/01/2018 12:07:41 AM | DOT | Department of Transportation | Street Condition | Pothole | NaN | 11235 | 3167 SHORE PARKWAY | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.585254 | -73.958302 | (40.585254052666755, -73.95830184003603) |
| 6 | 39878821 | 08/01/2018 12:10:04 AM | 08/01/2018 02:04:49 AM | NYPD | New York City Police Department | Illegal Parking | Commercial Overnight Parking | Street/Sidewalk | 10465 | 245 THROGGS NECK BOULEVARD | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.818533 | -73.813294 | (40.8185333998848, -73.81329367027502) |
| 7 | 39881328 | 08/01/2018 12:11:00 AM | 08/09/2018 12:00:00 AM | DSNY | Department of Sanitation | Request Large Bulky Item Collection | Request Large Bulky Item Collection | Sidewalk | 11367 | 147-46 77 ROAD | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.721380 | -73.816326 | (40.72137977337607, -73.8163256713087) |
| 8 | 39885268 | 08/01/2018 12:14:41 AM | 08/01/2018 03:03:31 AM | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10459 | 951 HOE AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.822088 | -73.890340 | (40.82208801817637, -73.89033972476959) |
| 9 | 39897573 | 08/01/2018 12:17:02 AM | 08/13/2018 01:05:32 PM | HPD | Department of Housing Preservation and Develop... | HEAT/HOT WATER | ENTIRE BUILDING | RESIDENTIAL BUILDING | 11101 | 33-01 38 AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.752911 | -73.930358 | (40.752910688469996, -73.93035822987532) |
10 rows × 41 columns
requests.sample(5)
| Unique Key | Created Date | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | ... | Vehicle Type | Taxi Company Borough | Taxi Pick Up Location | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Latitude | Longitude | Location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 311150 | 42044618 | 03/25/2019 10:16:38 PM | 03/25/2019 11:47:09 PM | NYPD | New York City Police Department | Noise - Residential | Loud Talking | Residential Building/House | 10453.0 | 1800 POPHAM AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.851389 | -73.917651 | (40.85138859491611, -73.91765079814148) |
| 32142 | 40107788 | 08/25/2018 10:24:40 AM | 08/25/2018 04:30:08 PM | NYPD | New York City Police Department | Blocked Driveway | No Access | Street/Sidewalk | 11419.0 | 133-05 107 AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.687721 | -73.811536 | (40.6877211401399, -73.8115361385404) |
| 120849 | 40692417 | 10/29/2018 07:23:28 AM | 11/01/2018 09:52:41 PM | HPD | Department of Housing Preservation and Develop... | HEAT/HOT WATER | APARTMENT ONLY | RESIDENTIAL BUILDING | 10040 | 660 FT WASHINGTON AVENUE | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.856431 | -73.936092 | (40.856431325229096, -73.93609157070478) |
| 471582 | 43386392 | 07/26/2019 11:17:21 PM | 07/27/2019 02:36:26 AM | NYPD | New York City Police Department | Noise - Residential | Loud Music/Party | Residential Building/House | 10303.0 | 351 HARBOR ROAD | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40.627510 | -74.160446 | (40.62751009515846, -74.16044618563123) |
| 455213 | 43253780 | 07/11/2019 03:23:28 PM | 07/12/2019 11:25:43 AM | NYPD | New York City Police Department | Illegal Parking | Double Parked Blocking Traffic | Street/Sidewalk | 11204.0 | 61 STREET | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 41 columns
DataFrame information#
requests.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500000 entries, 0 to 499999
Data columns (total 41 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unique Key 500000 non-null int64
1 Created Date 500000 non-null object
2 Closed Date 476156 non-null object
3 Agency 500000 non-null object
4 Agency Name 500000 non-null object
5 Complaint Type 500000 non-null object
6 Descriptor 492534 non-null object
7 Location Type 392590 non-null object
8 Incident Zip 480411 non-null object
9 Incident Address 434544 non-null object
10 Street Name 434519 non-null object
11 Cross Street 1 300838 non-null object
12 Cross Street 2 299635 non-null object
13 Intersection Street 1 107383 non-null object
14 Intersection Street 2 107048 non-null object
15 Address Type 451018 non-null object
16 City 476649 non-null object
17 Landmark 32521 non-null object
18 Facility Type 134923 non-null object
19 Status 500000 non-null object
20 Due Date 171559 non-null object
21 Resolution Description 457389 non-null object
22 Resolution Action Updated Date 488801 non-null object
23 Community Board 500000 non-null object
24 BBL 407355 non-null float64
25 Borough 500000 non-null object
26 X Coordinate (State Plane) 470830 non-null float64
27 Y Coordinate (State Plane) 470830 non-null float64
28 Open Data Channel Type 500000 non-null object
29 Park Facility Name 499973 non-null object
30 Park Borough 500000 non-null object
31 Vehicle Type 37 non-null object
32 Taxi Company Borough 403 non-null object
33 Taxi Pick Up Location 4475 non-null object
34 Bridge Highway Name 697 non-null object
35 Bridge Highway Direction 766 non-null object
36 Road Ramp 759 non-null object
37 Bridge Highway Segment 1027 non-null object
38 Latitude 470830 non-null float64
39 Longitude 470830 non-null float64
40 Location 470830 non-null object
dtypes: float64(5), int64(1), object(35)
memory usage: 156.4+ MB
Analysis#
Which complaints are most common?#
requests["Complaint Type"].value_counts().head()
Complaint Type
Noise - Residential 41311
HEAT/HOT WATER 39095
Illegal Parking 34297
Request Large Bulky Item Collection 30939
Blocked Driveway 25530
Name: count, dtype: int64
Equivalent to:
requests.groupby("Complaint Type").size().nlargest()
Complaint Type
Noise - Residential 41311
HEAT/HOT WATER 39095
Illegal Parking 34297
Request Large Bulky Item Collection 30939
Blocked Driveway 25530
dtype: int64
What’s the most frequent request per agency?#
requests.groupby(["Agency", "Complaint Type"]).size().to_frame(name="count")
| count | ||
|---|---|---|
| Agency | Complaint Type | |
| ACS | Forms | 56 |
| COIB | Forms | 1 |
| DCA | Consumer Complaint | 2892 |
| DCA / DOH New License Application Request | 186 | |
| DCAS | Comments | 13 |
| ... | ... | ... |
| TLC | Lost Property | 472 |
| Taxi Complaint | 2416 | |
| Taxi Compliment | 41 | |
| Taxi Licensee Complaint | 5 | |
| Taxi Report | 318 |
256 rows × 1 columns