Lecture 1: Working with data
Please sign attendance sheet; close devices
How was the homework? 👍/👎
Questions?
Reminder about the between-class participation
Challenge
Complete the demos and exercise today with generative AI only.
Allowed: prompts, copy-pasting
Not allowed: Googling, editing
Spoiler: This is an example of what not to do 😉
I’ll be using Gemini, built into Colab; you can use a different tool if you prefer.
Working with CSVs in pure Python
We'll use DictReader from Python's built-in csv module: open the file, parse it as CSV, then operate on it row by row.
# our code here
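Here is a minimal sketch of that row-by-row approach, which tallies complaint types. The inline sample rows are hypothetical. The column names Agency and Complaint Type are assumed to match the real file's header. With a real file on disk, you would pass `open(path, newline="")` instead of the `io.StringIO` wrapper.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the real 311 CSV file.
SAMPLE_CSV = """Unique Key,Agency,Complaint Type
1,NYPD,Noise - Residential
2,HPD,HEAT/HOT WATER
3,NYPD,Noise - Residential
"""


def count_complaint_types(csv_file):
    """Tally complaint types, reading the CSV one row at a time."""
    counts = Counter()
    reader = csv.DictReader(csv_file)  # the header row supplies the dict keys
    for row in reader:  # each row is a dict of column name -> string value
        counts[row["Complaint Type"]] += 1
    return counts


counts = count_complaint_types(io.StringIO(SAMPLE_CSV))
print(counts.most_common())  # [('Noise - Residential', 2), ('HEAT/HOT WATER', 1)]
```

DictReader hands back every value as a string, so any numeric work needs explicit conversion. That bookkeeping is one reason a library like pandas is convenient.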
In-class exercise
311 requests
Who’s called 311 before?
311 data
Today’s goal
Which 311 complaints are most common?
Which agencies are responsible for handling them?
Pandas
A Python package (bundled-up code that you can reuse)
Very common for data science in Python
Both organize around “data frames”
Load data
Pull data from:
https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip
We’re using a sample to make it easier/faster to work with. This will take a while (~30 seconds).
# our code here
If you see a DtypeWarning, ignore it for now. We’ll come back to it.
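Here is one possible way to do the load, as a sketch. pandas' read_csv accepts a URL directly. With the default compression="infer", it detects zip compression from a path ending in .zip. The helper name load_requests is hypothetical, and so is the tiny inline CSV used to check it without downloading anything.

```python
import io

import pandas as pd

DATA_URL = "https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip"


def load_requests(source=DATA_URL):
    """Read 311 request data into a DataFrame (hypothetical helper).

    With the default URL, pandas downloads the file and infers zip
    compression from the ".zip" extension.
    """
    return pd.read_csv(source)


# Offline check with a small hypothetical CSV instead of the real download.
sample = io.StringIO(
    "Unique Key,Agency,Complaint Type\n"
    "1,NYPD,Noise - Residential\n"
    "2,HPD,HEAT/HOT WATER\n"
)
df = load_requests(sample)
print(df.shape)  # (2, 3)
```

Calling load_requests() with no argument performs the real download.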
Preview the data
# our code here
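A short sketch of previewing with head() and tail(). The small DataFrame below is a hypothetical stand-in for the loaded 311 data. In a notebook, leaving df.head() as the last expression in a cell renders it as a table.

```python
import pandas as pd

# Hypothetical stand-in for the loaded 311 data.
df = pd.DataFrame(
    {
        "Agency": ["NYPD", "HPD", "NYPD", "DOT", "HPD", "NYPD"],
        "Complaint Type": [
            "Noise - Residential",
            "HEAT/HOT WATER",
            "Illegal Parking",
            "Street Condition",
            "HEAT/HOT WATER",
            "Noise - Residential",
        ],
    }
)

first_rows = df.head()  # first 5 rows by default
last_rows = df.tail(2)  # last 2 rows
print(first_rows)
print(last_rows)
```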
Pandas data structures
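The two core structures are the DataFrame, which is a table, and the Series, which is a single labeled column. This sketch uses hypothetical rows to show that selecting one column gives a Series, while selecting with a list of column names gives a DataFrame.

```python
import pandas as pd

# Hypothetical rows; a DataFrame is a table whose columns are Series.
df = pd.DataFrame(
    {
        "Agency": ["NYPD", "HPD"],
        "Complaint Type": ["Noise - Residential", "HEAT/HOT WATER"],
    }
)

column = df["Agency"]  # one column name -> Series
subset = df[["Agency"]]  # a list of column names -> DataFrame

print(type(column).__name__)  # Series
print(type(subset).__name__)  # DataFrame
print(column.index.equals(df.index))  # True: the Series shares the row index
```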
DataFrame information
# our code here
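A sketch of the usual ways to get an overview of a DataFrame: shape, dtypes, and info(). The values are hypothetical, and the missing zip code is there to show up in the non-null counts. In a notebook, a plain df.info() prints directly. The buf argument is used here only to capture that output as text.

```python
import io

import pandas as pd

# Hypothetical rows; one Incident Zip value is missing on purpose.
df = pd.DataFrame(
    {
        "Unique Key": [1, 2, 3],
        "Agency": ["NYPD", "HPD", "NYPD"],
        "Incident Zip": ["10001", None, "11201"],
    }
)

print(df.shape)  # (3, 3): rows, columns
print(df.dtypes)  # one dtype per column

buffer = io.StringIO()
df.info(buf=buffer)  # column names, non-null counts, dtypes, memory usage
info_text = buffer.getvalue()
print(info_text)
```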
Demo
Analysis
Which complaints are most common?
# code goes here
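One way to answer this is value_counts(), which counts each distinct value and sorts the result with the most common first. The complaint values below are hypothetical.

```python
import pandas as pd

# Hypothetical complaint values standing in for the 311 data.
df = pd.DataFrame(
    {
        "Complaint Type": [
            "Noise - Residential",
            "HEAT/HOT WATER",
            "Noise - Residential",
            "Illegal Parking",
            "HEAT/HOT WATER",
            "Noise - Residential",
        ]
    }
)

complaint_counts = df["Complaint Type"].value_counts()  # most common first
print(complaint_counts.head(10))  # top 10 (here, all 3 types)
```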
What’s the most frequent request per agency?
# code goes here
groupby() is similar to pivot tables in spreadsheets: it splits the rows into groups, then summarizes each group.
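Here is a sketch of the pivot-table analogy on hypothetical data. Grouping by agency and complaint type, then counting, gives the pivot-style table. Grouping by agency alone, then picking the top complaint type in each group, answers the question. If two complaint types tie within an agency, this picks one of them without any particular rule.

```python
import pandas as pd

# Hypothetical requests for two agencies.
df = pd.DataFrame(
    {
        "Agency": ["NYPD", "NYPD", "NYPD", "HPD", "HPD", "HPD"],
        "Complaint Type": [
            "Noise - Residential",
            "Illegal Parking",
            "Noise - Residential",
            "HEAT/HOT WATER",
            "PLUMBING",
            "HEAT/HOT WATER",
        ],
    }
)

# Count per (agency, complaint type) pair, then lay it out like a pivot table.
counts_by_agency = df.groupby(["Agency", "Complaint Type"]).size()
pivot = counts_by_agency.unstack(fill_value=0)  # agencies x complaint types

# For each agency, keep the complaint type with the highest count.
top_per_agency = df.groupby("Agency")["Complaint Type"].agg(
    lambda types: types.value_counts().idxmax()
)
print(top_per_agency)
```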
Exclude bad records from the DataFrame
Let’s look at the complaint types.
# code goes here
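A sketch of two quick inspection moves. The data is hypothetical and assumes the problem is inconsistent capitalization. Sorting the distinct values case-insensitively puts near-duplicates next to each other, and ascending counts surface the rarest values, which are often the odd ones.

```python
import pandas as pd

# Hypothetical values with an inconsistent capitalization mixed in.
df = pd.DataFrame(
    {
        "Complaint Type": [
            "Noise - Residential",
            "NOISE - RESIDENTIAL",
            "Noise - Residential",
            "HEAT/HOT WATER",
            "Illegal Parking",
        ]
    }
)

# Case-insensitive sort of the distinct values groups near-duplicates together.
distinct_types = sorted(df["Complaint Type"].unique(), key=str.lower)
print(distinct_types)

# Rarest values first.
rare_first = df["Complaint Type"].value_counts(ascending=True)
print(rare_first)
```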
How should we go about cleaning those up?
# code goes here
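One possible cleanup rule, sketched on hypothetical data, is to drop rows whose complaint type appears fewer than some minimum number of times. It builds a boolean mask with isin() and keeps only the rows where the mask is True. The MIN_COUNT cutoff is a hypothetical value; a real one should come from inspecting the data.

```python
import pandas as pd

# Hypothetical requests; the all-caps row is the odd one out.
df = pd.DataFrame(
    {
        "Agency": ["NYPD", "NYPD", "HPD", "HPD", "NYPD"],
        "Complaint Type": [
            "Noise - Residential",
            "Noise - Residential",
            "HEAT/HOT WATER",
            "HEAT/HOT WATER",
            "NOISE - RESIDENTIAL",
        ],
    }
)

MIN_COUNT = 2  # hypothetical cutoff

counts = df["Complaint Type"].value_counts()
common_types = counts[counts >= MIN_COUNT].index

# Boolean mask: True for rows whose complaint type meets the cutoff.
keep = df["Complaint Type"].isin(common_types)
cleaned = df[keep]
print(f"Dropped {len(df) - len(cleaned)} of {len(df)} rows")  # Dropped 1 of 5 rows
```

If the odd values are capitalization variants rather than junk, normalizing them, for example with Series.str.upper(), would keep those rows instead of discarding them.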
Reflections?
What worked well?
What didn’t work well?
Did this change how you’re thinking about generative AI?