Lecture 1: Working with data


Please sign attendance sheet; close devices

How was the homework? 👍/👎
Questions?
Reminder about the between-class participation

Challenge

Complete the demos and exercise today with generative AI only.

Allowed
- Prompts
- Copy-pasting
Not allowed
- Googling
- Editing

Spoiler: This is an example of what not to do 😉

I’ll be using Gemini, built into Colab; you can use a different tool if you prefer.

Working with CSVs in pure Python

We will use the DictReader class from Python’s built-in csv module. We’ll open the file, parse it as CSV, then operate on it row by row.

# our code here
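Here is a minimal sketch of that approach. It counts how often each complaint type appears while reading one row at a time. The column names (`Unique Key`, `Complaint Type`, `Agency`) are assumed to match the 311 export. The rows are made up for illustration, and `count_values` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter


def count_values(csv_file, column):
    """Count how often each value of `column` appears, reading one row at a time."""
    counts = Counter()
    reader = csv.DictReader(csv_file)  # treats the first line as the header row
    for row in reader:  # each row is a dict mapping header name -> string value
        counts[row[column]] += 1
    return counts


# Made-up rows for illustration (not real 311 data); column names are assumed.
sample = io.StringIO(
    "Unique Key,Complaint Type,Agency\n"
    "1,Noise - Residential,NYPD\n"
    "2,HEAT/HOT WATER,HPD\n"
    "3,Noise - Residential,NYPD\n"
)

complaint_counts = count_values(sample, "Complaint Type")
print(complaint_counts.most_common())
# [('Noise - Residential', 2), ('HEAT/HOT WATER', 1)]
```

With a real file on disk, the csv module documentation recommends opening it with `newline=""`, for example `with open("requests.csv", newline="") as f: count_values(f, "Complaint Type")`. The filename is hypothetical, and the zipped sample would need to be extracted first.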

In-class exercise

311 requests

Who’s called 311 before?

NYC 311 homepage

311 data

Today’s goal

Which 311 complaints are most common?
Which agencies are responsible for handling them?

Pandas

A Python package (bundled-up code that you can reuse)
Very common for data science in Python
A lot like R
- Both organize around “data frames”
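To show what a data frame looks like, here is a tiny one built by hand. Like an R data frame, it is a table of named columns that share one row index. The column names and values are made up for illustration, and `pd` is the conventional alias for pandas.

```python
import pandas as pd

# A tiny, hand-made DataFrame (values are made up for illustration).
df = pd.DataFrame(
    {
        "Complaint Type": ["Noise - Residential", "HEAT/HOT WATER", "Illegal Parking"],
        "Agency": ["NYPD", "HPD", "NYPD"],
    }
)

print(df)
print(df.columns.tolist())  # ['Complaint Type', 'Agency']
print(df.shape)  # (3, 2): 3 rows, 2 columns
```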

Load data

Pull data from:

https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip

We’re using a sample to make the data easier and faster to work with. Loading it will still take a while (~30 seconds).

# our code here
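Here is a sketch of the loading step. `pd.read_csv()` infers zip compression from a `.zip` extension, assuming the archive holds a single CSV, so the URL above can be passed directly. To let the example run offline, it demonstrates on a tiny zipped CSV written to a temporary directory. The file name and rows are made up, and `load_requests` is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

import pandas as pd

DATA_URL = "https://storage.googleapis.com/python-public-policy2/data/311_requests_2018-19_sample.csv.zip"


def load_requests(source=DATA_URL):
    """Read a (possibly zipped) CSV from a URL or local path into a DataFrame."""
    # compression="infer" is the default, so a ".zip" suffix is unzipped automatically
    return pd.read_csv(source)


# Offline demo: build a tiny zipped CSV (made-up rows) and load it the same way.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "sample.csv.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(
            "sample.csv",
            "Complaint Type,Agency\nNoise - Residential,NYPD\nHEAT/HOT WATER,HPD\n",
        )
    demo = load_requests(zip_path)

print(demo)
```

In class, `df = load_requests()` would fetch the real sample from the URL. That is the slow (~30 second) step.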

If you see a DtypeWarning, ignore it for now. We’ll come back to it.

Preview the data

# our code here
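Previewing usually means looking at a few rows rather than printing everything. This sketch uses `head()`, `tail()` and `sample()` on a made-up stand-in for the 311 DataFrame.

```python
import pandas as pd

# Made-up stand-in for the 311 DataFrame.
df = pd.DataFrame({"Unique Key": range(1, 11), "Agency": ["NYPD", "HPD"] * 5})

first = df.head()  # first 5 rows by default
last = df.tail(3)  # last 3 rows
random_rows = df.sample(3, random_state=0)  # 3 random rows; random_state makes it repeatable

print(first)
print(last)
print(random_rows)
```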

Pandas data structures
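The two core pandas structures are the `DataFrame` (a table) and the `Series` (a single labeled column). Selecting one column with square brackets returns a `Series`, while selecting a list of columns returns a `DataFrame`. The values below are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Complaint Type": ["Noise - Residential", "HEAT/HOT WATER"],
        "Agency": ["NYPD", "HPD"],
    }
)

agency = df["Agency"]  # one column name -> Series
subset = df[["Agency"]]  # list of column names -> DataFrame with one column

print(type(agency).__name__)  # Series
print(type(subset).__name__)  # DataFrame
print(agency.index.tolist())  # [0, 1]: the Series keeps the DataFrame's row labels
```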

DataFrame information

# our code here
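This sketch shows a few ways to summarize a DataFrame's structure: `shape`, `dtypes` and `info()`. The `info()` output also reports non-null counts per column, which is a quick way to spot missing values. The data is made up, with `None` standing in for a missing value.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Unique Key": [1, 2, 3],
        "Complaint Type": ["Noise - Residential", None, "HEAT/HOT WATER"],
    }
)

print(df.shape)  # (rows, columns)
print(df.dtypes)  # the data type pandas chose for each column
df.info()  # prints column names, non-null counts, dtypes, and memory use

non_null = df.count()  # non-null values per column, as a Series
print(non_null)
```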

Demo

Analysis

Which complaints are most common?

# code goes here
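One way to answer this, sketched on made-up rows, is `value_counts()`. It counts each distinct value in a column and sorts from most to least common. With the real data, `df` would be the DataFrame loaded earlier in the lecture, and the column name is assumed to be `Complaint Type`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Complaint Type": [
            "Noise - Residential",
            "HEAT/HOT WATER",
            "Noise - Residential",
            "Illegal Parking",
            "Noise - Residential",
            "HEAT/HOT WATER",
        ]
    }
)

counts = df["Complaint Type"].value_counts()  # most common first
top_3 = counts.head(3)
print(top_3)
```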

What’s the most frequent request per agency?

# code goes here

groupby() works much like a pivot table in a spreadsheet
to_frame()
reset_index()
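Here is a sketch of one way to combine those three methods, not the only one:

- Grouping by both columns and calling `size()` counts each (agency, complaint type) pair, much like a pivot table's count.
- `to_frame()` turns that Series into a DataFrame.
- `reset_index()` moves the group labels back into ordinary columns.
- Sorting, then keeping the first row per agency, picks each agency's most frequent type.

The rows are made up, and ties (none here) would be broken arbitrarily.

```python
import pandas as pd

# Made-up rows; column names assume the 311 data's "Agency" and "Complaint Type".
df = pd.DataFrame(
    {
        "Agency": ["NYPD", "NYPD", "NYPD", "HPD", "HPD", "HPD", "DSNY"],
        "Complaint Type": [
            "Noise - Residential",
            "Noise - Residential",
            "Illegal Parking",
            "HEAT/HOT WATER",
            "HEAT/HOT WATER",
            "PLUMBING",
            "Dirty Conditions",
        ],
    }
)

# Count each (agency, complaint type) pair; the result is a Series with a two-level index.
pair_counts = df.groupby(["Agency", "Complaint Type"]).size()

# Series -> DataFrame, then move the index levels back into regular columns.
pair_counts = pair_counts.to_frame(name="count").reset_index()

# Highest count first within each agency, then keep the first row of each group.
top_per_agency = (
    pair_counts.sort_values(["Agency", "count"], ascending=[True, False])
    .groupby("Agency")
    .head(1)
    .reset_index(drop=True)
)
print(top_per_agency)
```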

Exclude bad records from the DataFrame

Let’s look at the complaint types.

# code goes here
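To inspect the distinct values, `unique()` lists them and `nunique()` counts them. Sorting the list and printing each value with `repr()` makes near-duplicates easier to spot, such as values that differ only in capitalization or trailing spaces. The values below, including the messy ones, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Complaint Type": [
            "Noise - Residential",
            "NOISE - RESIDENTIAL",
            "HEAT/HOT WATER",
            "Noise - Residential ",
            None,
        ]
    }
)

types = df["Complaint Type"].dropna().unique()  # distinct values, missing values removed
print(df["Complaint Type"].nunique())  # number of distinct non-missing values
for complaint_type in sorted(types):
    print(repr(complaint_type))  # repr() makes trailing spaces visible
```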

How should we go about cleaning those up?

# code goes here
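Here is one possible approach: build a boolean mask and index the DataFrame with it. It assumes that a "bad" record is either a missing complaint type or a value from a hand-picked exclusion list. Both are assumptions; the real criteria depend on what inspecting the data turns up. The rows and the `EXCLUDED_TYPES` list are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Agency": ["NYPD", "HPD", "NYPD", "DOT", "HPD"],
        "Complaint Type": [
            "Noise - Residential",
            "HEAT/HOT WATER",
            None,
            "Street Condition",
            "TEST ENTRY",
        ],
    }
)

# Hypothetical values to treat as bad records; replace with what the data actually shows.
EXCLUDED_TYPES = ["TEST ENTRY"]

complaint = df["Complaint Type"]
keep = complaint.notna() & ~complaint.isin(EXCLUDED_TYPES)  # True = row to keep

cleaned = df[keep]  # a new DataFrame; df itself is unchanged
print(cleaned)
```

Comparing `len(df)` with `len(cleaned)` shows how many rows were dropped.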

Reflections?

What worked well?
What didn’t work well?
Did this change how you’re thinking about generative AI?


Best practices

Homework 1