Homework 2

General assignment information


The goal is to practice joining datasets with pandas. Hint: the instructions are intentionally incomplete.

Step 1

Find an NYC dataset with a borough column.

  • Use Scout to filter by column name.

  • Don’t spend too long on this step.

  • Keep the dataset small (under 500,000-ish rows) to make it easier to work with.

What’s the URL of your dataset?


Step 2

Load it into Jupyter.
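If the dataset is available as a CSV, one way to load it is pd.read_csv, which accepts a local path, a URL, or a file-like object. The sketch below uses a tiny made-up CSV held in memory so it runs without a download; the column names and values are hypothetical, not taken from any real dataset.

```python
import io

import pandas as pd

# Made-up stand-in for a downloaded file. In practice, pass the CSV export
# URL or a local file path to pd.read_csv instead of this StringIO object.
csv_text = """complaint_id,borough
1,BROOKLYN
2,QUEENS
3,BROOKLYN
"""

complaints = pd.read_csv(io.StringIO(csv_text))

# Check the size and the first rows before doing anything else.
print(complaints.shape)
print(complaints.head())
```

Checking the shape right after loading is a quick way to confirm the dataset stays under the suggested row limit.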

# your code here

Step 3

Open the Population by Borough dataset and load it into Jupyter.

# your code here

Step 4

Use merge() to combine the two, and output the resulting table.
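A merge only matches rows whose key values are exactly equal, so borough names that differ in case or surrounding whitespace may silently fail to match. The sketch below shows one possible way to handle that, using two small made-up tables; the column names (borough, Borough, population) and all values are assumptions for illustration, so check what your own datasets actually contain.

```python
import pandas as pd

# Made-up example tables; real datasets will have different columns and values.
complaints = pd.DataFrame(
    {"complaint_id": [1, 2, 3], "borough": ["BROOKLYN", "QUEENS", "BROOKLYN"]}
)
population = pd.DataFrame(
    {"Borough": [" Brooklyn", "Queens"], "population": [1000, 2000]}
)

# Build a normalized key in both tables so the values compare equal.
complaints["borough_key"] = complaints["borough"].str.strip().str.upper()
population["borough_key"] = population["Borough"].str.strip().str.upper()

# A left merge keeps every row of the first table, matched or not.
merged = complaints.merge(population, on="borough_key", how="left")
print(merged)
```

After merging, it is worth checking for missing values in the columns that came from the second table, since those indicate keys that did not match.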

# your code here


5 points

With the two datasets above, use pandas to produce an aggregate per-capita statistic by borough.

The dataset you chose before may not work for this. That’s fine; pick another.


You’re creating a “number of [thing] per capita by borough” table.

  1. Do a groupby() on the original dataset.

  2. Join with the populations by borough.

  3. Compute the per-capita values as a new column.
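The three steps above can be sketched as follows. This is a minimal illustration on made-up data, assuming one row per event and borough names that already match between the two tables; the names events, event_count, and events_per_capita are hypothetical.

```python
import pandas as pd

# Made-up data: one row per event, plus a made-up population table.
events = pd.DataFrame(
    {"borough": ["BRONX", "BRONX", "QUEENS", "QUEENS", "QUEENS"]}
)
population = pd.DataFrame(
    {"borough": ["BRONX", "QUEENS"], "population": [100, 300]}
)

# 1. Count events per borough.
counts = events.groupby("borough").size().reset_index(name="event_count")

# 2. Join the counts with the population table.
per_capita = counts.merge(population, on="borough", how="left")

# 3. Compute the per-capita value as a new column.
per_capita["events_per_capita"] = (
    per_capita["event_count"] / per_capita["population"]
)
print(per_capita)
```

Grouping before joining keeps the merge small: one row per borough instead of one per event.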

# your code here

Now turn in the assignment.




Reminder about the between-class participation requirement.