Homework 3

General assignment information

Coding

We are going to look at the population count of different community districts over time.

Step 0

Read the data from the New York City Population By Community Districts data set into a DataFrame called pop_by_cd. To get the URL:

  1. Visit the page linked above.

  2. Click Export.

  3. Right-click CSV.

  4. Click Copy Link Address (or Location, depending on your browser).

# your code here
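As a rough illustration of the loading step, the sketch below reads a tiny, made-up CSV with `pd.read_csv`. The rows, district names, and the `sample` variable are assumptions for demonstration only; the column names follow the ones used in Step 1. With the copied link, the same function can be given the URL string directly.

```python
import io

import pandas as pd

# Made-up stand-in for the CSV export (hypothetical rows, not real figures).
csv_text = """Borough,CD Number,CD Name,2000 Population,2010 Population
Manhattan,1,Example District A,100,120
Bronx,1,Example District B,200,210
"""

# pd.read_csv accepts a URL string or any file-like object,
# so the copied link could be passed in place of the StringIO buffer.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(list(sample.columns))
```

For the assignment itself, the result should be stored in `pop_by_cd`, since Step 1 expects that name.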

Step 1

Prepare the data. Use the following code to reshape the DataFrame to have one row per community district per Census year.

# turn the population columns into rows
populations = pd.melt(
    pop_by_cd,
    id_vars=["Borough", "CD Number", "CD Name"],
    var_name="year",
    value_name="population",
)

# turn the years into numbers, e.g. "2010 Population" -> 2010
populations["year"] = (
    populations["year"].str.replace(" Population", "", regex=False).astype(int)
)

populations
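To see what the reshaping does, here is a small sketch that applies the same `pd.melt` call to a made-up two-row table. The values and the `toy` and `toy_long` names are assumptions for illustration; only the column names come from the code above.

```python
import pandas as pd

# Made-up wide table: one row per district, one column per Census year.
toy = pd.DataFrame(
    {
        "Borough": ["Manhattan", "Bronx"],
        "CD Number": [1, 1],
        "CD Name": ["Example District A", "Example District B"],
        "2000 Population": [100, 200],
        "2010 Population": [120, 210],
    }
)

# Every column not listed in id_vars becomes a (year, population) row.
toy_long = pd.melt(
    toy,
    id_vars=["Borough", "CD Number", "CD Name"],
    var_name="year",
    value_name="population",
)

# Strip the " Population" suffix so the year can be stored as an integer.
toy_long["year"] = (
    toy_long["year"].str.replace(" Population", "", regex=False).astype(int)
)

print(toy_long)
```

Two districts and two year columns produce four rows, one per district per year.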

Step 2

Create a line chart of the population over time for each community district in Manhattan, with one line per district.

# your code here
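One possible way to get one line per district, sketched on made-up data, is to filter to a single borough and pivot so that each district becomes its own column. This is only an assumption about approach, not the required solution; the `toy`, `manhattan`, and `wide` names and all values are hypothetical.

```python
import pandas as pd

# Made-up long-format rows, shaped like the output of Step 1.
toy = pd.DataFrame(
    {
        "Borough": ["Manhattan"] * 4 + ["Bronx"] * 2,
        "CD Name": [
            "District A", "District A",
            "District B", "District B",
            "District C", "District C",
        ],
        "year": [2000, 2010, 2000, 2010, 2000, 2010],
        "population": [100, 120, 150, 140, 200, 210],
    }
)

# Keep only the Manhattan rows.
manhattan = toy[toy["Borough"] == "Manhattan"]

# One row per year, one column per district.
wide = manhattan.pivot(index="year", columns="CD Name", values="population")

print(wide)
# Calling wide.plot() would then draw one line per column (requires matplotlib).
```

A plotting library that accepts long-format data and a grouping column could skip the pivot and work from the filtered rows directly.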

Tutorials

  1. Go through the first third of Time Series Analysis with Pandas, up until the “Visualizing time series data” section.

  2. Read how to handle time series data in pandas.

  3. Read the Data Design Standards.

Optional

Final Project Proposal

Please submit your proposal.

Participation

Reminder about the between-class participation requirement.