Homework 3#
General assignment information
Coding#
We are going to look at the population count of different community districts over time.
Step 0#
Read the data from the New York City Population By Community Districts data set into a DataFrame called pop_by_cd. To get the URL:
Visit the page linked above.
Click Export.
Right-click CSV.
Click Copy Link Address (or Location, depending on your browser).
# your code here
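As a rough illustration of the loading mechanics only, the sketch below reads a small, made-up CSV from an in-memory string. The name toy_pop_by_cd and the rows are hypothetical, and the year column names are assumed from the " Population" suffix that Step 1 strips off. pd.read_csv also accepts a URL string in place of the file-like object, which is where the copied link address would go.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV; these rows are NOT real data.
csv_text = """Borough,CD Number,CD Name,2000 Population,2010 Population
Manhattan,1,Example District A,100,110
Bronx,1,Example District B,200,190
"""

# read_csv accepts a file-like object here; a URL string works the same way.
toy_pop_by_cd = pd.read_csv(io.StringIO(csv_text))
print(toy_pop_by_cd.head())
```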
Step 1#
Prepare the data. Use the following code to reshape the DataFrame to have one row per community district per Census year.
# turn the population columns into rows
populations = pd.melt(
    pop_by_cd,
    id_vars=["Borough", "CD Number", "CD Name"],
    var_name="year",
    value_name="population",
)
# turn the years into numbers
populations.year = populations.year.str.replace(" Population", "").astype(int)
populations
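To see what this reshape does, here is a minimal sketch on a two-row toy table. The data, the district names, and the variable names toy and toy_long are all hypothetical; only the identifier columns and the " Population" suffix come from the code above.

```python
import pandas as pd

# Hypothetical wide-format data: one row per district, one column per year.
toy = pd.DataFrame(
    {
        "Borough": ["Manhattan", "Bronx"],
        "CD Number": [1, 1],
        "CD Name": ["Example District A", "Example District B"],
        "2000 Population": [100, 200],
        "2010 Population": [110, 190],
    }
)

# Melt the year columns into rows: one row per district per year.
toy_long = pd.melt(
    toy,
    id_vars=["Borough", "CD Number", "CD Name"],
    var_name="year",
    value_name="population",
)

# Strip the " Population" suffix so the year becomes an integer.
toy_long["year"] = toy_long["year"].str.replace(" Population", "").astype(int)
print(toy_long)
```

Two districts with two year columns become four rows, each carrying the district's identifier columns plus a year and a population value.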
Step 2#
Create a line chart of the population over time for each community district in Manhattan. There should be one line for each.
# your code here
Tutorials#
Go through the first third of Time Series Analysis with Pandas, up until the “Visualizing time series data” section.
Read the Data Design Standards.
Optional#
Watch this talk on audification/sonification. We won’t be doing this in class, but hopefully it will provide some inspiration about the different ways that data can be represented.
Read about other tools and techniques for visualization in Python.
Final Project Proposal#
Please submit your proposal.
Participation#
Reminder about the between-class participation requirement.