Open-ended assignments

In Homework 1 and the Final Project, you will pick your own dataset(s).

  • Use at least one dataset that you aren’t familiar with.

    • Using data from a primary source is preferred.

  • Finding a dataset available in CSV or JSON is recommended, though pandas can read other formats.
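As a rough illustration of why CSV and JSON are convenient, here is a minimal sketch of loading each with pandas. The sample data is made up and held in memory so the example runs on its own; with a real dataset you would pass a file path instead.

```python
import io

import pandas as pd

# Made-up sample data standing in for a downloaded file.
csv_text = "name,count\nA,1\nB,2\n"
json_text = '[{"name": "A", "count": 1}, {"name": "B", "count": 2}]'

# read_csv and read_json both accept a file path or a file-like object.
csv_df = pd.read_csv(io.StringIO(csv_text))
json_df = pd.read_json(io.StringIO(json_text))

print(csv_df)
print(json_df)
```

Both calls return a DataFrame, so the rest of your analysis code generally doesn't need to care which format the data started in.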

Open data portals

There are countless places to get data, notably:

Inspiration

For starters, see the Final Project examples from past semesters.

It probably isn't realistic to make visualizations as polished as these, which were made by professionals, but they may give you ideas. Some also include links to or downloads of the source data.

Storing data

  1. Open the JupyterHub file browser.

  2. Navigate to the folder your notebook is in.

  3. Upload the data.

  4. From Python, read the file with pandas: pd.read_csv("./<filename>.csv").

Note that the file path should be relative to the notebook within JupyterHub; ./ means “in the same directory”. JupyterHub cannot access files on your local machine, so the path shouldn’t start with C:\ or anything like that. More info about file paths.
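If a read fails, it can help to check what a relative path actually resolves to. The following is a minimal sketch; example.csv is a hypothetical filename, and it assumes the notebook's working directory is the folder the notebook lives in, which is the usual Jupyter behavior.

```python
from pathlib import Path

import pandas as pd

# Hypothetical filename; replace with the file you uploaded.
data_path = Path("./example.csv")

# A relative path is interpreted against the current working directory.
print("Working directory:", Path.cwd())
print("Resolves to:", data_path.resolve())

if data_path.exists():
    df = pd.read_csv(data_path)
    print(df.head())
else:
    print("File not found; check that it was uploaded next to the notebook.")
```

If the resolved path isn't where you uploaded the file, move the file or adjust the relative path rather than switching to a path on your own computer.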

Limits

JupyterHub has a disk storage limit of 1GB (a.k.a. 1,024 MB or 1,048,576 KB) across all your files, and a memory limit of 3GB.
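To get a feel for how close a dataset is to these limits, you can compare its in-memory size against them. This is a minimal sketch with made-up data; the constants simply restate the limits above, and size_in_mb is a hypothetical helper, not part of pandas.

```python
import io

import pandas as pd

# Limits as stated above, expressed in MB (1 GB = 1,024 MB).
DISK_LIMIT_MB = 1 * 1024
MEMORY_LIMIT_MB = 3 * 1024


def size_in_mb(n_bytes):
    """Convert a byte count to megabytes (hypothetical helper)."""
    return n_bytes / 1024**2


# Made-up sample data; with a real file you would pass its path to read_csv.
df = pd.read_csv(io.StringIO("name,count\nA,1\nB,2\n"))

# deep=True also counts the contents of string columns.
mem_bytes = df.memory_usage(deep=True).sum()
print(f"In memory: {size_in_mb(mem_bytes):.4f} MB of {MEMORY_LIMIT_MB} MB")
```

For the size on disk, Path("<filename>.csv").stat().st_size gives the byte count of an uploaded file, which can be passed through the same helper and compared to DISK_LIMIT_MB. A DataFrame often takes more memory than the file takes on disk, so a file under the disk limit can still be too large to load.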

Reducing data size

You can make data smaller before uploading by filtering it through: