Open-ended assignments

In Homework 1, Homework 4, and the Final Project, you will pick your own dataset(s).

Open data portals

There are countless places to get data, notably:

Inspiration

For starters, see the Final Project examples from past semesters.

It's probably not realistic to make visualizations as polished as these, which were made by professionals, but they may give you ideas. Some also include links to, or downloads of, the source data.

Storing data

To work with uploaded files in Google Colab, you have two options.

Direct upload

Fewer steps, but your file(s) will disappear when your session ends.

Steps to get data into Google Colab directly

  1. In the Google Colab sidebar, click the Files icon (A).

  2. Click the upload button (B).

  3. Select your file.

  4. In your code, read the file with read_csv("MY_FILENAME.csv"), replacing MY_FILENAME.csv with the name of your file.
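
As a minimal sketch of step 4, assuming read_csv here is the pandas function and that the file was uploaded to the default working directory, the snippet below reads the file and gives a clearer error if it is missing (for example, after the session has ended). The helper name load_uploaded_csv is hypothetical, and MY_FILENAME.csv is the placeholder from the steps above.

```python
from pathlib import Path

import pandas as pd


def load_uploaded_csv(filename: str) -> pd.DataFrame:
    """Read a CSV that was uploaded through the Colab Files sidebar.

    Hypothetical helper for illustration; not part of pandas or Colab.
    """
    path = Path(filename)
    if not path.exists():
        # Directly uploaded files disappear when the session ends,
        # so a missing file usually means it needs to be uploaded again.
        raise FileNotFoundError(
            f"{filename} not found; upload it again through the Files sidebar."
        )
    return pd.read_csv(path)


# Placeholder filename from the steps above; only read it if it is present.
if Path("MY_FILENAME.csv").exists():
    df = load_uploaded_csv("MY_FILENAME.csv")
    print(df.head())  # show the first few rows
```

Calling pd.read_csv("MY_FILENAME.csv") directly works just as well; the existence check only makes the "file disappeared" case easier to recognize.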

Google Drive

More steps, but your file(s) are preserved between sessions.

Steps to get data into Google Colab via Drive

  1. Upload the file(s) somewhere in Drive.

  2. In the Google Colab sidebar, click the Files icon (A).

  3. Click the Mount Drive icon (B).

    • You may need to run the code cell it inserts to authorize access (C).

    • Think of this as attaching your Drive to your Google Colab instance, as if you were plugging in a USB flash drive.

  4. Navigate to the file (D).

    • You may need to click into content, then drive.

  5. Next to the filename, click the three dots.

  6. Click Copy path (E).

    • The value should be something like /content/drive/My Drive/....

  7. Use this path with read_csv() (F).
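
The steps above can also be sketched in code. This assumes you are running inside Google Colab, where the google.colab package and pandas are available, and it uses a made-up path (MY_FOLDER/MY_FILENAME.csv is hypothetical). Use the path you copied in step 6 instead, since the exact folder names may differ.

```python
# google.colab only exists inside a Colab session, so guard the import.
try:
    from google.colab import drive
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

MOUNT_POINT = "/content/drive"

# Hypothetical path for illustration; paste the value from "Copy path" here.
csv_path = f"{MOUNT_POINT}/My Drive/MY_FOLDER/MY_FILENAME.csv"

if IN_COLAB:
    import pandas as pd

    # Attach your Drive to the Colab instance (may prompt for authorization).
    drive.mount(MOUNT_POINT)
    df = pd.read_csv(csv_path)
    print(df.head())
else:
    print("Not running in Colab; skipping the Drive mount.")
```

Mounting in code does the same thing as clicking the Mount Drive icon, so you only need one or the other.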

Google Colab cannot access files on your local machine, so the path shouldn’t start with C:\ or anything similar. More info about file paths.
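
If you are unsure whether a path points at your own computer, a quick check like the one below may help. It is only a sketch: the function name looks_like_local_windows_path is hypothetical, the example paths are made up, and it only detects Windows-style drive letters (not, say, paths on a Mac).

```python
from pathlib import PureWindowsPath


def looks_like_local_windows_path(path: str) -> bool:
    """Return True if the path starts with a Windows drive such as C:."""
    # PureWindowsPath parses the string without touching the file system.
    return PureWindowsPath(path).drive != ""


# A path copied from Windows Explorer (hypothetical example): won't work in Colab.
print(looks_like_local_windows_path(r"C:\Users\me\data.csv"))  # True

# A path in the mounted Drive (hypothetical example): no drive letter.
print(looks_like_local_windows_path("/content/drive/My Drive/data.csv"))  # False
```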

Reducing data size

You can make data smaller before uploading by filtering it through: