Homework 1

General assignment information.

Tutorials

Coding

You’ll complete this assignment using pandas. Steps:

  1. Find a dataset.

    • It must have:

      • At least one numeric column

      • Between one thousand and one million rows

    • Don’t spend too long on this step.

  2. If there’s more than one numeric column, pick one.

  3. Create a new notebook.

  4. Read in the data.

  5. Compute:

    • The mean

    • The median

    • The mode

  6. Do a groupby() with an aggregation.
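The steps above could look roughly like the following in a notebook cell. This is a minimal sketch, not a required solution. The inline CSV text, the column names category and amount, and the helper meets_requirements are hypothetical stand-ins. The helper also assumes the row limits are inclusive. With a real dataset you would call pd.read_csv on your own file instead of the inline text.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a real dataset. With your own
# data you would use something like pd.read_csv("your_file.csv").
CSV_TEXT = """category,amount
a,2.5
a,10.0
b,2.5
b,4.0
c,12.0
c,2.5
"""


def meets_requirements(df: pd.DataFrame) -> bool:
    """Hypothetical helper: check for at least one numeric column and
    between 1,000 and 1,000,000 rows (assumed inclusive)."""
    numeric_columns = df.select_dtypes(include="number").columns
    return len(numeric_columns) > 0 and 1_000 <= len(df) <= 1_000_000


# Step 4: read in the data
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Step 1 check: this toy frame has only 6 rows, so the check fails here
print(meets_requirements(df))

# Step 2: the numeric column we picked
column = "amount"

# Step 5: summary statistics
mean = df[column].mean()
median = df[column].median()
# mode() returns a Series because several values can tie; take the first
mode = df[column].mode().iloc[0]
print(mean, median, mode)

# Step 6: group by a categorical column and aggregate the numeric one
mean_by_category = df.groupby("category")[column].mean()
print(mean_by_category)
```

Note that mode() returns a Series rather than a single number, because a column can have several equally common values.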

Now turn in the assignment.

Tutorials, continued

  1. Read The Joys (and Woes) of the Craft of Software Engineering

    • Note that not everything in it applies to data analysis.

  2. Filtering/indexing DataFrames

  3. Learn about functions

  4. Coding Style Guides: please skim these. I don’t expect you to understand and follow everything in them; the most important guidelines are indentation and keeping each statement on its own line.

  5. Guide to commenting your code

  6. Quartz Guide to Bad Data
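The sketch below pulls several of these topics together: filtering/indexing, a small function, and comments. It is not taken from the linked tutorials. The toy data and the helper rows_above are hypothetical. It filters with a boolean mask, selects by label with .loc, and selects by position with .iloc. It also keeps one statement per line with consistent indentation.

```python
import pandas as pd

# Hypothetical toy data for illustration only
df = pd.DataFrame(
    {
        "category": ["a", "b", "a", "c"],
        "amount": [10.0, 25.0, 5.0, 40.0],
    }
)


def rows_above(data: pd.DataFrame, column: str, threshold: float) -> pd.DataFrame:
    """Hypothetical helper: return only the rows where `column` exceeds `threshold`."""
    # Build a boolean mask (one True/False per row), then use it to filter
    mask = data[column] > threshold
    return data[mask]


big = rows_above(df, "amount", 20)

# .loc selects by label: rows matching a condition, and a single column
a_amounts = df.loc[df["category"] == "a", "amount"]

# .iloc selects by integer position: here, the first two rows
first_two = df.iloc[0:2]

print(big)
print(a_amounts)
print(first_two)
```

Filtering with a boolean mask returns a new DataFrame, so df itself is left unchanged.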

Optional

Participation

Reminder about the between-class participation requirement.