# Homework 1

[General assignment information.](assignments.md) Note that this isn't a template notebook, hence there's no 🚀 above. You will create a blank notebook for this one.

## Tutorials

- [10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html)
- [Indexing Basics](https://pandas.pydata.org/docs/user_guide/indexing.html#basics)
- [Group by: split-apply-combine](https://pandas.pydata.org/docs/user_guide/groupby.html)
   - Beginning up to "GroupBy object attributes"
   - "Aggregation" up to "The `aggregate()` method"
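
As a small preview of what the group-by tutorial covers, here is a minimal sketch of split-apply-combine. The `DataFrame` and its column names (`species`, `weight_kg`) are made up for illustration and don't come from any course dataset.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for this example.
df = pd.DataFrame(
    {
        "species": ["cat", "dog", "cat", "dog", "bird"],
        "weight_kg": [4.0, 20.0, 5.0, 30.0, 0.5],
    }
)

# Split the rows by species, apply mean() to each group's weights,
# and combine the results into a Series indexed by species.
mean_weight = df.groupby("species")["weight_kg"].mean()
print(mean_weight)
```

The result is a `Series` whose index holds the group labels, so `mean_weight["cat"]` looks up a single group's value.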

## Coding

You'll do the following in a notebook. Make it [read like a blog post](assignments/open_ended.md#read-like-a-blog-post). Pretend you're explaining to a peer who hasn't taken this class. You don't need to teach them to code, but they should be able to follow what's going on.

### Steps

1. [Find a dataset.](assignments/open_ended.md)
   - It must have at least one numeric column.
   - Don't spend too long on this step.
1. If there's more than one numeric column, pick one.
1. Create a new notebook.
1. Using pandas:
   1. Read in the data.
   1. Compute:
      - The mean
      - The median
      - The mode
   1. Do a `groupby()` with an [aggregation](https://pandas.pydata.org/docs/user_guide/groupby.html#aggregation).
1. Repeat the pandas steps above (reading the data, the three statistics, and the grouped aggregation) in pure Python, without pandas.
1. Write a conclusion covering both:
   - The takeaways of the analysis
   - A reflection on the process
1. Did you use an [external source](syllabus.md#academic-integrity), including generative AI? Please explain, or say that you didn't.
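
To give a rough idea of the shape of the pandas and pure Python steps, here is a minimal sketch on a tiny made-up dataset. The `city` and `rent` columns and the inline `CSV_TEXT` are hypothetical stand-ins; a real notebook would read a file instead. It also assumes the standard-library `csv` and `statistics` modules count as "pure Python"; writing the loops by hand is another reasonable reading.

```python
import csv
import io
import statistics
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for a dataset file, kept inline so the sketch runs as-is.
CSV_TEXT = """city,rent
Albany,1300
Albany,1500
Buffalo,900
Buffalo,900
Buffalo,1200
"""

# --- With pandas ---
# A real notebook would pass a file path to read_csv instead of a StringIO.
df = pd.read_csv(io.StringIO(CSV_TEXT))
pd_mean = df["rent"].mean()
pd_median = df["rent"].median()
pd_mode = df["rent"].mode()  # a Series, because several values can tie
pd_by_city = df.groupby("city")["rent"].mean()

# --- With pure Python (standard library only) ---
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
rents = [float(row["rent"]) for row in rows]  # csv yields strings, so convert
py_mean = statistics.mean(rents)
py_median = statistics.median(rents)
py_modes = statistics.multimode(rents)  # a list, for the same reason as above

# Group by hand: collect each city's rents, then average each list.
rents_by_city = defaultdict(list)
for row in rows:
    rents_by_city[row["city"]].append(float(row["rent"]))
py_by_city = {
    city: statistics.mean(values) for city, values in rents_by_city.items()
}

print(pd_mean, pd_median, pd_mode.tolist())
print(py_mean, py_median, py_modes)
print(py_by_city)
```

Checking that both approaches give the same numbers is a useful sanity check, and the difference in effort between them is one thing the conclusion could reflect on.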

Now [turn in the assignment](assignments.md).

## Tutorials, continued

1. Read [The Joys (and Woes) of the Craft of Software Engineering](https://cs.calvin.edu/courses/cs/262/kvlinden/references/brooksJoysAndWoes.html)
   - Note that not _everything_ in there applies to data analysis.
1. Filtering/indexing `DataFrame`s
   - [Filter specific rows from a `DataFrame`](https://pandas.pydata.org/pandas-docs/stable/getting_started/intro_tutorials/03_subset_data.html#how-do-i-filter-specific-rows-from-a-dataframe)
   - [Boolean indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing)
1. Learn about functions
   - [Video](https://www.youtube.com/watch?v=9Os0o3wzS_I&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7&index=8)
   - [Blog post](https://python.land/introduction-to-python/functions)
1. [Brackets in Python and pandas](brackets.ipynb)
1. Coding style guides. Please skim these; I don't expect you to understand or follow everything in them. The most important guidelines are consistent indentation and keeping each statement on its own line.
   - [The Hitchhiker’s Guide to Python](https://docs.python-guide.org/writing/style/)
   - [PEP 8](https://www.python.org/dev/peps/pep-0008/)
1. [Guide to commenting your code](https://realpython.com/python-comments-guide/)
1. [Quartz Guide to Bad Data](https://github.com/Quartz/bad-data-guide#readme)
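
Several of the readings above (boolean indexing, functions, brackets, and commenting) come together in a sketch like the following. The data, the column names, and the `older_than` helper are all hypothetical, invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; the names and ages are made up.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Alan", "Edsger"],
        "age": [36, 85, 41, 72],
    }
)


def older_than(frame, min_age):
    """Return only the rows whose age is greater than min_age."""
    # The comparison produces a boolean Series, one True/False per row.
    mask = frame["age"] > min_age
    # Putting that Series inside the square brackets keeps the True rows.
    return frame[mask]


seniors = older_than(df, 65)
print(seniors)
```

Filtering returns a new `DataFrame` and leaves `df` unchanged, so the same helper can be called again with a different threshold.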

### Optional

- [Learn about data dictionaries](https://analystanswers.com/what-is-a-data-dictionary-a-simple-thorough-overview/)
- Glance through pandas' [comparison with other tools](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/index.html) for any you are familiar with
- Selecting Subsets of Data in Pandas: [Part 1](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-6fcd0170be9c) and [Part 2](https://medium.com/dunder-data/selecting-subsets-of-data-in-pandas-39e811c81a0c)

## Participation

Reminder about the [between-class participation requirement](syllabus.md#participation).
