Class 0: Intro to coding#

NYU Wagner - Python Coding for Public Policy#

Aidan Feldman

Welcome!#

Elmo waving

Structure for today#

  1. Intros

  2. Going over course info like the syllabus, tools, etc.

  3. Intro to Python/code

About me#

  • Coding since 2005 🖥

  • Government since 2014 🦅

  • Also a modern dancer 💃, cyclist 🚲, and baker 🍞

  • Passionate about open source

Day jobs#

Past jobs include:

grader intro

Access these slides#

You can get there through Brightspace.

Wait list#

  • There’s a lot of churn in enrollment, so be patient.

  • To be fair to everyone, it’s first come first served.

  • How the wait list works

While you’re waiting#

Once you get off the wait list#

…or if you register for the class late:

Extensions#

  • We will grant extensions up to the following, whichever comes first:

    • Nine days after the original due date

    • One week after you were enrolled in the course

  • If we accidentally mark you as late, let the grader know and we’ll get it corrected.

  • The late submission deadline will not be extended.

    • In other words: If you join the class more than a week after it starts, you can’t turn in Homework 0 late.

This is a short class, and these rules are in place to:

  • Ensure late-joiners get caught up quickly

  • Allow homework solutions to be shared sooner rather than later (so that students can learn from them)

Introductions#

Share the following:

  • Name (what you go by)

  • Pronouns

  • What you’re studying

  • Fun fact

Who are you (as a whole)#

Survey

[This study] found that the modern language aptitude test was the strongest predictor of how quickly people would learn to code in Python … language aptitude explained 43 percent, fluid reasoning explained 12.8 percent, … and numeracy just 6 percent [of the variance].

https://www.psychologytoday.com/us/blog/brain-waves/202003/learning-code-requires-language-skills-not-math

In other words: Being good at learning foreign languages is the best known predictor of learning to code quickly, more so than being good at math.

Accommodations#

  • Childcare, mental health issues, etc.

  • Reach out via email

Class structure#

Class materials walkthrough#

Important links

Disclaimers#

Me#

  • Here to teach you to:

    • Understand the power of code

    • Not be afraid of code

    • Do a lot with just a little code

    • Troubleshoot

    • Google stuff

  • Not a statistician

You#

  • Are not going to:

    • Be good at coding seven weeks in

    • Understand everything the first time

  • Will want to throw your computer out a window at one or many points in the class

    • Celebrate the little victories

  • Will get out of it what you put into it

Questions you might ask#

  • Can you remind us what that means?

  • Can you say that differently?

  • Can you give an example?

  • How might this show up in our jobs?

  • When did you first learn about this?

  • Why does this matter?

Stolen from Andrew Maier

Spreadsheets vs. programming languages#

What do you like about spreadsheets?

Why spreadsheets#

  • The easy stuff is easy

  • Lots of people know how to use them

  • Mostly just have to point, click, and scroll

  • Data and logic live together as one

Why programming languages#

  • Data and logic don’t live together

    • Why might this matter?

  • More powerful, flexible, and expressive than spreadsheet formulas; you don’t have to cram everything into a single line

    =SUM(INDEX(C3:E9,MATCH(B13,C3:C9,0),MATCH(B14,C3:E3,0)))
    
  • Better at working with large data

    • Google Sheets and Excel have hard limits at 1-5 million rows, but get slow long before that

  • Reusable code (packages)

  • Automation
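As a rough illustration of the "single line" point, the two-way lookup formula above can be spread over several named steps in Python. This is only a sketch: the `spending` data and the names `agency` and `year` are made up for illustration and do not come from any real spreadsheet.

```python
# Hypothetical stand-in for the spreadsheet range C3:E9:
# one row per agency, one column per year. The numbers are invented.
spending = {
    "Parks": {"2023": 100, "2024": 110},
    "Sanitation": {"2023": 250, "2024": 240},
    "Transit": {"2023": 400, "2024": 450},
}

agency = "Sanitation"  # plays the role of cell B13 (which row to find)
year = "2024"          # plays the role of cell B14 (which column to find)

# Step 1: find the row. Step 2: find the column within that row.
agency_row = spending[agency]
amount = agency_row[year]

print(amount)  # 240
```

Each step has its own name, so it can be checked on its own. The data sits at the top, and the logic that reads it sits below.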

Side-by-side [1]#

| Task | Spreadsheets | Programming Languages |
| --- | --- | --- |
| Loading data | Easy | Medium |
| Viewing data | Easy | Medium |
| Filtering data | Easy | Medium |
| Manipulating data | Medium | Medium |
| Joining data | Hard | Medium |
| Complicated transforms | Impossible [2] | Medium |
| Automation | Impossible [2] | Medium |
| Making reusable | Limited [3] | Medium |
| Large datasets | Impossible | Hard |

[1] These ratings are obviously subjective
[2] Not counting scripting (which includes Excel’s new Python+pandas support)
[3] Google Sheets supports named functions

Python vs. other languages#

Why are you taking this class instead of R or whatever else?

Python logo

Python vs. other languages#

  • Good for general-purpose and data stuff

  • Widely used in both industry and academia

  • Relatively easy to learn

  • Open source

What is Python?#

  • A general-purpose programming language

  • Text that your computer understands

    • Usually saved in a text file

    • This is true of most programming languages

  • Popular for data analysis and data science
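To make "text that your computer understands" concrete, here is a minimal sketch of what a complete Python program can look like. The file name `hello.py` and the greeting are hypothetical; any plain text file containing valid Python would work the same way.

```python
# Contents of a hypothetical text file named hello.py

# Store some text under a name, then display it.
greeting = "Hello, Wagner!"
print(greeting)
```

Run with Python, this displays `Hello, Wagner!`.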

Packages#

  • a.k.a. “libraries” or “modules”

  • Developers create them to make code/functionality reusable and easily sharable

  • Software plugins that you import

  • Main packages we’ll use:

    • pandas

    • plotly
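Importing a package might look like the sketch below. It uses `statistics`, a module that ships with Python, as a stand-in so that it runs without installing anything. The `tips` values are made up for illustration.

```python
# `statistics` comes with Python, so it stands in here for any package.
import statistics

# Hypothetical tip amounts, invented for this example.
tips = [2.0, 3.5, 5.0]

# The package already knows how to compute a mean; we just call it.
average_tip = statistics.mean(tips)
print(average_tip)  # 3.5
```

pandas and plotly are imported the same way. By convention, pandas is imported as `import pandas as pd`.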

Where to Python#

Python can be run in:

Each can be on your computer (“local”), or in the cloud somewhere.

Trinity using the command line in the Matrix

Try it!#

  1. Go to python.org/shell

  2. Do some math (after typing each line, press Enter to submit)

    1. 1 + 1

    2. 10 / 4

    3. 10 / 3

    4. Calculate the number of minutes in a year
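For reference, the first three calculations behave roughly as sketched below. The variable names are only there so the results can be printed together; in the shell you would type just the math. The last line is a different, hypothetical calculation (seconds in a day) that uses the same approach as the minutes-in-a-year exercise without giving it away.

```python
# Each right-hand side is something you could type into the Python shell.
two = 1 + 1                      # 2
exact = 10 / 4                   # 2.5
repeating = 10 / 3               # 3.3333333333333335 (decimals have limited precision)
seconds_per_day = 60 * 60 * 24   # 86400

print(two, exact, repeating, seconds_per_day)
```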

Try to break it!#

It’s ok, you won’t hurt it.

What happened?
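One way to break it is to divide by zero. The sketch below is a preview that uses `try`/`except`, which has not been introduced yet, only so that the example can report the error and keep going. In the shell, typing `1 / 0` on its own and reading the error message is enough.

```python
# Dividing by zero is one reliable way to "break" a calculation.
try:
    result = 1 / 0
except ZeroDivisionError as error:
    # Python stops and reports what went wrong instead of guessing an answer.
    error_name = type(error).__name__
    print("Python raised:", error_name)
```

Nothing is damaged: the error describes the problem, and the next line you type runs normally.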

Jupyter#

  • Web-based programming environment

  • Supports Python by default, and other languages with plugins

  • Nicely displays output of your code so you can check and share the results

  • Avoids using the command line

  • Avoids installation problems across different computers and operating systems

We’re using JupyterHub, offered by NYU’s High Performance Computing (HPC) group.

Command line vs. Jupyter#

Command line vs. Jupyter output

Try it!#

  1. Go to JupyterHub

  2. Create a notebook

    1. Click New

    2. Under Notebook, click Python [conda env:python-public-policy]

  3. Paste in the following example

  4. Press the ▶️ button (or Control+Enter on your keyboard)

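# Tell Plotly how to display charts in the notebook (and in PDF exports)
import plotly.io as pio
pio.renderers.default = "notebook_connected+pdf"

import plotly.express as px

# Load a sample dataset of restaurant bills and tips
df = px.data.tips()
# Scatter plot of tip vs. total bill, with an ordinary least squares (OLS) trendline
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()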

FYI px.data.tips() loads one of Plotly’s sample datasets.

Jupyter basics#

A “cell” can be either code or Markdown (text). Raw Markdown looks like this:

## A heading

Plain text

[A link](https://somewhere.com)

Running#

  • You “run” a cell by either:

    • Pressing the ▶️ button

    • Pressing Control+Enter on your keyboard

  • Cells only run when you tell them to, and in the order you run them

    • Generally, you want to run them from the top every time you open a notebook
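Why run order matters can be sketched with three hypothetical cells, written here as one script with comments marking where each cell would begin. The names and numbers are invented for illustration.

```python
# --- Cell 1 ---
households = 100

# --- Cell 2 ---
households = 250

# --- Cell 3 ---
# Uses whatever value `households` had when this cell was run.
total_people = households * 3
print(total_people)  # 750 when the cells are run top to bottom
```

In a notebook, running Cell 2, then Cell 1, then Cell 3 would give 300 instead, even though the cells look the same on the page.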

Output#

  • The last thing in a code cell is what gets displayed when it’s run

  • The output gets saved as part of the notebook

  • Existing output under a cell doesn’t mean that cell has been run during this session
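A minimal sketch of the "last thing gets displayed" rule, using made-up variable names. Imagine the whole block as a single code cell.

```python
days_per_week = 7
hours_per_day = 24

days_per_week * 2               # computed, but not displayed (not the last line)
print(hours_per_day)            # always displayed, because print() is explicit
days_per_week * hours_per_day   # in a Jupyter cell, this last value (168) is displayed
```

Run as a plain script outside Jupyter, only the `print()` line would show anything.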

Computers are not smart.#

They do exactly what you tell them to do (not what you meant them to do) in the order you tell them to do it.
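A small, hypothetical example of the gap between what you typed and what you meant: trying to average 10 and 20.

```python
# Goal (what we meant): the average of 10 and 20, which is 15.
what_we_typed = 10 + 20 / 2      # division happens first: 10 + 10.0
what_we_meant = (10 + 20) / 2    # parentheses force the addition first

print(what_we_typed)  # 20.0
print(what_we_meant)  # 15.0
```

Python followed the first instruction exactly. It has no way of knowing that an average was intended.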

Homework 0#

  1. Walk through the assignment

  2. Make a copy of the assignment

  3. How to submit