Class 0: Intro to coding#

Columbia University - Python for Public Policy#

Aidan Feldman

Structure for today#

  1. Intros

  2. Going over course info like the syllabus, tools, etc.

  3. Intro to Python/code

About me#

  • Coding since 2005 🖥

  • Government since 2014 🦅

  • Teaching since 2011 🎓

  • Also a modern dancer 💃 and cyclist 🚲

Day jobs#

Currently freelancing with the Colorado Behavioral Health Administration and Reinvent Albany. In the past, have worked for…

Government#

Tech companies#

Reader intro

Access these slides#

You can get there through CourseWorks.

Wait list#

Who are you (as a whole)#

Survey

[This study] found that the modern language aptitude test was the strongest predictor of how quickly people would learn to code in Python … language aptitude explained 43 percent, fluid reasoning explained 12.8 percent, … and numeracy just 6 percent [of the variance].

https://www.psychologytoday.com/us/blog/brain-waves/202003/learning-code-requires-language-skills-not-math

In other words: Being good at learning foreign languages is a better predictor for coding aptitude than being good at math.

Accommodations#

  • Illness, childcare, mental health issues, etc.

  • Reach out via email

Class structure#

Class materials walkthrough#

Important links

Disclaimers#

Me#

  • Here to teach you to:

    • Understand the power of code

    • Not be afraid of code

    • Do a lot with just a little code

    • Troubleshoot

    • Google stuff

  • Not a statistician

You#

  • Are not going to:

    • Be good at coding seven weeks in

    • Understand everything the first time

  • Will want to throw your computer out a window at one or many points in the class

    • Celebrate the little victories

  • Will get out of it what you put into it

PB&J exercise#

  1. Go to CourseWorks

  2. Click Assignments

  3. Click PB&J exercise

  4. Fill it out

  5. Submit

If you don’t have access (yet), write it out somewhere else and submit when you can.

We’ll come back to this.

Spreadsheets vs. programming languages#

What do you like about spreadsheets?

Why spreadsheets#

  • The easy stuff is easy

  • Lots of people know how to use them

  • Mostly just have to point, click, and scroll

  • Data and logic live together as one

Why programming languages#

  • Data and logic don’t live together

    • Why might this matter?

  • More powerful, flexible, and expressive than spreadsheet formulas; don’t have to cram into a single line

    =SUM(INDEX(C3:E9,MATCH(B13,C3:C9,0),MATCH(B14,C3:E3,0)))
    
  • Better at working with large data

    • Excel has a hard limit of 1,048,576 rows per sheet and Google Sheets caps a spreadsheet at 10 million cells, but both get slow long before that

  • Reusable code (packages)

  • Automation
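
To make the “don’t have to cram into a single line” point concrete, here is a rough Python sketch of the same kind of two-way lookup that the formula above performs. The table contents and the names `sales`, `row_label`, and `column_label` are made up for illustration; they stand in for the cell ranges C3:E9, B13, and B14.

```python
# Hypothetical data standing in for the spreadsheet range C3:E9:
# one row per region, one column per quarter.
sales = {
    "North": {"Q1": 120, "Q2": 135},
    "South": {"Q1": 95, "Q2": 110},
    "West": {"Q1": 150, "Q2": 160},
}

# These play the role of cells B13 and B14 in the formula.
row_label = "South"
column_label = "Q2"

# Step 1: find the row (the first MATCH in the formula).
row = sales[row_label]

# Step 2: find the column within that row (the second MATCH).
value = row[column_label]

print(value)
```

Each step gets its own line and its own name, so you can check the intermediate results one at a time.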

Side-by-side¹#

| Task | Spreadsheets | Programming languages |
|---|---|---|
| Loading data | Easy | Medium |
| Viewing data | Easy | Medium |
| Filtering data | Easy | Medium |
| Manipulating data | Medium | Medium |
| Joining data | Hard | Medium |
| Complicated transforms | Impossible² | Medium |
| Automation | Impossible² | Medium |
| Making reusable | Limited³ | Medium |
| Large datasets | Impossible | Hard |

¹ These ratings are obviously subjective
² Not counting scripting, such as Excel’s new Python+pandas support
³ Google Sheets supports named functions

Python vs. other languages#

Why are you taking this class instead of R or whatever else?

Python vs. other languages#

  • Good for general-purpose and data stuff

  • Widely used in both industry and academia

  • Relatively easy to learn

  • Open source

What is Python?#

  • A general-purpose programming language

  • Text that your computer understands

    • Usually saved in a text file

    • This is true of most programming languages

  • Popular for data analysis and data science

Where to Python#

Python can be run in:

Each can be on your computer (“local”), or in the cloud somewhere.

Trinity using the command line in the Matrix

Try it!#

  1. Go to python.org/shell

  2. Do some math (after typing each line, press Enter to submit)

    1. 1 + 1

    2. 10 / 4

    3. 10 / 3

    4. Calculate the number of minutes in a year
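
For reference, here is roughly what those lines do. In the shell you would type each expression on its own and see the result right away; `print()` is used here only so the whole block can run in one go. The minutes calculation assumes a 365-day year, which is a simplification made for this sketch.

```python
print(1 + 1)    # whole numbers stay whole: 2
print(10 / 4)   # division gives a decimal: 2.5
print(10 / 3)   # a repeating decimal gets cut off after about 16 digits

# Minutes in a year, assuming a non-leap year of 365 days
minutes_per_year = 60 * 24 * 365
print(minutes_per_year)  # 525600
```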

Try to break it!#

It’s ok, you won’t hurt it.

What happened?
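
Two common ways to break it are dividing by zero and using a name Python has never seen. Typed directly into the shell, each one prints an error message and then lets you keep going. The sketch below wraps them in `try`/`except` (something not covered yet) purely so both can run in one block; the name `pizza` is a made-up example.

```python
errors = []

try:
    1 / 0
except ZeroDivisionError as error:
    errors.append(type(error).__name__)
    print("Python said:", error)

try:
    pizza  # a name that was never defined
except NameError as error:
    errors.append(type(error).__name__)
    print("Python said:", error)
```

The error type (such as `ZeroDivisionError`) and the message after it are usually the most useful things to read, and to search for.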

Jupyter#

  • Web-based programming environment

  • Supports Python by default, and other languages with plugins

  • Nicely displays output of your code so you can check and share the results

  • Avoids using the command line

  • Avoids installation problems across different computers and operating systems

We’re using a service called Google Colab for its Jupyter functionality.

Command line vs. Jupyter#

Command line vs. Jupyter output

Try it!#

  1. Go to Google Colab

  2. Create a notebook

  3. Paste in the following example

  4. Press the ▶️ button (or Control+Enter on your keyboard)

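import plotly.express as px

# Load sample data into a table (a "DataFrame")
df = px.data.tips()
# Scatter plot of tip vs. total bill, with a straight-line (OLS) trendline
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()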

FYI px.data.tips() loads one of Plotly’s sample datasets. You don’t need that when plotting other datasets.

Jupyter basics#

A “cell” can be either code or Markdown (text). Raw Markdown looks like this:

## A heading

Plain text

[A link](https://somewhere.com)

Running#

  • You “run” a cell by either:

    • Pressing the ▶️ button

    • Pressing Control+Enter on your keyboard

  • Cells only run when you tell them to, and in the order you run them, which isn’t necessarily top to bottom

    • Generally, you want to do so from the top every time you open a notebook
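
Here is a small sketch of why run order matters. Imagine the two commented sections are separate cells; the variable name `count` is just for illustration.

```python
# Cell 1
count = 1

# Cell 2
count = count + 1

# Running Cell 1 once and Cell 2 once leaves count at 2.
# Running Cell 2 a second time (without re-running Cell 1) would make it 3,
# and running Cell 2 before Cell 1 in a fresh session would fail with a
# NameError, because count wouldn't exist yet.
print(count)
```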

Output#

  • The last thing in a code cell is what gets displayed when it’s run

  • The output gets saved as part of the notebook

  • Just because there’s existing output from a cell, doesn’t mean that cell has been run during this session
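
A minimal sketch of the “last thing” rule, using made-up numbers. In a notebook cell, a bare value on the final line is displayed automatically; anything earlier needs `print()` to show up.

```python
subtotal = 20 + 5

subtotal * 2     # computed, but not displayed, because it isn't the last line
print(subtotal)  # print() displays a value from anywhere in the cell

subtotal + 1     # in a notebook, this final value is displayed as the output
```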

PB&J exercise, continued#

Let’s do this.

🍞🥜🍓🍴

Computers are not smart.#

They do exactly what you tell them to do (not what you meant them to do) in the order you tell them to do it.
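
A tiny illustration of “in the order you tell them”, with made-up prices. You might mean for the total to reflect the updated price, but Python calculated it before the price changed.

```python
price = 10
total = price * 2   # calculated now, using the price at this moment: 20

price = 15          # changing the price later does not update total

print(total)        # still 20, not 30
```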

Inspiration:

from IPython.display import IFrame

IFrame("https://www.youtube.com/embed/cDA3_5982h8", width=640, height=360)

Homework 0#

  1. Walk through the assignment

  2. Make a copy of the assignment

  3. How to submit