Class 0: Intro to coding#
NYU Wagner - Python Coding for Public Policy#
Aidan Feldman
Welcome!#
Structure for today#
Intros
Going over course info like the syllabus, tools, etc.
Intro to Python/code
About me#
Coding since 2005 🖥
Government since 2014 🦅
Also a modern dancer 💃 cyclist 🚲 and baker 🍞
Passionate about open source
Day jobs#
Past roles include:
Technology Director at TTS
GitHub
grader intro
Access these slides#
You can get there through Brightspace.
Wait list#
There’s a lot of churn in enrollment, so be patient.
To be fair to everyone, it’s first come first served.
While you’re waiting#
Attend class.
Look through the important links, including past lectures.
You won’t be able to access Brightspace or the Discussions.
Complete the Assignment(s) in JupyterHub as normal, waiting until you are registered to submit them.
Once you get off the wait list#
…or if you register for the class late:
Watch the recording(s) of past lectures if you weren’t present for them.
Make sure to read through the pinned threads that appear at the top of the list of Discussions, in case there’s anything you missed.
Submit all past assignments.
For weeks before a student was registered, we will mark their Participation as Excused.
Email the grader if there’s a mistake.
Extensions#
We will grant extensions up to the following, whichever comes first:
Nine days after the original due date
One week after you were enrolled in the course
If we accidentally mark you as late, let the grader know and we’ll get it corrected.
The late submission deadline will not be extended.
In other words: If you join the class more than a week after it starts, you can’t turn in Homework 0 late.
This is a short class, and these rules are in place to:
Ensure late-joiners get caught up quickly
Allow solutions for homeworks to be shared sooner than later (so that students can learn from them)
Introductions#
Share the following:
Name (what you go by)
Pronouns
What you’re studying
Fun fact
Who are you (as a whole)#
Survey
[This study] found that the modern language aptitude test was the strongest predictor of how quickly people would learn to code in Python … language aptitude explained 43 percent, fluid reasoning explained 12.8 percent, … and numeracy just 6 percent [of the variance].
In other words: Being good at learning foreign languages is the best known predictor of learning to code quickly, more so than being good at math.
Accommodations#
Childcare, mental health issues, etc.
Reach out via email
Class structure#
Class materials walkthrough#
Disclaimers#
Me#
Here to teach you to:
Understand the power of code
Not be afraid of code
Do a lot with just a little code
Troubleshoot
Google stuff
Not a statistician
You#
Are not going to:
Be good at coding seven weeks in
Understand everything the first time
Will want to throw your computer out a window at one or more points in the class
Celebrate the little victories
Will get out of it what you put into it
Questions you might ask#
Can you remind us what that means?
Can you say that differently?
Can you give an example?
How might this show up in our jobs?
When did you first learn about this?
Why does this matter?
Stolen from Andrew Maier
Spreadsheets vs. programming languages#
What do you like about spreadsheets?
Why spreadsheets#
The easy stuff is easy
Lots of people know how to use them
Mostly just have to point, click, and scroll
Data and logic live together as one
Why programming languages#
Data and logic don’t live together
Why might this matter?
More powerful, flexible, and expressive than spreadsheet formulas; don’t have to cram into a single line
=SUM(INDEX(C3:E9,MATCH(B13,C3:C9,0),MATCH(B14,C3:E3,0)))
Better at working with large data
Google Sheets and Excel have hard limits at 1-5 million rows, but get slow long before that
Reusable code (packages)
Automation
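The spreadsheet formula above looks up a single value: the cell in C3:E9 whose row matches B13 in the first column and whose column matches B14 in the header row (the SUM just returns that one cell). A rough pandas equivalent might look like the sketch below, spread over several lines with comments. The column names and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical data standing in for the spreadsheet range C3:E9:
# the first column holds row labels, the header row holds column labels
df = pd.DataFrame(
    {
        "region": ["North", "South", "East", "West"],
        "2022": [10, 20, 30, 40],
        "2023": [11, 21, 31, 41],
    }
)

# Stand-ins for the lookup values in cells B13 and B14
row_label = "East"
column_label = "2023"

# Instead of nesting INDEX and MATCH, use the label column as the row index...
by_region = df.set_index("region")
# ...then look the value up by row label and column label
value = by_region.loc[row_label, column_label]
print(value)  # 31
```

Each step gets its own line and its own name, which is easier to read and check than one long formula.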
Side-by-side¹#
| Task | Spreadsheets | Programming languages |
|---|---|---|
| Loading data | Easy | Medium |
| Viewing data | Easy | Medium |
| Filtering data | Easy | Medium |
| Manipulating data | Medium | Medium |
| Joining data | Hard | Medium |
| Complicated transforms | Impossible² | Medium |
| Automation | Impossible² | Medium |
| Making reusable | Limited³ | Medium |
| Large datasets | Impossible | Hard |
¹ These ratings are obviously subjective
² Not counting scripting, which includes Excel’s new Python + pandas support
³ Google Sheets supports named functions
Python vs. other languages#
Why are you taking this class instead of R or whatever else?
Python vs. other languages#
Good for general-purpose and data stuff
Widely used in both industry and academia
Relatively easy to learn
Open source
What is Python?#
A general-purpose programming language
Text that your computer understands
Usually saved in a text file
This is true of most programming languages
Popular for data analysis and data science
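Because a program is just text, the few lines below make up a complete Python program. As a sketch: if you saved them in a file called hello.py (a made-up name), running python hello.py from the command line would print the greeting.

```python
# A complete (tiny) Python program: plain text, run from top to bottom
course = "Python Coding for Public Policy"
greeting = "Hello, " + course + "!"
print(greeting)  # Hello, Python Coding for Public Policy!
```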
Packages#
a.k.a. “libraries” or “modules”
Developers create them to make code/functionality reusable and easy to share
Software plugins that you import
Main packages we’ll use:
pandas
plotly
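Using a package starts with an import line. Here’s a minimal sketch with pandas. The pd nickname is a widespread convention, not a requirement, and the numbers are made up.

```python
# Load the pandas package and give it the conventional short name "pd"
import pandas as pd

# Once imported, the package's functionality is available through that name
numbers = pd.Series([3, 1, 4, 1, 5])
print(numbers.mean())  # 2.8
```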
Where to Python#
Python can be run in:
A text file, using the python command
A Jupyter notebook
Google Colab, Mode, Kaggle, and other sites/tools are built around it
What we’ll be using for this class
Each can be on your computer (“local”), or in the cloud somewhere.
Try it!#
Go to python.org/shell
Do some math (after typing each line, press Enter to submit):
1 + 1
10 / 4
10 / 3
Calculate the number of minutes in a year
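Outside the interactive shell, you need print() to see results. Here’s the same arithmetic as a short script, with one possible answer to the last question, assuming a 365-day year (a leap year has 527,040 minutes).

```python
print(1 + 1)   # 2
print(10 / 4)  # 2.5: "/" always gives a decimal (float) result
print(10 / 3)  # 3.3333333333333335: decimals are stored with limited precision

# Minutes per hour × hours per day × days per year
minutes_per_year = 60 * 24 * 365
print(minutes_per_year)  # 525600
```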
Try to break it!#
It’s ok, you won’t hurt it.
What happened?
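One easy way to break it is dividing by zero. In the shell you’d see a traceback ending in ZeroDivisionError: division by zero. The sketch below uses try/except to catch that error so the script keeps running and shows the message Python gives.

```python
message = None
try:
    1 / 0  # Python refuses to guess an answer here
except ZeroDivisionError as error:
    # Save the error's description instead of letting the program stop
    message = str(error)

print("Python says:", message)  # Python says: division by zero
```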
Jupyter#
Web-based programming environment
Supports Python by default, and other languages with plugins
Nicely displays output of your code so you can check and share the results
Avoids using the command line
Avoids installation problems across different computers and operating systems
We’re using JupyterHub, offered by NYU’s High Performance Computing (HPC) group.
Command line vs. Jupyter#
Try it!#
Go to JupyterHub
Create a notebook
Click New
Under Notebook, click Python [conda env:python-public-policy]
Paste in the following example
Press the ▶️ button (or Control+Enter on your keyboard)
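import plotly.io as pio
# Show charts in the notebook, plus a static copy for when it's converted to PDF
pio.renderers.default = "notebook_connected+pdf"
import plotly.express as px
df = px.data.tips()
# Scatter plot of tip vs. total bill, with an ordinary least squares (OLS) trendline
# (trendline="ols" requires the statsmodels package to be installed)
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()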
FYI: px.data.tips() loads one of Plotly’s sample datasets.
Jupyter basics#
A “cell” can be either code or Markdown (text). Raw Markdown looks like this:
## A heading
Plain text
[A link](https://somewhere.com)
Running#
You “run” a cell by either:
Pressing the ▶️ button
Pressing Control+Enter on your keyboard
Cells only run when you tell them to, in the order you run them (not necessarily top to bottom)
Generally, you want to run them from the top every time you open a notebook
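Running cells out of order is a common source of confusing errors. This sketch imitates two notebook cells in one script: the cell that uses price runs before the cell that defines it, then runs again afterward. The variable name is made up.

```python
# "Cell 2" run first: it uses `price`, which no cell has defined yet
try:
    total = price * 2
except NameError as error:
    print("Error:", error)  # Error: name 'price' is not defined

# "Cell 1" run next: it defines `price`
price = 10

# "Cell 2" run again: now it works
total = price * 2
print(total)  # 20
```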
Output#
The last thing in a code cell is what gets displayed when it’s run
The output gets saved as part of the notebook
Just because a cell has existing output doesn’t mean it has been run during this session
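To see this, paste the lines below into a single Jupyter cell and run it. Only the value of the last line (15) appears as the cell’s output. print() is one way to display something from earlier in the cell (here, 10). Run as a plain script, only the printed 10 would appear.

```python
x = 5
print(x * 2)  # print() always displays, wherever it is in the cell
x * 3         # the last expression: Jupyter shows its value, 15, as the cell's output
```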
Computers are not smart.#
They do exactly what you tell them to do (not what you meant them to do) in the order you tell them to do it.
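A classic example: to Python, "2" in quotes is text, not a number, so adding two pieces of text sticks them together instead of doing math.

```python
joined = "2" + "2"               # text + text: glued together
added = 2 + 2                    # number + number: arithmetic
converted = int("2") + int("2")  # convert text to numbers first if you want math

print(joined)     # 22
print(added)      # 4
print(converted)  # 4
```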
Homework 0#
Walk through the assignment
Make a copy of the assignment