Lecture 6: The Bigger Picture#

Please:

  • sign attendance sheet

  • put away devices

Guest speakers#

Joli Golden#

Joli is a U.S. Census Bureau Data Dissemination Specialist in the New York Region focused on New York City. Her current position involves educating the public on the availability, extraction and usage of the vast array of data on the Census Bureau’s website. Joli served as a Partnership Specialist during the 2020 Census and recently re-joined the Census Bureau.

She received a Bachelor of Arts from the University of Pennsylvania and a Master of Fine Arts from the UCLA School of Theater, Film and Television.

David Kraiker#

David Kraiker has worked at the Census Bureau for 27 years, first as a Geographer in the New York Regional Office and more recently as a Data Dissemination Specialist. Before that, he worked as a cartographer for private mapping companies and as a French instructor. David holds a BA from Clark University, an MSc from Rutgers-Newark, and the premier degré diploma from the Université de Caen (France). He lives in Northern New Jersey with his family.

Questions?#

Final Project peer grading#

Ask Me Anything (AMA)#

I have slides on “Python beyond data analysis” as a backup, but I’d rather talk about whatever you want to hear.

Python beyond data analysis#

We’ve been focusing on using Python and pandas for data analysis. What else is Python used for?

Data engineering#

  • Automation / recurring processes

  • Copying/moving/processing/publishing data, especially Big Data

  • Monitoring/alerting
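
To make this concrete, here is a minimal sketch of a recurring data-engineering job. The file names (`sales.csv`, `output/sales_clean.csv`) are hypothetical; a real pipeline would add scheduling (e.g. cron or Airflow) and alerting on failure.

```python
import logging
from pathlib import Path

import pandas as pd

logging.basicConfig(level=logging.INFO)

# Hypothetical paths; a real pipeline would read these from configuration.
SOURCE = Path("sales.csv")
DESTINATION = Path("output/sales_clean.csv")


def run_pipeline():
    """Copy, clean, and publish one dataset; log progress for monitoring."""
    logging.info("Reading %s", SOURCE)
    df = pd.read_csv(SOURCE)

    # "Processing": drop incomplete rows and standardize column names.
    df = df.dropna()
    df.columns = [col.strip().lower() for col in df.columns]

    DESTINATION.parent.mkdir(exist_ok=True)
    df.to_csv(DESTINATION, index=False)
    logging.info("Wrote %s rows to %s", len(df), DESTINATION)


if __name__ == "__main__":
    run_pipeline()
```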

Web development#

  • Building web sites that are interactive (more than just content)

  • Forms

  • Presenting data

  • Workflows, such as:

    • Signing up for things

    • Paying for things
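
As a rough sketch (not course material), here’s what a tiny web application might look like using Flask, a popular Python web framework. The route names and data are made up; it serves one HTML page and one JSON endpoint for presenting data.

```python
# Install with `pip install flask`.
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/")
def home():
    # In a real site this would render a template, perhaps with a form.
    return "<h1>Hello from Python!</h1>"


@app.route("/api/complaints")
def complaints():
    # A real app would pull this from a database or an external API.
    data = [
        {"borough": "Brooklyn", "count": 120},
        {"borough": "Queens", "count": 95},
    ]
    return jsonify(data)


if __name__ == "__main__":
    app.run(debug=True)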

Machine learning#

  • Statistics, but fancy

  • Building models

  • Finding patterns

  • Recommendations

  • Detection

When people say “artificial intelligence,” they usually mean “machine learning.”

[Diagram showing what type of machine learning may be useful, if at all. See the source for a more thorough explanation.]

The process#

At a high level:

  1. Create a model

    1. Gather a bunch of data for training

    2. If supervised machine learning, label it (give it the right answers)

    3. Segment into training and test data

    4. Train the model against the training dataset (have it identify patterns)

    5. Test the model against the test dataset

  2. Run against new data

  3. If reinforcement learning, the model refines itself
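
As a rough illustration of steps 1 and 2 (a sketch, not something covered in class), here is a supervised-learning example using scikit-learn’s built-in iris dataset, where the labels (the “right answers”) are already provided:

```python
# A supervised machine learning sketch using scikit-learn (`pip install scikit-learn`).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 1.1-1.2: Gather labeled data. The iris dataset already includes the
# "right answers" (the species of each flower).
X, y = load_iris(return_X_y=True)

# 1.3: Segment into training and test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 1.4: Train the model against the training dataset.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 1.5: Test the model against the test dataset.
print("Accuracy on test data:", model.score(X_test, y_test))

# 2: Run against new data (measurements for a flower the model hasn't seen).
new_flower = [[5.0, 3.4, 1.5, 0.2]]
print("Predicted species:", model.predict(new_flower))
```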

You have a head start: The fundamentals are applicable anywhere you’re using code.

Thanks to the Reader!

Thank you!#

Keep in touch.