Lecture 6: The Bigger Picture

Please:

  • sign attendance sheet

  • put away devices

Guest speaker: Will Craft

Will is the data editor for investigations at The Guardian US, specializing in government accountability. He uses public records, data analysis, and in-depth reporting to cover criminal justice, the environment, and the way our institutions function. In a past life, from 2015 to 2022, Will worked with APM Reports, a team of public radio reporters. His work has appeared on NPR, on Reveal from the Center for Investigative Reporting, on public radio stations around the country, and in the podcasts In The Dark and Sent Away. It has been honored with two Peabodies, a DuPont, and a George Polk Award. More importantly, it has spurred policy changes that have made a difference in people’s lives.

Questions?

Final Project

How did it go?

Peer grading

Ask Me Anything (AMA)

I have slides on “Python beyond data analysis” as a backup, but I’d rather talk about what you want to hear about.

Data warehousing

Python beyond data analysis

We’ve been focusing on using Python and pandas for data analysis. What else is Python used for?

Data engineering

  • Automation / recurring processes

  • Copying/moving/processing/publishing data, especially Big Data

  • Monitoring/alerting
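As a rough illustration of a recurring data-engineering job, here is a minimal sketch that copies rows from one CSV source to another, cleans them along the way, and logs a warning when too many rows fail to parse. The column names (`name`, `amount`), the function name `run_pipeline`, and the 10% alert threshold are all hypothetical; a real job would read from and write to files, databases, or cloud storage, and a scheduler such as cron would run it on a recurring basis.

```python
import csv
import io
import logging

logger = logging.getLogger("pipeline")

# Hypothetical alert threshold: warn if more than 10% of rows are bad.
MAX_BAD_ROW_SHARE = 0.10


def run_pipeline(source, destination):
    """Copy rows from a CSV source to a CSV destination, cleaning them on the way.

    Returns a (rows_written, rows_skipped) tuple.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(destination, fieldnames=["name", "amount"])
    writer.writeheader()

    written = skipped = 0
    for row in reader:
        try:
            name = row["name"].strip()
            amount = float(row["amount"])
        except (KeyError, AttributeError, TypeError, ValueError):
            # Missing or malformed values: skip the row rather than crash.
            skipped += 1
            continue
        writer.writerow({"name": name, "amount": f"{amount:.2f}"})
        written += 1

    # Monitoring/alerting: flag the run if too many rows were skipped.
    total = written + skipped
    if total and skipped / total > MAX_BAD_ROW_SHARE:
        logger.warning("skipped %d of %d rows; check the source data", skipped, total)
    else:
        logger.info("wrote %d rows, skipped %d", written, skipped)
    return written, skipped


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # In-memory stand-ins for real input and output files.
    source = io.StringIO("name,amount\nAda,10\nGrace,oops\nLinus, 3.5\n")
    destination = io.StringIO()
    print(run_pipeline(source, destination))
    print(destination.getvalue())
```

In the demo, the `Grace` row is skipped because `oops` is not a number, and since one of three rows failed, the run logs a warning instead of an info message.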

Web development

  • Building web sites that are interactive (more than just content)

  • Forms

  • Presenting data

  • Workflows, such as:

    • Signing up for things

    • Paying for things
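To make “forms” and “signing up for things” concrete, here is a minimal sketch of the server side of a sign-up form, written as a WSGI app using only Python’s standard library. The app name `signup_app`, the single `email` field, and the one-line validation are assumptions for illustration; real sites usually build on a framework such as Flask or Django and save submissions to a database.

```python
from urllib.parse import parse_qs


def signup_app(environ, start_response):
    """A minimal WSGI app that shows a sign-up form and handles its submission."""
    if environ["REQUEST_METHOD"] != "POST":
        # On a GET request, show the form.
        body = (
            b'<form method="post">'
            b'Email: <input name="email"> <button>Sign up</button>'
            b"</form>"
        )
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body]

    # Read the URL-encoded form body; CONTENT_LENGTH may be missing or empty.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    fields = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    email = fields.get("email", [""])[0].strip()

    # Very rough validation, for illustration only.
    if "@" not in email:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Please enter a valid email address."]

    # A real app would save the sign-up to a database here.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Thanks for signing up, {email}!".encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, signup_app).serve_forever()` serves it at http://localhost:8000 until you stop it.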

Machine learning

  • Statistics, but fancy

  • Building models

  • Finding patterns

  • Recommendations

  • Detection

When people say “artificial intelligence,” they usually mean “machine learning.”

Diagram showing what type of machine learning may be useful, if at all

Source, with more thorough explanation

The process

High-level

  1. Create a model

    1. Gather a bunch of data for training

    2. If using supervised machine learning, label it (give it the right answers)

    3. Split it into training and test data

    4. Train the model on the training dataset (have it identify patterns)

    5. Evaluate the model against the test dataset (check how well it does on data it hasn’t seen)

  2. Run the model against new data

  3. If using reinforcement learning, the model refines itself based on feedback (rewards) from its actions
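The supervised steps above can be sketched in plain Python. This example assumes a tiny, made-up labeled dataset and uses a nearest-centroid classifier (predict whichever label’s average is closest) as a stand-in for a real model; the helper names `train_test_split`, `train`, `predict`, and `accuracy` are hypothetical. In practice, libraries such as scikit-learn provide a `train_test_split` function and far more capable models.

```python
import math
import random


def train_test_split(rows, test_share=0.25, seed=0):
    """Shuffle labeled rows and split them into training and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cutoff = int(len(rows) * (1 - test_share))
    return rows[:cutoff], rows[cutoff:]


def train(training_rows):
    """'Train' a nearest-centroid model: average the features for each label."""
    sums, counts = {}, {}
    for features, label in training_rows:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def predict(model, features):
    """Pick the label whose centroid is closest to these features."""
    return min(model, key=lambda label: math.dist(model[label], features))


def accuracy(model, test_rows):
    """Share of test rows whose predicted label matches the true label."""
    correct = sum(predict(model, features) == label for features, label in test_rows)
    return correct / len(test_rows)


if __name__ == "__main__":
    # Hypothetical labeled data: (features, label) pairs.
    data = [
        ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.1, 0.9), "low"), ((0.9, 1.1), "low"),
        ((5.0, 5.2), "high"), ((4.8, 5.1), "high"), ((5.3, 4.9), "high"), ((5.1, 5.0), "high"),
    ]
    training, test = train_test_split(data)          # split
    model = train(training)                          # train
    print("test accuracy:", accuracy(model, test))   # evaluate
    print("new point:", predict(model, (4.9, 5.3)))  # run against new data
```

Holding back a test set matters because a model can look perfect on the data it was trained on; only data it hasn’t seen shows whether the patterns it found generalize.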

You have a head start: the fundamentals apply anywhere you’re writing code.

Resources

Thanks to the Reader!

Thank you!

Keep in touch.