Lecture 3: Data reshaping and visualization#

“Data visualization”, “chart”, “graph”, and will be used interchangeably.

Please sign attendance sheet

Start by importing necessary packages#

import pandas as pd
import plotly.express as px

Fertility rates#

Jump ahead to line chart

Load data#

Create the clean folder for the files.

!mkdir -p tmp
!rm -rf tmp/fertility*
zsh:1: no matches found: tmp/fertility*

Download the CSV in a ZIP.

!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2025-02-03 14:33:13--  https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 2606:4700:4400::6812:2ae7, 2606:4700:4400::ac40:9119, 172.64.145.25, ...
Connecting to api.worldbank.org (api.worldbank.org)|2606:4700:4400::6812:2ae7|:443... connected.
200 OKequest sent, awaiting response... 
Length: 70659 (69K) [application/zip]
Saving to: ‘tmp/fertility.zip’

tmp/fertility.zip   100%[===================>]  69.00K  --.-KB/s    in 0.002s  

2025-02-03 14:33:14 (33.6 MB/s) - ‘tmp/fertility.zip’ saved [70659/70659]
!unzip tmp/fertility.zip -d tmp/fertility
Archive:  tmp/fertility.zip
  inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv  
  inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv  
  inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv  

Remove the exact file version number, since that changes.

!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv
fertility = pd.read_csv(
    "tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
    skiprows=3,
)
fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 Unnamed: 68
0 Aruba ABW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 4.820000 4.655000 4.471000 4.271000 4.059000 3.842000 ... 1.972000 1.953000 1.839000 1.587000 1.486000 1.325000 1.180000 1.179000 NaN NaN
1 Africa Eastern and Southern AFE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.723226 6.742124 6.762343 6.778133 6.787821 6.799767 ... 4.678619 4.616540 4.570828 4.526445 4.480001 4.412125 4.348564 4.281848 NaN NaN
2 Afghanistan AFG Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.282000 7.284000 7.292000 7.302000 7.304000 7.305000 ... 5.405000 5.262000 5.129000 5.002000 4.870000 4.750000 4.643000 4.523000 NaN NaN
3 Africa Western and Central AFW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.459063 6.472099 6.492395 6.506528 6.525529 6.541022 ... 5.388622 5.333061 5.260538 5.191528 5.124368 5.055090 4.984759 4.903347 NaN NaN
4 Angola AGO Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.708000 6.790000 6.872000 6.954000 7.036000 7.116000 ... 5.774000 5.686000 5.600000 5.519000 5.442000 5.371000 5.304000 5.209000 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 Kosovo XKX Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.359000 6.350000 6.331000 6.296000 6.126000 5.927000 ... 1.751000 1.658000 1.649000 1.605000 1.546000 1.529000 1.522000 1.510000 NaN NaN
262 Yemen, Rep. YEM Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.938000 7.963000 7.963000 7.981000 8.038000 8.067000 ... 4.322000 4.214000 4.112000 4.043000 3.963000 3.886000 3.795000 3.716000 NaN NaN
263 South Africa ZAF Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.159000 6.138000 6.110000 6.077000 6.030000 5.967000 ... 2.359000 2.261000 2.334000 2.418000 2.475000 2.401000 2.374000 2.341000 NaN NaN
264 Zambia ZMB Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.115000 7.169000 7.214000 7.249000 7.274000 7.291000 ... 4.793000 4.707000 4.614000 4.536000 4.451000 4.379000 4.308000 4.242000 NaN NaN
265 Zimbabwe ZWE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.220000 7.233000 7.255000 7.253000 7.261000 7.258000 ... 3.849000 3.771000 3.706000 3.659000 3.599000 3.545000 3.491000 3.437000 NaN NaN

266 rows × 69 columns

Over time#

Let’s just look at the USA, arbitrarily.

usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 Unnamed: 68
251 United States USA Fertility rate, total (births per woman) SP.DYN.TFRT.IN 3.654 3.62 3.461 3.319 3.19 2.913 ... 1.8435 1.8205 1.7655 1.7295 1.706 1.6415 1.664 1.665 NaN NaN

1 rows × 69 columns

Too wide! Let’s make it long.

Reshaping#

Like pivot tables in spreadsheets

usa_fertility = usa_fertility.drop(
    columns=[
        "Country Name",
        "Country Code",
        "Indicator Name",
        "Indicator Code",
        "Unnamed: 68",
    ]
)
usa_fertility
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ... 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023
251 3.654 3.62 3.461 3.319 3.19 2.913 2.721 2.558 2.464 2.456 ... 1.8625 1.8435 1.8205 1.7655 1.7295 1.706 1.6415 1.664 1.665 NaN

1 rows × 64 columns

melt()#

fertility_by_year = usa_fertility.melt(
    var_name="Year",
    value_name="Fertility Rate",
)

fertility_by_year
Year Fertility Rate
0 1960 3.6540
1 1961 3.6200
2 1962 3.4610
3 1963 3.3190
4 1964 3.1900
... ... ...
59 2019 1.7060
60 2020 1.6415
61 2021 1.6640
62 2022 1.6650
63 2023 NaN

64 rows × 2 columns

Line chart#

fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="Global fertility rate over time",
)
fig.show()