Lecture 3 demo solution#

Load data#

This downloads data in an automated way; can jump ahead to reading the data.

Create the clean folder for the files.

!mkdir -p tmp
!rm -rf tmp/fertility*

Download the CSV in a ZIP.

!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2025-04-19 17:40:56--  https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 2606:4700:4400::6812:2ae7, 2606:4700:4400::ac40:9119, 104.18.42.231, ...
Connecting to api.worldbank.org (api.worldbank.org)|2606:4700:4400::6812:2ae7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73012 (71K) [application/zip]
Saving to: ‘tmp/fertility.zip’

tmp/fertility.zip   100%[===================>]  71.30K  --.-KB/s    in 0.002s  

2025-04-19 17:40:56 (36.2 MB/s) - ‘tmp/fertility.zip’ saved [73012/73012]
HTTP request sent, awaiting response... 
200 OK
Length: 70659 (69K) [application/zip]
Saving to: ‘tmp/fertility.zip’


tmp/fertility.zip     0%[                    ]       0  --.-KB/s               
tmp/fertility.zip   100%[===================>]  69.00K  --.-KB/s    in 0.002s  

2025-03-17 00:00:54 (33.0 MB/s) - ‘tmp/fertility.zip’ saved [70659/70659]
!unzip tmp/fertility.zip -d tmp/fertility
Archive:  tmp/fertility.zip
  inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv  
  inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv  
  inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv  
  inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv  

Remove the exact file version number, since that changes.

!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv

Read data#

import pandas as pd

fertility = pd.read_csv(
    "tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
    skiprows=3,
)
fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2016 2017 2018 2019 2020 2021 2022 2023 2024 Unnamed: 69
0 Aruba ABW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 4.567000 4.422000 4.262000 4.107000 3.940000 3.797000 ... 1.848000 1.785000 1.732000 1.701000 1.662000 1.631000 1.615000 1.602000 NaN NaN
1 Africa Eastern and Southern AFE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.650330 6.667308 6.688246 6.709226 6.724930 6.737459 ... 4.615744 4.569884 4.521443 4.471338 4.412973 4.350683 4.287033 4.223771 NaN NaN
2 Afghanistan AFG Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.282000 7.284000 7.292000 7.302000 7.304000 7.305000 ... 5.542000 5.433000 5.327000 5.238000 5.145000 5.039000 4.932000 4.840000 NaN NaN
3 Africa Western and Central AFW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.468887 6.478351 6.492277 6.500229 6.516739 6.532766 ... 5.228976 5.098885 4.962571 4.829134 4.707399 4.637741 4.563354 4.497707 NaN NaN
4 Angola AGO Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.708000 6.790000 6.872000 6.954000 7.036000 7.116000 ... 5.686000 5.600000 5.519000 5.442000 5.371000 5.304000 5.209000 5.124000 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 Kosovo XKX Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.359000 6.314000 6.244000 6.176000 6.112000 6.035000 ... 1.641000 1.618000 1.581000 1.575000 1.567000 1.561000 1.555000 1.545000 NaN NaN
262 Yemen, Rep. YEM Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.988000 8.000000 8.010000 8.028000 8.071000 8.101000 ... 4.613000 4.610000 4.607000 4.603000 4.600000 4.597000 4.593000 4.590000 NaN NaN
263 South Africa ZAF Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.105000 6.080000 6.046000 6.012000 5.955000 5.892000 ... 2.261000 2.283000 2.270000 2.264000 2.257000 2.248000 2.227000 2.216000 NaN NaN
264 Zambia ZMB Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.947000 6.982000 7.012000 7.060000 7.085000 7.116000 ... 4.669000 4.567000 4.492000 4.418000 4.323000 4.246000 4.175000 4.101000 NaN NaN
265 Zimbabwe ZWE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.195000 7.212000 7.231000 7.230000 7.230000 7.231000 ... 3.828000 3.768000 3.744000 3.748000 3.754000 3.765000 3.767000 3.724000 NaN NaN

266 rows × 70 columns

Let’s look at the USA (arbitrarily) over time.

usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2016 2017 2018 2019 2020 2021 2022 2023 2024 Unnamed: 69
251 United States USA Fertility rate, total (births per woman) SP.DYN.TFRT.IN 3.654 3.62 3.461 3.319 3.19 2.913 ... 1.8205 1.7655 1.7295 1.706 1.6415 1.664 1.6565 1.6165 NaN NaN

1 rows × 70 columns

Clean up#

Let’s get rid of columns we don’t need.

usa_fertility = usa_fertility.drop(
    columns=[
        "Country Name",
        "Country Code",
        "Indicator Name",
        "Indicator Code",
        "Unnamed: 69",
    ]
)
usa_fertility
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
251 3.654 3.62 3.461 3.319 3.19 2.913 2.721 2.558 2.464 2.456 ... 1.8435 1.8205 1.7655 1.7295 1.706 1.6415 1.664 1.6565 1.6165 NaN

1 rows × 65 columns

Too wide! Let’s make it long.

melt()#

fertility_by_year = usa_fertility.melt(
    var_name="Year",
    value_name="Fertility Rate",
)

fertility_by_year
Year Fertility Rate
0 1960 3.6540
1 1961 3.6200
2 1962 3.4610
3 1963 3.3190
4 1964 3.1900
... ... ...
60 2020 1.6415
61 2021 1.6640
62 2022 1.6565
63 2023 1.6165
64 2024 NaN

65 rows × 2 columns

Line chart#

import plotly.express as px
fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="USA fertility rate over time",
)
fig.show()