Lecture 3 demo solution#

Ensure the visualizations render properly across VSCode, Jupyter Book, etc. You can ignore this.

import plotly.io as pio

pio.renderers.default = "colab+notebook_connected+plotly_mimetype"

Load data#

This downloads data in an automated way; can jump ahead to reading the data.

Create the clean folder for the files.

!mkdir -p tmp
!rm -rf tmp/fertility*

Download the CSV in a ZIP.

!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2026-04-14 17:29:11--  https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 172.64.145.25, 104.18.42.231
Connecting to api.worldbank.org (api.worldbank.org)|172.64.145.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74530 (73K) [application/zip]
Saving to: ‘tmp/fertility.zip’

tmp/fertility.zip   100%[===================>]  72.78K  --.-KB/s    in 0.007s  

2026-04-14 17:29:12 (10.8 MB/s) - ‘tmp/fertility.zip’ saved [74530/74530]
!unzip tmp/fertility.zip -d tmp/fertility
Archive:  tmp/fertility.zip
  inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  
  inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  
  inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  

Remove the exact file version number, since that changes.

!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv

Read data#

import pandas as pd

fertility = pd.read_csv(
    "tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
    skiprows=3,
)
fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2017 2018 2019 2020 2021 2022 2023 2024 2025 Unnamed: 70
0 Aruba ABW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 4.567000 4.422000 4.262000 4.107000 3.940000 3.797000 ... 1.785000 1.732000 1.701000 1.662000 1.631000 1.615000 1.602000 1.606000 NaN NaN
1 Africa Eastern and Southern AFE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.650310 6.667308 6.688246 6.709226 6.724930 6.737459 ... 4.569915 4.521454 4.471351 4.412999 4.350691 4.287080 4.223861 4.164044 NaN NaN
2 Afghanistan AFG Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.282000 7.284000 7.292000 7.302000 7.304000 7.305000 ... 5.433000 5.327000 5.238000 5.145000 5.039000 4.932000 4.840000 4.761000 NaN NaN
3 Africa Western and Central AFW Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.468882 6.478345 6.492276 6.500230 6.516739 6.532771 ... 5.098890 4.962572 4.829142 4.707405 4.637738 4.563357 4.497714 4.415983 NaN NaN
4 Angola AGO Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.708000 6.790000 6.872000 6.954000 7.036000 7.116000 ... 5.600000 5.519000 5.442000 5.371000 5.304000 5.209000 5.124000 5.048000 NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
261 Kosovo XKX Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.359000 6.314000 6.244000 6.176000 6.112000 6.035000 ... 1.618000 1.581000 1.575000 1.567000 1.561000 1.555000 1.545000 1.538000 NaN NaN
262 Yemen, Rep. YEM Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.988000 8.000000 8.010000 8.028000 8.071000 8.101000 ... 4.610000 4.607000 4.603000 4.600000 4.597000 4.593000 4.590000 4.499000 NaN NaN
263 South Africa ZAF Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.105000 6.080000 6.046000 6.012000 5.955000 5.892000 ... 2.283000 2.270000 2.264000 2.257000 2.248000 2.227000 2.216000 2.205000 NaN NaN
264 Zambia ZMB Fertility rate, total (births per woman) SP.DYN.TFRT.IN 6.947000 6.982000 7.012000 7.060000 7.085000 7.116000 ... 4.567000 4.492000 4.418000 4.323000 4.246000 4.175000 4.101000 4.036000 NaN NaN
265 Zimbabwe ZWE Fertility rate, total (births per woman) SP.DYN.TFRT.IN 7.195000 7.212000 7.231000 7.230000 7.230000 7.231000 ... 3.768000 3.744000 3.748000 3.754000 3.765000 3.767000 3.724000 3.674000 NaN NaN

266 rows × 71 columns

Let’s look at the USA (arbitrarily) over time.

usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
Country Name Country Code Indicator Name Indicator Code 1960 1961 1962 1963 1964 1965 ... 2017 2018 2019 2020 2021 2022 2023 2024 2025 Unnamed: 70
251 United States USA Fertility rate, total (births per woman) SP.DYN.TFRT.IN 3.654 3.62 3.461 3.319 3.19 2.913 ... 1.7655 1.7295 1.706 1.6415 1.664 1.6565 1.6165 1.6265 NaN NaN

1 rows × 71 columns

Clean up#

Let’s get rid of columns we don’t need.

cols_to_drop = [
    "Country Name",
    "Country Code",
    "Indicator Name",
    "Indicator Code",
]
# Add any columns with "Unnamed: " in their name
cols_to_drop += [col for col in usa_fertility.columns if "Unnamed: " in col]

usa_fertility = usa_fertility.drop(columns=cols_to_drop)
usa_fertility
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ... 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025
251 3.654 3.62 3.461 3.319 3.19 2.913 2.721 2.558 2.464 2.456 ... 1.8205 1.7655 1.7295 1.706 1.6415 1.664 1.6565 1.6165 1.6265 NaN

1 rows × 66 columns

Too wide! Let’s make it long.

melt()#

fertility_by_year = usa_fertility.melt(
    var_name="Year",
    value_name="Fertility Rate",
)

fertility_by_year
Year Fertility Rate
0 1960 3.6540
1 1961 3.6200
2 1962 3.4610
3 1963 3.3190
4 1964 3.1900
... ... ...
61 2021 1.6640
62 2022 1.6565
63 2023 1.6165
64 2024 1.6265
65 2025 NaN

66 rows × 2 columns

Line chart#

import plotly.express as px
fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="USA fertility rate over time",
)
fig.show()

Chart improvements#

Best practice is to have the Y axis start at zero. Set the range:

max_fertility = fertility_by_year["Fertility Rate"].max()
fig.update_yaxes(range=[0, max_fertility])
fig.show()

All the years showing up at the bottom is a hint that those x values are strings.

fertility_by_year.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Year            66 non-null     object 
 1   Fertility Rate  65 non-null     float64
dtypes: float64(1), object(1)
memory usage: 1.2+ KB
fertility_by_year["Year"] = fertility_by_year["Year"].astype(int)

fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="USA fertility rate over time",
)
fig.update_yaxes(range=[0, max_fertility])
fig.show()

Mapping#

column = "2023"

fig = px.choropleth_map(
    fertility,  # source data
    locations="Country Code",  # column name to match on
    geojson="https://raw.githubusercontent.com/nvkelso/natural-earth-vector/refs/heads/master/geojson/ne_50m_admin_0_countries.geojson",  # shapes
    featureidkey="properties.ADM0_ISO",  # GeoJSON property to match on
    color=column,  # column name for values
    labels={column: "Fertility rate"},  # change the name of the measurement
    title="Worldwide fertility rates in 2023",
    hover_name="Country Name",
    zoom=1,
    height=1000,
)
fig.show()