Lecture 3 demo solution#

The following cell makes the visualizations render properly across VS Code, Jupyter Book, etc. You can ignore it.

import plotly.io as pio

pio.renderers.default = "colab+notebook_connected+plotly_mimetype"

Load data#

This section downloads the data automatically; you can jump ahead to reading the data.

Create a clean folder for the files.

!mkdir -p tmp
!rm -rf tmp/fertility*

Download the CSV, which comes packaged in a ZIP file.

!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'

--2026-04-14 17:29:11--  https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 172.64.145.25, 104.18.42.231
Connecting to api.worldbank.org (api.worldbank.org)|172.64.145.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74530 (73K) [application/zip]
Saving to: ‘tmp/fertility.zip’

tmp/fertility.zip   100%[===================>]  72.78K  --.-KB/s    in 0.007s  

2026-04-14 17:29:12 (10.8 MB/s) - ‘tmp/fertility.zip’ saved [74530/74530]

!unzip tmp/fertility.zip -d tmp/fertility

Archive:  tmp/fertility.zip
  inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  
  inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  
  inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv  

Rename the data file to drop the exact version number, since that changes over time.

!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv
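The unzip and rename steps above can also be approximated in pure Python, which may help where `unzip` isn't installed. This is a minimal sketch rather than part of the original demo: `extract_data_csv` is a hypothetical helper, and it assumes the data file is the one archive member whose name starts with `API_`, as in the listing above.

```python
import zipfile
from pathlib import Path


def extract_data_csv(zip_path, dest_csv):
    """Copy the data CSV out of a World Bank ZIP under a stable file name.

    Hypothetical helper: assumes exactly one member name starts with "API_".
    """
    dest_csv = Path(dest_csv)
    dest_csv.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # The metadata files start with "Metadata_", so they are skipped here.
        data_members = [
            name for name in archive.namelist()
            if Path(name).name.startswith("API_")
        ]
        if len(data_members) != 1:
            raise ValueError(f"expected one data file, found {data_members}")
        dest_csv.write_bytes(archive.read(data_members[0]))
    return dest_csv
```

It could be called as `extract_data_csv("tmp/fertility.zip", "tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv")`. For the download itself, the standard library's `urllib.request.urlretrieve` could stand in for `wget`.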

Read data#

import pandas as pd

fertility = pd.read_csv(
    "tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
    skiprows=3,
)
fertility

	Country Name	Country Code	Indicator Name	Indicator Code	1960	1961	1962	1963	1964	1965	...	2017	2018	2019	2020	2021	2022	2023	2024	2025	Unnamed: 70
0	Aruba	ABW	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	4.567000	4.422000	4.262000	4.107000	3.940000	3.797000	...	1.785000	1.732000	1.701000	1.662000	1.631000	1.615000	1.602000	1.606000	NaN	NaN
1	Africa Eastern and Southern	AFE	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.650310	6.667308	6.688246	6.709226	6.724930	6.737459	...	4.569915	4.521454	4.471351	4.412999	4.350691	4.287080	4.223861	4.164044	NaN	NaN
2	Afghanistan	AFG	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	7.282000	7.284000	7.292000	7.302000	7.304000	7.305000	...	5.433000	5.327000	5.238000	5.145000	5.039000	4.932000	4.840000	4.761000	NaN	NaN
3	Africa Western and Central	AFW	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.468882	6.478345	6.492276	6.500230	6.516739	6.532771	...	5.098890	4.962572	4.829142	4.707405	4.637738	4.563357	4.497714	4.415983	NaN	NaN
4	Angola	AGO	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.708000	6.790000	6.872000	6.954000	7.036000	7.116000	...	5.600000	5.519000	5.442000	5.371000	5.304000	5.209000	5.124000	5.048000	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
261	Kosovo	XKX	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.359000	6.314000	6.244000	6.176000	6.112000	6.035000	...	1.618000	1.581000	1.575000	1.567000	1.561000	1.555000	1.545000	1.538000	NaN	NaN
262	Yemen, Rep.	YEM	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	7.988000	8.000000	8.010000	8.028000	8.071000	8.101000	...	4.610000	4.607000	4.603000	4.600000	4.597000	4.593000	4.590000	4.499000	NaN	NaN
263	South Africa	ZAF	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.105000	6.080000	6.046000	6.012000	5.955000	5.892000	...	2.283000	2.270000	2.264000	2.257000	2.248000	2.227000	2.216000	2.205000	NaN	NaN
264	Zambia	ZMB	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	6.947000	6.982000	7.012000	7.060000	7.085000	7.116000	...	4.567000	4.492000	4.418000	4.323000	4.246000	4.175000	4.101000	4.036000	NaN	NaN
265	Zimbabwe	ZWE	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	7.195000	7.212000	7.231000	7.230000	7.230000	7.231000	...	3.768000	3.744000	3.748000	3.754000	3.765000	3.767000	3.724000	3.674000	NaN	NaN

266 rows × 71 columns

Let’s look at the USA (arbitrarily) over time.

usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility

	Country Name	Country Code	Indicator Name	Indicator Code	1960	1961	1962	1963	1964	1965	...	2017	2018	2019	2020	2021	2022	2023	2024	2025	Unnamed: 70
251	United States	USA	Fertility rate, total (births per woman)	SP.DYN.TFRT.IN	3.654	3.62	3.461	3.319	3.19	2.913	...	1.7655	1.7295	1.706	1.6415	1.664	1.6565	1.6165	1.6265	NaN	NaN

1 rows × 71 columns

Clean up#

Let’s get rid of columns we don’t need.

cols_to_drop = [
    "Country Name",
    "Country Code",
    "Indicator Name",
    "Indicator Code",
]
# Add any columns with "Unnamed: " in their name
cols_to_drop += [col for col in usa_fertility.columns if "Unnamed: " in col]

usa_fertility = usa_fertility.drop(columns=cols_to_drop)
usa_fertility

	1960	1961	1962	1963	1964	1965	1966	1967	1968	1969	...	2016	2017	2018	2019	2020	2021	2022	2023	2024	2025
251	3.654	3.62	3.461	3.319	3.19	2.913	2.721	2.558	2.464	2.456	...	1.8205	1.7655	1.7295	1.706	1.6415	1.664	1.6565	1.6165	1.6265	NaN

1 rows × 66 columns

Too wide! Let’s make it long.

`melt()`#

fertility_by_year = usa_fertility.melt(
    var_name="Year",
    value_name="Fertility Rate",
)

fertility_by_year

	Year	Fertility Rate
0	1960	3.6540
1	1961	3.6200
2	1962	3.4610
3	1963	3.3190
4	1964	3.1900
...	...	...
61	2021	1.6640
62	2022	1.6565
63	2023	1.6165
64	2024	1.6265
65	2025	NaN

66 rows × 2 columns
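To make the reshaping concrete, here is a small sketch on made-up numbers (the `wide` frame and its values are illustrative, not taken from the World Bank data). With no `id_vars` given, `melt()` turns every column into its own row.

```python
import pandas as pd

# One row, one column per year: the same "wide" shape as usa_fertility.
wide = pd.DataFrame({"1960": [3.5], "1961": [3.4], "1962": [3.3]})

long = wide.melt(var_name="Year", value_name="Fertility Rate")

# Each former column header is now a value in the "Year" column.
print(long)
```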

Line chart#

import plotly.express as px

fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="USA fertility rate over time",
)
fig.show()

Chart improvements#

It is best practice to have the Y axis start at zero, so set the range explicitly:

max_fertility = fertility_by_year["Fertility Rate"].max()
fig.update_yaxes(range=[0, max_fertility])
fig.show()
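Because the upper bound above is exactly the maximum value, the highest point sits right on the top edge of the plot. One option is to add a little headroom. The sketch below uses made-up values, and the 5% padding is an arbitrary choice.

```python
import pandas as pd

# Made-up values, including a missing one like the 2025 entry.
rates = pd.Series([3.6, 2.9, 1.8, 1.6, None], name="Fertility Rate")

# max() skips NaN by default, so the missing value does not affect the bound.
padding = 0.05  # arbitrary 5% headroom above the highest point
y_range = [0, rates.max() * (1 + padding)]
print(y_range)

# This could then be passed to the chart with fig.update_yaxes(range=y_range).
```

Plotly's `rangemode="tozero"` axis setting is another way to include zero while letting Plotly choose the upper bound.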

Every year showing up as its own label along the bottom is a hint that those x values are strings. `info()` confirms it:

fertility_by_year.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Year            66 non-null     object 
 1   Fertility Rate  65 non-null     float64
dtypes: float64(1), object(1)
memory usage: 1.2+ KB

# Year has dtype "object" (text) above; convert it to integers so the x axis is numeric
fertility_by_year["Year"] = fertility_by_year["Year"].astype(int)
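The year labels come out of `melt()` as strings because they were column names in the CSV. A small sketch, on made-up data, of checking the type before and after the conversion:

```python
import pandas as pd

years = pd.DataFrame({"Year": ["1960", "1961", "1962"]})

# Column names read from a CSV header are text, so these are not numbers yet.
print(pd.api.types.is_numeric_dtype(years["Year"]))  # False

years["Year"] = years["Year"].astype(int)
print(pd.api.types.is_numeric_dtype(years["Year"]))  # True
```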

fig = px.line(
    fertility_by_year,
    x="Year",
    y="Fertility Rate",
    title="USA fertility rate over time",
)
fig.update_yaxes(range=[0, max_fertility])
fig.show()

Mapping#

column = "2023"

fig = px.choropleth_map(
    fertility,  # source data
    locations="Country Code",  # column name to match on
    geojson="https://raw.githubusercontent.com/nvkelso/natural-earth-vector/refs/heads/master/geojson/ne_50m_admin_0_countries.geojson",  # shapes
    featureidkey="properties.ADM0_ISO",  # GeoJSON property to match on
    color=column,  # column name for values
    labels={column: "Fertility rate"},  # change the name of the measurement
    title="Worldwide fertility rates in 2023",
    hover_name="Country Name",
    zoom=1,
    height=1000,
)
fig.show()
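The `locations` and `featureidkey` arguments describe a join between table rows and GeoJSON shapes. The following is only a conceptual sketch of that matching on a made-up two-feature GeoJSON. `lookup` and `matched_codes` are hypothetical helpers, and this is not a description of how Plotly implements the match internally.

```python
def lookup(feature, dotted_key):
    """Follow a dotted path such as "properties.ADM0_ISO" into a feature."""
    value = feature
    for part in dotted_key.split("."):
        value = value[part]
    return value


def matched_codes(codes, geojson, featureidkey):
    """Return the codes that have a shape with the same feature ID."""
    feature_ids = {lookup(f, featureidkey) for f in geojson["features"]}
    return [code for code in codes if code in feature_ids]


# Made-up GeoJSON with geometry omitted for brevity.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ADM0_ISO": "USA"}, "geometry": None},
        {"type": "Feature", "properties": {"ADM0_ISO": "ZAF"}, "geometry": None},
    ],
}

# "AFE" is a regional aggregate in the data, so the toy shapes have no match for it.
print(matched_codes(["USA", "AFE", "ZAF"], toy_geojson, "properties.ADM0_ISO"))
```

Rows whose code matches no feature, such as the regional aggregates, would not be expected to appear on the map.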