Lecture 3 demo solution#
Ensure the visualizations render properly across VSCode, Jupyter Book, etc. You can ignore this.
import plotly.io as pio
pio.renderers.default = "colab+notebook_connected+plotly_mimetype"
Load data#
This downloads data in an automated way; can jump ahead to reading the data.
Create the clean folder for the files.
!mkdir -p tmp
!rm -rf tmp/fertility*
Download the CSV in a ZIP.
!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2026-04-14 17:29:11-- https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 172.64.145.25, 104.18.42.231
Connecting to api.worldbank.org (api.worldbank.org)|172.64.145.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 74530 (73K) [application/zip]
Saving to: ‘tmp/fertility.zip’
tmp/fertility.zip 100%[===================>] 72.78K --.-KB/s in 0.007s
2026-04-14 17:29:12 (10.8 MB/s) - ‘tmp/fertility.zip’ saved [74530/74530]
!unzip tmp/fertility.zip -d tmp/fertility
Archive: tmp/fertility.zip
inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv
inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv
inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_114.csv
Remove the exact file version number, since that changes.
!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv
Read data#
import pandas as pd
fertility = pd.read_csv(
"tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
skiprows=3,
)
fertility
| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | Unnamed: 70 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 4.567000 | 4.422000 | 4.262000 | 4.107000 | 3.940000 | 3.797000 | ... | 1.785000 | 1.732000 | 1.701000 | 1.662000 | 1.631000 | 1.615000 | 1.602000 | 1.606000 | NaN | NaN |
| 1 | Africa Eastern and Southern | AFE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.650310 | 6.667308 | 6.688246 | 6.709226 | 6.724930 | 6.737459 | ... | 4.569915 | 4.521454 | 4.471351 | 4.412999 | 4.350691 | 4.287080 | 4.223861 | 4.164044 | NaN | NaN |
| 2 | Afghanistan | AFG | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.282000 | 7.284000 | 7.292000 | 7.302000 | 7.304000 | 7.305000 | ... | 5.433000 | 5.327000 | 5.238000 | 5.145000 | 5.039000 | 4.932000 | 4.840000 | 4.761000 | NaN | NaN |
| 3 | Africa Western and Central | AFW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.468882 | 6.478345 | 6.492276 | 6.500230 | 6.516739 | 6.532771 | ... | 5.098890 | 4.962572 | 4.829142 | 4.707405 | 4.637738 | 4.563357 | 4.497714 | 4.415983 | NaN | NaN |
| 4 | Angola | AGO | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.708000 | 6.790000 | 6.872000 | 6.954000 | 7.036000 | 7.116000 | ... | 5.600000 | 5.519000 | 5.442000 | 5.371000 | 5.304000 | 5.209000 | 5.124000 | 5.048000 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 261 | Kosovo | XKX | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.359000 | 6.314000 | 6.244000 | 6.176000 | 6.112000 | 6.035000 | ... | 1.618000 | 1.581000 | 1.575000 | 1.567000 | 1.561000 | 1.555000 | 1.545000 | 1.538000 | NaN | NaN |
| 262 | Yemen, Rep. | YEM | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.988000 | 8.000000 | 8.010000 | 8.028000 | 8.071000 | 8.101000 | ... | 4.610000 | 4.607000 | 4.603000 | 4.600000 | 4.597000 | 4.593000 | 4.590000 | 4.499000 | NaN | NaN |
| 263 | South Africa | ZAF | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.105000 | 6.080000 | 6.046000 | 6.012000 | 5.955000 | 5.892000 | ... | 2.283000 | 2.270000 | 2.264000 | 2.257000 | 2.248000 | 2.227000 | 2.216000 | 2.205000 | NaN | NaN |
| 264 | Zambia | ZMB | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.947000 | 6.982000 | 7.012000 | 7.060000 | 7.085000 | 7.116000 | ... | 4.567000 | 4.492000 | 4.418000 | 4.323000 | 4.246000 | 4.175000 | 4.101000 | 4.036000 | NaN | NaN |
| 265 | Zimbabwe | ZWE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.195000 | 7.212000 | 7.231000 | 7.230000 | 7.230000 | 7.231000 | ... | 3.768000 | 3.744000 | 3.748000 | 3.754000 | 3.765000 | 3.767000 | 3.724000 | 3.674000 | NaN | NaN |
266 rows × 71 columns
Let’s look at the USA (arbitrarily) over time.
usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
| Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | Unnamed: 70 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 251 | United States | USA | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | ... | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.6565 | 1.6165 | 1.6265 | NaN | NaN |
1 rows × 71 columns
Clean up#
Let’s get rid of columns we don’t need.
cols_to_drop = [
"Country Name",
"Country Code",
"Indicator Name",
"Indicator Code",
]
# Add any columns with "Unnamed: " in their name
cols_to_drop += [col for col in usa_fertility.columns if "Unnamed: " in col]
usa_fertility = usa_fertility.drop(columns=cols_to_drop)
usa_fertility
| 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | ... | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 251 | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | 2.721 | 2.558 | 2.464 | 2.456 | ... | 1.8205 | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.6565 | 1.6165 | 1.6265 | NaN |
1 rows × 66 columns
Too wide! Let’s make it long.
melt()#
fertility_by_year = usa_fertility.melt(
var_name="Year",
value_name="Fertility Rate",
)
fertility_by_year
| Year | Fertility Rate | |
|---|---|---|
| 0 | 1960 | 3.6540 |
| 1 | 1961 | 3.6200 |
| 2 | 1962 | 3.4610 |
| 3 | 1963 | 3.3190 |
| 4 | 1964 | 3.1900 |
| ... | ... | ... |
| 61 | 2021 | 1.6640 |
| 62 | 2022 | 1.6565 |
| 63 | 2023 | 1.6165 |
| 64 | 2024 | 1.6265 |
| 65 | 2025 | NaN |
66 rows × 2 columns
Line chart#
import plotly.express as px
fig = px.line(
fertility_by_year,
x="Year",
y="Fertility Rate",
title="USA fertility rate over time",
)
fig.show()
Chart improvements#
Best practice is to have the Y axis start at zero. Set the range:
max_fertility = fertility_by_year["Fertility Rate"].max()
fig.update_yaxes(range=[0, max_fertility])
fig.show()
All the years showing up at the bottom is a hint that those x values are strings.
fertility_by_year.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 66 entries, 0 to 65
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 66 non-null object
1 Fertility Rate 65 non-null float64
dtypes: float64(1), object(1)
memory usage: 1.2+ KB
fertility_by_year["Year"] = fertility_by_year["Year"].astype(int)
fig = px.line(
fertility_by_year,
x="Year",
y="Fertility Rate",
title="USA fertility rate over time",
)
fig.update_yaxes(range=[0, max_fertility])
fig.show()
Mapping#
column = "2023"
fig = px.choropleth_map(
fertility, # source data
locations="Country Code", # column name to match on
geojson="https://raw.githubusercontent.com/nvkelso/natural-earth-vector/refs/heads/master/geojson/ne_50m_admin_0_countries.geojson", # shapes
featureidkey="properties.ADM0_ISO", # GeoJSON property to match on
color=column, # column name for values
labels={column: "Fertility rate"}, # change the name of the measurement
title="Worldwide fertility rates in 2023",
hover_name="Country Name",
zoom=1,
height=1000,
)
fig.show()