Lecture 3 demo solution#
Load data#
This downloads data in an automated way; can jump ahead to reading the data.
Create the clean folder for the files.
!mkdir -p tmp
!rm -rf tmp/fertility*
Download the CSV in a ZIP.
!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2025-04-19 17:40:56-- https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 2606:4700:4400::6812:2ae7, 2606:4700:4400::ac40:9119, 104.18.42.231, ...
Connecting to api.worldbank.org (api.worldbank.org)|2606:4700:4400::6812:2ae7|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73012 (71K) [application/zip]
Saving to: ‘tmp/fertility.zip’
tmp/fertility.zip 100%[===================>] 71.30K --.-KB/s in 0.002s
2025-04-19 17:40:56 (36.2 MB/s) - ‘tmp/fertility.zip’ saved [73012/73012]
HTTP request sent, awaiting response...
200 OK
Length: 70659 (69K) [application/zip]
Saving to: ‘tmp/fertility.zip’
tmp/fertility.zip 0%[ ] 0 --.-KB/s
tmp/fertility.zip 100%[===================>] 69.00K --.-KB/s in 0.002s
2025-03-17 00:00:54 (33.0 MB/s) - ‘tmp/fertility.zip’ saved [70659/70659]
!unzip tmp/fertility.zip -d tmp/fertility
Archive: tmp/fertility.zip
inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv
inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv
inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_en_csv_v2_19650.csv
inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv
Remove the exact file version number, since that changes.
!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv
Read data#
import pandas as pd
fertility = pd.read_csv(
"tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
skiprows=3,
)
fertility
Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | Unnamed: 69 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 4.567000 | 4.422000 | 4.262000 | 4.107000 | 3.940000 | 3.797000 | ... | 1.848000 | 1.785000 | 1.732000 | 1.701000 | 1.662000 | 1.631000 | 1.615000 | 1.602000 | NaN | NaN |
1 | Africa Eastern and Southern | AFE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.650330 | 6.667308 | 6.688246 | 6.709226 | 6.724930 | 6.737459 | ... | 4.615744 | 4.569884 | 4.521443 | 4.471338 | 4.412973 | 4.350683 | 4.287033 | 4.223771 | NaN | NaN |
2 | Afghanistan | AFG | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.282000 | 7.284000 | 7.292000 | 7.302000 | 7.304000 | 7.305000 | ... | 5.542000 | 5.433000 | 5.327000 | 5.238000 | 5.145000 | 5.039000 | 4.932000 | 4.840000 | NaN | NaN |
3 | Africa Western and Central | AFW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.468887 | 6.478351 | 6.492277 | 6.500229 | 6.516739 | 6.532766 | ... | 5.228976 | 5.098885 | 4.962571 | 4.829134 | 4.707399 | 4.637741 | 4.563354 | 4.497707 | NaN | NaN |
4 | Angola | AGO | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.708000 | 6.790000 | 6.872000 | 6.954000 | 7.036000 | 7.116000 | ... | 5.686000 | 5.600000 | 5.519000 | 5.442000 | 5.371000 | 5.304000 | 5.209000 | 5.124000 | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
261 | Kosovo | XKX | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.359000 | 6.314000 | 6.244000 | 6.176000 | 6.112000 | 6.035000 | ... | 1.641000 | 1.618000 | 1.581000 | 1.575000 | 1.567000 | 1.561000 | 1.555000 | 1.545000 | NaN | NaN |
262 | Yemen, Rep. | YEM | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.988000 | 8.000000 | 8.010000 | 8.028000 | 8.071000 | 8.101000 | ... | 4.613000 | 4.610000 | 4.607000 | 4.603000 | 4.600000 | 4.597000 | 4.593000 | 4.590000 | NaN | NaN |
263 | South Africa | ZAF | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.105000 | 6.080000 | 6.046000 | 6.012000 | 5.955000 | 5.892000 | ... | 2.261000 | 2.283000 | 2.270000 | 2.264000 | 2.257000 | 2.248000 | 2.227000 | 2.216000 | NaN | NaN |
264 | Zambia | ZMB | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.947000 | 6.982000 | 7.012000 | 7.060000 | 7.085000 | 7.116000 | ... | 4.669000 | 4.567000 | 4.492000 | 4.418000 | 4.323000 | 4.246000 | 4.175000 | 4.101000 | NaN | NaN |
265 | Zimbabwe | ZWE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.195000 | 7.212000 | 7.231000 | 7.230000 | 7.230000 | 7.231000 | ... | 3.828000 | 3.768000 | 3.744000 | 3.748000 | 3.754000 | 3.765000 | 3.767000 | 3.724000 | NaN | NaN |
266 rows × 70 columns
Let’s look at the USA (arbitrarily) over time.
usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | Unnamed: 69 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
251 | United States | USA | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | ... | 1.8205 | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.6565 | 1.6165 | NaN | NaN |
1 rows × 70 columns
Clean up#
Let’s get rid of columns we don’t need.
usa_fertility = usa_fertility.drop(
columns=[
"Country Name",
"Country Code",
"Indicator Name",
"Indicator Code",
"Unnamed: 69",
]
)
usa_fertility
1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
251 | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | 2.721 | 2.558 | 2.464 | 2.456 | ... | 1.8435 | 1.8205 | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.6565 | 1.6165 | NaN |
1 rows × 65 columns
Too wide! Let’s make it long.
melt()
#
fertility_by_year = usa_fertility.melt(
var_name="Year",
value_name="Fertility Rate",
)
fertility_by_year
Year | Fertility Rate | |
---|---|---|
0 | 1960 | 3.6540 |
1 | 1961 | 3.6200 |
2 | 1962 | 3.4610 |
3 | 1963 | 3.3190 |
4 | 1964 | 3.1900 |
... | ... | ... |
60 | 2020 | 1.6415 |
61 | 2021 | 1.6640 |
62 | 2022 | 1.6565 |
63 | 2023 | 1.6165 |
64 | 2024 | NaN |
65 rows × 2 columns
Line chart#
import plotly.express as px
fig = px.line(
fertility_by_year,
x="Year",
y="Fertility Rate",
title="USA fertility rate over time",
)
fig.show()