Lecture 3: Data reshaping and visualization#
“Data visualization”, “chart”, “graph”, and will be used interchangeably.
Please sign attendance sheet
Start by importing necessary packages#
import pandas as pd
import plotly.express as px
Fertility rates#
Jump ahead to line chart
Load data#
Create the clean folder for the files.
!mkdir -p tmp
!rm -rf tmp/fertility*
zsh:1: no matches found: tmp/fertility*
Download the CSV in a ZIP.
!wget -O tmp/fertility.zip -nc 'https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv'
--2025-02-03 14:33:13-- https://api.worldbank.org/v2/en/indicator/SP.DYN.TFRT.IN?downloadformat=csv
Resolving api.worldbank.org (api.worldbank.org)... 2606:4700:4400::6812:2ae7, 2606:4700:4400::ac40:9119, 172.64.145.25, ...
Connecting to api.worldbank.org (api.worldbank.org)|2606:4700:4400::6812:2ae7|:443... connected.
200 OKequest sent, awaiting response...
Length: 70659 (69K) [application/zip]
Saving to: ‘tmp/fertility.zip’
tmp/fertility.zip 100%[===================>] 69.00K --.-KB/s in 0.002s
2025-02-03 14:33:14 (33.6 MB/s) - ‘tmp/fertility.zip’ saved [70659/70659]
!unzip tmp/fertility.zip -d tmp/fertility
Archive: tmp/fertility.zip
inflating: tmp/fertility/Metadata_Indicator_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv
inflating: tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv
inflating: tmp/fertility/Metadata_Country_API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_2145.csv
Remove the exact file version number, since that changes.
!mv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2_*.csv tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv
fertility = pd.read_csv(
"tmp/fertility/API_SP.DYN.TFRT.IN_DS2_EN_csv_v2.csv",
skiprows=3,
)
fertility
Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | Unnamed: 68 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Aruba | ABW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 4.820000 | 4.655000 | 4.471000 | 4.271000 | 4.059000 | 3.842000 | ... | 1.972000 | 1.953000 | 1.839000 | 1.587000 | 1.486000 | 1.325000 | 1.180000 | 1.179000 | NaN | NaN |
1 | Africa Eastern and Southern | AFE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.723226 | 6.742124 | 6.762343 | 6.778133 | 6.787821 | 6.799767 | ... | 4.678619 | 4.616540 | 4.570828 | 4.526445 | 4.480001 | 4.412125 | 4.348564 | 4.281848 | NaN | NaN |
2 | Afghanistan | AFG | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.282000 | 7.284000 | 7.292000 | 7.302000 | 7.304000 | 7.305000 | ... | 5.405000 | 5.262000 | 5.129000 | 5.002000 | 4.870000 | 4.750000 | 4.643000 | 4.523000 | NaN | NaN |
3 | Africa Western and Central | AFW | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.459063 | 6.472099 | 6.492395 | 6.506528 | 6.525529 | 6.541022 | ... | 5.388622 | 5.333061 | 5.260538 | 5.191528 | 5.124368 | 5.055090 | 4.984759 | 4.903347 | NaN | NaN |
4 | Angola | AGO | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.708000 | 6.790000 | 6.872000 | 6.954000 | 7.036000 | 7.116000 | ... | 5.774000 | 5.686000 | 5.600000 | 5.519000 | 5.442000 | 5.371000 | 5.304000 | 5.209000 | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
261 | Kosovo | XKX | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.359000 | 6.350000 | 6.331000 | 6.296000 | 6.126000 | 5.927000 | ... | 1.751000 | 1.658000 | 1.649000 | 1.605000 | 1.546000 | 1.529000 | 1.522000 | 1.510000 | NaN | NaN |
262 | Yemen, Rep. | YEM | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.938000 | 7.963000 | 7.963000 | 7.981000 | 8.038000 | 8.067000 | ... | 4.322000 | 4.214000 | 4.112000 | 4.043000 | 3.963000 | 3.886000 | 3.795000 | 3.716000 | NaN | NaN |
263 | South Africa | ZAF | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 6.159000 | 6.138000 | 6.110000 | 6.077000 | 6.030000 | 5.967000 | ... | 2.359000 | 2.261000 | 2.334000 | 2.418000 | 2.475000 | 2.401000 | 2.374000 | 2.341000 | NaN | NaN |
264 | Zambia | ZMB | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.115000 | 7.169000 | 7.214000 | 7.249000 | 7.274000 | 7.291000 | ... | 4.793000 | 4.707000 | 4.614000 | 4.536000 | 4.451000 | 4.379000 | 4.308000 | 4.242000 | NaN | NaN |
265 | Zimbabwe | ZWE | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 7.220000 | 7.233000 | 7.255000 | 7.253000 | 7.261000 | 7.258000 | ... | 3.849000 | 3.771000 | 3.706000 | 3.659000 | 3.599000 | 3.545000 | 3.491000 | 3.437000 | NaN | NaN |
266 rows × 69 columns
Over time#
Let’s just look at the USA, arbitrarily.
usa_fertility = fertility[fertility["Country Code"] == "USA"]
usa_fertility
Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | Unnamed: 68 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
251 | United States | USA | Fertility rate, total (births per woman) | SP.DYN.TFRT.IN | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | ... | 1.8435 | 1.8205 | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.665 | NaN | NaN |
1 rows × 69 columns
Too wide! Let’s make it long.
Reshaping#
Like pivot tables in spreadsheets
usa_fertility = usa_fertility.drop(
columns=[
"Country Name",
"Country Code",
"Indicator Name",
"Indicator Code",
"Unnamed: 68",
]
)
usa_fertility
1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | ... | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
251 | 3.654 | 3.62 | 3.461 | 3.319 | 3.19 | 2.913 | 2.721 | 2.558 | 2.464 | 2.456 | ... | 1.8625 | 1.8435 | 1.8205 | 1.7655 | 1.7295 | 1.706 | 1.6415 | 1.664 | 1.665 | NaN |
1 rows × 64 columns
melt()
#
fertility_by_year = usa_fertility.melt(
var_name="Year",
value_name="Fertility Rate",
)
fertility_by_year
Year | Fertility Rate | |
---|---|---|
0 | 1960 | 3.6540 |
1 | 1961 | 3.6200 |
2 | 1962 | 3.4610 |
3 | 1963 | 3.3190 |
4 | 1964 | 3.1900 |
... | ... | ... |
59 | 2019 | 1.7060 |
60 | 2020 | 1.6415 |
61 | 2021 | 1.6640 |
62 | 2022 | 1.6650 |
63 | 2023 | NaN |
64 rows × 2 columns
Line chart#
fig = px.line(
fertility_by_year,
x="Year",
y="Fertility Rate",
title="Global fertility rate over time",
)
fig.show()