initial commit
This commit is contained in:
commit
cceb1d1ec0
6 changed files with 5987 additions and 0 deletions
2052
data/nuclear_explosions.csv
Normal file
2052
data/nuclear_explosions.csv
Normal file
File diff suppressed because it is too large
Load diff
54
notebooks/groupby_keep_zero-values.qmd
Normal file
54
notebooks/groupby_keep_zero-values.qmd
Normal file
|
@ -0,0 +1,54 @@
|
|||
<!-- TODO: Load missing data from main nuclear_explosions.qmd notebook -->
|
||||
|
||||
The following is a simple groupby, counting the len of country rows per date:
|
||||
|
||||
```{python}
|
||||
# | label: fig-percountry-drop
|
||||
# | fig-cap: "Nuclear explosions by country, 1945-98"
|
||||
per_country = df.group_by(pl.col("date", "country")).agg(pl.len()).sort("date")
|
||||
|
||||
g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
|
||||
g.set_xlabel("Year")
|
||||
g.set_ylabel("Count")
|
||||
plt.setp(
|
||||
g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
|
||||
) # ensure rotated right-anchor
|
||||
plt.show()
|
||||
```
|
||||
|
||||
This works well to group generally, but there is an issue:
|
||||
If there is a year where a country did not have any entries at all,
|
||||
the resulting df will not have `Date | Cty | 0` but instead will not have an entry at all.
|
||||
|
||||
This can be desirable for some applications, but for example if we then
|
||||
draw a line plot based on this it would interpolate between the
|
||||
country values and **not drop the line down to 0 for the years where a country does not have an entry**.
|
||||
|
||||
We can fix it by first doing a cross product of all keys we always want to have a row for.
|
||||
Then we do the group by but supply it to a left-join on this cross product.
|
||||
|
||||
End result is we keep all the rows from the cross-product, but we still aggregate and have a len
|
||||
column as before. For those where we don't have a len value we finally just fill in a 0 instead.
|
||||
|
||||
```{python}
|
||||
# | label: fig-percountry-keep
|
||||
# | fig-cap: "Nuclear explosions by country, 1945-98"
|
||||
keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
|
||||
per_country = keys.join(
|
||||
df.group_by(["date", "country"], maintain_order=True).len(),
|
||||
on=["date", "country"],
|
||||
how="left",
|
||||
coalesce=True,
|
||||
).with_columns(pl.col("len").fill_null(0))
|
||||
|
||||
g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
|
||||
g.set_xlabel("Year")
|
||||
g.set_ylabel("Count")
|
||||
plt.setp(
|
||||
g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
|
||||
) # ensure rotated right-anchor
|
||||
plt.show()
|
||||
```
|
||||
|
||||
A more nicely function-based solution (though using the same solution under the hood) can be found
|
||||
here: https://github.com/pola-rs/polars/issues/15997#issuecomment-2089362557
|
75
notebooks/sns_objects-style.qmd
Normal file
75
notebooks/sns_objects-style.qmd
Normal file
|
@ -0,0 +1,75 @@
|
|||
|
||||
constructed with seaborn object-style plots instead.
|
||||
These kind of plots are much more structured for the workflow I use and the way I think about plotting,
|
||||
clearly delineating between a plot;
|
||||
some visual on the plot;
|
||||
some statistical transformation;
|
||||
some movement, labeling or scaling operation.
|
||||
They are also, however, fairly new and still considered experimental.
|
||||
|
||||
They also don't allow *quite* the customization that the other plots do,
|
||||
and seem either a little buggy or I have not fully understood them yet in regards to ticks and labels.
|
||||
|
||||
```{python}
|
||||
# | label: fig-groundlevel-so
|
||||
# | fig-cap: "Nuclear explosions, 1945-98"
|
||||
import seaborn.objects as so
|
||||
import matplotlib.dates as mdates
|
||||
|
||||
above_cat = pl.Series(
|
||||
[
|
||||
"ATMOSPH",
|
||||
"AIRDROP",
|
||||
"TOWER",
|
||||
"BALLOON",
|
||||
"SURFACE",
|
||||
"BARGE",
|
||||
"ROCKET",
|
||||
"SPACE",
|
||||
"SHIP",
|
||||
"WATERSUR",
|
||||
"WATER SU",
|
||||
]
|
||||
)
|
||||
df_groundlevel = (
|
||||
df.with_columns(
|
||||
above_ground=pl.col("type").map_elements(
|
||||
lambda x: True if x in above_cat else False, return_dtype=bool
|
||||
))
|
||||
.group_by(pl.col("year", "country", "above_ground"))
|
||||
.agg(count=pl.len())
|
||||
.sort("year")
|
||||
)
|
||||
|
||||
fig, ax = plt.subplots()
|
||||
ax.xaxis.set_tick_params(rotation=90)
|
||||
|
||||
from seaborn import axes_style
|
||||
p = (
|
||||
so.Plot(df_groundlevel, x="year", y="count", color="country")
|
||||
.add(
|
||||
so.Bars(),
|
||||
so.Stack(),
|
||||
data=df_groundlevel.filter(pl.col("above_ground") == True).sort("country"),
|
||||
)
|
||||
.add(
|
||||
so.Bars(),
|
||||
so.Stack(),
|
||||
data=df_groundlevel.filter(pl.col("above_ground") == False).with_columns(
|
||||
count=pl.col("count") * -1
|
||||
).sort("country"),
|
||||
)
|
||||
.label(x="Year", y="Count")
|
||||
.scale(
|
||||
x=so.Continuous().tick(locator=mdates.YearLocator(base=5), minor=4).label(like="{x:.0f}"),
|
||||
# x=so.Nominal().tick(locator=mdates.YearLocator(base=5), minor=4), # this might work in the future
|
||||
)
|
||||
.theme({
|
||||
**axes_style("darkgrid"),
|
||||
"xtick.bottom": True,
|
||||
"ytick.left": True
|
||||
})
|
||||
.on(ax)
|
||||
.plot()
|
||||
)
|
||||
```
|
382
nuclear_explosions.qmd
Normal file
382
nuclear_explosions.qmd
Normal file
|
@ -0,0 +1,382 @@
|
|||
---
|
||||
title: Nuclear Explosions
|
||||
author: Marty Oehme
|
||||
output-dir: out
|
||||
references:
|
||||
- type: techreport
|
||||
id: Bergkvist2000
|
||||
author:
|
||||
- family: Bergkvist
|
||||
given: Nils-Olov
|
||||
- family: Ferm
|
||||
given: Ragnhild
|
||||
issued:
|
||||
date-parts:
|
||||
- - 2000
|
||||
- 7
|
||||
- 1
|
||||
title: "Nuclear Explosions 1945 - 1998"
|
||||
page: 1-42
|
||||
issn: 1104-9154
|
||||
format:
|
||||
html:
|
||||
toc: true
|
||||
code-fold: true
|
||||
typst:
|
||||
toc: true
|
||||
echo: false
|
||||
docx:
|
||||
toc: true
|
||||
echo: false
|
||||
---
|
||||
|
||||
```{python}
|
||||
#| label: setup
|
||||
#| echo: false
|
||||
import matplotlib.dates as mdates
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
import polars as pl
|
||||
import polars.selectors as cs
|
||||
import seaborn as sns
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
sns.set_theme(style="darkgrid")
|
||||
sns.set_context("notebook")
|
||||
|
||||
schema_overrides = (
|
||||
{
|
||||
col: pl.Categorical
|
||||
for col in ["type", "name", "purpose", "country", "source", "region"]
|
||||
}
|
||||
| {col: pl.Float64 for col in ["magnitude_body", "magnitude_surface"]}
|
||||
| {col: pl.String for col in ["year", "name"]}
|
||||
)
|
||||
|
||||
df = (
|
||||
pl.read_csv(
|
||||
"data/nuclear_explosions.csv",
|
||||
schema_overrides=schema_overrides,
|
||||
null_values=["NA"],
|
||||
)
|
||||
.with_columns(date=pl.col("year").str.strptime(pl.Date, "%Y"))
|
||||
.with_columns(year=pl.col("date").dt.year().cast(pl.Int32))
|
||||
)
|
||||
```
|
||||
|
||||
## Introduction
|
||||
|
||||
The following is a re-creation and expansion of some of the graphs found in the
|
||||
@Bergkvist2000 produced report on nuclear explosions between 1945 and 1998. It
|
||||
is primarily a reproduction of key plots from the original report.
|
||||
Additionally, it serves as a exercise in plotting with the python library
|
||||
seaborn and the underlying matplotlib. Lastly, it approaches some less well
|
||||
tread territory for data science in the python universe as it uses the python
|
||||
library polars-rs for data loading and transformation. All the code used to
|
||||
transform the data and create the plots is available directly within the full
|
||||
text document, and separately as well. PDF and Docx formats are available with
|
||||
the plotting results only.
|
||||
|
||||
Their original purpose was the collection of a long list of all the nuclear
|
||||
explosions occurring between those years, as well as analysing the responsible
|
||||
nations, tracking the types and purposes of the explosions, as well as
|
||||
connecting the rise and fall of nuclear explosion numbers to historical events
|
||||
throughout.
|
||||
|
||||
## Total numbers
|
||||
|
||||
::: {.callout-note}
|
||||
|
||||
## Nuclear devices
|
||||
|
||||
There are two main kinds of nuclear device: those based entirely, on fission,
|
||||
or the splitting of heavy atomic nucleii (previously known as atomic devices)
|
||||
and those in which the main energy is obtained by means of fusion, or of -light
|
||||
atomic nucleii (hydrogen or thermonuclear devices). A fusion explosion must
|
||||
however be initiated with the help of a fission device. The strength of a
|
||||
fusion explosion can be practically unlimited. The explosive power of a
|
||||
nuclear explosion is expressed in ktlotons, (kt) or megatons (Mt), which
|
||||
correspond to 1000 and i million'tonnes, of conventional explosive (TNT),
|
||||
respectively.
|
||||
|
||||
[@Bergkvist2000, 6]
|
||||
:::
|
||||
|
||||
We begin by investigating a table containing all the absolute counts and yields
|
||||
each country had explode, seen in @tbl-yields.
|
||||
|
||||
```{python}
|
||||
# | label: tbl-yields
|
||||
# | tbl-cap: "Total number and yields of explosions"
|
||||
|
||||
from great_tables import GT, md
|
||||
|
||||
df_yields = (
|
||||
df.select(["country", "id_no", "yield_lower", "yield_upper"])
|
||||
.with_columns(yield_avg=pl.mean_horizontal(pl.col(["yield_lower", "yield_upper"])))
|
||||
.group_by("country")
|
||||
.agg(
|
||||
pl.col("id_no").len().alias("count"),
|
||||
pl.col("yield_avg").sum(),
|
||||
)
|
||||
# .with_columns(country=pl.col("country").cast(pl.String).str.to_titlecase())
|
||||
.sort("count", descending=True)
|
||||
)
|
||||
|
||||
(
|
||||
GT(df_yields)
|
||||
.tab_source_note(
|
||||
source_note="Source: Author's elaboration based on Bergkvist and Ferm (2000)."
|
||||
)
|
||||
.tab_spanner(label="Totals", columns=["count", "yield_avg"])
|
||||
.tab_stub(rowname_col="country")
|
||||
.tab_stubhead(label="Country")
|
||||
.cols_label(
|
||||
count="Count",
|
||||
yield_avg="Yield in kt",
|
||||
)
|
||||
.fmt_integer(columns="count")
|
||||
.fmt_number(columns="yield_avg", decimals=1)
|
||||
)
|
||||
```
|
||||
|
||||
## Numbers over time
|
||||
|
||||
When investigating the nuclear explosions in the world, let us first start by
|
||||
looking at how many explosions occurred each year in total. This hides the
|
||||
specific details of who was responsible and which types were involved but
|
||||
instead paints a much stronger picture of the overall dimension of nuclear
|
||||
testing, as can be seen in @fig-total.
|
||||
|
||||
```{python}
|
||||
# | label: fig-total
|
||||
# | fig-cap: "Total Nuclear explosions, 1945-98"
|
||||
per_year = df.group_by(pl.col("year")).agg(pl.len()).sort("year")
|
||||
with sns.axes_style(
|
||||
"darkgrid", {"xtick.bottom": True, "ytick.left": True}
|
||||
):
|
||||
g = sns.barplot(data=per_year, x="year", y="len", order=range(1945, 1999), width=1)
|
||||
g.set_xlabel("Year")
|
||||
g.set_ylabel("Count")
|
||||
plt.setp(
|
||||
g.get_xticklabels(),
|
||||
rotation=90,
|
||||
ha="right",
|
||||
va="center",
|
||||
rotation_mode="anchor",
|
||||
) # ensure rotated right-anchor
|
||||
g.set_xticks(g.get_xticks(), minor=True) # enable minor ticks every entry
|
||||
g.set_xticks(g.get_xticks()[::2]) # enable major ticks every 2nd entry
|
||||
plt.show()
|
||||
del per_year
|
||||
```
|
||||
|
||||
As we can see, the numbers of explosions rise increasingly towards 1957 and
|
||||
sharply until 1958, before dropping off for a year in 1959. The reasons for
|
||||
this drop are not entirely clear, but it is very likely that the data are
|
||||
simply missing for these years.
|
||||
<!-- FIXME: The reasons for this are a non-proliferation pact, in article -->
|
||||
|
||||
<!-- TODO: Extract exact numbers from data on-the-fly -->
|
||||
There is another, very steep, rise in 1962 with over 175 recorded explosions,
|
||||
before an even sharper drop-off the following year down to just 50 explosions.
|
||||
|
||||
Afterwards the changes appear less sharp and the changes remain between 77 and
|
||||
24 explosions per year, with a slight downward tendency.
|
||||
|
||||
While these numbers show the overall proliferation of nuclear power, let us now
|
||||
instead turn to the contributions by individual countries. A split in the number
|
||||
of explosions over time by country can be seen in @fig-percountry.
|
||||
|
||||
```{python}
|
||||
# | label: fig-percountry
|
||||
# | fig-cap: "Nuclear explosions by country, 1945-98"
|
||||
keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
|
||||
per_country = keys.join(
|
||||
df.group_by(["date", "country"], maintain_order=True).len(),
|
||||
on=["date", "country"],
|
||||
how="left",
|
||||
coalesce=True,
|
||||
).with_columns(pl.col("len").fill_null(0))
|
||||
|
||||
g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
|
||||
g.set_xlabel("Year")
|
||||
g.set_ylabel("Count")
|
||||
plt.setp(
|
||||
g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
|
||||
) # ensure rotated right-anchor
|
||||
plt.show()
|
||||
del per_country
|
||||
```
|
||||
|
||||
Once again we can see the visibly steep ramp-up to 1962, though it becomes
|
||||
clear that this was driven both by the USSR and the US. Of course the graph
|
||||
also makes visible the sheer unmatched number of explosions emenating from both
|
||||
of the countries, with only France catching up to the US numbers and China
|
||||
ultimately overtaking them in the 1990s.
|
||||
|
||||
However, here it also becomes more clear how the UK was responsible for some
|
||||
early explosions in the late 1950s and early 1960s already, as well as the rise
|
||||
in France's nuclear testing from the early 1960s onwards to around 1980, before
|
||||
slowly decreasing in intensity afterwards.
|
||||
|
||||
Let us turn to a cross-cut through the explosions in @fig-groundlevel, focusing
|
||||
on the number of explosions that have occurred underground and above-ground
|
||||
respectively.[^aboveground]
|
||||
|
||||
[^aboveground]: Detonations counted as above ground are made up of atmospheric,
|
||||
airdrop, tower, balloon, barge or ship, rocket and water surface detonations.
|
||||
Any other detonation is counted as below ground, primarily taking place in
|
||||
tunnels, shafts and galleries.
|
||||
|
||||
```{python}
|
||||
# | label: fig-groundlevel
|
||||
# | fig-cap: "Nuclear explosions above and below ground, 1945-98"
|
||||
from polars import Boolean
|
||||
|
||||
above_cat = pl.Series(
|
||||
[
|
||||
"ATMOSPH",
|
||||
"AIRDROP",
|
||||
"TOWER",
|
||||
"BALLOON",
|
||||
"SURFACE",
|
||||
"BARGE",
|
||||
"ROCKET",
|
||||
"SPACE",
|
||||
"SHIP",
|
||||
"WATERSUR",
|
||||
"WATER SU",
|
||||
]
|
||||
)
|
||||
df_groundlevel = (
|
||||
df.with_columns(
|
||||
above_ground=pl.col("type").map_elements(
|
||||
lambda x: True if x in above_cat else False, return_dtype=Boolean
|
||||
)
|
||||
)
|
||||
.group_by(pl.col("date", "country", "above_ground"))
|
||||
.agg(count=pl.len())
|
||||
.sort("date")
|
||||
)
|
||||
|
||||
with sns.axes_style("darkgrid", {"xtick.bottom": True, "ytick.left": True}):
|
||||
for above_ground in [True, False]:
|
||||
g = sns.histplot(
|
||||
data=df_groundlevel.filter(
|
||||
pl.col("above_ground") == above_ground
|
||||
).with_columns(
|
||||
count=pl.col("count") * (1 if above_ground else -1),
|
||||
),
|
||||
x="date",
|
||||
weights="count",
|
||||
hue="country",
|
||||
multiple="stack",
|
||||
binwidth=365,
|
||||
)
|
||||
|
||||
g.xaxis.set_major_locator(mdates.YearLocator(base=5))
|
||||
g.xaxis.set_minor_locator(mdates.YearLocator())
|
||||
plt.setp(
|
||||
g.get_xticklabels(), rotation=90, ha="right", va="top", rotation_mode="anchor"
|
||||
)
|
||||
# FIXME get dynamic range for yticks instead of hardcoding
|
||||
g.set_yticks(np.arange(-130, 140, 20))
|
||||
g.set_yticks(np.arange(-130, 140, 10), minor=True)
|
||||
plt.show()
|
||||
del df_groundlevel
|
||||
```
|
||||
|
||||
This plot paints a different picture yet again: while overall the number of
|
||||
explosions still rise and fall with some early sharp spikes, we can see a clear
|
||||
shift from above-ground to underground tests, starting with the year 1962.
|
||||
|
||||
## Locations
|
||||
|
||||
Finally, let's view a map of the world with the explosions marked.
|
||||
|
||||
```{python}
|
||||
# | label: fig-worldmap
|
||||
# | fig-cap: "World map of nuclear explosions, 1945-98"
|
||||
import folium
|
||||
import geopandas as gpd
|
||||
from shapely.geometry import Point
|
||||
|
||||
def set_style() -> pl.Expr:
|
||||
return (
|
||||
pl.when(pl.col("country") == "USSR")
|
||||
.then(pl.lit({"color": "red"}, allow_object=True))
|
||||
.otherwise(pl.lit({"color": "blue"}, allow_object=True))
|
||||
)
|
||||
|
||||
|
||||
geom = [Point(xy) for xy in zip(df["longitude"], df["latitude"])]
|
||||
# df_pd = df.with_columns(style=set_style()).to_pandas().set_index("date")
|
||||
df_pd = df.with_columns().to_pandas().set_index("date")
|
||||
gdf = gpd.GeoDataFrame(
|
||||
df_pd,
|
||||
crs="EPSG:4326",
|
||||
geometry=gpd.points_from_xy(x=df_pd["longitude"], y=df_pd["latitude"]),
|
||||
)
|
||||
del df_pd
|
||||
|
||||
country_colors = {
|
||||
"USA": "darkblue",
|
||||
"USSR": "darkred",
|
||||
"FRANCE": "pink",
|
||||
"UK": "black",
|
||||
"CHINA": "purple",
|
||||
"INDIA": "orange",
|
||||
"PAKIST": "green",
|
||||
}
|
||||
|
||||
m = folium.Map(tiles="cartodb positron")
|
||||
for country in country_colors.keys():
|
||||
fg = folium.FeatureGroup(name=country, show=True).add_to(m)
|
||||
folium.GeoJson(
|
||||
gdf[gdf["country"].str.contains(country)],
|
||||
name="Nuclear Explosions",
|
||||
marker=folium.Circle(radius=3, fill_opacity=0.4),
|
||||
style_function=lambda x: {
|
||||
"color": country_colors[x["properties"]["country"]],
|
||||
"radius": (
|
||||
x["properties"]["magnitude_body"]
|
||||
if x["properties"]["magnitude_body"] > 0
|
||||
else 1.0
|
||||
)
|
||||
* 10,
|
||||
},
|
||||
tooltip=folium.GeoJsonTooltip(fields=["year", "country", "type"]),
|
||||
highlight_function=lambda x: {"fillOpacity": 0.8},
|
||||
popup=folium.GeoJsonPopup(
|
||||
fields=[
|
||||
"year",
|
||||
"country",
|
||||
"region",
|
||||
"source",
|
||||
"latitude",
|
||||
"longitude",
|
||||
"magnitude_body",
|
||||
"magnitude_surface",
|
||||
"depth",
|
||||
"yield_lower",
|
||||
"yield_upper",
|
||||
"purpose",
|
||||
"name",
|
||||
"type",
|
||||
]
|
||||
),
|
||||
).add_to(fg)
|
||||
folium.LayerControl().add_to(m)
|
||||
m
|
||||
```
|
||||
|
||||
That is all for now.
|
||||
There are undoubtedly more explorations to undertake,
|
||||
but this is it for the time being.
|
||||
|
||||
<!-- Ideas TODO:
|
||||
- do not just use 'count' of explosions but yields
|
||||
- compare number to yields for ctrys
|
||||
- count up total number per country in table
|
||||
-->
|
3392
poetry.lock
generated
Normal file
3392
poetry.lock
generated
Normal file
File diff suppressed because it is too large
Load diff
32
pyproject.toml
Normal file
32
pyproject.toml
Normal file
|
@ -0,0 +1,32 @@
|
|||
[tool.poetry]
|
||||
name = "nuclear-explosions"
|
||||
version = "0.1.0"
|
||||
description = ""
|
||||
authors = ["Marty Oehme <marty.oehme@gmail.com>"]
|
||||
readme = "README.md"
|
||||
package-mode = false
|
||||
|
||||
[tool.poetry.dependencies]
|
||||
python = "^3.12"
|
||||
polars = "^0.20.31"
|
||||
seaborn = "^0.13.2"
|
||||
pipefunc = "^0.18.1"
|
||||
pyarrow = "^16.1.0"
|
||||
great-tables = "^0.9.0"
|
||||
geopandas = "^0.14.4"
|
||||
folium = "^0.17.0"
|
||||
|
||||
|
||||
[tool.poetry.group.dev.dependencies]
|
||||
pynvim = "^0.5.0"
|
||||
pyperclip = "^1.8.2"
|
||||
jupyter-client = "^8.6.2"
|
||||
jupyter = "^1.0.0"
|
||||
|
||||
[build-system]
|
||||
requires = ["poetry-core"]
|
||||
build-backend = "poetry.core.masonry.api"
|
||||
|
||||
[tool.pyright]
|
||||
typeCheckingMode = "basic"
|
||||
reportUnusedExpression = false
|
Loading…
Reference in a new issue