nuclear_explosions/index.qmd

---
title: Nuclear Explosions
subtitle: "Using python polars and seaborn to visualize global detonations"
description: "Using python polars and seaborn to visualize global detonations"
references:
    - type: techreport
      id: Bergkvist2000
      author:
        - family: Bergkvist
          given: Nils-Olov
        - family: Ferm
          given: Ragnhild
      issued:
        date-parts:
          - - 2000
            - 7
            - 1
      title: "Nuclear Explosions 1945 - 1998"
      page: 1-42
      issn: 1104-9154
pubDate: "2024-07-03T18:36:26"
weight: 10
tags:
  - python
---

```{python}
#| label: setup
#| echo: false
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import polars as pl
import polars.selectors as cs
import seaborn as sns
from matplotlib import pyplot as plt

sns.set_theme(style="darkgrid")
sns.set_context("notebook")
cp=sns.color_palette()
country_colors = {
    "US": cp[0],
    "USSR": cp[3],
    "France": cp[6],
    "UK": cp[5],
    "China": cp[4],
    "India": cp[1],
    "Pakistan": cp[2],
}
```

```{python}
# | label: data-prep
# | code-fold: true
schema_overrides = (
    {
        col: pl.Categorical
        for col in ["type", "name", "purpose", "country", "source", "region"]
    }
    | {col: pl.Float64 for col in ["magnitude_body", "magnitude_surface"]}
    | {col: pl.String for col in ["year", "name"]}
)

cty_alias = {
    "PAKIST": "Pakistan",
    "FRANCE": "France",
    "CHINA": "China",
    "INDIA": "India",
    "USA": "US",
}


def cty_replace(name: str) -> str:
    if name in cty_alias:
        return cty_alias[name]
    return name


df = (
    pl.read_csv(
        "data/nuclear_explosions.csv",
        schema_overrides=schema_overrides,
        null_values=["NA"],
    )
    .with_columns(
        date=pl.col("year").str.strptime(pl.Date, "%Y"),
        country=pl.col("country").map_elements(cty_replace, return_dtype=pl.String),
    )
    .with_columns(year=pl.col("date").dt.year().cast(pl.Int32))
)
```

## Introduction

The following is a re-creation and expansion of some of the graphs found in the
@Bergkvist2000 produced report on nuclear explosions between 1945 and 1998. It
is primarily a reproduction of key plots from the original report.
Additionally, it serves as an exercise in plotting with the python library
seaborn and the underlying matplotlib. Lastly, it approaches some less well
tread territory for data science in the python universe as it uses the python
library polars-rs for data loading and transformation. All the code used to
transform the data and create the plots is available directly within the full
text document, and separately as well. PDF and Docx formats are available with
the plotting results only.

The authors' original purpose was the collection of a long list of all the
nuclear explosions occurring between those years, as well as analysing the
responsible nations, tracking the types and purposes of the explosions and
connecting the rise and fall of nuclear explosion numbers to historical events
throughout.

## Total numbers

::: {.callout-note}

## Nuclear devices

There are two main kinds of nuclear device: those based entirely, on fission,
or the splitting of heavy atomic nuclei (previously known as atomic devices)
and those in which the main energy is obtained by means of fusion, or of -light
atomic nuclei (hydrogen or thermonuclear devices). A fusion explosion must
however be initiated with the help of a fission device. The strength of a
fusion explosion can be practically unlimited. The explosive power of a
nuclear explosion is expressed in kilotons, (kt) or megatons (Mt), which
correspond to 1000 and 1 million tonnes, of conventional explosive (TNT),
respectively.

[@Bergkvist2000, 6]
:::

We begin by investigating a table containing all the absolute counts and yields
each country had explode, seen in @tbl-yields.

```{python}
# | label: tbl-yields
# | tbl-cap: "Total number and yields of explosions"
# | output: asis

from great_tables import GT

df_yields = (
    df.select(["country", "id_no", "yield_lower", "yield_upper"])
    .with_columns(yield_avg=pl.mean_horizontal(pl.col(["yield_lower", "yield_upper"])))
    .group_by("country")
    .agg(
        pl.col("id_no").len().alias("count"),
        pl.col("yield_avg").sum(),
    )
    .with_columns(yield_per_ex=pl.col("yield_avg") / pl.col("count"))
    .sort("count", descending=True)
)

us_row = df_yields.filter(pl.col("country") == "US")
yields_above_us = df_yields.filter(
    pl.col("yield_per_ex") > us_row["yield_per_ex"]
).sort("yield_per_ex", descending=True)
assert len(yields_above_us) == 3, "Yield per explosion desc needs updating!"

tab=(
    GT(df_yields)
    .tab_source_note(
        source_note="Source: Author's elaboration based on Bergkvist and Ferm (2000)."
    )
    .tab_spanner(label="Totals", columns=["count", "yield_avg"])
    .tab_stub(rowname_col="country")
    .tab_stubhead(label="Country")
    .cols_label(count="Count", yield_avg="Yield in kt", yield_per_ex="Yield average")
    .fmt_integer(columns="count")
    .fmt_number(columns="yield_avg", decimals=1)
    .fmt_number(columns="yield_per_ex", decimals=1)
)
del df_yields
tab
```

It is interesting to note that while the US undoubtedly had the highest raw
number of explosions, it did not, in fact, output the highest estimated
detonation yields.
In fact, `{python} len(yields_above_us)` countries have a higher average
explosion yield per detonation than the US:
`{python} yields_above_us[0]["country"].item()` leads with an average of
`{python} f"{yields_above_us[0]['yield_per_ex'].item():.2f}"` kt,
before
`{python} yields_above_us[1]["country"].item()` with an average of
`{python} f"{yields_above_us[1]['yield_per_ex'].item():.2f}"` kt.

## Numbers over time

In the examination of global nuclear detonations, our initial focus shall be
quantifying the annual incidence of the events in aggregate. While it obscures
the specific details of the responsible nations and which diversity of types
tested, it instead paints a much stronger picture of the overall abstracted
dimension of nuclear testing throughout history, as depicted in @fig-total.

```{python}
# | label: fig-total
# | fig-cap: "Total Nuclear explosions, 1945-98"
per_year = df.group_by(pl.col("year")).agg(pl.len()).sort("year")
with sns.axes_style(
    "darkgrid", {"xtick.bottom": True, "ytick.left": True}
):
    g = sns.barplot(data=per_year, x="year", y="len", order=range(1945, 1999), width=1)
    g.set_xlabel("Year")
    g.set_ylabel("Count")
    plt.setp(
        g.get_xticklabels(),
        rotation=90,
        ha="right",
        va="center",
        rotation_mode="anchor",
    )  # ensure rotated right-anchor
    g.set_xticks(g.get_xticks(), minor=True) # enable minor ticks every entry
    g.set_xticks(g.get_xticks()[::2])  # enable major ticks every 2nd entry
    plt.show()
del per_year
```

As we can see, the number of explosions rises increasingly towards 1957 and
sharply until 1958, before dropping off for a year in 1959. The reason for this
drop should primarily be found in the start of the 'Treaty of Test Ban' which
put limits and restraints on the testing of above-ground nuclear armaments, as
discussed in the original article. Above all the contract signals the
prohibition of radioactive debris to fall beyond a nation's respective
territorial bounds.

However, this contract should perhaps not be viewed as the only reason: With
political and cultural shifts throughout the late 1950s and early 1960s
increasingly focusing on the fallout and horror of nuclear warfare a burgeoning
public opposition to nuclear testing and instead a push towards disarmament was
taking hold. The increased focus on the space race between the US and USSR may
have detracted from the available funds, human resources and agenda attention
for nuclear testing. Lastly, with nuclear testing policies strongly shaped by
the political dynamics of the Cold War, a period of improved diplomatic
relations such as the late 1950s prior to the Cuban missile crisis may directly
affect the output of nuclear testing facilities between various powers.

<!-- TODO: Extract exact numbers from data on-the-fly -->
There is another, very steep, rise in 1962 with over 175 recorded explosions,
before an even sharper drop-off the following year down to just 50 explosions.
Afterward the changes appear less sharp and the changes remain between 77 and
24 explosions per year, with a slight downward tendency.

While these numbers show the overall proliferation of nuclear power, let us now
instead turn to the contributions by individual countries. A split in the number
of explosions over time by country can be seen in @fig-percountry.

```{python}
# | label: fig-percountry
# | fig-cap: "Nuclear explosions by country, 1945-98"

keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
per_country = keys.join(
    df.group_by(["date", "country"], maintain_order=True).len(),
    on=["date", "country"],
    how="left",
    coalesce=True,
).with_columns(pl.col("len").fill_null(0))

g = sns.lineplot(data=per_country, x="date", y="len", hue="country", palette=country_colors)
g.set_xlabel("Year")
g.set_ylabel("Count")
plt.setp(
    g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
)  # ensure rotated right-anchor
plt.show()
del per_country
```

Once again we can see the visibly steep ramp-up to 1962, though it becomes
clear that this was driven both by the USSR and the US. Of course the graph
also makes visible the sheer unmatched number of explosions emanating from both
of the countries, with only France catching up to the US numbers and China
ultimately overtaking them in the 1990s.

However, here it also becomes more clear how the UK was responsible for some
early explosions in the late 1950s and early 1960s already, as well as the rise
in France's nuclear testing from the early 1960s onwards to around 1980, before
slowly decreasing in intensity afterward.

Let us turn to a cross-cut through the explosions in @fig-groundlevel, focusing
on the number of explosions that have occurred underground and above-ground
respectively.[^aboveground]

[^aboveground]: Detonations counted as above ground are made up of atmospheric,
airdrop, tower, balloon, barge or ship, rocket and water surface detonations.
Any other detonation is counted as below ground, primarily taking place in
tunnels, shafts and galleries.

```{python}
# | label: fig-groundlevel
# | fig-cap: "Nuclear explosions above and below ground, 1945-98"
from polars import Boolean

above_cat = pl.Series(
    [
        "ATMOSPH",
        "AIRDROP",
        "TOWER",
        "BALLOON",
        "SURFACE",
        "BARGE",
        "ROCKET",
        "SPACE",
        "SHIP",
        "WATERSUR",
        "WATER SU",
    ]
)
df_groundlevel = (
    df.with_columns(
        above_ground=pl.col("type").map_elements(
            lambda x: True if x in above_cat else False, return_dtype=Boolean
        )
    )
    .group_by(pl.col("date", "country", "above_ground"))
    .agg(count=pl.len())
    .sort("date")
)

with sns.axes_style("darkgrid", {"xtick.bottom": True, "ytick.left": True}):
    for above_ground in [True, False]:
        g = sns.histplot(
            data=df_groundlevel.filter(
                pl.col("above_ground") == above_ground
            ).with_columns(
                count=pl.col("count") * (1 if above_ground else -1),
            ),
            x="date",
            weights="count",
            hue="country",
            multiple="stack",
            binwidth=365,
            palette=country_colors,
        )

    g.xaxis.set_major_locator(mdates.YearLocator(base=5))
    g.xaxis.set_minor_locator(mdates.YearLocator())
    plt.setp(
        g.get_xticklabels(), rotation=90, ha="right", va="top", rotation_mode="anchor"
    )
    # FIXME get dynamic range for yticks instead of hardcoding
    g.set_yticks(np.arange(-130, 140, 20))
    g.set_yticks(np.arange(-130, 140, 10), minor=True)
    plt.show()
del df_groundlevel
```

This plot paints a different picture yet again: while overall the number of
explosions still rise and fall with some early sharp spikes, we can see a clear
shift from above-ground to underground tests, starting with the year 1962.

## Locations

Finally, let's view a map of the world with the explosions marked, separated by country.

::: {.content-visible when-format="html"}
Hovering over individual explosions will show their year
while a click will open more information in a panel.
The map can be seen in @fig-worldmap-html.
:::

::: {.content-visible unless-format="html"}
The map can be seen in @fig-worldmap-static.
:::

```{python}
# | label: worldmap-setup
# | output: false
import folium
import geopandas as gpd

df_pd = df.with_columns().to_pandas().set_index("date")
gdf = gpd.GeoDataFrame(
    df_pd,
    crs="EPSG:4326",
    geometry=gpd.points_from_xy(x=df_pd["longitude"], y=df_pd["latitude"]),
)
del df_pd

def rgb_to_hex(rgb: tuple[float,float,float]) -> str:
    return "#" + "".join([format(int(c*255), '02x') for c in rgb])

m = folium.Map(tiles="cartodb positron")
for country in country_colors.keys():
    fg = folium.FeatureGroup(name=country, show=True).add_to(m)
    folium.GeoJson(
        gdf[gdf["country"] == country],
        name="Nuclear Explosions",
        marker=folium.Circle(radius=3, fill_opacity=0.4),
        style_function=lambda x: {
            "color": rgb_to_hex(country_colors[x["properties"]["country"]]),
            "radius": (
                x["properties"]["magnitude_body"]
                if x["properties"]["magnitude_body"] > 0
                else 1.0
            )
            * 10,
        },
        tooltip=folium.GeoJsonTooltip(fields=["year", "country", "type"]),
        highlight_function=lambda x: {"fillOpacity": 0.8},
        popup=folium.GeoJsonPopup(
            fields=[
                "year",
                "country",
                "region",
                "source",
                "latitude",
                "longitude",
                "magnitude_body",
                "magnitude_surface",
                "depth",
                "yield_lower",
                "yield_upper",
                "purpose",
                "name",
                "type",
            ]
        ),
    ).add_to(fg)
folium.LayerControl().add_to(m)
```

::: {.content-visible when-format="html"}

```{python}
# | label: fig-worldmap-html
# | fig-cap: World map of nuclear explosions, 1945-98
html_string= m.get_root().render()
with open("map.html", "w") as f:
    f.write(html_string)
m
```

:::

::: {.content-hidden when-format="html" width=80%}

```{python}
# | label: fig-worldmap-static
# | fig-cap: World map of nuclear explosions, 1945-98
# ENSURE SELENIUM IS INSTALLED
from PIL import Image
import io
img = m._to_png()

bimg = io.BytesIO(img)
Image.open(bimg).save("map.png")
```

![World map of nuclear explosions, 1945-98](./map.png){#fig-worldmap-static}

:::

While there are undoubtedly more aspects of the data that provide interesting
patterns for analysis, this shall be the extent of review for the time being
for this reproduction.

We can see how the combination of python polars and seaborn makes the process
relatively approachable, understandable and, combined with the rendering output
by quarto, fully reproducible.

Additionally, we can see how additional projects can be included to produce
interactive graphs and maps with tools such as folium and geopandas.

::: {.callout-note .content-visible when-format="markdown"}

Accessing other views

Feel free to download and peruse the full repo with source documents
[here](https://git.martyoeh.me/datasci/nuclear_explosions),
as well as the [pdf](2024-07-02-nuclear-explosions-analysis/index.pdf)
or [docx](2024-07-02-nuclear-explosions-analysis/index.docx) versions.

:::

## References

::: {#refs}
:::