Update article and rename to index.qmd

2024-07-03 16:29:23 +02:00 · 2024-07-03 16:29:23 +02:00 · 7d1d929b0e
parent fed35fcfd2
commit 7d1d929b0e
1 changed files with 145 additions and 76 deletions
--- a/nuclear_explosions.qmd
+++ b/nuclear_explosions.qmd
@@ -1,7 +1,5 @@
 ---
 title: Nuclear Explosions
 author: Marty Oehme
 output-dir: out
 references:
    - type: techreport
      id: Bergkvist2000
@@ -18,16 +16,6 @@ references:
      title: "Nuclear Explosions 1945 - 1998"
      page: 1-42
      issn: 1104-9154
-format:
-    html:
-        toc: true
-        code-fold: true
-    typst:
-        toc: true
-        echo: false
-    docx:
-        toc: true
-        echo: false
 ---
 ```{python}
@@ -43,7 +31,21 @@ from matplotlib import pyplot as plt
 sns.set_theme(style="darkgrid")
 sns.set_context("notebook")
 cp=sns.color_palette()
 country_colors = {
    "US": cp[0],
    "USSR": cp[3],
    "France": cp[6],
    "UK": cp[5],
    "China": cp[4],
    "India": cp[1],
    "Pakistan": cp[2],
 }
 ```
 ```{python}
 # | label: data-prep
 # | echo: false
 schema_overrides = (
    {
        col: pl.Categorical
@@ -53,13 +55,31 @@ schema_overrides = (
    | {col: pl.String for col in ["year", "name"]}
 )
 cty_alias = {
    "PAKIST": "Pakistan",
    "FRANCE": "France",
    "CHINA": "China",
    "INDIA": "India",
    "USA": "US",
 }
 def cty_replace(name: str) -> str:
    if name in cty_alias:
        return cty_alias[name]
    return name
 df = (
    pl.read_csv(
        "data/nuclear_explosions.csv",
        schema_overrides=schema_overrides,
        null_values=["NA"],
    )
-    .with_columns(date=pl.col("year").str.strptime(pl.Date, "%Y"))
+    .with_columns(
+        date=pl.col("year").str.strptime(pl.Date, "%Y"),
+        country=pl.col("country").map_elements(cty_replace, return_dtype=pl.String),
+    )
    .with_columns(year=pl.col("date").dt.year().cast(pl.Int32))
 )
 ```
@@ -69,7 +89,7 @@ df = (
 The following is a re-creation and expansion of some of the graphs found in the
 @Bergkvist2000 produced report on nuclear explosions between 1945 and 1998. It
 is primarily a reproduction of key plots from the original report.
-Additionally, it serves as a exercise in plotting with the python library
+Additionally, it serves as an exercise in plotting with the python library
 seaborn and the underlying matplotlib. Lastly, it approaches some less
 well-trodden territory for data science in the python universe as it uses the python
 library polars-rs for data loading and transformation. All the code used to
@@ -77,9 +97,9 @@ transform the data and create the plots is available directly within the full
 text document, and separately as well. PDF and Docx formats are available with
 the plotting results only.
-Their original purpose was the collection of a long list of all the nuclear
+The authors' original purpose was the collection of a long list of all the
-explosions occurring between those years, as well as analysing the responsible
+nuclear explosions occurring between those years, as well as analysing the
-nations, tracking the types and purposes of the explosions, as well as
+responsible nations, tracking the types and purposes of the explosions and
 connecting the rise and fall of nuclear explosion numbers to historical events
 throughout.
@@ -90,13 +110,13 @@ throughout.
 ## Nuclear devices
 There are two main kinds of nuclear device: those based entirely, on fission,
-or the splitting of heavy atomic nucleii (previously known as atomic devices)
+or the splitting of heavy atomic nuclei (previously known as atomic devices)
 and those in which the main energy is obtained by means of fusion, or of -light
-atomic nucleii (hydrogen or thermonuclear devices). A fusion explosion must
+atomic nuclei (hydrogen or thermonuclear devices). A fusion explosion must
 however be initiated with the help of a fission device. The strength of a
 fusion explosion can be practically unlimited. The explosive power of a
-nuclear explosion is expressed in ktlotons, (kt) or megatons (Mt), which
+nuclear explosion is expressed in kilotons, (kt) or megatons (Mt), which
-correspond to 1000 and i million'tonnes, of conventional explosive (TNT),
+correspond to 1000 and 1 million tonnes, of conventional explosive (TNT),
 respectively.
 [@Bergkvist2000, 6]
@@ -108,8 +128,9 @@ each country had explode, seen in @tbl-yields.
 ```{python}
 # | label: tbl-yields
 # | tbl-cap: "Total number and yields of explosions"
 # | output: asis
-from great_tables import GT, md
+from great_tables import GT
 df_yields = (
    df.select(["country", "id_no", "yield_lower", "yield_upper"])
@@ -119,11 +140,17 @@ df_yields = (
        pl.col("id_no").len().alias("count"),
        pl.col("yield_avg").sum(),
    )
-    # .with_columns(country=pl.col("country").cast(pl.String).str.to_titlecase())
+    .with_columns(yield_per_ex=pl.col("yield_avg") / pl.col("count"))
    .sort("count", descending=True)
 )
-(
+us_row = df_yields.filter(pl.col("country") == "US")
 yields_above_us = df_yields.filter(
    pl.col("yield_per_ex") > us_row["yield_per_ex"]
 ).sort("yield_per_ex", descending=True)
 assert len(yields_above_us) == 3, "Yield per explosion desc needs updating!"
 tab=(
    GT(df_yields)
    .tab_source_note(
        source_note="Source: Author's elaboration based on Bergkvist and Ferm (2000)."
@@ -131,22 +158,33 @@ df_yields = (
    .tab_spanner(label="Totals", columns=["count", "yield_avg"])
    .tab_stub(rowname_col="country")
    .tab_stubhead(label="Country")
-    .cols_label(
+    .cols_label(count="Count", yield_avg="Yield in kt", yield_per_ex="Yield average")
-        count="Count",
-        yield_avg="Yield in kt",
-    )
    .fmt_integer(columns="count")
    .fmt_number(columns="yield_avg", decimals=1)
    .fmt_number(columns="yield_per_ex", decimals=1)
 )
 del df_yields
 tab
 ```
 It is interesting to note that while the US undoubtedly had the highest raw
 number of explosions, it did not, in fact, output the highest estimated
 detonation yields.
 In fact, `{python} len(yields_above_us)` countries have a higher average
 explosion yield per detonation than the US:
 `{python} yields_above_us[0]["country"].item()` leads with an average of
 `{python} f"{yields_above_us[0]['yield_per_ex'].item():.2f}"` kt,
 before
 `{python} yields_above_us[1]["country"].item()` with an average of
 `{python} f"{yields_above_us[1]['yield_per_ex'].item():.2f}"` kt.
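
The comparison above boils down to dividing each country's summed yield by its number of explosions and keeping the countries that exceed the US value. A minimal plain-Python sketch of that arithmetic follows. The sample rows are made up for illustration, and taking `yield_avg` as the midpoint of `yield_lower` and `yield_upper` is an assumption, since that step is not visible in this diff.

```python
from collections import defaultdict

# Hypothetical sample rows: (country, yield_lower, yield_upper) in kt.
# These values are invented for illustration and are not from the data set.
rows = [
    ("US", 10.0, 20.0),
    ("US", 0.0, 20.0),
    ("USSR", 100.0, 200.0),
    ("China", 20.0, 60.0),
    ("UK", 1.0, 5.0),
]

totals = defaultdict(lambda: {"count": 0, "yield_avg": 0.0})
for country, lower, upper in rows:
    totals[country]["count"] += 1
    # Assumption: the per-explosion yield is the midpoint of the two estimates.
    totals[country]["yield_avg"] += (lower + upper) / 2

# Average yield per explosion for every country.
yield_per_ex = {c: t["yield_avg"] / t["count"] for c, t in totals.items()}

# Countries whose average exceeds the US average, highest first.
us_value = yield_per_ex["US"]
above_us = sorted(
    ((c, v) for c, v in yield_per_ex.items() if v > us_value),
    key=lambda item: item[1],
    reverse=True,
)
print(above_us)
```

In the article itself the same steps are expressed with polars (`group_by`, the `yield_per_ex` column and the filter against `us_row`); the sketch only spells out what those expressions compute.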
 ## Numbers over time
-When investigating the nuclear explosions in the world, let us first start by
+In the examination of global nuclear detonations, our initial focus shall be
-looking at how many explosions occurred each year in total. This hides the
+quantifying the annual incidence of the events in aggregate. While it obscures
-specific details of who was responsible and which types were involved but
+the specific details of the responsible nations and the diversity of types
-instead paints a much stronger picture of the overall dimension of nuclear
+tested, it instead paints a much stronger picture of the overall abstracted
-testing, as can be seen in @fig-total.
+dimension of nuclear testing throughout history, as depicted in @fig-total.
 ```{python}
 # | label: fig-total
@@ -171,17 +209,29 @@ with sns.axes_style(
 del per_year
 ```
-As we can see, the numbers of explosions rise increasingly towards 1957 and
+As we can see, the number of explosions rises increasingly towards 1957 and
-sharply until 1958, before dropping off for a year in 1959. The reasons for
+sharply until 1958, before dropping off for a year in 1959. The reason for this
-this drop are not entirely clear, but it is very likely that the data are
+drop should primarily be found in the start of the 'Treaty of Test Ban' which
-simply missing for these years.
+put limits and restraints on the testing of above-ground nuclear armaments, as
-<!-- FIXME: The reasons for this are a non-proliferation pact, in article -->
+discussed in the original article. Above all, the contract prohibits
 radioactive debris from falling beyond a nation's respective
 territorial bounds.
 However, this contract should perhaps not be viewed as the only reason: With
 political and cultural shifts throughout the late 1950s and early 1960s
 increasingly focusing on the fallout and horror of nuclear warfare, a burgeoning
 public opposition to nuclear testing and a push towards disarmament were
 taking hold. The increased focus on the space race between the US and USSR may
 have detracted from the available funds, human resources and agenda attention
 for nuclear testing. Lastly, with nuclear testing policies strongly shaped by
 the political dynamics of the Cold War, a period of improved diplomatic
 relations such as the late 1950s prior to the Cuban missile crisis may have
 directly affected the output of nuclear testing facilities between various powers.
 <!-- TODO: Extract exact numbers from data on-the-fly -->
 There is another, very steep, rise in 1962 with over 175 recorded explosions,
 before an even sharper drop-off the following year down to just 50 explosions.
-
+Afterward the changes appear less sharp and the changes remain between 77 and
 Afterwards the changes appear less sharp and the changes remain between 77 and
 24 explosions per year, with a slight downward tendency.
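
The TODO above asks for these figures to be derived from the data rather than typed by hand. One possible way to do that is sketched below with the standard library only. The list of years is hypothetical and stands in for the `year` column of `df`, and `count_range` is a helper name invented for this example.

```python
from collections import Counter

# Hypothetical list of explosion years; the real values would come from df["year"].
years = [1957, 1957, 1958, 1958, 1958, 1959, 1962, 1962, 1962, 1962, 1963]

# Number of explosions per year.
per_year = Counter(years)

# Year with the most explosions, and the count in the year right after it.
peak_year, peak_count = max(per_year.items(), key=lambda item: item[1])
following = per_year.get(peak_year + 1, 0)


def count_range(first: int, last: int) -> tuple[int, int]:
    """Return the smallest and largest yearly count between first and last (inclusive)."""
    counts = [per_year.get(y, 0) for y in range(first, last + 1)]
    return min(counts), max(counts)


print(f"Peak: {peak_count} explosions in {peak_year}, then {following} the year after.")
print(count_range(1957, 1959))
```

Values computed this way could then be placed in the text with inline `{python}` expressions, in the same manner as the yield comparison earlier in the article.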
 While these numbers show the overall proliferation of nuclear power, let us now
@@ -191,6 +241,7 @@ of explosions over time by country can be seen in @fig-percountry.
 ```{python}
 # | label: fig-percountry
 # | fig-cap: "Nuclear explosions by country, 1945-98"
 keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
 per_country = keys.join(
    df.group_by(["date", "country"], maintain_order=True).len(),
@@ -199,7 +250,7 @@ per_country = keys.join(
    coalesce=True,
 ).with_columns(pl.col("len").fill_null(0))
-g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
+g = sns.lineplot(data=per_country, x="date", y="len", hue="country", palette=country_colors)
 g.set_xlabel("Year")
 g.set_ylabel("Count")
 plt.setp(
@@ -211,14 +262,14 @@ del per_country
 Once again we can see the visibly steep ramp-up to 1962, though it becomes
 clear that this was driven both by the USSR and the US. Of course the graph
-also makes visible the sheer unmatched number of explosions emenating from both
+also makes visible the sheer unmatched number of explosions emanating from both
 of the countries, with only France catching up to the US numbers and China
 ultimately overtaking them in the 1990s.
 However, here it also becomes more clear how the UK was responsible for some
 early explosions in the late 1950s and early 1960s already, as well as the rise
 in France's nuclear testing from the early 1960s onwards to around 1980, before
-slowly decreasing in intensity afterwards.
+slowly decreasing in intensity afterward.
 Let us turn to a cross-cut through the explosions in @fig-groundlevel, focusing
 on the number of explosions that have occurred underground and above-ground
@@ -273,6 +324,7 @@ with sns.axes_style("darkgrid", {"xtick.bottom": True, "ytick.left": True}):
            hue="country",
            multiple="stack",
            binwidth=365,
            palette=country_colors,
        )
    g.xaxis.set_major_locator(mdates.YearLocator(base=5))
@@ -293,25 +345,19 @@ shift from above-ground to underground tests, starting with the year 1962.
 ## Locations
-Finally, let's view a map of the world with the explosions marked.
+Finally, let's view a map of the world with the explosions marked, separated by country.
 ::: {.content-visible when-format="html"}
 Hovering over individual explosions will show their year
 while a click will open more information in a panel.
 :::
 The map can be seen in @fig-worldmap.
 ```{python}
-# | label: fig-worldmap
+# | label: worldmap-setup
-# | fig-cap: "World map of nuclear explosions, 1945-98"
+# | output: false
 import folium
 import geopandas as gpd
 from shapely.geometry import Point
 def set_style() -> pl.Expr:
    return (
        pl.when(pl.col("country") == "USSR")
        .then(pl.lit({"color": "red"}, allow_object=True))
        .otherwise(pl.lit({"color": "blue"}, allow_object=True))
    )
 geom = [Point(xy) for xy in zip(df["longitude"], df["latitude"])]
 # df_pd = df.with_columns(style=set_style()).to_pandas().set_index("date")
 df_pd = df.with_columns().to_pandas().set_index("date")
 gdf = gpd.GeoDataFrame(
    df_pd,
@@ -320,25 +366,18 @@ gdf = gpd.GeoDataFrame(
 )
 del df_pd
-country_colors = {
+def rgb_to_hex(rgb: tuple[float,float,float]) -> str:
-    "USA": "darkblue",
+    return "#" + "".join([format(int(c*255), '02x') for c in rgb])
-    "USSR": "darkred",
-    "FRANCE": "pink",
-    "UK": "black",
-    "CHINA": "purple",
-    "INDIA": "orange",
-    "PAKIST": "green",
-}
 m = folium.Map(tiles="cartodb positron")
 for country in country_colors.keys():
    fg = folium.FeatureGroup(name=country, show=True).add_to(m)
    folium.GeoJson(
-        gdf[gdf["country"].str.contains(country)],
+        gdf[gdf["country"] == country],
        name="Nuclear Explosions",
        marker=folium.Circle(radius=3, fill_opacity=0.4),
        style_function=lambda x: {
-            "color": country_colors[x["properties"]["country"]],
+            "color": rgb_to_hex(country_colors[x["properties"]["country"]]),
            "radius": (
                x["properties"]["magnitude_body"]
                if x["properties"]["magnitude_body"] > 0
@@ -368,15 +407,45 @@ for country in country_colors.keys():
        ),
    ).add_to(fg)
 folium.LayerControl().add_to(m)
 ```
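
The setup chunk now reuses the seaborn palette for the map, and `rgb_to_hex` turns each RGB tuple of floats between 0 and 1 into a hex colour string for the folium style function. The short sketch below shows what the helper produces. The sample colours are hypothetical stand-ins for entries of `sns.color_palette()`.

```python
def rgb_to_hex(rgb: tuple[float, float, float]) -> str:
    # Scale each 0-1 channel to 0-255 and format it as two hex digits.
    # int() truncates, so a channel of 0.5 maps to 127 rather than 128.
    return "#" + "".join(format(int(c * 255), "02x") for c in rgb)


# Hypothetical colours standing in for entries of sns.color_palette().
sample_colors = {"A": (1.0, 0.0, 0.5), "B": (0.0, 0.0, 0.0)}
hex_colors = {name: rgb_to_hex(rgb) for name, rgb in sample_colors.items()}
print(hex_colors)
```

Because the conversion truncates, hex values can differ by one step per channel from a rounding conversion, which should not be visible on the map.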
 ::: {#fig-worldmap}
 :::: {.content-visible when-format="html"}
 ```{python}
 # | label: worldmap-html
 m
 ```
-That is all for now.
+::::
-There are undoubtedly more explorations to undertake,
-but this is it for the time being.
-<!-- Ideas TODO:
+:::: {.content-visible unless-format="html" width=80%}
- do not just use 'count' of explosions but yields
+
-    - compare number to yields for ctrys
+```{python}
- count up total number per country in table
+# | label: worldmap-non-html
-->
+# ENSURE SELENIUM IS INSTALLED
 m.png_enabled = True
 m
 ```
 ::::
 World map of nuclear explosions, 1945-98
 :::
 While there are undoubtedly more aspects of the data that provide interesting
 patterns for analysis, this shall be the extent of review for the time being
 for this reproduction.
 We can see how the combination of python polars and seaborn makes the process
 relatively approachable, understandable and, combined with the rendering output
 by quarto, fully reproducible.
 Additionally, we can see how additional projects can be included to produce
 interactive graphs and maps with tools such as folium and geopandas.
 ## References
 ::: {#refs}
 :::