initial commit

Marty Oehme 2024-06-22 23:11:49 +02:00
commit cceb1d1ec0
Signed by: Marty
GPG key ID: EDBF2ED917B2EF6A
6 changed files with 5987 additions and 0 deletions

@ -0,0 +1,54 @@
<!-- TODO: Load missing data from main nuclear_explosions.qmd notebook -->
The following is a simple group-by that counts the number of rows per date and country:
```{python}
# | label: fig-percountry-drop
# | fig-cap: "Nuclear explosions by country, 1945-98"
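import matplotlib.pyplot as plt
import polars as pl
import seaborn as sns

# count the rows for every (date, country) combination that occurs in the data
per_country = df.group_by(pl.col("date", "country")).agg(pl.len()).sort("date")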
g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
g.set_xlabel("Year")
g.set_ylabel("Count")
plt.setp(
g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
) # ensure rotated right-anchor
plt.show()
```
This works well as a general grouping, but there is an issue:
if a country has no entries at all in a given year,
the resulting dataframe will not contain a `date | country | 0` row; it will have no row for that combination at all.
This can be desirable for some applications, but if we then, for example,
draw a line plot from it, the plot interpolates between the remaining
values of a country and **does not drop the line down to 0 for the years where that country has no entry**.
We can fix this by first building the cross product of all keys we always want to have a row for.
We then do the group-by as before and left-join its result onto this cross product.
The end result keeps every row of the cross product while still aggregating into a `len`
column as before. Wherever there is no `len` value, we simply fill in a 0.
```{python}
# | label: fig-percountry-keep
# | fig-cap: "Nuclear explosions by country, 1945-98"
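# every combination of date and country, whether or not it occurs in the data
keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
# left-join the actual counts onto all combinations; missing counts become 0
per_country = keys.join(
    df.group_by(["date", "country"], maintain_order=True).len(),
    on=["date", "country"],
    how="left",
    coalesce=True,
).with_columns(pl.col("len").fill_null(0))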
g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
g.set_xlabel("Year")
g.set_ylabel("Count")
plt.setp(
g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
) # ensure rotated right-anchor
plt.show()
```
A nicer, function-based solution (using the same approach under the hood) can be found
here: https://github.com/pola-rs/polars/issues/15997#issuecomment-2089362557

@ -0,0 +1,75 @@
constructed with seaborn object-style plots instead.
These kinds of plots are much more structured for my workflow and the way I think about plotting,
clearly delineating between:

- a plot;
- some visual on the plot;
- some statistical transformation;
- some movement, labeling or scaling operation.

They are, however, also fairly new and still considered experimental.
They also don't allow *quite* the same customization that the other plots do,
and when it comes to ticks and labels they either seem a little buggy or I have not fully understood them yet.
```{python}
# | label: fig-groundlevel-so
# | fig-cap: "Nuclear explosions, 1945-98"
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import polars as pl
import seaborn.objects as so
from seaborn import axes_style

# test types that count as above-ground explosions
above_cat = [
    "ATMOSPH",
    "AIRDROP",
    "TOWER",
    "BALLOON",
    "SURFACE",
    "BARGE",
    "ROCKET",
    "SPACE",
    "SHIP",
    "WATERSUR",
    "WATER SU",
]

# count explosions per year and country, split into above and below ground
df_groundlevel = (
    df.with_columns(above_ground=pl.col("type").is_in(above_cat))
    .group_by("year", "country", "above_ground")
    .agg(count=pl.len())
    .sort("year")
)

fig, ax = plt.subplots()
ax.xaxis.set_tick_params(rotation=90)
p = (
    so.Plot(df_groundlevel, x="year", y="count", color="country")
    # above-ground explosions stack upwards from zero
    .add(
        so.Bars(),
        so.Stack(),
        data=df_groundlevel.filter(pl.col("above_ground")).sort("country"),
    )
    # underground explosions are negated so they stack downwards from zero
    .add(
        so.Bars(),
        so.Stack(),
        data=df_groundlevel.filter(~pl.col("above_ground"))
        .with_columns(count=pl.col("count") * -1)
        .sort("country"),
    )
    .label(x="Year", y="Count")
    .scale(
        x=so.Continuous().tick(locator=mdates.YearLocator(base=5), minor=4).label(like="{x:.0f}"),
        # x=so.Nominal().tick(locator=mdates.YearLocator(base=5), minor=4), # this might work in the future
    )
    .theme({**axes_style("darkgrid"), "xtick.bottom": True, "ytick.left": True})
    .on(ax)
    .plot()
)
```