initial commit
commit
cceb1d1ec0
6 changed files with 5987 additions and 0 deletions
notebooks/groupby_keep_zero-values.qmd (normal file, 54 lines)
@@ -0,0 +1,54 @@
<!-- TODO: Load missing data from main nuclear_explosions.qmd notebook -->

The following is a simple group-by that counts the rows per date and country:

```{python}
# | label: fig-percountry-drop
# | fig-cap: "Nuclear explosions by country, 1945-98"
per_country = df.group_by(pl.col("date", "country")).agg(pl.len()).sort("date")

g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
g.set_xlabel("Year")
g.set_ylabel("Count")
plt.setp(
    g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
)  # ensure rotated right-anchor
plt.show()
```

This works well for grouping in general, but there is an issue:
If there is a year in which a country has no entries at all,
the resulting df will not contain a `Date | Cty | 0` row; it will contain no row for that combination at all.

This can be desirable for some applications, but if we then, for example,
draw a line plot based on it, the plot will interpolate between the
existing country values and **not drop the line down to 0 for the years where a country does not have an entry**.

We can fix this by first building a cross product of all keys we always want to have a row for.
Then we do the group-by as before and left-join its result onto this cross product.

As a result we keep all the rows from the cross product, but we still aggregate and have a `len`
column as before. Wherever there is no `len` value, we finally fill in a 0 instead.

```{python}
# | label: fig-percountry-keep
# | fig-cap: "Nuclear explosions by country, 1945-98"
keys = df.select("date").unique().join(df.select("country").unique(), how="cross")
per_country = keys.join(
    df.group_by(["date", "country"], maintain_order=True).len(),
    on=["date", "country"],
    how="left",
    coalesce=True,
).with_columns(pl.col("len").fill_null(0))

g = sns.lineplot(data=per_country, x="date", y="len", hue="country")
g.set_xlabel("Year")
g.set_ylabel("Count")
plt.setp(
    g.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor"
)  # ensure rotated right-anchor
plt.show()
```

A nicer, function-based solution (using the same approach under the hood) can be found
here: https://github.com/pola-rs/polars/issues/15997#issuecomment-2089362557
notebooks/sns_objects-style.qmd (normal file, 75 lines)
@@ -0,0 +1,75 @@

constructed with seaborn object-style plots instead.
These kinds of plots are much more structured for the workflow I use and the way I think about plotting,
clearly delineating between a plot;
some visual on the plot;
some statistical transformation;
some movement, labeling or scaling operation.
They are also, however, fairly new and still considered experimental.

They also don't allow *quite* the customization that the other plots do,
and either they are a little buggy or I have not yet fully understood them with regard to ticks and labels.

```{python}
# | label: fig-groundlevel-so
# | fig-cap: "Nuclear explosions, 1945-98"
import seaborn.objects as so
import matplotlib.dates as mdates

above_cat = pl.Series(
    [
        "ATMOSPH",
        "AIRDROP",
        "TOWER",
        "BALLOON",
        "SURFACE",
        "BARGE",
        "ROCKET",
        "SPACE",
        "SHIP",
        "WATERSUR",
        "WATER SU",
    ]
)
df_groundlevel = (
    df.with_columns(
        above_ground=pl.col("type").map_elements(
            lambda x: True if x in above_cat else False, return_dtype=bool
        )
    )
    .group_by(pl.col("year", "country", "above_ground"))
    .agg(count=pl.len())
    .sort("year")
)

fig, ax = plt.subplots()
ax.xaxis.set_tick_params(rotation=90)

from seaborn import axes_style
p = (
    so.Plot(df_groundlevel, x="year", y="count", color="country")
    .add(
        so.Bars(),
        so.Stack(),
        data=df_groundlevel.filter(pl.col("above_ground") == True).sort("country"),
    )
    .add(
        so.Bars(),
        so.Stack(),
        data=df_groundlevel.filter(pl.col("above_ground") == False).with_columns(
            count=pl.col("count") * -1
        ).sort("country"),
    )
    .label(x="Year", y="Count")
    .scale(
        x=so.Continuous().tick(locator=mdates.YearLocator(base=5), minor=4).label(like="{x:.0f}"),
        # x=so.Nominal().tick(locator=mdates.YearLocator(base=5), minor=4), # this might work in the future
    )
    .theme({
        **axes_style("darkgrid"),
        "xtick.bottom": True,
        "ytick.left": True
    })
    .on(ax)
    .plot()
)
```