Add meta observations

Marty Oehme 2024-07-08 10:09:34 +02:00
parent bff3cb22fa
commit 910e72c0e2
Signed by: Marty
GPG key ID: EDBF2ED917B2EF6A

260
meta.md

@ -1,7 +1,27 @@
---
title: Reproducible blog posts
subtitle: "Moving from Quarto manuscript to Astro markdown post output"
description: "Moving from Quarto manuscript to Astro markdown post output"
pubDate: "2024-07-08T10:06:38"
weight: 10
tags:
- python
- astro
---
This page documents some meta observations about my time recreating the nuclear
explosions in the last post.
It goes over some little tips to work well with python polars and seaborn,
as well as little tricks to integrate them and geopandas visualizations.
Finally, I make some observations on the actual process of transferring this
produced output into a blog post on my website, written in the Astro static
site builder framework.
## Data modelling and visualization
### From a lat/long polars dataframe to geopandas
To go from a polars frame to one we can use for GIS operations with geopandas is fairly simple:
We first move from a polars to an indexed pandas frame, in this case I have indexed on the date of each explosion.
@ -20,7 +40,7 @@ gdf = gpd.GeoDataFrame(
del df_pd
``` ```
### Keeping the same seaborn color palette for the same categories
For the analysis, I have multiple plots which distinguish between the different countries undertaking nuclear detonations.
The country category thus appears repeatedly, and with static values (i.e. it will always contain 'US', 'USSR', 'China', 'France' and so on).
@ -101,7 +121,7 @@ folium.GeoJson(
).add_to(map)
``` ```
### Using dictionary keys to create folium map layers
As a bonus we can even use our color category keys to create different layers on the folium map which can be turned on and off individually.
Thus we can decide which country's detonations we want to visualize.
@ -134,6 +154,235 @@ for country in country_colors.keys():
folium.LayerControl().add_to(m)
``` ```
## Output wrangling
### Multiple project profiles
During development and analysis I have only had a single project which then in turn targeted two formats:
`html` for previews and dynamic elements and `pdf` (in truth the new `typst`) for checking static elements.
The following `_quarto.yml` file describes a full working project:
```yml
author: Marty Oehme
csl: https://www.zotero.org/styles/apa
project:
  type: default
  output-dir: output
  render:
    - index.qmd
    - meta.md
format:
  html:
    code-fold: true
    toc: true
    echo: true
  typst:
    toc: true
    echo: false
    citeproc: true
  docx:
    toc: true
    echo: false
```
This works well for single-'target' deployments which may arrive in multiple formats but are fundamentally the same.
What happens, however, if we target something completely different (in my case this Astro blog)
which may not even reside in the same directory?
We can create what quarto calls 'project profiles', simply by creating additional `_quarto-myprofile.yml` files in the project root.
They grab all the yaml data from the original `_quarto.yml` file and then extend and overwrite it with their own file's data to create the overall profile.
So if we have the following two files:
```yml
# _quarto.yml
author: Marty Oehme
csl: https://www.zotero.org/styles/apa
```
```yml
# _quarto-local.yml
project:
  type: default
  output-dir: output
  render:
    - index.qmd
    - meta.md
format:
  html:
    code-fold: true
    toc: true
    echo: true
  typst:
    toc: true
    echo: false
    citeproc: true
  docx:
    toc: true
    echo: false
```
We have essentially recreated the above project, only as a project 'profile' to be invoked as `quarto render --profile local`.
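The way a profile is layered on top of the base file can be pictured as a recursive dictionary merge. The following is only a minimal sketch of that idea and not Quarto's actual implementation: the function name `merge_profile` is hypothetical, and the rule that nested mappings are merged while all other values are replaced is an assumption made for illustration.

```python
from copy import deepcopy


def merge_profile(base: dict, profile: dict) -> dict:
    """Recursively merge `profile` into a copy of `base`.

    Assumption: nested mappings are merged key by key, while any other
    value (scalars, lists) from the profile replaces the base value.
    """
    merged = deepcopy(base)
    for key, value in profile.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_profile(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Contents of _quarto.yml and an abbreviated _quarto-local.yml as plain dicts
base = {"author": "Marty Oehme", "csl": "https://www.zotero.org/styles/apa"}
local = {
    "project": {"type": "default", "output-dir": "output"},
    "format": {"html": {"toc": True, "echo": True}},
}

effective = merge_profile(base, local)
print(effective["author"])                 # taken from _quarto.yml
print(effective["project"]["output-dir"])  # taken from _quarto-local.yml
```

The base dictionary is left untouched, so the same base can be combined with any number of profiles.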
Now, however, we can add a second `_quarto-remote.yml` profile:
```yml
# _quarto-remote.yml
project:
  type: default
  output-dir: some/remote/directory/maybe/even/over/nfs/or/sshfs
  render:
    - index.qmd
format:
  hugo-md:
    preserve-yaml: true
    code-fold: true
    keep-ipynb: true
    wrap: none
  typst:
    toc: true
    echo: false
    citeproc: true
```
If we invoke this profile with `quarto render --profile remote` we output to a different directory altogether,
and have completely different render targets than in the local profile
(in this case the same `typst` format and the new `hugo-md` format, while not rendering to `docx` at all).
This way we can separate different deployments beyond just carrying different formats by actually extending
and overwriting all kinds of project options.[^projtypes]
[^projtypes]: It would for example even be conceivable to have one project profile targeting a locally output `book` project type while a second targets the deployment of a remote `website` type from the same source material.
If we don't invoke either profile, we have no explicit render or format targets and no output dir set.
However, we also have a way to set a 'default' project profile (for which we don't have to enter the option each time).
We can do so by slightly extending the base `_quarto.yml` file.
```yml
# _quarto.yml
author: Marty Oehme
csl: https://www.zotero.org/styles/apa
profile:
  group:
    - [local, remote]
```
The two profiles are now in a 'profile group', of which only one can ever be active and of which the
first one in the list will automatically be applied when invoking `quarto render` without any additional options.
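The selection rule for such a group can be written down in a few lines. This is a simplified sketch of the behaviour described above, not Quarto's own code; `active_group_profile` is a hypothetical helper and only models a single group.

```python
from collections.abc import Sequence


def active_group_profile(group: Sequence[str], requested: Sequence[str] = ()) -> str:
    """Pick the one active profile out of a mutually exclusive group.

    Assumption: the first requested profile that belongs to the group wins,
    otherwise the first entry of the group acts as the default.
    """
    for name in requested:
        if name in group:
            return name
    return group[0]


group = ["local", "remote"]
print(active_group_profile(group))              # plain `quarto render`
print(active_group_profile(group, ["remote"]))  # `quarto render --profile remote`
```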
This is how I have been doing it for the nuclear analysis: have a local (in my case I simply called it 'default')
profile which renders the current project to a local working directory using most of the usual quarto output,
such as html preview, and static outputs to double-check how everything is displayed.
Then, I added another profile on top which I called 'blog' and which outputs its renders directly into the
correct post directory of my blog.
There are, however, some remaining issues, detailed below.
### Static content in an Astro blog page
One issue arises in that quarto has its own way of stowing external fragments (like the PNGs of
visualizations) and this often does not automatically work with static site generators which
expect static files like this to reside in the 'static' (or 'public') directory instead of
next to the manuscript.
I have overcome this issue with a 'post-script' which runs after the main quarto processing is done,
by adding to the relevant `_quarto.yml`:
```yml
project:
  type: default
  output-dir: /path/to/my/blog/post/2024-07-02-directory
  render:
    - index.qmd
  post-render:
    - tools/move-static-to-blog.py /path/to/my/blog/static/dir
```
This way, we can first create all the necessary outputs in the normal quarto output directory and
afterwards have a script which takes the resulting static files and moves them to the
correct place in the blog's public directory.
I am not a huge fan of the amount of hard-coding this approach requires but it does seem like the
easiest way to just be able to hit render and have working results.
The following is one example of how to use python to move the required files to a specific static directory:
```python
#!/usr/bin/env python3
import os
import shutil
import sys
from pathlib import Path

# Safeguards to only move when necessary:
# only act on a full project render, not on single-file renders.
if not os.getenv("QUARTO_PROJECT_RENDER_ALL"):
    sys.exit(0)

q_output_dir = os.getenv("QUARTO_PROJECT_OUTPUT_DIR")
if not q_output_dir:
    print("ERROR: Quarto did not supply QUARTO_PROJECT_OUTPUT_DIR.")
    sys.exit(1)
source = Path(q_output_dir)

args = sys.argv
files: list[Path] = []

# Get the correct dest and files from args
if len(args) < 2:
    print("Static output file directory for blog post-render is required.")
    sys.exit(1)
dest = Path(args[1])
for f in args[2:]:
    files.append(Path(f))

# Move safeguards
if not dest.is_dir():
    print(f"ERROR: Static output directory {dest} *does not exist*.")
    sys.exit(1)

# Without explicit files, collect everything below '*_files' directories
# in the Quarto output directory (Path.walk requires Python 3.12+).
if not files:
    for dirname in os.listdir(source):
        if dirname.endswith("_files"):
            dirpath = source.joinpath(dirname)
            for root, loc_dirs, loc_files in dirpath.walk():
                for file in loc_files:
                    files.append(root.joinpath(file))

for f in files:
    shutil.copy(f, dest)
```
It simply requires the target directory as the first argument and uses the `QUARTO_PROJECT_OUTPUT_DIR` env var
(which Quarto supplies to any post-render script) as the source.
Then it copies either all files that have been mentioned as additional arguments (safer) or
all files that it finds in directories ending in '_files' (more dangerous).
Now all additional files reside in the root of the static file dir.
If you instead want to 'rebuild' the same structure in the static dir as in the source dir for your assets,
you will have to adjust the script to move between the root-relative file paths in the two folders
(and automatically generate new directories if necessary).
This should take care of placing static files in the right places.
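For the 'rebuild the same structure' variant mentioned above, one possible adjustment is to compute each file's path relative to the source root and recreate it below the static dir. This is a hedged sketch rather than part of the script I actually use: the helper `copy_preserving_structure` and the file names in the demonstration are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_preserving_structure(
    files: list[Path], source_root: Path, dest_root: Path
) -> list[Path]:
    """Copy each file below dest_root, keeping its path relative to source_root."""
    copied = []
    for f in files:
        target = dest_root / f.relative_to(source_root)
        # Create missing intermediate directories in the static dir
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(f, target)
        copied.append(target)
    return copied


# Small demonstration in throwaway directories (all names are made up)
with tempfile.TemporaryDirectory() as tmp:
    source_root = Path(tmp, "output")
    dest_root = Path(tmp, "static")
    asset = source_root / "index_files" / "figure-html" / "plot.png"
    asset.parent.mkdir(parents=True)
    asset.write_bytes(b"placeholder")
    dest_root.mkdir()

    copied = copy_preserving_structure([asset], source_root, dest_root)
    copied_ok = copied[0].is_file()
    print(copied[0].relative_to(dest_root), copied_ok)
```

Note that `relative_to` raises a `ValueError` for files outside the source root, which doubles as a safeguard against copying unrelated files.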
### Dynamic content in an Astro blog page
Getting the folium/leaflet map to work in a static site generator like [Astro](https://astro.build) was a bit of a pain.
Essentially, the concept of getting it to work is the same as for static content above:
We save the folium-produced html output as a static file and place that in the static file directory of the blog.
Then we integrate it into the page with an iframe html element.
However, some issues arise in producing the static html file in the first place.
<!-- TODO: Expand on which issues:
Displaying image elements.
Manually saving to html.
Doing both conditionally. -->
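The embedding step itself, placing an iframe that points at the saved map, can be sketched as follows. This assumes the map has already been written out separately (folium maps offer a `save` method for that) and copied to the static dir; the helper `iframe_snippet`, its defaults and the example URL are hypothetical.

```python
from html import escape


def iframe_snippet(src: str, title: str, height: int = 500) -> str:
    """Return an iframe element pointing at a statically served html file."""
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'title="{escape(title, quote=True)}" '
        f'width="100%" height="{height}" loading="lazy"></iframe>'
    )


# Hypothetical URL: wherever the saved map ends up below the blog's static dir
print(iframe_snippet("/nuclear-explosions/map.html", "Map of nuclear detonations"))
```

The resulting element can be pasted into the markdown source of the post as raw html.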
## Remaining issues
While working with polars is wonderful and seaborn takes a lot of the stress of creating half-way nicely formatted plots out of mind while first creating them,
@ -155,3 +404,4 @@ This project made use the fantastic python library [great tables]() which indeed
However, it primarily targets the html format.
Getting this format into shape for quarto to then translate it into the pandoc AST and ultimately whatever format is not pretty.
For example LaTeX routinely just crashes instead of rendering the table correctly into a PDF file.