From 053dbc397d315ab6f98273c9f8f407b475613ac7 Mon Sep 17 00:00:00 2001
From: Marty Oehme
Date: Thu, 20 Nov 2025 17:57:41 +0100
Subject: [PATCH] Add top packages and conclusion

---
 index.md | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/index.md b/index.md
index aba21b4..4d5d5a8 100644
--- a/index.md
+++ b/index.md
@@ -415,3 +415,105 @@ Without skipping ahead too much, this makes sense to me looking at the wider pic
 as the `popcorn` statistics gathering was introduced in the middle of kernel 4's existence,
 and we are not yet anywhere near the end of the kernel 6 life-span,
 so version 5 probably had the most opportunity to have long-running installations.
+
+## Top packages
+
+Lastly, let's answer one more question:
+Which packages have the highest _median_ daily installation counts across the whole period?
+
+This will be a little easier again ---
+we have all the necessary ingredients in the `packages.csv` file.
+And with the tools we used so far,
+it shouldn't be hard to create a pipeline which:
+groups on the `package` column,
+then aggregates the package `count` using the `median` method,
+and finally sorts by the result of this aggregation.
+
+```nu
+open input/popcorn/output/packages.csv | group-by --to-table package | update items { $in.count | math median } | sort-by items
+```
+
+Of course, we'll have to be a little more careful while building this pipeline and _definitely_ resort to filtering like `| first 1000` or similar along the way,
+since constantly pushing over 17 million lines through the pipeline would otherwise be too much for the machine (at least for my machine with 8GB of RAM).
+
+In fact, running the full command completely saturated my RAM and leaned heavily on swap to avoid crashing from running out of memory.
+Of course, so much swapping also massively slowed down the process, so the above command took a little over 13 minutes to complete on my system.
+
+Here's the result of all that number crunching:
+
+| package | items |
+| --- | --- |
+| smartmontools | 25.0 |
+| psmisc | 25.0 |
+| base-system | 26.0 |
+| ntfs-3g | 26 |
+| void-repo-multilib | 27 |
+| xorg-minimal | 28.0 |
+| lvm2 | 29 |
+| unzip | 29 |
+| base-devel | 30 |
+| void-repo-nonfree | 31.0 |
+| neofetch | 33.0 |
+| lm_sensors | 35 |
+| zip | 42 |
+| xmirror | 42 |
+| socklog-void | 48.0 |
+
+So, what does that tell us?
+I think there are a few interesting observations to be made here.
+
+First, remember that we are looking at the _median_ number of installations of packages over the _whole_ time period.
+So, even if a package was slow to get going, with a few days of only having a single user,
+it shows up here.
+Conversely, if a package had one or more periods of intense use but a more erratic overall usage pattern,
+this will not be reflected here.
+
+Second, we are looking at the _number of installations_,
+that is, the daily reports of who has a package installed on their system.
+The most-installed package here is `socklog-void`, which makes sense as it is the main package suggested in the [documentation](https://docs.voidlinux.org/config/services/logging.html).
+The high prevalence of `xmirror` is a little more surprising to me,
+though it is, once again, the [suggested method](https://docs.voidlinux.org/xbps/repositories/mirrors/changing.html?highlight=xmirror#xmirror) for changing your installation's repository mirrors.
+
+`zip` being ahead of both `base-system` and `base-devel` is somewhat amusing to me,
+as is the latter also being ahead of the former.
+
+But overall I think this distribution of packages makes sense, as they are all long-lived utility programs which _any_ user of a distro may find useful (as opposed to more focused programs such as design software like `gimp` or text editors like `neovim`).
+With one curious exception:
+`neofetch` sits in 5th place among the packages,
+which is a giant surprise to me.
+
+Personally, I don't use a `fetch`-like program,
+as I think it just adds clutter to the terminal.
+
+But I am truly surprised at the number of people who have it installed.
+I suppose it makes sense if people install it once,
+for a showcase or to check their system at a glance,
+and then never uninstall it since it is just so unobtrusive.
+Nevertheless, I did not expect it to rank this high.
+
+## Conclusion
+
+This was a fun first excursion into the package statistics of Void Linux.
+As I said at the outset, I hope to have a more detailed article out at some point which looks at some of the changes over time a little more visually,
+but this was already a lot of fun.
+
+And I think it also really shows the power ---
+and the limitations ---
+of `nushell`.
+I could quickly switch between a multitude of data sources,
+and my data cleaning and transformation tools remained the same.
+
+The mental model behind its operations is also much closer to data-oriented workflows and tools like `Python Pandas`, `SQL` or even `R`,
+which I think is a boon when first introducing the idea of using the shell to more data-oriented folks.
+
+However, I also ran into the limits of what is possible with the shell on my machine.
+There may be approaches that make use of data streaming which I haven't discovered,
+but running transformations on the giant packages dataset nearly brought my machine to its knees,
+and, for me, would currently still be much better accomplished with `Python Polars`.
+
+In conclusion, use `nushell` for the right purposes:
+the quick turn-around of exploring medium-sized datasets,
+or taking a first look into parts of large datasets,
+while always staying flexible and having the full power of an interactive shell at your fingertips.
+Once you wrap your head around the more functional approach to how data streams through your pipelines (and I'm still in the process of doing so),
+it just becomes plain _fun_ to explore all manner of datasets.