Without skipping ahead too much, this makes sense to me looking at the wider picture,
as the `popcorn` statistics gathering was introduced in the middle of kernel 4's existence,
and we are not yet anywhere near the end of the kernel 6 life-span,
so version 5 probably had the most opportunity to have long-running installations.

## Top packages

Lastly, let's answer one more question:
Which packages have the highest _median_ daily installation counts across the whole period?

This will be a little easier again ---
we have all the necessary ingredients in the `packages.csv` file.
And with the tools we used so far,
it shouldn't be hard to create a pipeline which:
groups on the `package` column,
then aggregates the package `count` using the `median` method,
and finally sorts by the result of this aggregation.

```nu
open input/popcorn/output/packages.csv | group-by --to-table package | update items { $in.count | math median } | sort-by items
```

Of course, we'll have to be a little more careful with our pipeline here while building it, and _definitely_ resort to filtering like `| first 1000` or similar,
since constantly running over 17 million lines through the pipeline would otherwise be a little too much for the machine (at least, definitely for my machine with 8GB of RAM).
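
For instance, while building it up, the exact same pipeline can be capped near its start; a minimal sketch (a sample's medians will of course differ from the full result):

```nu
# Prototype on the first 1000 rows only; drop the `first 1000` stage for the real run.
open input/popcorn/output/packages.csv | first 1000 | group-by --to-table package | update items { $in.count | math median } | sort-by items
```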

In fact, running this full command completely saturated my memory and made heavy use of swap so the process wouldn't have to crash from running out.
Of course, with so much swapping this also massively slowed down the process, so the above command took a little over 13 minutes to complete on my system.

Here's the result of all that number crunching:

| package | items |
| --- | --- |
| smartmontools | 25.0 |
| psmisc | 25.0 |
| base-system | 26.0 |
| ntfs-3g | 26 |
| void-repo-multilib | 27 |
| xorg-minimal | 28.0 |
| lvm2 | 29 |
| unzip | 29 |
| base-devel | 30 |
| void-repo-nonfree | 31.0 |
| neofetch | 33.0 |
| lm_sensors | 35 |
| zip | 42 |
| xmirror | 42 |
| socklog-void | 48.0 |

So, what does that tell us?
I think there are a few interesting observations to be made here.

First, remember that we are looking at the _median_ number of installations of packages over the _whole_ time period.
So, even if a package was slow to get going with a few days of only having a single user,
it still shows up here.
Similarly, however, if a package had one or multiple periods of intense use but is more erratic in its overall usage pattern,
this will not be reflected here.
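
To make that concrete, here's a quick sketch with purely made-up daily counts:

```nu
# Hypothetical daily counts: a slow start of single-user days barely moves the median...
[1 1 30 31 32] | math median  # => 30
# ...while a short spike of intense use doesn't lift it much either.
[5 5 5 90 95] | math median   # => 5
```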

Second, we are looking at the _number of installations_,
that is, the daily report of who has this package installed on their system.
The most-installed package here is `socklog-void`, which makes sense, as it is the package suggested in the [documentation](https://docs.voidlinux.org/config/services/logging.html) for logging.
The high prevalence of `xmirror` is a little more surprising to me,
though it is, once again, the [suggested method](https://docs.voidlinux.org/xbps/repositories/mirrors/changing.html?highlight=xmirror#xmirror) of changing your installation's repository mirrors.

`zip` being ahead of both `base-system` and `base-devel` is somewhat amusing to me,
as is `base-devel` being ahead of `base-system`.

But overall I think this distribution of packages makes sense, as they all describe long-lived utility programs which _any_ user of a distro may find useful (as opposed to more focused programs such as design software like `gimp` or text editors like `neovim`).
With one curious exception:
`neofetch` is in the 5th spot,
which is a giant surprise to me.

Personally, I don't use a `fetch`-like program,
as I think it just adds clutter to the terminal.

But I am truly surprised at the number of people who have it installed.
I suppose it makes sense if you install it once,
for a show-case or to check your system at a glance,
but then never uninstall it since it is just so unobtrusive.
Nevertheless, this surprises me greatly.

## Conclusion

This was a fun first excursion into the package statistics of Void Linux.
As I said at the outset, I hope to have a more detailed article out at some point which looks at some of the changes over time a little more visually,
but this was a lot of fun.

And I think it also really shows the power ---
and the limitations ---
of `nushell`.
I could quickly switch between a multitude of data sources,
and my data cleaning and transformation tools remained the same.

The mental model behind operations is also much more akin to data-oriented workflows and tools like Python's `pandas`, `SQL`, or even `R`,
which I think is a boon when first introducing the idea of using the shell to more data-oriented folks.

However, I also stumbled onto the edges of what is possible with the shell on my machine.
There may be approaches that make use of data streaming which I haven't discovered,
but running transformations on the giant packages dataset nearly brought my machine to its knees,
and would currently still be much better accomplished with Python's `polars` for me.
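
For the record, nushell does offer an optional dataframe plugin (`nu_plugin_polars`) that hands this kind of aggregation to the polars engine; a rough sketch, assuming the plugin is installed and registered (I haven't verified how it fares on the full 17-million-line file):

```nu
# A rough equivalent of the earlier pipeline, pushed into the polars engine;
# `polars collect` materializes the aggregated result.
polars open input/popcorn/output/packages.csv
| polars group-by package
| polars agg (polars col count | polars median)
| polars collect
```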

In conclusion, use `nushell` for the right purposes:
the quick turn-around of exploring medium-sized datasets,
or taking a first look into parts of large datasets,
while always staying flexible and having the full power of an interactive shell at your fingertips.
Once you wrap your head around the more functional approach to how data streams through your pipelines (and I'm still in the process of doing so),
it just becomes plain _fun_ to explore all manner of datasets.