Overhaul text cells

Marty Oehme 2025-09-29 21:27:27 +02:00
parent 08737e1baa
commit 094aa34758
Signed by: Marty
GPG key ID: 4E535BC19C61886E

@@ -75,28 +75,26 @@ def _():
 r"""
 ## Daily statistics file size
-The simplest operation we can do is look at the overall file size for each
-of the daily statistics files over time. The files consist of a long list
-of packages which have been downloaded from the repositories that day,
-along with the number of downloads. It also consists of the same list
-separated by specifically downloaded versions of packages, so if somebody
-downloads v0.9.1 and somebody else downloads v0.9.3 this would count both
-downloads separately.
-Another count is the number of different Kernels that have been used to
-download (or downloaded?) from the repositories.
-These are the major things that will lead to size increases in the file,
-but not just for an increased amount of downloads --- we will get to those shortly.
-No, an increase in file size here mainly suggests an increase in the
-'breadth' of files on offer in the repository, whether that be a wider
-variety of program versions or more different packages that people are
-interested in.
-So while the overall amount of downloads gives a general estimate of the
-interest in the distribution, this can show a more 'distributor'-aligned
-view on how many different aisles of the buffet people are eating from.
+The simplest operation we can do is look at the overall file size for each of the daily
+statistics files over time. The files consist of a long list of packages which have been checked
+from the repositories that day, along with the number of package instances. It also consists of
+the same list separated by specifically installed versions of packages, so if somebody has
+v0.9.1 and somebody else v0.9.3 instead this would count both packages separately.
+Another count is the number of different Kernels that have been used on that day, with their
+exact kernel name including major version, minor version and any suffix.
+These are the major things that will lead to size increases in the file, but not just for an
+increased amount of absolute users, packages or uploads --- we will get to those shortly.
+No, an increase in file size here mainly suggests an increase in the 'breadth' of files on offer
+in the repository, whether that be a wider variety of program versions or more different
+packages that people are interested in, and those that the community chooses to use.
+So while the overall amount of packages gives a general estimate of the interest in the
+distribution, this can show a more 'distributor'-aligned view on how many different aisles of
+the buffet people are eating from.
 """
 )
 return
@@ -122,13 +120,18 @@ def _():
 mo.md(
 r"""
-As we can see, the difference over time is massive. Especially early on,
-between 2019 and the start of 2021, the amount of different stuff
-downloaded grew rapidly, with the pace picking up again starting 2023.
-There are a few outliers with a size of 0 kB, which we will remove from the
-data. There are also a few days where the modification date of the file
-does not correspond to the represented statistical date.
+As we can see, the difference over time is massive. Especially early on, between 2019 and the
+start of 2021, the amount of different packages and package versions used grew rapidly, with the
+pace picking up once again starting 2023.
+There are a few outlier days with a size of 0 kB, which we will remove from the data. In all
+likelihood, those days were not reported correctly or there was some kind of issue on the
+backend so the stats for those days are lost.
+There are also a few days where the modification date of the file does not correspond to the
+represented statistical date but those are kept. This rather points to certain times when the
+files have been moved on the backend, or recreated externally but does not mean the data are
+bad.
 """
 )
@@ -159,14 +162,15 @@ def _():
 def _():
 mo.md(
 r"""
-## Download statistics
-Now that we have an idea of how the overall interest in the distribution
-has changed over time, let's look at the actual download statistics.
-The popcorn files contain two main pieces of information: the number of
-unique installs (i.e. unique machines downloading packages) and the number
-of downloads per package. We will look at both of these in turn.
+## Package statistics
+Now that we have an idea of how the overall interest in the distribution has changed over time,
+let's look at the actual package statistics.
+The popcorn files contain two main pieces of information: the number of installs per package
+(e.g. how many people have rsync installed) and the number of unique installs (i.e. unique
+machines providing statistics). We will look at both of these in turn.
 """
 )
 return
@@ -195,6 +199,18 @@ def _(df_pkg_lazy: pl.LazyFrame):
 return
+@app.cell(hide_code=True)
+def _():
+mo.md(
+r"""
+The amount of packages installed on all machines increases strongly over time.
+"""
+)
+return
 @app.cell
 def _(df_pkg_lazy: pl.LazyFrame):
 def _():