Add appendix text
parent
06ee312c80
commit
2c5cf37b2c
1 changed files with 62 additions and 16 deletions
78
popcorn.qmd
|
|
@ -523,6 +523,57 @@ provides a more macro-level view on how big the statistics have grown to be over
|
||||||
We can see that, as each individually reported day adds up to 400KB nowadays, the
|
We can see that, as each individually reported day adds up to 400KB nowadays, the
|
||||||
cumulative size is up to almost 700MB currently.
|
cumulative size is up to almost 700MB currently.
|
||||||
|
|
||||||
|
### Packages monthwise and per weekday
|
||||||
|
|
||||||
|
Let's also look at the packages installed on systems for different time slices.
|
||||||
|
We'll start with a look at the packages per weekday.
|
||||||
|
|
||||||
|
```{python}
|
||||||
|
from notebooks.popcorn import plt_weekday_packages
|
||||||
|
pplot(plt_weekday_packages)
|
||||||
|
```
|
||||||
|
|
||||||
|
There is no significant difference between the individual weekdays, as we would
|
||||||
|
expect. It would be strange to have a specific day on which everybody decides to
|
||||||
|
install or uninstall new packages.
|
||||||
|
|
||||||
|
That said, there is some slight variation, with Wednesdays generally having a
|
||||||
|
few fewer total packages to boast than other days, especially Tuesdays, which
|
||||||
|
are slightly above the curve.
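
A minimal sketch of how such a per-weekday comparison could be computed. The helper `mean_by_weekday` and the sample rows are hypothetical and are not taken from `notebooks.popcorn`; the counts are made up purely for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def mean_by_weekday(rows):
    """Average the package count per weekday from (date, count) pairs."""
    buckets = defaultdict(list)
    for day, count in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[WEEKDAY_NAMES[day.weekday()]].append(count)
    return {name: mean(values) for name, values in buckets.items()}

# Made-up rows; real values would come from the collected statistics.
rows = [
    (date(2025, 3, 4), 120),   # a Tuesday
    (date(2025, 3, 5), 100),   # a Wednesday
    (date(2025, 3, 11), 130),  # a Tuesday
    (date(2025, 3, 12), 110),  # a Wednesday
]
averages = mean_by_weekday(rows)
```

Comparing `averages["Tuesday"]` with `averages["Wednesday"]` then gives the kind of slight gap described above.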
|
||||||
|
|
||||||
|
Let's just imagine everybody gets bored on Tuesday, installs a new package and
|
||||||
|
drops it again by Wednesday, along with a slew of other packages. Try-out
|
||||||
|
Tuesdays and Wastebin Wednesdays, if you will.
|
||||||
|
|
||||||
|
Alright, but let's also take a look at the package numbers per month instead.
|
||||||
|
|
||||||
|
```{python}
|
||||||
|
from notebooks.popcorn import plt_month_packages
|
||||||
|
pplot(plt_month_packages)
|
||||||
|
```
|
||||||
|
|
||||||
|
Here we can see a bit more variation. First, it is important to note that I have
|
||||||
|
removed the first months of 2018 prior to October from the analysis and cut off any
|
||||||
|
days after September 2025, to have only full years represented and avoid any
|
||||||
|
months being present more often than others.[^months-removed]
|
||||||
|
|
||||||
|
[^months-removed]: I chose the first couple of months in the data, rather than
|
||||||
|
the most recent months, as fewer people were collecting data then, so we have less
|
||||||
|
of a loss. Additionally, I presume people are more interested in current
|
||||||
|
statistics than older ones, just generally.
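
The trimming described above can be sketched as follows. This is an illustration under assumptions, not the code used for the plot: the helper names and the generated daily range are hypothetical, and only the October 2018 and September 2025 boundaries come from the text.

```python
from collections import Counter
from datetime import date, timedelta

def trim_to_window(days, start, end):
    """Keep only the days inside the inclusive [start, end] window."""
    return [d for d in days if start <= d <= end]

def month_occurrences(days):
    """Count in how many distinct years each calendar month appears."""
    seen = {(d.year, d.month) for d in days}
    return Counter(month for _, month in seen)

# Hypothetical daily range standing in for the real statistics.
first, last = date(2018, 1, 1), date(2025, 12, 31)
all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]

# Boundaries as described in the text: October 2018 through September 2025.
kept = trim_to_window(all_days, date(2018, 10, 1), date(2025, 9, 30))
counts = month_occurrences(kept)
```

With these boundaries every calendar month occurs the same number of times in `counts`, so no month is weighted more heavily than another.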
|
||||||
|
|
||||||
|
It is quite surprising to me just how much variation is visible in the results:
|
||||||
|
months from October to February have markedly fewer packages than the spring
|
||||||
|
and summer months. Are people generally more willing to use and try out new
|
||||||
|
packages in the summer? Alternatively, were any of the major usage dips taking
|
||||||
|
place during winter, while the increases in usage occurred more toward summer?
|
||||||
|
|
||||||
|
I have not delved deep into the interpretation of these questions, but it may
|
||||||
|
be interesting to do so. The last option, of course, is that the data itself,
|
||||||
|
the data collection or analysis contains an error that I am not aware of.
|
||||||
|
|
||||||
|
### Missing days and dates
|
||||||
|
|
||||||
There are some missing days in the statistics.
|
There are some missing days in the statistics.
|
||||||
|
|
||||||
```{python}
|
```{python}
|
||||||
|
|
@ -532,17 +583,19 @@ outp, defs = tab_missing_days.run()
|
||||||
outp
|
outp
|
||||||
```
|
```
|
||||||
|
|
||||||
### Packages monthwise and per weekday
|
These missing days primarily occur at the end of January 2019 and
|
||||||
|
throughout 2025. However, with over 2600 days where the statistics _are_
|
||||||
|
available, these rows represent an insignificant issue for the overall data.
|
||||||
|
|
||||||
```{python}
|
It would seem there was some kind of issue collecting or storing the collected
|
||||||
from notebooks.popcorn import plt_weekday_packages
|
data at that point in 2019, which means a few days in a row are missing. This
|
||||||
pplot(plt_weekday_packages)
|
skews absolute numbers for that week downwards, as well as any weekly averages
|
||||||
```
|
relying on this date range.
|
||||||
|
|
||||||
```{python}
|
However, no significant visual differences stem from this fact, which is why it
|
||||||
from notebooks.popcorn import plt_month_packages
|
is not called out in the main article. As it is, it remains an interesting fact and,
|
||||||
pplot(plt_month_packages)
|
were this a more rigorous investigation, perhaps worthy of taking into account
|
||||||
```
|
as biasing the result, but for our purposes not too bad.
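
One way such gaps could be located is sketched below. The helpers `find_missing_days` and `group_consecutive` are hypothetical and unrelated to `tab_missing_days`, and the example dates are invented for illustration rather than taken from the real data.

```python
from datetime import date, timedelta

def find_missing_days(present, start, end):
    """Return every date in [start, end] that is absent from `present`."""
    have = set(present)
    total = (end - start).days + 1
    candidates = (start + timedelta(days=i) for i in range(total))
    return [day for day in candidates if day not in have]

def group_consecutive(days):
    """Group dates into runs of directly consecutive days."""
    runs = []
    for day in sorted(days):
        if runs and day - runs[-1][-1] == timedelta(days=1):
            runs[-1].append(day)  # extends the current run
        else:
            runs.append([day])    # starts a new run
    return runs

# Invented example: three days in a row are absent from a twelve-day window.
start, end = date(2019, 1, 20), date(2019, 1, 31)
present = [start + timedelta(days=i) for i in range(12) if i not in (7, 8, 9)]
gaps = find_missing_days(present, start, end)
runs = group_consecutive(gaps)
```

Grouping the gaps into runs makes it easy to tell isolated missing days apart from multi-day outages, which are the ones that would skew a weekly average.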
|
||||||
|
|
||||||
## Outline
|
## Outline
|
||||||
|
|
||||||
|
|
@ -570,10 +623,3 @@ pplot(plt_month_packages)
|
||||||
- things we can't see (limitations)
|
- things we can't see (limitations)
|
||||||
- packages on offer in the repositories
|
- packages on offer in the repositories
|
||||||
- this could shed light on the bumps of users and relative package ownership
|
- this could shed light on the bumps of users and relative package ownership
|
||||||
|
|
||||||
Modified date != descriptive (named) date
|
|
||||||
|
|
||||||
```{python}
|
|
||||||
from notebooks.popcorn import plt_modified_times
|
|
||||||
pplot(plt_modified_times)
|
|
||||||
```
|
|
||||||
|
|
|
||||||