Add appendix text
Parent 06ee312c80, commit 2c5cf37b2c

1 changed file with 62 additions and 16 deletions: popcorn.qmd (78 lines changed)

@@ -523,6 +523,57 @@ provides a more macro-level view on how big the statistics have grown to be over
We can see that, as each individually reported day now adds up to about 400 KB,
the cumulative size has grown to almost 700 MB.
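A quick sanity check on these figures: if all of the roughly 2600 reported days were as large as today's 400 KB reports, the total would exceed 1 GB, so earlier days must have been considerably smaller. The sketch below does that arithmetic with the approximate numbers from the text and then shows a running total over a hypothetical list of growing daily sizes (those six values are made up for illustration).

```python
from itertools import accumulate

# Approximate figures quoted in the text.
days_reported = 2600     # "over 2600 days" with available statistics
current_daily_kb = 400   # size of one day's report nowadays
cumulative_mb = 700      # "almost 700 MB" in total

# If every day were as large as today's reports, the total would be higher.
naive_total_mb = days_reported * current_daily_kb / 1000
print(naive_total_mb)

# So the average reported day must be smaller than today's reports.
avg_daily_kb = cumulative_mb * 1000 / days_reported
print(round(avg_daily_kb))

# Running total over hypothetical, growing daily sizes (in KB):
# element i is the combined size of days 0..i.
daily_sizes_kb = [120, 150, 180, 250, 400, 400]
running_total_kb = list(accumulate(daily_sizes_kb))
print(running_total_kb)
```

The gap between the naive estimate (about 1040 MB) and the observed total (about 700 MB) puts the average day at roughly 270 KB.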

### Packages monthwise and per weekday

Let's also look at the packages installed on systems for different time slices.
We'll start with a look at the packages per weekday.

```{python}
from notebooks.popcorn import plt_weekday_packages
pplot(plt_weekday_packages)
```

There is no significant difference between the individual weekdays, as we would
expect. It would be strange for there to be a specific day on which everybody
decides to install or uninstall packages.

That said, there is some slight variation: Wednesdays generally boast a few
fewer packages in total than other days, and especially fewer than Tuesdays,
which sit slightly above the curve.

Let's just imagine everybody gets bored on Tuesday, installs a new package and
drops it again by Wednesday, along with a slew of other packages. Try-out
Tuesdays and Wastebin Wednesdays, if you will.
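The code behind `plt_weekday_packages` is not part of this diff. As a rough sketch of the kind of aggregation such a plot needs, assuming daily package totals keyed by date (the `daily_totals` mapping and its values are hypothetical), grouping by weekday with only the standard library could look like this:

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Fixed English names, indexed by date.weekday() (Monday == 0),
# so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Hypothetical daily package totals (made-up numbers, not popcorn data).
daily_totals = {
    date(2024, 1, 1): 1000,   # Monday
    date(2024, 1, 2): 1040,   # Tuesday
    date(2024, 1, 3): 980,    # Wednesday
    date(2024, 1, 8): 1010,   # Monday
    date(2024, 1, 9): 1060,   # Tuesday
    date(2024, 1, 10): 990,   # Wednesday
}

# Bucket each day's total under its weekday name.
by_weekday = defaultdict(list)
for day, total in daily_totals.items():
    by_weekday[WEEKDAYS[day.weekday()]].append(total)

# Mean per weekday, so a weekday with more recorded days is not favoured.
weekday_means = {name: mean(values) for name, values in by_weekday.items()}
print(weekday_means)
```

Averaging rather than summing matters whenever some weekdays have more recorded days than others, for example because of missing days in the data.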

Alright, but let's also take a look at the package numbers per month instead.

```{python}
from notebooks.popcorn import plt_month_packages
pplot(plt_month_packages)
```

Here we can see a bit more variation. First, it is important to note that I
have removed the months of 2018 before October from the analysis and cut off
any days after September 2025, so that only full years are represented and no
month is present more often than the others.[^months-removed]

[^months-removed]: I chose to drop the first couple of months in the data rather
than the most recent ones because fewer people were collecting data back then,
so we lose less. Additionally, I presume people are generally more interested in
current statistics than in older ones.

It is quite surprising to me just how much variation is visible in the results:
the months from October to February have markedly fewer packages than the
spring and summer months. Are people generally more willing to use and try out
new packages in the summer? Alternatively, did any of the major usage dips take
place during winter, while the increases in usage occurred more toward summer?

I have not delved deeply into these questions, but it may be interesting to do
so. The last option, of course, is that the data itself, the data collection, or
the analysis contains an error that I am not aware of.
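The full-year cut described above can be sketched as follows. The window keeps 1 October 2018 up to and including 30 September 2025, which is seven full years, so every calendar month should appear exactly seven times before the per-month aggregation. The `daily_totals` mapping is a synthetic placeholder, not the notebook's actual data structure.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from statistics import mean

START = date(2018, 10, 1)  # first day kept (months before October 2018 dropped)
END = date(2025, 9, 30)    # last day kept (days after September 2025 dropped)

# Synthetic daily totals covering a slightly wider range than the window.
first, last = date(2018, 6, 1), date(2025, 12, 31)
daily_totals = {
    first + timedelta(days=i): 1000 + (i % 30)
    for i in range((last - first).days + 1)
}

# Keep only the days inside the full-year window.
window = {d: v for d, v in daily_totals.items() if START <= d <= END}

# Count how many distinct years contribute each calendar month.
year_months = {(d.year, d.month) for d in window}
months_seen = Counter(month for _, month in year_months)
assert set(months_seen.values()) == {7}  # seven full years, no month favoured

# Average daily total per calendar month across all years.
by_month = defaultdict(list)
for d, total in window.items():
    by_month[d.month].append(total)
monthly_means = {m: mean(v) for m, v in sorted(by_month.items())}

print(len(window), len(year_months), months_seen[1])
```

Without the cut, October through December 2018 and October through December 2025 would enter the averages with a different number of years than the other months.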

### Missing days and dates

There are some missing days in the statistics.

```{python}
@@ -532,17 +583,19 @@ outp, defs = tab_missing_days.run()
outp
```

These missing days occur primarily at the end of January 2019 and throughout
2025. However, with over 2600 days for which the statistics _are_ available,
these gaps are an insignificant issue for the overall data.
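Finding such gaps amounts to comparing the dates that have statistics against the full calendar range between the first and last report. A minimal sketch, assuming the available dates are already collected into a set; the `missing_days` helper is hypothetical, and the three-day gap below is invented for illustration rather than taken from the real data:

```python
from datetime import date, timedelta


def missing_days(available: set[date]) -> list[date]:
    """Return every date between the first and last available date without data."""
    start, end = min(available), max(available)
    all_days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in all_days if d not in available]


# Hypothetical example: January 2019 with an invented gap on the 28th to 30th.
gap = {date(2019, 1, 28), date(2019, 1, 29), date(2019, 1, 30)}
available = {date(2019, 1, d) for d in range(1, 32)} - gap

print(missing_days(available))
```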

It seems there was some kind of issue collecting or storing the data at that
point in 2019, which is why several consecutive days are missing. This skews the
absolute numbers for that week downwards, as well as any weekly averages relying
on this date range.
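To make the skew concrete: a weekly total drops when days are missing, and so does an average that divides by seven, whereas a mean over only the days that are present does not. Which of these the article's plots use is not shown here; the sketch below uses a hypothetical week of 1000 packages per day with three days missing.

```python
from statistics import mean

# Hypothetical week: seven days of 1000 packages, three of them missing (None).
week = [1000, 1000, 1000, 1000, None, None, None]
present = [v for v in week if v is not None]

weekly_total = sum(present)         # only four days contribute
mean_over_week = weekly_total / 7   # divides by seven, biased downwards
mean_over_present = mean(present)   # divides by the days actually present

print(weekly_total, round(mean_over_week, 1), mean_over_present)
```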

However, no significant visual differences stem from this, which is why it is
not called out in the main article. As it stands, it is an interesting fact and,
were this a more rigorous investigation, perhaps worth accounting for as a
source of bias, but for our purposes it is not too bad.

## Outline

@@ -570,10 +623,3 @@ pplot(plt_month_packages)
- things we can't see (limitations)
- packages on offer in the repositories
- this could shed light on the bumps of users and relative package ownership

Modified date != descriptive (named) date

```{python}
from notebooks.popcorn import plt_modified_times
pplot(plt_modified_times)
```
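The note above distinguishes a file's modification time from the date in its name. As an illustrative sketch only, assuming hypothetical daily files named `YYYY-MM-DD.txt` (the actual popcorn file naming and the implementation of `plt_modified_times` are not shown here), the two dates could be compared like this:

```python
import os
import tempfile
from datetime import date, datetime, timedelta
from pathlib import Path


def date_mismatch(path: Path) -> timedelta:
    """Modification date minus the date encoded in the file name.

    Assumes a hypothetical naming scheme of YYYY-MM-DD plus an extension.
    """
    named = date.fromisoformat(path.stem)
    modified = datetime.fromtimestamp(path.stat().st_mtime).date()
    return modified - named


# Demo in a temporary directory: the file is named for 2019-01-28,
# but its modification time is set to noon on 2019-02-03.
with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "2019-01-28.txt"
    f.write_text("stats")
    mtime = datetime(2019, 2, 3, 12, 0).timestamp()
    os.utime(f, (mtime, mtime))
    mismatch_days = date_mismatch(f).days
    print(mismatch_days)
```

Setting the time to noon keeps the local-time round trip away from midnight, so the computed modification date is stable across time zones.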