[Bioc-devel] 5mb limit for packfiles in .git is too harsh

Park, Adam Keebum
Sat Jan 21 21:24:42 CET 2023


Dear community,

First, I want to thank Nathan for his amazing help with my two previous inquiries. His answers led me to pinpoint the issue.

 After hours of analysis, I finally decided to remove every data file larger than 50k from the git history. However, this practice is not sustainable and is actually harmful: it invalidates virtually all previous data files and therefore hampers the reproducibility of previous commits, especially their unit tests. I am therefore leaving this message here in the hope of reaching the Bioconductor administrators.

 I would argue that this policy should be relaxed, at least for git packfiles. Most of us know that the .pack file in .git/objects/pack is frequently flagged by BiocCheck() for its large size (as in here<https://stat.ethz.ch/pipermail/bioc-devel/2019-February/014703.html> or here<https://stat.ethz.ch/pipermail/bioc-devel/2020-October/017273.html>). This is natural given the purpose of packfiles: storing the whole history, including files that were later removed, in a single compact file<https://git-scm.com/book/en/v2/Git-Internals-Packfiles#:~:text=The%20packfile%20is%20a%20single,seek%20to%20a%20specific%20object.>.
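 For readers who want to check where their own repository stands, here is a minimal sketch that lists the packfiles under .git/objects/pack with their sizes. The function names and the reading of the limit as 5,000,000 bytes are assumptions made for illustration; this is not how BiocCheck itself performs the check.

```python
from pathlib import Path

# Assumption: the "5mb" limit is read here as 5,000,000 bytes.
LIMIT_BYTES = 5 * 1000 * 1000


def packfile_sizes(repo_dir):
    """Return {packfile name: size in bytes} for <repo_dir>/.git/objects/pack/*.pack."""
    pack_dir = Path(repo_dir) / ".git" / "objects" / "pack"
    if not pack_dir.is_dir():
        return {}
    return {p.name: p.stat().st_size for p in sorted(pack_dir.glob("*.pack"))}


def oversized_packfiles(repo_dir, limit=LIMIT_BYTES):
    """Names of packfiles whose size exceeds the given limit (hypothetical helper)."""
    return [name for name, size in packfile_sizes(repo_dir).items() if size > limit]


if __name__ == "__main__":
    # Run from the top-level directory of a git working copy.
    for name, size in packfile_sizes(".").items():
        status = "over limit" if size > LIMIT_BYTES else "ok"
        print(f"{name}: {size / 1e6:.2f} MB ({status})")
```

 Only files ending in .pack are counted; the accompanying .idx files are ignored.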
 Compressing the whole git history into one file is effective only as long as most deltas are line-level changes in text source files. In my experience, however, modifications to binary data files contributed much more: when a dataset is stored in compressed form, even a small modification scrambles its bit pattern, so the delta between versions is large. Such changes were still at the kilobyte level, but they gradually inflated the pack file until I had to remove those files. The current policy therefore forces the deletion of kilobyte-sized files from the git history, not just 'large' files.
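 As a rough illustration of this effect, the sketch below changes a single byte in a synthetic dataset and compares the two versions before and after compression. It assumes the data files are committed in a compressed format; zlib stands in for whatever compressor is actually used, the dataset is made up, and a position-by-position byte comparison is only a crude proxy for the delta search that git performs. The exact figures depend on the data and the compressor.

```python
import random
import zlib


def differing_fraction(a: bytes, b: bytes) -> float:
    """Fraction of byte positions at which a and b differ (length mismatch counts as differing)."""
    n = max(len(a), len(b))
    same = sum(x == y for x, y in zip(a, b))
    return 1 - same / n


# Hypothetical dataset: 20,000 three-digit values separated by commas.
rng = random.Random(0)
records = [f"{rng.randrange(1000):03d}" for _ in range(20000)]
raw_v1 = ",".join(records).encode()

# Second version: exactly one digit is changed in place (same length).
edited = bytearray(raw_v1)
edited[400] = ord("0") if edited[400] != ord("0") else ord("1")
raw_v2 = bytes(edited)

# What would be committed if the files are stored compressed.
packed_v1 = zlib.compress(raw_v1)
packed_v2 = zlib.compress(raw_v2)

raw_frac = differing_fraction(raw_v1, raw_v2)
packed_frac = differing_fraction(packed_v1, packed_v2)

print(f"raw:        {len(raw_v1)} bytes, {raw_frac:.4%} of positions differ")
print(f"compressed: {len(packed_v1)} bytes, {packed_frac:.4%} of positions differ")
```

 The uncompressed versions differ in one byte, which a delta can encode very cheaply. The compressed versions share fewer bytes relative to their size, so there is less for a delta to reuse.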
 I am probably not the only one using multiple experimental data files of around 100 kB each in unit tests and vignettes. Including dozens of such files in a 5 MB package seems acceptable. I believe the same should hold for the pack file, because it is just a collection of previous versions of files that are each still smaller than 5 MB. I hope the policy can relax this file size limit to allow safer and more reproducible developer practices.

Sincerely,
Adam.




