[Bioc-devel] BFG repo cleaner did not perfectly work

Nathan Sheffield n@he|| @end|ng |rom d@t@b|o@org
Fri Jan 20 18:23:23 CET 2023


Hi Adam,

I think the recommended way to remove large, inadvertently committed files from a git repo is no longer BFG or `git filter-branch`, but a newer tool called `git filter-repo`. You might try it; you can read about it here: https://github.com/newren/git-filter-repo

I've found it easier to use, more effective, and faster than BFG or `git filter-branch`. For example, I have this in my notes:

First, use this script to list every blob in the history, sorted by size (largest last), so you can identify the large files:

```
git rev-list --objects --all \
| git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
| sed -n 's/^blob //p' \
| sort --numeric-sort --key=2 \
| cut -c 1-12,41- \
| $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
```
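If you only care about blobs above a size cutoff (say, something comparable to the 5M you used with bfg), the same `rev-list`/`cat-file` pipeline can be filtered with `awk`. This is a minimal sketch assuming a POSIX shell; `list_large_blobs` is a hypothetical helper name, not a git command, and it prints raw byte counts instead of the human-readable sizes above:

```shell
#!/bin/sh
# list_large_blobs (hypothetical helper, not part of git):
# print "<size-in-bytes> <path>" for every blob reachable from any ref
# in the current repository that is larger than $1 bytes, largest first.
list_large_blobs() {
  threshold=${1:-5242880}   # default: 5 MiB (5 * 1024 * 1024 bytes)
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
    | sed -n 's/^blob //p' \
    | awk -v t="$threshold" '$1 > t + 0' \
    | sort -rn
}
```

For example, `list_large_blobs 1000000` run inside the repository lists blobs over roughly 1 MB, including ones deleted from the current tree but still present in older commits. Because `--all` follows refs (branches, tags, `HEAD`) and not reflog entries, an empty result does not by itself prove the objects are gone from `.git/objects`.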

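Then I use this to remove the files from history. As of 2020, `git filter-repo` has replaced `git filter-branch` and `bfg` as the recommended way to rewrite history (the `git filter-branch` manual page itself now points users to it), but it's a separate tool that you'll have to install (*e.g.* with `pip3 install git-filter-repo`). This command removes every `*.RData` file from all commits: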

```
git filter-repo --path-glob '*.RData' --invert-paths
```
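One possible reason a pack file still shows up after bfg: the rewritten history no longer references the blob, but reflog entries can keep the old commits alive, and the object stays in `.git/objects` until it is expired and pruned. The BFG documentation suggests following a rewrite with `git reflog expire --expire=now --all && git gc --prune=now --aggressive`. Here is a minimal sketch of that cleanup step, wrapped in a hypothetical helper named `expire_and_prune`:

```shell
#!/bin/sh
# expire_and_prune (hypothetical helper, not part of git):
# after history has been rewritten (bfg, filter-branch, filter-repo, ...),
# drop every reflog entry and delete objects that are no longer reachable,
# so old blobs do not linger in loose objects or pack files.
expire_and_prune() {
  # Reflog entries can still point at pre-rewrite commits; expire them all.
  git reflog expire --expire=now --all
  # Repack what is still reachable and prune everything else right away.
  git gc --prune=now --aggressive --quiet
}
```

Run it in the clone where the pack file was reported. It can only delete objects that nothing references any more: if a remote-tracking branch, tag, or stash still points at the old history, the blob stays put.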

Hope that helps.
-Nathan

On Mon, Jan 16, 2023, at 11:48 AM, Park, Adam Keebum wrote:
> Dear community,
> 
> This is a condensed version of the issue I posted last week, asking for general advice.
> 
>   *   Running the recommended command below did not remove every large file.
> 
> bfg --strip-blobs-bigger-than 5M repo.git
> 
>   *   BiocCheck still picks up a pack file and emits a warning (.git/objects/pack-xxx..xxx.pack).
> 
>   *   However, tools like git filter-branch or bfg do not detect any reference to it.
> 
> I would appreciate any advice on how to dig into this problem.
> 
> Sincerely,
> Adam.
> 
> _______________________________________________
> Bioc-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
