Should we use Git LFS to manage data?

I was also looking at git-lfs recently, and the raw-file URL seems to have changed to:

https://github.com/SlicerDMRI/DMRITestData/blob/master/Tractography/fiber_ply_export_test.vtk?raw=true

from:

So, this would allow git-lfs to be used for data management, but not necessarily for runtime usage – clients would still download from a hash.
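As a rough illustration of the "download from a hash" model: the client fetches the file from whatever URL it is given and checks the bytes against a known checksum, so where the file is hosted matters much less. This is only a sketch. SHA-256 as the algorithm and the names `sha256_hex` and `fetch_verified` are my assumptions, not what our build actually does.

```python
import hashlib
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Download `url` and raise if the content does not match the expected hash.

    Hypothetical helper: the URL can point at GitHub, Girder, or any
    S3-style bucket, because only the checksum identifies the data.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch for {url}: got {actual}")
    return data
```

With this shape, moving the data to a different host only changes the URL passed in; the checksum stays the same.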

However, my opinion of git-lfs is fairly low right now because it broke all git pushes when I enabled it on a repo, due to this bug (can’t use GitHub tokens with the macOS credential helper). Uninstalling was easy and restored the ability to push, but figuring out that I needed to do so took some time. Credential helper support should be a relatively simple fix, so it is somewhat concerning that it has not been fixed promptly.

I like the idea of reducing dependence on bespoke projects, but Girder is only a soft dependency, given that (1) on the client side, we are just pulling from raw URLs no matter what and (2) Girder implements the S3 API (proprietary, but more-or-less a standard at this point). We could move the data to any storage that implements S3-style buckets.

(To that end, it would be good to eventually put the data URLs and hashes in separate files rather than inline in CMake files, but that is not a priority.)
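To make the separate-file idea concrete, here is one possible shape: a small JSON manifest that maps each data file to its URL and checksum, plus a loader that validates it. Everything here is hypothetical (the JSON layout, the `url` and `sha256` keys, the `example.org` URL, the placeholder checksum, and `load_manifest`); nothing like this exists in the repo today.

```python
import json

# Hypothetical manifest contents; the URL and checksum are placeholders.
MANIFEST_TEXT = """
{
  "fiber_ply_export_test.vtk": {
    "url": "https://example.org/data/fiber_ply_export_test.vtk",
    "sha256": "<expected checksum goes here>"
  }
}
"""


def load_manifest(text: str) -> dict:
    """Parse a manifest and check that every entry has a URL and a hash."""
    entries = json.loads(text)
    for name, entry in entries.items():
        missing = {"url", "sha256"} - set(entry)
        if missing:
            raise ValueError(f"{name}: missing fields {sorted(missing)}")
    return entries


manifest = load_manifest(MANIFEST_TEXT)
for name, entry in manifest.items():
    print(name, entry["url"])
```

Switching storage backends would then be an edit to this one file rather than to the build scripts.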
