Should we use Git LFS to manage data?

ihnorton · March 28, 2018, 2:07pm

I was also looking at git-lfs recently, and this seems to have changed:

https://github.com/SlicerDMRI/DMRITestData/blob/master/Tractography/fiber_ply_export_test.vtk?raw=true

from:

So, this would allow git-lfs to be used for data management, but not necessarily for runtime usage – clients would still download from a hash.
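As a rough sketch of what "download from a hash" could look like on the client side: fetch the raw URL, then check the bytes against a known digest before using them. SHA-256 is an assumption here (the post does not say which hash is used), and `sha256_hex` and `fetch_verified` are hypothetical names for illustration, not existing Slicer functions.

```python
import hashlib
import urllib.request


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(url: str, expected_sha256: str, opener=urllib.request.urlopen) -> bytes:
    """Download url and raise ValueError if the content hash does not match.

    Hypothetical helper: `opener` is injectable so the check can be
    exercised without network access.
    """
    with opener(url) as response:
        data = response.read()
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"hash mismatch for {url}: expected {expected_sha256}, got {actual}"
        )
    return data
```

With something like this, the client does not care whether the bytes came from a git-lfs backed GitHub raw URL or from any other server, as long as the digest matches.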

However, my opinion of git-lfs is fairly low right now: enabling it on a repo broke all git pushes, due to this bug (GitHub tokens can't be used with the macOS credential helper). Uninstalling was easy and restored the ability to push, but figuring out that I needed to do so took some time. Credential helper support should be a relatively simple fix, so it is somewhat concerning that it has not been fixed promptly.

I like the idea of reducing dependence on bespoke projects, but Girder is only a soft dependency, given that (1) on the client side we are just pulling from raw URLs no matter what, and (2) Girder implements the S3 API (proprietary, but more or less a standard at this point). We could move the data to any storage that implements S3-style buckets.

(To that end, it would be good to eventually put the data URLs and hashes in separate files rather than inline in CMake files, but that is not a priority.)
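One possible shape for such a separate file, purely as an illustration: a plain-text manifest with one `<sha256> <url>` pair per line, which any tool (CMake, Python, CI scripts) could read. The file format, the `DataEntry` type, and `parse_manifest` are all assumptions made up for this sketch; nothing like this exists in the repository as far as the post says.

```python
from typing import NamedTuple


class DataEntry(NamedTuple):
    """One data file: its expected digest and where to download it from."""
    sha256: str
    url: str


def parse_manifest(text: str) -> list[DataEntry]:
    """Parse a hypothetical manifest with '<sha256> <url>' on each line.

    Blank lines and lines starting with '#' are ignored.
    """
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            raise ValueError(f"line {line_number}: expected '<sha256> <url>'")
        digest, url = parts
        entries.append(DataEntry(digest.lower(), url.strip()))
    return entries
```

Keeping the list outside the build scripts would mean that moving the data to different storage only touches this one file.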
