Should we use Git LFS to manage data?

I’ve tested git-lfs and got mixed results.

Good:

  • easy to set up: just install git-lfs and specify which files should be stored using git-lfs (by folder and/or file extension)
  • works nicely and transparently when used on systems where git-lfs is installed
  • git-lfs files show up on GitHub’s web interface as regular files (e.g., you see the actual file content instead of pointer information)
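
To make the "specify what files" step concrete: `git lfs track <pattern>` records each pattern as a line in `.gitattributes`. Here is a minimal sketch of what those lines look like. The helper name `lfs_attribute_lines` and the example patterns are mine, only for illustration, and it assumes the patterns contain no spaces (those would need escaping).

```python
def lfs_attribute_lines(patterns):
    """Return .gitattributes lines that route matching paths through git-lfs."""
    # Attribute set that `git lfs track` writes for each tracked pattern.
    attrs = "filter=lfs diff=lfs merge=lfs -text"
    return [f"{pattern} {attrs}" for pattern in patterns]


if __name__ == "__main__":
    # Hypothetical patterns: one file extension and one folder.
    print("\n".join(lfs_attribute_lines(["*.nrrd", "data/**"])))
```

In practice you would just run `git lfs track` and commit the resulting `.gitattributes`; the point is only that the configuration is a plain text file that lives in the repository.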

Bad:

  • GitHub’s web interface always uploads files as regular files
  • If someone who has not installed git-lfs commits files, those files will be committed as regular files
  • Regular files can be converted to git-lfs files (there is an officially supported script for that), but it rewrites git history
  • Users reported various esoteric issues that were hard to understand and fix (see 1, 2, 3), even when they were careful and experienced. It is scary to think about how wrong things can go when we accept pull requests from a larger community.
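
One way to catch files that slipped in as regular files would be to check whether a committed blob is actually an LFS pointer. The sketch below is only an illustration under my own assumptions: the function name `is_lfs_pointer` is hypothetical, the check is deliberately loose, and the 1024-byte limit is what I recall from the pointer spec, so please verify it before relying on it.

```python
import re

# A pointer file starts with this version line, per the git-lfs pointer format.
POINTER_VERSION = b"version https://git-lfs.github.com/spec/v1\n"
OID_LINE = re.compile(rb"^oid sha256:[0-9a-f]{64}$", re.MULTILINE)
SIZE_LINE = re.compile(rb"^size \d+$", re.MULTILINE)


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if the blob content looks like a git-lfs pointer file."""
    # Assumption: pointer files are small text files (under 1024 bytes).
    if len(data) >= 1024:
        return False
    return (
        data.startswith(POINTER_VERSION)
        and OID_LINE.search(data) is not None
        and SIZE_LINE.search(data) is not None
    )


if __name__ == "__main__":
    pointer = POINTER_VERSION + b"oid sha256:" + b"a" * 64 + b"\nsize 12345\n"
    print(is_lfs_pointer(pointer))
    print(is_lfs_pointer(b"\x89PNG\r\n\x1a\n raw image bytes"))
```

The input has to be the committed blob (for example the output of `git cat-file -p HEAD:path/to/file`), not the file in the working tree, because on a machine with git-lfs installed the working tree contains the real content. A check like this could run on pull requests for paths that match the LFS patterns, but it only detects the problem; fixing it still means converting the file and possibly rewriting history.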