
GitFinder goes LFS

August 01, 2019

No matter how good a version control system git is, it has suffered from a small quirk ever since it was introduced. For what it’s worth, the quirk affects other version control systems as well, and it is related to diffs of binary files. Binary files are notorious for not producing small diffs, no matter how small the changes you make to them. Here is an example: suppose you have a large (100 MB or so) binary file in your project, let’s say a video that is part of an important presentation. It is completely valid to have such files in your repository and track their changes. However, whenever you make a change to the video file, no matter how insignificant, committing that modification will effectively put the complete large file in your repository, not just its diff. That’s the nature of binary files. After only a few changes to the file, your local repository will grow madly in size. Let’s not forget you can have more such files. And the most annoying thing is, you rarely need all those versions of huge files. They just eat disk space for no good reason. Furthermore, once you push them to a remote repository, anyone who clones it will get all those rarely used binary blobs wasting their disk space too.

To solve this problem, some smart folks came up with what they named Git LFS (Large File Storage). The basic idea is not to keep all versions of large binary files locally in the repository, but only those you really need (the ones currently checked out in the working tree). Everything else should be pushed to a remote LFS storage, provided by your git hosting server or elsewhere. When you change large binary files and commit the changes, only pointers to those files are kept in the repository. Since a single pointer is at most a few hundred bytes in size, you can imagine the immense savings in disk space for a repository managed this way. I will not spend many words on how Git LFS actually works. If you aren’t familiar with it and want to find out more, this quick intro video will get you started before moving to other sources available on the web. Instead, I will write in more detail about what it took to get support for it into GitFinder. I don’t use Git LFS much myself, but many users find it indispensable, so it had to make its way into GitFinder eventually. And since it was a rather significant undertaking, I thought it deserved a dedicated blog post.
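To give a feel for how small such a pointer is, here is a minimal Python sketch that builds and parses one. It is not GitFinder’s code (that is ObjC/C on top of libgit2); the function names make_pointer and parse_pointer are made up for illustration, and the sketch assumes the plain three-line layout of the published Git LFS v1 pointer format (version, oid, size), ignoring any optional extension lines.

```python
import hashlib

# Spec URL that appears on the first line of every v1 pointer file.
LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Return the pointer text that would stand in for `content` (hypothetical helper)."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        f"version {LFS_SPEC_V1}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


def parse_pointer(text: str) -> dict:
    """Parse pointer text into a dict of its key/value lines (hypothetical helper).

    Raises ValueError if the text does not look like a v1 pointer.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            raise ValueError(f"malformed pointer line: {line!r}")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1 or "oid" not in fields or "size" not in fields:
        raise ValueError("not a Git LFS v1 pointer")
    fields["size"] = int(fields["size"])
    return fields


if __name__ == "__main__":
    video = b"\x00" * (10 * 1024 * 1024)  # stand-in for a large binary file
    pointer = make_pointer(video)
    print(pointer, end="")
    print(f"{len(video)} bytes of content -> {len(pointer.encode())} bytes of pointer")
```

Whatever the size of the real file, the text committed to the repository stays the same handful of lines; only the hash and the size value change.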

The official Git LFS implementation consists of the git-lfs command line tool and some accompanying files, much like core-git does. Since it is meant to be widely used not only by end users, but also by other git clients, it would be ideal if it came accompanied by some kind of library and API, so other developers could simply link with it. Unfortunately, as usually goes these days, the developers of Git LFS didn’t think in that direction. They provide just the tool, again very much like core-git does. So, what were the options to include Git LFS support in GitFinder? The first obvious one was to just use the git-lfs tool and let it do all the necessary work. Even though GitFinder is a sandboxed application, going that way would not technically be a problem. Just pack the whole Git LFS suite inside the application bundle and NSTask it for all Git LFS operations. If you read this blog post, you could find out I’m a huge opponent of this approach, and there you can also read why. After all, I didn’t go that route with the core-git tool (opting for the libgit2 library instead), so why would I ruin it now? Luckily, Git LFS is completely open source, so I could easily take its code, modify it where necessary to make a library out of it and then link with it, right? Well, not really. As it turned out, the git-lfs tool is written in Go. And as nice as that language is, I don’t really “speak” it fluently. Furthermore, making ObjC/C interoperate with Go is really a nightmare (at least for now), and one needs to provide a lot of hooks and seemingly dirty tricks to make them work together. You can read more about it in this nice article, and the things explained there are just the tip of the iceberg.

With both obvious choices gone, the only remaining option was to reimplement Git LFS functionality from scratch. I started off by using the official git-lfs tool on a bunch of test repositories, trying to figure out how it operates and what changes it makes to repositories. It took trying out a lot of different use-case scenarios, as well as a huge trial and error effort. That alone was not enough to understand all the things I needed to. Hence, I also had some extensive Q&A correspondence with the Git LFS support people. They were very friendly and supportive in responding to my questions and I am really THANKFUL for that. Without the insights they gave me, I would not have been able to figure it all out by myself. Once I understood how everything worked and behaved, I had to find a way to implement it all so that it fits nicely into GitFinder’s architecture and design. That is because the git-lfs tool is not just an independent tool executing some dedicated commands (although it does that too). It can be thought of as a git extension, and some git commands need to “expand” their reach into the Git LFS “domain” in order for the whole system to work as intended. That integration between the two tools is made through git hooks (together with git’s clean and smudge filters). In short, a certain git operation can run a certain executable script/tool before or after the operation. And for the core-git + git-lfs combo to work properly, some hooks executed by the core-git tool have to call out to the git-lfs tool. Needless to say, I did not want any of that indirection. I wanted a nice, tight and direct integration of the two domains, with no hooks or other glue in between.
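To make the hook mechanism described above a bit more concrete, here is a rough Python sketch of how a git client could run hooks around a commit. It is only an illustration under assumptions: run_hook and commit_with_hooks are hypothetical names, do_commit stands in for whatever actually creates the commit, and details such as the environment variables git sets for hooks are left out. This is the general hook convention, not how GitFinder or git-lfs implement it.

```python
import os
import subprocess


def run_hook(git_dir, name, args=(), stdin_data=b"", cwd=None):
    """Run the hook `name` from `<git_dir>/hooks`, if present and executable.

    Returns the hook's exit status, or 0 when there is no hook to run.
    (Hypothetical helper; `cwd` would normally be the working tree root.)
    """
    hook = os.path.join(git_dir, "hooks", name)
    if not (os.path.isfile(hook) and os.access(hook, os.X_OK)):
        return 0  # a missing or non-executable hook is silently skipped
    result = subprocess.run([hook, *args], input=stdin_data, cwd=cwd)
    return result.returncode


def commit_with_hooks(git_dir, do_commit):
    """Wrap a commit callback with pre-commit and post-commit hooks.

    `do_commit` is a placeholder for the code that really creates the commit.
    """
    if run_hook(git_dir, "pre-commit") != 0:
        return False  # a failing pre-commit hook aborts the commit
    do_commit()
    run_hook(git_dir, "post-commit")  # exit status does not affect the commit
    return True
```

With the official tools, the script sitting in a hook such as pre-push is what hands control over to git-lfs; a client that implements LFS itself can do that work in-process at the same point instead.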

The task turned out not to be very complicated. The libgit2 library is well designed, and making it work with my own implementation of Git LFS was smooth. However, it also wasn’t trivial. Git LFS does a lot of things, and that shows in the number of lines of code I had to write. That is why so much time passed between GitFinder releases 1.2 and 1.3. But it was worth it. The result is, as far as I know, the only third-party implementation of Git LFS functionality. It is also fast and pretty robust. Of course, some minor issues will probably appear here and there eventually, but they should be pretty easy to fix. I should also say that not all Git LFS commands are implemented. Some of them make no sense in the context of GitFinder functionality, like update, or the pointer- and filter-related commands. The Git LFS file locking feature, available as of Git LFS version 2.0, is also not implemented, but it will find its way into GitFinder very soon. The rest of the core functionality is there and it should be sufficient for most users.

As a side note, I should also mention that version 1.3, which brings Git LFS support, also brings support for git hooks, a feature already mentioned above. In this first instance, the supported hooks are: pre-commit, post-commit, commit-msg, post-checkout, post-merge, and pre-push. Making others work with the libgit2 library will be a bit challenging, especially for the update/receive related hooks, but that will eventually happen as well. And this all rounds up a new version of GitFinder. The development is a bit slow at times, but the initial aim remains the same: a nice and full-featured, yet fast and lightweight git client, easily used directly from Finder and, on top of it all, well-factored, sandboxed and secure. The journey continues…
