Some time back, my colleague Josh Johanning wrote a post about Migrating Git Repos with LFS Artifacts. He describes the process of performing a migration when Git Large File Storage (LFS) artifacts are involved. The basic process involves pulling the objects to a machine, then pushing them to a new remote. This is an excellent process that covers the majority of situations, but there is one case where more effort is required: when an .lfsconfig file is involved, some additional steps can be required.
LFS and the endpoint
Git LFS is designed to make it easier to work with large files and binary content that otherwise might not be a good fit for Git. It does this by pushing the files to a separate endpoint, and then storing a pointer to that file in the repository. This allows the repository to remain small and fast, while still allowing the large files to be stored and retrieved as needed.
The way it is designed, LFS is very configurable. One part of that configurability is tied to how it determines the endpoint to use. By default, the HTTPS Git endpoint is used as the base URL for the LFS functionality. If the Git endpoint does not end with .git, the URL is modified to append .git. For example, if the URL is https://github.com/someorg/somerepo, the base endpoint is corrected to https://github.com/someorg/somerepo.git. After that, the path /info/lfs is added to the URL (for example, https://github.com/someorg/somerepo.git/info/lfs).
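The default derivation described above can be sketched as a small shell function. This is only an illustration of the rule for HTTPS remotes, not how Git LFS itself is implemented; the function name default_lfs_endpoint is made up for this example.

```shell
# Sketch: derive the default LFS endpoint from an HTTPS Git URL.
# default_lfs_endpoint is a hypothetical helper, not part of Git or Git LFS.
default_lfs_endpoint() {
  url="$1"
  case "$url" in
    *.git) ;;                  # already ends with .git, leave it alone
    *) url="${url}.git" ;;     # otherwise append .git
  esac
  # The LFS API lives under /info/lfs beneath the Git endpoint.
  printf '%s/info/lfs\n' "$url"
}

default_lfs_endpoint "https://github.com/someorg/somerepo"
```

Passing a URL that already ends with .git gives the same result, since the suffix is only added when it is missing.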
Changing the endpoint
Now that you understand the default behavior, it’s time to understand how it can be configured. Git LFS looks for configuration settings in a file called .lfsconfig in the root of the repository. In addition, those settings can be overridden by any configuration in the Git configuration files. Settings in these files can change how the endpoint is resolved.
The first configuration that is examined is lfs.url. When this is present, it provides the base URL that should be used for LFS. Alternatively, remote.<remote>.lfsurl can be used to specify the LFS URL for a specific remote. If the operation is an upload, then lfs.pushurl and remote.<remote>.lfspushurl are also used to determine if a different endpoint should be used for uploading the LFS objects. When there is a conflict, entries in the Git configuration always take precedence over those in .lfsconfig.
As a result, configuring the endpoint can change where LFS data is stored and where it is retrieved.
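To make the layering concrete, here is a simplified sketch that looks up lfs.url the way the text describes: the Git configuration first, then .lfsconfig. It deliberately ignores the remote-specific and push-specific keys, so it approximates the idea and does not reproduce the full Git LFS resolution logic. The function name effective_lfs_url is hypothetical.

```shell
# Sketch: report the lfs.url override that would win, if any.
# effective_lfs_url is a hypothetical helper; it only considers lfs.url.
effective_lfs_url() {
  # 1. The Git configuration (local, global, system) takes precedence.
  git config --get lfs.url 2>/dev/null && return 0
  # 2. Otherwise fall back to the .lfsconfig file in the current directory.
  git config -f .lfsconfig --get lfs.url 2>/dev/null && return 0
  # 3. No override found: the default endpoint would be used.
  return 1
}
```

Run from the root of a working tree, it prints the override and returns success, or prints nothing and returns a non-zero status when no override is set.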
Migrating with LFS Configuration
When migrating a repository that uses LFS, the configuration can change the endpoint that is used. That can cause a challenge with the traditional migration approach. With GitHub, the steps are essentially:
git clone --bare https://github.com/${sourceOwner}/${repo}
cd ${repo}.git
git lfs fetch --all
git push --mirror https://github.com/${destinationOwner}/${repo}
git lfs push --all https://github.com/${destinationOwner}/${repo}
This approach pushes the repository and its LFS data to the destination organization. If the repository contains an .lfsconfig file, interactions with Git LFS could remain pointed to the wrong endpoint. Rather than using the default endpoint (the new post-migration repository location), Git could interact with the location specified in the configuration file. Despite the LFS data having been migrated to the new location, the configuration could continue to use the original repository storage.
[lfs]
    url = http://lfs-server.com
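Before migrating, it can help to check whether the source repository commits an override like this. A bare clone has no working tree, so the sketch below reads .lfsconfig straight from a commit using git config --blob. The function name lfsconfig_overrides is hypothetical, and it only checks the two keys discussed in this post.

```shell
# Sketch: list LFS endpoint overrides committed in .lfsconfig at a given ref.
# lfsconfig_overrides is a hypothetical helper; it works in bare clones
# because it reads the blob from the object database, not a checked-out file.
lfsconfig_overrides() {
  ref="${1:-HEAD}"
  for key in lfs.url lfs.pushurl; do
    # --blob treats the named blob as a config file; a missing file or key
    # makes git config exit non-zero, in which case we skip the key.
    value=$(git config --blob "${ref}:.lfsconfig" --get "$key" 2>/dev/null) || continue
    printf '%s=%s\n' "$key" "$value"
  done
}
```

If the function prints nothing, the repository relies on the default endpoint and the traditional migration steps should be sufficient.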
To resolve this, the repository needs to be updated to use the correct endpoint. If the goal is to make the new LFS endpoint use the Git repository’s endpoint, the .lfsconfig file can be updated to remove the lfs.url and lfs.pushurl entries. The LFS interactions will then fall back to the default endpoint. This can be handled programmatically with the following command (the unset subcommand requires Git 2.46 or later; on older versions, use git config -f .lfsconfig --unset lfs.url instead):
git config unset -f .lfsconfig lfs.url
Of course, this can also be done manually by editing the .lfsconfig file. A modified version of the steps from above might look like this:
git clone https://github.com/${sourceOwner}/${repo}
cd ${repo}
git lfs fetch --all
git config unset -f .lfsconfig lfs.url
git config unset -f .lfsconfig lfs.pushurl
git add .lfsconfig
git commit -m "Remove LFS configuration"
git push --mirror https://github.com/${destinationOwner}/${repo}
git lfs push --all https://github.com/${destinationOwner}/${repo}
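One wrinkle with the steps above: unsetting a key that is not present exits with a non-zero status, which matters if the commands run in a script that stops on errors. A more defensive variant might look like the sketch below. The function name remove_lfs_overrides is hypothetical, and it uses the older --unset-all option so that it also works on Git versions that predate the unset subcommand.

```shell
# Sketch: remove lfs.url and lfs.pushurl from an .lfsconfig file, but only
# when they are actually set. remove_lfs_overrides is a hypothetical helper.
remove_lfs_overrides() {
  file="${1:-.lfsconfig}"
  [ -f "$file" ] || return 0          # nothing to do without a config file
  for key in lfs.url lfs.pushurl; do
    if git config -f "$file" --get-all "$key" >/dev/null 2>&1; then
      # --unset-all also handles a key that was defined more than once.
      git config -f "$file" --unset-all "$key" || return 1
    fi
  done
}
```

Other settings in the file are left untouched, so the file should still be reviewed and committed afterwards as in the steps above.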
Moving the remote source
In some cases, the goal is to move the LFS data to a new location or LFS server. This can also be done by updating the .lfsconfig file. You can use the following command to update the .lfsconfig file to point to the new endpoint:
git config -f .lfsconfig lfs.url "http://my-server.com"
The result of this will be the updated .lfsconfig file:
[lfs]
    url = http://my-server.com
As a practical example, you could configure the repository to use Artifactory for LFS by using an appropriately configured URL:
git config lfs.url "https://${user}:${password}@${domain}.jfrog.io/artifactory/api/lfs/${repoKey}"
There’s one more thing to know. If the goal is just to update the endpoint used for pushing the LFS content, you can instead configure lfs.pushurl. This is helpful if you want to use a cache server (or a special read endpoint) for reading the LFS data while using a different server for receiving pushed LFS data.
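As a rough illustration of that split, the following sketch writes separate read and write endpoints into .lfsconfig. Both host names are placeholders invented for this example; substitute the servers you actually use.

```shell
# Hypothetical hosts: replace with your own read and write endpoints.
read_url="https://lfs-cache.example.com/lfs"
write_url="https://lfs-upload.example.com/lfs"

# Downloads use lfs.url; uploads use lfs.pushurl when it is set.
git config -f .lfsconfig lfs.url "$read_url"
git config -f .lfsconfig lfs.pushurl "$write_url"

# Show the resulting entries for review before committing the file.
git config -f .lfsconfig --list
```

As with the earlier examples, the change only reaches other users once the updated .lfsconfig file is committed and pushed.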
Completing the migration
As with any migration, it’s important to test and validate your changes. Take the time to test the migration and its configuration. Ensure that the LFS data is being stored and retrieved as expected. Adequate planning and testing are the key to a successful migration. They are also the surest way to avoid surprises (such as content ending up in the wrong LFS storage endpoint).
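One way to check the endpoint after a migration is to inspect the output of git lfs env, which reports the endpoint Git LFS has resolved. The sketch below assumes that output contains lines beginning with Endpoint followed by = and the URL; that format is an assumption here and may differ between Git LFS versions. The function name endpoints_match is hypothetical.

```shell
# Sketch: read `git lfs env` output on stdin and succeed only if every
# Endpoint line starts with the expected base URL.
# endpoints_match is a hypothetical helper; the Endpoint=... line format
# is assumed, so adjust the pattern if your version prints something else.
endpoints_match() {
  expected="$1"
  awk -v want="$expected" '
    /^Endpoint/ {
      seen = 1
      value = substr($0, index($0, "=") + 1)   # text after the first "="
      if (index(value, want) != 1) {
        bad = 1
        print "unexpected endpoint: " $0
      }
    }
    # Fail if any endpoint was wrong or no Endpoint line was found at all.
    END { exit (bad || !seen) ? 1 : 0 }
  '
}
```

In a fresh clone of the migrated repository, it could be used as git lfs env | endpoints_match "https://github.com/${destinationOwner}/${repo}", with a non-zero exit status indicating that something still points elsewhere.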