Branches and Tags Mapping
- 1. Overview
- 2. Branches and Tags in Subversion
- 3. Branches and Tags in Git
- 4. Configuration Options
- 4.1 Configuration Options: svn.url
- 4.2 Configuration Options: svn.trunk
- 4.3 Configuration Options: svn.branches
- 4.4 Configuration Options: svn.tags
- 4.5 Configuration Options: svn.shelves
- 4.6 Configuration Options: svn.excludeBranches
- 4.7 Configuration Options: svn.excludePath
- 5. SVN Layout Examples
- 5.1 Single Directory
- 5.2 Standard Layout
- 5.3 Multiple Projects per Branch
- 5.4 All Branches in a Single Directory
- 5.5 Layout Changed over History
- 6. Excluding Branches and Tags
- 7. Excluding paths
- 7.1 Excluding Particular Files or File Types
- 7.2 Importing Particular Files or Directories Only
Chapter 1. Configuration Overview
SubGit is a tool for smooth SVN-to-Git migration: it can either import an SVN project into a Git repository with the subgit import command, or establish a continuously synchronized mirror with the subgit install command. Both processes need many details to run, such as the SVN project location, authors mapping, daemon settings and authentication, and, probably most crucial of all, the mapping settings that establish how SVN entities correspond to those on the Git side. These mapping configurations are where all the SVN-to-Git magic starts. They are fairly simple, although they may look knotty at times, and this article is intended to explain all the mapping settings and shed light on how they work.
Both import and mirror operations rely on the SubGit configuration file, where all the settings reside. This configuration file is created by the subgit configure command and resides in the subgit directory inside the newly created Git repository. There are a number of settings, but this article describes the mapping settings in the [svn] section of the configuration file, namely:
- svn.url
- svn.trunk
- svn.branches
- svn.tags
- svn.shelves
- svn.excludeBranches and svn.excludeTags
- svn.excludePath and svn.includePath
i.e., the very settings that define how SVN entities match Git entities.
Now let's discuss SVN and Git objects to sort out how they map to each other.
Chapter 2. Branches and Tags in Subversion
Subversion is a mature and robust version control system with many features, such as branching and merging, tags, properties, etc. At the core of Subversion is a repository, which is the central store of the system's data. The repository stores information in the form of a filesystem tree: a hierarchy of files and directories.
Commonly, a Subversion repository holds the files for several projects: for example, software products such as 'Calc', 'Paint', 'Notepad' and so on. Every project in a repository is represented by its own filesystem subtree; that is, each project has a subdirectory (or subdirectories) inside the repository where the project's files are stored.
Subversion supports branching, a feature that allows you to create additional lines of development that exist independently of other lines, including the main one, yet still share a common history. A branch always begins life as a copy of something and moves on from there, generating its own history; Subversion creates branches by copying your project directory tree to another location and remembering that the copies are related to one another. Thus a branch is basically just another subdirectory in a repository, usually within a project directory.
Besides that, Subversion supports tags: these are just 'snapshots' of a project in time. Tags can be simple or complex, but inherently they are also just snapshots of the project or part of it; that is, they are also just subdirectories in the repository, as branches are.
So, a common SVN repository contains several projects, and each project has its own main development line directory plus branches and tags directories, which together form the repository filesystem tree. This tree may be organized in different ways: each project may have its own directory, or all of them may be stored directly within the repository root directory. Subversion provides ultimate flexibility in how to arrange data, so you may arrange your repository in any way you choose. It is recommended, though, to follow a repository layout convention in which a handful of strategically named Subversion repository directories convey valuable meaning about the data they hold: the "main line", or trunk, of development; some branches, which are divergent copies of development lines; and some tags, which are named, stable snapshots of a particular line of development. So the first recommendation is to have a recognizable project root in the repository for each project, and the second is that each project root contains a trunk subdirectory for the main development line, a branches subdirectory in which specific branches (or collections of branches) will be created, and a tags subdirectory in which specific tags (or collections of tags) will be created. Thus the recommended layout for a Subversion repository may look like this:
/repository
    /project_1
        /trunk
        /branches
        /tags
    /project_2
        /trunk
        /branches
        /tags
    …
    …
    /project_N
        /trunk
        /branches
        /tags
This is just a recommendation, though, and following this layout is not mandatory; but it is crucial to distinguish these entities in your project, the "main line" (trunk), divergent lines (branches) and snapshots (tags), since they are extremely important for import or mirror.
Chapter 3. Branches and Tags in Git
Git is also a mature and robust version control system. It has many features and, of course, supports branches and tags, as SVN does. Git uses a different approach to storing data, though, so branches and tags in Git are not directories in a filesystem structure as they are in SVN; instead, they are pointers to specific commits.
Git thinks of its data like a set of snapshots of a miniature filesystem. Every time you commit or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author’s name and email, the message that you typed, and pointers to the commit or commits that directly came before this commit.
A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master: this is the default branch created along with a Git repository. In fact, the master branch is just like any other branch; it does not even have to be named master, as that is just a default name you can change if you want. However, since this branch is created with the repository and thus exists from the very beginning, many people use it as the "main line" of development (which corresponds to the SVN trunk entity). This is not mandatory, though: again, the master branch is just another branch and you can use it in any way you want.
Git also supports tags, which can be either lightweight or annotated. A lightweight tag is very much like a branch that doesn't change: it's just a pointer to a specific commit. Annotated tags, however, are stored as full objects in the Git database: they're checksummed; contain the tagger name, email, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).
There's obviously no such thing as a "recommended layout" for a Git repository; for a successful SVN-to-Git mirror, though, it is crucial to understand which branch is the "main line" of development and which branches are divergent lines of development, since SubGit needs to know how those entities match each other.
Chapter 4. Configuration Options
As mentioned above, this article describes the mapping settings in the [svn] section of the configuration file. These settings describe how SVN and Git entities match each other; in other words, they tell SubGit how to establish a correspondence between the SVN trunk, branches and tags and those in the Git repository. Now let's look at these settings more closely:
svn.url
This option sets the URL of the SVN project that will be imported into the Git repository. Note that SubGit translates an SVN project into a single Git repository: i.e., an SVN project corresponds to a Git repository, not an SVN repository to a Git repository. To illustrate, assume you have an SVN repository with several projects in it:
/repository
    /Calc
    /Paint
    /Notepad
    /Wordpad
    /Explorer
each project (Calc, Paint, etc.) is supposed to be mirrored (or imported) into its own Git repository:
/repository
    /Calc      ---->  Calc.git
    /Paint     ---->  Paint.git
    /Notepad   ---->  Notepad.git
    /Wordpad   ---->  Wordpad.git
    /Explorer  ---->  Explorer.git
Thus, the svn.url setting is supposed to point to your SVN project root, not to the SVN repository root; i.e., if you want to mirror your 'Calc' project to Git, then svn.url must be set as follows:
http://example.com/svn/repository/Calc
but not:
http://example.com/svn/repository
Note that most of the other mapping settings rely on this path: their values are set relative to this project root, so it is very important to set this URL correctly.
The next four options, trunk, branches, tags and shelves, tell SubGit how SVN entities match Git entities. The generic mapping syntax is:
<Subversion-Path-Pattern>:<Git-Reference-Pattern>
Subversion-Path-Pattern is an SVN directory path relative to svn.url; Git-Reference-Pattern is a path inside the Git repository where references will be stored. The mapping settings are rather flexible, but there are a few rules you must follow:
- trunk must be mapped; the only exception is a configuration with no mapping defined at all (no trunk, branches, tags or shelves options), which may be used to import SVN projects with no branches.
- there must be only one trunk mapping.
- there may be at most one shelves mapping.
- branches, tags and shelves are optional and may be omitted, individually or all together.
- you can set as many branches and tags mappings as you need.
- each mapping must be unique; that is, you can't map different SVN branch paths to the same Git reference.
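The rules above can be checked before running subgit import or subgit install. The following Python sketch is not part of SubGit: the function validate_mappings is hypothetical, it assumes the [svn] section has already been read into (key, value) pairs, and it reports only identical Git reference patterns as clashes, not patterns that merely overlap.

```python
from typing import Dict, List, Tuple

# The four mapping options discussed in this chapter.
MAPPING_KEYS = ("trunk", "branches", "tags", "shelves")


def validate_mappings(entries: List[Tuple[str, str]]) -> List[str]:
    """Return rule violations for (key, value) pairs read from the [svn] section.

    Each mapping value has the form <Subversion-Path-Pattern>:<Git-Reference-Pattern>.
    Keys other than the four mapping options (e.g. url) are ignored.
    """
    mappings = [(key, value) for key, value in entries if key in MAPPING_KEYS]
    if not mappings:
        # No mapping at all is allowed: svn.url is imported as one line of development.
        return []

    errors: List[str] = []
    counts = {key: sum(1 for k, _ in mappings if k == key) for key in MAPPING_KEYS}
    if counts["trunk"] != 1:
        errors.append(f"exactly one trunk mapping is required, found {counts['trunk']}")
    if counts["shelves"] > 1:
        errors.append(f"at most one shelves mapping is allowed, found {counts['shelves']}")

    # Simplification: only identical Git reference patterns are reported as clashes.
    seen: Dict[str, str] = {}
    for key, value in mappings:
        svn_path, sep, git_ref = value.partition(":")
        if not (sep and svn_path and git_ref):
            errors.append(f"malformed {key} mapping: {value!r}")
        elif git_ref in seen:
            errors.append(f"{git_ref} is mapped from both {seen[git_ref]} and {svn_path}")
        else:
            seen[git_ref] = svn_path
    return errors


defaults = [
    ("trunk", "trunk:refs/heads/master"),
    ("branches", "branches/*:refs/heads/*"),
    ("tags", "tags/*:refs/tags/*"),
    ("shelves", "shelves/*:refs/shelves/*"),
]
print(validate_mappings(defaults))  # []
```

Adding a second line such as branches = other/*:refs/heads/* to the defaults would be reported, because two SVN paths would then feed the same Git reference pattern.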
svn.trunk
This setting establishes a correspondence between the SVN "main line" of development and the Git branch that is supposed to be the "main line". The recommended SVN layout places the "main line" in a trunk directory within the SVN project directory and, on the other hand, people often use the master Git branch as the "main line"; taking these considerations into account, the default trunk mapping is set as:
trunk = trunk:refs/heads/master
This setting maps the SVN trunk directory to the Git master branch. Note that the trunk directory is relative to svn.url: if svn.trunk maps the trunk directory and svn.url is set to http://example.com/svn/repository/Calc, then SubGit will look for your trunk at http://example.com/svn/repository/Calc/trunk.
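As a sketch of the relative-path rule above, the hypothetical helper resolve_mapping below splits a mapping value at its first ':' and appends the SVN side to svn.url. It is an illustration only, not SubGit code, and it assumes the Subversion path pattern itself contains no ':'.

```python
from typing import Tuple


def resolve_mapping(svn_url: str, mapping: str) -> Tuple[str, str]:
    """Split '<svn-path>:<git-ref>' and resolve the SVN path against svn.url."""
    svn_path, sep, git_ref = mapping.partition(":")
    if not sep:
        raise ValueError(f"mapping has no ':' separator: {mapping!r}")
    # The SVN path is relative to svn.url, so join the two with exactly one '/'.
    full_url = svn_url.rstrip("/") + "/" + svn_path.strip("/")
    return full_url, git_ref


url, ref = resolve_mapping("http://example.com/svn/repository/Calc", "trunk:refs/heads/master")
print(url)  # http://example.com/svn/repository/Calc/trunk
print(ref)  # refs/heads/master
```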
svn.branches
As its name implies, this setting establishes a correspondence between SVN and Git branches. Its default value assumes the recommended SVN layout, that is, SVN project branches reside in the branches directory:
branches = branches/*:refs/heads/*
This setting supports wildcards because there may be a number of branches; on the other hand, you can set a direct match with no wildcard if, for example, you need to mirror or import only one branch, or if branches reside in different SVN directories. The svn.branches setting is relative to svn.url in the same way as svn.trunk.
There may be more than one branches setting in the configuration, so you can create as many branches mappings as you need. So if you have many SVN directories that store branches, you can tell SubGit to translate them all:
branches = branches/*:refs/heads/*
branches = feature_*:refs/heads/features/*
branches = hotfix/*_*:refs/heads/hotfix/*/*
branches = special_branch:refs/heads/special_branch
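To illustrate how a wildcard mapping could turn an SVN path into a Git reference, here is a Python sketch with a hypothetical translate_path function. It is not SubGit's algorithm: it assumes every '*' matches one or more characters within a single path segment, that captured values are substituted into the Git side left to right, and that both sides contain the same number of wildcards.

```python
import re
from typing import Optional


def translate_path(mapping: str, svn_path: str) -> Optional[str]:
    """Translate an SVN path (relative to svn.url) into a Git ref using one mapping value.

    Returns None if the path does not match the Subversion side of the mapping.
    """
    svn_pattern, _, git_pattern = mapping.partition(":")
    if svn_pattern.count("*") != git_pattern.count("*"):
        raise ValueError(f"wildcard counts differ in {mapping!r}")
    # Assumption: each '*' matches one or more characters within a single path segment.
    regex = "".join("([^/]+)" if part == "*" else re.escape(part)
                    for part in re.split(r"(\*)", svn_pattern))
    match = re.fullmatch(regex, svn_path)
    if match is None:
        return None
    # Put the captured values into the Git pattern, left to right.
    values = iter(match.groups())
    return re.sub(r"\*", lambda _: next(values), git_pattern)


print(translate_path("hotfix/*_*:refs/heads/hotfix/*/*", "hotfix/1.2_urgent"))
# refs/heads/hotfix/1.2/urgent
```

With this reading, feature_login maps to refs/heads/features/login, and special_branch, which has no wildcard, maps only to itself.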
svn.tags
This setting establishes a correspondence between SVN and Git tags. The default value assumes the recommended SVN layout:
tags = tags/*:refs/tags/*
and it tells SubGit to mirror or import SVN tags to Git lightweight tags and, vice versa, to translate any kind of tags from Git to SVN. Like branches, this setting supports wildcards or direct matches, and the tags path here is also relative to svn.url.
Similarly to branches, there may be more than one tags setting:
tags = 1.0/*:refs/tags/1.0/*
tags = 1.5/*:refs/tags/1.5/*
tags = 2.0/*:refs/tags/2.0/*
so you can translate all the tags from SVN even if they reside in different SVN directories.
svn.shelves
Shelves are special entities introduced by SubGit. SubGit creates shelves during Git-to-SVN commit translation when it cannot determine the branch a particular commit belongs to.
Such a situation may arise when some commits were made to a Git branch (say, branch), then this branch was merged into another branch (e.g. master), but only the second branch (master) was pushed to the repository. SubGit will see all the commits, including those made in branch, but it won't be able to recognize the branch name and therefore won't be able to create commits in a suitable SVN branch.
In that case, SubGit will create a directory in the SVN directory set as shelves and will place all those commits there, so that they are shown in the SVN log. If svn.shelves is not set (i.e. left commented out in the configuration file), SubGit won't create the shelf; in that case the commits that were initially made in the Git branch won't be visible in the SVN log. See our blog post for more details.
The setting is similar to branches and tags; it sets the SVN directory where shelves will be stored:
shelves = shelves/*:refs/shelves/*
and the path to this directory is relative to svn.url too.
That's how it works and what shelves are intended for; this setting is not mandatory and can be omitted, but keep in mind that some of the Git commit history may then not be mirrored to SVN.
svn.excludeBranches
svn.excludeTags
These two settings do the reverse job: while all the previous settings tell SubGit what to include in the mirror or import, these tell SubGit which branches or tags to leave untouched. They are fully interchangeable: either of the two can be used to exclude both branches and tags.
The syntax is as simple as:
excludeBranches = SIMPLE_PATTERN
excludeTags = SIMPLE_PATTERN
SIMPLE_PATTERN is either a full branch or tag name or a name pattern with one wildcard, i.e. SIMPLE_PATTERN may contain zero or one '*' but no more. During the import, SubGit evaluates every branch or tag within the SVN branches and tags directories (that is, the directories set by the branches and tags options) against this pattern and excludes any branches or tags that match the given SIMPLE_PATTERN; in other words, SubGit appends SIMPLE_PATTERN to the given svn.url and excludes any paths that match the resulting pattern. For example, say you have set branches like this:
branches = branches/*:refs/heads/*
and you want to import all the branches except those related to some feature, which share a certain name pattern. In such a case you can exclude them from the import or mirror with the following setting:
excludeBranches = branches/feature_*
assuming that the branches you don't want to import follow the feature_* name pattern.
The same applies to tags; assume that the tags mapping is set as follows:
tags = tags/*:refs/tags/*
so, if you want to exclude some feature_* tags, you can do it with the following:
excludeTags = tags/feature_*
or
excludeBranches = tags/feature_*
Both excludeBranches and excludeTags do the same work, so it doesn't really matter which one is used; what really matters is the setting's value.
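The SIMPLE_PATTERN rule can be sketched as follows. The Python function matches_simple_pattern is hypothetical and works on paths relative to svn.url, which is equivalent to appending the pattern to svn.url. Whether SubGit lets '*' match across a '/' is not stated here, so the sketch assumes it does not.

```python
def matches_simple_pattern(pattern: str, svn_path: str) -> bool:
    """Check an SVN path, relative to svn.url, against an excludeBranches/excludeTags value."""
    if pattern.count("*") > 1:
        raise ValueError("SIMPLE_PATTERN may contain at most one '*'")
    if "*" not in pattern:
        return svn_path == pattern  # full branch or tag name
    prefix, suffix = pattern.split("*")
    if len(svn_path) < len(prefix) + len(suffix):
        return False
    if not (svn_path.startswith(prefix) and svn_path.endswith(suffix)):
        return False
    middle = svn_path[len(prefix):len(svn_path) - len(suffix)]
    # Assumption: the wildcard does not cross a directory boundary.
    return "/" not in middle


branches = ["branches/feature_login", "branches/feature_search", "branches/bugfix_12"]
kept = [b for b in branches if not matches_simple_pattern("branches/feature_*", b)]
print(kept)  # ['branches/bugfix_12']
```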
svn.excludePath
svn.includePath
These settings make it possible to exclude (or, vice versa, include) particular files from being mirrored or imported. This may be useful if you have some big files in SVN that you don't want to be present in the Git repository. The syntax is simple:
excludePath = PATTERN
includePath = PATTERN
excludePath has priority over includePath: if a particular file or path matches both patterns, it will be excluded.
PATTERN is an expression that represents the files to be excluded or included; the PATTERN format is described below:
- A PATTERN can be recursive or non-recursive. If the PATTERN contains no slashes ("/"), it is recursive: SubGit will search all the directories of the SVN project recursively for files matching it. If the PATTERN contains one or more slashes, it is non-recursive: SubGit treats it as a path pattern rather than a file name, searches for paths that match it, and excludes all the files and directories whose paths comply with it. Therefore, if you set excludePath to:
excludePath = *.html
SubGit will treat it as a recursive pattern and will exclude all the *.html files in all directories. But if you set it like this:
excludePath = */*.html
SubGit will treat it as non-recursive and will only look for *.html files one directory level below the root of trunk and of each branch and tag. Note that this path pattern is relative to the trunk, branches and tags paths, i.e. SubGit will exclude all the files whose paths match the following path patterns (supposing svn.url is set to http://example.com/svn/project and the standard SVN project layout is followed):
http://example.com/svn/project/trunk/*/*.html
http://example.com/svn/project/branches/*/*/*.html
http://example.com/svn/project/tags/*/*/*.html
- A leading slash "/" is a particular case of a non-recursive PATTERN: it just represents a particular path to be excluded or included, starting from the root of trunk and of each branch and tag. E.g., if svn.url is set to http://example.com/svn/project and the standard SVN project layout is followed, then a setting like this:
includePath = /Docs
will include only the 'Docs' directory within trunk, branches and tags, which means SubGit will translate only the paths below:
http://example.com/svn/project/trunk/Docs
http://example.com/svn/project/branches/*/Docs
http://example.com/svn/project/tags/*/Docs
On the other hand, a setting like:
excludePath = /Docs/*.html
will exclude all the *.html files within the 'Docs' directory in trunk, branches and tags, i.e. such a setting will exclude the following paths:
http://example.com/svn/project/trunk/Docs/*.html
http://example.com/svn/project/branches/*/Docs/*.html
http://example.com/svn/project/tags/*/Docs/*.html
- A trailing slash "/" is not allowed in a non-recursive PATTERN.
- A wildcard "*" matches any number of legal characters. It can be used in both recursive and non-recursive patterns.
- A question mark "?" matches any single legal character. Again, it can be used in both recursive and non-recursive patterns.
- Brackets '[ ]' set a list of possible characters in the PATTERN; e.g., the following:
excludePath = [aoe]*.html
will recursively exclude all the *.html files whose names start with 'a', 'o' or 'e'.
- Spaces within the PATTERN are allowed:
excludePath = some file name
excludePath = some directory/*.exe
but leading and trailing spaces need to be quoted:
includePath = "filename "
includePath = "directory /*.exe"
otherwise SubGit will ignore leading and trailing spaces.
Note that if you have two directories or files at the same level of the SVN file structure whose names differ only by leading or trailing spaces (say, "Docs" and "Docs "), the one containing leading or trailing spaces will not be imported or mirrored, no matter how excludePath or includePath is set!
- Two consecutive asterisks "**" in a non-recursive PATTERN mean a path of any length. That is, a PATTERN with leading asterisks:
excludePath = **/foo/*.exe
means *.exe files inside a foo directory anywhere in the SVN project. A leading-asterisks pattern has one peculiarity: if all the files in a directory match the given PATTERN, then not only those files are excluded, but the directory itself will not be mirrored or imported either. Thus, if foo contains only *.exe files, then foo will not be imported or mirrored.
Trailing asterisks allow you to create a non-recursive pattern that matches directories only, namely:
excludePath = Docs/**
will exclude the Docs directory at the root of trunk, branches and tags, but won't touch Docs directories anywhere else.
Two consecutive asterisks within a path pattern mean just that, a path of any length; that is, a setting like:
excludePath = Docs/**/*.exe
will exclude any *.exe files in the Docs directory and its subdirectories.
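The PATTERN rules above can be summarized in a small matcher. This Python sketch is not SubGit's implementation, and all function names in it are hypothetical. It assumes paths are given relative to the root of trunk or of a branch or tag (for example Docs/index.html), that a path is affected when it or any directory above it matches, that brackets hold a plain list of characters (no ranges), and that when includePath is set only matching paths are translated. The directory-removal behavior of leading "**" is not modeled.

```python
import re
from typing import List


def parse_value(raw: str) -> str:
    """Drop unquoted leading/trailing spaces; keep them inside double quotes."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    return raw


def _translate(pat: str) -> str:
    """Translate a PATTERN body into a regular-expression fragment."""
    out, i = [], 0
    while i < len(pat):
        if pat.startswith("**/", i):        # "**/": any number of directories
            out.append("(?:.*/)?")
            i += 3
        elif pat.startswith("**", i):       # trailing "**": anything below
            out.append(".*")
            i += 2
        elif pat[i] == "*":                 # "*": any characters except "/"
            out.append("[^/]*")
            i += 1
        elif pat[i] == "?":                 # "?": one character except "/"
            out.append("[^/]")
            i += 1
        elif pat[i] == "[" and pat.find("]", i + 2) != -1:
            j = pat.find("]", i + 2)        # "[aoe]": one of the listed characters
            out.append("[" + re.escape(pat[i + 1:j]) + "]")
            i = j + 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    return "".join(out)


def compile_pattern(pattern: str) -> re.Pattern:
    """Compile an excludePath/includePath value into a regex over relative paths."""
    if pattern.endswith("/"):
        raise ValueError("a trailing '/' is not allowed")
    if "/" not in pattern:
        # Recursive: the name may occur in any directory.
        return re.compile("(?:.*/)?" + _translate(pattern))
    # Non-recursive: matched from the root of trunk or of a branch/tag;
    # a leading "/" only anchors the pattern there.
    return re.compile(_translate(pattern[1:] if pattern.startswith("/") else pattern))


def matches(regex: re.Pattern, path: str) -> bool:
    """True if the path itself, or any directory above it, matches."""
    parts = path.split("/")
    return any(regex.fullmatch("/".join(parts[:k])) for k in range(1, len(parts) + 1))


def is_translated(path: str, include: List[str], exclude: List[str]) -> bool:
    """Apply excludePath first (it wins), then includePath if any is set."""
    if any(matches(compile_pattern(parse_value(p)), path) for p in exclude):
        return False
    if include:
        return any(matches(compile_pattern(parse_value(p)), path) for p in include)
    return True


print(is_translated("Docs/index.html", ["/Docs"], ["/Docs/*.html"]))  # False
print(is_translated("Docs/guide.txt", ["/Docs"], ["/Docs/*.html"]))   # True
```

Checking ancestors as well as the path itself is a design choice of the sketch: it lets a directory pattern such as /Docs cover everything below that directory.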
When branches, tags or paths are excluded from translation with one of the above options, modifications made to those branches or paths in the Subversion project will not be translated to Git, and changes made to those locations in the Git repository will not appear in Subversion.
Chapter 5. SVN Layout Examples
So far, we have discussed mapping settings and mapping basics. This chapter describes the most commonly used Subversion repository layouts and provides examples of how to set up mapping for SVN projects with those layouts.
First, we'll examine how to map a single-directory layout, i.e. an SVN project with no branches and tags, only a "main line" directory. The same mapping approach applies even if there are branches and tags but they are not needed and only trunk has to be translated to Git, or even if only one particular branch has to be imported or mirrored to Git.
The second example describes the standard SVN layout, that is, the layout the SVN team recommends sticking to. We will also examine some extensions to the standard layout: a layout with different directory names, and one more that has extra directories for branches or tags.
Thirdly, there's a repository layout where multiple projects reside within every branch and tag. The standard SVN layout assumes that each SVN project resides in the repository root and has its own "trunk", "branches" and "tags" directories; this layout is reversed: the "trunk", "branches" and "tags" directories reside in the repository root, and there are multiple projects within every branch and tag.
The fourth SVN project layout, in fact, has no structure at all: all the directories live right in the project's root. That is, each project in the SVN repository has its own directory in the repository root, but all the "trunk", "branches" and "tags" directories inside the project's directory are situated at the same level.
The last case we will look at is not actually a layout example; instead, we will examine how SubGit imports or mirrors an SVN project whose layout was changed at some point in the past. Of course, both the initial and the later layouts are combinations of the ones above, so the mapping itself is nothing new, but there are some mirror/import specifics worth considering.
Single Directory
This layout is probably the simplest possible case to map, since there's only one directory, with no branches, tags or nested directories that could make the mapping more complex. Actually, this approach can apply to several different layouts: it can be a relatively simple SVN repository that has no branches or tags, with all the files living right in the repository root:
/repository
    file1
    file2
    …
    fileN
In this case, all that is needed to translate the project into Git is to set svn.url to point to the SVN repository root directory:
http://example.com/svn/repository
and all the trunk, branches, tags and shelves settings in SubGit's configuration file can be either commented out:
#trunk = trunk:refs/heads/master
#branches = branches/*:refs/heads/*
#tags = tags/*:refs/tags/*
#shelves = shelves/*:refs/shelves/*
or just removed.
Note that in this case (when the repository root is used) it's impossible to add branches or tags later on.
The same approach applies when all the files reside in the project's (not the repository's) root directory: that is, each project in the repository has its own directory, but the project being translated has no branches or tags, and all its files are situated right in the project's root:
/repository
    /project_1
        file1
        file2
        …
        fileN
    /project_2
        …
    …
    /project_N
Similarly, we can just set svn.url to point to the project root:
http://example.com/svn/repository/project_1
and comment out the mapping settings:
#trunk = trunk:refs/heads/master
#branches = branches/*:refs/heads/*
#tags = tags/*:refs/tags/*
#shelves = shelves/*:refs/shelves/*
or remove them entirely.
Unlike the first case, though, the mapping for this layout can be set in one more way: we can set svn.url to point to the repository root directory:
http://example.com/svn/repository
and set the project directory as trunk:
trunk = project_1:refs/heads/master
#branches = branches/*:refs/heads/*
#tags = tags/*:refs/tags/*
#shelves = shelves/*:refs/shelves/*
This approach will also translate the project to Git; additionally, branches and tags can be created and added to the translation later with almost no configuration effort, since the project's directory acts as trunk.
One more common case is when only one particular branch has to be translated to Git. The repository and project can be arbitrarily complex, e.g.:
/svn
/repository
/project
/trunk
/branches
/branch_1
…
/bugs
…
/features
…
/tags
…
/major_releases
/minor_releases
…
but if only branch_1 needs to be translated, it can be done as simply as:
url = http://example.com/svn/repository/project/branches
trunk = branch_1:refs/heads/master
#branches = branches/*:refs/heads/*
#tags = tags/*:refs/tags/*
#shelves = shelves/*:refs/shelves/*
that is, we set svn.url to point to the branches directory inside the project directory and set branch_1 as trunk.
Another way is to set svn.url to point directly to the branch_1 directory:
url = http://example.com/svn/repository/project/branches/branch_1
and comment out (or remove) all the other options:
#trunk = trunk:refs/heads/master
#branches = branches/*:refs/heads/*
#tags = tags/*:refs/tags/*
#shelves = shelves/*:refs/shelves/*
Both ways are equivalent: branch_1 will be translated into Git's master branch.
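The equivalence of the two configurations can be checked with a tiny sketch. The Python helper main_line_url is hypothetical; it follows the rules described earlier: a trunk path is resolved relative to svn.url, and with no mapping at all the whole svn.url is translated as the single line of development.

```python
from typing import Optional


def main_line_url(svn_url: str, trunk_mapping: Optional[str]) -> str:
    """Return the SVN URL whose history becomes Git's master branch."""
    if trunk_mapping is None:
        # No mapping at all: svn.url itself is the single line of development.
        return svn_url.rstrip("/")
    svn_path = trunk_mapping.partition(":")[0]
    # Otherwise the trunk path is resolved relative to svn.url.
    return svn_url.rstrip("/") + "/" + svn_path.strip("/")


# First way: svn.url points at branches, and branch_1 is mapped as trunk.
first = main_line_url("http://example.com/svn/repository/project/branches",
                      "branch_1:refs/heads/master")
# Second way: svn.url points straight at branch_1, all mappings commented out.
second = main_line_url("http://example.com/svn/repository/project/branches/branch_1", None)
print(first == second)  # True
```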
Standard Layout
As mentioned above, an SVN repository can contain many projects, and the SVN team recommends following a repository layout convention in which a handful of strategically named Subversion repository directories convey valuable meaning about the data they hold: the "main line", or trunk, of development; some branches, which are divergent copies of development lines; and some tags, which are named, stable snapshots of a particular line of development. Thus the recommended layout for a Subversion repository may look like this:
/repository
    /project_1
        /trunk
        /branches
        /tags
    /project_2
        /trunk
        /branches
        /tags
    …
    …
    /project_N
        /trunk
        /branches
        /tags
SubGit translates any given SVN project into a Git repository, and thus it translates the SVN "main line", "branches" and "tags" to the Git "master" branch, Git branches and Git tags respectively. SubGit assumes the recommended SVN layout (unless the --trunk and --layout auto options are used with the subgit configure command), so the initial mapping it creates is as simple as:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
and this scheme matches the recommended SVN layout perfectly: it will translate the SVN "main line" into the Git "master" branch and all the SVN branches and tags (stored in the corresponding directories) into Git branches and tags, and it will further translate anonymous Git commits into SVN shelves.
Another SVN project layout doesn't follow the recommended naming convention but does follow the recommended structure and meaning: that is, there are distinguishable "main line", "divergent copies" and "stable snapshots" in SVN, but each directory has a name that differs from the recommended one, e.g.:
/svn
    /repository
        /project
            /main
            /offshoots
            /snapshots
The main directory here is the "main line" of development (and thus has the same meaning as trunk), the offshoots directory contains "divergent copies" (i.e. branches) and snapshots is for "stable snapshots" (that is, tags). In such a case the mapping scheme is still the same, but the SVN directory names must be changed to reflect the actual SVN directory names:
trunk = main:refs/heads/master
branches = offshoots/*:refs/heads/*
tags = snapshots/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
and again, this mapping scheme matches the SVN layout perfectly, and SubGit will translate all the existing SVN data.
There might be another SVN project layout whose structure is rather close to the recommended one: it follows the naming convention, but there is more than one directory for SVN "divergent copies" and/or "stable snapshots", e.g.:
/svn
    /repository
        /project
            /trunk
            /branches
            /bugs
            /features
            /tags
            /major_releases
            /minor_releases
The trunk directory is just trunk, but there are three different directories for "divergent copies" (branches, bugs and features) and the same for "stable snapshots" (tags, major_releases and minor_releases). In such a case the mapping scheme might look like this:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
branches = features/*:refs/heads/features/*
branches = bugs/*:refs/bugs/*
tags = tags/*:refs/tags/*
tags = major_releases/*:refs/major_releases/*
tags = minor_releases/*:refs/minor_releases/*
shelves = shelves/*:refs/shelves/*
This scheme maps all the directories, so SubGit will translate all the branches and tags to Git. Actually, some SVN directories can be omitted if they are not needed in Git. E.g., if "bugs" branches are not needed in Git, that line can be omitted from the mapping scheme:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
branches = features/*:refs/heads/features/*
tags = tags/*:refs/tags/*
tags = major_releases/*:refs/major_releases/*
tags = minor_releases/*:refs/minor_releases/*
shelves = shelves/*:refs/shelves/*
With such a mapping configuration, SubGit will translate all the branches and tags except those in the "bugs" directory.
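Before dropping a mapping line, it can help to list which top-level project directories no mapping refers to any more. The Python sketch below is hypothetical tooling, not part of SubGit; it only looks at the first segment of each Subversion-Path-Pattern, so it assumes that segment is a literal directory name (as in the example above) rather than a wildcard.

```python
from typing import List, Set


def mapped_roots(mappings: List[str]) -> Set[str]:
    """First directory of each Subversion-Path-Pattern (relative to svn.url)."""
    return {value.partition(":")[0].split("/", 1)[0] for value in mappings}


def untranslated(project_dirs: List[str], mappings: List[str]) -> List[str]:
    """Project directories that no trunk/branches/tags/shelves mapping refers to."""
    roots = mapped_roots(mappings)
    return [d for d in project_dirs if d not in roots]


# Directories of the example project and the mapping values without the "bugs" line.
PROJECT_DIRS = ["trunk", "branches", "bugs", "features", "tags",
                "major_releases", "minor_releases"]
MAPPINGS = [
    "trunk:refs/heads/master",
    "branches/*:refs/heads/*",
    "features/*:refs/heads/features/*",
    "tags/*:refs/tags/*",
    "major_releases/*:refs/major_releases/*",
    "minor_releases/*:refs/minor_releases/*",
    "shelves/*:refs/shelves/*",
]
print(untranslated(PROJECT_DIRS, MAPPINGS))  # ['bugs']
```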
Multiple Projects per Branch
The Subversion team recommends a layout where each project has its own trunk, branches and tags directories, in which the "main line", "divergent copies" and "stable snapshots" of development reside. However, some users prefer another layout, where these directories exist in the SVN repository root and each branch and tag stores multiple projects, so the repository looks like this:
/repository
    /trunk
        /project_1
        /project_2
        …
        /project_N
    /branches
        /branch_1
            /project_1
            /project_2
            …
            /project_N
        /branch_2
            /project_1
            /project_2
            …
            /project_N
        …
        /branch_N
            /project_1
            /project_2
            …
            /project_N
    /tags
        /tag_1
            /project_1
            /project_2
            …
            /project_N
        /tag_2
            /project_1
            /project_2
            …
            /project_N
        …
        /tag_N
            /project_1
            /project_2
            …
            /project_N
and forms the layout that can be called “inverted” as the directories order is reversed.
In this case, the mapping scheme is a bit more complex than in the standard cases; however, the entities are perfectly distinguishable even though they reside in nested directories. There is one big difference, though: earlier we said that svn.url must point to the SVN project root, but there's no project root here, since each project has its directories spread across the repository. So we have to set svn.url to point to the repository root, no matter which project we want to mirror to Git. E.g., svn.url might look like:
http://example.com/svn/repository
and all the mapping paths must be set relative to this URL.
All the projects' "main lines" of development reside in the /repository/trunk directory, so project_1's "main line" resides in /repository/trunk/project_1. Thus, if you want to mirror project_1 to Git, you can set the trunk mapping line like this:
trunk = trunk/project_1:refs/heads/master
which means that the full path to the SVN trunk is:
http://example.com/svn/repository/trunk/project_1
The same goes for branches and tags: project_1's "divergent copies" and "stable snapshots" reside in /repository/branches and /repository/tags respectively, and thus the full mapping scheme might look like this:
trunk = trunk/project_1:refs/heads/master
branches = branches/*/project_1:refs/heads/*
tags = tags/*/project_1:refs/tags/*
shelves = shelves/*/project_1:refs/shelves/*
Any other project should be mirrored to its own Git repository, but the mapping scheme follows the same pattern and svn.url stays the same for all the projects in such an SVN repository. That is, for project_N the mapping scheme looks as follows:
trunk = trunk/project_N:refs/heads/master
branches = branches/*/project_N:refs/heads/*
tags = tags/*/project_N:refs/tags/*
shelves = shelves/*/project_N:refs/shelves/*
and svn.url must be set the same way:
http://example.com/svn/repository
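Since the mapping for each project differs only in the project directory name, the lines can be generated. The Python sketch below is a hypothetical helper (inverted_layout_mapping is not a SubGit feature); it assumes the inverted layout shown above and prints an [svn] section per project that could be pasted into the SubGit configuration file.

```python
from typing import List

SVN_URL = "http://example.com/svn/repository"  # the same for every project


def inverted_layout_mapping(project: str) -> List[str]:
    """Mapping lines for one project of the "inverted" layout.

    Only the project directory name changes; svn.url stays the repository root.
    """
    return [
        f"url = {SVN_URL}",
        f"trunk = trunk/{project}:refs/heads/master",
        f"branches = branches/*/{project}:refs/heads/*",
        f"tags = tags/*/{project}:refs/tags/*",
        f"shelves = shelves/*/{project}:refs/shelves/*",
    ]


if __name__ == "__main__":
    # One configuration (and one Git repository) per project.
    for project in ["project_1", "project_N"]:
        print(f"# configuration for {project}")
        print("[svn]")
        for line in inverted_layout_mapping(project):
            print("    " + line)
```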
In fact, the actual directory names are not important; they can have any names you want. What matters is that the SVN entities stay distinguishable: for example, trunk, branches and tags can be called main, offshoots and snapshots:
/repository
  /main
    /project_1
    …
    /project_N
  /offshoots
    /offshoot_1
      /project_1
      …
      /project_N
    /offshoot_2
      /project_1
      …
      /project_N
    …
    /offshoot_N
      /project_1
      …
      /project_N
  /snapshots
    /snapshot_1
      /project_1
      …
      /project_N
    /snapshot_2
      /project_1
      …
      /project_N
    …
    /snapshot_N
      /project_1
      …
      /project_N
As long as these directories contain the correct SVN entities (“main line”, “divergent copies” and “stable snapshots”), the mapping is set in the same way:
trunk = main/project_N:refs/heads/master
branches = offshoots/*/project_N:refs/heads/*
tags = snapshots/*/project_N:refs/tags/*
All Branches in a Single Directory
The next common SVN project layout is, in fact, no layout at all: all the SVN directories (branches, tags and the rest) just lie in the project’s root or inside its subdirectories, without following any placement conventions. With such a layout it’s a good idea to follow at least some naming convention, but often the directory names don’t follow any conventions either. So the SVN project might look like this:
/repository
  /project_N
    /bug_1224 # branch
    /issue1678 # branch
    /features # second level directory
      /f4536 # branch
      /feat 4589 # branch
    /main # "main line" (trunk)
    /v.1.5.6_b3345 # some tag (snapshot)
    /1.5.7 build 7890 # some tag (snapshot)
SubGit can handle such an SVN layout, but creating a correct mapping scheme for a project of this design can take considerable effort. If the directory placement doesn’t follow any layout and the directory names don’t follow any conventions, then the only way to map each and every SVN entity to its Git counterpart is to create an explicit mapping for every directory. Since Git reference names cannot contain spaces, directories with spaces in their names get underscores on the Git side. Thus, for project_N above, the mapping scheme can be set like this:
trunk = main:refs/heads/master
branches = bug_1224:refs/heads/bug_1224
branches = issue1678:refs/heads/issue1678
branches = features/f4536:refs/heads/f4536
branches = features/feat 4589:refs/heads/feat_4589
tags = v.1.5.6_b3345:refs/tags/v.1.5.6_b3345
tags = 1.5.7 build 7890:refs/tags/1.5.7_build_7890
The svn.url must point to the project_N root, so it must be set as:
url = http://example.com/svn/repository/project_N
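For a large unordered project, the explicit lines can be generated from an inventory of directories instead of being typed by hand. The following is a minimal Python sketch, assuming you have listed the directories yourself and classified each one as trunk, branch or tag; the DIRECTORIES list and both helper functions are hypothetical. Because Git reference names cannot contain spaces, the sketch replaces spaces with underscores on the Git side; the underscore is an arbitrary choice.

```python
from typing import List, Tuple

# Hypothetical inventory of project_N, classified by hand: (SVN path, kind).
DIRECTORIES: List[Tuple[str, str]] = [
    ("main", "trunk"),
    ("bug_1224", "branch"),
    ("issue1678", "branch"),
    ("features/f4536", "branch"),
    ("features/feat 4589", "branch"),
    ("v.1.5.6_b3345", "tag"),
    ("1.5.7 build 7890", "tag"),
]


def git_name(svn_path: str) -> str:
    """Last path segment, with spaces (illegal in Git ref names) turned into '_'."""
    return svn_path.rsplit("/", 1)[-1].replace(" ", "_")


def explicit_mapping(directories: List[Tuple[str, str]]) -> List[str]:
    """One explicit mapping line per SVN directory."""
    lines = []
    for svn_path, kind in directories:
        if kind == "trunk":
            lines.append(f"trunk = {svn_path}:refs/heads/master")
        elif kind == "branch":
            lines.append(f"branches = {svn_path}:refs/heads/{git_name(svn_path)}")
        elif kind == "tag":
            lines.append(f"tags = {svn_path}:refs/tags/{git_name(svn_path)}")
        else:
            raise ValueError(f"unknown kind {kind!r} for {svn_path}")
    return lines


if __name__ == "__main__":
    print("\n".join(explicit_mapping(DIRECTORIES)))
```

Taking only the last path segment can produce duplicate Git names if two directories share a name; such collisions would have to be resolved by hand.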
Of course, creating an explicit mapping for a big project can be rather laborious, but unfortunately it’s the only way if no conventions are followed. If the directory names do follow a certain convention (or just have some similarities, even occasional ones), then wildcards can be used to map a bunch of branches (or tags) at once, which makes the mapping much easier. Suppose the branch names follow the patterns “bug_xxxx”, “issue_xxxx” or “fxxxx”, and all the tags are named like “v.x.x.x_xxxx” or “snapshot_xxxx”, i.e. the project_N layout looks like this:
/repository
  /project_N
    /bug_1224 #branch
    /bug_2379 #branch
    /bug_2471 #branch
    /issue_1678 #branch
    /issue_1890 #branch
    /issue_2021 #branch
    /features #second level directory
      /f4536 #branch
      /f4589 #branch
      /f4602 #branch
    /main #"main line" (trunk)
    /v.1.5.6_b3345 #some tag (snapshot)
    /v.1.5.7_b7890 #some tag (snapshot)
    /snapshot_3329 #some tag (snapshot)
    /snapshot_3431 #some tag (snapshot)
then the mapping can be set like this:
trunk = main:refs/heads/master
branches = bug_*:refs/heads/bug_*
branches = issue_*:refs/heads/issue_*
branches = features/f*:refs/heads/f*
tags = v.*:refs/tags/v.*
tags = snapshot_*:refs/tags/snapshot_*
This will mirror all the present SVN branches and tags to the Git repository.
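Before relying on wildcards, it can help to check which directories the patterns actually cover. The Python sketch below assumes that * matches a non-empty run of characters within one directory name and is substituted unchanged on the Git side. The directory list mirrors the layout above, plus a hypothetical issue1678 directory (from the previous example) added to show a name that slips through the patterns.

```python
import re
from typing import List, Optional, Tuple

# Wildcard mapping from the text: (SVN pattern, Git ref pattern).
MAPPING: List[Tuple[str, str]] = [
    ("main", "refs/heads/master"),
    ("bug_*", "refs/heads/bug_*"),
    ("issue_*", "refs/heads/issue_*"),
    ("features/f*", "refs/heads/f*"),
    ("v.*", "refs/tags/v.*"),
    ("snapshot_*", "refs/tags/snapshot_*"),
]

# Trunk, branch and tag directories of project_N; issue1678 breaks the convention.
DIRECTORIES = [
    "main", "bug_1224", "bug_2379", "bug_2471",
    "issue_1678", "issue_1890", "issue_2021", "issue1678",
    "features/f4536", "features/f4589", "features/f4602",
    "v.1.5.6_b3345", "v.1.5.7_b7890", "snapshot_3329", "snapshot_3431",
]


def git_ref_for(svn_path: str) -> Optional[str]:
    """Return the Git ref produced by the first matching mapping line, if any."""
    for svn_pattern, git_pattern in MAPPING:
        regex = "^" + re.escape(svn_pattern).replace(r"\*", "([^/]+)") + "$"
        m = re.match(regex, svn_path)
        if m:
            # Patterns hold at most one '*': substitute the captured text for it.
            return git_pattern.replace("*", m.group(1)) if m.groups() else git_pattern
    return None


unmapped = [d for d in DIRECTORIES if git_ref_for(d) is None]

if __name__ == "__main__":
    for d in DIRECTORIES:
        print(f"{d:16} -> {git_ref_for(d)}")
    print("not covered:", unmapped)
```

Any directory reported as not covered would need an explicit mapping line of its own.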
Another approach to mirroring such an unordered layout is to make it ordered: that is, to change the project layout so that it follows some convention, e.g. the SVN recommended layout. This can bring many advantages even beyond making SubGit mirroring easier, though such a relayout can be very hard to perform. Additionally, relayouting an SVN project brings a peculiarity to SubGit mirroring as well; it is described in the next section.
Layout Changed over History
As mentioned, relayouting an SVN project brings a peculiarity to the SubGit mirroring process: part of the SVN revision history will be lost during import/mirror to Git. Not all the history, of course, but the history of the relocated directories. Let’s look at an example: say, you created a branch called new_branch right in the SVN project’s root:
/repository
  /project
    /new_branch
    /branches
    /tags
    /trunk
you made a commit to that branch and then realized that new_branch was an improper name, so you renamed it to feat_1725:
/repository
  /project
    /feat_1725
    /branches
    /tags
    /trunk
made another commit and finally decided that the branch had to reside in the branches directory, so you moved it:
/repository
  /project
    /branches
      /feat_1725
    /tags
    /trunk
so svn log shows the following history for that branch:
$ svn log -v branches/feat_1725/
------------------------------------------------------------------------
r158 | user | 2017-06-05 12:38:19 +0500 (Mon, 05 Jun 2017) | 1 line
Changed paths:
A /kafka/branches/feat_1725 (from /kafka/feat_1725:157)
D /kafka/feat_1725
moved feat_1725 into branches directory
------------------------------------------------------------------------
r157 | user | 2017-06-05 12:37:41 +0500 (Mon, 05 Jun 2017) | 1 line
Changed paths:
M /kafka/feat_1725/main.c
edited main.c in feat_1725 in project's root
------------------------------------------------------------------------
r156 | user | 2017-06-05 12:36:56 +0500 (Mon, 05 Jun 2017) | 1 line
Changed paths:
A /kafka/feat_1725 (from /kafka/new_branch:155)
D /kafka/new_branch
renamed new_branch to feat_1725
------------------------------------------------------------------------
r155 | user | 2017-06-05 12:36:13 +0500 (Mon, 05 Jun 2017) | 1 line
Changed paths:
M /kafka/new_branch/main.c
changed main.c in new_branch in project's root
------------------------------------------------------------------------
r154 | user | 2017-06-05 12:35:03 +0500 (Mon, 05 Jun 2017) | 1 line
Changed paths:
A /kafka/new_branch (from /kafka/trunk:153)
new_branch created
At this point, if you import/mirror this SVN project to Git using the standard mapping configuration:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
you won’t see all those changes in Git: the history will start from the last move operation:
$ git log
commit cec14db7c3e02e30b493456de4e6cf666eefcf3c
Author: user <user@example.com>
Date: Mon Jun 5 07:38:19 2017 +0000
moved feat_1725 into branches directory
Luckily, there is a way to retrieve the whole history: add every name this branch has ever had to the mapping configuration. In this case, the mapping configuration has to be set like this:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
branches = feat_1725:refs/heads/feat_1725_1
branches = new_branch:refs/heads/new_branch
tags = tags/*:refs/tags/*
shelves = shelves/*:refs/shelves/*
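Note that neither new_branch nor feat_1725 is present in the project’s root directory anymore; we add them to the mapping configuration only to retrieve the whole history. The root-level feat_1725 is mapped to feat_1725_1 so that it does not collide with refs/heads/feat_1725, which the branches/* line already produces for branches/feat_1725. Now, with this mapping scheme, we see everything: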
$ git log --pretty=oneline
224a955c70f8dac47cefc3eaa80e4d31d46a1a1c moved feat_1725 into branches directory
8dcc92d43b6a6457611f41b6672ff2c9a83c56da edited main.c in feat_1725 in project's root
219185494a74014984f5e0a5a79dfe23ec7f4095 renamed new_branch to feat_1725
f7c2fc95c20c675834d209c8b82bf364cb737d34 changed main.c in new_branch in project's root
a888f553cb7305251a167d547f5c2eafbc9624a9 new_branch created
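Finding every former name of a branch by hand can be tedious in a long history. As a rough aid, the hypothetical Python helper below reads the changed-path lines of svn log -v output and follows the “(from …)” copy sources backwards. It assumes each path was created by a copy at most once and ignores revision numbers, so its output is a list of candidates to review, not a guaranteed complete answer; it is not a SubGit feature.

```python
import re
from typing import Dict, List

# Changed-path lines taken from the svn log -v output above (newest first).
LOG = """\
A /kafka/branches/feat_1725 (from /kafka/feat_1725:157)
D /kafka/feat_1725
M /kafka/feat_1725/main.c
A /kafka/feat_1725 (from /kafka/new_branch:155)
D /kafka/new_branch
M /kafka/new_branch/main.c
A /kafka/new_branch (from /kafka/trunk:153)
"""

# "A <path> (from <source>:<revision>)": a path created by copying another one.
COPY_LINE = re.compile(r"^\s*[AR] (.+?) \(from (.+):(\d+)\)\s*$")


def copy_sources(log_text: str) -> Dict[str, str]:
    """Map every copied path to the path it was copied from."""
    sources: Dict[str, str] = {}
    for line in log_text.splitlines():
        m = COPY_LINE.match(line)
        if m:
            sources[m.group(1)] = m.group(2)
    return sources


def former_names(path: str, log_text: str, stop_at: str) -> List[str]:
    """Follow copies backwards from path until reaching stop_at (e.g. trunk)."""
    sources = copy_sources(log_text)
    names: List[str] = []
    seen = {path}  # guards against cycles such as a path copied from itself
    while path in sources and sources[path] != stop_at and sources[path] not in seen:
        path = sources[path]
        seen.add(path)
        names.append(path)
    return names


if __name__ == "__main__":
    prefix = "/kafka/"  # project root inside the repository
    for old in former_names("/kafka/branches/feat_1725", LOG, "/kafka/trunk"):
        print(old[len(prefix):])  # candidates for extra "branches = ..." lines
```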
Note that SubGit is usually able to detect such cases and create the correct mapping scheme when subgit configure is used with the --layout auto option.
This peculiarity doesn’t affect any other aspect of import/mirror; all the other features work fine. So if you don’t need to see the whole history in Git, you can relayout the SVN project in any way you want. But if seeing the history in Git is crucial for you, it’s better not to change the layout and to mirror the SVN project “as-is”.
Excluding Branches and Tags
Sometimes not all the SVN entities are worth importing or mirroring to Git: branches or tags might have lost their importance for the project, might already have been merged into another line of development, might be just too old, or might be unneeded by the developers who will use the Git repository. Whatever the reason, branches and tags can be excluded from import/mirror using one of two approaches, depending on whether explicit or wildcard mapping is used.
When explicit mapping is used, particular branches and tags can simply be omitted from the mapping. For example, if the SVN project doesn’t follow any naming and placement conventions (i.e. the “all branches in a single directory” layout is used), the only way to import/mirror the project to Git is to create an explicit mapping for all the branches and tags. In such a case, to exclude a branch or tag from being imported/mirrored, just omit that branch or tag from the mapping. E.g., take the unordered SVN project we considered earlier in chapter 5.4:
/repository
  /project_N
    /bug_1224 #branch
    /issue1678 #branch
    /features #second level directory
      /f4536 #branch
      /feat 4589 #branch
    /main #"main line" (trunk)
    /v.1.5.6_b3345 #some tag (snapshot)
    /1.5.7 build 7890 #some tag (snapshot)
and if we want to mirror only the features branches, we set the mapping for those branches only:
trunk = main:refs/heads/master
#branches = bug_1224:refs/heads/bug_1224
#branches = issue1678:refs/heads/issue1678
branches = features/f4536:refs/heads/f4536
branches = features/feat 4589:refs/heads/feat_4589
tags = v.1.5.6_b3345:refs/tags/v.1.5.6_b3345
tags = 1.5.7 build 7890:refs/tags/1.5.7_build_7890
The commented-out bug and issue mapping lines are left here just to visualize the difference; in practice, these lines can simply be removed from the configuration.
Another approach to excluding branches and tags from import/mirror is to use the
excludeBranches = SIMPLE_PATTERN
excludeTags = SIMPLE_PATTERN
configuration options. In fact, it’s the only way to exclude them if the wildcard mapping approach is used. Let’s consider an SVN project with several places where branches and tags are stored:
/repository
  /project
    /trunk
    /branches
      /july
      /greg
    /bugs
      /bug_1920
      /bug_2310
      /bug_2450
      /bug_2780
      /bug_3310
    /features
      /f3425
      /f3445
      /f3450
    /tags
      /0.9
      /1.0beta
    /major_releases
      /v1.0
      /v1.5
      /v2.0
    /minor_releases
      /v1.0.1
      /v1.0.4
      /v1.5.1
      /v1.5.3
      /v2.0.2
The whole project structure can be mirrored to Git using the following mapping scheme:
trunk = trunk:refs/heads/master
branches = branches/*:refs/heads/*
branches = features/*:refs/heads/features/*
branches = bugs/*:refs/heads/bugs/*
tags = tags/*:refs/tags/*
tags = major_releases/*:refs/tags/major_releases/*
tags = minor_releases/*:refs/tags/minor_releases/*
shelves = shelves/*:refs/shelves/*
If you want to import or mirror branches and tags from all these directories but exclude a few of them, the only way is to use the excludeBranches option. Say, you don’t want to import/mirror the greg branch and the 1.0beta tag. The mapping scheme stays the same as above, and the exclusion can be done with the following options:
excludeBranches = branches/greg
excludeBranches = tags/1.0beta
Note that with these settings, SubGit builds full paths from the patterns (relative to svn.url) and excludes all the directories that match those paths. In this particular case, SubGit will exclude all the directories found at the following paths:
http://example.com/svn/repository/project/branches/greg
http://example.com/svn/repository/project/tags/1.0beta
supposing svn.url is set to http://example.com/svn/repository/project.
These two patterns are explicit and point to particular SVN directories. However, if you want to exclude several branches that follow some naming pattern, you can use one wildcard to create the pattern. Say, in addition to greg and 1.0beta, we want to exclude some of the bugs branches:
excludeBranches = branches/greg
excludeBranches = tags/1.0beta
excludeBranches = bugs/bug_2*
This configuration excludes the bug_2310, bug_2450 and bug_2780 branches that reside in the bugs directory.
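The effect of these patterns can be sketched in a few lines of Python. The sketch assumes that a pattern is matched against SVN paths relative to svn.url and that * matches within a single directory name; CANDIDATES is a hand-written list of the branch and tag directories from the layout above, and is_excluded is a hypothetical name, not SubGit’s implementation.

```python
import re
from typing import List

SVN_URL = "http://example.com/svn/repository/project"  # svn.url

EXCLUDE_BRANCHES = ["branches/greg", "tags/1.0beta", "bugs/bug_2*"]

# Branch and tag directories of the layout above, relative to svn.url.
CANDIDATES = [
    "branches/july", "branches/greg",
    "bugs/bug_1920", "bugs/bug_2310", "bugs/bug_2450", "bugs/bug_2780", "bugs/bug_3310",
    "features/f3425", "features/f3445", "features/f3450",
    "tags/0.9", "tags/1.0beta",
    "major_releases/v1.0", "minor_releases/v1.0.1",
]


def is_excluded(svn_path: str, patterns: List[str]) -> bool:
    """True if svn_path matches a pattern; '*' stays within one directory name."""
    for pattern in patterns:
        regex = re.escape(pattern).replace(r"\*", "[^/]*")
        if re.fullmatch(regex, svn_path):
            return True
    return False


# Full URLs of the directories left out of import/mirror.
excluded_urls = [f"{SVN_URL}/{p}" for p in CANDIDATES if is_excluded(p, EXCLUDE_BRANCHES)]

if __name__ == "__main__":
    print("\n".join(excluded_urls))
```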
The excludeBranches and excludeTags options are equivalent; you can use either of the two to exclude both branches and tags. That is, the settings
excludeBranches = branches/greg
excludeBranches = tags/1.0beta
excludeBranches = bugs/bug_2*
and
excludeTags = branches/greg
excludeTags = tags/1.0beta
excludeTags = bugs/bug_2*
do exactly the same job, so you can use either of them.
Chapter 7. Excluding particular files and directories
Sometimes there is a need to exclude not whole branches and tags, but particular files or directories from all the branches and tags. For example, the SVN repository might contain binary files, or auxiliary files or directories, that make no sense to pull into the new Git repository. That’s what the excludePath configuration option exists for: it excludes files and directories from being imported/mirrored into Git.
Conversely, you can specify which files to include in import/mirror; that is done with the includePath option. Moreover, both options can be used together, and their combination allows setting up mirroring with excellent granularity: you can tell SubGit to exclude, say, all the binaries except a certain file, or to include all the source code files except those with certain names, etc.
Let’s take a closer look at these options and consider some examples.
Excluding Particular Files or File Types
Sometimes it is worth getting rid of unneeded files during SVN to Git translation: temporary files, files created by the OS, binaries, drive images; in short, all the files that aren’t needed for development and aren’t worth copying. If such files have to be excluded everywhere across the SVN project (that is, from every branch and tag), then the PATTERN has to be recursive, and we can set the option like this:
excludePath = *.exe
excludePath = *.bin
excludePath = *.dl*
excludePath = ~*
excludePath = .DS_Store
Such a configuration means that SubGit will ignore exe and bin files; all the files with the extension dl* (e.g. dl, dl_, dll etc.); all the files that start with ~; and all the files named .DS_Store. All such files will be excluded no matter where in the SVN project they reside; they won’t appear in Git at all.
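One way to picture a recursive pattern is that it is tested against the file name alone, whatever directory the file is in. The Python sketch below models that reading with the standard fnmatch module; case-sensitive matching is an assumption, SubGit’s own matcher may differ in details, and the file paths are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List

EXCLUDE_PATH = ["*.exe", "*.bin", "*.dl*", "~*", ".DS_Store"]


def excluded_recursively(file_path: str, patterns: List[str]) -> bool:
    """Recursive pattern: compare with the file name only, in any directory."""
    name = file_path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for path in [
        "trunk/bin/setup.exe",
        "branches/b1/lib/core.dl_",
        "tags/1.0/docs/~draft.txt",
        "trunk/src/.DS_Store",
        "trunk/src/main.c",
    ]:
        print(path, "excluded" if excluded_recursively(path, EXCLUDE_PATH) else "kept")
```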
Excluding files from a particular SVN directory is another story. There’s no choice but to use a non-recursive pattern, and that is more complicated than the recursive case, since every path where those files reside has to be accounted for. Suppose you have an SVN project layout like this:
/svn
  /repository
    /project
      /trunk
      /branches
        /branch_1
          /source
          /bin
            /test
            /prod
            /opt
          /docs
            /html
        /branch_2
          /source
          /bin
            /test
            /prod
          /docs
            /html
        …
      /tags
You want the whole project to be translated to Git, but without some of the files that reside in the bin and docs/html directories: that is, all the files have to be translated except certain files in those directories. This means we cannot use a recursive PATTERN, since it would exclude the files everywhere; we have to set them explicitly using non-recursive patterns, and the configuration might look like this:
excludePath = /bin/*.exe
excludePath = /bin/test/*.exe
excludePath = /bin/**/*.dl*
excludePath = /docs/html/*.pdf
excludePath = /**/~*
excludePath = .DS_Store
The non-recursive PATTERN provides very fine granularity, so we can set exactly what we need. The configuration above tells SubGit not to translate exe files from the /bin and /bin/test directories, but those from the /bin/opt directory will be translated, since no setting says anything about /bin/opt. At the same time, this configuration tells SubGit not to translate dl* files from the /bin directory and all its subdirectories: the ** portion of the pattern means a path of any length, so both /bin itself and all its subdirectories are affected by that setting.
The next setting tells SubGit not to translate pdf files from /docs/html, while pdf files from everywhere else will be translated.
Finally, the last two settings tell SubGit not to translate ~* files from any directory of any branch, and .DS_Store files from anywhere in the SVN project. Recursive and non-recursive patterns can be freely combined, so you can exclude some files from particular directories and other files from everywhere.
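The difference between anchored and recursive patterns can be modeled in Python as below. This is a sketch of the semantics described in this section, not SubGit’s implementation: a pattern that starts with / is assumed to be anchored at the root of each branch or tag, * matches within one directory name, ** matches any number of directories (including none), and a pattern without a leading / is tested against the file name only. The file paths in the usage lines are hypothetical.

```python
import re
from fnmatch import fnmatchcase
from typing import List

# excludePath values from the example above.
EXCLUDE_PATH = [
    "/bin/*.exe",
    "/bin/test/*.exe",
    "/bin/**/*.dl*",
    "/docs/html/*.pdf",
    "/**/~*",
    ".DS_Store",
]


def anchored_regex(pattern: str) -> str:
    """Translate an anchored pattern like /bin/**/*.dl* into a regular expression."""
    parts = pattern.lstrip("/").split("/")
    regex = ""
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            # '**' = zero or more whole directories (or everything below, if last).
            regex += ".*" if last else "(?:[^/]+/)*"
        else:
            # '*' inside a name never crosses a '/' boundary.
            segment = re.escape(part).replace(r"\*", "[^/]*")
            regex += segment if last else segment + "/"
    return regex


def is_excluded(path: str, patterns: List[str]) -> bool:
    """path is relative to the root of a branch or tag, e.g. /bin/opt/tool.exe."""
    path = path.lstrip("/")
    name = path.rsplit("/", 1)[-1]
    for pattern in patterns:
        if pattern.startswith("/"):
            if re.fullmatch(anchored_regex(pattern), path):
                return True
        elif fnmatchcase(name, pattern):  # recursive: file name only
            return True
    return False


if __name__ == "__main__":
    for p in ["/bin/setup.exe", "/bin/opt/tool.exe", "/bin/opt/core.dll", "/docs/manual.pdf"]:
        print(p, "excluded" if is_excluded(p, EXCLUDE_PATH) else "translated")
```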
Importing Particular Files or Directories Only
Excluding files is an extremely useful feature, but on some occasions it is better to include the few needed files rather than exclude all the rest. E.g., you have a big SVN project with source code files, compiled binaries, libraries, installation packages, resources, documentation etc., all spread across the repository, but you want only the source code files in your new Git repository. That’s the case the includePath feature solves.
Suppose, you have the following SVN project layout:
/repository
  /project
    /trunk
      /bin
        *.exe
        *.dll
      /source
        *.cpp
        *.cs
      /docs
        *.html
        *.txt
      *.exe
      *.msi
      *.cpp
      *.cs
      *.txt
    /branches
      …
    /tags
      …
There are several branches and tags, and each of them has a structure similar to trunk’s, i.e. the same subdirectories and files of the same types. You want only source code files to be present in the new Git repository: every source code file from everywhere across the project, plus all the directories that contain source code files, should be sent to Git, but nothing more. This can be done with the following configuration:
includePath = *.cpp
includePath = *.cs
After SubGit finishes the import (or mirror) to Git, the master Git branch (supposing trunk was mapped to master) will have the following layout:
/source
  *.cpp
  *.cs
*.cpp
*.cs
and that is exactly what we wanted to achieve.
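The outcome can be reproduced with a small Python sketch. It assumes that an include pattern without a leading / is tested against the file name, and it reflects the fact that Git stores no empty directories: a directory appears in the result only if some kept file lies inside it. The concrete file names stand in for the *.exe-style entries of the layout and are invented.

```python
from fnmatch import fnmatchcase
from typing import List, Set

INCLUDE_PATH = ["*.cpp", "*.cs"]

# Invented file names standing in for the *.ext entries of trunk.
TRUNK_FILES = [
    "bin/app.exe", "bin/core.dll",
    "source/main.cpp", "source/Program.cs",
    "docs/index.html", "docs/readme.txt",
    "setup.exe", "setup.msi", "util.cpp", "Helper.cs", "notes.txt",
]


def included(files: List[str], patterns: List[str]) -> List[str]:
    """Keep files whose name matches at least one include pattern."""
    return [f for f in files if any(fnmatchcase(f.rsplit("/", 1)[-1], p) for p in patterns)]


def directories(files: List[str]) -> Set[str]:
    """Directories present in Git: only those that still contain some file."""
    dirs: Set[str] = set()
    for f in files:
        parents = f.split("/")[:-1]
        for depth in range(1, len(parents) + 1):
            dirs.add("/".join(parents[:depth]))
    return dirs


kept = included(TRUNK_FILES, INCLUDE_PATH)

if __name__ == "__main__":
    print(kept)
    print(sorted(directories(kept)))
```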
Note that we didn’t explicitly tell SubGit to remove the bin and docs directories, and in fact, it didn’t. Both of those directories became empty, since they contain no included *.cpp or *.cs files, and they disappear because Git cannot track empty directories. It’s kind of a hack: we don’t have to worry about which directories to keep, since with such a configuration every directory containing included files is preserved. That is, if we have *.cpp and *.cs files not only in the source directory, but, say, in code too:
/repository
  /project
    /trunk
      /bin
        *.exe
        *.dll
      /source
        *.cpp
        *.cs
      /code
        *.cpp
        *.cs
      /docs
        *.html
        *.txt
      *.exe
      *.msi
      *.cpp
      *.cs
      *.txt
    /branches
      …
    /tags
      …
then after the import finishes Git master will look like this:
/source
  *.cpp
  *.cs
/code
  *.cpp
  *.cs
*.cpp
*.cs
But what if we need to preserve the source directory with all its files, but don’t want to import code?
That requires more configuration. In addition to the includePath directives, we need to explicitly tell SubGit to exclude the unneeded directory with an excludePath directive, so the final configuration looks like this:
excludePath = /code/**
includePath = *.cpp
includePath = *.cs
The code directory is excluded explicitly, and all the other unneeded directories are removed since they do not contain included files, so we get the intended outcome: the source directory and the *.cpp and *.cs files only.
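The interplay of the two options can be sketched in Python. This models the behavior described in this section, not SubGit’s code, and rests on these assumptions: excludePath wins whenever both options match a file; when includePath is set, only files matching it are kept; a pattern starting with / is anchored at the branch root, * stays within one directory name and ** spans any number of directories; other patterns are tested against the file name. The file names are invented stand-ins for the layout above.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def matches(path: str, pattern: str) -> bool:
    """Test a branch-relative file path against one excludePath/includePath pattern."""
    if not pattern.startswith("/"):
        # Recursive pattern: compare with the file name only.
        return fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    parts = pattern[1:].split("/")
    regex = ""
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            regex += ".*" if last else "(?:[^/]+/)*"
        else:
            segment = re.escape(part).replace(r"\*", "[^/]*")
            regex += segment if last else segment + "/"
    return re.fullmatch(regex, path) is not None


def translated(files: List[str], exclude: List[str], include: List[str]) -> List[str]:
    """Files that reach Git: never excluded ones; only included ones if include is set."""
    kept = []
    for f in files:
        if any(matches(f, p) for p in exclude):
            continue  # excludePath takes priority
        if include and not any(matches(f, p) for p in include):
            continue
        kept.append(f)
    return kept


TRUNK_FILES = [
    "bin/app.exe", "bin/core.dll",
    "source/main.cpp", "source/Program.cs",
    "code/gen.cpp", "code/Gen.cs",
    "docs/index.html", "docs/readme.txt",
    "setup.exe", "setup.msi", "util.cpp", "Helper.cs", "notes.txt",
]

if __name__ == "__main__":
    print(translated(TRUNK_FILES, exclude=["/code/**"], include=["*.cpp", "*.cs"]))
```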
Let’s go further and suppose we have several directories that contain files of included types and we want to include just some of them, e.g. we have the following SVN project:
/repository
  /project
    /trunk
      /bin
        *.cpp
        *.cs
        *.exe
        *.dll
      /lib
        *.dll
        *.so
        *.cpp
        *.cs
      /framework
        *.cpp
        *.cs
        *.py
      /source
        *.cpp
        *.cs
      /code
        *.py
        *.cpp
        *.cs
      /docs
        *.html
        *.txt
      *.exe
      *.msi
      *.cpp
      *.cs
      *.txt
    /branches
      …
    /tags
      …
Say, we want to import source and code, but leave all the other directories behind. The first thought that comes to mind is to exclude all the directories and then set the ones we need to be included, that is, to set the configuration like this:
excludePath = /*/**
includePath = /source/**
includePath = /code/**
includePath = *.cpp
includePath = *.cs
Unfortunately, it won’t work as intended. Such a configuration excludes all the directories and doesn’t include any of them: excludePath takes priority over includePath where their areas intersect. So we need to exclude all the unneeded directories explicitly:
excludePath = /bin/**
excludePath = /lib/**
excludePath = /framework/**
includePath = /source/**
includePath = /code/**
includePath = *.cpp
includePath = *.cs
Note that we may omit the directories that don’t contain files of included types: they become empty and disappear automatically, since Git doesn’t track empty directories. Thus we have to explicitly mention only those directories that contain included files and that, at the same time, we want to exclude.
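Running both configurations through a model of the matching rules shows why the first attempt fails. The Python sketch below is hypothetical and assumes that excludePath takes priority over includePath, as this section states; patterns starting with / are anchored at the branch root, * stays within one directory name, ** spans any number of directories, and other patterns are tested against the file name. The file names are invented stand-ins for the *.cpp-style entries of the layout.

```python
import re
from fnmatch import fnmatchcase
from typing import List


def matches(path: str, pattern: str) -> bool:
    """Test a branch-relative file path against one pattern (assumed semantics)."""
    if not pattern.startswith("/"):
        return fnmatchcase(path.rsplit("/", 1)[-1], pattern)
    parts = pattern[1:].split("/")
    regex = ""
    for i, part in enumerate(parts):
        last = i == len(parts) - 1
        if part == "**":
            regex += ".*" if last else "(?:[^/]+/)*"
        else:
            segment = re.escape(part).replace(r"\*", "[^/]*")
            regex += segment if last else segment + "/"
    return re.fullmatch(regex, path) is not None


def translated(files: List[str], exclude: List[str], include: List[str]) -> List[str]:
    """excludePath wins; with includePath set, only included files are kept."""
    return [
        f for f in files
        if not any(matches(f, p) for p in exclude)
        and (not include or any(matches(f, p) for p in include))
    ]


FILES = [
    "bin/a.cpp", "bin/a.cs", "bin/a.exe", "bin/a.dll",
    "lib/b.dll", "lib/b.so", "lib/b.cpp", "lib/b.cs",
    "framework/c.cpp", "framework/c.cs", "framework/c.py",
    "source/d.cpp", "source/d.cs",
    "code/e.py", "code/e.cpp", "code/e.cs",
    "docs/f.html", "docs/f.txt",
    "setup.exe", "setup.msi", "main.cpp", "Main.cs", "notes.txt",
]
INCLUDE = ["/source/**", "/code/**", "*.cpp", "*.cs"]

first = translated(FILES, ["/*/**"], INCLUDE)  # exclude everything, re-include some
second = translated(FILES, ["/bin/**", "/lib/**", "/framework/**"], INCLUDE)

if __name__ == "__main__":
    print("first attempt: ", first)   # only the root-level sources survive
    print("second attempt:", second)  # source and code are kept
```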
Used together, excludePath and includePath provide extremely flexible control over what is imported and what is left in SVN. The only nuance is that excludePath takes priority, and this has to be accounted for during configuration: sometimes it requires explicit directives for many SVN entities.