First and foremost: Please note that depz is an experimental tool! It was hacked together pretty quickly and it shows, but with some luck it will evolve into something nice. For now: caveat emptor.
depz is a tool for managing groups of source code repositories. The intended use cases are similar to those addressed by git subtree and submodules -- you've got a project that incorporates code from other projects (libraries and so on). depz is different in that it tries to facilitate flexible workflows, where different users may be working with their own forks of the sub-repositories, or where you may be working with your own (or an organizational) fork while also trying to pull in changes from upstream.
Here's a quick feature list:
- Flexible config files let you define and group repositories
- You can have "local" config files which override settings (e.g., to add an additional remote for your own fork)
- Can clone git, Mercurial, and Bazaar repositories
- Can initialize a repository from a zip/tgz archive somewhere, including creating a stable first commit for it so that if you and others all do it and then make commits, you have shared history
- Can automatically generate and assist you with repository-specific (and optionally read-only) GitHub deploy keys, which are a nice option when you are working with private repositories on an untrusted machine where you don't want to put your credentials
- Can check to see which local repositories are ahead/behind remote repositories
- Fast-forward merge subrepositories with a single command
- Can monitor other local branches (so that you can fast-forward merge branches which were pushed from elsewhere)
A project will generally have one or more .depz files which describe the sub-repositories and sketch out how they should be worked with. These files are stored in the project's main repository. After cloning that repo, the user can run depz to fetch the sub-repositories, and can subsequently run depz to check if they have become outdated, merge changes (in simple cases), and so on. depz is not meant to completely automate arbitrarily complex workflows. For example, if you have local changes on some feature branch and the upstream project updates, you should expect to have to get in there and use git as usual. However, if you just get in the habit of running depz occasionally, you'll at least notice when sub-repositories go out of date without being forced to update them on someone else's timescale.
A common pattern will be for the main project .depz file to also "include" a local configuration file which is not under source control. This allows an individual developer to override project settings with their own (e.g., to add their own remote, stop a repository from updating because they are actively developing it, etc.).
Ideally, depz would work with many different source control systems. For now, it works with git. However, it can also be made to work with Mercurial and Bazaar repositories (see later sections in this document).
depz requires Python 3.5 and git 2.7, neither of which is particularly new anymore. If you want to use Mercurial repositories, you will also want git-remote-hg. If you want to work with Bazaar repositories, you will want git-remote-bzr. On Ubuntu 16.04 and 19.04, you can install these with:
sudo apt install git-remote-hg git-remote-bzr
For some versions of Ubuntu (including 18.04, sadly), git-remote-hg is not packaged. In these cases, you can install it pretty easily by hand. The Debian repository is probably a good place to get it. Even easier, use pip:
sudo pip install git-remote-hg
Colored logs can be beneficial when viewing depz's output. If you'd like to turn them on, you'll need to tweak your configuration (see the next section), but first you'll need to install the coloredlogs package. For Ubuntu 18.04 and later, you can do this with:
sudo apt install python3-coloredlogs
Otherwise, you can install using pip, e.g. with:
pip3 install --user coloredlogs
You can install depz using the included setup.py. To install it from git for just your user account, you can do:
pip3 install --user git+https://github.com/MurphyMc/depz.git
You can also clone and install it separately like so:
git clone https://github.com/MurphyMc/depz
pip3 install --user ./depz
If all goes well, after doing the above, you'll have the `depz` command available to you. You can check with `depz --help`.
Notably, if you have a project that utilizes depz to manage its dependencies, you should now be able to `cd` into its directory, execute `depz --init`, and be on your way!
depz is invoked from the commandline. If you've installed it with pip, it is hopefully in your path and just called `depz`. If you just grabbed the .py file, then you may have to invoke it as `depz.py`.
When executed, depz does basically three things:
- Load one or more `.depz` configuration files that hopefully contain information about repositories.
- Filter the list of repositories.
- Possibly execute one or more commands, which generally apply some operation to each repository.
The exact details of these three phases can be controlled with various commandline options. The following subsection examines the first two, and the subsection after that examines the various commands. You can get a briefer summary by using the `--help` commandline option.
By default, depz will load all `.depz` configuration files in the current directory, and will filter out any repository not in the "default" group. You can adjust both of these behaviors.
Firstly, you can use the `--depz` option to specify a particular `.depz` file to load. You can also point it at a directory to load all the `.depz` files in that directory. This option can be given more than once to select multiple files or directories.
Secondly, you can switch which groups of repositories to act on by using the `--group` option (see the section on configuring repositories for more info on groups in general). In the simplest case, you can simply do something like `--group=libraries` to select all repositories in the "libraries" group. However, you can also specify multiple groups, and any repository in any of those will be included. For example: `--group=libraries,examples`. The list of groups can contain glob-like wildcards, so if you were in the unfortunate situation of having a group called "libs" and a group called "libraries", you could do `--group=lib*`. Lastly, you can subtract groups. So you could do something like `--group=*,-libraries` to select everything except libraries.
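The group selection rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not depz's actual code: the `select_repos` function and the sample `repos` mapping are made up, and it assumes that a repository belonging to any subtracted group is dropped even if it also matches an included group.

```python
from fnmatch import fnmatchcase

def select_repos(repos, group_spec):
    """Return the names of repos selected by a --group style spec.

    repos maps a repo name to the list of groups it belongs to.
    group_spec is a comma-separated list of glob patterns; a leading
    "-" marks a pattern whose groups are subtracted (an assumption
    about how subtraction interacts with multi-group repos).
    """
    patterns = [p.strip() for p in group_spec.split(",") if p.strip()]
    include = [p for p in patterns if not p.startswith("-")]
    exclude = [p[1:] for p in patterns if p.startswith("-")]

    selected = []
    for name, groups in repos.items():
        matched = any(fnmatchcase(g, p) for g in groups for p in include)
        removed = any(fnmatchcase(g, p) for g in groups for p in exclude)
        if matched and not removed:
            selected.append(name)
    return selected

# Hypothetical repositories and their groups.
repos = {
    "pox": ["default", "libraries"],
    "rogue": ["default"],
    "demo": ["examples"],
    "zlib": ["libs"],
}

print(select_repos(repos, "lib*"))
print(select_repos(repos, "*,-libraries"))
```

With this sample data, `lib*` picks up both the "libs" and "libraries" repositories, while `*,-libraries` keeps everything except the one repository in the "libraries" group.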
You can also select specific repositories by name using the `--name` option. While it was claimed above that the default group filter is "default", this changes when `--name` is specified so that the default group filter is "*" instead (otherwise, you might specify a repository by name, but if it wasn't in the "default" group, it wouldn't be matched). If you want to specify a name and only want it to match the default group, you'll have to explicitly state `--group=default`.
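Put another way, the group filter that ends up being applied depends on which of the two options were given. A tiny sketch of that decision, using a hypothetical `effective_group_filter` helper (not part of depz) where `None` stands for "option not given":

```python
def effective_group_filter(group_opt=None, name_opt=None):
    """Return the group filter implied by the --group and --name options."""
    if group_opt is not None:
        return group_opt   # an explicit --group always wins
    if name_opt is not None:
        return "*"         # --name alone widens the filter to every group
    return "default"       # neither option given

print(effective_group_filter())
print(effective_group_filter(name_opt="rogue"))
print(effective_group_filter(group_opt="default", name_opt="rogue"))
```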
As with groups, you can actually specify a list of names, and you can use wildcards, so you could do something like `--name=ro*` to match both "rogue" and "rogo".
The whole point of depz is to execute various operations for various repositories mentioned in its configuration files. In the following subsections, we look at some of these commands.
The `--dump` command is mostly meant for debugging. It lists the repositories and their settings in a format similar to that used in the config files. It can optionally be given an argument, which may be `early` (the default), `late`, or `all`. In the last case, it lists every repo it can find in the config files. `early` shows repos after filtering (e.g., via the `--name` and `--group` options) -- the ones that should actually be acted upon by the current invocation. `late` waits until they have been sanity checked before listing them; for example, this means ones with no directory will not be listed.
The `--init` command is used to actually create the local repositories described by the config files. That is, it will create the local directories, attempt to fetch the initial code, and set the local repo to point at the desired initial branch (the "checkout" in depz terminology).
It is also at least supposed to be safe to run the `--init` command at any later time, and it can act as a bit of a sanity check of existing repositories in that case.
The `--update` option fetches updated code from remotes (e.g., a "git fetch" for git repositories).
This option attempts to help you find repositories where the local code is now out of date. For example, code where the remote tracking branch or a monitored local branch (see the information on the `monitor_branch` repo option in the Configuring Repositories section for more) is ahead of the local branch.
The `--fast-forward` option attempts to rectify things when your branch has become outdated by fast-forwarding it. Obviously this won't always be a suitable or workable option, but it should be for the easy cases.
When depz starts up, it tries to read a file called .depzconfig from your home directory. You can create this file with a text editor. At present, the most useful thing you can probably do with it is to customize the logging. Here's a sample .depzconfig:
[depz]
log_level=debug
color_log=true
The log levels are pretty self-explanatory and are the same as used by the `--log-level` commandline parameter (see `depz --help`).
To learn more about depz configuration files, see the following sections.
Perhaps the key component of depz is its configuration files. If you are simply using depz to grab dependencies for a project, it's possible that you never really have to see or care about them. On the other hand, if you want to customize how you work with repositories or if you are the author (or maintainer) of a project that is going to manage sub-repositories with depz, understanding how the configuration files work may be crucial. The following subsections describe their workings in some detail. If you're a get-your-hands-dirty-first type, you might also skip ahead to the section on Configuring Repositories and refer back to this one to understand what is going on.
depz configuration files all use the same format and are all technically equivalent. For organizational reasons, one might break them down into files using the `.depzconfig` extension for general configuration and files using the `.depz` extension for files which actually describe sub-repositories. By default, depz looks for any such files in the current working directory, though this can be altered by giving the paths to one or more configuration files or directories containing `.depz` or `.depzconfig` files via the `--repo` commandline option.
The files are text files in a fairly typical "INI"-type format that consists of sections that begin with a section header enclosed by brackets, and key=value pairs within those sections. You saw a simple example of such a file in the "Customizing Your Installation" subsection above.
Going a bit beyond typical INI files, there are a number of special features. We look at these in the rest of this section.
One config file can include other config files. This is done by using what looks like a section header, except it is composed of two parts: the first is the word "INCLUDE" and the second is a path to another configuration file. An alternate form is the same except it uses the word "TRY-INCLUDE". The difference is that if the specified file is unavailable, it causes an error with the first form, and is happily ignored with the second form. As an example, consider the following config file which requires that the file `mysubproject/subprojectsubprojects.depz` is available, and also tries to include optional user settings from `local.depzconfig`:
[INCLUDE mysubproject/subprojectsubprojects.depz]
[TRY-INCLUDE local.depzconfig]
It is safe for inclusions to create cycles -- after the first time a file is included, it will be ignored for subsequent inclusions.
When you have multiple files (either due to explicit inclusion using the commandline or via `INCLUDE` and `TRY-INCLUDE` directives), a section may appear more than once (i.e., in more than one file). When keys do not overlap, this can simply be thought of as merging all the sections with the same name into one big section. When the same key appears in sections with the same name in different files, things get a little more complex. Consider the following four configuration files.
A.depz:
[INCLUDE B.depz]
[INCLUDE C.depz]
[mysection]
myvalue=A's value
B.depz:
[INCLUDE D.depz]
[mysection]
myvalue=B's value
C.depz:
[mysection]
myvalue=C's value
D.depz:
[mysection]
myvalue=D's value
Assuming you load `A.depz`, what is the effective value of `[mysection].myvalue`? It's "A's value". One can consider `A.depz` to be the root of a tree of inclusions, and the effective value is found by doing a breadth-first traversal of that tree. If `myvalue` did not exist in `A.depz`, the effective value would be from B (as B is the "leftmost" child of A; i.e., the first one listed). If it did not exist in B, it would be C (as B and C are siblings). If it did not exist in C, it would come from D (as D is the deepest child). If multiple configuration files are included on the commandline, they are all siblings (in left to right order).
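To make the traversal concrete, here is a rough Python sketch of the lookup described above. It is not depz's implementation: the `FILES` dictionary is a hypothetical in-memory stand-in for the four files (each entry holds the file's includes in order, then its `[mysection]` keys), and `precedence_order` and `effective_value` are illustrative names.

```python
from collections import deque

# Hypothetical stand-in for the four files: (includes, [mysection] keys).
FILES = {
    "A.depz": (["B.depz", "C.depz"], {"myvalue": "A's value"}),
    "B.depz": (["D.depz"], {"myvalue": "B's value"}),
    "C.depz": ([], {"myvalue": "C's value"}),
    "D.depz": ([], {"myvalue": "D's value"}),
}

def precedence_order(roots, files):
    """Return file names in breadth-first order, highest precedence first."""
    order, seen, queue = [], set(), deque(roots)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # a file only counts the first time it is included
        seen.add(name)
        order.append(name)
        queue.extend(files[name][0])  # children go to the back of the queue
    return order

def effective_value(key, roots, files):
    """Return the value of key from the first file in precedence order."""
    for name in precedence_order(roots, files):
        values = files[name][1]
        if key in values:
            return values[key]
    return None

print(precedence_order(["A.depz"], FILES))
print(effective_value("myvalue", ["A.depz"], FILES))
```

Passing several names in `roots` models giving multiple files on the commandline: they are queued left to right and so behave as siblings.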
This order of precedence makes sense in general: if you include a project's configuration file and it in turn includes a subproject, the "topmost" project should generally have the ultimate say about configuration options. However, there are cases where this may not be sufficient. For example, a project configuration file may wish to TRY-INCLUDE a user's own config file so that the user can customize their configuration. Under normal rules, this would mean that the project could not provide a default value, as it would always have higher precedence. Two special types of sections let you address this type of situation: `DEFAULT` sections and `OVERRIDE` sections.
Imagine that we altered the `A.depz` shown above so that instead of having a `[mysection]` section, it had a `[DEFAULT mysection]` section (still containing the same `myvalue` key). This would change the effective value of `[mysection].myvalue` to be "B's value". Only if all of the other configuration files (or at least their `myvalue` keys) were removed would the value be "A's value", as values in `DEFAULT` sections have lower precedence than ones in non-default sections. However, note that the ordering among `DEFAULT` sections themselves is the same as for any section; if `B.depz` also had a `[DEFAULT mysection]` section, it would have even lower precedence than the `[DEFAULT mysection]` in `A.depz`.
`OVERRIDE` sections are the conceptual opposite. Values in `OVERRIDE` sections take precedence over values in normal (or `DEFAULT`!) sections. Moreover, the precedence of override sections themselves is reversed. That is to say, if both `A.depz` and `D.depz` have an `[OVERRIDE mysection]` section, it is the one in `D.depz` which takes ultimate precedence.
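The three tiers can be summarized with a short Python sketch. Again, this only models the rules as described here and is not taken from depz: `resolve` is a hypothetical helper, and `layers` is assumed to already be in the breadth-first file order discussed earlier, with each entry tagged as `"OVERRIDE"`, `"NORMAL"`, or `"DEFAULT"`.

```python
def resolve(key, layers):
    """Pick the effective value of key.

    layers is a list of (filename, kind, values) tuples in breadth-first
    file order. OVERRIDE layers are consulted first, in reverse order;
    then NORMAL layers; then DEFAULT layers, both in the given order.
    """
    overrides = [v for _file, kind, v in layers if kind == "OVERRIDE"]
    normals = [v for _file, kind, v in layers if kind == "NORMAL"]
    defaults = [v for _file, kind, v in layers if kind == "DEFAULT"]
    for values in overrides[::-1] + normals + defaults:
        if key in values:
            return values[key]
    return None

# A.depz now uses [DEFAULT mysection]; the other files are unchanged.
layers = [
    ("A.depz", "DEFAULT", {"myvalue": "A's value"}),
    ("B.depz", "NORMAL", {"myvalue": "B's value"}),
    ("C.depz", "NORMAL", {"myvalue": "C's value"}),
    ("D.depz", "NORMAL", {"myvalue": "D's value"}),
]
print(resolve("myvalue", layers))

# Add [OVERRIDE mysection] sections to both A.depz and D.depz.
with_overrides = (
    [("A.depz", "OVERRIDE", {"myvalue": "A's override"})]
    + layers
    + [("D.depz", "OVERRIDE", {"myvalue": "D's override"})]
)
print(resolve("myvalue", with_overrides))
```

The first call yields B's value, matching the description above, and the second yields the override from `D.depz`, since override precedence runs deepest-first.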
Along with all of the default and overriding mentioned in the previous section, configuration files support parent and child sections. Let's consider the following example:
[MyParent]
key1=key1 from parent
key2=key2 from parent
[MyParent MyChild1]
key1=key1 from child1
key2=key2 from child1
[MyParent MyChild2]
key2=key2 from child2
key3=key3 from child2
"MyParent" is, in fact, just a normal section, and it is subject to the same rules as usual concerning defaults and overrides. Because its name is used as a prefix for "MyChild1" (and "MyChild2", for that matter) it serves as its parent section. This means that "MyChild1" inherits values from "MyParent". If there is overlap between their keys, the child keys take precedence. Thus, the effective values for "MyChild1" are as follows:
Key | Value |
---|---|
key1 | key1 from child1 |
key2 | key2 from child1 |
And the effective values for "MyChild2" are as follows:
Key | Value |
---|---|
key1 | key1 from parent |
key2 | key2 from child2 |
key3 | key3 from child2 |
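The same inheritance can be expressed as a dictionary merge. The sketch below is illustrative only (`effective_section` is a made-up helper, and it ignores the DEFAULT/OVERRIDE handling covered earlier); it treats the first word of a section name as the parent and lets child keys win.

```python
sections = {
    "MyParent": {"key1": "key1 from parent", "key2": "key2 from parent"},
    "MyParent MyChild1": {"key1": "key1 from child1", "key2": "key2 from child1"},
    "MyParent MyChild2": {"key2": "key2 from child2", "key3": "key3 from child2"},
}

def effective_section(full_name, sections):
    """Merge a child section over its parent (the first word of its name)."""
    parent_name = full_name.split(" ", 1)[0]
    merged = {}
    if parent_name != full_name:
        merged.update(sections.get(parent_name, {}))  # inherited values
    merged.update(sections[full_name])                # child keys take precedence
    return merged

for key, value in sorted(effective_section("MyParent MyChild2", sections).items()):
    print(key, "|", value)
```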
Values can incorporate other values. Consider the following config file:
[GLOBALS]
timeofday=morning
[favorites]
food=spam
weather=sunny
[phrases]
greeting=Good ${timeofday}
introduction=${greeting}! Lovely day to eat some ${favorites:food}!
Besides noting the basic value substitution syntax, there are a couple of other quick points to be made. First, a value can include a value that includes another value (as `introduction` includes `greeting` which includes `timeofday`). Second, you can always refer to other keys within the same section or keys within the special `GLOBALS` section by just using their name. Third, you can refer to keys in arbitrary other sections by prefixing the key with the section name and separating the two with a colon.
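For the curious, here is a minimal Python sketch of how such substitution could be resolved. It is an approximation written for this document rather than depz's own code: `lookup` is a hypothetical function, it assumes same-section keys are tried before `GLOBALS`, and it makes no attempt to detect reference cycles or handle missing keys gracefully.

```python
import re

CONFIG = {
    "GLOBALS": {"timeofday": "morning"},
    "favorites": {"food": "spam", "weather": "sunny"},
    "phrases": {
        "greeting": "Good ${timeofday}",
        "introduction": "${greeting}! Lovely day to eat some ${favorites:food}!",
    },
}

_REF = re.compile(r"\$\{([^}]+)\}")

def lookup(section, key, config):
    """Return config[section][key] with every ${...} reference expanded."""
    def replace(match):
        ref = match.group(1)
        if ":" in ref:                    # ${section:key}
            ref_section, ref_key = ref.split(":", 1)
        elif ref in config[section]:      # key in the same section
            ref_section, ref_key = section, ref
        else:                             # fall back to GLOBALS
            ref_section, ref_key = "GLOBALS", ref
        # Recurse so that referenced values are expanded too.
        return lookup(ref_section, ref_key, config)
    return _REF.sub(replace, config[section][key])

print(lookup("phrases", "introduction", CONFIG))
# prints: Good morning! Lovely day to eat some spam!
```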
Besides all of the values which actually exist within the configuration files being processed, there are a number of special values that you can refer to.
Name | Meaning |
---|---|
_FULL_NAME_ | The full name of the section in which the key appears |
_NAME_ | The name of the key's section (not including a prefix) |
_KEY_ | The name of the key |
_FILE_ | The filename of the file in which the key was defined |
_DIR_ | Just the directory part of the above |
_SECTION_FILE_ | Filename of the first file defining the key's section |
_SECTION_DIR_ | Just the directory part of the above |
To elaborate a bit further: `_FULL_NAME_` is, indeed, the full name of the key's section. This is often going to be the same as its `_NAME_`, but if the section has a space in it, e.g., `REPO foo`, then `_NAME_` is simply `foo`, whereas `_FULL_NAME_` is `REPO foo`. In other words, if you have parent/child sections as described in the relevant section above, the name gives the child-specific portion of the name, and the full name includes the name of the parent as well.
`_FILE_` and `_DIR_` refer to the file where the effective value actually comes from. This may not be the same for each key in a section, as different keys may be specified in different files within the same section (or via the corresponding `OVERRIDE` and `DEFAULT` sections). `_SECTION_FILE_` and `_SECTION_DIR_` work differently. They give the filename (or directory name) of the first file where the section was defined according to the breadth-first starting-from-root algorithm described in the section above about section ordering.
That is, consider the following two configuration files.
A.depz:
[INCLUDE B.depz]
[mysection]
a_value=Value comes from ${_FILE_}
another_a_value=Value comes from ${_SECTION_FILE_}
B.depz:
[mysection]
b_value=Value comes from ${_FILE_}
another_b_value=Value comes from ${_SECTION_FILE_}
In this case, `[mysection].a_value` would be "Value comes from A.depz", and `[mysection].b_value` would be "Value comes from B.depz". On the other hand, both of the "another" values would refer to `A.depz`, since `[mysection]` is first defined in `A.depz`.
To have depz manage a repository, you create a section for the repository in a configuration file. The section should be a child section of the parent "REPO". Here are the keys in REPO sections that depz actually looks at:
Key | Meaning |
---|---|
full_directory | The local directory in which the repository will live |
checkout | The branch or tag to be checked out initially |
checkout_remote | List of git remotes from which to try getting the repo |
fetch_full | Set to False to only fetch the checkout branch |
group | Allows for grouping repos together |
monitor_branch | Local branches to monitor for changes |
update_skip | When set, don't include this repo when doing --update |
fast_forward_skip | When set, don't include when doing --fast-forward |
skip | When set, don't include this repo ever |
no_tags | Don't download normal tags |
type | Repository type (currently only git) |
remote | Configures a git remote for the repo |
init_archive etc. | An archive to use to initialize the repository |
Some notes:
- `checkout` is the branch to check out initially. This is often "master".
- `checkout_remote` is a list of names of remotes to use when first trying to get the code. In simple cases, this might be "origin". The remote itself should be configured with a corresponding key, such as "remote origin". depz will try each in turn until it succeeds; this way you can easily set multiple remotes, not all of which will necessarily work immediately (e.g., you may want to clone from your own personal fork if it exists, but fall back to an upstream repository otherwise).
- `fetch_full` is True by default and causes depz to fetch all branches. If you override it to False, depz will only fetch the checkout branch.
- `group` is a list of group names. This lets you group repos. When depz is executed, it defaults to the group "default" and ignores any repo which does not list "default" in its groups. This can be overridden with the `--group` commandline option. This allows you to, for example, have a "libraries" group or a "mycompany" group and easily act on them independently.
- `monitor_branch` specifies a list of local branches which depz can monitor for changes (and potentially fast-forward) similar to how it monitors remotes. The intended use case is if your workflow involves pushing into the local repository and not always just pulling from remotes.
- `update_skip` and `fast_forward_skip` make it so that depz will ignore this repository when executing `--update` and `--fast-forward`, respectively.
- `no_tags` turns off tag downloading. Note that depz always downloads remote git tags into a special set of remote_tags refs. This option refers to the normal operation of tags.
- `type` is currently always git, even for Mercurial and Bazaar repositories (see the sections on Mercurial and Bazaar later).
- If a key is two words and the first is `remote`, this defines a git remote. The second word of the key is the remote name, and the value should be the remote's URL.
- `init_archive` (and its related keys) are described in the next subsection.
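The "try each remote in turn" behavior of `checkout_remote` noted in the list above can be pictured roughly as follows. This is a sketch under assumptions, not depz's code: `first_working_remote` and `fake_fetch` are hypothetical, the `example.com` URL is a placeholder, and it assumes that names with no matching `remote <name>` key are simply skipped.

```python
def first_working_remote(checkout_remote, remotes, try_fetch):
    """Return the name of the first remote that can be fetched, or None.

    checkout_remote is the comma-separated list from the config,
    remotes maps remote names to URLs, and try_fetch is a callable
    that returns True when fetching from a URL succeeds.
    """
    for name in (n.strip() for n in checkout_remote.split(",")):
        url = remotes.get(name)
        if url is None:
            continue  # assumption: unconfigured remote names are skipped
        if try_fetch(url):
            return name
    return None

remotes = {
    "origin": "https://github.com/AbePralle/rogue",
    "upstream": "https://example.com/upstream/rogue",  # placeholder URL
}

def fake_fetch(url):
    # Stand-in for a real fetch attempt: pretend only this one works.
    return url == "https://github.com/AbePralle/rogue"

print(first_working_remote("mine,origin,upstream", remotes, fake_fetch))
```

Here no "mine" remote is configured, so the sketch falls through to "origin".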
To make things a bit easier to use and more flexible, depz sets up some defaults and other keys which are inherited by all repos. This is done essentially by including the following "magical" section:
[DEFAULT REPO]
type=git
name=${_NAME_}
remote_name=${name}
remote origin=${prefix}/${remote_name}
checkout=master
checkout_remote=mine,origin,upstream
no_tags=False
directory=${name}
base_directory=${_SECTION_DIR_}
full_directory=${base_directory}/${directory}
group=default
fast_forward_skip=False
update_skip=False
monitor_branch=
While you can certainly just set the actual keys listed in the table above, the defaults can be quite useful. For example, the `full_directory` key is built automatically from, essentially, the directory of your config file and the name of the repository as defined by its section header name. Thus, if your section is `[REPO pox]`, and you want the directory to be called `pox` and in the same directory as your `.depz` config file... you don't have to do anything!
Similarly, it is very common that your local repository name should match the remote repository name, and that is often (and always, on GitHub) the last part of the remote URL. The default setting of `remote origin` makes this simple, as it is composed of `prefix` and `remote_name`. The latter defaults to the local name (which defaults to the section name). So all you need to do is set the former.
Thus, a config file for the Rogue programming language's compiler, for example, might look like this:
[REPO rogue]
prefix=https://github.com/AbePralle
That's enough to get started!
To see how this works, it may be useful to manually "merge in" some of the settings which are included automatically from the `[DEFAULT REPO]` section shown above (note that you don't have to actually do this; it is shown here only for illustration):
[REPO rogue]
name=${_NAME_}
remote_name=${name}
remote origin=${prefix}/${remote_name}
checkout=master
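If you want to see what these defaults expand to, the following Python sketch walks through the substitution by hand using `string.Template`, whose `${name}` syntax happens to match. It is purely illustrative: depz has its own config machinery, and the `/home/me/project` directory is a made-up location for the config file.

```python
from string import Template

# Keys depz would supply for [REPO rogue], plus the one key we set.
# The directory is a hypothetical location of the .depz file.
settings = {
    "_NAME_": "rogue",
    "_SECTION_DIR_": "/home/me/project",
    "prefix": "https://github.com/AbePralle",
}

# The relevant [DEFAULT REPO] entries, ordered so that each one only
# refers to keys that have already been expanded.
defaults = [
    ("name", "${_NAME_}"),
    ("remote_name", "${name}"),
    ("remote origin", "${prefix}/${remote_name}"),
    ("directory", "${name}"),
    ("base_directory", "${_SECTION_DIR_}"),
    ("full_directory", "${base_directory}/${directory}"),
]

for key, template in defaults:
    # Only fill in a default when the repo section did not set the key.
    settings.setdefault(key, Template(template).substitute(settings))

print(settings["remote origin"])
print(settings["full_directory"])
```

Under these assumptions the origin remote ends up as `https://github.com/AbePralle/rogue` and the repository lands in a `rogue` directory next to the config file.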
If we wanted to add another remote, it's a simple matter of adding an appropriate `remote` key:
remote murphymc=https://github.com/MurphyMc/rogue
Or perhaps:
remote murphymc=https://github.com/MurphyMc/${remote_name}
GitHub has a feature called Deploy Keys which lets you set ssh keys that are specific to an individual repository. These keys can be set to be read-only or read/write. This is a nice way to have fine-grained access to private repositories from not-especially-trusted machines without having to store real credentials on the remote machine or use wildly unsafe ssh agent forwarding.
Deploy keys are specific to a given remote (thus, you can have different deploy keys for different remotes, e.g., one for your organization's fork and one for your own fork). You enable them by setting the `deploy_key <remote-name>` key in your repo config to `true`. Thus, you might have a config file like:
[REPO pox]
remote upstream=git@github.com:noxrepo/pox
deploy_key upstream=true
prefix=https://github.com/YourNameHere
Note that in this case, it's the `upstream` remote that is using a deploy key. If you wanted to do it with the remote set up by the `prefix` key, you'd use `deploy_key origin=true`. See the previous section if you need a reminder why this is the case.
Also note that the remote must be using an ssh URL because deploy keys are ssh keys! An http/https URL won't work.
Once you have this set, running `--init` will now generate a new key pair for the remote called `depz_deploy_key.upstream.private` and `depz_deploy_key.upstream.pub` (where `upstream` will be whatever the remote is named). These are stored in the repository base directory. If run in interactive mode, before trying to fetch the remote for the first time, depz will display the public key and pause until you press enter. At this point, you can do one of two things:
At this point, you can do one of two things:
- Copy and paste the public key into the repository's configuration on GitHub. The URL to the appropriate configuration page is hopefully shown.
- Replace the new key pair with a key pair of your own. If you've already got one you want to use, this makes sense, but see the next subsection.
Note that depz adds files matching the pattern `depz_deploy_key.*` to the repository's `exclude` file in order to help you not commit these to the repo, because you really probably don't want to do that.
From now on, depz will try to use these key files when communicating with the specified remote.
The previous example showed how to get depz to generate a new key pair for you by setting the `deploy_key <remote-name>` entry to `true`. You can also set this entry to the base filename of an existing key pair. If you do this, during `--init`, depz will take that base filename, append `.pub` and `.private` to it, and copy those files into the repository to the appropriate `depz_deploy_key.*` files.
While depz knows to use the deploy keys, git itself doesn't if you just run git commands by hand. You can refer to the Internet to see how to configure git to use an arbitrary key file -- there are several ways, both via messing with ssh's config or via messing with git's config.
One way to do it -- which only really makes sense if you have a single ssh remote -- is to configure the repository to always try to use the deploy key when doing ssh. This is basically a matter of setting the `sshCommand` git configuration property to a commandline for ssh that includes the `-i` option to specify a key. To save you the effort, the depz `--set-ssh-cmd` option does exactly this. After this, git commands in this repository which use ssh remotes should use the deploy key. However, note that the `sshCommand` configuration key is only available in git 2.10 and above. If you're using an older version of git, you'll have to work out another solution.
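If you'd rather set this up by hand, the git side boils down to a single `git config core.sshCommand` invocation. The Python sketch below just builds that command line; the helper name is made up, and the exact ssh command that depz itself writes is an assumption here (it may well add further ssh options).

```python
import shlex

def ssh_command_config_args(private_key_path):
    """Build the argv for pointing a repository's ssh at a deploy key.

    Assumption: a plain "ssh -i <key>" is enough; depz may use more options.
    """
    ssh_cmd = "ssh -i " + shlex.quote(private_key_path)
    return ["git", "config", "core.sshCommand", ssh_cmd]

args = ssh_command_config_args("depz_deploy_key.upstream.private")
print(" ".join(shlex.quote(a) for a in args))
```

The resulting list can be handed to `subprocess.run(args, cwd=...)`, with `cwd` set to the repository in question, to apply the setting to that repository only.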
Sometimes you want a project to use code that's available in an archive file (like a .zip or .tgz) somewhere, but may not be available already in a code repository that you have access to. You could just download it into your main project repository, though that's not particularly elegant. You could download it, create a new github project, add all the files, and push it to github, but this seems a bit over the top especially if you're not planning to actually make any changes to it.
To deal with this situation, you can give depz the URL to an archive and it will download it and initialize the repository to its contents. If you've got remotes set, you could then push it to a new repository somewhere else (e.g., github), or you can just keep it locally. Since it actually creates a new repository containing the initial files, if you do end up changing anything, your changes are already tracked from the start.
To enable this functionality, you must at least set the `init_archive` key to the URL of the archive file. Exactly what types of URLs and what types of archives will work depends on your version of Python, but it's really likely that `http`, `https`, `zip`, and `tgz`/`tar.gz` will work.
In addition, you can set the `init_archive_sha1`, `init_archive_sha256`, or `init_archive_sha512` keys to the correct hash of the archive, and depz will check that they are correct before expanding the archive to help ensure that you get the files you intended. This is a very good idea and you should do it.
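To produce a hash for one of these keys, or to double check an archive yourself, something like the following Python sketch will do. The `archive_matches` helper is hypothetical and only mirrors the kind of check described above; the sample bytes stand in for a downloaded archive.

```python
import hashlib

def archive_matches(data, expected_hex, algorithm="sha256"):
    """Return True if the digest of data equals the expected hex string.

    algorithm may be "sha1", "sha256", or "sha512", mirroring the
    init_archive_sha1 / init_archive_sha256 / init_archive_sha512 keys.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.strip().lower()

# Stand-in for the bytes of a downloaded archive.
payload = b"example archive bytes"

# The value you would put in init_archive_sha256 for this payload.
expected = hashlib.sha256(payload).hexdigest()

print(archive_matches(payload, expected))
print(archive_matches(payload + b"tampered", expected))
```

For a real archive you would read the file in binary mode (e.g., `open(path, "rb").read()`) and pass those bytes instead of `payload`.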
Currently, depz only really works with git. However, Mercurial repositories can be made to work fairly transparently within git via git-remote-hg. This seems to be enough to make them work well within depz, though it has not been tested extensively. On Ubuntu (and probably Debian), you can install git-remote-hg with:
sudo apt install git-remote-hg
When specifying a repository in a config file, simply preface the remote URL with "hg::", like:
[REPO SDL_mixer]
checkout=release-1.2.12
prefix=hg::https://hg.libsdl.org
Again, depz currently only works directly with git. However, similar to how git-remote-hg provides transparent access to Mercurial repositories from within git, git-remote-bzr provides similar functionality for Bazaar. Thus, there is hope that one could use Bazaar repositories with depz by installing git-remote-bzr, and configuring remote URLs with the "bzr::" prefix. This has been tested even less than Mercurial, but at least the following example seems to work.
Install git-remote-bzr:
sudo apt install git-remote-bzr
An example config for the GRUB bootloader:
[REPO grub]
checkout=people+gsutre+fixes
prefix=bzr::bzr://bzr.savannah.gnu.org/grub