EDIT: I’ve gotten quite a few responses asking about Go modules and how they fit into this picture. I’ve played around with them experimentally, and my initial impression is that they’re certainly a step in the right direction. I have some worries about the specification of `go.mod` files, but we’ll have to see how that plays out.
TL;DR:
- Go package idioms make reproducible builds really easy.
- Go package idioms make dependency analysis really hard.
- Go builds have a unique failure mode: they can be reproducible without having semantic dependency information fully specified.
Go is one of my favorite languages. It gets a lot of things right: great tooling, a pragmatic language design, and a sane module system.
I’ve found that newcomers to Go tend to get really confused by the module system. It’s both less flexible and less rigid than other popular languages:
- Newcomers from languages that have canonical package managers (e.g. Node.js, Ruby, Rust) are often confused by the lack of a canonical package management tool and centralized package registry.
- Newcomers from languages whose module systems are Wild West free-for-alls (e.g. C, Java, sort of PHP) are often frustrated by the rigid import path structure, lack of parent packages, and restrictions on cyclical imports.
Fortunately, the module system (Go calls these “packages”, and I’ll use this terminology for the rest of the post) and the idioms that have evolved around it are extremely simple.
How Go packages work
Any folder that contains a Go source file is a Go package. Its name is the path of the folder relative to `$GOPATH/src`. For example, a folder containing Go source files located at `$GOPATH/src/github.com/alice/foo` is named `github.com/alice/foo`.
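That naming rule is mechanical enough to sketch in a few lines (the paths here are hypothetical):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pkgName computes a package's import path from its folder and a GOPATH entry.
func pkgName(dir, gopath string) (string, error) {
	rel, err := filepath.Rel(filepath.Join(gopath, "src"), dir)
	if err != nil {
		return "", err
	}
	// Import paths always use forward slashes, even on Windows.
	return filepath.ToSlash(rel), nil
}

func main() {
	name, _ := pkgName("/home/alice/go/src/github.com/alice/foo", "/home/alice/go")
	fmt.Println(name) // github.com/alice/foo
}
```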
`GOPATH` is generally a single folder (but can be a list of folders). When a user compiles a Go package that imports a package `P`, the compiler attempts to resolve `P` by checking `F/src/P` for each folder `F` in `GOPATH` and then `GOROOT`.
This has a couple properties:
- Wow, wasn’t that simple? This algorithm is extremely easy to explain and understand.
- You cannot have two different packages with the same name. In particular, this implies that you cannot have two different versions of the same package.
- Every package has only a single version, so you can think of the entire Go workspace as having a single “version”. Version information for each individual package is not reliably stored (at best, you have a revision hash or named reference from the VCS repository of the package).
When you’re working on multiple projects that may depend on different versions of the same package, Property 2 turns out to be pretty annoying.
In Go 1.5, the Go team added support for vendoring to resolve this. Now, when a source file `S` imports a package `P`, the resolution algorithm is:
1. Does there exist a folder named `vendor` in any ancestor folder `A` of `S`? If so, use `A/vendor/P` if `P` is in the folder; otherwise keep going upwards. If you’re at the root of the filesystem and haven’t found `P`, go to step 2.
2. Do the old thing (look up in `GOPATH` and then `GOROOT`).
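The two steps above can be sketched as a pure function. This is a simplification to illustrate the lookup order as described here; the `exists` callback stands in for a real filesystem check, and all names and paths are illustrative:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// resolve mimics the post-1.5 lookup: walk the ancestors of srcDir looking
// for vendor/<pkg>, then fall back to each GOPATH entry, then GOROOT.
func resolve(pkg, srcDir string, gopaths []string, goroot string, exists func(string) bool) (string, bool) {
	for dir := srcDir; ; dir = filepath.Dir(dir) {
		if candidate := filepath.Join(dir, "vendor", pkg); exists(candidate) {
			return candidate, true
		}
		if dir == filepath.Dir(dir) { // reached the filesystem root
			break
		}
	}
	for _, gp := range gopaths {
		if candidate := filepath.Join(gp, "src", pkg); exists(candidate) {
			return candidate, true
		}
	}
	if candidate := filepath.Join(goroot, "src", pkg); exists(candidate) {
		return candidate, true
	}
	return "", false
}

func main() {
	dirs := map[string]bool{
		"/work/proj/vendor/github.com/alice/foo": true,
		"/home/bob/go/src/github.com/alice/bar":  true,
	}
	exists := func(p string) bool { return dirs[p] }
	p, _ := resolve("github.com/alice/foo", "/work/proj/cmd", []string{"/home/bob/go"}, "/usr/lib/go", exists)
	fmt.Println(p) // /work/proj/vendor/github.com/alice/foo
}
```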
Again, this resolution algorithm is both simple and robust. It’s easy to create
tools that support this workflow, and easy to understand how the compiler is
resolving a package. It’s also idiomatic to commit the `vendor` folder into version control, which makes builds extremely robust:
- Developers without the original tools for managing dependencies can still produce a working build.
- There’s no possibility for dependency versions or sources to differ among builds, since the source files themselves are committed to version control.
- There are no centralized package registries that must be available during the build, so outages don’t break builds (looking at you, NPM).
Before vendoring, different tools would use different kinds of hacks (messing with your `GOPATH` or `GOROOT`, rewriting import paths, etc.) to provide project-level isolation of dependencies. With vendoring, there is One Obvious Way.
Easy builds do not guarantee easy dependency analysis
Unfortunately, vendoring support did not come with versioning support. Doubly unfortunately, my day job involves understanding what versions of a dependency went into your build via the FOSSA CLI. Allow me to give you a peek into the rabbit hole that is Go dependency analysis.
Using a sane dependency management tool (ideally `dep`)
A good dependency manager (like `dep`) will do a couple of things:
- It solves version constraints of the entire transitive graph, correctly reporting when builds are not possible due to the diamond problem.
- It recursively flattens the `vendor` folders of dependencies.
- It provides easily parsed command output or (even better) an easily parsed lockfile.
This is the best case scenario. Analysis of these projects works basically the way you’d imagine:
- Ask `go list -json -f '{{ .Deps }}'` for the transitive package graph of a target package.
- For each package, look up the version of the package used by asking the build tool or reading its lockfile. This is easy: each package has exactly one version, and they’re all known by the build tool.
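For instance, the lockfile-lookup step against a `dep`-style `Gopkg.lock` can be sketched with a deliberately naive scan. This is just an illustration (the `lockVersions` helper is mine; a real analyzer would use a proper TOML parser):

```go
package main

import (
	"fmt"
	"strings"
)

// lockVersions naively extracts name -> version pairs from a dep-style
// Gopkg.lock. Each [[projects]] block lists a name and (sometimes) a version.
func lockVersions(lock string) map[string]string {
	versions := map[string]string{}
	var name string
	for _, line := range strings.Split(lock, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "name = ") {
			name = strings.Trim(strings.TrimPrefix(line, "name = "), `"`)
		}
		if strings.HasPrefix(line, "version = ") && name != "" {
			versions[name] = strings.Trim(strings.TrimPrefix(line, "version = "), `"`)
		}
	}
	return versions
}

func main() {
	lock := `
[[projects]]
  name = "github.com/alice/foo"
  revision = "deadbeef"
  version = "v1.2.0"
`
	fmt.Println(lockVersions(lock)["github.com/alice/foo"]) // v1.2.0
}
```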
Using a tool that allows nested vendor folders (please do not)
Nested vendor folders open a big can of worms:
- You may bring up multiple copies or versions of a single package due to the diamond problem. If any of these packages expects to be a singleton, it will likely break in ways that are difficult to debug.
- Bringing in multiple instances of a package causes compatibility problems for consuming packages.
Dependencies with nested vendor folders may also use a different tool to vendor their dependencies. This means that several tools may all be specifying different versions of a package.
For example, a project may have a direct dependency `foo` using tool `A` to specify transitive dependency `bar` with version `v1`. Elsewhere in its transitive dependency graph, it may have a direct dependency on `bar` using tool `B` to specify version `v2`.
To analyze these cases, we have to take the location of an importing file into account for projects with nested vendor folders. We have several options for this:
- `--option allow-nested-vendor:true` enables resolution logic for using lockfiles in nested vendor folders.
- `--option allow-deep-vendor:true` enables resolution using lockfiles above the root of the nested vendor folder. This supports projects where packages in nested vendor folders may have their versions specified in the top-level lockfile.
Code is written by humans and intent is impossible to infer
Sometimes, version information is intentionally missing for imported packages, because the author of the code doesn’t consider the imported package to be external to the project.
When we analyze a Go package, the target package is usually provided as a Go import path (e.g. `github.com/alice/foo/cmd/foobar`). From this import path, we try to infer the project of the package by finding the root of the VCS repository that the package is contained in (e.g. `github.com/alice/foo`). For any imports that are within the project, we don’t try to look up version information, because these imports are generally considered internal (e.g. if `github.com/alice/foo/cmd/foobar` tried to import `github.com/alice/foo/lib/quux`, we would not expect the version of `github.com/alice/foo/lib/quux` to be specified by the build tool, since `quux` is part of the same project and is versioned with `foobar`).
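The project-inference heuristic might look something like the following sketch. It assumes a GitHub-style `host/user/repo` import path; real analyzers also have to handle custom import paths and other VCS hosts, and the function names are mine:

```go
package main

import (
	"fmt"
	"strings"
)

// projectRoot guesses a package's VCS repository root from its import path,
// assuming the first three path elements are host/user/repo.
func projectRoot(importPath string) string {
	parts := strings.Split(importPath, "/")
	if len(parts) >= 3 {
		return strings.Join(parts[:3], "/")
	}
	return importPath
}

// internalImport reports whether imported belongs to the same project as
// importer, i.e. whether we should skip looking up its version.
func internalImport(importer, imported string) bool {
	return projectRoot(importer) == projectRoot(imported)
}

func main() {
	fmt.Println(projectRoot("github.com/alice/foo/cmd/foobar")) // github.com/alice/foo
	fmt.Println(internalImport(
		"github.com/alice/foo/cmd/foobar",
		"github.com/alice/foo/lib/quux")) // true
}
```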
For some projects, this assumption is not true. For example, consider the case where Alice is creating two projects `A` and `B` that she intends to be released together. These projects are stored in separate VCS repositories due to external factors (e.g. maybe one project is open source and the other isn’t), but her intent is for them to be a single unit.
In this case, knowing the project of a package within `A` is not enough to know that the intent is for the project to be versioned with `B`. This human intent is impossible to infer automatically.
Our analyzer provides several options for handling this:
- `--option allow-unresolved-prefix:IMPORT_PATH` tells the analyzer not to look up versions for packages with a given import path prefix. If this flag is not specified, a missing version during analysis is considered an error due to an underspecified build.
- `--option allow-external-vendor:true` tells the analyzer to look up dependency versions in the lockfiles of other projects. This is useful for looking up the versions of dependencies of projects that are versioned together.
The same code can be built in different ways
Go supports build constraints, which can include or exclude files based on the target OS or architecture. This can alter large portions of a project’s transitive graph, since package imports occur on a per-file basis.
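As a rough illustration of why constraints matter per file, here is a hedged sketch that evaluates a single old-style `// +build` line against a target `GOOS`. The function name and the simplifications are mine; real constraint evaluation also handles comma-separated AND terms, architecture tags, and the newer `//go:build` expressions:

```go
package main

import (
	"fmt"
	"strings"
)

// matchesBuildLine evaluates an old-style "// +build" line against a target
// GOOS. Space-separated terms are ORed together; "!" negates a term.
func matchesBuildLine(line, goos string) bool {
	terms := strings.Fields(strings.TrimPrefix(line, "// +build"))
	for _, t := range terms {
		if strings.HasPrefix(t, "!") {
			if t[1:] != goos {
				return true
			}
		} else if t == goos {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchesBuildLine("// +build linux darwin", "darwin")) // true
	fmt.Println(matchesBuildLine("// +build !windows", "linux"))      // true
	fmt.Println(matchesBuildLine("// +build linux", "windows"))       // false
}
```

A file excluded this way takes its imports out of the transitive graph entirely, which is why the same project can have different dependency graphs on different platforms.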
Also, `go list` exits non-zero when a package’s source files are all excluded by a build constraint. This was a fun surprise to learn about and write special-case error handling code for.
Support for multiple build constraints is still in progress.
Go projects can have C dependencies
There is basically no good, general way to identify C dependencies. At this point, it becomes a search and indexing problem rather than a lookup problem:
- How many different signals can we aggregate to identify a C dependency?
- Can we identify packages from a system-level package manager?
- Can we identify C source files from names or hashes?
- Can we inspect the compiler or linker’s runtime behavior?
For special cases, there are ways to handle this. If a project is always built in a special Docker container with its C dependencies, it may be possible to read dependencies from a system-level package manager. If there is a set of vendored C dependencies, it may be possible to identify them from their sources. Support for this is in progress as well.
In general, this is still an open question.
A reproducible build is not necessarily a fully specified build.
All of the above cases are examples of when detecting the version of a dependency gets very tricky. Analyzing these projects is hard, but doable.
In some cases, analysis is just plain impossible. Some projects are buildable because their sources are fully committed into version control, but don’t include any kind of lockfile or version data.
These builds are reproducible but not fully specified. This is a subtle failure mode unique to Go.
Our analyzer supports these cases too with `--option allow-unresolved:true`, which treats version resolution errors as warnings instead.
The only way to get dependency information for revisions in the absence of lockfiles is to index Go packages by hash, and try to look up known hashes of known versions. `go-resolve` is a side project of mine trying to address exactly this.
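The core idea can be sketched as computing a deterministic fingerprint over a package’s sources, which an index could then map to known released versions. The `fingerprint` helper below is purely illustrative, not `go-resolve`’s actual API:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// fingerprint hashes a package's source files (name -> contents) in sorted
// order, so identical sources always produce the same digest regardless of
// map iteration order.
func fingerprint(files map[string]string) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte(files[name]))
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	a := map[string]string{"foo.go": "package foo\n"}
	b := map[string]string{"foo.go": "package foo\n"}
	fmt.Println(fingerprint(a) == fingerprint(b)) // true
}
```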
Fin.
This is but a small portion of the madness that consumes the weekdays of my life.
Despite these issues, Go is probably still one of my favorite ecosystems to work
with. (It could be a lot worse: Ruby’s package manifest is literally a Ruby file that you `eval`.)
Have I mentioned that FOSSA is hiring? We’re building infrastructure to make open source more accessible to everyone. If doing impactful work, tackling deeply technical challenges, and working with a great team to build a sustainable business sounds appealing to you, please let us know or contact me directly.