Pathologies of Go package management

EDIT: I’ve gotten quite a few responses asking about Go modules and how they fit into this picture. I’ve played around with them experimentally, and my initial impression is that they’re certainly a step in the right direction. I have some worries about the specification of go.mod files, but we’ll have to see how that plays out.

TL;DR:

Go package idioms make reproducible builds really easy.
Go package idioms make dependency analysis really hard.
Go builds have a unique failure mode: they can be reproducible without having semantic dependency information fully specified.

Go is one of my favorite languages. It gets a lot of things right: great tooling, a pragmatic language design, and a sane module system.

I’ve found that newcomers to Go tend to get really confused by the module system. It’s both less flexible and less rigid than those of other popular languages:

Newcomers from languages that have canonical package managers (e.g. Node.js, Ruby, Rust) are often confused by the lack of a canonical package management tool and centralized package registry.
Newcomers from languages whose module systems are Wild West free-for-alls (e.g. C, Java, sort of PHP) are often frustrated by the rigid import path structure, lack of parent packages, and restrictions on cyclical imports.

Fortunately, the module system (Go calls these “packages”, and I’ll use this terminology for the rest of the post) and the idioms that have evolved around it are extremely simple.

How Go packages work

Any folder that contains a Go source file is a Go package. Its name is the path of the folder relative to $GOPATH/src. For example, a folder containing Go source files located at $GOPATH/src/github.com/alice/foo is named github.com/alice/foo.
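The naming rule is mechanical enough to write down. Here is a minimal sketch, assuming a single GOPATH entry; importPathFor is a name I made up for illustration, not part of the Go toolchain.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// importPathFor returns the package name (import path) of the folder dir,
// given a single GOPATH entry. It reports an error when dir is not located
// under gopath/src. The function name is hypothetical.
func importPathFor(gopath, dir string) (string, error) {
	src := filepath.Join(gopath, "src")
	rel, err := filepath.Rel(src, dir)
	if err != nil {
		return "", err
	}
	rel = filepath.ToSlash(rel)
	// A relative path of "." or one that climbs out of src is not a package.
	if rel == "." || rel == ".." || strings.HasPrefix(rel, "../") {
		return "", fmt.Errorf("%s is not inside %s", dir, src)
	}
	return rel, nil
}

func main() {
	p, err := importPathFor("/home/alice/go", "/home/alice/go/src/github.com/alice/foo")
	fmt.Println(p, err)
}
```

The paths in main are example values for a Unix-like system.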

GOPATH is generally a single folder (but can be a list of folders). When a user compiles a Go package that imports a package P, the compiler attempts to resolve P by checking GOROOT/src/P and then F/src/P for each folder F in GOPATH.

This has a few properties:

1. Wow, wasn’t that simple? This algorithm is extremely easy to explain and understand.
2. You cannot have two different packages with the same name. In particular, this implies that you cannot have two different versions of the same package.
3. Every package has only a single version, so you can think of the entire Go workspace as having a single “version”. Version information for each individual package is not reliably stored (at best, you have a revision hash or named reference from the VCS repository of the package).

When you’re working on multiple projects that may depend on different versions of the same package, Property 2 turns out to be pretty annoying.

In Go 1.5, the Go team added support for vendoring to resolve this. Now, when a source file S imports a package P, the resolution algorithm is:

1. Does there exist a folder named vendor in any ancestor folder A of S? If so, use A/vendor/P if P is in the folder; otherwise keep going upwards. If you’re at the root of the filesystem and haven’t found P, then go to step 2.
2. Do the old thing (look up in GOROOT and then GOPATH).
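The two steps can be sketched as a function that lists candidate folders in lookup order. This is an illustration of the algorithm as described here, not the toolchain's actual code: candidates, resolve, and the exists callback are hypothetical names, and roots is assumed to hold the GOROOT and GOPATH folders in whatever order the toolchain consults them.

```go
package main

import (
	"fmt"
	"path"
)

// candidates lists, in lookup order, the folders where import path pkg might
// live when imported from a file in folder dir. Paths are slash-separated.
// roots holds the GOROOT and GOPATH folders in the order they are consulted.
func candidates(dir, pkg string, roots []string) []string {
	var out []string
	// Step 1: walk from the importing folder up to the filesystem root,
	// checking the vendor folder of each ancestor.
	for a := path.Clean(dir); ; a = path.Dir(a) {
		out = append(out, path.Join(a, "vendor", pkg))
		if a == "/" || a == "." {
			break
		}
	}
	// Step 2: fall back to the workspace roots.
	for _, r := range roots {
		out = append(out, path.Join(r, "src", pkg))
	}
	return out
}

// resolve returns the first candidate for which exists reports true.
func resolve(dir, pkg string, roots []string, exists func(string) bool) (string, bool) {
	for _, c := range candidates(dir, pkg, roots) {
		if exists(c) {
			return c, true
		}
	}
	return "", false
}

func main() {
	// A pretend filesystem: bar is vendored by the project and also sits in GOPATH.
	fs := map[string]bool{
		"/go/src/github.com/alice/foo/vendor/github.com/bob/bar": true,
		"/go/src/github.com/bob/bar":                              true,
	}
	exists := func(p string) bool { return fs[p] }
	got, ok := resolve("/go/src/github.com/alice/foo/cmd/foobar", "github.com/bob/bar",
		[]string{"/usr/local/go", "/go"}, exists)
	fmt.Println(got, ok)
}
```

Passing exists as a callback keeps the sketch independent of the real filesystem, which makes the lookup order easy to inspect.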

Again, this resolution algorithm is both simple and robust. It’s easy to create tools that support this workflow, and easy to understand how the compiler is resolving a package. It’s also idiomatic to commit the vendor folder into version control, which makes builds extremely robust:

Developers without the original tools for managing dependencies can still produce a working build.
There’s no possibility for dependency versions or sources to be different among different builds, since source files are committed to version control.
There are no centralized package registries that must be available during the build, so outages don’t break builds (looking at you, NPM).

Before vendoring, different tools would use different kinds of hacks (messing with your GOPATH or GOROOT, rewriting import paths, etc.) to provide project-level isolation of dependencies. With vendoring, there is One Obvious Way.

Easy builds do not guarantee easy dependency analysis

Unfortunately, vendoring support did not come with versioning support. Doubly unfortunately, my day job involves understanding what versions of a dependency went into your build via the FOSSA CLI. Allow me to give you a peek into the rabbit hole that is Go dependency analysis.

Using a sane dependency management tool (ideally `dep`)

A good dependency manager (like dep) will do a couple of things:

It solves version constraints of the entire transitive graph, correctly reporting when builds are not possible due to the diamond problem.
It recursively flattens the vendor folders of dependencies.
It provides easily parsed command output or (even better) an easily parsed lockfile.

This is the best case scenario. Analysis of these projects works basically the way you’d imagine:

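Ask go list for the transitive package graph of a target package, using either go list -json (whose output includes a Deps field) or go list -f '{{ .Deps }}'. Recent Go releases reject combining -f with -json.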
For each package, look up the version of the package used by asking the build tool or reading its lockfile. This is easy: each package has exactly one version, and they’re all known by the build tool.
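Here is a rough sketch of those two steps. It assumes the input is the output of go list -json for the target package and its dependencies, which is a stream of JSON objects rather than a JSON array. The lock map stands in for whatever lockfile the build tool writes (for example, the projects recorded in dep's Gopkg.lock). The function names and the sample data are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// listedPackage holds the two fields of `go list -json` output used here.
type listedPackage struct {
	ImportPath string
	Standard   bool // true for standard library packages
}

// parseGoList decodes the stream of JSON objects printed by `go list -json`.
func parseGoList(out string) ([]listedPackage, error) {
	dec := json.NewDecoder(strings.NewReader(out))
	var pkgs []listedPackage
	for {
		var p listedPackage
		if err := dec.Decode(&p); err == io.EOF {
			return pkgs, nil
		} else if err != nil {
			return nil, err
		}
		pkgs = append(pkgs, p)
	}
}

// unvendor strips the vendor prefix that go list reports for vendored packages.
func unvendor(importPath string) string {
	if i := strings.LastIndex(importPath, "/vendor/"); i >= 0 {
		return importPath[i+len("/vendor/"):]
	}
	return importPath
}

// within reports whether pkg is root itself or a package below it.
func within(root, pkg string) bool {
	return pkg == root || strings.HasPrefix(pkg, root+"/")
}

// resolveVersions maps each external package to the version of the lockfile
// entry whose project root is its longest matching prefix. Packages with no
// matching entry are returned as unresolved.
func resolveVersions(pkgs []listedPackage, lock map[string]string, project string) (map[string]string, []string) {
	resolved := map[string]string{}
	var unresolved []string
	for _, p := range pkgs {
		pkg := unvendor(p.ImportPath)
		if p.Standard || within(project, pkg) {
			continue // standard library and project-internal packages have no lock entry
		}
		best := ""
		for root := range lock {
			if within(root, pkg) && len(root) > len(best) {
				best = root
			}
		}
		if best == "" {
			unresolved = append(unresolved, pkg)
			continue
		}
		resolved[pkg] = lock[best]
	}
	return resolved, unresolved
}

func main() {
	pkgs, err := parseGoList(`
{"ImportPath": "github.com/alice/foo/cmd/foobar"}
{"ImportPath": "fmt", "Standard": true}
{"ImportPath": "github.com/alice/foo/vendor/github.com/bob/bar/baz"}`)
	if err != nil {
		panic(err)
	}
	lock := map[string]string{"github.com/bob/bar": "v1.2.0"} // sample data
	fmt.Println(resolveVersions(pkgs, lock, "github.com/alice/foo"))
}
```

The longest-prefix match is there because lockfiles typically record one version per repository, while go list reports individual packages inside that repository.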

Using a tool that allows nested vendor folders (please do not)

Nested vendor folders open a big can of worms:

You may bring up multiple copies or versions of a single package due to the diamond problem. If any of these packages expects to be a singleton, it will likely break in ways that are difficult to debug.
Bringing in multiple instances of a package causes compatibility problems for consuming packages.

Dependencies with nested vendor folders may also use a different tool to vendor their dependencies. This means that several tools may all be specifying different versions of a package.

For example, a project may have a direct dependency foo using tool A to specify transitive dependency bar with version v1. Elsewhere in its transitive dependency graph, it may have a direct dependency on bar using tool B to specify version v2.

To analyze these cases, we have to take the location of an importing file into account for projects with nested vendor folders. We have several options for this:

--option allow-nested-vendor:true enables resolution logic for using lockfiles in nested vendor folders.
--option allow-deep-vendor:true enables resolution using lockfiles that are above the root of the nested vendor folder. This supports projects where packages in nested vendor folders may have their versions specified in the top-level lockfile.

Code is written by humans and intent is impossible to infer

Sometimes, version information is intentionally missing for imported packages, because the author of the code doesn’t consider the imported package to be external to the project.

When we analyze a Go package, the target package is usually provided as a Go import path (e.g. github.com/alice/foo/cmd/foobar). From this import path, we try to infer the project of the package by finding the root of the VCS repository that the package is contained in (e.g. github.com/alice/foo). For any imports that are within the project, we don’t try to look up version information because these imports are generally considered internal (e.g. if github.com/alice/foo/cmd/foobar tried to import github.com/alice/foo/lib/quux, we would not expect the version of github.com/alice/foo/lib/quux to be specified by the build tool since quux is part of the same project and would be versioned with foobar).
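As a simplified illustration, here is what that inference could look like for import paths hosted on github.com, where the repository root is always the first three path elements. This is a sketch under that assumption, not how the FOSSA CLI does it; githubProject and isInternal are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// githubProject guesses the project of an import path hosted on github.com,
// where the repository root has the form github.com/owner/repo.
func githubProject(importPath string) (string, bool) {
	parts := strings.Split(importPath, "/")
	if len(parts) < 3 || parts[0] != "github.com" {
		return "", false
	}
	return strings.Join(parts[:3], "/"), true
}

// isInternal reports whether importPath belongs to project and should
// therefore be versioned together with it. Vendored packages live inside the
// project's folder tree but are still external dependencies.
func isInternal(project, importPath string) bool {
	if strings.Contains(importPath, "/vendor/") {
		return false
	}
	return importPath == project || strings.HasPrefix(importPath, project+"/")
}

func main() {
	project, _ := githubProject("github.com/alice/foo/cmd/foobar")
	fmt.Println(project)
	fmt.Println(isInternal(project, "github.com/alice/foo/lib/quux"))
	fmt.Println(isInternal(project, "github.com/alice/foobar"))
}
```

Note the trailing slash in the prefix check: without it, github.com/alice/foobar would be mistaken for part of github.com/alice/foo. The sketch also bakes in the assumption that a project is exactly one repository.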

For some projects, this assumption is not true. For example, consider the case where Alice is creating two projects A and B that she intends to be released together. These projects are stored in separate VCS repositories due to external factors (e.g. maybe one project is open source and the other isn’t), but her intent is for them to be a single unit.

In this case, knowing the project of a package within A is not enough to know that the intent is for the project to be versioned with B. This human intent is impossible to infer automatically.

Our analyzer provides several options for handling this:

--option allow-unresolved-prefix:IMPORT_PATH tells the analyzer not to look up the version for packages with a certain import path prefix. If this flag is not specified, a missing version during an analysis is considered an error due to an underspecified build.
--option allow-external-vendor:true tells the analyzer to look up dependency versions in lockfiles of other projects. This is useful for looking up the versions of dependencies of projects that are versioned together.

The same code can be built in different ways

Go supports build constraints, which can include or exclude files based on the target OS or architecture. This can alter large portions of a project’s transitive graph, since package imports occur on a per-file basis.
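To make this concrete, here is a small sketch using the standard library's go/build/constraint package to evaluate constraint lines for a given target. It is a simplification: real builds also satisfy other tags (cgo, release tags, custom -tags values) and honor file name suffixes such as _linux.go. The file names and the included helper are made up for illustration.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// sourceFile is a hypothetical summary of one file in a package.
type sourceFile struct {
	name       string
	constraint string   // the file's build constraint line
	imports    []string // packages the file imports
}

// included reports whether a file carrying the given build constraint line
// would be compiled for the given target OS and architecture. Only the OS and
// architecture tags are considered satisfied.
func included(line, goos, goarch string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	return expr.Eval(func(tag string) bool {
		return tag == goos || tag == goarch
	}), nil
}

func main() {
	files := []sourceFile{
		{"watch_inotify.go", "//go:build linux", []string{"golang.org/x/sys/unix"}},
		{"watch_win.go", "//go:build windows", []string{"golang.org/x/sys/windows"}},
	}
	// The same package pulls in different dependencies for each target.
	for _, goos := range []string{"linux", "windows"} {
		var deps []string
		for _, f := range files {
			ok, err := included(f.constraint, goos, "amd64")
			if err != nil {
				panic(err)
			}
			if ok {
				deps = append(deps, f.imports...)
			}
		}
		fmt.Println(goos, deps)
	}
}
```

constraint.Parse accepts both the newer //go:build lines and the older // +build lines.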

Also, go list exits non-zero when a package’s source files are all excluded by a build constraint. This was a fun surprise to learn about and write special-case error handling code for.

Support for multiple build constraints is still in progress.

Go projects can have C dependencies

There is basically no good, general way to identify C dependencies. At this point, it becomes a search and indexing problem rather than a lookup problem:

How many different signals can we aggregate to identify a C dependency?
Can we identify packages from a system-level package manager?
Can we identify C source files from names or hashes?
Can we inspect the compiler or linker’s runtime behavior?

For special cases, there are ways to handle this. If a project is always built in a special Docker container with its C dependencies, it may be possible to read dependencies from a system-level package manager. If there is a set of vendored C dependencies, it may be possible to identify them from their sources. Support for this is in progress as well.

In general, this is still an open question.
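One signal that is cheap to collect is whether a Go file uses cgo at all, since such files import the pseudo-package "C". The sketch below only detects that import using the standard go/parser package. It says nothing about which C library or version is involved, and usesCgo is a name I made up.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// usesCgo reports whether the Go source in src imports the pseudo-package "C".
// Only the import section of the file is parsed.
func usesCgo(filename, src string) (bool, error) {
	f, err := parser.ParseFile(token.NewFileSet(), filename, src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range f.Imports {
		// Path.Value is the import path literal, including its quotes.
		if imp.Path.Value == `"C"` {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// A made-up file that links against zlib through cgo.
	const src = `package zwrap

/*
#cgo LDFLAGS: -lz
#include <zlib.h>
*/
import "C"
`
	fmt.Println(usesCgo("zwrap.go", src))
}
```

Knowing which packages use cgo at least narrows down where the harder search has to happen.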

A reproducible build is not necessarily a fully specified build

All of the above cases are examples of when detecting the version of a dependency gets very tricky. Analyzing these projects is hard, but doable.

In some cases, analysis is just plain impossible. Some projects are buildable because their sources are fully committed into version control, but don’t include any kind of lockfile or version data.

These builds are reproducible but not fully specified. This is a subtle failure mode unique to Go.

Our analyzer supports these cases too with --option allow-unresolved:true, which treats version resolution errors as warnings instead.

The only way to get dependency information for revisions in the absence of lockfiles is to index Go packages by hash, and try to look up known hashes of known versions. go-resolve is a side project of mine trying to address exactly this.
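The idea of a hash index can be sketched in a few lines. This is an illustration only: the hashing scheme, the release type, and the sample data are assumptions of mine and do not describe how go-resolve works.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// release identifies a known version of a package.
type release struct {
	ImportPath string
	Version    string
}

// packageHash returns a digest over a package's files, keyed by file name.
// Names are sorted so the result does not depend on map iteration order.
func packageHash(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for n := range files {
		names = append(names, n)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, n := range names {
		// Length-prefix the name and contents so file boundaries are unambiguous.
		fmt.Fprintf(h, "%d:%s%d:", len(n), n, len(files[n]))
		h.Write(files[n])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// identify looks up a vendored package's files in an index of known releases.
func identify(index map[string]release, files map[string][]byte) (release, bool) {
	r, ok := index[packageHash(files)]
	return r, ok
}

func main() {
	// Build an index from sources of known versions (sample data).
	v1 := map[string][]byte{"bar.go": []byte("package bar\n")}
	v2 := map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")}
	index := map[string]release{
		packageHash(v1): {"github.com/bob/bar", "v1.0.0"},
		packageHash(v2): {"github.com/bob/bar", "v2.0.0"},
	}
	// A vendored copy with no lockfile: identify it by content alone.
	vendored := map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")}
	fmt.Println(identify(index, vendored))
}
```

An exact-match index like this fails as soon as a vendored copy has been patched locally, which is part of why the problem is hard in practice.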

Fin.

This is but a small portion of the madness that consumes the weekdays of my life. Despite these issues, Go is probably still one of my favorite ecosystems to work with. (It could be a lot worse: Ruby’s package manifest is literally a Ruby file that you eval.)

Have I mentioned that FOSSA is hiring? We’re building infrastructure to make open source more accessible to everyone. If doing impactful work, tackling deeply technical challenges, and working with a great team to build a sustainable business sounds appealing to you, please let us know or contact me directly.