Pathologies of Go package management

2018/10/22

Tags: golang dependencies fossa

EDIT: I’ve gotten quite a few responses asking about Go modules and how they fit into this picture. I’ve played around with them experimentally, and my initial impression is that they’re certainly a step in the right direction. I have some worries about the specification of go.mod files, but we’ll have to see how that plays out.


TL;DR:


Go is one of my favorite languages. It gets a lot of things right: great tooling, a pragmatic language design, and a sane module system.

I’ve found that newcomers to Go tend to get really confused by the module system. It’s both less flexible and less rigid than other popular languages:

Fortunately, the module system (Go calls these “packages”, and I’ll use this terminology for the rest of the post) and the idioms that have evolved around it are extremely simple.

How Go packages work

Any folder that contains a Go source file is a Go package. Its name is the path of the folder relative to $GOPATH/src. For example, a folder containing Go source files located at $GOPATH/src/github.com/alice/foo is named github.com/alice/foo.
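As a minimal sketch of that naming rule, assuming a single GOPATH entry (the function name importPathFor and the example paths are made up for illustration):

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// importPathFor returns the package name (import path) of the folder dir,
// given a single GOPATH entry. It reports false when dir is not strictly
// inside gopath/src.
func importPathFor(gopath, dir string) (string, bool) {
	rel, err := filepath.Rel(filepath.Join(gopath, "src"), dir)
	if err != nil || rel == "." || rel == ".." ||
		strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	// Import paths use forward slashes on every platform.
	return filepath.ToSlash(rel), true
}

func main() {
	name, ok := importPathFor("/home/alice/go", "/home/alice/go/src/github.com/alice/foo")
	fmt.Println(name, ok)
}
```

Note that nothing in the name says which version of the package is sitting in that folder.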

GOPATH is generally a single folder (but can be a list of folders). When a user compiles a Go package that imports a package P, the compiler attempts to resolve P by checking GOROOT/src/P, and then F/src/P for each folder F in GOPATH.

This has a few properties:

  1. Wow, wasn’t that simple? This algorithm is extremely easy to explain and understand.
  2. You cannot have two different packages with the same name. In particular, this implies that you cannot have two different versions of the same package.
  3. Every package has only a single version, so you can think of the entire Go workspace as having a single “version”. Version information for each individual package is not reliably stored (at best, you have a revision hash or named reference from the VCS repository of the package).
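The lookup described above can be sketched in a few lines. This is only an illustration, not the go tool's code: the function name resolve, the injected exists predicate, and all paths are assumptions, and the caller supplies the roots in whatever order the toolchain searches them.

```go
package main

import (
	"fmt"
	"path"
)

// resolve returns the first folder root/src/pkg that exists, trying roots in
// the order given. exists is injected so the sketch runs without touching
// the real filesystem. Forward-slash paths are used for simplicity.
func resolve(roots []string, pkg string, exists func(dir string) bool) (string, bool) {
	for _, root := range roots {
		dir := path.Join(root, "src", pkg)
		if exists(dir) {
			return dir, true
		}
	}
	return "", false
}

func main() {
	// Hypothetical workspace contents.
	onDisk := map[string]bool{
		"/usr/local/go/src/fmt":                   true,
		"/home/alice/go/src/github.com/alice/foo": true,
	}
	exists := func(dir string) bool { return onDisk[dir] }
	roots := []string{"/usr/local/go", "/home/alice/go"}

	fmt.Println(resolve(roots, "fmt", exists))
	fmt.Println(resolve(roots, "github.com/alice/foo", exists))
	fmt.Println(resolve(roots, "github.com/bob/bar", exists))
}
```

The only lookup key is the import path, which is why Property 2 holds: there is nowhere to put a second version.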

When you’re working on multiple projects that may depend on different versions of the same package, Property 2 turns out to be pretty annoying.

In Go 1.5, the Go team added experimental support for vendoring to resolve this (it has been enabled by default since Go 1.6). Now, when a source file S imports a package P, the resolution algorithm is:

  1. Walk upwards from S through each ancestor folder A. If A contains a folder named vendor and A/vendor/P exists, use it. Otherwise, keep going upwards. If you reach the root of the filesystem without finding P, go to step 2.
  2. Do the old thing (look up in GOROOT and then GOPATH).
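Here is a rough sketch of those two steps, following the description above literally. The function name resolveVendored, the injected exists predicate, and the example paths are all hypothetical, and the real go tool may differ in details.

```go
package main

import (
	"fmt"
	"path"
)

// resolveVendored resolves pkg as imported by a source file in srcDir.
// Step 1: check A/vendor/pkg for srcDir and each ancestor A, nearest first.
// Step 2: fall back to root/src/pkg for each root, in the order given.
func resolveVendored(srcDir, pkg string, roots []string, exists func(string) bool) (string, bool) {
	for dir := path.Clean(srcDir); ; dir = path.Dir(dir) {
		candidate := path.Join(dir, "vendor", pkg)
		if exists(candidate) {
			return candidate, true
		}
		if dir == "/" || dir == "." {
			break // reached the top without finding a vendored copy
		}
	}
	for _, root := range roots {
		candidate := path.Join(root, "src", pkg)
		if exists(candidate) {
			return candidate, true
		}
	}
	return "", false
}

func main() {
	const project = "/gopath/src/github.com/alice/foo"
	// Hypothetical layout: foo vendors bar, and so does a vendored dependency baz.
	onDisk := map[string]bool{
		project + "/vendor/github.com/bob/bar":                             true,
		project + "/vendor/github.com/carol/baz/vendor/github.com/bob/bar": true,
		"/gopath/src/github.com/dave/qux":                                  true,
	}
	exists := func(dir string) bool { return onDisk[dir] }
	roots := []string{"/goroot", "/gopath"}

	fmt.Println(resolveVendored(project+"/cmd/foobar", "github.com/bob/bar", roots, exists))
	fmt.Println(resolveVendored(project+"/vendor/github.com/carol/baz", "github.com/bob/bar", roots, exists))
	fmt.Println(resolveVendored(project+"/cmd/foobar", "github.com/dave/qux", roots, exists))
}
```

The first two calls resolve the same import path to different folders, because the result depends on where the importing file lives.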

Again, this resolution algorithm is both simple and robust. It’s easy to create tools that support this workflow, and easy to understand how the compiler is resolving a package. It’s also idiomatic to commit the vendor folder into version control, which makes builds extremely robust:

Before vendoring, different tools would use different kinds of hacks (messing with your GOPATH or GOROOT, rewriting import paths, etc.) to provide project-level isolation of dependencies. With vendoring, there is One Obvious Way.

Easy builds do not guarantee easy dependency analysis

Unfortunately, vendoring support did not come with versioning support. Doubly unfortunately, my day job involves building the FOSSA CLI, which has to work out which version of each dependency went into your build. Allow me to give you a peek into the rabbit hole that is Go dependency analysis.

Using a sane dependency management tool (ideally dep)

A good dependency manager (like dep) will do a couple of things:

This is the best case scenario. Analysis of these projects works basically the way you’d imagine:

  1. Ask go list -json for the transitive package graph of a target package (it is the Deps field of the output; go list -f '{{ .Deps }}' prints the same list as plain text).
  2. For each package, look up the version of the package used by asking the build tool or reading its lockfile. This is easy: each package has exactly one version, and they’re all known by the build tool.
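The two steps might look roughly like this. It is a sketch and not the FOSSA CLI's implementation: the sample go list output and the lockfile contents are made up, the lock map (project root to version) is a hypothetical stand-in for whatever the build tool records, and vendored packages are shown without their vendor/ prefix for brevity.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// listedPackage holds the two fields of `go list -json` output used here.
type listedPackage struct {
	ImportPath string
	Deps       []string
}

// parseGoList decodes the stream of JSON objects printed by `go list -json`.
func parseGoList(out []byte) ([]listedPackage, error) {
	dec := json.NewDecoder(bytes.NewReader(out))
	var pkgs []listedPackage
	for {
		var p listedPackage
		if err := dec.Decode(&p); err == io.EOF {
			return pkgs, nil
		} else if err != nil {
			return nil, err
		}
		pkgs = append(pkgs, p)
	}
}

// versionOf finds the lock entry whose project root is the longest prefix of
// pkg on a path-segment boundary.
func versionOf(lock map[string]string, pkg string) (string, bool) {
	best := ""
	for root := range lock {
		if (pkg == root || strings.HasPrefix(pkg, root+"/")) && len(root) > len(best) {
			best = root
		}
	}
	if best == "" {
		return "", false
	}
	return lock[best], true
}

func main() {
	// Hypothetical `go list -json` output, trimmed to the fields used here.
	out := []byte(`{"ImportPath": "github.com/alice/foo/cmd/foobar",
		"Deps": ["fmt", "github.com/bob/bar", "github.com/bob/bar/util"]}`)
	// Hypothetical lockfile contents.
	lock := map[string]string{"github.com/bob/bar": "v1.2.0"}

	pkgs, err := parseGoList(out)
	if err != nil {
		panic(err)
	}
	for _, p := range pkgs {
		for _, dep := range p.Deps {
			if v, ok := versionOf(lock, dep); ok {
				fmt.Printf("%s@%s\n", dep, v)
			} else {
				fmt.Printf("%s (no lock entry)\n", dep)
			}
		}
	}
}
```

Standard library packages such as fmt have no lock entry, so a real analyzer would need to filter them out separately.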

Using a tool that allows nested vendor folders (please do not)

Nested vendor folders open a big can of worms:

Dependencies with nested vendor folders may also use a different tool to vendor their dependencies. This means that several tools may all be specifying different versions of a package.

For example, a project may have a direct dependency foo using tool A to specify transitive dependency bar with version v1. Elsewhere in its transitive dependency graph, it may have a direct dependency on bar using tool B to specify version v2.
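One way to at least notice this situation is to look for packages that are provided by more than one vendor folder. The sketch below leans on my understanding that the go tool reports vendored packages with their full path, vendor segment included (e.g. github.com/alice/app/vendor/github.com/bob/bar). The function names and sample paths are hypothetical, and the input is assumed to contain no repeated paths.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitVendored splits a resolved import path into the folder doing the
// vendoring and the logical package path:
// "a/vendor/b" -> ("a", "b"); "b" -> ("", "b").
func splitVendored(importPath string) (parent, pkg string) {
	const sep = "/vendor/"
	if i := strings.LastIndex(importPath, sep); i >= 0 {
		return importPath[:i], importPath[i+len(sep):]
	}
	return "", importPath
}

// duplicates maps each logical package that is provided from more than one
// place to the sorted list of parents providing it.
func duplicates(importPaths []string) map[string][]string {
	parents := map[string][]string{}
	for _, ip := range importPaths {
		parent, pkg := splitVendored(ip)
		parents[pkg] = append(parents[pkg], parent)
	}
	for pkg, ps := range parents {
		if len(ps) < 2 {
			delete(parents, pkg) // only one copy: nothing to report
			continue
		}
		sort.Strings(ps)
	}
	return parents
}

func main() {
	resolved := []string{
		"github.com/alice/app/vendor/github.com/bob/bar",
		"github.com/alice/app/vendor/github.com/carol/foo",
		"github.com/alice/app/vendor/github.com/carol/foo/vendor/github.com/bob/bar",
	}
	for pkg, ps := range duplicates(resolved) {
		fmt.Printf("%s is vendored by %d folders: %v\n", pkg, len(ps), ps)
	}
}
```

This only tells you that two copies exist. Whether they are different versions still has to come from each tool's own metadata.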

To analyze these cases, we have to take the location of an importing file into account for projects with nested vendor folders. We have several options for this:

Code is written by humans and intent is impossible to infer

Sometimes, version information is intentionally missing for imported packages, because the author of the code doesn’t consider the imported package to be external to the project.

When we analyze a Go package, the target package is usually provided as a Go import path (e.g. github.com/alice/foo/cmd/foobar). From this import path, we try to infer the project of the package by finding the root of the VCS repository that the package is contained in (e.g. github.com/alice/foo). For any imports that are within the project, we don’t try to look up version information because these imports are generally considered internal. For example, if github.com/alice/foo/cmd/foobar imports github.com/alice/foo/lib/quux, we would not expect the build tool to specify a version for github.com/alice/foo/lib/quux, since quux is part of the same project and is versioned with foobar.

For some projects, this assumption is not true. For example, consider the case where Alice is creating two projects A and B that she intends to be released together. These projects are stored in separate VCS repositories due to external factors (e.g. maybe one project is open source and the other isn’t), but her intent is for them to be a single unit.

In this case, knowing which project a package in A belongs to is not enough to know that A is meant to be versioned together with B. This human intent is impossible to infer automatically.
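The "is this import internal?" check boils down to a prefix test on path segments. Here is a minimal sketch (the function name isInternal and the example paths are made up, and the vendor exclusion is my own assumption about what a sensible check would do):

```go
package main

import (
	"fmt"
	"strings"
)

// isInternal reports whether pkg lives inside the project rooted at
// projectRoot, matching on whole path segments.
func isInternal(projectRoot, pkg string) bool {
	if pkg != projectRoot && !strings.HasPrefix(pkg, projectRoot+"/") {
		return false
	}
	// Vendored copies sit under the project's folder but are still external code.
	return !strings.Contains(pkg, "/vendor/")
}

func main() {
	fmt.Println(isInternal("github.com/alice/foo", "github.com/alice/foo/lib/quux"))
	fmt.Println(isInternal("github.com/alice/foo", "github.com/alice/foo/vendor/github.com/bob/bar"))
	// Alice's two-repository case: B looks external from A's point of view.
	fmt.Println(isInternal("github.com/alice/a", "github.com/alice/b/lib"))
}
```

The last call is the problem case: nothing in the import paths says that the two repositories are meant to be one unit.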

Our analyzer provides several options for handling this:

The same code can be built in different ways

Go supports build constraints, which can include or exclude files based on the target OS or architecture. This can alter large portions of a project’s transitive graph, since package imports occur on a per-file basis.

Also, go list exits non-zero when a package’s source files are all excluded by a build constraint. This was a fun surprise to learn about and write special-case error handling code for.
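To see how the set of files (and therefore the set of imports) shifts with the target, here is a small sketch using the standard go/build package. The helper filesFor and the file contents are made up; file contents are served from memory through the Context's OpenFile hook so the example does not depend on anything on disk.

```go
package main

import (
	"fmt"
	"go/build"
	"io"
	"sort"
	"strings"
)

// filesFor returns the names of the files (name -> source text) that
// go/build would include when targeting goos/goarch.
func filesFor(goos, goarch string, files map[string]string) ([]string, error) {
	ctx := build.Default // a copy, so the process-wide default is untouched
	ctx.GOOS, ctx.GOARCH = goos, goarch
	// Serve file contents from memory instead of the real filesystem.
	ctx.OpenFile = func(name string) (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(files[name])), nil
	}
	var matched []string
	for name := range files {
		ok, err := ctx.MatchFile("", name)
		if err != nil {
			return nil, err
		}
		if ok {
			matched = append(matched, name)
		}
	}
	sort.Strings(matched)
	return matched, nil
}

func main() {
	files := map[string]string{
		"net.go":         "package net\n",
		"net_linux.go":   "package net\n", // constrained by file name suffix
		"net_windows.go": "package net\n",
		"tagged.go":      "//go:build windows\n// +build windows\n\npackage net\n", // constrained by comment
	}
	for _, goos := range []string{"linux", "windows"} {
		got, err := filesFor(goos, "amd64", files)
		if err != nil {
			panic(err)
		}
		fmt.Println(goos, got)
	}
}
```

An analyzer that only looks at one target will silently miss whatever the excluded files import.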

Support for multiple build constraints is still in progress.

Go projects can have C dependencies

There is basically no good, general way to identify C dependencies. At this point, it becomes a search and indexing problem rather than a lookup problem:

For special cases, there are ways to handle this. If a project is always built in a special Docker container with its C dependencies, it may be possible to read dependencies from a system-level package manager. If there is a set of vendored C dependencies, it may be possible to identify them from their sources. Support for this is in progress as well.
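A modest first step is simply detecting that C is involved at all. Go code reaches C through cgo, which shows up in source as an import of the pseudo-package "C". The sketch below checks one file for that import using the standard go/parser package. It is my own illustration (the function name usesCgo is made up), and it says nothing about which C library or version is being linked.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// usesCgo reports whether Go source text imports the pseudo-package "C".
func usesCgo(src string) (bool, error) {
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(token.NewFileSet(), "input.go", src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range f.Imports {
		if imp.Path.Value == `"C"` { // Value keeps the surrounding quotes
			return true, nil
		}
	}
	return false, nil
}

func main() {
	src := `package z

// #include <zlib.h>
import "C"

import "fmt"
`
	uses, err := usesCgo(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("uses cgo:", uses)
}
```

The output of go list -json also has a CgoFiles field per package, which gives the same signal without parsing anything yourself.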

In general, this is still an open question.

A reproducible build is not necessarily a fully specified build

All of the above cases are examples of when detecting the version of a dependency gets very tricky. Analyzing these projects is hard, but doable.

In some cases, analysis is just plain impossible. Some projects are buildable because their sources are fully committed into version control, but don’t include any kind of lockfile or version data.

These builds are reproducible but not fully specified. This is a subtle failure mode unique to Go.

Our analyzer supports these cases too with --option allow-unresolved:true, which treats version resolution errors as warnings instead.

The only way to get dependency information for revisions in the absence of lockfiles is to index Go packages by hash, and try to look up known hashes of known versions. go-resolve is a side project of mine trying to address exactly this.
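To make the idea concrete, here is a toy version of such an index. It is not how go-resolve works (I am not describing its actual scheme here). The hashing format, type names, and sample data are all assumptions chosen for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// packageHash returns a digest over a package's files (name -> contents).
// File names are sorted so the digest does not depend on map iteration order.
func packageHash(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	for _, name := range names {
		// Length-prefix each field so different splits cannot collide.
		fmt.Fprintf(h, "%d:%s%d:", len(name), name, len(files[name]))
		h.Write(files[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// revision identifies a known release of a package.
type revision struct {
	ImportPath string
	Version    string
}

// index maps package digests to the revision they were computed from.
type index map[string]revision

// add records the digest of a known revision's files.
func (ix index) add(importPath, version string, files map[string][]byte) {
	ix[packageHash(files)] = revision{ImportPath: importPath, Version: version}
}

// lookup finds the known revision whose files hash identically, if any.
func (ix index) lookup(files map[string][]byte) (revision, bool) {
	r, ok := ix[packageHash(files)]
	return r, ok
}

func main() {
	ix := index{}
	ix.add("github.com/bob/bar", "v1.0.0", map[string][]byte{"bar.go": []byte("package bar\n")})
	ix.add("github.com/bob/bar", "v2.0.0", map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")})

	// A vendored copy found in some project, with no version metadata.
	vendored := map[string][]byte{"bar.go": []byte("package bar\n\nfunc New() {}\n")}
	fmt.Println(ix.lookup(vendored))
}
```

An exact-match index like this falls over as soon as someone patches a vendored file by hand, which is part of why this is a search problem rather than a lookup problem.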

Fin.

This is but a small portion of the madness that consumes the weekdays of my life. Despite these issues, Go is probably still one of my favorite ecosystems to work with. (It could be a lot worse: Ruby’s package manifest is literally a Ruby file that you eval.)

Have I mentioned that FOSSA is hiring? We’re building infrastructure to make open source more accessible to everyone. If doing impactful work, tackling deeply technical challenges, and working with a great team to build a sustainable business sounds appealing to you, please let us know or contact me directly.