A modular package language for Haskell

What’s the difference between a “package” and a “module”?  Though modules have evolved beyond their low-level origins into richly composable, type-based abstractions, packages are largely stuck in a Linux-like land of version ranges and imprecise specifications.

How can we bring packages up to speed with contemporary module systems?  For my first big project, I’m investigating this question in the context of GHC Haskell and its package management system, Hackage/Cabal.

* * *

Derek Dreyer (my advisor) and I, in collaboration with GHC gurus Simon Peyton Jones and Simon Marlow, are designing a new “package language” for Haskell in the style of the ML module system. More specifically, we’re starting with Dreyer and Rossberg's system of mixin modules, MixML,1 as the basis for the language.  As mixin modules, packages consist of arbitrary collections of module specifications and module implementations.

With this language we aim to achieve the following:

1) To formalize the notion of a package as Haskell’s unit of distribution.

We’re leaving Haskell’s notion of module largely untouched (with the exception of our module signatures). Instead, all the moduley stuff like abstraction (think functors), linking (think functor application), and hierarchical composition happen at the package level.

In ML, type abstraction is a core motivation for the design of the module system, and a core source of frustration for the many type systems proposed for modules. Haskell has much weaker support for type abstraction (it rests on namespace management such as hiding), but we must still carefully manage the instantiations of modules and, consequently, of their (data) types.

2) To formalize the notion of a package as a program’s module context.

When you invoke `ghc` on a Haskell source file, the compiler needs to make sense of the modules you’re importing. Usually, with the `--make` flag, GHC automatically determines the packages that contain those modules and loads them into the context; in the general case, however, the user must add `-package P` to load `P` into the context.2 This often works without a hitch, but it relies heavily on the common (yet brittle) assumption that packages don’t define modules with the same name, i.e. that developers choose globally unique module names for their packages.
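To see how module names are resolved relative to packages today, here is a minimal sketch using GHC's existing `PackageImports` extension, which lets an import name the package a module should come from. This is only GHC's current ad hoc way around name clashes, not part of the package language described in this post; the choice of `base` and `Data.List` is purely illustrative.

```haskell
{-# LANGUAGE PackageImports #-}
module Main where

-- The string literal names the package that must provide the module.
-- `base` ships with GHC, so this compiles without extra dependencies.
import "base" Data.List (sort)

main :: IO ()
main = print (sort [3, 1, 2 :: Int])
```

If two packages in the context both exposed a module called `Data.List`, the package-qualified import would pick one explicitly; without it, GHC would report the import as ambiguous.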

Instead, our language offers a precise means by which programmers (or more likely, automated tools) define the context of modules under which programs should be understood. We are (currently) designing the type system for our packages with an elaboration semantics, wherein a package is translated into an internal language that is, essentially, readily understood by GHC.

3) To introduce proper abstraction of dependencies in packages.

Packages in Hackage currently look a lot like those you might find in, say, Linux’s APT: a package has a name, a version number, and a list of dependencies, where dependencies are defined as `name version-constraints`. Why should a type-happy language like Haskell rely on the same old notion of dependency as multilingual/typeless package systems?

Rather than depending on the informal interface `foolib >= 1.2.0 && < 2`, a package developer should depend on an exact interface for `foolib` that describes its modules’ contents: data types, functions, type classes, etc.  After all, it’s (usually) not the case that this developer actually cares about which particular version is linked — she just needs that code to exist and do what it’s supposed to.3
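To make the "informal interface" concrete, here is a small sketch of what a constraint like `foolib >= 1.2.0 && < 2` actually checks. It uses `Data.Version` from `base`; the function name `satisfiesFoolibRange` is hypothetical, and this is not how Cabal itself implements constraint solving.

```haskell
module Main where

import Data.Version (Version, makeVersion, showVersion)

-- Hypothetical encoding of the constraint `foolib >= 1.2.0 && < 2`.
-- Versions are compared component by component, like lists of Ints.
satisfiesFoolibRange :: Version -> Bool
satisfiesFoolibRange v = v >= makeVersion [1, 2, 0] && v < makeVersion [2]

main :: IO ()
main = mapM_ report candidates
  where
    candidates =
      [ makeVersion [1, 1, 9]
      , makeVersion [1, 2, 0]
      , makeVersion [1, 9, 3]
      , makeVersion [2, 0, 0]
      ]
    report v =
      putStrLn (showVersion v ++ ": " ++ show (satisfiesFoolibRange v))
```

Note that the check looks only at numbers: nothing in it says which modules, types, or functions `foolib` must provide, which is exactly the gap a typed interface is meant to close.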

Thus we must offer a way to define interfaces, which are essentially packages that define module signatures instead of module implementations.  What’s a module signature?  Essentially, it already exists in GHC as the “boot file” (`.hs-boot`) used for recursive imports.  (Thanks to our formulation of packages as mixin modules, our language will natively and naturally support recursive dependency within packages and between packages.)
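For readers who haven't met boot files, here is a minimal sketch of how one looks in today's GHC. The listing shows three separate files in one block; the module names `A` and `B` and their contents are made up for illustration. The boot file plays the role of a signature: it declares an abstract type and a function's type, with no implementation.

```haskell
-- File: B.hs-boot  (the "signature" of module B)
module B where

data Tree                -- abstract: no constructors are revealed
size :: Tree -> Int

-- File: A.hs  (compiled against B's signature only)
module A where

import {-# SOURCE #-} B (Tree, size)

describe :: Tree -> String
describe t = "tree of size " ++ show (size t)

-- File: B.hs  (the implementation, which in turn imports A)
module B where

import A (describe)

data Tree = Leaf | Node Tree Tree

size :: Tree -> Int
size Leaf       = 1
size (Node l r) = size l + size r

summary :: Tree -> String
summary = describe
```

With the three files side by side, `ghc --make B.hs` should check `A` against the boot file first and then check that `B.hs` really provides what `B.hs-boot` promised. A package-level interface, as sketched in this post, would be a collection of such signatures.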

Another benefit of proper type-based dependencies lies in the package development process.  Currently a developer tests the compilation of her package against whichever concrete instantiations of her dependencies happen to exist on her machine. Another such instantiation, also allowed by her dependencies’ version constraints, might lead to unforeseen type or compilation errors. With our language this is no longer an issue, because each dependency describes a precise implementation/version, a precise (typed) interface, or some combination thereof.

4) To solve some of the problems that developers currently face with Hackage.

Version-range dependencies force Haskell package authors to choose between type safety and evolvability of their packages. If an author omits an upper bound on a dependency’s version, then she risks users installing her package against a version of that dependency that no longer offers the expected definitions at their expected types. But if she does write an upper bound, then she artificially limits the utility of her own package whenever the depended-upon package evolves much faster than hers.

The Haskell community has coalesced around an informal (i.e. not mechanically enforced) “package versioning policy” (PVP) to help with this sort of problem. The PVP dictates how version numbers should be incremented; effectively this allows developers to rely on the version numbers themselves as some approximation of a type or interface.
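As a rough sketch of how a version number can stand in for an interface: under the PVP, the first two components of a version `A.B.C` form the major version, which is supposed to change whenever the API changes incompatibly. The helper names below are hypothetical, and the check assumes versions with at least three components.

```haskell
module Main where

import Data.Version (Version, makeVersion, versionBranch)

-- Under the PVP, the first two components (A.B) are the major version.
majorVersion :: Version -> [Int]
majorVersion = take 2 . versionBranch

-- Hypothetical heuristic: code built against `old` is *expected* to build
-- against `new` if the major version is unchanged and `new` is not older.
-- This is a promise made by the package author, not something checked.
expectedCompatible :: Version -> Version -> Bool
expectedCompatible old new =
  majorVersion old == majorVersion new && new >= old

main :: IO ()
main = do
  print (expectedCompatible (makeVersion [1, 2, 0]) (makeVersion [1, 2, 5]))
  print (expectedCompatible (makeVersion [1, 2, 0]) (makeVersion [1, 3, 0]))
```

The first call prints `True` and the second `False`. Nothing here inspects the package's actual types, so the answer is only as reliable as the author's adherence to the policy.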

Another problem developers face is the inability to depend on another package “privately,” i.e. purely for implementation reasons, such as needing a particular QuickCheck API. Private dependencies are often needed so that Cabal’s dependency resolution doesn’t require every package sharing such a dependency to be compiled against the same version of it. Our package language allows this sort of dependency easily due to its emphasis on types and interfaces.

* * *

In future blog posts I’ll talk more about various aspects of the design. Something I’ve deliberately shelved for the foreseeable future is the redesign of the Cabal part of Hackage; i.e., the user-facing tool that offers all the nice, implicit behavior of packages like choosing implementations to install. Stay tuned for more!

1 Derek Dreyer and Andreas Rossberg. “Mixin’ Up the ML Module System,” ICFP’08.
2 See Section 4.9 of the GHC user’s guide.
3 Granted, what it’s “supposed to do” might actually change between versions. In a later post I’ll talk about structural vs. “nominal” signature matching for packages.