Module identity and equivalence in Haskell
In first-order module languages such as Haskell’s, the identity of a module is given solely by its programmer-written name, e.g. Http. Once packages enter the picture, that identity is augmented with the identity of the module’s defining package, e.g. http-4.1:Http. These names induce a straightforward equivalence relation on modules: modules P1:M1 and P2:M2 are equivalent iff P1=P2 and M1=M2.
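To make this relation concrete, here is a minimal Haskell sketch that models a package-qualified module identity as a pair of names. The types PackageId, ModuleName, and ModuleId are hypothetical illustrations, not GHC’s internal representation. The derived Eq instance compares both components, so it coincides with the equivalence just described.

```haskell
-- A hypothetical model of package-qualified module identities.
-- These are illustrative types, not GHC's internal ones.

newtype PackageId = PackageId String   -- e.g. "http-4.1"
  deriving (Eq, Show)

newtype ModuleName = ModuleName String -- e.g. "Http"
  deriving (Eq, Show)

-- A module's identity is its defining package plus its name: P:M.
data ModuleId = ModuleId
  { modPackage :: PackageId
  , modName    :: ModuleName
  } deriving (Eq, Show)

-- P1:M1 and P2:M2 are equivalent iff P1 = P2 and M1 = M2.
-- The derived Eq compares both fields, so it is exactly this relation.
moduleEquiv :: ModuleId -> ModuleId -> Bool
moduleEquiv = (==)

http41, http42 :: ModuleId
http41 = ModuleId (PackageId "http-4.1") (ModuleName "Http")
http42 = ModuleId (PackageId "http-4.2") (ModuleName "Http")

main :: IO ()
main = do
  print (moduleEquiv http41 http41)  -- True
  print (moduleEquiv http41 http42)  -- False: same module name, different package
```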
Why does module identity matter? Because type equivalence is a direct consequence of module equivalence. Two type constructors T1 and T2 are equivalent iff their original, defining modules are equivalent (and T1 and T2 are syntactically the same type constructor). For example, the type constructors http-4.1:Http.Request and http-4.2:Http.Request are *not* equivalent, because the identities of their defining modules are (slightly) different.
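The same idea extends to type constructors. The sketch below, again using hypothetical types rather than GHC’s actual data structures, identifies a type constructor by its defining module together with its name, and shows why the two Request constructors come apart.

```haskell
-- Hypothetical model: a type constructor is identified by its
-- original, defining module together with its own name.

data ModuleId = ModuleId
  { modPackage :: String  -- e.g. "http-4.1"
  , modName    :: String  -- e.g. "Http"
  } deriving (Eq, Show)

data TyCon = TyCon
  { tcModule :: ModuleId  -- the defining module
  , tcName   :: String    -- e.g. "Request"
  } deriving (Eq, Show)

-- T1 and T2 are equivalent iff their defining modules are equivalent
-- and they are syntactically the same constructor name.
tyConEquiv :: TyCon -> TyCon -> Bool
tyConEquiv t1 t2 = tcModule t1 == tcModule t2 && tcName t1 == tcName t2

request41, request42 :: TyCon
request41 = TyCon (ModuleId "http-4.1" "Http") "Request"
request42 = TyCon (ModuleId "http-4.2" "Http") "Request"

main :: IO ()
main = do
  print (tyConEquiv request41 request41)  -- True
  print (tyConEquiv request41 request42)  -- False: defining modules differ
```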
In today’s Haskell, a module’s identity does not depend on the modules it imports. In other words, a module’s identity does not capture the provenance of the module’s entities/components. For example, suppose the server-1.0:Server module imports Http; that import could resolve to either http-4.1:Http or http-4.2:Http, depending on the package environment at build/install time. (All the more reason to formalize the notion of package environment.)
    package http-4.1:
      module Http where
        data Thing = MkThing Bool String

    package http-4.2:
      module Http where
        data Thing = MkThing String

    package server-1.0:
      module Server where
        import Http
        data Conn = MkConn Int Http.Thing
        open :: IO Conn
        open = ...
        read :: Conn -> IO String
        read = ...
Note that a value of type Conn has a different representation depending on whether server-1.0 is built against http-4.1 or http-4.2, even though the identity of the Server module is simply server-1.0:Server in either case. It would be unsound to mix together these two possible interpretations of Conn!
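The gap between identity and representation can be sketched as follows. A package environment is modeled, hypothetically, as a map from module names to the packages that provide them; the function names serverIdentity, thingFields, and connRepr are illustrative only. The point is that the identity of Server ignores the environment, while the (flattened) field layout of Conn does not.

```haskell
import qualified Data.Map as Map

-- Hypothetical model of a package environment: for each module name,
-- the installed package that provides it at build time.
type PackageEnv = Map.Map String String

-- Server's identity is fixed by its own package and name;
-- the environment (and thus its imports) plays no part in it.
serverIdentity :: PackageEnv -> (String, String)
serverIdentity _env = ("server-1.0", "Server")

-- Field types of Http.Thing in each version of the http package,
-- transcribed from the example above.
thingFields :: String -> [String]
thingFields "http-4.1" = ["Bool", "String"]
thingFields "http-4.2" = ["String"]
thingFields other      = error ("unknown package: " ++ other)

-- The flattened representation of Server.Conn (MkConn Int Http.Thing),
-- which depends on where the import of Http resolves.
connRepr :: PackageEnv -> [String]
connRepr env = case Map.lookup "Http" env of
  Just pkg -> "Int" : thingFields pkg
  Nothing  -> error "Http not found in package environment"

env41, env42 :: PackageEnv
env41 = Map.fromList [("Http", "http-4.1")]
env42 = Map.fromList [("Http", "http-4.2")]

main :: IO ()
main = do
  print (serverIdentity env41 == serverIdentity env42)  -- True: same identity
  print (connRepr env41 == connRepr env42)              -- False: different layouts
```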
This ostensible deficiency in the module equivalence relation does not cause problems today, thanks to some additional checks performed by GHC. There’s only one way for both instances of the server module to exist simultaneously (an inflexibility which I’ll bemoan shortly):
1. build and install the server-1.0 package against http-4.1,
2. build some package A against server-1.0,
3. rebuild server-1.0 against http-4.2 and reinstall it, and
4. build some package B against server-1.0.
Now A and B have two distinct views of the server-1.0:Server module — A sees a Bool in its representation of Conn but B does not. The only step remaining is to include modules from A and B into a unified package main, in which the two distinct server modules with identity server-1.0:Server will coexist. See the figure below. (Note that the server-1.0 in the figure is the exact same package code in both instantiations, despite the buggy appearance of modules that have been moved around.)
Fortunately, GHC prevents you from smashing these two worlds together. Each installed package is recorded along with a hash of its dependencies. The first installation of server-1.0 has some hash abc123, but the second has a different hash, 321cba. Moreover, because the package database stores only a single instance of each package, step (3) actually deletes and replaces the original abc123 instance of server-1.0.
When building the modules of A and B, GHC records the hash of the server-1.0 instance each one was built against: abc123 for A and 321cba for B. Then, when trying to configure the combined main package (which depends on A and B), GHC will complain that A is broken, because its (now deleted) abc123-view of server-1.0 no longer exists in the database.
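The whole scenario can be replayed in a small toy model. This is a sketch under stated assumptions, not how GHC or its package tools actually store or check packages: the database is a map from package name to a single instance hash, installing overwrites, and each client remembers the hash it saw. The names install, Client, brokenDeps, and configure are hypothetical, and the hash values are the illustrative ones from the text.

```haskell
import qualified Data.Map as Map

-- Hypothetical model of an installed-package database that keeps a
-- single instance per package name, tagged with a dependency hash.
type Hash      = String
type PackageDb = Map.Map String Hash  -- package name -> instance hash

-- Installing a package replaces any existing instance with the same name.
install :: String -> Hash -> PackageDb -> PackageDb
install = Map.insert

-- A client package remembers which instance (hash) of each dependency it saw.
data Client = Client
  { clientName :: String
  , clientDeps :: [(String, Hash)]
  }

-- The recorded dependency instances that are no longer in the database.
brokenDeps :: PackageDb -> Client -> [(String, Hash)]
brokenDeps db c = [ d | d@(pkg, h) <- clientDeps c, Map.lookup pkg db /= Just h ]

-- Configuring a package on top of several clients fails if any is broken.
configure :: PackageDb -> [Client] -> Either String ()
configure db clients =
  case [ (clientName c, d) | c <- clients, d <- brokenDeps db c ] of
    [] -> Right ()
    ((name, (pkg, h)) : _) ->
      Left (name ++ " is broken: its " ++ h ++ "-view of " ++ pkg ++ " no longer exists")

main :: IO ()
main = do
  let db1 = install "server-1.0" "abc123" Map.empty  -- step 1: against http-4.1
      a   = Client "A" [("server-1.0", "abc123")]     -- step 2: A records abc123
      db2 = install "server-1.0" "321cba" db1         -- step 3: replaces abc123
      b   = Client "B" [("server-1.0", "321cba")]     -- step 4: B records 321cba
  print (configure db2 [a, b])                        -- A is reported as broken
```

Because install overwrites by package name, only one view of server-1.0 can ever be present, which is precisely what lets the stale abc123 view recorded by A be detected.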
GHC keeps everything in check, even though A and B saw distinct modules with the same identity (albeit at different points in time). Whew.
* * *
In the next post I’ll explain how our new package language allows — and reconciles — multiple instantiations of a package. In fact, this is how we achieve generic package reuse and transform packages from chunks of files into real, live programming abstractions! Stay tuned.