I've been working on a package manager prototype here, and the result is something that works but also raises a bunch of questions. I've collected some here, together with the decisions made by the prototype.
Manifest vs "inline"
Nix and Dhall allow for importing dependencies dynamically, using things like fetchGit. Many (most?) other languages require dependencies to be specified in some sort of package manifest. The inline form has the advantage of being lightweight -- you can get it all done with only one file, and you only need to download packages that you actually use during evaluation -- but having a manifest makes it easier to use nickel in a sandboxed environment, and it makes lock-files easier to manage.
The current prototype uses a manifest.
Manifest auto-detection
If we're using an external manifest, how do we find it? It would be nice if the user could just type nickel export foo.ncl and have it just work. The most common method of autodetection appears to be to look in parent directories until we find the manifest file, which has a well-known name. There is a small backwards-compatibility concern with this, in that if a user's system happens to have a file with that well-known name, adding package support to nickel could lead us to misinterpret that file as a manifest.
The current prototype's well-known name is "package.ncl", and the lock-file is called "package.lock". At least the lock-file name should probably change, or no one will be able to use nickel and node in the same project...
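To make the parent-directory search concrete, here is a minimal Python sketch. It is not the prototype's code: the function name find_manifest and its signature are made up for illustration, and only the file name "package.ncl" comes from the prototype.

```python
from pathlib import Path
from typing import Optional

MANIFEST_NAME = "package.ncl"  # the prototype's well-known name


def find_manifest(start: Path, name: str = MANIFEST_NAME) -> Optional[Path]:
    """Return the nearest manifest at or above `start`, or None if there is none.

    Hypothetical helper: `start` may be a directory or a file such as foo.ncl.
    """
    start = start.resolve()
    # If `start` is a file, begin the search in the directory containing it.
    directory = start if start.is_dir() else start.parent
    # Check the starting directory first, then each ancestor up to the root.
    for candidate_dir in (directory, *directory.parents):
        candidate = candidate_dir / name
        if candidate.is_file():
            return candidate
    return None
```

With this shape, nickel export foo.ncl would call something like find_manifest(Path("foo.ncl")) and fall back to manifest-free evaluation when it returns None. The backwards-compatibility concern above is exactly the case where an unrelated file with the well-known name sits in some ancestor directory.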
Kinds of dependencies
Where can dependencies come from? Dhall allows imports from arbitrary urls. Nix supports fetching from a variety of VCSs, paths, and archive formats.
The current nickel prototype allows for importing from:
- paths (relative or absolute)
- git repositories (currently only from HEAD, but the idea would be to also support branches, tags, and revisions specified by hashes)
- a central registry that can identify packages by name and version number
The current prototype also requires the imported package to contain a manifest file. This might not be necessary, but I guess it will be necessary if the imported package wants to have its own dependencies.
Version compatibility and resolution
How do we choose package versions, and how do we handle a package that gets imported multiple times in the dependency tree? This has to depend on the dependency type, I think.
For path dependencies, there is no version choice: we import the version of the dependency that is present on the filesystem at that path. Path dependencies do present some annoyances for the lock-file, though: a path dependency's dependencies can change at any time. Therefore the lock-file should record the existence of a path dependency, but not record its dependencies. (This is consistent with what cargo does.)
Git dependencies can be immutable (if some hash is specified) or not (if a branch or a tag is specified). Immutable git dependencies are easy for the resolver. For mutable git dependencies, if they are not yet present in the lock-file then they are fetched and the tree hash is recorded. After that, the tree hash is looked up in the lock-file and the dependency is treated as an immutable git dependency.
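The git rule above can be sketched in Python as follows. Everything here is an illustrative assumption rather than the prototype's data model: the GitDep type, the lock-file represented as a plain dict keyed by url and ref, and the fetch callback that returns a tree hash.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class GitDep:
    """Hypothetical description of a git dependency."""
    url: str
    ref: Optional[str] = None        # branch or tag: mutable
    tree_hash: Optional[str] = None  # explicit hash: immutable


def resolve_git(dep: GitDep, lock: Dict[str, str],
                fetch: Callable[[GitDep], str]) -> str:
    """Return the tree hash to use for `dep`, recording it in `lock` if needed."""
    if dep.tree_hash is not None:
        # Immutable dependency: nothing to resolve.
        return dep.tree_hash
    key = f"{dep.url}#{dep.ref or 'HEAD'}"
    if key not in lock:
        # First time we see this mutable dependency: fetch and pin it.
        lock[key] = fetch(dep)
    # From here on it behaves like an immutable dependency.
    return lock[key]
```

The point of the sketch is that fetch runs at most once per mutable dependency. Every later resolution reads the pinned hash from the lock-file, even if the branch has moved in the meantime.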
Dependencies from the registry are the most interesting. Fortunately, there are fairly well-established conventions for specifying ranges of versions (like ">=1.0 <3.0", or "^1.2"). What's less clear is how to handle multiple packages with overlapping ranges. Some languages (e.g. python) insist that each package resolves to a single version across the whole dependency tree. Other languages allow multiple versions, keeping track of which package in the dependency tree needs to import which version of a package.
I think we want to allow multiple versions of a package; the alternative can be fragile and annoying. But then we need to figure out how many different versions to allow. There's a trade-off: if we allow pulling in a different version every time a package gets imported, solving the dependency graph is easy. But it increases the chance of getting incompatibilities at runtime: we might accidentally get a value from util@1.1 and try to pass it to an incompatible function defined in util@1.2.
The current prototype uses a strategy similar to cargo: it divides package versions into semver-delimited "bins" and allows resolution to choose at most one version from each bin. That is, we can have a util@2.2 and a util@1.2 in the same dependency tree, but not a util@1.2 and a util@1.1.
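Here is a rough Python sketch of the "bin" idea. The binning rule is an assumption modeled on cargo's compatibility convention (the leftmost non-zero component picks the bin); the prototype's exact rule may differ, and all function names are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Version = Tuple[int, int, int]
Bin = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse "1.2" or "1.2.3" into (major, minor, patch), padding with zeros."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def semver_bin(version: Version) -> Bin:
    """Assumed cargo-like rule: the leftmost non-zero component defines the bin."""
    major, minor, patch = version
    if major > 0:
        return (major,)
    if minor > 0:
        return (0, minor)
    return (0, 0, patch)


def bin_conflicts(resolved: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, Bin], List[str]]:
    """Return every (package, bin) that ended up with more than one version."""
    by_bin = defaultdict(set)
    for name, version in resolved:
        by_bin[(name, semver_bin(parse_version(version)))].add(version)
    return {key: sorted(versions) for key, versions in by_bin.items() if len(versions) > 1}
```

Under this rule, bin_conflicts([("util", "2.2"), ("util", "1.2")]) is empty, while util@1.2 together with util@1.1 is reported as a conflict in bin (1,). That matches the example above.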
Lock-file behavior and updates
What happens if we have a lock-file, but we modify the manifest? We don't want to be too strict about requiring the exact versions in the lock-file, or we'll end up forcing the user to re-create the lock-file from scratch.
The current prototype treats the lock-file as a suggestion: during resolving, when choosing the next package version to try, it picks the locked version first. But if the locked version leads to a conflict, it will try another version without complaining. If nothing has changed since the lock-file was created, it should always resolve the same versions.
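One way to picture "lock-file as a suggestion" is as an ordering on the candidates the resolver tries. This Python sketch is an assumption about how that could look: the function name is hypothetical, and trying the remaining versions newest-first is my guess, not something the prototype is stated to do.

```python
from typing import List, Optional


def candidate_order(available: List[str], locked: Optional[str]) -> List[str]:
    """Order candidate versions: the locked one first, then newest to oldest."""
    def sort_key(version: str):
        # Compare versions numerically, so that 1.10.0 sorts after 1.9.0.
        return tuple(int(part) for part in version.split("."))

    newest_first = sorted(available, key=sort_key, reverse=True)
    if locked in newest_first:
        # The locked version is only a preference: it moves to the front,
        # but the other candidates stay available if it leads to a conflict.
        newest_first.remove(locked)
        return [locked, *newest_first]
    return newest_first
```

A backtracking resolver that walks this list in order picks the locked version whenever it is still valid, and silently moves on otherwise. If neither the manifest nor the registry has changed, the locked version is always tried first and always succeeds, so resolution is reproducible.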
Registry updates, and submitting packages
How should we manage the global registry? There's a potential for incurring substantial maintenance costs here, so we should be careful.
The current implementation of the registry is a git repo with a bunch of files (one per package, each containing one line per version). Each entry specifies the location of the package (currently required to be on github) and its git tree hash. This ensures that packages are immutable, but it doesn't stop them from disappearing: we don't keep a copy of the actual package contents.
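To illustrate the one-file-per-package, one-line-per-version layout, here is a Python sketch of a reader for such a file. The on-disk encoding is purely an assumption: I use one JSON object per line with the made-up field names version, repo and tree_hash. The prototype's actual format is not described here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry entry: one published version of a package."""
    version: str
    repo: str       # location of the package, e.g. a github "org/name"
    tree_hash: str  # git tree hash, which pins the exact contents


def parse_package_file(text: str) -> List[RegistryEntry]:
    """Parse a registry file holding one JSON object per non-empty line."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        raw = json.loads(line)
        entries.append(RegistryEntry(raw["version"], raw["repo"], raw["tree_hash"]))
    return entries
```

A line-per-version format keeps registry updates append-only, which plays well with git diffs and with a scraper that only ever adds versions.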
The current prototype doesn't have any automatic way of introducing new packages. There is a command to scrape package repos and update the list of available versions, so the initial plan is to add new packages manually, and use a cron job to keep them sort of up-to-date.
Registry namespacing
I think packages in the registry should be namespaced, probably with a depth of 2. That is, they should be identified as organization/package-name. This maps nicely to github names, and so if we enable automatic package submission in the future, it will allow us to outsource authorization: you can publish tweag/foo if you're in the github tweag organization.
There is a possible downside of tying this too tightly to github. Maybe there should be depth-3 names, like github/tweag/foo?
Manifest file format
The prototype has its manifest in nickel format. This seemed like a fun choice (and it allows us to use a contract for validation and auto-complete), but a plain-data format like toml might be better for tooling.
Specifying dependency names
How should we refer to package names in the manifest, and in nickel code? The syntax should be light-weight and unambiguous, but it should also support package renaming.
In the current prototype, the manifest explicitly assigns an identifier to every package. For example, your manifest could include
Then the actual nickel code can write import foo or import bar.
This choice has the advantage that renaming packages is trivial, but the disadvantage that the manifest syntax is redundant in the common case. Another possibility would be to allow
dependencies = {
"tweag/foo" = "1.2.0"
}
and then import it with import tweag/foo.
Package entry points
Packages might consist of multiple files, and they might not want to publicly expose the detail of how they're structured. How do we know what part is public?
node allows the manifest to specify the entry point(s). Our prototype hard-codes "main.ncl"; when you type import foo, you get the file main.ncl in the package's root directory.
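The hard-coded entry point amounts to a very small lookup, sketched below in Python. The mapping from manifest identifiers to package root directories (package_roots) and the function name are assumptions for illustration; only "main.ncl" comes from the prototype.

```python
from pathlib import Path
from typing import Dict

ENTRY_POINT = "main.ncl"  # hard-coded by the prototype


def resolve_import(name: str, package_roots: Dict[str, Path]) -> Path:
    """Map `import <name>` to the entry-point file of the package it names."""
    try:
        root = package_roots[name]
    except KeyError:
        raise LookupError(
            f"unknown package {name!r}; is it declared in the manifest?"
        ) from None
    # Only the entry point is reachable through the package name; other
    # files in the package stay private unless main.ncl re-exports them.
    return root / ENTRY_POINT
```

Supporting manifest-declared entry points, as node does, would mostly mean replacing the ENTRY_POINT constant with a per-package value read from that package's manifest.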
What kind of tooling do we need?
The current prototype doesn't have much. We probably want:
- a command for adding a new dependency to the manifest (checking if it exists, and picking the most recent version)
- a command for downloading the dependency tree (for use in build systems that expect different "fetch" and "build" phases)
- a command that checks for new dependency versions and updates the manifest
Anything else?