-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support lockfiles, take 2 #1138
Conversation
Before I dive in, does this also address #1134? |
spago.lock
Outdated
needed_by: | ||
- spago-bin | ||
- spago | ||
- docs-search-index | ||
- docs-search-client-halogen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both the workspace packages and the plan packages now include a needed_by
key, so we don't even spend time computing a plan (which takes time), and instead allows for a blanket approach of "read everything in, filter for what we need, and that's our plan"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following this — if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?
In other words what is the difference between a) building a workspace package by looking up each of its dependencies in turn vs. b) build it by filtering by dependencies that list it in their needed_by? Aren't they equivalent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, lockfiles like pnpm
's list dependencies at their resolved version for every package:
https://lfx.rushstack.io/pages/concepts/pnpm_lockfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?
The list of dependencies is just one layer of the tree, but to get a build plan you'd have to keep doing lookups until you reach the bottom of the tree (i.e. packages with no dependencies), and this takes time (about 300ms for a package set build on my M1).
Having a needed_by
allows for storing the whole plan in there, instead of a single layer, so the retrieval is immediate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Would the same be achieved by listing the full resolutions for packages instead of just their explicit listed dependencies? That way a build plan for e.g. the spago-bin
package is already stored as its list of resolutions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you'd have a resolutions
key for each workspace package listing its build plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, each workspace package would have a dependencies
key which reflects the spago.yaml
file so we can detect if they've changed, and then a resolutions
key based on the dependencies
that specifies the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, let me try it out
toArray :: forall k v. Map k v -> Array (Tuple k v) | ||
toArray = Map.toUnfoldable | ||
({ packages, workspacePackages } :: LockfileBuilderResult) <- | ||
Array.foldM processPackage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We used to just map the packages into the lockfile data structure, but we now use a fold to go through them so we can keep track of which workspace package needs which other package, which makes it dead easy to figure out the build plan afterwards
src/Spago/Command/Fetch.purs
Outdated
case workspace.packageSet.lockfile of | ||
-- If we have a lockfile we can just use that - we don't even need build a plan, we can just filter the packages | ||
-- marked as needed by the workspace package that we are processing, as we just dumped the plan in the lockfile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Computing the build plan when we have a lockfile is basically "filter by needed_by
"
src/Spago/Config.purs
Outdated
workspacePackageToLockfilePackage :: WorkspacePackage -> Tuple PackageName Lock.WorkspaceLockPackage | ||
workspacePackageToLockfilePackage { path, package } = Tuple package.name | ||
{ path | ||
, dependencies: package.dependencies | ||
, test_dependencies: foldMap _.dependencies package.test | ||
, needed_by: mempty -- Note: this is filled in later | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These functions feel like they belong to the Lock
module, but that would create a cycle, so they are stuck here. I think it's fine, not too much code
src/Spago/Config.purs
Outdated
shouldComputeNewLockfile :: { workspace :: Core.WorkspaceConfig, workspacePackages :: Map PackageName WorkspacePackage } -> Lock.WorkspaceLock -> Boolean | ||
shouldComputeNewLockfile { workspace, workspacePackages } workspaceLock = | ||
-- the workspace packages should exactly match, except for the needed_by field, which is filled in during build plan construction | ||
(map (workspacePackageToLockfilePackage >>> snd) workspacePackages == (map (_ { needed_by = mempty }) workspaceLock.packages)) | ||
-- and the extra packages should exactly match | ||
&& (fromMaybe Map.empty workspace.extra_packages == workspaceLock.extra_packages) | ||
-- and the package set address needs to match - we have no way to match the package set contents at this point, so we let it be | ||
&& (workspace.package_set == map _.address workspaceLock.package_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the bit of logic that computes when the lockfile needs a refresh - it's a bit involved because some things are generated and we don't have them at the point where we read the lockfile in (like, we stash the whole package set in there), but hopefully it's commented enough to make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And dumping the package set in there is important so that we can then:
- skip hashing it, since we just have the content
- use it for commands like
spago ls
, where we can use just the lockfile as well, instead of having to look up the set, etc etc
@thomashoneyman CI fails without that patch, so I pushed the fix for #1134 as well! |
bin/src/Main.purs
Outdated
@@ -528,7 +547,7 @@ main = do | |||
logInfo "Set up a new Spago project." | |||
logInfo "Try running `spago run`" | |||
Fetch args -> do | |||
{ env, fetchOpts } <- mkFetchEnv offline args | |||
{ env, fetchOpts } <- mkFetchEnv offline (Record.insert (Proxy :: _ "isRepl") false args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for the choice to use insert
here vs. merge
on line 562? Arbitrary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to use insert
if I'm only dealing with one key, but if you prefer merge I could try to swap it in and see if that works (all the record things are quite finicky when it comes to envs, for some reason)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't prefer either, it just looked like you were using merge
with one field below and I didn't know if there was a reason to choose one or another that emerged from your env-wrangling :)
@@ -42,7 +42,7 @@ type BuildOptions = | |||
, jsonErrors :: Boolean | |||
} | |||
|
|||
run :: forall a. BuildOptions -> Spago (BuildEnv a) Unit | |||
run :: BuildOptions -> Spago (BuildEnv _) Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason to wildcard this? Won't this produce warnings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question in the other modules, like Docs
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will produce warnings, but we should filter them out - I added all the wildcards after chatting with @mikesol (on Discord, here) about fitting all of our Envs together, and found that using wildcards gets us better inference - I had a few puzzling errors just go away once I swapped them in. (Mike has a whole blog post about this)
src/Spago/Command/Publish.purs
Outdated
, getMetadata: env.getMetadata | ||
, workspace: env.workspace { selected = Just selected } | ||
, logOptions: env.logOptions | ||
-- FIXME: I think we can make it work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a leftover FIXME? Or does it need to be addressed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was hoping to fix this one too, but I can't find a good way to make it work. So I'll think about it some more or take out this FIXME
type PreRegistryEnv a = Record (PreRegistryEnvRow a) | ||
|
||
type RegistryEnvRow a = PreRegistryEnvRow | ||
( getRegistry :: Spago (PreRegistryEnv ()) RegistryFunctions | ||
| a | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow why we have the "pre-registry" row and then the "registry" row with the extra getRegistry
function. Would you mind explaining it a little?
Is it solely to allow the registry functions to refer to the "pre-registry" env themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the compiler would try to infer an infinite type if these would run in the same env, so I had to separate them
data BuildType | ||
= RegistrySolverBuild PackageMap | ||
| PackageSetBuild PackageSetInfo PackageMap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this means you are either building with the solver, ie. you use dependency ranges, or you're building with a package set, ie. you don't use ranges. What happens in the hybrid cases?
- A user provides ranges, but also specified a package set and extra_packages
- A user provides some ranges and also specifies a package set
In the first case we have what we need for a solver build, but should we take their package set instructions into account? If not, do we warn them? For the second case is this just an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this means you are either building with the solver, ie. you use dependency ranges, or you're building with a package set, ie. you don't use ranges. What happens in the hybrid cases?
There's no hybrid case. If you have a package set then you are building with it, if you don't have one then you are using the solver.
We throw away the ranges when we get a build plan with the package set - see here, we have a Map PackageName Range
, and call Map.keys
:
spago/src/Spago/Command/Fetch.purs
Line 399 in d7128c7
PackageSetBuild _info set -> getTransitiveDepsFromPackageSet set (Array.fromFoldable $ Map.keys depsRanges) |
If the project has a package set the ranges only matter when publishing, where we build with both the package set and the solver (so the ranges are checked, just not as often as in a solver build)
src/Spago/Config.purs
Outdated
@@ -149,8 +148,8 @@ data Package | |||
|
|||
-- | Reads all the configurations in the tree and builds up the Map of local | |||
-- | packages to be integrated in the package set | |||
readWorkspace :: forall a. Maybe PackageName -> Spago (Git.GitEnv a) Workspace | |||
readWorkspace maybeSelectedPackage = do | |||
readWorkspace :: Maybe PackageName -> Boolean -> Spago (Registry.RegistryEnv _) Workspace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always a little leery of booleans as arguments; what do you think of either making this a record:
{ selectedPackage :: Maybe PackageName
, pureBuild :: Boolean
}
or a data type
data PureBuild = NotPure | Pure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, either works
On light review this looks quite good; I like the introduction of the locking mechanism for access to refreshing the registry. Did you consider debouncing it as well such that we only refresh the registry at most once every N seconds? Maybe that's a premature optimization, but it's something I had to do in the registry so it came to mind. My main comment has to do with the |
-- TODO: we should look in the "other" packages first | ||
Registry _other -> do | ||
maybeMetadata <- runSpago { logOptions } (getMetadata supportPackageName) | ||
RegistrySolverBuild _other -> do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick thought on solver vs. package set builds: should spago init
require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should spago init require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?
We force users that publish their package to specify ranges, so I don't see a reason to enforce this upfront.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, if you mean "allow users to init a project without a package set", then we already have it:
spago init --use-solver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant we may want to make them choose --use-solver
or --use-package-sets
to avoid people creating a new library with spago init
and then only adding ranges once they go to publish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory of trying to get a Bower project to work and stepping through whole pages or solver errors in bright red ink is still fresh in me - sure, our solver is much more sensible than Bower's, but I feel the main reason the majority of the community is using package sets today is that they are just less complex to deal with and users can concentrate on actually writing the thing instead of dealing with builds. (as a user I would like that to be my experience at least, and I'm even in that small slice of the demographic that likes dealing with builds!)
So yeah, I would prefer to keep package sets as default because they are easy, and have solver builds as the "yes yes I know what I'm about to get into" opt-in option.
Not a premature optimization, it was in fact the first optimization we had around this, and got to it already in the early days of the codebase - it's very annoying to have to wait for a We debounce Registry pulls so that they happen every 15 minutes at most, see here: Lines 286 to 309 in d7128c7
|
@thomashoneyman ready for a look again |
Looks like the lockfile may need to be regenerated, according to this CI failure? |
Looks pretty good! I did find that if I install a package it is added to my spago.yaml config, but it isn't added to the lockfile and I always get a message that I'm missing a lockfile even though I do have one. On first install:
This adds to the spago.yaml but not to the spago.lock. Then, on a re-run of the install command:
Now I do see it added in the spago.lock file too. The same seems to happen with uninstalling. If I uninstall aff-bus now, I see this message:
The lockfile isn't updated, though. If I try again:
The lockfile is still not updated and lists Note: this is with spago@0.93.21, so not the very latest of this branch, so it's possible this was fixed along the way. |
Also, it seems that if I run |
Eh, the patch includes a change that moves from a fatal error to a warning (if we can't decode the file we just move on and make a new one) - I published a new version (0.93.22), but CI doesn't seem to be picking that one up? |
I've seen this one happen, but since a long time, so I don't think the lockfile is involved |
I'll have a look at these - we have a test that should test this, but it's likely not thorough enough |
It doesn't look like 0.93.22 ever made it to NPM: |
Ah that must be it then! I'll push a new one once I manage to patch the holes you found 😄 |
@thomashoneyman I should have addressed everything now - pushed 0.93.22 with the latest changes, and added a test that checks the lockfile when installing/uninstalling (the scenarios you tried). The patches also fixed up a few other fixtures, which I am happy about |
This PR fixes #917 (and fixes #1134 as well while we're at it), adding lockfile support for all builds (solver and package sets), with no opt-out, i.e. all spago invocations that imply a build plan will now output a lockfile, or use the existing one to save work.
We also add a
--pure
flag to skip updating the lockfile even if it's deemed out of date.Note: this is a breaking change for the lockfile format, but right now they are so broken that no one should be using them 🙂