Support lockfiles, take 2 #1138

Merged: 9 commits merged into master on Dec 20, 2023

Conversation

@f-f (Member) commented Dec 10, 2023

This PR fixes #917 (and fixes #1134 as well while we're at it), adding lockfile support for all builds (solver and package sets), with no opt-out, i.e. all spago invocations that imply a build plan will now output a lockfile, or use the existing one to save work.

We also add a --pure flag to skip updating the lockfile even if it's deemed out of date.
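For example, a build that reuses the existing spago.lock as-is and never rewrites it (a usage sketch; exactly which subcommands end up accepting the flag is whatever the PR wires up):

➜ spago build --pure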

Note: this is a breaking change for the lockfile format, but right now they are so broken that no one should be using them 🙂

@thomashoneyman (Member):

Before I dive in, does this also address #1134?

spago.lock Outdated
Comment on lines 42 to 46
needed_by:
- spago-bin
- spago
- docs-search-index
- docs-search-client-halogen
Member Author:

Both the workspace packages and the plan packages now include a needed_by key, so we don't even spend time computing a plan (which takes time); instead this allows for a blanket approach of "read everything in, filter for what we need, and that's our plan".

Member:

I'm not sure I'm following this — if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?

In other words, what is the difference between a) building a workspace package by looking up each of its dependencies in turn vs. b) building it by filtering for dependencies that list it in their needed_by? Aren't they equivalent?

Member:

For example, lockfiles like pnpm's list dependencies at their resolved version for every package:
https://lfx.rushstack.io/pages/concepts/pnpm_lockfile

@f-f (Member Author), Dec 10, 2023:

if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?

The list of dependencies is just one layer of the tree, but to get a build plan you'd have to keep doing lookups until you reach the bottom of the tree (i.e. packages with no dependencies), and this takes time (about 300ms for a package set build on my M1).
Having a needed_by allows for storing the whole plan in there, instead of a single layer, so the retrieval is immediate.
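For a sense of what that repeated lookup looks like, a naive sketch (illustrative only, not spago's actual implementation):

module TransitiveDepsSketch where

import Prelude

import Data.Array as Array
import Data.Map (Map)
import Data.Map as Map
import Data.Maybe (fromMaybe)
import Data.Set (Set)
import Data.Set as Set

type PackageName = String

-- Walk the dependency index layer by layer, starting from the direct
-- dependencies, until no new packages show up.
transitiveDeps :: Map PackageName (Array PackageName) -> Array PackageName -> Set PackageName
transitiveDeps index = go Set.empty
  where
  go seen [] = seen
  go seen queue =
    let
      new = Array.filter (\p -> not (Set.member p seen)) (Array.nub queue)
      next = Array.concatMap (\p -> fromMaybe [] (Map.lookup p index)) new
    in
      go (Set.union seen (Set.fromFoldable new)) next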

Member:

I see. Would the same be achieved by listing the full resolutions for packages instead of just their explicitly listed dependencies? That way a build plan for e.g. the spago-bin package is already stored as its list of resolutions.

Member Author:

So you'd have a resolutions key for each workspace package listing its build plan?

Member:

Yea, each workspace package would have a dependencies key which reflects the spago.yaml file so we can detect if they've changed, and then a resolutions key based on the dependencies that specifies the plan.
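A hypothetical workspace entry under that scheme (package names and layout purely illustrative, not a final format):

spago-bin:
  path: bin
  dependencies:
    - aff
    - spago
  resolutions:
    - aff
    - avar
    - effect
    - spago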

Member Author:

Makes sense, let me try it out

toArray :: forall k v. Map k v -> Array (Tuple k v)
toArray = Map.toUnfoldable
({ packages, workspacePackages } :: LockfileBuilderResult) <-
Array.foldM processPackage
Member Author:

We used to just map the packages into the lockfile data structure, but we now use a fold to go through them so we can keep track of which workspace package needs which other package, which makes it dead easy to figure out the build plan afterwards
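A minimal sketch of that kind of fold (illustrative names and types, not the PR's actual processPackage):

module NeededBySketch where

import Prelude

import Data.Foldable (foldl)
import Data.Map (Map)
import Data.Map as Map
import Data.Set (Set)
import Data.Set as Set
import Data.Tuple (Tuple(..))

type PackageName = String

-- For every (workspace package, its transitive deps) pair, record the workspace
-- package in the needed_by set of each of its dependencies.
neededBy :: Array (Tuple PackageName (Set PackageName)) -> Map PackageName (Set PackageName)
neededBy = foldl addPackage Map.empty
  where
  addPackage acc (Tuple workspacePkg deps) =
    foldl (\m dep -> Map.insertWith Set.union dep (Set.singleton workspacePkg) m) acc deps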

Comment on lines 379 to 381
case workspace.packageSet.lockfile of
  -- If we have a lockfile we can just use that - we don't even need to build a plan, we can just filter the packages
  -- marked as needed by the workspace package that we are processing, as we just dumped the plan in the lockfile.
Member Author:

Computing the build plan when we have a lockfile is basically "filter by needed_by"
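Reading the plan back out is then roughly this (a trimmed-down sketch with a hypothetical LockEntry type, not the PR's exact code):

module LockfilePlanSketch where

import Data.Map (Map)
import Data.Map as Map
import Data.Set (Set)
import Data.Set as Set

type PackageName = String

-- Hypothetical, stripped-down lockfile entry: just the reverse-dependency set.
type LockEntry = { needed_by :: Set PackageName }

-- The build plan for a workspace package is every lockfile package whose
-- needed_by set mentions it.
planFor :: PackageName -> Map PackageName LockEntry -> Map PackageName LockEntry
planFor selected = Map.filter (\entry -> Set.member selected entry.needed_by)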

Comment on lines 394 to 400
workspacePackageToLockfilePackage :: WorkspacePackage -> Tuple PackageName Lock.WorkspaceLockPackage
workspacePackageToLockfilePackage { path, package } = Tuple package.name
  { path
  , dependencies: package.dependencies
  , test_dependencies: foldMap _.dependencies package.test
  , needed_by: mempty -- Note: this is filled in later
  }
Member Author:

These functions feel like they belong to the Lock module, but that would create a cycle, so they are stuck here. I think it's fine, not too much code

Comment on lines 402 to 409
shouldComputeNewLockfile :: { workspace :: Core.WorkspaceConfig, workspacePackages :: Map PackageName WorkspacePackage } -> Lock.WorkspaceLock -> Boolean
shouldComputeNewLockfile { workspace, workspacePackages } workspaceLock =
  -- the workspace packages should exactly match, except for the needed_by field, which is filled in during build plan construction
  (map (workspacePackageToLockfilePackage >>> snd) workspacePackages == (map (_ { needed_by = mempty }) workspaceLock.packages))
    -- and the extra packages should exactly match
    && (fromMaybe Map.empty workspace.extra_packages == workspaceLock.extra_packages)
    -- and the package set address needs to match - we have no way to match the package set contents at this point, so we let it be
    && (workspace.package_set == map _.address workspaceLock.package_set)
Member Author:

This is the bit of logic that computes when the lockfile needs a refresh - it's a bit involved because some things are generated and we don't have them at the point where we read the lockfile in (like, we stash the whole package set in there), but hopefully it's commented enough to make sense

@f-f (Member Author), Dec 10, 2023:

And dumping the package set in there is important so that we can then:

  1. skip hashing it, since we just have the content
  2. use it for commands like spago ls, where we can use just the lockfile as well, instead of having to look up the set, etc etc

@f-f (Member Author) commented Dec 10, 2023

@thomashoneyman CI fails without that patch, so I pushed the fix for #1134 as well!

@@ -528,7 +547,7 @@ main = do
logInfo "Set up a new Spago project."
logInfo "Try running `spago run`"
Fetch args -> do
{ env, fetchOpts } <- mkFetchEnv offline args
{ env, fetchOpts } <- mkFetchEnv offline (Record.insert (Proxy :: _ "isRepl") false args)
Member:

Any reason for the choice to use insert here vs. merge on line 562? Arbitrary?

@f-f (Member Author), Dec 11, 2023:

I tend to use insert if I'm only dealing with one key, but if you prefer merge I could try to swap it in and see if that works (all the record things are quite finicky when it comes to envs, for some reason)
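For reference, the two shapes on a toy record (a sketch only, not spago's actual env types):

module InsertVsMergeSketch where

import Record as Record
import Type.Proxy (Proxy(..))

args :: { offline :: Boolean }
args = { offline: false }

-- insert adds exactly one field; the compiler checks the record lacks it already
withInsert :: { isRepl :: Boolean, offline :: Boolean }
withInsert = Record.insert (Proxy :: _ "isRepl") false args

-- merge combines two records, so it scales to several fields at once
withMerge :: { isRepl :: Boolean, offline :: Boolean }
withMerge = Record.merge { isRepl: false } args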

Member:

I don't prefer either, it just looked like you were using merge with one field below and I didn't know if there was a reason to choose one or another that emerged from your env-wrangling :)

@@ -42,7 +42,7 @@ type BuildOptions =
, jsonErrors :: Boolean
}

run :: forall a. BuildOptions -> Spago (BuildEnv a) Unit
run :: BuildOptions -> Spago (BuildEnv _) Unit
Member:

What's the reason to wildcard this? Won't this produce warnings?

Member:

Same question in the other modules, like Docs below.

Member Author:

It will produce warnings, but we should filter them out - I added all the wildcards after chatting with @mikesol (on Discord, here) about fitting all of our Envs together, and found that using wildcards gets us better inference - I had a few puzzling errors just go away once I swapped them in. (Mike has a whole blog post about this)

, getMetadata: env.getMetadata
, workspace: env.workspace { selected = Just selected }
, logOptions: env.logOptions
-- FIXME: I think we can make it work
@thomashoneyman (Member), Dec 11, 2023:

Is this a leftover FIXME? Or does it need to be addressed?

Member Author:

I was hoping to fix this one too, but I can't find a good way to make it work. So I'll think about it some more or take out this FIXME

Comment on lines +52 to +57
type PreRegistryEnv a = Record (PreRegistryEnvRow a)

type RegistryEnvRow a = PreRegistryEnvRow
  ( getRegistry :: Spago (PreRegistryEnv ()) RegistryFunctions
  | a
  )
@thomashoneyman (Member), Dec 11, 2023:

I'm not sure I follow why we have the "pre-registry" row and then the "registry" row with the extra getRegistry function. Would you mind explaining it a little?

Is it solely to allow the registry functions to refer to the "pre-registry" env themselves?

Member Author:

Yeah, the compiler would try to infer an infinite type if these ran in the same env, so I had to separate them.

Comment on lines +127 to +129
data BuildType
  = RegistrySolverBuild PackageMap
  | PackageSetBuild PackageSetInfo PackageMap
Member:

Presumably this means you are either building with the solver, i.e. you use dependency ranges, or you're building with a package set, i.e. you don't use ranges. What happens in the hybrid cases?

  • A user provides ranges, but also specified a package set and extra_packages
  • A user provides some ranges and also specifies a package set

In the first case we have what we need for a solver build, but should we take their package set instructions into account? If not, do we warn them? For the second case is this just an error?

@f-f (Member Author), Dec 11, 2023:

Presumably this means you are either building with the solver, i.e. you use dependency ranges, or you're building with a package set, i.e. you don't use ranges. What happens in the hybrid cases?

There's no hybrid case. If you have a package set then you are building with it, if you don't have one then you are using the solver.

We throw away the ranges when we get a build plan with the package set - see here, we have a Map PackageName Range, and call Map.keys:

PackageSetBuild _info set -> getTransitiveDepsFromPackageSet set (Array.fromFoldable $ Map.keys depsRanges)

If the project has a package set the ranges only matter when publishing, where we build with both the package set and the solver (so the ranges are checked, just not as often as in a solver build)

@@ -149,8 +148,8 @@ data Package

-- | Reads all the configurations in the tree and builds up the Map of local
-- | packages to be integrated in the package set
readWorkspace :: forall a. Maybe PackageName -> Spago (Git.GitEnv a) Workspace
readWorkspace maybeSelectedPackage = do
readWorkspace :: Maybe PackageName -> Boolean -> Spago (Registry.RegistryEnv _) Workspace
Member:

I'm always a little leery of booleans as arguments; what do you think of either making this a record:

{ selectedPackage :: Maybe PackageName
, pureBuild :: Boolean
}

or a data type

data PureBuild = NotPure | Pure

Member Author:

Sure, either works

@thomashoneyman (Member):

On light review this looks quite good; I like the introduction of the locking mechanism for access to refreshing the registry. Did you consider debouncing it as well such that we only refresh the registry at most once every N seconds? Maybe that's a premature optimization, but it's something I had to do in the registry so it came to mind.

My main comment has to do with the needed_by key vs. just storing resolutions directly as I find the latter a little easier to work with, conceptually. Once you make a decision on that I can review again and test this out.

-- TODO: we should look in the "other" packages first
Registry _other -> do
maybeMetadata <- runSpago { logOptions } (getMetadata supportPackageName)
RegistrySolverBuild _other -> do
Member:

A quick thought on solver vs. package set builds: should spago init require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?

Member Author:

should spago init require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?

We force users that publish their package to specify ranges, so I don't see a reason to enforce this upfront.

@f-f (Member Author), Dec 11, 2023:

Oh, if you mean "allow users to init a project without a package set", then we already have it:

spago init --use-solver

Member:

I meant we may want to make them choose --use-solver or --use-package-sets to avoid people creating a new library with spago init and then only adding ranges once they go to publish

Member Author:

The memory of trying to get a Bower project to work and stepping through whole pages of solver errors in bright red ink is still fresh in my mind - sure, our solver is much more sensible than Bower's, but I feel the main reason the majority of the community is using package sets today is that they are just less complex to deal with, and users can concentrate on actually writing the thing instead of dealing with builds. (As a user I would like that to be my experience at least, and I'm even in that small slice of the demographic that likes dealing with builds!)

So yeah, I would prefer to keep package sets as default because they are easy, and have solver builds as the "yes yes I know what I'm about to get into" opt-in option.

@f-f (Member Author) commented Dec 11, 2023

Did you consider debouncing it as well such that we only refresh the registry at most once every N seconds? Maybe that's a premature optimization, but it's something I had to do in the registry so it came to mind.

Not a premature optimization - it was in fact the first optimization we had around this, and we got to it already in the early days of the codebase: it's very annoying to have to wait for a git pull so often.

We debounce Registry pulls so that they happen at most once every 15 minutes, see here:

shouldFetchRegistryRepos :: forall a. Db -> Spago (LogEnv a) Boolean
shouldFetchRegistryRepos db = do
  now <- liftEffect $ Now.nowDateTime
  let registryKey = "registry"
  maybeLastRegistryFetch <- liftEffect $ Db.getLastPull db registryKey
  case maybeLastRegistryFetch of
    -- No record, so we have to fetch
    Nothing -> do
      logDebug "No record of last registry pull, will fetch"
      liftEffect $ Db.updateLastPull db registryKey now
      pure true
    -- We have a record, so we check if it's old enough
    Just lastRegistryFetch -> do
      let staleAfter = Minutes 15.0
      let (timeDiff :: Minutes) = DateTime.diff now lastRegistryFetch
      let isOldEnough = timeDiff > staleAfter
      -- We check if it's old, but also if we have it at all
      registryExists <- FS.exists Paths.registryPath
      if isOldEnough || not registryExists then do
        logDebug "Registry is old, refreshing"
        liftEffect $ Db.updateLastPull db registryKey now
        pure true
      else do
        pure false

@f-f (Member Author) commented Dec 18, 2023

@thomashoneyman ready for a look again

@thomashoneyman (Member):

Looks like the lockfile may need to be regenerated, according to this CI failure?

@thomashoneyman (Member) commented Dec 18, 2023

Looks pretty good!

I did find that if I install a package it is added to my spago.yaml config, but it isn't added to the lockfile and I always get a message that I'm missing a lockfile even though I do have one.

On first install:

➜ spago install -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

Adding 1 packages to the config in bin/spago.yaml
No lockfile found, generating one...
Lockfile written to spago.lock. Please commit this file.
Downloading dependencies...
Building...
           Src   Lib   All
Warnings     0     0     0
Errors       0     0     0

✅ Build succeeded.

This adds to the spago.yaml but not to the spago.lock. Then, on a re-run of the install command:

➜ spago install -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

⚠️ You tried to install some packages that are already present in the configuration, proceeding anyways:
  - aff-bus
Adding 1 packages to the config in bin/spago.yaml
No lockfile found, generating one...
Lockfile written to spago.lock. Please commit this file.
Downloading dependencies...
Fetching package aff-bus@6.0.0
Building...

Now I do see it added in the spago.lock file too. The same seems to happen with uninstalling. If I uninstall aff-bus now, I see this message:

➜ spago uninstall -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

Removing the following source dependencies:
  aff-bus

The lockfile isn't updated, though. If I try again:

➜ spago uninstall -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

⚠️ The following packages cannot be uninstalled because they are not declared in the package's source dependencies:
  aff-bus
The package config for bin was not updated.

The lockfile is still not updated and lists aff-bus as one of my package's dependencies even though that's not true, as reflected in the spago.yaml

Note: this is with spago@0.93.21, which is not the very latest of this branch, so it's possible this was fixed along the way.

@thomashoneyman (Member):

Also, it seems that if I run spago build in the case my lockfile is incorrect (out of date, missing, etc.) then my workspaces are built. Then, if I run spago build again I expect a no-op, but instead my project sources and anything in extra_packages seem to get recompiled a second time. Is there some sort of cache issue there?

@f-f (Member Author) commented Dec 19, 2023

Looks like the lockfile may need to be regenerated, according to this CI failure?

Eh, the patch includes a change that moves from a fatal error to a warning (if we can't decode the file we just move on and make a new one) - I published a new version (0.93.22), but CI doesn't seem to be picking that one up?

@f-f (Member Author) commented Dec 19, 2023

Also, it seems that if I run spago build in the case my lockfile is incorrect (out of date, missing, etc.) then my workspaces are built. Then, if I run spago build again I expect a no-op, but instead my project sources and anything in extra_packages seem to get recompiled a second time. Is there some sort of cache issue there?

I've seen this one happen, but it has been around for a long time, so I don't think the lockfile is involved.

@f-f (Member Author) commented Dec 19, 2023

I did find that if I install a package it is added to my spago.yaml config, but it isn't added to the lockfile and I always get a message that I'm missing a lockfile even though I do have one.

The lockfile is still not updated and lists aff-bus as one of my package's dependencies even though that's not true, as reflected in the spago.yaml

I'll have a look at these - we have a test that should cover this, but it's likely not thorough enough.

@thomashoneyman (Member):

It doesn't look like 0.93.22 ever made it to NPM:
https://registry.npmjs.org/spago

@f-f (Member Author) commented Dec 19, 2023

It doesn't look like 0.93.22 ever made it to NPM

Ah that must be it then! I'll push a new one once I manage to patch the holes you found 😄

@f-f (Member Author) commented Dec 20, 2023

@thomashoneyman I should have addressed everything now - pushed 0.93.22 with the latest changes, and added a test that checks the lockfile when installing/uninstalling (the scenarios you tried). The patches also fixed up a few other fixtures, which I am happy about

@f-f merged commit ebde68c into master on Dec 20, 2023 (3 checks passed)
@f-f deleted the finish-lockfile-support branch on December 20, 2023 at 21:37

Successfully merging this pull request may close these issues:
  - Don't require a checkout of the registry repo when building with a lockfile
  - Support spago.lock files

3 participants