Support lockfiles, take 2 #1138

Merged: 9 commits merged into master on Dec 20, 2023

Conversation

@f-f (Member) commented Dec 10, 2023

This PR fixes #917 (and fixes #1134 as well while we're at it), adding lockfile support for all builds (solver and package sets), with no opt-out, i.e. all spago invocations that imply a build plan will now output a lockfile, or use the existing one to save work.

We also add a --pure flag to skip updating the lockfile even if it's deemed out of date.
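For example, a build that reuses the existing spago.lock as-is and never rewrites it (a usage sketch; exactly which subcommands end up accepting the flag is whatever the PR wires up):

➜ spago build --pure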

Note: this is a breaking change for the lockfile format, but right now they are so broken that no one should be using them 🙂

@thomashoneyman (Member):

Before I dive in, does this also address #1134?

spago.lock Outdated
Comment on lines 42 to 46
needed_by:
- spago-bin
- spago
- docs-search-index
- docs-search-client-halogen
Member Author:

Both the workspace packages and the plan packages now include a needed_by key, so we don't even spend time computing a plan (which takes time); instead this allows for a blanket approach of "read everything in, filter for what we need, and that's our plan".

Member:

I'm not sure I'm following this — if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?

In other words, what is the difference between a) building a workspace package by looking up each of its dependencies in turn vs. b) building it by filtering for dependencies that list it in their needed_by? Aren't they equivalent?

Member:

For example, lockfiles like pnpm's list dependencies at their resolved version for every package:
https://lfx.rushstack.io/pages/concepts/pnpm_lockfile

@f-f (Member Author), Dec 10, 2023:

if we list the dependencies for each package (including workspace ones), what is the advantage of needed_by?

The list of dependencies is just one layer of the tree, but to get a build plan you'd have to keep doing lookups until you reach the bottom of the tree (i.e. packages with no dependencies), and this takes time (about 300ms for a package set build on my M1).
Having a needed_by allows for storing the whole plan in there, instead of a single layer, so the retrieval is immediate.
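For a sense of what that repeated lookup looks like, a naive sketch (illustrative only, not spago's actual implementation):

module TransitiveDepsSketch where

import Prelude

import Data.Array as Array
import Data.Map (Map)
import Data.Map as Map
import Data.Maybe (fromMaybe)
import Data.Set (Set)
import Data.Set as Set

type PackageName = String

-- Walk the dependency index layer by layer, starting from the direct
-- dependencies, until no new packages show up.
transitiveDeps :: Map PackageName (Array PackageName) -> Array PackageName -> Set PackageName
transitiveDeps index = go Set.empty
  where
  go seen [] = seen
  go seen queue =
    let
      new = Array.filter (\p -> not (Set.member p seen)) (Array.nub queue)
      next = Array.concatMap (\p -> fromMaybe [] (Map.lookup p index)) new
    in
      go (Set.union seen (Set.fromFoldable new)) next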

Member:

I see. Would the same be achieved by listing the full resolutions for packages instead of just their explicitly listed dependencies? That way a build plan for e.g. the spago-bin package is already stored as its list of resolutions.

Member Author:

So you'd have a resolutions key for each workspace package listing its build plan?

Member:

Yea, each workspace package would have a dependencies key which reflects the spago.yaml file so we can detect if they've changed, and then a resolutions key based on the dependencies that specifies the plan.
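A hypothetical workspace entry under that scheme (package names and layout purely illustrative, not a final format):

spago-bin:
  path: bin
  dependencies:
    - aff
    - spago
  resolutions:
    - aff
    - avar
    - effect
    - spago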

Member Author:

Makes sense, let me try it out

toArray :: forall k v. Map k v -> Array (Tuple k v)
toArray = Map.toUnfoldable
({ packages, workspacePackages } :: LockfileBuilderResult) <-
Array.foldM processPackage
Member Author:

We used to just map the packages into the lockfile data structure, but we now use a fold to go through them so we can keep track of which workspace package needs which other package, which makes it dead easy to figure out the build plan afterwards
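A minimal sketch of that kind of fold (illustrative names and types, not the PR's actual processPackage):

module NeededBySketch where

import Prelude

import Data.Foldable (foldl)
import Data.Map (Map)
import Data.Map as Map
import Data.Set (Set)
import Data.Set as Set
import Data.Tuple (Tuple(..))

type PackageName = String

-- For every (workspace package, its transitive deps) pair, record the workspace
-- package in the needed_by set of each of its dependencies.
neededBy :: Array (Tuple PackageName (Set PackageName)) -> Map PackageName (Set PackageName)
neededBy = foldl addPackage Map.empty
  where
  addPackage acc (Tuple workspacePkg deps) =
    foldl (\m dep -> Map.insertWith Set.union dep (Set.singleton workspacePkg) m) acc deps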

Comment on lines 379 to 381
case workspace.packageSet.lockfile of
  -- If we have a lockfile we can just use that - we don't even need to build a plan, we can just filter the packages
  -- marked as needed by the workspace package that we are processing, as we just dumped the plan in the lockfile.
Member Author:

Computing the build plan when we have a lockfile is basically "filter by needed_by"
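Reading the plan back out is then roughly this (a trimmed-down sketch with a hypothetical LockEntry type, not the PR's exact code):

module LockfilePlanSketch where

import Data.Map (Map)
import Data.Map as Map
import Data.Set (Set)
import Data.Set as Set

type PackageName = String

-- Hypothetical, stripped-down lockfile entry: just the reverse-dependency set.
type LockEntry = { needed_by :: Set PackageName }

-- The build plan for a workspace package is every lockfile package whose
-- needed_by set mentions it.
planFor :: PackageName -> Map PackageName LockEntry -> Map PackageName LockEntry
planFor selected = Map.filter (\entry -> Set.member selected entry.needed_by)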

Comment on lines 394 to 400
workspacePackageToLockfilePackage :: WorkspacePackage -> Tuple PackageName Lock.WorkspaceLockPackage
workspacePackageToLockfilePackage { path, package } = Tuple package.name
  { path
  , dependencies: package.dependencies
  , test_dependencies: foldMap _.dependencies package.test
  , needed_by: mempty -- Note: this is filled in later
  }
Member Author:

These functions feel like they belong to the Lock module, but that would create a cycle, so they are stuck here. I think it's fine, not too much code

Comment on lines 402 to 409
shouldComputeNewLockfile :: { workspace :: Core.WorkspaceConfig, workspacePackages :: Map PackageName WorkspacePackage } -> Lock.WorkspaceLock -> Boolean
shouldComputeNewLockfile { workspace, workspacePackages } workspaceLock =
  -- the workspace packages should exactly match, except for the needed_by field, which is filled in during build plan construction
  (map (workspacePackageToLockfilePackage >>> snd) workspacePackages == (map (_ { needed_by = mempty }) workspaceLock.packages))
    -- and the extra packages should exactly match
    && (fromMaybe Map.empty workspace.extra_packages == workspaceLock.extra_packages)
    -- and the package set address needs to match - we have no way to match the package set contents at this point, so we let it be
    && (workspace.package_set == map _.address workspaceLock.package_set)
Member Author:

This is the bit of logic that computes when the lockfile needs a refresh - it's a bit involved because some things are generated and we don't have them at the point where we read the lockfile in (like, we stash the whole package set in there), but hopefully it's commented enough to make sense

@f-f (Member Author), Dec 10, 2023:

And dumping the package set in there is important so that we can then:

  1. skip hashing it, since we just have the content
  2. use it for commands like spago ls, where we can use just the lockfile as well, instead of having to look up the set, etc etc

@f-f (Member Author) commented Dec 10, 2023

@thomashoneyman CI fails without that patch, so I pushed the fix for #1134 as well!

@@ -528,7 +547,7 @@ main = do
logInfo "Set up a new Spago project."
logInfo "Try running `spago run`"
Fetch args -> do
{ env, fetchOpts } <- mkFetchEnv offline args
{ env, fetchOpts } <- mkFetchEnv offline (Record.insert (Proxy :: _ "isRepl") false args)
Member:

Any reason for the choice to use insert here vs. merge on line 562? Arbitrary?

@f-f (Member Author), Dec 11, 2023:

I tend to use insert if I'm only dealing with one key, but if you prefer merge I could try to swap it in and see if that works (all the record things are quite finicky when it comes to envs, for some reason)
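For reference, the two shapes on a toy record (a sketch only, not spago's actual env types):

module InsertVsMergeSketch where

import Record as Record
import Type.Proxy (Proxy(..))

args :: { offline :: Boolean }
args = { offline: false }

-- insert adds exactly one field; the compiler checks the record lacks it already
withInsert :: { isRepl :: Boolean, offline :: Boolean }
withInsert = Record.insert (Proxy :: _ "isRepl") false args

-- merge combines two records, so it scales to several fields at once
withMerge :: { isRepl :: Boolean, offline :: Boolean }
withMerge = Record.merge { isRepl: false } args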

Member:

I don't prefer either, it just looked like you were using merge with one field below and I didn't know if there was a reason to choose one or another that emerged from your env-wrangling :)

@@ -42,7 +42,7 @@ type BuildOptions =
, jsonErrors :: Boolean
}

run :: forall a. BuildOptions -> Spago (BuildEnv a) Unit
run :: BuildOptions -> Spago (BuildEnv _) Unit
Member:

What's the reason to wildcard this? Won't this produce warnings?

Member:

Same question in the other modules, like Docs below.

Member Author:

It will produce warnings, but we should filter them out - I added all the wildcards after chatting with @mikesol (on Discord, here) about fitting all of our Envs together, and found that using wildcards gets us better inference - I had a few puzzling errors just go away once I swapped them in. (Mike has a whole blog post about this)

, getMetadata: env.getMetadata
, workspace: env.workspace { selected = Just selected }
, logOptions: env.logOptions
-- FIXME: I think we can make it work
@thomashoneyman (Member), Dec 11, 2023:

Is this a leftover FIXME? Or does it need to be addressed?

Member Author:

I was hoping to fix this one too, but I can't find a good way to make it work. So I'll think about it some more or take out this FIXME

Comment on lines +52 to +57
type PreRegistryEnv a = Record (PreRegistryEnvRow a)

type RegistryEnvRow a = PreRegistryEnvRow
  ( getRegistry :: Spago (PreRegistryEnv ()) RegistryFunctions
  | a
  )
@thomashoneyman (Member), Dec 11, 2023:

I'm not sure I follow why we have the "pre-registry" row and then the "registry" row with the extra getRegistry function. Would you mind explaining it a little?

Is it solely to allow the registry functions to refer to the "pre-registry" env themselves?

Member Author:

Yeah, the compiler would try to infer an infinite type if these ran in the same env, so I had to separate them.

Comment on lines +127 to +129
data BuildType
  = RegistrySolverBuild PackageMap
  | PackageSetBuild PackageSetInfo PackageMap
Member:

Presumably this means you are either building with the solver, i.e. you use dependency ranges, or you're building with a package set, i.e. you don't use ranges. What happens in the hybrid cases?

  • A user provides ranges, but also specified a package set and extra_packages
  • A user provides some ranges and also specifies a package set

In the first case we have what we need for a solver build, but should we take their package set instructions into account? If not, do we warn them? For the second case is this just an error?

@f-f (Member Author), Dec 11, 2023:

Presumably this means you are either building with the solver, i.e. you use dependency ranges, or you're building with a package set, i.e. you don't use ranges. What happens in the hybrid cases?

There's no hybrid case. If you have a package set then you are building with it, if you don't have one then you are using the solver.

We throw away the ranges when we get a build plan with the package set - see here, we have a Map PackageName Range, and call Map.keys:

PackageSetBuild _info set -> getTransitiveDepsFromPackageSet set (Array.fromFoldable $ Map.keys depsRanges)

If the project has a package set the ranges only matter when publishing, where we build with both the package set and the solver (so the ranges are checked, just not as often as in a solver build)

@@ -149,8 +148,8 @@ data Package

-- | Reads all the configurations in the tree and builds up the Map of local
-- | packages to be integrated in the package set
readWorkspace :: forall a. Maybe PackageName -> Spago (Git.GitEnv a) Workspace
readWorkspace maybeSelectedPackage = do
readWorkspace :: Maybe PackageName -> Boolean -> Spago (Registry.RegistryEnv _) Workspace
Member:

I'm always a little leery of booleans as arguments; what do you think of either making this a record:

{ selectedPackage :: Maybe PackageName
, pureBuild :: Boolean
}

or a data type

data PureBuild = NotPure | Pure

Member Author:

Sure, either works

@thomashoneyman (Member):

On light review this looks quite good; I like the introduction of the locking mechanism for access to refreshing the registry. Did you consider debouncing it as well such that we only refresh the registry at most once every N seconds? Maybe that's a premature optimization, but it's something I had to do in the registry so it came to mind.

My main comment has to do with the needed_by key vs. just storing resolutions directly as I find the latter a little easier to work with, conceptually. Once you make a decision on that I can review again and test this out.

-- TODO: we should look in the "other" packages first
Registry _other -> do
maybeMetadata <- runSpago { logOptions } (getMetadata supportPackageName)
RegistrySolverBuild _other -> do
Member:

A quick thought on solver vs. package set builds: should spago init require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?

Member Author:

should spago init require you to specify 'app' or 'lib' as an init type such that libraries use ranges off the bat and don't use a package set?

We force users that publish their package to specify ranges, so I don't see a reason to enforce this upfront.

@f-f (Member Author), Dec 11, 2023:

Oh, if you mean "allow users to init a project without a package set", then we already have it:

spago init --use-solver

Member:

I meant we may want to make them choose --use-solver or --use-package-sets to avoid people creating a new library with spago init and then only adding ranges once they go to publish

Member Author:

The memory of trying to get a Bower project to work and stepping through whole pages of solver errors in bright red ink is still fresh in my mind - sure, our solver is much more sensible than Bower's, but I feel the main reason the majority of the community is using package sets today is that they are just less complex to deal with, and users can concentrate on actually writing the thing instead of dealing with builds. (As a user I would like that to be my experience at least, and I'm even in that small slice of the demographic that likes dealing with builds!)

So yeah, I would prefer to keep package sets as default because they are easy, and have solver builds as the "yes yes I know what I'm about to get into" opt-in option.

@f-f (Member Author) commented Dec 11, 2023

Did you consider debouncing it as well such that we only refresh the registry at most once every N seconds? Maybe that's a premature optimization, but it's something I had to do in the registry so it came to mind.

Not a premature optimization - it was in fact the first optimization we had around this, and we got to it already in the early days of the codebase: it's very annoying to have to wait for a git pull so often.

We debounce Registry pulls so that they happen at most once every 15 minutes, see here:

shouldFetchRegistryRepos :: forall a. Db -> Spago (LogEnv a) Boolean
shouldFetchRegistryRepos db = do
  now <- liftEffect $ Now.nowDateTime
  let registryKey = "registry"
  maybeLastRegistryFetch <- liftEffect $ Db.getLastPull db registryKey
  case maybeLastRegistryFetch of
    -- No record, so we have to fetch
    Nothing -> do
      logDebug "No record of last registry pull, will fetch"
      liftEffect $ Db.updateLastPull db registryKey now
      pure true
    -- We have a record, so we check if it's old enough
    Just lastRegistryFetch -> do
      let staleAfter = Minutes 15.0
      let (timeDiff :: Minutes) = DateTime.diff now lastRegistryFetch
      let isOldEnough = timeDiff > staleAfter
      -- We check if it's old, but also if we have it at all
      registryExists <- FS.exists Paths.registryPath
      if isOldEnough || not registryExists then do
        logDebug "Registry is old, refreshing"
        liftEffect $ Db.updateLastPull db registryKey now
        pure true
      else do
        pure false

@f-f (Member Author) commented Dec 18, 2023

@thomashoneyman ready for a look again

@thomashoneyman (Member):

Looks like the lockfile may need to be regenerated, according to this CI failure?

@thomashoneyman (Member) commented Dec 18, 2023

Looks pretty good!

I did find that if I install a package it is added to my spago.yaml config, but it isn't added to the lockfile and I always get a message that I'm missing a lockfile even though I do have one.

On first install:

➜ spago install -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

Adding 1 packages to the config in bin/spago.yaml
No lockfile found, generating one...
Lockfile written to spago.lock. Please commit this file.
Downloading dependencies...
Building...
           Src   Lib   All
Warnings     0     0     0
Errors       0     0     0

✅ Build succeeded.

This adds to the spago.yaml but not to the spago.lock. Then, on a re-run of the install command:

➜ spago install -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

⚠️ You tried to install some packages that are already present in the configuration, proceeding anyways:
  - aff-bus
Adding 1 packages to the config in bin/spago.yaml
No lockfile found, generating one...
Lockfile written to spago.lock. Please commit this file.
Downloading dependencies...
Fetching package aff-bus@6.0.0
Building...

Now I do see it added in the spago.lock file too. The same seems to happen with uninstalling. If I uninstall aff-bus now, I see this message:

➜ spago uninstall -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

Removing the following source dependencies:
  aff-bus

The lockfile isn't updated, though. If I try again:

➜ spago uninstall -p bin aff-bus
Reading Spago workspace configuration...

✅ Selecting package to build: bin

⚠️ The following packages cannot be uninstalled because they are not declared in the package's source dependencies:
  aff-bus
The package config for bin was not updated.

The lockfile is still not updated and lists aff-bus as one of my package's dependencies even though that's not true, as reflected in the spago.yaml

Note: this is with spago@0.93.21, which is not the very latest of this branch, so it's possible this was fixed along the way.

@thomashoneyman (Member):

Also, it seems that if I run spago build in the case my lockfile is incorrect (out of date, missing, etc.) then my workspaces are built. Then, if I run spago build again I expect a no-op, but instead my project sources and anything in extra_packages seem to get recompiled a second time. Is there some sort of cache issue there?

@f-f (Member Author) commented Dec 19, 2023

Looks like the lockfile may need to be regenerated, according to this CI failure?

Eh, the patch includes a change that moves from a fatal error to a warning (if we can't decode the file we just move on and make a new one) - I published a new version (0.93.22), but CI doesn't seem to be picking that one up?

@f-f (Member Author) commented Dec 19, 2023

Also, it seems that if I run spago build in the case my lockfile is incorrect (out of date, missing, etc.) then my workspaces are built. Then, if I run spago build again I expect a no-op, but instead my project sources and anything in extra_packages seem to get recompiled a second time. Is there some sort of cache issue there?

I've seen this one happen, but it has been around for a long time, so I don't think the lockfile is involved.

@f-f (Member Author) commented Dec 19, 2023

I did find that if I install a package it is added to my spago.yaml config, but it isn't added to the lockfile and I always get a message that I'm missing a lockfile even though I do have one.

The lockfile is still not updated and lists aff-bus as one of my package's dependencies even though that's not true, as reflected in the spago.yaml

I'll have a look at these - we have a test that should cover this, but it's likely not thorough enough.

@thomashoneyman (Member):

It doesn't look like 0.93.22 ever made it to NPM:
https://registry.npmjs.org/spago

@f-f (Member Author) commented Dec 19, 2023

It doesn't look like 0.93.22 ever made it to NPM

Ah that must be it then! I'll push a new one once I manage to patch the holes you found 😄

@f-f (Member Author) commented Dec 20, 2023

@thomashoneyman I should have addressed everything now - pushed 0.93.22 with the latest changes, and added a test that checks the lockfile when installing/uninstalling (the scenarios you tried). The patches also fixed up a few other fixtures, which I am happy about

@f-f merged commit ebde68c into master on Dec 20, 2023 (3 checks passed)
@f-f deleted the finish-lockfile-support branch on December 20, 2023 at 21:37

Successfully merging this pull request may close these issues:
  - Don't require a checkout of the registry repo when building with a lockfile
  - Support spago.lock files

3 participants