Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow manifest compaction to generate multiple content pieces #387

Merged
merged 2 commits into from
Sep 5, 2024

Conversation

ashmrtn
Copy link
Collaborator

@ashmrtn ashmrtn commented Sep 4, 2024

Allow manifest compaction to generate multiple content pieces. This currently uses number of manifests in a content piece as a proxy for the total size the content piece will be. The number of manifests per content piece is a number out of hat that's meant to be high enough to not trigger the creation of multiple content pieces in most cases while still hopefully avoiding errors about content pieces being too large

This change is only enabled during compaction to reduce blast radius and because compaction already disables content manager flushes to ensure content writes and deletions are seen by other clients atomically (at least assuming no other content manager flushes if this returns an error)

(for completeness)
Even if content writes aren't atomic, then I think things would still work out alright if there was concurrent compaction. For example, the following could occur if content writes didn't appear atomically:

  1. client A begins compaction
  2. client A write content piece 1
  3. client B refreshes content manager and sees content piece 1
  4. client B begins compaction
  5. client B writes content pieces 3 and 4
  6. client B deletes old content pieces, including content piece 1
  7. client A writes content piece 2
  8. client A deletes old content pieces. It doesn't know about content pieces 3 and 4 because the manifest manager hasn't been refreshed since compaction started

The end result would be partial duplication of manifest data as content piece 2 would contain some info present in content pieces 3 and 4. However, content pieces 3 and 4 should still have all manifest entries present in the now-deleted content piece 1 as content piece 1 was included in client B's compaction (that's why it was deleted)

When compacting manifest data, allow persisting multiple content pieces.
This should be alright because flushes on the index are already
disabled, which means all changes should appear atomically.

Uses a max number of manifests per content piece as that's easily
available and can be used as a proxy for total content size to some
extent.
Basic tests that check lookups and number of content pieces after
various operations.
@ashmrtn ashmrtn requested review from vkamra and pandeyabs September 4, 2024 00:00
@ashmrtn ashmrtn self-assigned this Sep 4, 2024
Copy link

@pandeyabs pandeyabs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@@ -18,6 +18,8 @@ import (
"github.com/kopia/kopia/repo/content/index"
)

const maxManifestsPerContent = 1000000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, how did we get to this number? Is this assuming some average size per manifest entry from data we have?

require.NoError(t, err, "compacting manifests")

foundContents = getManifestContentCount(ctx, t, mgr)
assert.Equal(t, 2, foundContents)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -18,6 +18,8 @@ import (
"github.com/kopia/kopia/repo/content/index"
)

const maxManifestsPerContent = 1000000
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful (in a follow-up) to make this configurable using an ENV. Allows disabling this feature by setting the value to an arbitrarily high value and tuning it to a lower value if needed

@ashmrtn ashmrtn merged commit c745927 into corsostaging Sep 5, 2024
5 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants