-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow manifest compaction to generate multiple content pieces #387
Conversation
When compacting manifest data, allow persisting multiple content pieces. This should be alright because flushes on the index are already disabled, which means all changes should appear atomically. Uses a max number of manifests per content piece as that's easily available and can be used as a proxy for total content size to some extent.
Basic tests that check lookups and number of content pieces after various operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@@ -18,6 +18,8 @@ import ( | |||
"github.com/kopia/kopia/repo/content/index" | |||
) | |||
|
|||
const maxManifestsPerContent = 1000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, how did we get to this number? Is this assuming some average size per manifest entry from data we have?
require.NoError(t, err, "compacting manifests") | ||
|
||
foundContents = getManifestContentCount(ctx, t, mgr) | ||
assert.Equal(t, 2, foundContents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
@@ -18,6 +18,8 @@ import ( | |||
"github.com/kopia/kopia/repo/content/index" | |||
) | |||
|
|||
const maxManifestsPerContent = 1000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be useful (in a follow-up) to make this configurable using an ENV. Allows disabling this feature by setting the value to an arbitrarily high value and tuning it to a lower value if needed
Allow manifest compaction to generate multiple content pieces. This currently uses number of manifests in a content piece as a proxy for the total size the content piece will be. The number of manifests per content piece is a number out of hat that's meant to be high enough to not trigger the creation of multiple content pieces in most cases while still hopefully avoiding errors about content pieces being too large
This change is only enabled during compaction to reduce blast radius and because compaction already disables content manager flushes to ensure content writes and deletions are seen by other clients atomically (at least assuming no other content manager flushes if this returns an error)
(for completeness)
Even if content writes aren't atomic, then I think things would still work out alright if there was concurrent compaction. For example, the following could occur if content writes didn't appear atomically:
The end result would be partial duplication of manifest data as content piece 2 would contain some info present in content pieces 3 and 4. However, content pieces 3 and 4 should still have all manifest entries present in the now-deleted content piece 1 as content piece 1 was included in client B's compaction (that's why it was deleted)