Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replay testing #300

Merged
merged 22 commits into from
Jun 24, 2024
Merged

Replay testing #300

merged 22 commits into from
Jun 24, 2024

Conversation

jeffschoner
Copy link
Contributor

Summary

Adds a ReplayTester like the replayer abstractions in official SDKs. This allows workflow histories to be replayed against code for testing purposes to detect non-determinism before a change goes to production.

This consists of a set of smaller changes. These can be reviewed commit by commit:

  • Specializes workflow event targets by making a mismatch between workflow completion, failure, or continue as new non-deterministic, whereas previously these were all considered equivalent. Moreover, this now catches non-determinism cases where a workflow prematurely ends on replay.
  • Adds methods for downloading and working with histories in JSON and protobuf binary format
  • The ReplayTester abstraction itself
  • An example replay test for SignalWithStartWorkflow and a script to update its history that demonstrates collecting histories in JSON and protobuf binary
  • Inclusion of fiber backtraces on failure so that the error message on replay failure is more expected and useful
  • An option to log while replaying workflows. This is used by the replay tester to emit logs while running to provide better debuggability of tests, but it can be used in other circumstances as well

Testing

  • New unit specs for the replay tester
  • Existing specs have been cleaned up, mostly around workflow event target specialization
  • An example replay test

end

# Runs a replay test on a JSON string directly. See comment on replay_history_json_file more details.
def replay_history_json(workflow_class, json)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit more composable to have just one typesafe replay_history method which takes a History and then have some helper methods to convert from proto/json/files to History
Was perusing the python version, might be worth looking for insights there as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not as close as it could be because temporal-ruby already has its own representation of workflow histories that I didn't want to refactor, but I was able to bring its more composable API shape over here

attempt: 1,
workflow_run_id: 'run_id',
workflow_id: 'workflow_id',
workflow_name: history.find_event_by_id(1).attributes.workflow_type.name

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could probably use a comment or assertion that this is the event you expect.

#
# If the pretty_print optional parameter is set to true, it outputs in a more human
# readable form on output.
def self.correct_event_types(text, pretty_print: true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because Stripe is on an old version of the Java SDK? Don't see this function used anywhere. Should it be tested? Or maybe just done on the stripe fork?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this originally started with older versions of tctl. Unfortunately, we standardized our internal tooling on this camel case enum style rather than the standard screaming snake case version. It probably does make more sense to leave it out of here, as any new histories collected using current Temporal UI/CLI or these new APIs won't need to rely on this. I agree if it's kept, it should be tested.

# Duplicate the configuration so that this doesn't interfere with other tests in the
# same process that are not replay tests
@config = config.dup.tap do |c|
# Otherwise, replay tests will produce no logs. This helps with test debugging.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. I like it, and I couldn't find evidence that the temporal SDKs do this.
I think people would want this in tests, but perhaps not if doing safe-rollout deploys (might be scary to see redundant logs during production rollouts). I wonder if this should be configurable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a nice option to have during testing, certainly, but yeah I'd make it configurable defaulting to off.

@@ -33,7 +33,7 @@ def initialize(workflow_class, history, task_metadata, config, track_stack_trace

def run
dispatcher.register_handler(
History::EventTarget.workflow,
History::EventTarget.start_workflow,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what's going on here. I ask because it's pretty central code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The non-determinism checking that happens in discard_command only considers the "target" of a command/event and only includes it and the event ID in its error messages. Prior to this change, any operation affecting a workflow had the same target and therefore the same target in any error messages. I've specialized these targets to individual ones for starting, completing, failing and continuing as new. This comes with two major benefits.

First, the non-determinism checking now applies between these different cases. This means a history that succeeded but on replay tries to fail or continue as new will now be considered non-deterministic while previously this would have been acceptable

Second, the error message will now differentiate between these cases rather than a generic "workflow" error that may be non-obvious to the user. E.g., this error message:

Unexpected command. The replaying code is issuing: (23, workflow) but the history of previous executions recorded: (23, activity)

becomes something like,

Unexpected command. The replaying code is issuing: (23, fail_workflow) but the history of previous executions recorded: (23, activity)

On top of that, we now call discard_command in the handlers for these workflow ending events, which means we do non-determinism checking on workflow end. This is important in catching cases where a workflow history ends but on replay the code keeps running.

This .particular change is simply a consequence of that specialization of event targets. I could have left it in place, but felt it would be more confusing to see a workflow target only for starting workflows, while the other cases all have their own targets.

@jeffschoner jeffschoner marked this pull request as ready for review June 12, 2024 13:46
@jeffschoner
Copy link
Contributor Author

I still plan to make some of the suggested improvements, but marking this as "ready to review" to make clear it's in a reviewable state.

)
end

it 'replay continues an new when history succeeded' do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*as new
s/succeeded/completed/

@jeffschoner
Copy link
Contributor Author

@drewhoskins-temporal @Sushisource Thanks for the feedback. I've incorporated your suggestions

@DeRauk Ready for your review too

@DeRauk
Copy link
Contributor

DeRauk commented Jun 24, 2024

This is really great, thank you @jeffschoner!

@DeRauk DeRauk merged commit 5d12aa3 into coinbase:master Jun 24, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants