Replay testing #300

jeffschoner · 2024-05-28T03:37:08Z

Summary

Adds a ReplayTester like the replayer abstractions in official SDKs. This allows workflow histories to be replayed against code for testing purposes to detect non-determinism before a change goes to production.

This consists of a set of smaller changes. These can be reviewed commit by commit:

Specializes workflow event targets so that a mismatch between workflow completion, failure, and continue-as-new is treated as non-deterministic; previously these were all considered equivalent. This also catches non-determinism cases where a workflow ends prematurely on replay.
Adds methods for downloading and working with histories in JSON and protobuf binary format
The ReplayTester abstraction itself
An example replay test for SignalWithStartWorkflow and a script to update its history that demonstrates collecting histories in JSON and protobuf binary
Inclusion of fiber backtraces on failure so that the error message on replay failure is more useful
An option to log while replaying workflows. This is used by the replay tester to emit logs while running to provide better debuggability of tests, but it can be used in other circumstances as well

Testing

New unit specs for the replay tester
Existing specs have been cleaned up, mostly around workflow event target specialization
An example replay test

drewhoskins-temporal · 2024-06-08T00:09:33Z

lib/temporal/testing/replay_tester.rb

+      end
+
+      # Runs a replay test on a JSON string directly. See comment on replay_history_json_file more details.
+      def replay_history_json(workflow_class, json)


Feels a bit more composable to have just one typesafe replay_history method which takes a History and then have some helper methods to convert from proto/json/files to History
Was perusing the python version, might be worth looking for insights there as well.

jeffschoner · 2024-06-14

It's not as close as it could be, because temporal-ruby already has its own representation of workflow histories that I didn't want to refactor, but I was able to bring its more composable API shape over here.
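
For readers following the thread, the shape under discussion could look roughly like the sketch below: one replay method that accepts only a history object, plus small converters that build that object from other formats. ToyHistory, ToyHistoryLoader, and ToyReplayTester are hypothetical names invented for this example and are not temporal-ruby's classes; the JSON layout is likewise a simplified assumption.

```ruby
require 'json'

# Hypothetical stand-in for the SDK's history type; it only wraps parsed events.
ToyHistory = Struct.new(:events)

# Converters: each one turns a different input format into a ToyHistory.
module ToyHistoryLoader
  def self.from_json(json)
    ToyHistory.new(JSON.parse(json).fetch('events'))
  end

  def self.from_json_file(path)
    from_json(File.read(path))
  end
end

class ToyReplayTester
  # The single entry point. Callers pick a converter, then pass the result here.
  def replay_history(workflow_class, history)
    raise ArgumentError, 'expected a ToyHistory' unless history.is_a?(ToyHistory)

    # A real tester would run workflow_class against the events; this toy
    # version only reports how many events it was given.
    history.events.length
  end
end

toy_history = ToyHistoryLoader.from_json('{"events": [{"eventId": "1"}, {"eventId": "2"}]}')
toy_event_count = ToyReplayTester.new.replay_history(Object, toy_history)
```

Keeping format handling in the converters means a new input format only needs a new converter, while the replay method itself stays unchanged.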

drewhoskins-temporal · 2024-06-08T00:14:24Z

lib/temporal/testing/replay_tester.rb

+          attempt: 1,
+          workflow_run_id: 'run_id',
+          workflow_id: 'workflow_id',
+          workflow_name: history.find_event_by_id(1).attributes.workflow_type.name


nit: Could probably use a comment or assertion that this is the event you expect.
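
A minimal sketch of the kind of assertion suggested here, assuming a simplified event shape. ToyEvent, its fields, and the event type string are assumptions made for this example and do not mirror temporal-ruby's event class.

```ruby
# Hypothetical, simplified event: an ID, a type string, and the workflow type name.
ToyEvent = Struct.new(:id, :type, :workflow_type_name)

# Returns the workflow name from the first event, failing loudly when the
# history does not begin with a workflow start event.
def toy_workflow_name(events)
  first = events.find { |event| event.id == 1 }
  first_type = first ? first.type : nil
  unless first_type == 'WORKFLOW_EXECUTION_STARTED'
    raise ArgumentError,
          "History must start with WORKFLOW_EXECUTION_STARTED, found #{first_type.inspect}"
  end

  first.workflow_type_name
end

toy_name = toy_workflow_name([ToyEvent.new(1, 'WORKFLOW_EXECUTION_STARTED', 'HelloWorkflow')])
```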

drewhoskins-temporal · 2024-06-08T00:21:27Z

lib/temporal/testing/replay_tester.rb

+      #
+      # If the pretty_print optional parameter is set to true, it outputs in a more human
+      # readable form on output.
+      def self.correct_event_types(text, pretty_print: true)


Is this because Stripe is on an old version of the Java SDK? Don't see this function used anywhere. Should it be tested? Or maybe just done on the stripe fork?

jeffschoner · 2024-06-12

I believe this originally started with older versions of tctl. Unfortunately, we standardized our internal tooling on this camel case enum style rather than the standard screaming snake case version. It probably does make more sense to leave it out of here, as any new histories collected using current Temporal UI/CLI or these new APIs won't need to rely on this. I agree that if it's kept, it should be tested.
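
For anyone who still holds histories in the older camel case style, a conversion along the following lines could live outside the SDK. This is a sketch and not the removed correct_event_types method: the function name is hypothetical, and it assumes the target form is the proto enum name with an EVENT_TYPE_ prefix.

```ruby
# Converts a camel case event type such as "WorkflowExecutionStarted" into
# the screaming snake case form with an EVENT_TYPE_ prefix.
def toy_proto_event_type(camel_case_name)
  # Insert an underscore at each lower-to-upper boundary, then upcase.
  snake = camel_case_name.gsub(/([a-z0-9])([A-Z])/, '\1_\2').upcase
  "EVENT_TYPE_#{snake}"
end

toy_started = toy_proto_event_type('WorkflowExecutionStarted')
toy_continued = toy_proto_event_type('WorkflowExecutionContinuedAsNew')
```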

drewhoskins-temporal · 2024-06-08T00:26:03Z

lib/temporal/testing/replay_tester.rb

+        # Duplicate the configuration so that this doesn't interfere with other tests in the
+        # same process that are not replay tests
+        @config = config.dup.tap do |c|
+          # Otherwise, replay tests will produce no logs. This helps with test debugging.


Interesting idea. I like it, and I couldn't find evidence that the temporal SDKs do this.
I think people would want this in tests, but perhaps not if doing safe-rollout deploys (might be scary to see redundant logs during production rollouts). I wonder if this should be configurable.

Sushisource · 2024-06-10

Could be a nice option to have during testing, certainly, but yeah I'd make it configurable defaulting to off.
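
One way to picture the suggestion is shown below: the tester copies the configuration and turns on replay logging only when asked. ToyConfig, ToyLoggingReplayTester, and the log_on_replay option are hypothetical names for this sketch, not the SDK's actual configuration API.

```ruby
# Hypothetical configuration with a single flag controlling logs during replay.
ToyConfig = Struct.new(:log_on_workflow_replay)

class ToyLoggingReplayTester
  attr_reader :config

  # Logging during replay is opt-in and defaults to off.
  def initialize(config:, log_on_replay: false)
    # Duplicate the configuration so other tests sharing it are unaffected.
    @config = config.dup.tap { |c| c.log_on_workflow_replay = log_on_replay }
  end
end

toy_shared = ToyConfig.new(false)
toy_quiet = ToyLoggingReplayTester.new(config: toy_shared)
toy_verbose = ToyLoggingReplayTester.new(config: toy_shared, log_on_replay: true)
```

Because each tester works on its own copy, enabling logs for one replay test leaves the shared configuration untouched.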

drewhoskins-temporal · 2024-06-08T00:27:14Z

lib/temporal/workflow/executor.rb

@@ -33,7 +33,7 @@ def initialize(workflow_class, history, task_metadata, config, track_stack_trace

      def run
        dispatcher.register_handler(
-          History::EventTarget.workflow,
+          History::EventTarget.start_workflow,


Curious what's going on here. I ask because it's pretty central code.

jeffschoner · 2024-06-12

The non-determinism checking that happens in discard_command only considers the "target" of a command/event and only includes it and the event ID in its error messages. Prior to this change, any operation affecting a workflow had the same target and therefore the same target in any error messages. I've specialized these targets to individual ones for starting, completing, failing and continuing as new. This comes with two major benefits.

First, the non-determinism checking now applies between these different cases. This means a history that succeeded but on replay tries to fail or continue as new will now be considered non-deterministic, while previously this would have been acceptable.

Second, the error message will now differentiate between these cases rather than a generic "workflow" error that may be non-obvious to the user. E.g., this error message:

Unexpected command. The replaying code is issuing: (23, workflow) but the history of previous executions recorded: (23, activity)

becomes something like,

Unexpected command. The replaying code is issuing: (23, fail_workflow) but the history of previous executions recorded: (23, activity)

On top of that, we now call discard_command in the handlers for these workflow ending events, which means we do non-determinism checking on workflow end. This is important in catching cases where a workflow history ends but on replay the code keeps running.

This particular change is simply a consequence of that specialization of event targets. I could have left it in place, but felt it would be more confusing to see a workflow target only for starting workflows, while the other cases all have their own targets.
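
The effect of specializing targets can be illustrated with a toy comparison of (event ID, target) pairs. This is a simplified sketch and not the SDK's discard_command: ToyNonDeterminismError, toy_check_replay, and the target symbols are assumptions made for this example, with the message format borrowed from the error text quoted above.

```ruby
class ToyNonDeterminismError < StandardError; end

def toy_describe(pair)
  pair ? "(#{pair[0]}, #{pair[1]})" : '(nothing)'
end

# Compares each command issued on replay with the pair recorded in history.
# Both arguments are arrays of [event_id, target] pairs.
def toy_check_replay(recorded, issued)
  issued.zip(recorded).each do |command, event|
    next if command == event

    raise ToyNonDeterminismError,
          "Unexpected command. The replaying code is issuing: #{toy_describe(command)} " \
          "but the history of previous executions recorded: #{toy_describe(event)}"
  end
  true
end

# With one generic target, completing and failing look identical.
toy_generic_ok = toy_check_replay(
  [[5, :activity], [11, :workflow]], # history: workflow completed
  [[5, :activity], [11, :workflow]]  # replay: workflow failed
)

# With specialized targets, the same situation is reported as a mismatch.
toy_message = begin
  toy_check_replay(
    [[5, :activity], [11, :complete_workflow]],
    [[5, :activity], [11, :fail_workflow]]
  )
  nil
rescue ToyNonDeterminismError => e
  e.message
end
```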

jeffschoner · 2024-06-12T13:47:07Z

I still plan to make some of the suggested improvements, but marking this as "ready to review" to make clear it's in a reviewable state.

drewhoskins-temporal · 2024-06-12T16:54:50Z

spec/unit/lib/temporal/testing/replay_tester_spec.rb

+    )
+  end
+
+  it 'replay continues an new when history succeeded' do


*as new
s/succeeded/completed/

jeffschoner · 2024-06-14T23:51:18Z

@drewhoskins-temporal @Sushisource Thanks for the feedback. I've incorporated your suggestions.

@DeRauk Ready for your review too.

DeRauk · 2024-06-24T15:04:52Z

This is really great, thank you @jeffschoner!

Specialize workflow event targets

182307e

jeffschoner force-pushed the replay-testing branch from 790166f to a506122 on May 29, 2024 22:23

jeffschoner added 7 commits May 29, 2024 15:30

Methods for downloading histories

1098dcd

Replay tester

fb9b05e

Basic replay tester unit tests

d19f31a

Add example replay test with history file

d51fcec

Add workflow stack trace to replay error

b84ab2a

Dynamically load replay state in workflow logger

a7fab7c

Log when replaying in replay tests

1889fb0

jeffschoner force-pushed the replay-testing branch from a506122 to 1889fb0 on May 29, 2024 22:30

jeffschoner added 7 commits June 4, 2024 15:03

Use binpb extension for protobuf biniaries

a8f87fa

Better comments about logging during replay

bd22318

Simplify file -> bytes read calls

26c05a4

Fix comment typos

f936645

More ergonomic replaying callback

5c9b65e

Improve ReplayTesterError, rubyfmt spec

95bc9c3

Remove extra commands check

ad83f73

drewhoskins-temporal reviewed Jun 8, 2024


jeffschoner marked this pull request as ready for review June 12, 2024 13:46

jeffschoner added 3 commits June 12, 2024 08:35

Don't default to logging in replay tests

ba1ca2c

Check history starts correctly

4d0f913

Remove correct_event_types

cb9d2bb

drewhoskins-temporal reviewed Jun 12, 2024


jeffschoner added 4 commits June 14, 2024 15:59

Refactor to more composable API

9ba773e

Use real namespace from configuration

1355a37

rubyfmt

02c16f4

Fix test name typo

cab8e1a

jeffschoner force-pushed the replay-testing branch from 592b738 to cab8e1a on June 14, 2024 23:27

DeRauk approved these changes Jun 24, 2024


DeRauk merged commit 5d12aa3 into coinbase:master Jun 24, 2024
2 checks passed
