WildFly Transaction Recovery on OpenShift
This document summarizes the ideas and implementation details of Narayana transaction recovery, as used by the WildFly transactions component on OpenShift.
OpenShift may stop a pod at any time (the application server is containerized and deployed inside a pod). This can happen on pod rescheduling, node failures or networking issues. When the application server is stopped or killed, unfinished XA transaction records may be left behind. Recovery deals only with XA transactions, because local transactions are maintained in memory.
An XA transaction consists of a prepare phase and a commit phase. On prepare, a record is persisted in the Narayana object store and in the transaction log of the remote resource (a database, a JMS broker…). If the XA transaction is interrupted after prepare, the resource is left in the prepared state. Such a pending transaction record may block other operations on the data. For example, the database locks a row, and the row cannot be updated until the pending transaction record is finished or removed. The usual resolution is to start WildFly again. The recovery manager (a component of Narayana) then evaluates which records exist (in the Narayana object store and in the database), commits the XA transaction and removes the record from the database.
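The following minimal Go sketch shows how a crash between the two phases leaves records behind. It uses a hypothetical in-memory `Resource` type in place of the database or JMS broker transaction log, and a plain map in place of the Narayana object store. Neither is Narayana's or XA's real API. After a crash following prepare, a record remains on both sides and the resource stays in doubt.

```go
package main

import "fmt"

// Resource is a hypothetical stand-in for a remote resource transaction log
// (a database, a JMS broker, ...). It is not a real XA API.
type Resource struct {
	prepared map[string]bool // transaction id -> prepared and not yet finished
}

func NewResource() *Resource { return &Resource{prepared: map[string]bool{}} }

// Prepare persists a prepared record in the resource transaction log.
func (r *Resource) Prepare(txID string) { r.prepared[txID] = true }

// Commit finishes the transaction branch and removes its record.
func (r *Resource) Commit(txID string) { delete(r.prepared, txID) }

// InDoubt lists prepared records that were never committed or rolled back.
func (r *Resource) InDoubt() []string {
	var ids []string
	for id := range r.prepared {
		ids = append(ids, id)
	}
	return ids
}

// runTwoPhase drives one XA transaction; objectStore stands in for the
// Narayana object store. crashAfterPrepare simulates the pod being killed.
func runTwoPhase(objectStore map[string]bool, res *Resource, txID string, crashAfterPrepare bool) {
	// Phase 1: prepare. A record now exists on both sides.
	res.Prepare(txID)
	objectStore[txID] = true
	if crashAfterPrepare {
		return // app server killed: both records stay, the resource is in doubt
	}
	// Phase 2: commit the resource, then drop the transaction manager's record.
	res.Commit(txID)
	delete(objectStore, txID)
}

func main() {
	objectStore := map[string]bool{}
	res := NewResource()
	runTwoPhase(objectStore, res, "tx-1", false)
	runTwoPhase(objectStore, res, "tx-2", true)
	fmt.Println("object store records:", len(objectStore), "in-doubt:", res.InDoubt())
}
```

Only `tx-2`, the interrupted transaction, remains in both stores. That is the situation the recovery manager has to resolve.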
For the Narayana recovery manager to correctly decide whether to commit or roll back, and to carry out that decision, the following is required:
- access to the Narayana object store
- the ability to contact the remote resource (database)
- a unique identity for each started Narayana instance
- a stable hostname/IP address (when remote transaction context propagation is in play)
Note: Narayana (current version …)
Narayana (the transaction manager and the recovery manager) is the arbiter that decides whether to commit or roll back. If the Narayana object store disappears (is deleted), no record remains from which to determine the correct outcome of the XA transaction.
If both the Narayana object store and the remote resource transaction log contain the record of the XA transaction, the XA transaction is committed. If only the Narayana object store contains the record, a warning is emitted to server.log. If only the remote resource transaction log contains the record, the transaction is rolled back.
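The three rules above can be written as a small decision function. This is a Go sketch, not Narayana's code. The `Outcome` type and the `decide` function are hypothetical names, and the "nothing to do" case, where no record exists anywhere, is added only for completeness.

```go
package main

import "fmt"

// Outcome of recovery for one XA transaction (hypothetical type).
type Outcome int

const (
	Commit   Outcome = iota // record in both the object store and the resource log
	WarnOnly                // only in the object store: warning in server.log
	Rollback                // only in the resource log
	Nothing                 // no record anywhere (added for completeness)
)

func (o Outcome) String() string {
	switch o {
	case Commit:
		return "commit"
	case WarnOnly:
		return "warn in server.log"
	case Rollback:
		return "roll back"
	default:
		return "nothing to do"
	}
}

// decide maps where a record was found to the recovery outcome.
func decide(inObjectStore, inResourceLog bool) Outcome {
	switch {
	case inObjectStore && inResourceLog:
		return Commit
	case inObjectStore:
		return WarnOnly
	case inResourceLog:
		// Narayana further restricts this to records that carry the
		// recovery manager's own node identifier.
		return Rollback
	default:
		return Nothing
	}
}

func main() {
	cases := []struct{ store, log bool }{{true, true}, {true, false}, {false, true}, {false, false}}
	for _, c := range cases {
		fmt.Printf("object store=%v resource log=%v -> %v\n", c.store, c.log, decide(c.store, c.log))
	}
}
```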
The ability to contact the remote resource determines whether the remote resource transaction log can be cleared. It also determines whether data changes that could block other operations in the resource are finished and their locks released.
The identity of the recovery manager ensures that it does not roll back records in the resource transaction log that it did not create. Imagine two Narayana instances accessing the same database. The first creates an XA transaction and runs the prepare phase. At that point, both the first Narayana object store and the remote resource transaction log contain a prepared (in-doubt) transaction record. If everything goes well, the first Narayana commits the XA transaction. However, if the second Narayana recovery manager reads the database transaction log at the same moment, it sees an unfinished prepared record. It knows nothing about this XA transaction (no record exists in its own object store), so it may order a rollback. The result is an inconsistency.
To prevent this, the recovery manager may roll back a record only when its node identifier matches. WildFly sets the node identifier at startup, and the identifier is baked into every transaction record when it is saved. The recovery manager loads all in-doubt records from the remote resource and keeps only those that match its node identifier.
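The filtering step can be sketched in Go, replaying the two-instance scenario above. `InDoubtXid` and `rollbackCandidates` are hypothetical names. The node identifier is kept as a plain struct field for readability, whereas in a real record it is baked into the saved transaction data. The sketch shows only which records a recovery manager may roll back, not how Narayana actually reads them.

```go
package main

import "fmt"

// InDoubtXid is a hypothetical, simplified view of a prepared record
// returned by a resource during recovery.
type InDoubtXid struct {
	TxID   string
	NodeID string // node identifier of the transaction manager that created it
}

// rollbackCandidates returns the in-doubt records that the recovery manager
// running with nodeID may roll back: the node identifier must match, and its
// own object store must hold no record for the transaction.
func rollbackCandidates(nodeID string, inDoubt []InDoubtXid, objectStore map[string]bool) []InDoubtXid {
	var out []InDoubtXid
	for _, x := range inDoubt {
		if x.NodeID != nodeID {
			continue // created by another transaction manager: never touch it
		}
		if objectStore[x.TxID] {
			continue // still known locally: resolved by committing instead
		}
		out = append(out, x)
	}
	return out
}

func main() {
	// The database holds two prepared records, both created by node1.
	db := []InDoubtXid{{TxID: "tx-a", NodeID: "node1"}, {TxID: "tx-b", NodeID: "node1"}}
	node1Store := map[string]bool{"tx-a": true} // node1 is about to commit tx-a
	node2Store := map[string]bool{}             // node2 knows nothing about either

	fmt.Println("node2 may roll back:", rollbackCandidates("node2", db, node2Store))
	fmt.Println("node1 may roll back:", rollbackCandidates("node1", db, node1Store))
}
```

Without the node identifier check, node2 would treat both records as orphans and roll back `tx-a`, which node1 is about to commit.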
When the transaction context is propagated to a JVM on a different node, a network connection is established. At commit time, the remote JVM is asked to finish the transaction. The remote node cannot decide on its own. It has to wait until the originator of the transaction decides on the outcome.
When the transaction is started, WildFly saves the address of the remote JVM. When the recovery manager runs, it loads that information and connects to that address. If the remote JVM has changed its hostname, the recovery manager cannot contact it.
When a pod is to be removed, we need:
- access to its Narayana object store
- the same configuration, so that the remote resources can be reached
- knowledge of its node identifier
- where remote transaction context propagation is used, a stable hostname/IP address for the pod
The recovery manager can then clear the Narayana object store and the remote resource transaction logs.
Once this is done and everything is clear, the pod and its storage can be destroyed.
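These preconditions can be expressed as a simple check. This is a hypothetical Go sketch: `RemovalCheck` and `missingPrerequisites` are illustrative names, not part of the WildFly Operator or the s2i scripts. It shows that the stable-hostname requirement applies only when remote transaction context propagation is used.

```go
package main

import "fmt"

// RemovalCheck is a hypothetical summary of what must hold before recovery
// can clean up after a pod that is being removed.
type RemovalCheck struct {
	ObjectStoreAccessible bool // the pod's Narayana object store can be read
	SameConfig            bool // same configuration to reach the remote resources
	NodeIDKnown           bool // the pod's node identifier is known
	UsesRemotePropagation bool // transaction context was propagated to other JVMs
	StableHostname        bool // the pod keeps its hostname/IP address
}

// missingPrerequisites lists unmet requirements. An empty result means recovery
// can run; once it has cleared all records, the pod and storage can go.
func missingPrerequisites(c RemovalCheck) []string {
	var missing []string
	if !c.ObjectStoreAccessible {
		missing = append(missing, "object store access")
	}
	if !c.SameConfig {
		missing = append(missing, "same resource configuration")
	}
	if !c.NodeIDKnown {
		missing = append(missing, "node identifier")
	}
	// The hostname only matters when other JVMs must be contacted again.
	if c.UsesRemotePropagation && !c.StableHostname {
		missing = append(missing, "stable hostname/IP address")
	}
	return missing
}

func main() {
	c := RemovalCheck{ObjectStoreAccessible: true, SameConfig: true, NodeIDKnown: true, UsesRemotePropagation: true}
	fmt.Println("missing:", missingPrerequisites(c))
}
```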
There are two different implementations of the process above:
- OpenShift 3.x: bash and python scripts that are part of the s2i scripts
- OpenShift 4.x: part of the WildFly Operator, written in golang
The solution for OpenShift 3.x is based on a template (https://github.com/jboss-container-images/jboss-eap-7-openshift-image/blob/7.2.x/templates/eap72-tx-recovery-s2i.json) that configures two DeploymentConfigs. The first DeploymentConfig configures the application. The second DeploymentConfig configures a recovery pod (a migration pod) that runs alongside the application.
An important prerequisite is that both DeploymentConfigs (all started pods) have access to the shared Narayana object store. That store can be a database or a shared PersistentVolume (a volume that any pod can read and write).
At a high level, when the DeploymentConfig starts a pod, the pod saves a descriptor with its node identifier on the shared volume. When the application is scaled down, the application pod is removed. The recovery pod running alongside detects the orphaned store by contacting the Kubernetes/OpenShift API and listing all running pods. It reads the node identifier from the orphaned descriptor. It then starts an application server that can contact the remote resources (prerequisite: the second DeploymentConfig has the same configuration as the first DeploymentConfig). The application server starts and recovery is forced to finish the in-doubt transactions.
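The orphan detection step can be illustrated in Go, although the real 3.x implementation is the bash/python s2i recovery script. The sketch assumes that the descriptors read from the shared volume are available as a map from pod name to node identifier. It also assumes that the running pod names have already been fetched from the Kubernetes/OpenShift API. The map layout and the `orphanedNodeIDs` name are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedNodeIDs returns the node identifiers whose descriptor on the shared
// volume belongs to a pod that is no longer running.
// descriptors: pod name -> node identifier (hypothetical descriptor layout).
// runningPods: pod names as listed through the Kubernetes/OpenShift API.
func orphanedNodeIDs(descriptors map[string]string, runningPods []string) []string {
	running := make(map[string]bool, len(runningPods))
	for _, p := range runningPods {
		running[p] = true
	}
	var orphaned []string
	for pod, nodeID := range descriptors {
		if !running[pod] {
			orphaned = append(orphaned, nodeID)
		}
	}
	sort.Strings(orphaned) // deterministic order for the recovery loop
	return orphaned
}

func main() {
	descriptors := map[string]string{
		"app-1-aaaaa": "node-a",
		"app-1-bbbbb": "node-b", // this pod was removed by the scale-down
	}
	running := []string{"app-1-aaaaa"}
	for _, id := range orphanedNodeIDs(descriptors, running) {
		// Here the recovery pod would start the app server with this node
		// identifier and the same datasource configuration.
		fmt.Println("recover store of node", id)
	}
}
```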
WARN: the limitation is that transactions that use remote transaction context propagation cannot be safely recovered.
- Design document: https://docs.google.com/document/d/1p6IAt0ocaEaepsXtNXpbAmiMMioQgoZr65thYrlkZ2I/edit?ts=5f5a9a38#*
- Example template with two DeploymentConfigs: https://github.com/jboss-container-images/jboss-eap-7-openshift-image/blob/7.3.x/templates/eap73-openjdk11-tx-recovery-s2i.json
- s2i script that launches the app server: https://github.com/jboss-container-images/jboss-eap-modules/blob/7.3.x/jboss/container/eap/launch/added/openshift-launch.sh#L56
- s2i recovery script: https://github.com/jboss-container-images/jboss-eap-modules/blob/7.3.x/os-eap-txnrecovery/bash/added/partitionPV.sh
- Building the EAP image: https://docs.google.com/document/d/123lvasGDg65KBfRW1G_HC261uQ6QVjIWZ_R1V0iDrKw/edit#heading=h.4b26xmtt6uhd
- Working with MiniShift (3.x locally): https://github.com/jbosstm/narayana/wiki/Notes-on-CRC-and-Minishift
OpenShift 4.x takes a somewhat different approach. It removes the need for a shared location for the Narayana object store and the need for an additional pod for every application. It also addresses the need for hostname stability for EJB remoting context propagation.
Recovery is driven by the WildFly Operator, and any application that needs scale-down recovery processing has to be started with it. The pod controller is now a StatefulSet, which provides persistent volumes and a stable network identity for each pod.
Unlike in 3.x, scale-down recovery processing runs on the live application pod. The application pod cannot be removed from the Kubernetes/OpenShift cluster until all transactions are cleared.
The process itself is similar, and all of it is controlled by the WildFly Operator. When a scale-down is requested (oc patch wildflyserver quickstart -p '[{"op":"replace", "path":"/spec/replicas", "value":0}]' --type json), the Operator starts the scale-down verification, which checks whether all transactions are cleared. The WildFly Transaction Client (WFTC) processes the remote EJB transactions. If both the Narayana object store and the WFTC directory are clean, the Operator asks the StatefulSet controller to scale down the pod. If the Narayana object store is not clean, the Operator forces recovery to run and finish the in-doubt transactions. While recovery is in progress, the pod is removed from the Service and accepts no new requests.
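The Operator's per-pod decision during scale-down verification can be sketched as a small Go function. This is not the code in `transaction_recovery.go`. `PodState`, `Action` and `nextAction` are hypothetical names. Treating a non-clean WFTC directory the same way as a non-clean object store (keep the pod and run recovery) is an assumption; the text above only states what happens when the object store is not clean.

```go
package main

import "fmt"

// PodState is a hypothetical snapshot of what the Operator checks on a pod
// that is about to be scaled down.
type PodState struct {
	ObjectStoreClean bool // no records left in the Narayana object store
	WFTCClean        bool // no records left in the WFTC directory
}

// Action is the next step the Operator takes for that pod.
type Action string

const (
	// ScaleDown: ask the StatefulSet controller to remove the pod.
	ScaleDown Action = "scale-down"
	// RunRecovery: keep the pod out of the Service, force recovery, verify again.
	RunRecovery Action = "run-recovery"
)

// nextAction returns ScaleDown only when both stores are clean. A dirty WFTC
// directory is handled like a dirty object store here (an assumption).
func nextAction(s PodState) Action {
	if s.ObjectStoreClean && s.WFTCClean {
		return ScaleDown
	}
	return RunRecovery
}

func main() {
	// Successive verification rounds for one pod: the first finds in-doubt
	// records, the second runs after recovery has cleared them.
	rounds := []PodState{
		{ObjectStoreClean: false, WFTCClean: true},
		{ObjectStoreClean: true, WFTCClean: true},
	}
	for i, s := range rounds {
		fmt.Printf("round %d: %s\n", i+1, nextAction(s))
	}
}
```

The verification is repeated until it returns ScaleDown. This matches the rule that the pod is not removed while any transaction remains.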
- Analysis of the approach: https://github.com/wildfly/wildfly-proposals/blob/main/cloud/CLOUD-2262.adoc
- WildFly Operator recovery code: https://github.com/wildfly/wildfly-operator/blob/0.5.1/pkg/controller/wildflyserver/transaction_recovery.go
- EAP QE testing: https://gitlab.mw.lab.eng.bos.redhat.com/jbossqe-eap/openshift-eap-tests/-/tree/master/test-eap/src/test/java/com/redhat/xpaas/eap/xa
- Quickstart on this topic: https://github.com/wildfly/quickstart/tree/23.0.0.Final/ejb-txn-remote-call