Consistent Region Support for Messaging Toolkit
This document describes the design of consistent region support for each operator in the Messaging Toolkit.
- KafkaProducer will not be allowed at the beginning of a consistent region
[Samantha:] Is the operator doing anything special on drain, checkpoint, reset and resetToInitialState? What can the user expect from a message delivery perspective?
- KafkaConsumer is being developed to be allowed in a consistent region. It will use a low-level Kafka API and support replaying data for consistent regions. Only single-topic, single-partition operators are supported for now. Multi-partition topics can be consumed with multiple operators or, more simply, with user-defined parallelism (UDP).
On Drain: No action necessary
On Checkpoint: The operator saves the offset in the message log that it is currently reading from.
On Reset: The operator uses the saved offset to resume reading from that location in the message log.
On ResetToInit: The operator restores the initial offset from a file that is written when the operator is first initialized. If that file is no longer available, it starts reading from the most recent offset.
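The KafkaConsumer lifecycle above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions, not the operator's implementation: `OffsetTracker`, `read_next`, `latest_offset_fn` and the JSON file layout are all hypothetical names chosen for the example, and no real Kafka API is called.

```python
import json
import os
import tempfile


class OffsetTracker:
    """Toy model of the offset handling described above (hypothetical names)."""

    def __init__(self, initial_offset, init_file, latest_offset_fn):
        self.offset = initial_offset
        self.init_file = init_file
        # Assumption: some callable can report the most recent offset in the log.
        self.latest_offset_fn = latest_offset_fn
        # Write the initial offset once, when the operator is first initialized.
        if not os.path.exists(init_file):
            with open(init_file, "w") as f:
                json.dump({"offset": initial_offset}, f)

    def read_next(self):
        """Pretend to consume one message and advance the read position."""
        current = self.offset
        self.offset += 1
        return current

    def drain(self):
        """On Drain: no action necessary."""

    def checkpoint(self):
        """On Checkpoint: save the offset currently being read from."""
        return {"offset": self.offset}

    def reset(self, saved):
        """On Reset: resume reading from the saved offset."""
        self.offset = saved["offset"]

    def reset_to_initial_state(self):
        """On ResetToInit: restore from the file, else use the most recent offset."""
        try:
            with open(self.init_file) as f:
                self.offset = json.load(f)["offset"]
        except (OSError, ValueError, KeyError):
            self.offset = self.latest_offset_fn()


def demo():
    init_file = os.path.join(tempfile.mkdtemp(), "initial_offset.json")
    tracker = OffsetTracker(100, init_file, latest_offset_fn=lambda: 500)
    for _ in range(3):
        tracker.read_next()
    saved = tracker.checkpoint()
    tracker.read_next()               # progress made after the checkpoint
    tracker.reset(saved)
    after_reset = tracker.offset
    tracker.reset_to_initial_state()
    after_init = tracker.offset
    os.remove(init_file)              # simulate the file no longer being available
    tracker.reset_to_initial_state()
    return after_reset, after_init, tracker.offset
```

In this model, `demo()` shows the three recovery paths in order: a reset to the checkpointed offset, a reset to the initial offset read from the file, and the fallback to the most recent offset once the file is gone.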
- JMSSource cannot participate in a consistent region. Customers are expected to use ReplayableStart from the standard toolkit to prevent loss of tuples.
- JMSSink is stateless and can participate in a consistent region.
MQTTSink Operator
Limitations:
- The MQTTSink operator cannot be placed at the start of a consistent region.
- The MQTTSink operator does not support tuple replay for the control port. The operator will issue warning messages to the user about control port behavior.
On Drain: The operator will flush all tuples remaining in its internal buffer to the remote messaging server.
On Checkpoint: The operator saves the state of the qos, serverUri and topics attributes.
On Reset: The operator restores the attribute state from the provided checkpoint, i.e. the previously saved checkpoint.
On ResetToInit: The operator restores qos, serverUri and topics to their initial values.
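As a rough illustration of the four MQTTSink callbacks above, the following sketch models them in plain Python. The names `MqttSinkState`, `submit` and `publish` are hypothetical, and `publish` is a stand-in callable rather than a real MQTT client. What happens to buffered tuples on a reset is not specified in the text above, so the sketch leaves the buffer untouched in that case.

```python
import copy


class MqttSinkState:
    """Toy model of the MQTTSink state handling (hypothetical names)."""

    def __init__(self, qos, server_uri, topics, publish):
        self.initial = {"qos": qos, "serverUri": server_uri, "topics": list(topics)}
        self.attrs = copy.deepcopy(self.initial)
        self.buffer = []
        # Assumption: `publish` stands in for sending to the remote messaging server.
        self.publish = publish

    def submit(self, payload):
        """Queue a tuple in the internal buffer."""
        self.buffer.append(payload)

    def drain(self):
        """On Drain: flush every buffered tuple to the messaging server."""
        while self.buffer:
            payload = self.buffer.pop(0)
            self.publish(self.attrs["topics"], self.attrs["qos"], payload)

    def checkpoint(self):
        """On Checkpoint: save qos, serverUri and topics."""
        return copy.deepcopy(self.attrs)

    def reset(self, saved):
        """On Reset: restore the attributes from the provided checkpoint."""
        self.attrs = copy.deepcopy(saved)

    def reset_to_initial_state(self):
        """On ResetToInit: restore the attributes to their initial values."""
        self.attrs = copy.deepcopy(self.initial)


def demo():
    sent = []
    sink = MqttSinkState(
        1, "tcp://broker.example:1883", ["sensors"],
        publish=lambda topics, qos, payload: sent.append((tuple(topics), qos, payload)),
    )
    sink.submit("a")
    sink.submit("b")
    sink.drain()
    saved = sink.checkpoint()
    sink.attrs["qos"] = 2                                 # simulated control port update
    sink.reset(saved)
    qos_after_reset = sink.attrs["qos"]
    sink.attrs["serverUri"] = "tcp://other.example:1883"  # another simulated update
    sink.reset_to_initial_state()
    return sent, qos_after_reset, sink.attrs["serverUri"], sink.buffer
```

Note that the simulated control port updates made after the checkpoint are simply discarded by `reset`, which matches the limitation that control port tuples are not replayed.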
With this approach, users can expect the following behavior from the MQTTSink operator when it participates in a consistent region:
- Messages with qos=1 or qos=2 will be delivered to the messaging server at least once. Duplicate tuples are expected. (Does this mean we are dropping support for qos=0? Also, allowing duplicate tuples will break the qos=2 protocol.)
Samantha: No, it does not mean we are dropping support for qos=0. It means that the user must set qos=1 or qos=2 to get at-least-once message delivery from Streams.
- The MQTTSink operator does not restore control port tuples, so the user may want to replay control signals after a reset.
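To make the "at least once, duplicates expected" behavior concrete, here is a deliberately simplified simulation. It assumes the sink publishes each tuple as soon as it receives it and that a single failure triggers a replay from the last checkpoint; `run_region`, `checkpoint_every` and `fail_after` are hypothetical names for this sketch only.

```python
def run_region(tuples, checkpoint_every, fail_after):
    """Return what the messaging server would receive, including replayed tuples."""
    delivered = []
    committed = 0    # index of the first tuple not covered by a checkpoint
    failed = False
    i = 0
    while i < len(tuples):
        delivered.append(tuples[i])   # sink publishes the tuple
        i += 1
        if not failed and i == fail_after:
            failed = True
            i = committed             # reset: upstream replays from the last checkpoint
            continue
        if i % checkpoint_every == 0:
            committed = i             # checkpoint: everything so far is now covered
    return delivered


received = run_region(["t0", "t1", "t2", "t3", "t4", "t5"], checkpoint_every=2, fail_after=3)
```

In this run every tuple reaches the server, but `t2` arrives twice because it was published after the last checkpoint and then replayed. A downstream consumer that cannot tolerate duplicates would need to detect them itself.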
MQTTSource Operator
- This operator cannot participate in a consistent region.
- Users should place a ReplayableStart operator after the MQTTSource operator to achieve tuple replay.
- Because this operator has an internal buffer, a new attribute will be introduced that lets the user control the buffer size. When a ReplayableStart operator follows MQTTSource, setting the buffer size to 1 helps avoid losing tuples on a reset.
- XMSSource cannot participate in a consistent region. Customers are expected to use ReplayableStart from the standard toolkit to prevent loss of tuples.
- XMSSink is stateless and can participate in a consistent region.