Skip to content

Consistent Region Support for Messaging Toolkit

Alex-Cook4 edited this page Jul 22, 2015 · 14 revisions

This documents describes the design of consistent region for each of the operators in the Messaging Toolkit.

Kafka

  • KafkaProducer will not be allowed at the beginning of a consistent region

[Samantha:] Is the operator doing anything special on drain, checkpoint, reset and resetToInitialState? What can the user expect from message delivery perspective ?

  • KafkaConsumer is being developed to be allowed in a consistent region. We will use a low level Kafka API and support replaying of data for consistent regions. This is only being developed to support single-partition/single-topic operators. Multi-partition topics will be consumed using multiple operators, or more simply with user defined parallelism (UDP).

On Drain: No action necessary

On Checkpoint: The operator saves the offset within the message log that we are currently reading from.

On Reset: Use saved offset to begin reading from that location in the message log.

On ResetToInit: the operator restores value of initial offset from a file that is written when the operator is first initialized. If that file is no longer available, start reading data from the most recent offset.

JMS

  • JMSSource cannot participate in a consistent region. Customers are expected to use ReplayableStart from the standard toolkit to prevent loss of tuples.

  • JMSSink is stateless and can participate in a consistent region.

MQTT

MQTTSink Operator

Limitations:

  1. MQTTSink operator can not be placed at start of a consistent region.
  2. MQTTSink operator does not support tuple replay for control port. The operator will issue warning messages to user about control port behavior.

On Drain: The operator will flush all tuples remaining in its internal buffer to the remote messaging server.

On Checkpoint: The operator saves states of qos, serverUri, topics attributes.

On Reset: The operator restores the attribute states saved in the provided checkpoint, i.e previously saved checkpoint.

On ResetToInit: the operator restores value of qos, serverUri and topics to its initial state.

With this approach, the MQTTSink operator may expect the following behavior when it participates in a consistent region.

  • Messages with qos=1 and 2 will be delivered to messaging server at least once. Duplicate tuples is expected. (Does this mean we are droping support of qos=0, also allowing duplicate tuples will break qos=2 protocol.)

Samantha: No, it does not mean we are dropping support for qos=0. It means that user must set qos=1, 2 to guaranteed message delivery of at least once from Streams.

  • MQTTSink operator does not support control port tuple restore, user may want to replay control signals after a reset.

MQTTSource Operator:

  1. This operator can NOT participate in a consistent region.
  2. User should use ReplayableStart operator after MQTTSource operator to achieve tuple replay.
  3. Since this operator has internal buffer, a new attribute will be introduced allowing user to control size of internal buffer. In case of having replayableStart operator after it, setting size of internal buffer to 1 will help avoiding losing tuples in case of reset.

XMS

  • XMSSource cannot participate in a consistent region. Customers are expected to use ReplayableSource from the standard toolkit to prevent loss of tuples.

  • XMSSink is stateless and can participate in a consistent region.

Clone this wiki locally