@@ -50,17 +50,19 @@ financial trading.
 
 A data streaming application consists of two layers: the storage layer
 and the processing layer. As stream storage, AWS offers the managed
-service Kinesis Data Streams, but you can also run other stream storages
-like [Apache Kafka](https://kafka.apache.org/) or [Apache
+services Kinesis Data Streams and [Amazon Managed Streaming for Apache
+Kafka](https://aws.amazon.com/msk/) (Amazon MSK), but you can also run
+stream storages like [Apache Kafka](https://kafka.apache.org/) or [Apache
 Flume](https://flume.apache.org/) on [Amazon Elastic Compute
 Cloud](http://aws.amazon.com/ec2) (Amazon EC2) or [Amazon
-EMR](http://aws.amazon.com/emr). The processing layer consumes the
-data from the storage layer and runs computations on that data. This
-could be your own application that can consume data from the stream, or
-you use a stream processing framework like Apache Flink, Apache Spark
-Streaming, or Apache Storm. For this post, we use Kinesis Data Streams
-as the storage layer and the containerized KCL application on AWS
-Fargate as the processing layer.
+EMR](http://aws.amazon.com/emr). The processing layer consumes the data
+from the storage layer and runs computations on that data. This could be
+an Apache Flink application running fully managed on [Amazon Kinesis
+Data Analytics for Apache Flink](https://docs.aws.amazon.com/kinesisanalytics/latest/java/what-is.html),
+an application running a stream processing framework like Apache Spark
+Streaming or Apache Storm, or a custom application using the Kinesis API
+or the KCL. For this post, we use Kinesis Data Streams as the storage layer
+and the containerized KCL application on AWS Fargate as the processing layer.
 
 ## Streaming data processing architecture
 
@@ -116,7 +118,7 @@ utilization of 65%.
 As mentioned earlier, you can run a variety of streaming platforms on
 AWS. However, for the data processor in this post, you use Kinesis Data
 Streams. Kinesis Data Streams is a data store where the data is held for
-24 hours and configurable up to 168 hours. Kinesis Data Streams is
+24 hours and configurable up to 1 year. Kinesis Data Streams is
 designed to be highly available and redundant by storing data across
 three [Availability
 Zones](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones)
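The retention window the hunk above describes (24 hours by default, configurable up to 1 year) can be sanity-checked with a small sketch. The constants and the `is_valid_retention` helper are hypothetical names for illustration, not part of the post or the AWS SDK:

```python
# Kinesis Data Streams retention limits as stated in the post:
# held for 24 hours by default, configurable up to 1 year.
MIN_RETENTION_HOURS = 24
MAX_RETENTION_HOURS = 365 * 24  # 8,760 hours = 1 year

def is_valid_retention(hours: int) -> bool:
    """Check a requested retention period against the service limits."""
    return MIN_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS

print(is_valid_retention(24))    # default retention
print(is_valid_retention(8760))  # maximum (1 year)
print(is_valid_retention(9000))  # beyond the limit
```

In practice you would extend retention with the real CLI command `aws kinesis increase-stream-retention-period --stream-name <name> --retention-period-hours <hours>`.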
@@ -125,7 +127,9 @@ in the specified Region.
 The stream consists of one or more *shards*, which are uniquely
 identified sequences of data records in a stream. One shard has a
 maximum of 2 MB/s in reads (up to five transactions) and 1 MB/s writes
-per second (up to 1,000 records per second).
+per second (up to 1,000 records per second). Consumers with [Dedicated
+Throughput (Enhanced Fan-Out)](https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html)
+support up to 2 MB/s data egress per consumer and shard.
 
 Each record written to Kinesis Data Streams has a *partition key,* which
 is used to group data by shard. In this example, the data stream starts
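The per-shard limits above translate directly into a stream-sizing calculation: the shard count must cover the write bandwidth, the write record rate, and the read bandwidth, whichever is largest. This is a sketch with hypothetical helper names, using only the limits stated in the post, and it deliberately ignores hot partition keys, which can overload one shard regardless of the total count:

```python
import math

# Per-shard limits from the text above:
WRITE_MB_PER_SHARD = 1          # 1 MB/s writes per shard
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s writes per shard
READ_MB_PER_SHARD = 2           # 2 MB/s reads per shard (shared consumers)

def min_shards(write_mb: float, write_records: float, read_mb: float) -> int:
    """Smallest shard count satisfying the given aggregate throughput."""
    return max(
        math.ceil(write_mb / WRITE_MB_PER_SHARD),
        math.ceil(write_records / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb / READ_MB_PER_SHARD),
    )

# 10 MB/s in, 12,000 records/s, 18 MB/s out across all shared consumers:
print(min_shards(10, 12_000, 18))  # 12 (the record rate dominates)
```

With enhanced fan-out, the read term changes: each registered consumer gets its own 2 MB/s per shard, so reads rarely drive the shard count.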