AWS Cloud Developer Associate Certification

16.Application Integration(SQS, SNS, Kinesis)

  1. Introduction
  2. Amazon SQS
  3. Standard Queue
  4. Producing Messages
  5. Consuming Messages
  6. Multiple EC2 Instances Consumers
  7. Amazon SQS – Security
  8. SQS – Message Visibility Timeout
    1. Amazon SQS – Dead Letter Queue
    2. Amazon SQS – Delay Queue
    3. Amazon SQS – Long Polling
    4. SQS Extended Client
    5. SQS – Must know API
  9. Amazon SQS – FIFO Queue
  10. SQS FIFO – Deduplication
  11. SQS FIFO – Message Grouping
  12. Amazon SNS
    1. Amazon SNS – How to publish
    2. Amazon SNS – Security
    3. SNS + SQS: Fan Out
  13. Amazon SNS – FIFO Topic
  14. SNS FIFO + SQS FIFO: Fan Out
  15. SNS – Message Filtering
  16. Kinesis
    1. Overview
    2. Kinesis Data Streams
    3. Kinesis Data Streams Security
    4. Kinesis Producers
    5. Kinesis – ProvisionedThroughputExceeded
    6. Kinesis Data Streams Consumers
    7. Kinesis Consumers Types
    8. Kinesis Client Library (KCL)
    9. Kinesis Operation – Shard Splitting
    10. Kinesis Operation – Merging Shards
    11. Kinesis Data Firehose
    12. Kinesis Data Streams vs Firehose
    13. Kinesis Data Analytics (SQL application)
  17. Ordering data into Kinesis
  18. Ordering data into SQS
  19. Kinesis vs SQS ordering
  20. SQS vs SNS vs Kinesis

Introduction

  1. There are two patterns of application communication
    1. Synchronous communications(application to application)
    2. Asynchronous / Event based(application to queue to application)
  2. Synchronous between applications can be problematic if there are sudden spikes of traffic
  3. In that case, it’s better to decouple your applications
    1. using SQS: queue model
    2. using SNS: pub/sub model
    3. using Kinesis: real-time streaming model
    4. These services can scale independently from our application!


Amazon SQS
Standard Queue

  1. Oldest offering (over 10 years old)
  2. Fully managed service, used to decouple applications
  3. Attributes:
    1. Unlimited throughput, unlimited number of messages in queue
    2. Default retention of messages: 4 days, maximum of 14 days
    3. Low latency (<10 ms on publish and receive)
    4. Limitation of 256KB per message sent
  4. Can have duplicate messages (at least once delivery, occasionally)
  5. Can have out of order messages (best effort ordering)


SQS –
Producing Messages

  1. Produced to SQS using the SDK (SendMessage API)
  2. The message is persisted in SQS until a consumer deletes it
  3. Message retention: default 4 days, up to 14 days
  4. SQS standard: unlimited throughput


SQS- Consuming Messages

  1. Consumers (running on EC2 instances, servers, or AWS Lambda)…
  2. Poll SQS for messages (receive up to 10 messages at a time)
  3. Process the messages (example: insert the message into an RDS database)
  4. Delete the messages using the DeleteMessage API


SQS- Multiple EC2 Instances Consumers

  1. Consumers receive and process messages in parallel
  2. At least once delivery
  3. Best-effort message ordering
  4. We can scale consumers horizontally to improve throughput of processing


Amazon SQS – Security

  1. Encryption:
    1. In-flight encryption using HTTPS API
    2. At-rest encryption using KMS keys
    3. Client-side encryption if the client wants to perform encryption/decryption itself
  2. Access Controls
    1. IAM policies to regulate access to the SQS API
  3. SQS Access Policies(similar to S3 bucket policies)
    1. Useful for cross-account access to SQS queues
    2. Useful for allowing other services (SNS, S3…) to write to an SQS queue


SQS – Message Visibility Timeout

  1. After a message is polled by a consumer, it becomes invisible to other consumers
  2. By default, the “message visibility timeout” is 30 seconds
  3. That means the message has 30 seconds to be processed
  4. After the message visibility timeout is over, the message is “visible” in SQS
  5. If a message is not processed within the visibility timeout, it will be processed twice
  6. A consumer could call the ChangeMessageVisibility API to get more time
  7. If visibility timeout is high (hours), and consumer crashes, re-processing will take time
  8. If visibility timeout is too low (seconds), we may get duplicates


Amazon SQS – Dead Letter Queue

  1. If a consumer fails to process a message within the Visibility Timeout…
    1. the message goes back to the queue!
  2. We can set a threshold of how many times a message can go back to the queue
  3. After the MaximumReceives threshold is exceeded, the message goes into a dead letter queue (DLQ)
  4. Useful for debugging!
    1. Make sure to process the messages in the DLQ before they expire:
    2. Good to set a retention of 14 days in the DLQ


Amazon SQS – Delay Queue

  1. Delay a message (consumers don’t see it immediately) up to 15 minutes
  2. Default is 0 seconds (message is available right away)
  3. Can set a default at queue level
  4. Can override the default on send using the DelaySeconds parameter


Amazon SQS – Long Polling

  1. When a consumer requests messages from the queue, it can optionally “wait” for messages to arrive if there are none in the queue
  2. This is called Long Polling
  3. LongPolling decreases the number of API calls made to SQS while increasing the efficiency and latency of your application
  4. The wait time can be between 1 sec to 20 sec (20 sec preferable)
  5. Long Polling is preferable to Short Polling
  6. Long polling can be enabled at the queue level or at the API level using WaitTimeSeconds

SQS Extended Client

  1. Message size limit is 256KB, how to send large messages, e.g. 1GB?
  2. Using the SQS Extended Client (Java Library)

SQS – Must know API

  1. CreateQueue (MessageRetentionPeriod), DeleteQueue
  2. PurgeQueue: delete all the messages in queue
  3. SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
  4. MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
  5. ReceiveMessageWaitTimeSeconds: Long Polling
  6. ChangeMessageVisibility: change the message timeout
  7. Batch APIs for SendMessage, DeleteMessage, ChangeMessageVisibility helps decrease your costs


Amazon SQS – FIFO Queue

  1. FIFO = First In First Out (ordering of messages in the queue)
  2. Limited throughput: 300 msg/s without batching, 3000 msg/s with
  3. Exactly-once send capability (by removing duplicates)
  4. Messages are processed in order by the consumer


SQS FIFO – Deduplication

  1. De-duplication interval is 5 minutes
  2. Two de-duplication methods:
    1. Content-based deduplication: will do a SHA-256 hash of the message body
    2. Explicitly provide a Message Deduplication ID


SQS FIFO – Message Grouping

  1. If you specify the same value of MessageGroupID in an SQS FIFO queue,you can only have one consumer, and all the messages are in order
  2. To get ordering at the level of a subset of messages, specify different values for MessageGroupID
    1. Messages that share a common Message Group ID will be in order within the group
    2. Each Group ID can have a different consumer (parallel processing!)
    3. Ordering across groups is not guaranteed


Amazon SNS

  1. What if you want to send one message to many receivers?
  2. The “event producer” only sends message to one SNS topic
  3. As many “event receivers” (subscriptions) as we want to listen to the SNS topic notifications
  4. Each subscriber to the topic will get all the messages (note: new feature to filter messages)
  5. Up to 10,000,000 (10M)subscriptions per topic
  6. 100,000 (0.1M)topics limit
  7. Subscribers can be:
    1. SQS
    2. HTTP / HTTPS (with delivery retries – how many times)
    3. Lambda
    4. Emails
    5. SMS messages
    6. Mobile Notifications


Amazon SNS – How to publish

  1. Topic Publish (using the SDK)
    1. Create a topic
    2. Create a subscription (or many)
    3. Publish to the topic
  2. Direct Publish (for mobile apps SDK)
    1. Create a platform application
    2. Create a platform endpoint
    3. Publish to the platform endpoint
    4. Works with Google GCM, Apple APNS, Amazon ADM…


Amazon SNS – Security

  1. Encryption:
    1. In-flight encryption using HTTPS API
    2. At-rest encryption using KMS keys
    3. Client-side encryption if the client wants to perform encryption/decryption itself
  2. Access Controls:
    1. IAM policies to regulate access to the SNS API
  3. SNS Access Policies (similar to S3 bucket policies)
    1. Useful for cross-account access to SNS topics
    2. Useful for allowing other services ( S3…) to write to an SNS topic


SNS + SQS: Fan Out

Fan-out- the number of inputs that can be connected to a specified output.

  1. Push once in SNS, receive in all SQS queues that are subscribers
  2. Fully decoupled, no data loss
  3. SQS allows for: data persistence, delayed processing and retries of work
  4. Ability to add more SQS subscribers over time
  5. Make sure your SQS queue access policy allows for SNS to write

Application: S3 Events to multiple queues

  1. For the same combination of: event type (e.g. object create) and prefix (e.g. images/) you can only have one S3 Event rule
  2. If you want to send the same S3 event to many SQS queues, use fan-out


Amazon SNS – FIFO Topic

  1. FIFO = First In First Out (ordering of messages in the topic)
  2. Similar features as SQS FIFO:
    1. Ordering by Message Group ID (all messages in the same group are ordered)
    2. Deduplication using a Deduplication ID or Content Based Deduplication
    3. Can only have SQS FIFO queues as subscribers
    4. Limited throughput (same throughput as SQS FIFO)


SNS FIFO + SQS FIFO: Fan Out

  1. In case you need fan out + ordering + deduplication


SNS – Message Filtering

  1. JSON policy used to filter messages sent to SNS topic’s subscriptions
  2. If a subscription doesn’t have a filter policy, it receives every message


Kines Overview

  1. Makes it easy to collect, process, and analyze streaming data in real-time
  2. Ingest real-time data such as: Application logs, Metrics, Website clickstreams
  3. Kinesis Data Streams: capture, process, and store data streams
  4. Kinesis Data Firehose: load data streams into AWS data stores
  5. Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
  6. Kinesis Video Streams: capture, process, and store video streams


Kinesis Data Streams

  1. Billing is per shard provisioned, can have as many shards as you want
  2. Retention between 1 day (default) to 365 days
  3. Ability to reprocess (replay) data
  4. Once data is inserted in Kinesis, it can’t be deleted (immutability)
  5. Data that shares the same partition goes to the same shard (ordering
  6. Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  7. Consumers:
    1. Write your own: Kinesis Client Library (KCL), AWS SDK
    2. Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics


Kinesis Data Streams Security

  1. Control access / authorization using IAM policies
  2. Encryption in flight using HTTPS endpoints
  3. Encryption at rest using KMS
  4. You can implement encryption/decryption of data on client side (harder
  5. VPC Endpoints available for Kinesis to access within VPC
  6. Monitor API calls using CloudTrail


Kinesis Producers

  1. Puts data records into data streams
  2. Data record consists of:
    1. Sequence number (unique per partition-key within shard)
    2. Partition key (must specify while put records into stream)
    3. Data blob (up to 1 MB)
  3. Producers
    1. AWS SDK: simple producer
    2. Kinesis Producer Library (KPL): C++, Java, batch, compression, retries
    3. Kinesis Agent: monitor log files
  4. Write throughput: 1 MB/sec or 1000 records/sec per shard
  5. PutRecord API
  6. Use batching with PutRecords API to reduce costs & increase throughput


Kinesis – ProvisionedThroughputExceeded


Kinesis Data Streams Consumers

  1. Get data records from data streams and process them
    1. AWS Lambda
    2. Kinesis Data Analytics
    3. Kinesis Data Firehose
    4. Custom Consumer (AWS SDK) – Classic or Enhanced Fan-Out
    5. Kinesis Client Library (KCL): library to simplify reading from data stream


Kinesis Consumers Types


Kinesis Client Library (KCL)

2 Types –

  1. Shared (Classic) Fan-out Consumer – pull
    1. Low number of consuming applications
    2. Read throughput: 2 MB/sec per shard across all consumers
    3. Max. 5 GetRecords API calls/sec
    4. Latency ~200 ms
    5. Minimize cost ($)
    6. Consumers poll data from Kinesis using GetRecords API call
    7. Returns up to 10 MB (then throttle for 5 seconds) or up to 10000 records
  2. Enhanced Fan-out Consumer – push
    1. Multiple consuming applications for the same stream
    2. 2 MB/sec per consumer per shard
    3. Latency ~70 ms
    4. Higher costs ($$$)
    5. Kinesis pushes data to consumers over HTTP/2 (SubscribeToShard API)
    6. Soft limit of 5 consumer applications (KCL) per data stream (default)


Kinesis Consumers – AWS Lambda

  1. Supports Classic & Enhanced fan-out consumers
  2. Read records in batches
  3. Can configure batch size and batch window
  4. If error occurs, Lambda retries until succeeds or data expired
  5. Can process up to 10 batches per shard simultaneously

Kinesis Client Library (KCL)

  1. A Java library that helps read record from a Kinesis Data Stream with distributed applications sharing the read workload
  2. Each shard is to be read by only one KCL instance(Many to One i.e from Shard to KCL)
    1. 4 shards = max. 4 KCL instances
    2. 6 shards = max. 6 KCL instances
  3. Progress is checkpointed into DynamoDB (needs IAM access)
  4. Track other workers and share the work amongst shards using DynamoDB
  5. KCL can run on EC2, Elastic Beanstalk, and on-premises
  6. Records are read in order at the shard level
  7. Versions:
    1. KCL 1.x (supports shared consumer)
    2. KCL 2.x (supports shared & enhanced fan-out consumer)

Kinesis Operation – Shard Splitting

  1. Used to increase the Stream capacity (1 MB/s data in per shard)
  2. Used to divide a “hot shard”.
  3. The old shard is closed and will be deleted once the data is expired
  4. No automatic scaling (manually increase/decrease capacity)
  5. Can’t split into more than two shards in a single operation


Kinesis Operation – Mer
ging Shards

  1. Decrease the Stream capacity and save costs
  2. Can be used to group two shards with low traffic (cold shards)
  3. Old shards are closed and will be deleted once the data is expired
  4. Can’t merge more than two shards in a single operation


Kinesis Data Firehose

  1. Fully Managed Service, no administration, automatic scaling, serverless
    1. AWS: Redshift / Amazon S3 / ElasticSearch
    2. 3rd party partner: Splunk / MongoDB / DataDog / NewRelic / …
    3. Custom: send to any HTTP endpoint
  2. Pay for data going through Firehose
  3. Near Real Time
    1. 60 seconds latency minimum for non full batches
    2. Or minimum 32 MB of data at a time
  4. Supports many data formats, conversions, transformations, compression
  5. Supports custom data transformations using AWS Lambda
  6. Can send failed or all data to a backup S3 bucket


Kinesis Data Streams vs Firehose

Kinesis Data StreamsKinesis Data Firehose
Streaming service for ingest at scaleLoad streaming data into S3 / Redshift /
ES / 3rd party / custom HTTP
Write custom code (producer /
consumer)
Fully managed
Real-time (~200 ms)Near real-time (buffer time min. 60 sec)
Manage scaling (shard splitting /
merging)
Automatic scaling
Data storage for 1 to 365 daysNo data storage
Supports replay capability
NotE- Firehouse is less sin every parameter as compared to data/video stream except bandwidh at with data transferred i.e 1mbps vs 2mbps


Kinesis Data Analytics (SQL application)

  1. Perform real-time analytics on Kinesis Streams using SQL
  2. Fully managed, no servers to provision
  3. Automatic scaling
  4. Real-time analytics
  5. Pay for actual consumption rate
  6. Can create streams out of the real-time queries
  7. Use cases:
    1. Time-series analytics
    2. Real-time dashboards
    3. Real-time metrics


Ordering data into Kinesis

  1. Imagine you have 100 trucks (truck_1, truck_2, … truck_100) on the road sending their GPS positions regularly into AWS
  2. You want to consume the data in order for each truck, so that you can track their movement accurately
  3. How should you send that data into Kinesis?
  4. Answer: send using a “Partition Key” value of the “truck_id”
  5. The same key will always go to the same shard


Ordering data into SQS

  1. For SQS standard, there is no ordering
  2. For SQS FIFO, if you don’t use a Group ID, messages are consumed in the order they are sent, with only one consumer
  3. You want to scale the number of consumers, but you want messages to be “grouped when they are related to each other
  4. Then you use a Group ID (similar to Partition Key in Kinesis)


Kinesis vs SQS ordering

Let’s assume 100 trucks, 5 kinesis shards, 1 SQS FIFO

Kinesis Data StreamsSQS ordering
On average you’ll have 20 trucks per shardYou only have one SQS FIFO queue
Trucks will have their data ordered within each shardYou will have 100 Group ID
The maximum amount of consumers in parallel we can have is 5You can have up to 100 Consumers (due to the 100 Group ID)
Can receive up to 5 MB/s of dataYou have up to 300 messages per second (or 3000 if using batching


SQS vs SNS vs Kinesis

SQSSNSKinesis
Consumer “pull data”Push data to many
subscribers
Standard: pull data
• 2 MB per shard
Data is deleted after being consumed.Data is not persisted (lost if
not delivered).
– Possibility to replay data
– Data expires after X days
Can have as many workers (consumers) as we want– Up to 12,500,000 (12.5 m)subscribers
– Pub/Sub
– Up to 100,000 topics
Enhanced-fan out: push data
• 2 MB per shard per consumer
No need to provision
throughput
No need to provision
throughput
Must provision throughput
Ordering guarantees only on
FIFO queues
Integrates with SQS for fanout architecture patternOrdering at the shard level
Individual message delay capabilityFIFO capability for SQS FIFOMeant for real-time big data, analytics and ETL

Published by

Unknown's avatar

sevanand yadav

software engineer working as web developer having specialization in spring MVC with mysql,hibernate

Leave a comment