AWS Cloud Developer Associate Certification

13. DynamoDB – NoSQL Serverless Database
Index

  1. Introduction
  2. Primary Keys
    1. Partition(Hash)
    2. Partition+Range
  3. WCU
  4. RCU
  5. Eventual and strong consistency
  6. DynamoDB – Reading Data
    1. DynamoDB – Reading Data (Query)
    2. DynamoDB – Reading Data (Scan)
  7. DynamoDB – Writing Data
  8. DynamoDB as Session State Cache
  9. DynamoDB Transactions
  10. DynamoDB Write Types

Introduction

  • Traditional applications leverage RDBMS databases
  • Vertical scaling (getting a more powerful CPU / RAM / IO)
  • Horizontal scaling (increasing reading capability by adding EC2 / RDS Read Replicas)

NoSQL databases

  1. NoSQL databases scale horizontally
  2. There’s no “right or wrong” between NoSQL and SQL; they just require you to model the data differently and think about user queries differently

Amazon DynamoDB

  1. Integrated with IAM for security, authorization and administration
  2. Enables event driven programming with DynamoDB Streams

DynamoDB – Basics

  1. DynamoDB is made of Tables
  2. Each table has a Primary Key (must be decided at creation time)
  3. Each table can have an infinite number of items (= rows)
  4. Each item has attributes (can be added over time – can be null)
  5. Maximum size of an item(row) is 400KB
  6. Data types supported are:
    1. Scalar Types – String, Number, Binary, Boolean, Null
    2. Document Types – List, Map
    3. Set Types – String Set, Number Set, Binary Set

DynamoDB – Primary Keys

  1. Option 1: Partition Key (HASH)
    1. Partition key must be unique for each item
    2. Partition key must be “diverse” so that the data is distributed
  2. Option 2: Partition Key + Sort Key (HASH + RANGE)
    1. The combination must be unique for each item
    2. Data is grouped by partition key
    3. Example: users-games table, “User_ID” for Partition Key and “Game_ID” for Sort Key

DynamoDB – Partition Keys (Exercise)

  • What is the best Partition Key to maximize data distribution?
    • The attribute with the highest cardinality (the most distinct values) is the best candidate

DynamoDB – Read/Write Capacity Modes

Control how you manage your table’s capacity (read/write throughput)

  1. Provisioned Mode (default)
    1. You specify the number of reads/writes per second
    2. You need to plan capacity beforehand
    3. Pay for provisioned read & write capacity units
  2. On-Demand Mode
    1. Read/writes automatically scale up/down with your workloads
    2. No capacity planning needed
    3. Pay for what you use, more expensive ($$$)
  3. Note: you can switch between the two modes once every 24 hours

R/W Capacity Modes – Provisioned

  1. Table must have provisioned read and write capacity units
  2. Read Capacity Units (RCU) – throughput for reads
  3. Write Capacity Units (WCU) – throughput for writes
  4. Option to setup auto-scaling of throughput to meet demand
  5. Throughput can be exceeded temporarily using “Burst Capacity”
  6. If Burst Capacity has been consumed, you’ll get a “ProvisionedThroughputExceededException”
  7. It’s then advised to do an exponential backoff retry

DynamoDB – Write Capacity Units (WCU)

  1. One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size, i.e. 1 WCU = 1 KB/sec
  2. If the items are larger than 1 KB, more WCUs are consumed

Examples

Example 1: we write 10 items per second, with item size 2 KB
• We need 10 * (2 KB / 1 KB) = 20 WCUs
Example 2: we write 6 items per second, with item size 4.5 KB
• We need 6 * (5 KB / 1 KB) = 30 WCUs (4.5 KB is rounded up to 5 KB)
Example 3: we write 120 items per minute, with item size 2 KB
• We need (120 / 60) * (2 KB / 1 KB) = 4 WCUs
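The three examples above can be checked with a small calculator. This is only a sketch of the rule stated in these notes (round the item size up to the next 1 KB, then multiply by writes per second); the function name `wcus_needed` is made up for illustration.

```python
import math


def wcus_needed(items_per_second, item_size_kb):
    """WCUs for standard writes: each write costs ceil(size / 1 KB) units."""
    units_per_item = math.ceil(item_size_kb / 1)  # round up to the next KB
    return math.ceil(items_per_second * units_per_item)


print(wcus_needed(10, 2))        # Example 1
print(wcus_needed(6, 4.5))       # Example 2
print(wcus_needed(120 / 60, 2))  # Example 3 (120 items per minute)
```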

Strongly Consistent Read vs. Eventually Consistent Read

  1. Eventually Consistent Read (default): if we read just after a write, we may get stale data because replication has not finished
  2. Strongly Consistent Read: if we read just after a write, we will get the latest data
    1. Set “ConsistentRead” parameter to True in API calls (GetItem, BatchGetItem, Query, Scan)
    2. Consumes twice the RCUs of an eventually consistent read

DynamoDB – Read Capacity Units (RCU)

  1. One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4KB in size
  2. If the items are larger than 4 KB, more RCUs are consumed
Example 1: 10 Strongly Consistent Reads per second, with item size 4 KB
• We need 10 * (4 KB / 4 KB) = 10 RCUs
Example 2: 16 Eventually Consistent Reads per second, with item size 12 KB
• We need (16 / 2) * (12 KB / 4 KB) = 24 RCUs (an eventually consistent read consumes half the RCUs of a strongly consistent read)
Example 3: 10 Strongly Consistent Reads per second, with item size 6 KB
• We need 10 * (8 KB / 4 KB) = 20 RCUs (we must round up 6 KB to 8 KB)
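The read examples follow the same pattern, and a short sketch makes the rounding explicit. It assumes only the rules given in these notes: item size is rounded up to the next multiple of 4 KB, and an eventually consistent read costs half as much. The name `rcus_needed` is hypothetical.

```python
import math


def rcus_needed(reads_per_second, item_size_kb, strongly_consistent=True):
    """RCUs for standard reads: each read costs ceil(size / 4 KB) units."""
    units_per_read = math.ceil(item_size_kb / 4)  # round up to a 4 KB multiple
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total = total / 2  # eventually consistent reads cost half
    return math.ceil(total)


print(rcus_needed(10, 4))                             # Example 1
print(rcus_needed(16, 12, strongly_consistent=False)) # Example 2
print(rcus_needed(10, 6))                             # Example 3
```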

DynamoDB – Partitions Internal

  1. Data is stored in partitions
  2. Partition keys go through a hashing algorithm that determines which partition each item is stored in
  3. WCUs and RCUs are spread evenly across partitions

To compute the number of partitions:

• # of partitions (by capacity) = (total RCU / 3000) + (total WCU / 1000)
• # of partitions (by size) = total size / 10 GB
• # of partitions = ceil(max(# of partitions (by capacity), # of partitions (by size)))
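As a rough illustration of the partition formula in these notes, the sketch below takes the larger of the capacity-based and size-based estimates and rounds it up. The function name and the sample numbers are assumptions chosen for the example, not values from AWS.

```python
import math


def partitions_needed(rcu, wcu, total_size_gb):
    """Estimate the partition count from provisioned capacity and table size."""
    by_capacity = rcu / 3000 + wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))


# Hypothetical table: 6000 RCU, 1500 WCU, 25 GB of data.
# by capacity = 2 + 1.5 = 3.5, by size = 2.5, so the estimate is ceil(3.5).
print(partitions_needed(6000, 1500, 25))
```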

DynamoDB – Throttling

  1. If we exceed provisioned RCUs or WCUs, we get “ProvisionedThroughputExceededException”
  2. Reasons:
    1. Hot Keys – one partition key is being read too many times (e.g., popular item)
    2. Hot Partitions
    3. Very large items; remember that RCU and WCU consumption depends on item size
  3. Solutions:
    1. Exponential backoff when exception is encountered (already in SDK)
    2. Distribute partition keys as much as possible
    3. If RCU issue, we can use DynamoDB Accelerator (DAX)

R/W Capacity Modes – On-Demand

  1. Read/writes automatically scale up/down with your workloads
  2. Unlimited WCU & RCU, no throttle, more expensive
  3. You’re charged for reads/writes that you use in terms of RRU and WRU
  4. Read Request Units (RRU) – throughput for reads (same as RCU)
  5. Write Request Units (WRU) – throughput for writes (same as WCU)
  6. 2.5x more expensive than provisioned capacity (use with care)
  7. Use cases: unknown workloads, unpredictable application traffic, …

DynamoDB – Writing Data

  1. PutItem
    1. Creates a new item or fully replaces an old item (same Primary Key)
    2. Consumes WCUs
  2. UpdateItem
    1. Edits an existing item’s attributes or adds a new item if it doesn’t exist
    2. Can be used to implement Atomic Counters – a numeric attribute that’s unconditionally incremented
  3. Conditional Writes
    1. Accepts a write/update/delete only if conditions are met; otherwise returns an error
    2. Helps with concurrent access to items
    3. No performance impact

DynamoDB – Reading Data

  1. GetItem
    1. Read based on Primary key
    2. Primary Key can be HASH or HASH+RANGE
    3. Eventually Consistent Read (default)
    4. Option to use Strongly Consistent Reads (more RCU – might take longer)
    5. ProjectionExpression can be specified to retrieve only certain attributes

DynamoDB – Reading Data (Query)

  1. Query returns items based on
    1. KeyConditionExpression
      1. Partition Key value (must be = operator) – required
      2. Sort Key value (=, <, <=, >, >=, Between, Begins with) – optional
    2. FilterExpression
      1. Additional filtering after the Query operation (before data returned to you)
      2. Use only with non-key attributes (does not allow HASH or RANGE attributes)
  2. Returns
    1. The number of items specified in Limit
    2. Or up to 1 MB of data
  3. Ability to do pagination on the results
  4. Can query table, a Local Secondary Index, or a Global Secondary Index
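Because a single Query returns at most 1 MB (or Limit items), reading a full result set means following the pagination key. The sketch below assumes a client object with a boto3-style `query` method; `FakeClient` is a hypothetical stand-in so the loop can be tried without AWS access.

```python
def query_all(client, **query_params):
    """Run a Query repeatedly until no LastEvaluatedKey is returned."""
    items = []
    params = dict(query_params)
    while True:
        page = client.query(**params)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items  # no more pages
        # Resume the next request where the previous page stopped.
        params["ExclusiveStartKey"] = last_key


class FakeClient:
    """Hypothetical stand-in that serves pre-built pages, for a dry run."""

    def __init__(self, pages):
        self.pages = pages
        self.calls = []

    def query(self, **params):
        self.calls.append(params)
        return self.pages[len(self.calls) - 1]
```

The same loop shape applies to Scan, which uses the same LastEvaluatedKey and ExclusiveStartKey fields.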

DynamoDB – Reading Data (Scan)

  1. Scan the entire table and then filter out data (inefficient)
  2. Returns up to 1 MB of data – use pagination to keep on reading
  3. Consumes a lot of RCU
  4. Limit the impact by using Limit, or by reducing the size of each result and pausing between requests
  5. For faster performance, use Parallel Scan
    1. Multiple workers scan multiple data segments at the same time
    2. Increases the throughput and RCU consumed
    3. Limit the impact of parallel scans just like you would for Scans
  6. Can use ProjectionExpression & FilterExpression (no changes to RCU)
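For a Parallel Scan, each worker issues its own Scan with a distinct Segment number out of TotalSegments. The sketch below only builds those per-worker request parameters; the helper name and the sample table name are assumptions.

```python
def parallel_scan_requests(table, total_segments, limit=None):
    """Build one Scan request per worker for a parallel scan."""
    requests = []
    for segment in range(total_segments):
        request = {
            "TableName": table,
            "Segment": segment,               # this worker's slice, 0-based
            "TotalSegments": total_segments,  # same value for every worker
        }
        if limit is not None:
            request["Limit"] = limit  # caps items read per request
        requests.append(request)
    return requests


worker_requests = parallel_scan_requests("users-games", 4, limit=100)
```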

DynamoDB – Deleting Data

  1. DeleteItem
    1. Delete an individual item
    2. Ability to perform a conditional delete
  2. DeleteTable
    1. Delete a whole table and all its items
    2. Much quicker deletion than calling DeleteItem on all items

DynamoDB – Batch Operations

  1. Reduces latency by cutting the number of API calls
  2. Operations are done in parallel for better efficiency
  3. Part of a batch can fail; in which case we need to try again for the failed items
  4. BatchWriteItem
    1. Up to 25 PutItem and/or DeleteItem in one call
    2. Up to 16 MB of data written, up to 400 KB of data per item
    3. Can’t update items (use UpdateItem)
  5. BatchGetItem
    1. Return items from one or more tables
    2. Up to 100 items, up to 16 MB of data
    3. Items are retrieved in parallel to minimize latency
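The two batch rules above (at most 25 writes per call, and retrying whatever part of a batch failed) can be sketched as follows. It assumes a client with a boto3-style `batch_write_item` method that reports failures under “UnprocessedItems”; the helper names and the `max_rounds` limit are illustrative assumptions, and a real implementation would also back off between rounds.

```python
def chunks(items, size=25):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_put_all(client, table, items, max_rounds=5):
    """Write all items with BatchWriteItem, resending UnprocessedItems."""
    api_calls = 0
    for chunk in chunks(items, 25):
        pending = {table: [{"PutRequest": {"Item": item}} for item in chunk]}
        rounds = 0
        while pending:
            if rounds == max_rounds:
                raise RuntimeError("items still unprocessed after retries")
            response = client.batch_write_item(RequestItems=pending)
            api_calls += 1
            rounds += 1
            # Anything DynamoDB could not write comes back for another try.
            pending = response.get("UnprocessedItems") or {}
    return api_calls
```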

DynamoDB – Local Secondary Index (LSI)

  1. Alternative Sort Key for your table (same Partition Key as that of base table)
  2. The Sort Key consists of one scalar attribute (String, Number, or Binary)
  3. Up to 5 Local Secondary Indexes per table
  4. Must be defined at table creation time
  5. Attribute Projections – can contain some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)

DynamoDB – Global Secondary Index (GSI)

  1. Alternative Primary Key (HASH or HASH+RANGE) from the base table
  2. Speed up queries on non-key attributes
  3. The Index Key consists of scalar attributes (String, Number, or Binary)
  4. Attribute Projections – some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)
  5. Must provision RCUs & WCUs for the index
  6. Can be added/modified after table creation

DynamoDB – Indexes and Throttling

  1. Global Secondary Index (GSI):
    1. If the writes are throttled on the GSI, then the main table will be throttled!
    2. Even if the WCU on the main tables are fine
    3. Choose your GSI partition key carefully!
    4. Assign your WCU capacity carefully!
  2. Local Secondary Index (LSI):
    1. Uses the WCUs and RCUs of the main table
    2. No special throttling considerations

DynamoDB – Optimistic Locking

  1. DynamoDB has a feature called “Conditional Writes”
  2. A strategy to ensure an item hasn’t changed before you update/delete it
  3. Each item has an attribute that acts as a version number
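One way to picture optimistic locking is an update that succeeds only if the stored version still matches the version the client read earlier. The sketch below builds such UpdateItem parameters without calling AWS; the attribute names “Price” and “Version” and the helper name are assumptions.

```python
def versioned_update_request(table, key, expected_version, new_price):
    """UpdateItem parameters that apply only if the version is unchanged."""
    return {
        "TableName": table,
        "Key": key,
        # Write the new value and bump the version in the same request.
        "UpdateExpression": "SET #price = :price, #version = :next",
        # Rejected if another writer changed the item since it was read.
        "ConditionExpression": "#version = :expected",
        "ExpressionAttributeNames": {"#price": "Price", "#version": "Version"},
        "ExpressionAttributeValues": {
            ":price": {"N": str(new_price)},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


request = versioned_update_request(
    "Products", {"ProductId": {"S": "p-1"}}, expected_version=3, new_price=25)
```

When the condition fails, DynamoDB rejects the write with a ConditionalCheckFailedException, and the usual response is to re-read the item and try again with the new version.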

DynamoDB Accelerator (DAX)

  1. Fully-managed, highly available, seamless in-memory cache for DynamoDB
  2. Microseconds latency for cached reads & queries
  3. Doesn’t require application logic modification (compatible with existing DynamoDB APIs)
  4. Solves the “Hot Key” problem (too many reads)
  5. 5 minutes TTL for cache (default)
  6. Up to 10 nodes in the cluster
  7. Multi-AZ (3 nodes minimum recommended for production)
  8. Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail)

DynamoDB Accelerator (DAX) vs. ElastiCache

DynamoDB Streams

  1. Ordered stream of item-level modifications (create/update/delete) in a table
  2. Stream records can be:
    1. Sent to Kinesis Data Streams
    2. Read by AWS Lambda
    3. Read by Kinesis Client Library applications
  3. Data Retention for up to 24 hours
  4. Use cases:
    1. react to changes in real-time (welcome email to users)
    2. Analytics
    3. Insert into derivative tables
    4. Insert into ElasticSearch
    5. Implement cross-region replication

  1. Ability to choose the information that will be written to the stream:
    1. KEYS_ONLY – only the key attributes of the modified item
    2. NEW_IMAGE – the entire item, as it appears after it was modified
    3. OLD_IMAGE – the entire item, as it appeared before it was modified
    4. NEW_AND_OLD_IMAGES – both the new and the old images of the item
  2. DynamoDB Streams are made of shards, just like Kinesis Data Streams
  3. You don’t provision shards, this is automated by AWS
  4. Records are not retroactively populated in a stream after enabling it.

DynamoDB Streams & AWS Lambda

  1. You need to define an Event Source Mapping to read from a DynamoDB Streams
  2. You need to ensure the Lambda function has the appropriate permissions
  3. Your Lambda function is invoked synchronously
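A minimal handler sketch shows what a Lambda function receives from the stream. It relies on the standard record fields (`eventName`, and `Keys`, `NewImage`, `OldImage` under `dynamodb`); what the function does with each change is an assumption, here it simply returns a summary.

```python
def lambda_handler(event, context):
    """Summarize each DynamoDB Streams record in the incoming batch."""
    changes = []
    for record in event.get("Records", []):
        stream_data = record["dynamodb"]
        changes.append({
            "event": record["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": stream_data["Keys"],
            # Images are present only if the stream view type includes them.
            "new": stream_data.get("NewImage"),
            "old": stream_data.get("OldImage"),
        })
    return changes
```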

DynamoDB – Time To Live (TTL)

  1. Automatically delete items after an expiry timestamp
  2. Doesn’t consume any WCUs (i.e., no extra cost)
  3. The TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
  4. Expired items deleted within 48 hours of expiration
  5. Expired items that haven’t been deleted yet still appear in reads/queries/scans (if you don’t want them, filter them out)
  6. Expired items are deleted from both LSIs and GSIs
  7. A delete operation for each expired item enters the DynamoDB Streams (can help recover expired items)
  8. Use cases: reduce stored data by keeping only current items, adhere to regulatory obligations, …
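Since the TTL attribute must hold a Unix epoch timestamp as a Number, and expired items can linger in results, a small sketch may help. The attribute name “ExpiresAt” and both helper names are assumptions; items are shown as plain Python dictionaries for simplicity.

```python
import time


def ttl_timestamp(days_from_now, now=None):
    """Epoch timestamp in seconds for an expiry some days in the future."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * 24 * 60 * 60


def is_expired(item, ttl_attr="ExpiresAt", now=None):
    """True if the item carries a TTL value that is already in the past."""
    now = time.time() if now is None else now
    return ttl_attr in item and item[ttl_attr] <= now


session = {"SessionId": "abc", "ExpiresAt": ttl_timestamp(7)}
# Expired items may still be returned until DynamoDB deletes them,
# so filter them out on the client side when that matters.
live_sessions = [s for s in [session] if not is_expired(s)]
```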

DynamoDB CLI – Good to Know

  1. --projection-expression: one or more attributes to retrieve
  2. --filter-expression: filter items before they are returned to you
  3. General AWS CLI Pagination options (e.g., DynamoDB, S3, …)
    1. --page-size: the AWS CLI still retrieves the full list of items, but with a larger number of smaller API calls instead of one API call (default: 1000 items)
    2. --max-items: max. number of items to show in the CLI (returns NextToken)
    3. --starting-token: specify the last NextToken to retrieve the next set of items

DynamoDB Transactions

  1. Coordinated, all-or-nothing operations (add/update/delete) to multiple items across one or more tables
  2. Provides Atomicity, Consistency, Isolation, and Durability (ACID)
  3. Read Modes – Eventual Consistency, Strong Consistency, Transactional
  4. Write Modes – Standard, Transactional
  5. Consumes 2x WCUs & RCUs
    1. DynamoDB performs 2 operations for every item (prepare & commit)
  6. Two operations: (up to 25 unique items or up to 4 MB of data)
    1. TransactGetItems – one or more GetItem operations
    2. TransactWriteItems – one or more PutItem, UpdateItem, and DeleteItem operations
  7. Use cases: financial transactions, managing orders, multiplayer games, …

DynamoDB Transactions – Capacity Computations (Important for the exam!)

Example 1: 3 Transactional writes per second, with item size 5 KB
• We need 3 * (5 KB / 1 KB) * 2 (transactional cost) = 30 WCUs
Example 2: 5 Transactional reads per second, with item size 5 KB
• We need 5 * (8 KB / 4 KB) * 2 (transactional cost) = 20 RCUs
(5 KB is rounded up to 8 KB, the next multiple of 4 KB)
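The transactional examples are the standard capacity formulas with a factor of 2 applied. A sketch with hypothetical function names, following only the rules stated in these notes:

```python
import math


def transactional_wcus(writes_per_second, item_size_kb):
    """WCUs for transactional writes: standard cost (1 KB units) times 2."""
    return math.ceil(writes_per_second * math.ceil(item_size_kb / 1) * 2)


def transactional_rcus(reads_per_second, item_size_kb):
    """RCUs for transactional reads: standard cost (4 KB units) times 2."""
    return math.ceil(reads_per_second * math.ceil(item_size_kb / 4) * 2)


print(transactional_wcus(3, 5))  # Example 1
print(transactional_rcus(5, 5))  # Example 2
```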

DynamoDB as Session State Cache

  1. It’s common to use DynamoDB to store session states
  2. vs. ElastiCache
    1. ElastiCache is in-memory, but DynamoDB is serverless
    2. Both are key/value stores
  3. vs. EFS
    1. EFS must be attached to EC2 instances as a network drive
  4. vs. EBS & Instance Store
    1. EBS & Instance Store can only be used for local caching, not shared caching
  5. vs. S3
    1. S3 is higher latency, and not meant for small objects

DynamoDB Write Sharding

  1. Imagine we have a voting application with two candidates, candidate A and candidate B.
  2. If the Partition Key is “Candidate_ID”, this results in only two partitions, which causes issues (e.g., Hot Partition)
  3. A strategy that allows better distribution of items evenly across partitions
  4. Add a suffix to Partition Key value
  5. Two methods
    1. Sharding Using Random Suffix
    2. Sharding Using Calculated Suffix
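The two suffix methods can be sketched as below. The separator “#”, the shard count of 10, and the use of a voter ID as the input for the calculated suffix are all assumptions chosen for the voting example; any stable attribute of the item could play that role.

```python
import hashlib
import random


def random_suffix_key(candidate_id, shard_count=10):
    """Random suffix: spreads writes evenly, but reads must check all shards."""
    return f"{candidate_id}#{random.randrange(shard_count)}"


def calculated_suffix_key(candidate_id, voter_id, shard_count=10):
    """Calculated suffix: derived from item data, so it can be recomputed."""
    digest = hashlib.sha256(voter_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shard_count
    return f"{candidate_id}#{shard}"


def all_shard_keys(candidate_id, shard_count=10):
    """Every partition key to query when totalling votes for a candidate."""
    return [f"{candidate_id}#{n}" for n in range(shard_count)]
```

The trade-off is that a random suffix requires querying every shard to find a given item, while a calculated suffix lets you recompute the exact partition key when you know the input value.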

DynamoDB – Write Types

DynamoDB Operations

  1. Table Cleanup
    1. Option 1: Scan + DeleteItem
      1. Very slow, consumes RCU & WCU, expensive
    2. Option 2: Drop Table + Recreate table
      1. Fast, efficient, cheap
  2. Copying a DynamoDB Table
    1. Option 1: Using AWS Data Pipeline
    2. Option 2: Backup and restore into a new table
      1. Takes some time
    3. Option 3: Scan + PutItem or BatchWriteItem
      1. Write your own code

DynamoDB – Security & Other Features

  1. Security
    1. VPC Endpoints available to access DynamoDB without using the Internet
    2. Access fully controlled by IAM
    3. Encryption at rest using AWS KMS and in-transit using SSL/TLS
  2. Backup and Restore feature available
    1. Point-in-time Recovery (PITR) like RDS
    2. No performance impact
  3. Global Tables
    1. Multi-region, multi-active, fully replicated, high performance
  4. DynamoDB Local
    1. Develop and test apps locally without accessing the DynamoDB web service (without Internet)
  5. AWS Database Migration Service (AWS DMS) can be used to migrate to DynamoDB (from MongoDB, Oracle, MySQL, S3, …)

DynamoDB – Fine-Grained Access Control

  1. Using Web Identity Federation or Cognito Identity Pools, each user gets AWS credentials
  2. You can assign an IAM Role to these users with a Condition to limit their API access to DynamoDB
  3. LeadingKeys – limit row-level access for users on the Primary Key
  4. Attributes – limit specific attributes the user can see

Published by

sevanand yadav

software engineer working as web developer having specialization in spring MVC with mysql,hibernate