12. Amazon RDS and ElastiCache
Index
- Database types – Relational vs Non-Relational
- Database types – Operational vs Analytical
- Database architectures
- Amazon RDS
- ElastiCache
- Caching strategies – Lazy Loading, Write Through, TTL
- ElastiCache engines
- Memcached
- Redis(cluster disabled)
- Redis(cluster enabled)
Database Types – Relational vs Non-Relational
Key differences are how data is managed and how data is stored
| Relational | Non-Relational |
|---|---|
| Organized by tables, rows and columns | Varied data storage models |
| Rigid schema (SQL) | Flexible schema (NoSQL) – data stored in key-value pairs, columns, documents or graph |
| Rules enforced within database | Rules can be defined in application code (outside database) |
| Typically scaled vertically | Scales horizontally |
| Supports complex queries and joins | Unstructured, simple language that supports any kind of schema |
| ACID (Atomicity, Consistency, Isolation, Durability) compliance typically enforced | Performance is typically prioritized, can use ACID transactions in some cases |
| Amazon RDS, Oracle, MySQL, IBM DB2, PostgreSQL | Amazon DynamoDB, MongoDB, Redis, Neo4j |
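
The contrast in the table can be illustrated with a minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for a relational engine and a plain dictionary as a stand-in for a key-value/document store; the table names, keys and sample values are made up for illustration.

```python
import sqlite3

# Relational: fixed schema, rules enforced by the database, joins across tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), "
    "total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")
conn.execute("INSERT INTO orders VALUES (11, 1, 17.5)")

# A join plus an aggregate, expressed declaratively in SQL.
relational_rows = conn.execute(
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name"
).fetchall()

# Non-relational (key-value / document style): each item carries its own shape,
# and any rules live in application code rather than in the store.
documents = {
    "customer#1": {
        "name": "Ada",
        "orders": [{"id": 10, "total": 25.0}, {"id": 11, "total": 17.5}],
    },
    # Different attributes on this item, with no schema change needed.
    "customer#2": {"name": "Lin", "loyalty_tier": "gold"},
}
document_total = sum(o["total"] for o in documents["customer#1"].get("orders", []))
```

Both approaches arrive at the same order total for the first customer; the difference is where the structure and the rules live.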
Database types – Operational vs Analytical
Key differences are use cases and how the database is optimized
| Operational / transactional | Analytical |
|---|---|
| Online Transaction Processing (OLTP) | Online Analytical Processing (OLAP) – the source data comes from OLTP DBs |
| Production DBs that process transactions. E.g. adding customer records, checking stock availability (INSERT, UPDATE, DELETE) | Data warehouse. Typically separated from the customer-facing DBs. Data is extracted for decision making |
| Short transactions and simple queries | Long transactions and complex queries |
| Relational examples: Amazon RDS, Oracle, IBM DB2, MySQL | Relational examples: Amazon Redshift, Teradata, HP Vertica |
| Non-relational examples: Amazon DynamoDB, MongoDB, Cassandra | Non-relational examples: Amazon EMR, MapReduce |
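
As a rough illustration of the two workload shapes, the sketch below uses Python's built-in `sqlite3` for both. This is only an assumption made to keep the example self-contained: in practice the analytical query would normally run against a separate data warehouse. The `sales` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: short transactions that each touch a handful of rows.
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("eu", 10.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("eu", 30.0))
    conn.execute("INSERT INTO sales (region, amount) VALUES (?, ?)", ("us", 5.0))
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (12.0, 1))

# OLAP-style work: one query that scans and aggregates the whole dataset.
report = conn.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
```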
Databases – Architecture Discussion
| Data Store | When to Use |
|---|---|
| Database on EC2 | • Full control over instance and database • Preferred DB not available under RDS |
| Amazon RDS | • Need traditional relational database for OLTP • Your data is well-formed and structured |
| Amazon DynamoDB | • Name/value pair data • Unpredictable data structure • In-memory performance with persistence • High I/O needs • Require dynamic scaling |
| Amazon Redshift | • Data warehouse for large volumes of aggregated data • Primarily OLAP workloads |
| Amazon ElastiCache | • Fast temporary storage for small amounts of data • Highly volatile data (non-persistent) |
Amazon Relational Database Service (RDS)
- Amazon Relational Database Service (Amazon RDS) is a managed service that makes it easy to set up, operate, and scale a relational database in the cloud.
- Automated backups and patching applied in customer-defined maintenance windows
- Push-button scaling, replication and redundancy
- Amazon RDS supports the following database engines
- Amazon Aurora (proprietary AWS database engine).
- MySQL.
- MariaDB.
- Oracle.
- SQL Server
- PostgreSQL
- RDS is a managed service and you do not have access to the underlying EC2 instance (no root access).
Amazon RDS – Scalability
- You can only scale RDS up (compute and storage).
- You cannot decrease the allocated storage for an RDS instance
- You can scale storage and change the storage type for all DB engines except MS SQL.
- For MS SQL the workaround is to create a new instance from a snapshot with the new configuration.
- Storage can be scaled while the RDS instance is running, without an outage; however, there may be performance degradation.
- Scaling compute will cause downtime
- You can choose to have changes take effect immediately; by default they are applied during the maintenance window.
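
The scaling rules above can be sketched as a small request builder. This is a hypothetical helper, not an AWS API: it only assembles a dictionary and sends nothing to AWS. The key names are chosen to match the keyword arguments of boto3's `modify_db_instance`, on the assumption that this is how the change would be submitted; the instance identifier is made up.

```python
def build_scale_request(instance_id, current_storage_gib,
                        new_storage_gib=None, new_instance_class=None,
                        apply_immediately=False):
    """Build keyword arguments for a scale-up request (hypothetical helper)."""
    if new_storage_gib is not None and new_storage_gib < current_storage_gib:
        # Allocated storage can only grow, never shrink.
        raise ValueError("allocated storage cannot be decreased")

    params = {
        "DBInstanceIdentifier": instance_id,
        # False defers the change to the maintenance window (the default).
        "ApplyImmediately": apply_immediately,
    }
    if new_storage_gib is not None:
        params["AllocatedStorage"] = new_storage_gib
    if new_instance_class is not None:
        # Changing the instance class (compute) involves downtime.
        params["DBInstanceClass"] = new_instance_class
    return params


# Grow storage from 100 GiB to 200 GiB during the next maintenance window.
request = build_scale_request("mydb", current_storage_gib=100, new_storage_gib=200)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `rds_client.modify_db_instance(**request)`.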
Amazon RDS – Multi-AZ and Read Replicas
| Multi-AZ Deployments | Read Replicas |
|---|---|
| Synchronous replication – highly durable | Asynchronous replication – highly scalable |
| Only database engine on primary instance is active | All read replicas are accessible and can be used for read scaling |
| Automatic failover to standby when a problem is detected | Can be manually promoted to a standalone database instance |
| Always span two Availability Zones within a single Region | Can be within an Availability Zone, Cross-AZ, or Cross-Region |
| Automated backups are taken from standby | No backups configured by default |
| Database engine version upgrades happen on primary | Database engine version upgrade is independent from source instance |
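
One way an application might take advantage of read replicas is to send writes to the primary endpoint and spread reads across the replica endpoints. The sketch below is a minimal, assumed design: the `EndpointRouter` class and the endpoint names are hypothetical, and detecting reads by a `SELECT` prefix is a deliberate simplification.

```python
import itertools


class EndpointRouter:
    """Pick a database endpoint per statement (hypothetical helper)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over replicas; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)   # reads scale out across replicas
        return self.primary               # writes always go to the primary


router = EndpointRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
read_a = router.endpoint_for("SELECT * FROM orders")
read_b = router.endpoint_for("SELECT * FROM customers")
write = router.endpoint_for("INSERT INTO orders VALUES (1)")
```

Because replication to read replicas is asynchronous, a read routed this way can briefly return data older than the latest write; reads that must see the latest write should go to the primary.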
Amazon RDS Aurora Key Features
| Aurora Feature | Benefit |
|---|---|
| High performance and scalability | Offers high performance, self-healing storage that scales up to 64 TB, point-in-time recovery and continuous backup to S3 |
| DB compatibility | Compatible with existing MySQL and PostgreSQL open source databases |
| Aurora Replicas | In-region read scaling and failover target – up to 15 (can use Auto Scaling) |
| MySQL Read Replicas | Cross-region cluster with read scaling and failover target – up to 5 (each can have up to 15 Aurora Replicas) |
| Global Database | Cross-region cluster with read scaling (fast replication / low latency reads). Can remove secondary and promote |
| Multi-Master | Scales out writes within a region. In preview currently and will not appear on the exam |
| Serverless | On-demand, autoscaling configuration for Amazon Aurora – does not support read replicas or public IPs (can only access through VPC or Direct Connect – not VPN) |
Amazon RDS Aurora Replicas
| Feature | Aurora Replica | MySQL Replica |
|---|---|---|
| Number of replicas | Up to 15 | Up to 5 |
| Replication type | Asynchronous (milliseconds) | Asynchronous (seconds) |
| Performance impact on primary | Low | High |
| Replica location | In-region | Cross-region |
| Act as failover target | Yes (no data loss) | Yes (potentially minutes of data loss) |
| Automated failover | Yes | No |
| Support for user-defined replication delay | No | Yes |
| Support for different data or schema vs. primary | No | Yes |
Amazon ElastiCache
- Fully managed implementations of two popular in-memory data stores – Redis and Memcached.
- ElastiCache is a web service that makes it easy to deploy and run Memcached or Redis protocol-compliant server nodes in the cloud.
- Can be put in front of databases such as RDS and DynamoDB – sits between the application and the database
- Good if your database is particularly read-heavy and the data does not change frequently.
- Billed by node size and hours of use
- ElastiCache nodes cannot be accessed from the Internet, nor can they be accessed by EC2 instances in other VPCs.
Amazon ElastiCache – Caching Strategies
- Lazy Loading
- Write Through
- Dealing with stale data – Time to Live (TTL)
Lazy Loading
- Loads the data into the cache only when necessary (if a cache miss occurs).
- Lazy loading avoids filling up the cache with data that won’t be requested
- If requested data is in the cache, ElastiCache returns the data to the application
- If the data is not in the cache or has expired, ElastiCache returns a null.
- The application then fetches the data from the database and writes the data received into the cache so that it is available for next time.
- Data in the cache can become stale if Lazy Loading is implemented without other strategies (such as TTL).
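
The lazy loading flow can be sketched as follows. This is a minimal illustration that uses plain dictionaries as stand-ins for both the cache and the database (an assumption made so it runs anywhere); the class name, keys and values are hypothetical.

```python
class LazyLoadingCache:
    """Lazy loading with dicts standing in for the cache and the database."""

    def __init__(self, database):
        self.database = database   # dict standing in for the database
        self.cache = {}            # dict standing in for ElastiCache
        self.db_reads = 0          # counts how often the database is hit

    def get(self, key):
        value = self.cache.get(key)      # None models the null returned on a miss
        if value is not None:
            return value                 # cache hit
        self.db_reads += 1
        value = self.database.get(key)   # cache miss: read from the database
        if value is not None:
            self.cache[key] = value      # populate the cache for next time
        return value


database = {"user:1": "Ada"}
store = LazyLoadingCache(database)
first = store.get("user:1")    # miss, loaded from the database
second = store.get("user:1")   # hit, served from the cache
database["user:1"] = "Grace"   # the database changes behind the cache
stale = store.get("user:1")    # still the old value: nothing refreshes the cache
```

The last read shows the stale-data drawback: without a TTL or another refresh strategy, the cached value is never replaced.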
Write Through
- When using a write through strategy, the cache is updated whenever a new write or update is made to the underlying database.
- Allows cache data to remain up-to-date.
- Without a Time To Live (TTL) you can end up with a lot of cached data that is never read
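
A write through strategy can be sketched in the same style. Plain dictionaries stand in for the cache and the database (an assumption for illustration), and the class name and keys are hypothetical.

```python
class WriteThroughStore:
    """Write through with dicts standing in for the cache and the database."""

    def __init__(self):
        self.database = {}
        self.cache = {}

    def write(self, key, value):
        self.database[key] = value   # write to the database...
        self.cache[key] = value      # ...and update the cache as part of the same write

    def read(self, key):
        return self.cache.get(key)   # reads see the latest written value


store = WriteThroughStore()
store.write("price:42", 10)
store.write("price:42", 12)      # the update keeps the cache current
store.write("price:99", 7)       # cached even if nothing ever reads it
current = store.read("price:42")
cached_keys = sorted(store.cache)
```

The `price:99` entry illustrates the drawback noted above: every write lands in the cache, whether or not the data is ever read.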
Dealing with stale data – Time to Live (TTL)
- The drawbacks of lazy loading and write through techniques can be mitigated by a TTL.
- The TTL specifies the number of seconds until the key (data) expires to avoid keeping stale data in the cache.
- When the application reads an expired key, the cache returns a null, just as it does for a key that was never cached, so the application checks the value in the underlying database.
- Lazy Loading treats an expired key as a cache miss and causes the application to retrieve the data from the database and subsequently write the data into the cache with a new TTL
- Depending on the frequency with which data changes this strategy may not eliminate stale data – but helps to avoid it.
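
The effect of a TTL on lazy loading can be sketched as below. Dictionaries stand in for the cache and the database, and the clock is injectable so the example can simulate time passing without sleeping; the class name, the 60-second TTL and the sample data are all assumptions for illustration.

```python
import time


class TTLCache:
    """Lazy loading plus a TTL, with dicts standing in for cache and database."""

    def __init__(self, database, ttl_seconds, clock=time.monotonic):
        self.database = database
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}   # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]                      # fresh hit
        # Miss or expired key: both are handled the same way.
        value = self.database.get(key)
        self.entries[key] = (value, now + self.ttl_seconds)  # re-cache with a new TTL
        return value


fake_now = [0.0]                       # controllable clock for the demo
database = {"stock:7": 5}
cache = TTLCache(database, ttl_seconds=60, clock=lambda: fake_now[0])

before = cache.get("stock:7")          # loaded from the database at t=0
database["stock:7"] = 3                # the value changes in the database
fake_now[0] = 30.0
within_ttl = cache.get("stock:7")      # TTL not reached: still the old value
fake_now[0] = 61.0
after_expiry = cache.get("stock:7")    # TTL passed: reloaded from the database
```

The read at 30 seconds still returns the old value, which is why a TTL bounds staleness rather than eliminating it.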
Exam tip: the key use cases for ElastiCache are offloading reads from a database and storing the results of computations and session state. Also, remember that ElastiCache is an in-memory database and it’s a managed service (so you can’t run it on EC2).
Amazon ElastiCache – Engines
| Feature | Memcached | Redis (cluster mode disabled) | Redis (cluster mode enabled) |
|---|---|---|---|
| Data persistence | No | Yes | Yes |
| Data types | Simple | Complex | Complex |
| Data partitioning | Yes | No | Yes |
| Encryption | No | Yes | Yes |
| High availability (replication) | No | Yes | Yes |
| Multi-AZ | Yes, place nodes in multiple AZs. No failover or replication | Yes, with auto-failover. Uses read replicas (0-5 per shard) | Yes, with auto-failover. Uses read replicas (0-5 per shard) |
| Scaling | Up (node type); out (add nodes) | Single shard (can add replicas) | Add shards |
| Multithreaded | Yes | No | No |
| Backup and restore | No (and no snapshots) | Yes, automatic and manual snapshots | Yes, automatic and manual snapshots |
Amazon ElastiCache – Memcached
- Simplest model and you can run large nodes
- Memcached can be scaled in and out
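
Memcached nodes do not coordinate with each other, so data partitioning across nodes is done by the client, which hashes each key to pick a node. The sketch below shows the idea with simple modulo hashing; the node names and the `node_for_key` helper are hypothetical, and real client libraries commonly use consistent hashing instead so that adding or removing a node remaps fewer keys.

```python
import hashlib


def node_for_key(key, nodes):
    """Map a key to one node by hashing it (simple modulo partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["cache-node-1", "cache-node-2", "cache-node-3"]   # hypothetical node names
placement = {
    key: node_for_key(key, nodes)
    for key in ("user:1", "user:2", "cart:9")
}
```

The same key always maps to the same node for a given node list, which is what lets any client instance find a previously cached value.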
Amazon ElastiCache – Redis
- Open-source in-memory key-value store
- Supports more complex data structures, such as sorted sets and lists
- Supports master / slave replication and multi-AZ for cross-AZ redundancy