Designing Data-Intensive Applications

PART II: Distributed Data



In Part I, we discussed aspects of data systems that apply when data is stored on a single machine.
Now, in Part II, we move up a level and ask: what happens if multiple machines are involved in the storage and retrieval of data?

Why? Reasons to distribute a database across multiple machines:

  1. Scalability: if one machine is not sufficient to handle the read/write load, another can be added.
  2. Fault tolerance/high availability: if one machine goes down, another can take its place.
  3. Latency: keep a datacenter/machine geographically close to the user, to avoid data traveling halfway around the world.

How is data distributed?
There are two common ways data is distributed across multiple nodes:

  1. Replication: Keeping a copy of the same data on several different nodes, so that if some nodes are unavailable, the data can still be served from the remaining nodes.
  2. Partitioning: Splitting a big database into smaller subsets called partitions so that different partitions can be assigned to different nodes (also known as sharding).

These are separate mechanisms, but they often go hand in hand,
i.e., partition the data first, and then replicate each partition.


Chapter 5: Replication


Brief:

Replication is easy when the data you are replicating doesn’t change over time; all the difficulty lies in handling CHANGES to replicated data.
Three popular algorithms for replicating changes between nodes (summary): single-leader, multi-leader, and leaderless replication.

TODO: Find out the replication policy used in practice, either from Confluence or from the team lead (preferred). If it’s a distributed system (multiple nodes), it follows one of the three algorithms to replicate.
Analogy: deploying the EAR to multiple nodes is also a kind of replication.

Reasons for downtime of a node:
1. node crashes (one reason: memory full) and is restarted
2. power outages
3. network is temporarily interrupted
There is no foolproof way of detecting what has gone wrong, so most systems simply use a timeout.

Follower failure: catch-up recovery
Leader failure: failover

Implementation of replication logs:
1. Statement-based replication:
definition ..
benefits:
drawbacks:

2. Write-ahead log (WAL) shipping
3. Logical (row-based) log replication
4. Trigger-based replication

Problems with replication lag:
Terms:
eventually consistent system; read-after-write, monotonic reads, consistent prefix reads (the latter are guarantees that complement/strengthen eventual consistency)

1. Reading your own writes:
With asynchronous replication, there is a problem: if the user views the data shortly after making a write, the new data may not yet have reached the replica. To the user, it looks as though the data they submitted was lost, so they will be understandably unhappy.
In this situation, we need read-after-write consistency.

How can we implement read-after-write consistency in a system with leader-based replication?
There are various possible techniques. To mention a few:
1. When reading something that the user may have modified, read it from the leader; otherwise, read it from a follower. This requires that you have some way of knowing whether something might have been modified, without actually querying it.

For example:
user profile information on a social network is normally only editable by the owner of the profile, not by anybody else. Thus, a simple rule is: always read the user’s own profile from the leader, and any other users’ profiles from a follower.

2. If most things in the application are potentially editable by the user, that approach won’t be effective, as most things would have to be read from the leader (negating the benefit of read scaling). In that case, other criteria may be used to decide whether to read from the leader.

3. The client can remember the timestamp of its most recent write—then the system can ensure that the replica serving any reads for that user reflects updates at least until that timestamp. If a replica is not sufficiently up to date, either the read can be handled by another replica or the query can wait until the replica has caught up.
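
A minimal sketch of technique 3, assuming a hypothetical Replica interface that exposes how far each replica has caught up (appliedUpTo); all names here are illustrative, not a real driver API:

```java
// Sketch: read-after-write consistency via a client-remembered write timestamp.
// All names (Replica, appliedUpTo, etc.) are hypothetical.
import java.util.List;

class ReadYourWritesRouter {
    interface Replica {
        long appliedUpTo();            // log position/timestamp this replica has applied
        String read(String key);
    }

    private final Replica leader;
    private final List<Replica> followers;

    ReadYourWritesRouter(Replica leader, List<Replica> followers) {
        this.leader = leader;
        this.followers = followers;
    }

    // lastWriteTs: timestamp of the user's most recent write, remembered by the client.
    String read(String key, long lastWriteTs) {
        for (Replica f : followers) {
            if (f.appliedUpTo() >= lastWriteTs) {
                return f.read(key);    // this follower has caught up to the user's write
            }
        }
        return leader.read(key);       // fall back to the leader (always up to date)
    }
}
```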

2. Monotonic reads
Problem: asynchronous reads from different followers can make a user see things moving backward in time.
Description: monotonic reads only means that if one user makes several reads in sequence, they will not see time go backward—i.e., they will not read older data after having previously read newer data.
Implementation:
One way of achieving monotonic reads is to make sure that each user always makes their reads from the same replica (different users can read from different replicas). For example, the replica can be chosen based on a hash of the user ID, rather than randomly. However, if that replica fails, the user’s queries will need to be rerouted to another replica.
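
A minimal sketch of pinning users to replicas by a hash of the user ID (replica list and failure handling greatly simplified; all names hypothetical):

```java
// Sketch: monotonic reads by pinning each user to one replica via a hash of
// the user ID. Hypothetical names; real routers also track replica health.
import java.util.List;
import java.util.Set;

class MonotonicReadRouter {
    private final List<String> replicas;   // e.g. replica hostnames

    MonotonicReadRouter(List<String> replicas) {
        this.replicas = replicas;
    }

    String replicaFor(String userId) {
        int idx = Math.floorMod(userId.hashCode(), replicas.size());
        return replicas.get(idx);          // same user -> same replica
    }

    // If the pinned replica is down, reroute (the user may then briefly
    // observe older data, since monotonicity is only guaranteed per replica).
    String replicaFor(String userId, Set<String> unavailable) {
        String pinned = replicaFor(userId);
        if (!unavailable.contains(pinned)) return pinned;
        for (String r : replicas) {
            if (!unavailable.contains(r)) return r;
        }
        throw new IllegalStateException("no replica available");
    }
}
```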

3. Consistent prefix reads
Problem:
a reader can see the answer before the question, i.e., writes become visible in an order that violates causality.
What are consistent prefix reads, and how do they resolve the above problem? The guarantee says that if a sequence of writes happens in a certain order, then anyone reading those writes will see them appear in the same order.

Multi-leader replication:
useful in the case of multiple datacenters
Single-leader vs. multi-leader in a multi-datacenter deployment?
Problem with multi-leader:
write conflicts (diagram)
Synchronous versus asynchronous conflict detection

TODO:
In a multi-datacenter or multi-leader architecture, can requests be directed to a particular datacenter, so as to ensure better performance and consistency?

Leaderless replication: Dynamo-style replication
Stale data problem: resolved by read repair and an anti-entropy process.
Quorums for reading and writing: w + r > n
The choice of write (w), read (r), and node (n) counts is mostly configurable in Dynamo-style DBs.
In a read-intensive application you can make w = n and r = 1; this makes reads faster, but has the disadvantage that a single failed node blocks all writes.
With n = 3, w = 2, r = 2 we can tolerate one unavailable node.
Normally, reads and writes are always sent to all n replicas in parallel; the parameters w and r determine how many nodes we wait for.

Often, r and w are chosen to be a majority (more than n/2) of nodes, because that ensures w + r > n while still tolerating up to n/2 node failures.
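
A tiny sketch of the quorum arithmetic (hypothetical class; real Dynamo-style stores bury this in their request router):

```java
// Sketch: quorum read/write bookkeeping for a Dynamo-style store.
// n replicas; a write succeeds after w acks, a read after r responses.
class QuorumConfig {
    final int n, w, r;

    QuorumConfig(int n, int w, int r) {
        if (w + r <= n) {
            // Some read quorum and write quorum might not overlap:
            // a read could then return entirely stale values.
            throw new IllegalArgumentException("w + r must exceed n for quorum reads");
        }
        this.n = n; this.w = w; this.r = r;
    }

    // How many node failures can we tolerate and still write/read?
    int writeFaultTolerance() { return n - w; }
    int readFaultTolerance()  { return n - r; }

    public static void main(String[] args) {
        QuorumConfig q = new QuorumConfig(3, 2, 2);
        // With n=3, w=2, r=2: one node may be down and both reads and
        // writes still reach a quorum on the remaining two nodes.
        System.out.println("write tolerance: " + q.writeFaultTolerance()); // 1
        System.out.println("read tolerance:  " + q.readFaultTolerance());  // 1
    }
}
```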

Sloppy quorum:
Perform the write operation anyway, even if the nodes that normally hold the value are unavailable: the write is accepted by w reachable nodes, on which the value does not normally live, as a temporary location.
Hinted handoff:
Once the home node becomes available again, the data is transferred from the temporary node back to the appropriate home node.
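
A greatly simplified sketch of sloppy quorum plus hinted handoff (all names hypothetical; real systems track hints durably and per replica):

```java
// Sketch of sloppy quorum + hinted handoff. Hypothetical names, simplified.
import java.util.*;

class SloppyQuorumWriter {
    record HintedWrite(String homeNode, String key, String value) {}

    private final List<String> homeNodes;       // the n designated nodes for this key
    private final List<String> fallbackNodes;   // other reachable nodes in the cluster
    private final Map<String, List<HintedWrite>> hints = new HashMap<>();

    SloppyQuorumWriter(List<String> homeNodes, List<String> fallbackNodes) {
        this.homeNodes = homeNodes;
        this.fallbackNodes = fallbackNodes;
    }

    // Try the home nodes first; if one is down, park the write on a fallback
    // node together with a "hint" naming its real home.
    void write(String key, String value, Set<String> unavailable, int w) {
        int acks = 0;
        Iterator<String> fallbacks = fallbackNodes.iterator();
        for (String home : homeNodes) {
            if (acks == w) break;
            if (!unavailable.contains(home)) {
                sendTo(home, key, value);                 // normal write
                acks++;
            } else if (fallbacks.hasNext()) {
                String temp = fallbacks.next();
                hints.computeIfAbsent(temp, t -> new ArrayList<>())
                     .add(new HintedWrite(home, key, value));
                sendTo(temp, key, value);                 // temporary location
                acks++;
            }
        }
    }

    // Hinted handoff: when a home node recovers, replay its parked writes.
    void handOff(String recoveredHome) {
        for (List<HintedWrite> parked : hints.values()) {
            parked.removeIf(h -> {
                if (h.homeNode().equals(recoveredHome)) {
                    sendTo(recoveredHome, h.key(), h.value());
                    return true;
                }
                return false;
            });
        }
    }

    private void sendTo(String node, String key, String value) {
        System.out.printf("write %s=%s -> %s%n", key, value, node);
    }
}
```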


Unit 6: Partitioning (Sharding)


Index

In this chapter we will first look at

  1. Different approaches for partitioning large datasets and observe how the indexing of data interacts with partitioning.
  2. Talk about re-balancing, which is necessary if you want to add or remove nodes in your cluster.
  3. Finally, we’ll get an overview of how databases route requests to the right partitions and execute queries.

Introduction

Normally, partitions are defined in such a way that each piece of data (each record,
row, or document) belongs to exactly one partition.

Why do we need partitioning?
The main reason for wanting to partition data is scalability.

Impact of (database) partitioning on Queries –

For queries that operate on a single partition, each node can independently execute
the queries for its own partition, so query throughput can be scaled by adding more
nodes. Large, complex queries can potentially be parallelised across many nodes,
although this gets significantly harder.

Different partitions can be placed on different nodes in a shared-nothing cluster

Partitioning and Replication

Partitioning is usually combined with replication, so that copies of each partition are stored on multiple nodes. There are various ways of partitioning, which we discuss in depth in this unit.

Two major ways of partitioning:

  1. Key-range partitioning
    1. why? keys stay in sorted order, so range scans are efficient; a random (hashed) distribution would make range reads hard (scatter/gather)
  2. Hash partitioning
    1. distributes load more evenly than key-range, but range queries lose efficiency (see the sketch below)
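
A minimal sketch contrasting the two schemes (helper names hypothetical; MD5 here stands in for whatever hash function the database uses):

```java
// Sketch contrasting key-range and hash partitioning (hypothetical helpers).
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Partitioners {
    // Key-range partitioning: boundaries keep keys sorted, so range scans
    // touch only the partitions whose ranges overlap the query.
    static int keyRangePartition(String key, String[] upperBounds) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key.compareTo(upperBounds[i]) < 0) return i;
        }
        return upperBounds.length;   // last partition takes everything else
    }

    // Hash partitioning: hashing the key spreads load evenly, but destroys
    // key ordering, so range queries must hit every partition.
    static int hashPartition(String key, int partitions) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                                    .digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
                  | ((d[2] & 0xff) << 8)  |  (d[3] & 0xff);
            return Math.floorMod(h, partitions);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // MD5 is always present in the JDK
        }
    }
}
```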

Partitioning and secondary indexes:

  1. Based on document (local index)
    Disadvantage: while reading, you must scatter/gather across the different partitions, since partitioning is based on the primary key, not on the indexed term.
  2. Based on term (global index)

Rebalancing:
Definition: moving load (reads, writes, storage) from one node in the cluster to another.
Strategies for rebalancing:
– How not to do it: hash mod N (when N changes, most keys have to move between nodes; see the sketch below)
How to do it:
fixed number of partitions
dynamic partitioning
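
A small demo of why hash mod N is costly compared to a fixed number of partitions with an explicit partition-to-node map (class name and numbers hypothetical):

```java
// Sketch: counting how many keys change nodes when the cluster grows from
// 10 to 11 nodes, under "hash mod N" vs. a fixed-partition assignment map.
import java.util.Arrays;

class RebalancingDemo {
    public static void main(String[] args) {
        int keys = 100_000, nodesBefore = 10, nodesAfter = 11, parts = 1200;

        // Fixed-partition scheme: explicit assignment, initially round-robin.
        int[] owner = new int[parts];
        for (int p = 0; p < parts; p++) owner[p] = p % nodesBefore;
        int[] ownerAfter = Arrays.copyOf(owner, parts);
        // The new node steals just enough whole partitions for a fair share.
        int steal = parts / nodesAfter, stolen = 0;
        for (int p = 0; p < parts && stolen < steal; p++, stolen++) {
            ownerAfter[p] = nodesAfter - 1;
        }

        int movedModN = 0, movedFixed = 0;
        for (int k = 0; k < keys; k++) {
            int h = Math.floorMod(k * 0x9E3779B9, Integer.MAX_VALUE); // scrambled key
            if (h % nodesBefore != h % nodesAfter) movedModN++; // mod-N churn
            int p = h % parts;                                  // stable partition
            if (owner[p] != ownerAfter[p]) movedFixed++;
        }
        System.out.println("hash mod N moved keys:       " + movedModN); // ~90%
        System.out.println("fixed partitions moved keys: " + movedFixed); // ~9%
    }
}
```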

Routing requests
A third-party coordination service such as ZooKeeper, which keeps track of which partitions (key ranges) live on which nodes, is commonly used to route requests.


Unit 7: Transactions


Terms:
transaction committed/uncommitted to the database
–> Introduction:
A transaction is usually understood as a mechanism for grouping multiple operations on multiple objects into one unit of execution.

1. Definition
A transaction is a way for an application to group several reads and writes together into a logical unit.
Advantages: gives the application leeway to ignore certain potential error scenarios.

2. Trade-off
performance vs. consistency when using transactions

Issues that applications face, and the DB algorithms that provide safeguards against them, will be discussed throughout this chapter.

Concept of Transaction:
Almost all relational databases today, and some nonrelational databases, support transactions. (Point: DBs can be compared based on their support for transactions.)

–> Transactions have advantages and limitations (apart from availability/consistency vs. performance) that can be described by going into the details of the guarantees transactions can provide:

The Meaning of ACID
“The safety guarantees provided by transactions are often described by the well-known acronym ACID”, which stands for Atomicity, Consistency, Isolation, and Durability.

There is a lot of ambiguity around the meaning of isolation. “The high-level idea is sound, but the devil is in the details.”
Let’s discuss each term of ACID:
Atomicity:
In general, atomic refers to something that cannot be broken down into smaller parts.
The word means similar but subtly different things in different branches of computing. For example, in multi-threaded programming, if one thread executes an atomic operation, that means there is no way that another thread could see the half-finished result of the operation. The system can only be in the state it was before the operation or after the operation, not something in between.
By contrast, in the context of ACID, atomicity does not describe what happens if several processes try to access the same data at the same time, because that is covered under the letter I, for isolation.

Rather, ACID atomicity describes what happens if a client wants to make several writes, but a fault occurs after some of the writes have been processed—for example, a process crashes, a network connection is interrupted, a disk becomes full, or some integrity constraint is violated.

The ability to abort a transaction on error and have all writes from that transaction discarded is the defining feature of ACID atomicity. Perhaps abortability would have been a better term than atomicity.

Consistency:
The word consistency is terribly overloaded:
• In Chapter 5 we discussed replica consistency and the issue of eventual consistency that arises in asynchronously replicated systems (see “Problems with Replication Lag” on page 161).
• Consistent hashing is an approach to partitioning that some systems use for rebalancing (see “Consistent Hashing” on page 204).
• In the CAP theorem (see Chapter 9), the word consistency is used to mean linearizability (see “Linearizability” on page 324).
• In the context of ACID, consistency refers to an application-specific notion of the database being in a “good state.”

The idea of ACID consistency is that you have certain statements about your data
(invariants) that must always be true—for example, in an accounting system, credits
and debits across all accounts must always be balanced.

Does the database provide consistency?
It’s the application’s responsibility to define its transactions correctly so that they preserve consistency. This is not something that the database can guarantee: if you write bad data that violates your invariants, the database can’t stop you. (Some specific kinds of invariants can be checked by the database, for example using foreign key constraints or uniqueness constraints. However, in general, the application defines what data is valid or invalid—the database only stores it.)

Atomicity, isolation, and durability are properties of the database, whereas consistency (in the ACID sense) is a property of the application. The application may rely on the database’s atomicity and isolation properties in order to achieve consistency, but it’s not up to the database alone.

Isolation:
Isolation in the sense of ACID means that concurrently executing transactions are isolated from each other: they cannot step on each other’s toes. The database ensures that when the transactions have committed, the result is the same as if they had run serially (one after another), even though in reality they may have run concurrently.
However, in practice, serializable isolation is rarely used, because it carries a performance penalty. Some popular databases, such as Oracle 11g, don’t even implement it. In Oracle there is an isolation level called “serializable,” but it actually implements something called snapshot isolation, which is a weaker guarantee than serializability.
Durability
Durability is the promise that once a transaction has committed successfully, any data it has written will not be forgotten, even if there is a hardware fault or the database crashes.

In a single-node database, durability typically means that the data has been written to
nonvolatile storage such as a hard drive or SSD.

In a replicated database, durability may mean that the data has been successfully copied to some number of nodes. In order to provide a durability guarantee, a database must wait until these writes or replications are complete before reporting a transaction as successfully committed.

Single object vs. multiple objects:
Atomicity and isolation describe what should happen if several writes take place in the same transaction.

Atomicity is a concept related to the writes within one transaction, whereas
isolation is a concept related to different transactions not interfering with each other, i.e., either all of transaction A’s writes are visible to transaction B, or none of them are.

Multi-object transactions require some way of determining which read and write operations belong to the same transaction.
[Prove this point] Difference between relational and document DBs with respect to transactions:
In relational databases, grouping is typically done based on the client’s TCP connection to the database server: on any particular connection, everything between a BEGIN TRANSACTION and a COMMIT statement is considered to be part of the same transaction.

On the other hand, many nonrelational databases don’t have such a way of grouping
operations together. Even if there is a multi-object API (for example, a key-value
store may have a multi-put operation that updates several keys in one operation), that
doesn’t necessarily mean it has transaction semantics: the command may succeed for
some keys and fail for others, leaving the database in a partially updated state.
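
A minimal sketch of such connection-scoped grouping using plain JDBC (standard API; the accounts table and transfer scenario are hypothetical):

```java
// Sketch: grouping multi-object writes into one transaction over a single
// JDBC connection. Table/column names are hypothetical.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class TransferExample {
    static void transfer(String url, long fromId, long toId, long amount) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url)) {
            conn.setAutoCommit(false);   // implicit BEGIN TRANSACTION on this connection
            try (PreparedStatement debit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                     "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amount);  debit.setLong(2, fromId);  debit.executeUpdate();
                credit.setLong(1, amount); credit.setLong(2, toId);   credit.executeUpdate();
                conn.commit();             // both writes become visible atomically
            } catch (SQLException e) {
                conn.rollback();           // abort: discard all writes of this transaction
                throw e;
            }
        }
    }
}
```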

Single-object writes
Atomicity and isolation also apply when a single object is being changed. For example, imagine you are writing a 20 KB JSON document to a database:
• If the network connection is interrupted after the first 10 KB have been sent, does the database store that unparseable 10 KB fragment of JSON?
The above issue would be incredibly confusing, so storage engines almost universally aim to provide atomicity and isolation on the level of a single object (such as a key-value pair) on one node. Atomicity can be implemented using a log for crash recovery (see “Making B-trees reliable” on page 82), and isolation can be implemented using a lock on each object (allowing only one thread to access an object at any one time).
So a single-object write would not be a transaction in the true sense if we go by the definition of a transaction, since that involves multiple objects (refer to the definition and the corresponding line above).
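
A minimal sketch of the lock-per-object idea (hypothetical in-memory store; real engines combine this with a crash-recovery log):

```java
// Sketch: single-object atomicity/isolation with one lock per key, so only
// one thread mutates a given object at a time. Hypothetical in-memory store.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class SingleObjectStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // The whole write happens under the key's lock: no reader or writer can
    // observe a half-finished update of this one object.
    void put(String key, String value) {
        ReentrantLock l = lockFor(key);
        l.lock();
        try {
            data.put(key, value);
        } finally {
            l.unlock();
        }
    }

    String get(String key) {
        ReentrantLock l = lockFor(key);
        l.lock();
        try {
            return data.get(key);
        } finally {
            l.unlock();
        }
    }
}
```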

The need for multi-object transactions
There are some use cases in which single-object inserts, updates, and deletes are sufficient. However, in many other cases writes to several different objects need to be coordinated:
In the case of relational DBs, multi-object transactions allow you to ensure that references remain valid: when inserting several records that refer to one another, the foreign keys have to be correct and up to date, or the data becomes nonsensical.

Handling errors and aborts:
A key feature of a transaction is that it can be aborted and safely retried if an error
occurred.

Not all systems follow that philosophy, though. In particular, datastores with leaderless replication (see “Leaderless Replication” on page 177) work much more on a “best effort” basis, which could be summarized as “the database will do as much as it can, and if it runs into an error, it won’t undo something it has already done”—so it’s the application’s responsibility to recover from errors.

For example, popular object-relational mapping (ORM) frameworks such as Rails’s ActiveRecord and Django don’t retry aborted transactions—the error usually results in an exception bubbling up the stack, so any user input is thrown away and the user gets an error message. This is a shame, because the whole point of aborts is to enable safe retries.
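
A minimal sketch of a retry wrapper, assuming transient failures (e.g., deadlock victims, serialization failures) surface as SQLTransientException, which is driver-dependent; helper names are hypothetical:

```java
// Sketch: retrying an aborted transaction a bounded number of times, with a
// simple backoff. The transaction body is passed in as a lambda.
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLTransientException;

class TransactionRetry {
    interface TxBody { void run(Connection conn) throws SQLException; }

    static void runWithRetry(Connection conn, TxBody body, int maxAttempts)
            throws SQLException, InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                conn.setAutoCommit(false);
                body.run(conn);
                conn.commit();
                return;                          // success
            } catch (SQLTransientException e) {  // transient: safe to retry
                conn.rollback();                 // abort discards this attempt's writes
                if (attempt >= maxAttempts) throw e;
                Thread.sleep(50L * attempt);     // linear backoff before retrying
            } catch (SQLException e) {
                conn.rollback();                 // permanent error: don't retry
                throw e;
            }
        }
    }
}
```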

Weak isolation levels:
If transactions do not modify the same data concurrently, it is safe to run them in parallel. Concurrency issues (race conditions) occur when the same data is modified by multiple transactions.
Isolation has a performance penalty (why? check the details or guess). Note: in the financial domain, use a DB with ACID properties; relational DBs that claim to provide ACID often do not provide it at the strongest level.
So we are going to discuss a number of weak isolation levels (each permitting some level of concurrency issue) to help you decide which of the isolation levels is best:
Read committed
It guarantees two things:
1. When reading from the DB, you only see data that has been committed.
2. When writing to the DB, you only overwrite data that has been committed.

No dirty reads:

[refer picture]

If a transaction Tb reads data that transaction Ta has written but not yet committed, it is called a dirty read. In other words, if the uncommitted changes of transaction Ta are visible to transaction Tb, that is a dirty read. It has very confusing impacts (see the PDF for the impacts).

No dirty writes:
Advantage: if there are no dirty writes, the concurrency problem of one transaction overwriting another’s uncommitted write is avoided.

Terms: transaction committed/uncommitted to the database.

Implementing read committed:
Not-recommended way: the transaction that is reading acquires the same lock as a writer, so that if a transaction has the write lock acquired on an object or row, reads are not allowed. The drawback is that reads have to wait if there is a time-consuming write transaction.
Recommended way: for every object, the DB remembers both the old committed value and the new value set by the transaction that holds the write lock. While that transaction is in progress, reads simply return the old committed value.

Snapshot isolation and repeatable reads:
Preventing lost updates
Write skew and phantoms:
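
A minimal sketch of the recommended approach, keeping an old committed value and a staged uncommitted value per object (hypothetical single-node store; real engines do this with MVCC machinery):

```java
// Sketch: read committed via a remembered "old committed value" per object,
// instead of making readers take locks. Hypothetical single-node store.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ReadCommittedStore {
    private record Versions(String committed, String uncommitted, long writerTx) {}

    private final Map<String, Versions> objects = new ConcurrentHashMap<>();

    // Readers never block: they always see the last committed value,
    // even while a writer transaction holds the object. (No dirty reads.)
    String read(String key) {
        Versions v = objects.get(key);
        return v == null ? null : v.committed();
    }

    // A writer stages its new value as "uncommitted"; only one writer per
    // object at a time, so dirty writes are prevented by the writerTx check.
    boolean write(String key, String value, long txId) {
        Versions v = objects.compute(key, (k, old) -> {
            if (old == null) return new Versions(null, value, txId);
            if (old.writerTx() != 0 && old.writerTx() != txId) return old; // held by another tx
            return new Versions(old.committed(), value, txId);
        });
        return v.writerTx() == txId;
    }

    // Commit flips the staged value into the committed slot.
    void commit(String key, long txId) {
        objects.computeIfPresent(key, (k, v) ->
            v.writerTx() == txId ? new Versions(v.uncommitted(), null, 0) : v);
    }
}
```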

Published by

sevanand yadav

Software engineer working as a web developer, specializing in Spring MVC with MySQL and Hibernate.

Leave a comment