
This is probably one of the most frequent topics in system design conversations. Database transactions are used almost everywhere, from small applications to large enterprise systems. Even though they look simple on the surface, the devil is in the details. Let's dig into them.
Systems will fail. Race conditions, hardware or network failures, interruptions. Many bad things can happen while applications are running. These failures can impact data consistency and, consequently, the overall user experience.
Martin Kleppmann, in Designing Data-Intensive Applications, says that a transaction is a way for an application to group several reads and writes together into a logical unit [DDIA, page 222].
Transactions are present not only in relational databases, but also in some non-relational ones. Historically, some systems partially or entirely abandoned transactions to achieve higher performance or availability [DDIA, p. 223]. That doesn’t mean this was a bad idea. In many cases, it was a deliberate trade-off. Some applications value consistency over scalability; others make the opposite choice. In the end, it’s all about trade-offs.
ACID: Atomicity, Consistency, Isolation, and Durability. This is probably the most famous acronym when it comes to databases. These four properties define the safety guarantees of a transaction, though not all databases implement them in the same way.
Atomicity. Within a transaction, either every change succeeds, or none of them do. No half-complete updates, no broken states. It’s worth emphasizing that atomicity is not about concurrency. It does not describe what happens when multiple processes access the same data at the same time [DDIA, p. 224]. We’ll get there.
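As a minimal sketch, assuming a hypothetical accounts table: a money transfer touches two rows, and atomicity guarantees that either both updates become visible at commit or neither does.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If anything fails before this point, ROLLBACK undoes both updates;
-- otherwise COMMIT makes them visible together.
COMMIT;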
Consistency means that every transaction takes the database from one valid state to another, respecting defined rules, constraints, and integrity checks. No broken invariants, ever. It’s largely the application’s responsibility to write transactions correctly in order to achieve consistency.
Kleppmann says that "consistency" is more of an application property than a database property.
Isolation. Concurrent transactions don’t interfere with each other. Each one behaves as if it’s running alone, even when many are happening at once. Some authors equate isolation with serializability, the strongest isolation level, where the outcome is the same as if transactions had run one after another. In practice, most databases expose weaker isolation levels that trade strict guarantees for performance [DDIA].
Durability means that once a transaction commits successfully, its changes will persist, even in the face of crashes or power loss.
In a single-node database, durability typically means the data has been written to stable storage (such as an SSD or hard drive, often via a write-ahead log), whereas in a replicated database, durability may also require that the data has been successfully copied to one or more replica nodes.
Depending on the system’s configuration, the database may wait for acknowledgments from a quorum or a specific number of replicas before reporting the transaction as committed. I won't go that far into replication in this post.
For me, this is the trickiest part. Isolation is often confused with atomicity.
Take race conditions as an example. Atomicity, in a database context, is not about concurrency, that is, different processes accessing the same data at the same time. That is what the isolation property addresses.
Before diving into isolation levels, you need to understand the anomalies they prevent:
Dirty Reads. Transaction T1 reads data modified by T2 before T2 commits. If T2 rolls back, T1 has read data that never existed in a committed state.
Non-Repeatable Reads. Transaction T1 reads a row, T2 modifies and commits that row, then T1 reads the same row again and gets different data.
Phantom Reads. T1 executes a query with a WHERE clause, T2 inserts or deletes rows that match that WHERE clause and commits, then T1 re-executes the same query and sees different rows.
Write Skew. Two transactions read overlapping data sets, make decisions based on what they read, then modify disjoint data sets in ways that violate a constraint when combined.
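Write skew is the least intuitive of the four, so here is a sketch of the classic on-call example (the doctors table and the "at least one doctor on call" rule are hypothetical): both transactions read the same overlapping data, then update different rows, and each commit looks fine on its own.

-- Session 1 (transaction T1)
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;  -- returns 2

-- Session 2 (transaction T2), running concurrently
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;  -- also returns 2

-- T1: sees two doctors on call, so it decides it is safe to leave
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;

-- T2: made the same decision based on its own read
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;

-- Both commits succeed; the "at least one doctor on call" rule is now broken.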
With that in mind, let's go deeper into the main isolation levels, their trade-offs, how they are implemented in some databases and the use cases where they apply.
Read Committed. Each query only sees data that was committed before the query began executing. This isolation level prevents dirty reads, that is, seeing uncommitted changes from other transactions.
There are a few ways to implement this isolation level. Let's focus on two: locking and snapshot.
In a locking-based implementation, when transaction T1 modifies a row, it acquires a lock on that row. If T2 tries to read the same row before T1 commits, T2 must wait until T1 releases the lock. This prevents T2 from seeing T1's uncommitted change, achieving the "no dirty reads" guarantee.
The downside: if T1 is a long-running transaction, T2 (and potentially many other transactions) wait, creating bottlenecks that degrade performance, which will most likely impact overall user experience.
Instead, most modern databases use Multi-Version Concurrency Control (MVCC) to implement Read Committed. MVCC is a concurrency control method that maintains multiple versions of the same data simultaneously. Instead of locking rows, the database keeps old versions available so readers can see consistent data while writers create new versions.
When transaction T1 modifies a row, it creates a new version tagged with T1's transaction ID. The old version remains accessible. When T2 tries to read that row, T2 checks: "Has T1 committed?" If not, T2 simply reads the previous committed version. No locks required, no waiting. Once T1 commits, its changes become visible to subsequent queries.
This allows readers and writers to work concurrently without blocking each other, a massive improvement over traditional locking approaches.
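A small sketch of how this looks from the client side, assuming Postgres-flavored SQL and the same hypothetical accounts table used in the payments example later in this post:

-- Session 1 (writer)
BEGIN;
UPDATE accounts SET balance = 150 WHERE id = 1;
-- not committed yet

-- Session 2 (reader, Read Committed)
SELECT balance FROM accounts WHERE id = 1;  -- returns the old committed value, no waiting

-- Session 1
COMMIT;

-- Session 2
SELECT balance FROM accounts WHERE id = 1;  -- now returns 150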
What you gain:
- No dirty reads: you never see another transaction's uncommitted changes.
- Readers and writers don't block each other, so throughput stays high.

What you need to watch for:
- Non-repeatable reads and phantoms are still possible, since each query sees whatever was committed at the moment it runs.
- Lost updates can still happen without explicit locking or single-statement atomic updates.
Payments: balance checks with high throughput
A very common use case for Read Committed is payment systems where you need to read current data, but you don’t require strict repeatability across multiple reads in the same transaction.
Example: checking an account balance before authorizing a payment.
In this scenario, each query seeing the latest committed data is usually what you want. If another transaction updates the balance and commits between two reads, observing the new value is acceptable, sometimes even desirable. What matters is that you never read uncommitted data.
Read Committed works well here because:
- Each balance check sees the latest committed data, which is exactly what you want before authorizing a payment.
- Readers don't block the high-throughput write path.

What you still need to watch for:
- You often need explicit locking to prevent lost updates:
BEGIN;
SELECT id, balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;
Or you can use a single-statement atomic update:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
In practice, many payment systems use Read Committed plus explicit row locks around critical updates.
Counters and metrics: approximate correctness
Another classic Read Committed use case is counters: page views, likes, impressions, analytics metrics.
These systems care more about throughput than perfect accuracy. If two transactions overwrite each other occasionally, the impact is negligible at scale.
Read Committed is a good fit because:
- Writes are cheap and readers never block them, so throughput stays high.
- The occasional lost update only skews a counter slightly, which is acceptable for this kind of data.
If you truly need exact counts, you usually don't solve that with stronger isolation; you change the data model (append-only logs, event streams, or sharded counters), as sketched below.
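A sketch of these options, with made-up table names: the read-then-write pattern can occasionally lose an increment under Read Committed, while a single-statement increment or an append-only event table does not.

-- Read-then-write: two concurrent transactions can overwrite each other
-- (occasional lost increments, usually acceptable for metrics)
SELECT views FROM page_views WHERE page_id = 42;        -- application reads 100
UPDATE page_views SET views = 101 WHERE page_id = 42;   -- another session may have done the same

-- Single-statement increment: never loses updates, still cheap
UPDATE page_views SET views = views + 1 WHERE page_id = 42;

-- Append-only alternative for exact counts: insert events, aggregate later
INSERT INTO page_view_events (page_id, viewed_at) VALUES (42, now());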
Snapshot Isolation. Every transaction reads from a frozen snapshot of the database taken at its start time. It follows the common mantra: reads never block writes, and writes never block reads. In practice, snapshot isolation means that a transaction reads from a consistent snapshot of committed data, while always being able to see its own writes.
Many authors loosely refer to snapshot isolation as "repeatable read", which is a useful mental model, but this mapping depends on the database implementation. Within the same transaction, a read operation will always return the same result, unless the transaction itself updates the row. Hence, repeatable reads.
Let me give you an example:
Time   Transaction A            Transaction B                       Database State
-----------------------------------------------------------------------------------
T1     BEGIN                                                        balance = $100
T2                              BEGIN
T3     UPDATE balance = $150
T4     COMMIT                                                       balance = $150
T5                              SELECT balance → $100 (not $150!)
T6                              COMMIT
Transaction B started at T2, so it sees the snapshot from that time. Even though Transaction A committed a change at T4, Transaction B still sees the old value because that's what existed in its snapshot.
It's worth mentioning that most modern databases also use MVCC to implement snapshot isolation. It can be confusing: "read committed" and "repeatable read" both rely on MVCC, just differently. In Read Committed, each query sees the latest data committed at the moment it runs, whereas in Repeatable Read you see data as it was at the start of your transaction.
Let's illustrate this comparison:
Time   Transaction A            Transaction B (Read Committed)   Transaction C (Snapshot Isolation)
-----------------------------------------------------------------------------------------------------
T1                              BEGIN                            BEGIN
T2                              SELECT balance → $100            SELECT balance → $100
T3     BEGIN
T4     UPDATE balance = $150
T5     COMMIT
T6                              SELECT balance → $150            SELECT balance → $100
T7                              COMMIT                           COMMIT
Same queries, different results depending on the isolation level: Transaction B (Read Committed) sees $150 on its second read, whereas Transaction C (Snapshot Isolation) still sees $100.
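If you want to reproduce this comparison yourself, you have to pick the isolation level explicitly. A rough, Postgres-flavored sketch (Read Committed is Postgres's default, and the accounts table is the same hypothetical one used above):

-- Transaction B: Read Committed (the default in Postgres)
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;
-- ... later in the same transaction:
SELECT balance FROM accounts WHERE id = 1;  -- may see a newer committed value
COMMIT;

-- Transaction C: snapshot isolation (exposed as REPEATABLE READ in Postgres)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- ... later in the same transaction:
SELECT balance FROM accounts WHERE id = 1;  -- still sees the snapshot taken at the first read
COMMIT;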
What you gain:
- Repeatable reads: every query in the transaction sees the same consistent snapshot, plus your own writes.
- Readers and writers still don't block each other.

What you need to watch for:
- Snapshot isolation is not serializability: anomalies such as write skew can still occur.
- A stable snapshot does not enforce global constraints across concurrent transactions.
Bookings and reservations: consistency within a workflow
Snapshot isolation shines in multi-step workflows where you need a stable view of data across several reads.
A classic example is a booking system: check availability, calculate price, apply discounts, and create a reservation.
Within a single transaction, you want all reads to agree with each other. Seeing availability change halfway through the workflow would create confusing behavior or require complex retry logic.
Snapshot isolation works well because:
- Availability, pricing, and discount reads all come from the same snapshot, so they agree with each other.
- The workflow doesn't need retry logic just because another booking committed mid-transaction.

What you need to watch for:
Snapshot isolation makes the workflow consistent. However, it does not magically enforce global constraints.
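As a rough sketch of such a workflow under snapshot isolation (Postgres-flavored syntax, with a made-up rooms/discounts/reservations schema): the three reads agree with each other, but a rule like "a room can only be booked once per night" still needs a unique constraint or explicit lock.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- All of these reads observe the same snapshot (hypothetical schema)
SELECT available FROM rooms WHERE room_id = 7 AND night = '2024-06-01';
SELECT base_price FROM rooms WHERE room_id = 7;
SELECT percent_off FROM discounts WHERE code = 'SUMMER';

-- The write is consistent with what we read, but a global rule like
-- "a room can only be booked once per night" still needs a unique
-- constraint (or explicit locking) to hold across concurrent transactions.
INSERT INTO reservations (room_id, night, price)
VALUES (7, '2024-06-01', 90);

COMMIT;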
Isolation levels like "repeatable read" help avoid many anomalies, but there is still widespread confusion about what they actually guarantee, especially "repeatable read." Many explanations, and even some documentation, treat it as if it were fully serializable.
In practice, most databases implement "repeatable read" using snapshot isolation. Snapshot isolation provides a consistent view of the database for the duration of a transaction, but it does not guarantee serializability and allows anomalies such as write skew.
In Postgres, the repeatable read level is implemented as snapshot isolation using MVCC. It prevents non-repeatable reads but is not serializable. In MySQL, repeatable read behaves differently, avoiding many phantom-like effects through a combination of MVCC and locking, while using different snapshot semantics. Other systems, such as Db2, use different terminology altogether and expose yet another set of isolation semantics.
Because of these differences, isolation level names alone are not reliable. Always consult your database’s documentation to understand the guarantees you are actually getting.
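A quick way to check what you are actually running with. These statements should work on recent Postgres and MySQL versions, but treat them as a starting point and verify against your version's documentation:

-- Postgres: default isolation level for new transactions
SHOW default_transaction_isolation;

-- MySQL 8.0+: isolation level for the current session
SELECT @@transaction_isolation;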
In the next post, we will look at the "lost update" anomaly and how stronger isolation levels combined with additional mechanisms can be used to enforce correctness.