
This is probably one of the most frequent topics in system design conversations. Database transactions are used almost everywhere, from small applications to large enterprise systems. Even though they look simple on the surface, the devil is in the details. Let's dig into them.
Systems will fail. Race conditions, hardware or network failures, interruptions. Many bad things can happen while applications are running. These failures can impact data consistency and, consequently, the overall user experience.
Martin Kleppmann, in Designing Data-Intensive Applications, says that a transaction is a way for an application to group several reads and writes together into a logical unit [DDIA, page 222].
Transactions are present not only in relational databases, but also in some non-relational ones. Historically, some systems partially or entirely abandoned transactions to achieve higher performance or availability [DDIA, p. 223]. That doesn’t mean this was a bad idea. In many cases, it was a deliberate trade-off. Some applications value consistency over scalability; others make the opposite choice. In the end, it’s all about trade-offs.
ACID: Atomicity, Consistency, Isolation, and Durability. This is probably the most famous acronym when it comes to databases. These four properties define the safety guarantees of a transaction, though not all databases implement them in the same way.
Atomicity. Within a transaction, either every change succeeds, or none of them do. No half-complete updates, no broken states. It’s worth emphasizing that atomicity is not about concurrency. It does not describe what happens when multiple processes access the same data at the same time [DDIA, p. 224]. We’ll get there.
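As a minimal sketch, assuming a hypothetical accounts table: a money transfer touches two rows, and atomicity guarantees that either both updates become visible at commit or neither does.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- If anything fails before this point, ROLLBACK undoes both updates;
-- otherwise COMMIT makes them visible together.
COMMIT;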
Consistency means that every transaction takes the database from one valid state to another, respecting defined rules, constraints, and integrity checks. No broken invariants, ever. It’s largely the application’s responsibility to write transactions correctly in order to achieve consistency.
Kleppmann says that "consistency" is more of an application property than a database property.
Isolation. Concurrent transactions don’t interfere with each other. Each one behaves as if it’s running alone, even when many are happening at once. Some authors equate isolation with serializability, the strongest isolation level, where the outcome is the same as if transactions had run one after another. In practice, most databases expose weaker isolation levels that trade strict guarantees for performance [DDIA].
Durability means that once a transaction commits successfully, its changes will persist, even in the face of crashes or power loss.
In a single-node database, durability typically means the data has been written to stable storage (such as an SSD or hard drive, often via a write-ahead log), whereas in a replicated database, durability may also require that the data has been successfully copied to one or more replica nodes.
Depending on the system’s configuration, the database may wait for acknowledgments from a quorum or a specific number of replicas before reporting the transaction as committed. I won't go that far into replication in this post.
For me, this is the trickiest part. Isolation is often confused with atomicity.
Take race conditions as an example. Atomicity, in a database context, is not about concurrency, that is, different processes accessing the same data at the same time. That is what the isolation property addresses.
Before diving into isolation levels, you need to understand the anomalies they prevent:
Dirty Reads. Transaction T1 reads data modified by T2 before T2 commits. If T2 rolls back, T1 has read data that never existed in a committed state.
Non-Repeatable Reads. Transaction T1 reads a row, T2 modifies and commits that row, then T1 reads the same row again and gets different data.
Phantom Reads. T1 executes a query with a WHERE clause, T2 inserts or deletes rows that match that WHERE clause and commits, then T1 re-executes the same query and sees different rows.
Write Skew. Two transactions read overlapping data sets, make decisions based on what they read, then modify disjoint data sets in ways that violate a constraint when combined.
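Write skew is the least intuitive of the four, so here is a sketch of the classic on-call example (the doctors table and the "at least one doctor on call" rule are hypothetical): both transactions read the same overlapping data, then update different rows, and each commit looks fine on its own.

-- Session 1 (transaction T1)
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;  -- returns 2

-- Session 2 (transaction T2), running concurrently
BEGIN;
SELECT count(*) FROM doctors WHERE on_call = true;  -- also returns 2

-- T1: sees two doctors on call, so it decides it is safe to leave
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;

-- T2: made the same decision based on its own read
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;

-- Both commits succeed; the "at least one doctor on call" rule is now broken.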
With that in mind, let's go deeper into the main isolation levels, their trade-offs, how they are implemented in some databases and the use cases where they apply.
Read Committed. Each query only sees data that was committed before the query began executing. This isolation level prevents dirty reads, that is, seeing uncommitted changes from other transactions.
There are a few ways to implement this isolation level. Let's focus on two: locking and snapshot.
In a locking-based implementation, when transaction T1 modifies a row, it acquires a lock on that row. If T2 tries to read the same row before T1 commits, T2 must wait until T1 releases the lock. This prevents T2 from seeing T1's uncommitted change, achieving the "no dirty reads" guarantee.
The downside: if T1 is a long-running transaction, T2 (and potentially many other transactions) wait, creating bottlenecks that degrade performance, which will most likely impact overall user experience.
Instead, most modern databases use Multi-Version Concurrency Control (MVCC) to implement Read Committed. MVCC is a concurrency control method that maintains multiple versions of the same data simultaneously. Instead of locking rows, the database keeps old versions available so readers can see consistent data while writers create new versions.
When transaction T1 modifies a row, it creates a new version tagged with T1's transaction ID. The old version remains accessible. When T2 tries to read that row, T2 checks: "Has T1 committed?" If not, T2 simply reads the previous committed version. No locks required, no waiting. Once T1 commits, its changes become visible to subsequent queries.
This allows readers and writers to work concurrently without blocking each other, a massive improvement over traditional locking approaches.
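A small sketch of how this looks from the client side, assuming Postgres-flavored SQL and the same hypothetical accounts table used in the payments example later in this post:

-- Session 1 (writer)
BEGIN;
UPDATE accounts SET balance = 150 WHERE id = 1;
-- not committed yet

-- Session 2 (reader, Read Committed)
SELECT balance FROM accounts WHERE id = 1;  -- returns the old committed value, no waiting

-- Session 1
COMMIT;

-- Session 2
SELECT balance FROM accounts WHERE id = 1;  -- now returns 150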
What you gain:
- No dirty reads: you never see another transaction's uncommitted changes.
- Readers and writers don't block each other, so throughput stays high.

What you need to watch for:
- Non-repeatable reads and phantoms are still possible, since each query sees whatever was committed at the moment it runs.
- Lost updates can still happen without explicit locking or single-statement atomic updates.
Payments: balance checks with high throughput
A very common use case for Read Committed is payment systems where you need to read current data, but you don’t require strict repeatability across multiple reads in the same transaction.
Example: checking an account balance before authorizing a payment.
In this scenario, each query seeing the latest committed data is usually what you want. If another transaction updates the balance and commits between two reads, observing the new value is acceptable, sometimes even desirable. What matters is that you never read uncommitted data.
Read Committed works well here because:
- Each balance check sees the latest committed data, which is exactly what you want before authorizing a payment.
- Readers don't block the high-throughput write path.

What you still need to watch for:
- You often need explicit locking to prevent lost updates:
BEGIN;
SELECT id, balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT;
Or you can use a single-statement atomic update:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
In practice, many payment systems use Read Committed plus explicit row locks around critical updates.
Counters and metrics: approximate correctness
Another classic Read Committed use case is counters: page views, likes, impressions, analytics metrics.
These systems care more about throughput than perfect accuracy. If two transactions overwrite each other occasionally, the impact is negligible at scale.
Read Committed is a good fit because:
- Writes are cheap and readers never block them, so throughput stays high.
- The occasional lost update only skews a counter slightly, which is acceptable for this kind of data.
If you truly need exact counts, you usually don't solve that with stronger isolation; you change the data model (append-only logs, event streams, or sharded counters), as sketched below.
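A sketch of these options, with made-up table names: the read-then-write pattern can occasionally lose an increment under Read Committed, while a single-statement increment or an append-only event table does not.

-- Read-then-write: two concurrent transactions can overwrite each other
-- (occasional lost increments, usually acceptable for metrics)
SELECT views FROM page_views WHERE page_id = 42;        -- application reads 100
UPDATE page_views SET views = 101 WHERE page_id = 42;   -- another session may have done the same

-- Single-statement increment: never loses updates, still cheap
UPDATE page_views SET views = views + 1 WHERE page_id = 42;

-- Append-only alternative for exact counts: insert events, aggregate later
INSERT INTO page_view_events (page_id, viewed_at) VALUES (42, now());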
Snapshot Isolation. Every transaction reads from a frozen snapshot of the database taken at its start time. It follows the common mantra: reads never block writes, and writes never block reads. In practice, snapshot isolation means that a transaction reads from a consistent snapshot of committed data, while always being able to see its own writes.
Many authors loosely refer to snapshot isolation as "repeatable read", which is a useful mental model, but this mapping depends on the database implementation. Within the same transaction, a read operation will always return the same result, unless the transaction itself updates the row. Hence, repeatable reads.
Let me give you an example:
Time   Transaction A            Transaction B                       Database State
-----------------------------------------------------------------------------------
T1     BEGIN                                                        balance = $100
T2                              BEGIN
T3     UPDATE balance = $150
T4     COMMIT                                                       balance = $150
T5                              SELECT balance → $100 (not $150!)
T6                              COMMIT
Transaction B started at T2, so it sees the snapshot from that time. Even though Transaction A committed a change at T4, Transaction B still sees the old value because that's what existed in its snapshot.
It's worth mentioning that most modern databases also use MVCC to implement snapshot isolation. It can be confusing: "read committed" and "repeatable read" both rely on MVCC, just differently. In Read Committed, each query sees the latest data committed at the moment it runs, whereas in Repeatable Read you see data as it was at the start of your transaction.
Let's illustrate this comparison:
Time   Transaction A            Transaction B (Read Committed)   Transaction C (Snapshot Isolation)
-----------------------------------------------------------------------------------------------------
T1                              BEGIN                            BEGIN
T2                              SELECT balance → $100            SELECT balance → $100
T3     BEGIN
T4     UPDATE balance = $150
T5     COMMIT
T6                              SELECT balance → $150            SELECT balance → $100
T7                              COMMIT                           COMMIT
Same queries, different results depending on the isolation level: Transaction B (Read Committed) sees $150 on its second read, whereas Transaction C (Snapshot Isolation) still sees $100.
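If you want to reproduce this comparison yourself, you have to pick the isolation level explicitly. A rough, Postgres-flavored sketch (Read Committed is Postgres's default, and the accounts table is the same hypothetical one used above):

-- Transaction B: Read Committed (the default in Postgres)
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE id = 1;
-- ... later in the same transaction:
SELECT balance FROM accounts WHERE id = 1;  -- may see a newer committed value
COMMIT;

-- Transaction C: snapshot isolation (exposed as REPEATABLE READ in Postgres)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1;
-- ... later in the same transaction:
SELECT balance FROM accounts WHERE id = 1;  -- still sees the snapshot taken at the first read
COMMIT;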
What you gain:
- Repeatable reads: every query in the transaction sees the same consistent snapshot, plus your own writes.
- Readers and writers still don't block each other.

What you need to watch for:
- Snapshot isolation is not serializability: anomalies such as write skew can still occur.
- A stable snapshot does not enforce global constraints across concurrent transactions.
Bookings and reservations: consistency within a workflow
Snapshot isolation shines in multi-step workflows where you need a stable view of data across several reads.
A classic example is a booking system: check availability, calculate price, apply discounts, and create a reservation.
Within a single transaction, you want all reads to agree with each other. Seeing availability change halfway through the workflow would create confusing behavior or require complex retry logic.
Snapshot isolation works well because:
- Availability, pricing, and discount reads all come from the same snapshot, so they agree with each other.
- The workflow doesn't need retry logic just because another booking committed mid-transaction.

What you need to watch for:
Snapshot isolation makes the workflow consistent. However, it does not magically enforce global constraints.
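As a rough sketch of such a workflow under snapshot isolation (Postgres-flavored syntax, with a made-up rooms/discounts/reservations schema): the three reads agree with each other, but a rule like "a room can only be booked once per night" still needs a unique constraint or explicit lock.

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- All of these reads observe the same snapshot (hypothetical schema)
SELECT available FROM rooms WHERE room_id = 7 AND night = '2024-06-01';
SELECT base_price FROM rooms WHERE room_id = 7;
SELECT percent_off FROM discounts WHERE code = 'SUMMER';

-- The write is consistent with what we read, but a global rule like
-- "a room can only be booked once per night" still needs a unique
-- constraint (or explicit locking) to hold across concurrent transactions.
INSERT INTO reservations (room_id, night, price)
VALUES (7, '2024-06-01', 90);

COMMIT;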
Isolation levels like "repeatable read" help avoid many anomalies, but there is still widespread confusion about what they actually guarantee, especially "repeatable read." Many explanations, and even some documentation, treat it as if it were fully serializable.
In practice, most databases implement "repeatable read" using snapshot isolation. Snapshot isolation provides a consistent view of the database for the duration of a transaction, but it does not guarantee serializability and allows anomalies such as write skew.
In Postgres, the repeatable read level is implemented as snapshot isolation using MVCC. It prevents non-repeatable reads but is not serializable. In MySQL, repeatable read behaves differently, avoiding many phantom-like effects through a combination of MVCC and locking, while using different snapshot semantics. Other systems, such as Db2, use different terminology altogether and expose yet another set of isolation semantics.
Because of these differences, isolation level names alone are not reliable. Always consult your database’s documentation to understand the guarantees you are actually getting.
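A quick way to check what you are actually running with. These statements should work on recent Postgres and MySQL versions, but treat them as a starting point and verify against your version's documentation:

-- Postgres: default isolation level for new transactions
SHOW default_transaction_isolation;

-- MySQL 8.0+: isolation level for the current session
SELECT @@transaction_isolation;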
In the next post, we will look at the "lost update" anomaly and how stronger isolation levels combined with additional mechanisms can be used to enforce correctness.