NoSQL Transactions in DynamoDB with Akshat Vig & Somu Perianayagam
Amazon's DynamoDB serves some of the highest workloads on the planet with predictable, single-digit millisecond latency regardless of data size or concurrent operations. Like many NoSQL databases, DynamoDB did not offer support for transactions at first but added support for ACID transactions in 2018. Akshat Vig and Somu Perianayagam are two Senior Principal Engineers on the DynamoDB team and are here to talk about the team's Usenix research paper describing how they implemented support for transactions while maintaining the core performance characteristics of DynamoDB.
In this show, we talk about DynamoDB transaction internals, performing user research to focus on core user needs, and staying on top of cutting-edge research as a Principal Engineer.
Links
Amazon DynamoDB: A Scalable, Performant, and Fully Managed NoSQL Database Service (Usenix 2022)
Dynamo: Amazon's Highly Available Key-value Store (SOSP 2007)
Murat on the Distributed Transactions at Scale in DynamoDB paper
Segments
What's hard about NoSQL transactions? -- [04:02]
Most NoSQL Databases scale via horizontal sharding. Implementing transactions across multiple shards can be difficult.
User research around DynamoDB transactions -- [06:28]
Transactions not a part of initial DynamoDB service -- released at re:Invent 2018
David Yanacek - Transactions Library - provided client-side transactions for DynamoDB users
Increasing adoption of the library showed customer need for transactions
But: high cost to do transactions client-side
scaling limitations at Amazon - move workloads to DynamoDB from SQL
Transactions became really important for making that transfer
The key to DynamoDB transactions (single-request transactions) -- [11:58]
Core needs: Retain predictable performance and don't affect non-transactional workloads
Traditional transactional approaches (two-phase locking; BEGIN & END transaction) did not make sense
Rejected Multi-Version Concurrency Control because of the need to rewrite the storage layer
Decision: Require all transaction operations be submitted in a single request
Two new API operations: TransactWriteItems & TransactGetItems
Staying on top of research + being a Principal Engineer (+ higher) at AWS -- [16:22]
Scheduled Reading groups at Amazon
TLA+ for modeling programs and systems
P-Modeling
DynamoDB transaction internals -- [22:40]
Two-phase commit
Transaction failures + recovery -- [27:14]
Once a transaction has reached the commit phase, then it's executed to completion
Transaction idempotency -- [32:37]
The TransactWriteItems requests are idempotent
Client idempotency token
Timestamp ordering -- [37:14]
Assigned timestamp defines the serial order of all the transactions
Isolation levels in DynamoDB -- [44:02]
Serializable
Read committed
Should you ever use TransactGetItems? -- [49:16]
Similar to strongly consistent reads: usually only need to prevent user confusion rather than maintain core data consistency.
Why no Y-axis? -- [51:51]
Why the two DynamoDB papers don't include Y-axes on their load-testing charts
Continual improvements in DynamoDB -- [54:32]
Architecture is constantly evolving
Key tenet is customer availability
Transcript
[00:01:47] Alex: Akshat, Somu, Welcome to the show.
[00:01:49] Akshat: Thanks, Alex. Thanks for having us.
[00:00:12] Somu: Thanks for having us.
[00:01:53] Alex: Yeah, absolutely. So, you two are both Senior Principal Engineers on the DynamoDB team at AWS, which is a pretty high position. Can you give everyone a little background on what it is you do on the Dynamo team, how you've been there, things like that?
[00:02:08] Akshat: Yeah, I can go first. I joined Amazon in, I think, 2010. From there, I first was working in Amazon India. Then when I saw AWS getting built, I was like, “Hey, I want to work here,” because the problems are super fun. I joined, first, SimpleDB team, and at the same time, DynamoDB was incepted. I've been with DynamoDB right from its inception, and have been able to contribute a lot of bugs and a lot of features to DynamoDB over the years, like DynamoDB streams, point in time backup restore, transactions, global databases, and we're going to talk about transactions today.
[00:02:49] Somu: Like Akshat, I've been with Amazon for about 12 years now. I started in Dynamo, and I've been working in Dynamo ever since. I've worked in all components of Dynamo -- front-end, backend, control plane -- but my areas of focus right now are transactions, replication services, global secondary indexes, global tables, what we're doing for our regional table replication, and how we make it highly available. Much of my focus has been around that, and around all the multi-region services we have, at this point in time.
[00:03:31] Alex: Awesome, great. Well, thanks for coming on, because I'm obviously a huge DynamoDB fan and a big fan of you two.
I'm excited to talk about your new paper. There's a really good history of papers in this area – the original Amazon Dynamo paper, not DynamoDB, in 2006 or so, really kicked off a lot in the NoSQL world. Last year, the Amazon DynamoDB paper that basically said, “Hey, here's where we took some of those learnings, made it into this cloud service and what we learned and what we built with DynamoDB.”
What's hard about NoSQL transactions
Now this year, this new transactions paper that came out, which is Distributed Transactions at Scale in Amazon DynamoDB, if people want to go look that up. Just showing how you added transactions and how transactions can work at scale. I'm excited to go deep on that today.
Just to get started, Akshat, do you want to tell us what are transactions and especially, what are the uniqueness of transactions in NoSQL databases?
[00:04:29] Akshat: Yeah. I think if you look at NoSQL databases, a lot of NoSQL databases do not support transactions, because for NoSQL databases, the key characteristics that are considered good -- the reasons people choose them -- are high availability, high scalability, and single-digit millisecond performance. DynamoDB provides all three.
Specifically, transactions are generally considered at odds with scalability. Scalability here, I refer to as two things. One is predictable performance, and the second is unbounded growth. Your table can be really tiny in the beginning, and as you do more traffic, it can scale, it can partition. Mostly, I think previously, we have seen a lot of NoSQL databases shy away from implementing transactions. Or some do implement them, but in a constrained form, where you can only do transactions on a single partition -- on items that all reside on a single machine.
When we started hearing from our customers, “Hey, we would like to have transactions in DynamoDB,” we were like, okay. First, let's just understand why you actually need it, because we have seen a lot of workloads that are running on DynamoDB without actual transactions. What exactly are you looking for in transactions? I think we went through that journey and took on the challenge that, hey, we really want to add transactions, which provide the ACID properties -- atomicity, consistency, isolation, and durability -- for the multi-item, multi-table reads or writes that you want to do on your database table in DynamoDB, or across tables in DynamoDB. That's how we started.
User research around DynamoDB transactions
[00:06:28] Alex: Absolutely. Transactions were released at re:Invent 2018. This is six and a half years after Dynamo's been out. I guess, how soon after Dynamo being out were you starting to get requests for transactions, how long did that user research period last, of, like you're saying, what do you need these for? What constraints do you have here?
[00:06:50] Akshat: Yeah. Before we actually added transactions, I think there was a transactions library that was built by Amazon. One of the developers in our team, David Yanacek, he built a transactions library. That was essentially doing, trying to provide the same experience of ACID properties on your database. This was, I don't remember exactly, but this was, I think, 2016-ish time.
[00:07:19] Somu: 2014-ish, 2014-ish. 2015-ish, I think.
[00:07:22] Akshat: Yeah. Something around those times. The pattern that we were seeing was a lot of, for example, control planes that are getting built, or a lot of teams in Amazon who are using DynamoDB. At that time, there was also a push that, hey, we want to move all the workloads to DynamoDB and get away from the relational databases that we have seen, have scaling limitations. Transactions became really important for making that transfer from SQL databases to NoSQL databases.
At that point, transactions library was one thing which we saw, that, okay, the adoption of transactions library is increasing, so there was one signal. Second is, people started telling us about, hey, the transactions library is great. There are certain limitations that we are seeing with that, which is, for every write, we have 7X cost we have to pay, because transactions library essentially was trying to maintain a ledger, and the whole state machine of where the transaction is, how far it has gone forward.
In case the transaction is not going to finish, it has to do the rollbacks and things of that nature. All the complexity was actually encapsulated in this library as an abstraction, given to the customers.
Overall, I would say, the signal of people adopting that library a lot more and direct conversations with the customers hearing about it. This is a specific use case we are building, and it would really simplify if there were ACID properties, like full atomic transactions across multiple tables and multiple items in DynamoDB.
[00:09:06] Alex: Yeah. It's interesting to see that trade-off between the client-side solutions, like transactions library, or a few other ones that Dynamo provides, and then the actual service solutions, given that Dynamo gives you all the low-level access to most of the stuff, you can perform that, or be a query planner, or a transaction coordinator client side if you want. Then it's nice when that can move up into that layer.
I guess, once you decided, hey, we're building transactions, how long does it take? You already had tens of thousands of users, probably millions of requests per second, things like that. How long does that take to build and deliver that feature where it's available at re:Invent, in that November?
[00:09:50] Somu: I think, once we decided that we wanted to build transactions, we had a bunch of people go and figure out like, hey, what are the algorithms which is doable? Going back to your initial question of why NoSQL databases usually shy away from transactions. It's the scale and complexity of it, right? There are different algorithms you can do, implement, and then you have, once your transaction fails, how do you recover transactions? A lot that has to go into what is the algorithm you're going to choose and build.
I think a key point which Dynamo was emphasizing, and which we wanted to keep, is that we wanted to build a protocol which is scalable and predictable, and to figure out what interface we want to provide to customers. Because traditional transactions have been like, hey, BEGIN TRANSACTION, END TRANSACTION. A lot of customers are used to that. But that would take away a key tenet of Dynamo, which is predictable performance, because you now don't know how long a transaction is going to be. How do we balance the trade-off? How do we expose this to customers? What are the protocols we're going to choose? I think we spent a lot of time on that. Then we got close to knowing what the protocol was and what the APIs were. I think it was roughly about a year, I would say, that it took us to do.
[00:11:05] Akshat: Yeah, and I think a lot of time, as Somu was saying, goes into understanding the state of the art, like what already exists, and then doing trade-offs and POCs to actually figure out and decide this is the right one. Because in Dynamo, as I was saying, atomicity for a single item was already there. Consistency -- you have consistent reads and eventually consistent reads, and when you do a write, you preserve the correct state -- you already get that. Isolation, I think, was the main thing, and atomicity across multiple items was the thing that we wanted to add.
I think a lot of time, I would say, goes into two phases. One is just figuring out what to do, and once you figure that out, building, I think, is the fastest part. The last part is actually proving that what we have built is correct. Yeah.
The key to DynamoDB transactions (single-request transactions)
[00:11:58] Alex: Yup. Yup. You talked about different constraints that different NoSQL databases have for transactions, and you talked about how some of these are only implemented on a single shard, or node, or partition, whatever that is. I assume that wasn't really feasible for Dynamo, just because that's invisible to you as a user and because those partitions are so small. That other constraint of, hey, it has to come in as a single request and all get executed together, as Somu mentioned. Was that something you narrowed in on pretty early of like, “Hey, this is what we're going to do”? Were you checking with users? Is that going to be okay? Will that still give you what you want? Or is that something that took a while to hash out and figure out?
[00:12:33] Akshat: Yeah. I think for that specific journey, if I recall, I think we did a lot of, I would say, experiments and research on that. It involved trying out some of the workloads. We actually went and talked to customers to understand, hey, why do they use this concept of BEGIN and END TRANSACTION? Specifically, I think one of the biggest reasons [we chose single operation] is that if you let someone do BEGIN TRANSACTION and then send a bunch of writes and reads, and also, other operations, maybe someone puts a sleep there. The resources are tied up for that long for that particular transaction. Then when the resources are tied up, you also don't get predictable performance.
A lot of these decisions went into defining the tenets for what transactions should look like. We essentially defined goals for it, that we want to execute a set of operations atomically, and serializably, for any items in any tables with predictable performance, and also no impact to non-transactional workloads. A lot of standard techniques, like two-phase locking, and the begin-end transaction approach, a lot of those just did not make sense for us.
Even, I think, for example, one of the things we really actually considered and debated a lot was multi-version concurrency control. If we could build something on that, you get read isolation. Your reads could be isolated from writes. But supporting multi-version concurrency control in Dynamo would actually mean we have to change the storage engine. If you build MVCC, you need to track multiple versions, which means the additional cost that comes with it of storing multiple items, then you have to pass that cost to the customers.
So all these basically standard approaches, we had to reject. Then, we nailed it down to, okay, we want to do a single-request transaction based on these goals, or tenets, that we have defined. Then, we went to some teams in Amazon.com and said, “Hey, if we provide you these two APIs, would you be able to convert your existing transactional workloads into a DynamoDB transaction?” We did a similar exercise with some external customers as well, to validate that what we are building does not have obvious adoption blockers and things of that nature.
Turns out, all the use cases that we actually discussed with the customers, we were able to convert into the two operations that we added to the DynamoDB API. One is TransactWriteItems, and the second is TransactGetItems. Just to explain TransactWriteItems and TransactGetItems a little bit: essentially, with TransactWriteItems, you can do a bunch of writes, which could be update, delete, or put requests. You can also specify conditions. The conditions could be on the items which you're trying to update -- the standard DynamoDB OCC [optimistic concurrency control] write that you do -- or you can also do a condition check on an item that you're not updating in the transaction. Similarly, TransactGetItems is a separate API, where you can do multiple gets in the same call, which you want to read in a serializable manner.
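For reference, here is roughly what those two operations look like at the API level, sketched with boto3 (the AWS SDK for Python). The table names, keys, and condition expressions are invented purely for illustration; the point is the single-request shape of the API.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# TransactWriteItems: a conditional Put, a ConditionCheck on an item that is
# not being written, and an Update. All of them succeed or fail together.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "Orders",  # hypothetical table
                "Item": {"OrderId": {"S": "o-123"}, "Status": {"S": "PLACED"}},
                "ConditionExpression": "attribute_not_exists(OrderId)",
            }
        },
        {
            "ConditionCheck": {
                "TableName": "Customers",
                "Key": {"CustomerId": {"S": "c-42"}},
                "ConditionExpression": "attribute_exists(CustomerId)",
            }
        },
        {
            "Update": {
                "TableName": "Counters",
                "Key": {"CounterId": {"S": "orders"}},
                "UpdateExpression": "ADD OrderCount :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
)

# TransactGetItems: multiple gets, possibly across tables, read as a
# serializable unit; results come back in resp["Responses"].
resp = dynamodb.transact_get_items(
    TransactItems=[
        {"Get": {"TableName": "Orders", "Key": {"OrderId": {"S": "o-123"}}}},
        {"Get": {"TableName": "Customers", "Key": {"CustomerId": {"S": "c-42"}}}},
    ]
)
```

If any condition fails, the whole TransactWriteItems request is canceled with a TransactionCanceledException and none of the writes are applied.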
[00:16:04] Alex: Yep, absolutely. Yeah, I love that single-request model. I think you're right that almost anything can be modeled into it, and the ones that can't are probably the ones your DBA is going to advise you against doing anyway on your relational database, like where you're holding that transaction open for a while and maybe calling some other API, or something. Those can really get you into trouble.
Staying on top of research and being a Principal Engineer (+ higher) at AWS
You mentioned like, hey, what's the state of the art in terms of protocols and patterns and things like that? Where do you go look for research on transaction protocols, or just different things that's happening? Is that academia? Is that industry papers, or where are you finding that stuff?
[00:16:39] Somu: Oh, right. I think there's been a lot of good work in academia about transactions, starting from the 60s. It is very interesting, right? Because the inspiration we took was from one of the papers published by Phil Bernstein, and this was in the 1970s, when most of us were not even born. I think academia has a lot of the good research, and industry has also been doing a lot of research -- we've been publishing recently as well.
We look back at a lot of the papers published in standard computer science conferences, like USENIX, SIGMOD, or OSDI. Then you learn from what has worked in the past, what has not worked in the past, and what will work for us, technically, right? In the case of transactions, timestamp ordering -- why does it work for us? We will definitely go into the details. There's an element of that as well here, like what makes sense for us.
[00:17:44] Alex: Yeah. What does that look like at Amazon? Is it mostly just informal, like, hey, did you see this new paper? Or are there scheduled reading groups, or different things like that to make sure everyone's up on the latest stuff? What does that –
[00:17:55] Somu: We have scheduled reading groups, because we have people of varied interests and we want to learn a lot about what's happening and what's not happening. We may not get to do that on a day-to-day basis in the job, so we have people in focused reading groups, who read papers all the time and talk about the pros and cons. What did we understand? What did we not understand? What did the paper do well? What did the paper not do well? We talk a lot about how to use the different things. For example, a big thing within Amazon is how do we use formal modeling tools like TLA+, or P modeling, right? We'll have scheduled groups which go dive deep into that stuff. There are scheduled groups for everything, like data structures, algorithms, distributed systems.
[00:18:41] Alex: I know, I've seen a lot on TLA+ at Amazon. Is that something that both of you are doing, or is that something like, hey, there's a group that's really good at that, or a few people that are really good at that and they'll come help you through it? How often are you actually using those methods?
[00:18:57] Somu: There are very few people who use TLA+, partly because it's more complex, but it's very helpful. For example, PlusCal has made life a lot easier for you and me to go write something. Back in the day, the TLA+ specification was harder to write, but with PlusCal, it's easy to write and it gets converted to TLA+. P modeling is something which we now have all developers use, because it's closer to the code you would write, and it is easier to prototype a model in P, model check it in P, and then run with that. I think that's something we advise all developers to write.
TLA+ is usually used by a niche set of developers. We use it for a really critical set of problems, like Dynamo when we started. When we did Dynamo first, we had a TLA+ model for all of Dynamo's operations to ensure that everything is correct. That's still the foundation for Dynamo in some ways.
[00:19:55] Akshat: Same for transactions. We did a similar thing for transactions as well, to prove the correctness of the algorithm. Similar to that, we also have an ACID verifier which runs in production. Ever since transactions launched, we have kept running the ACID verifier, just to make sure that we don't have any gaps or blind spots, and to ensure the protocol is correct.
[00:20:24] Alex: Yeah, absolutely. Okay, one more thing before we get into internals of transactions. You're both senior principal engineers. You've been at Dynamo for 12 years, obviously doing a lot of higher-level stuff. I'm sure writing documents, writing these papers, giving talks, but Amazon is also known for being very practical hands-on for their advanced people. How much time during the week do you still sit down and write code?
[00:20:49] Akshat: I think it varies. It varies on different phases of the project. Overall, I would say, in terms of if I look at the full year, a lot of time I think is spent in figuring out what we are doing and how we are doing and whether it is correct or not. Then second phase is I think where you write the P modeling stuff that Somu was talking about. I think a lot of time gets spent in that.
Third is, I think, POCs, where you come up with an idea and you write a POC to prove that, hey, this actually makes sense, or that whatever we are claiming is actually going to be achieved. Then the last part, I would say, is reviewing and ensuring that operationally we are ready, and that the testing we are doing has good coverage. I would say writing code, testing, P modeling, writing docs -- it's an equal split in terms of the time spent.
[00:21:54] Somu: If I am working on a project, I would usually take something no other developer wants to take, or non-critical, because I'm not blocking them in any way, or fashion, because I'm doing a bunch of other things as well simultaneously. I think, like Akshat said, it depends on the phase of the project. If it's something which is in an ideation at this point in time, we would write a bunch of code, again, to prove it works and doesn't work. We are doing some modeling stuff at this point in time. That's how we can ensure that we are up to date and hands on in that stuff as well.
[00:22:25] Akshat: The other part is also code review, which still keeps you very close and connected. Because operationally, I think, if you're not connected, it's very hard to debug things when you get paged at 2 am.
DynamoDB transaction internals
[00:22:40] Alex: Yeah, yeah. Exactly. Okay, let's get into transaction internals. First thing, two-phase commit, which is the pattern you use here on the transaction coordinator. Do one of you want to explain how two-phase commit works?
[00:22:54] Akshat: Yeah. Before that, let's just talk through, at a high level, how a normal put request comes in and flows through DynamoDB. Then I'll add the two-phase part and how we implemented that.
First, any request that a developer, or an application, sends to DynamoDB first hits the load balancer. From there, it goes to a request router, which is a stateless fleet. The request router has to figure out where to send this request. If it's a put request, it sends it to the leader replica of a partition. Now, a DynamoDB table is partitioned for scale, and the number of partitions is determined based on the size, or the read and write capacity units, that you want for your table.
You might have a table which has 10 partitions. The item that you're trying to put will reside in a specific partition, and that partition has three replicas. One of the replicas is the leader replica. The write request goes to that leader, and it replicates it to the two other replicas, the followers. Once it gets an acknowledgement from at least one more replica, two copies are committed and we acknowledge the write back to the client.
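As a rough mental model of that write path, here is a toy sketch. The names are invented and this is not DynamoDB's implementation; it only illustrates the "acknowledge once two of three copies are durable" idea described above.

```python
# Toy sketch of the leader-replica write path described above.
# Invented names; not DynamoDB's actual code.
def leader_write(item, followers, quorum=2):
    acks = 1                             # the leader applies the write locally first
    for follower in followers:           # two followers in a three-replica partition
        if follower.replicate(item):     # True when the follower acknowledges the write
            acks += 1
        if acks >= quorum:               # two of three copies are now committed
            return "ACKNOWLEDGED"        # acknowledge back to the client
    return "FAILED"                      # quorum not reached; caller retries or fails over
```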
To find out which storage node to route the request to, there is a metadata system which we use. Now, for transactions, we introduced a transaction coordinator, which has the responsibility of ensuring that a particular transaction that is accepted goes through completely. A customer makes a TransactWriteItems request. It goes to the request router. Then, it goes to the transaction coordinator. The first thing the transaction coordinator does is store the request in a ledger -- the ledger is a DynamoDB table, and we can come back to that. But the main point of the ledger is to ensure that the request is executed atomically. Either the full request succeeds, or it does not succeed.
The second part is fault tolerance: if a transaction coordinator which is processing a request crashes, since the request is stored in the ledger, any other transaction coordinator can pick it up and run with it. Once the transaction coordinator stores it in the ledger, it does checkpointing and state management of where the transaction is. Once it is stored in the ledger, it sends prepare messages to all the storage nodes involved. Let's say you are doing a 10-item transaction across 10 different tables, and they could be 10 completely different partitions, all in the same account. Now, once that request is sent for prepares, at that point, all the check conditions are evaluated -- like if you're doing an OCC write with a put item, or you're purely doing just a check –
[00:25:49] Alex: Just to interrupt you, what's OCC?
[00:25:52] Akshat: Yeah, so optimistic concurrency control. If you want to do a write, you're saying, “Hey, I want this write to succeed only if certain conditions evaluate to true. If that happens, then accept this write. Otherwise, reject this particular write request that we are sending to you.” The prepare messages are evaluating that. A storage node also evaluates validations like item size -- the 400 KB item limit, things like that. If any of those will not be met, then it just replies back saying, “I cannot accept the transaction.”
But assuming that every storage node -- all 10 storage nodes in the 10-item transaction case -- replies back saying that, yeah, we can accept the prepare for this particular transaction, the transaction moves on to the commit phase. Once it has passed the prepare phase, i.e. the transaction coordinator got an acknowledgement from every storage node and it is also written in the ledger that the transaction has finished the prepare step, it moves to the commit step, which is making the actual write happen at that particular point.
The item is taken from the ledger and then sent to the specific storage node to finish the transaction. Once the commits are done, your full transaction actually is finished.
At a high level, that's the two-phase protocol.
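For readers who want a condensed mental model of that flow, here is a heavily simplified sketch. It is not DynamoDB's code: the ledger schema, retries, timeouts, and the recovery path discussed next are all omitted, and every name is invented.

```python
# Simplified sketch of the coordinator-side two-phase flow described above.
def execute_transaction(ledger, storage_nodes, txn):
    # Durably record the transaction so any coordinator can resume it later.
    ledger.put(txn.id, state="STARTED", items=txn.items)

    # Phase 1: prepare. Each storage node evaluates condition expressions and
    # validations (item size, etc.) and records a "prepared" marker for the item.
    accepted = all(
        storage_nodes[item.partition].prepare(txn.id, item) for item in txn.items
    )
    if not accepted:
        # No rollback is needed: prepare wrote nothing except the marker.
        ledger.put(txn.id, state="CANCELLED")
        for item in txn.items:
            storage_nodes[item.partition].cancel(txn.id, item)
        return "CANCELLED"

    # Phase 2: commit. Once this state is in the ledger, the transaction is
    # driven to completion even if coordinators or storage nodes fail.
    ledger.put(txn.id, state="COMMITTED")
    for item in txn.items:
        storage_nodes[item.partition].commit(txn.id, item)
    ledger.put(txn.id, state="COMPLETED")
    return "OK"
```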
Transaction failures + recovery
[00:27:14] Alex: Got you. Okay, so we have prepare and commit. Prepare is just basically checking with every node, saying, “Hey, is this good or not?” If they all come back with an accept, a thumbs up, then it comes back and says, “Okay, go ahead and commit.” Once it's in that commit phase and it tells them all to execute, is there basically no going back? Even if, say, one of those nodes failed originally, or something happens, we're just going to keep trying until – we've already decided this transaction is going through?
[00:27:38] Somu: Yes. Once a transaction has reached the commit phase, then it's executed to completion. Failure of a transaction coordinator, or failure of a node which is hosting the partition, is not going to stop it. It's going to finish it to completion. If a transaction coordinator fails, another one is going to pick it up. It knows, hey, the transaction is in the commit phase. I'm just going to send commit messages for all the items that are involved in the transaction, whether or not it knows if a commit has already been sent for any single item.
If a storage node fails, it's the same thing. Nodes fail all the time. A new leader is elected and the new leader can complete the commit. It doesn't need any prior knowledge of the transaction at this point in time.
[00:28:25] Alex: Okay. Tell me about that, the transaction coordinator failing. How does a new one pick up that stalled transaction and make sure it gets executed?
[00:28:35] Somu: All transaction coordinators run a small component of the recovery. They keep scanning the ledger to say, “Hey, are all transactions getting executed?” If they find a transaction which is not executed for a long period of time, then they would say, this transaction is not executed at this point in time. Either we take it forward. Let's say, there is a transaction in a prepared state. Transaction coordinator may say, you know this transaction has not been executed for a long time. It's in prepared state. I don't know what happened to all the prepares. What I'm going to do is cancel this transaction. I'm not going to execute this transaction. I'm going to move this into a canceled phase and then send cancel notifications to all the members involved in the transaction.
Or it can decide, oh, the transaction is in commit phase. Let me just take it to completion and send everybody a commit message at this point in time. This is a small recovery component.
There's a small piece we missed, which is that when we do the prepares for an item, every storage node keeps a marker saying, well, this item has been prepared for this particular transaction. Let's say that for some reason the transaction has not been acted upon for some period of time, and the storage node looks at that and says, “Hey, this item has been in the prepared state for quite some time.” It can also kick off a recovery and say, “Hey, can somebody please recover this transaction, and recover this item for me, because it's been a long time since the transaction started?”
[00:29:58] Alex: When you say a long time, how are we talking here? Are we talking like seconds, or a minute, or what does that look like?
[00:30:04] Somu: We are not talking five minutes -- seconds, at this point in time.
[00:30:08] Akshat: Yeah, seconds. Seconds. Yeah. I think the most interesting part out of this also is that there are no rollbacks here, right? Because the prepare phase is actually not writing anything, it's just storing that marker that Somu pointed out. Hence, if any of the prepares fails, or we identify that this transaction cannot be completed, we just send a cancellation, which is basically, yeah, aborting the transaction.
[00:30:36] Alex: Got you. If anything is in the, I guess, the prepare phase where a node has accepted it and sent back accept, but maybe the transaction is stalled for whatever reason, are writes to that item effectively blocked at that point until it's recovered?
[00:30:50] Somu: Yes. The writes to that particular item cannot now be serialized. You would have to have the transaction complete to have the writes serialized. Any other singleton write would be rejected, saying, “Hey, there's a transaction conflict at this point in time. We can't accept it.” We can talk a little bit more about this, because we did talk in the paper about some optimizations we could do there, and we know that we can do them.
In reality, we're not seeing this happen. Customers aren't mixing traffic of transactional writes with singleton writes, so we don't see this much in practice -- not enough to go and say, “Okay, we have to go and implement this optimization where we serialize these writes.”
[00:31:27] Alex: That's interesting. Most items you see are either involved in transaction writes, or singleton writes, but not both. That's interesting. Which is a recommendation, I think, from Cassandra with their lightweight transactions, because I think you can get some bad issues there with that. It's interesting that customer patterns work out that way anyway.
[00:31:47] Akshat: Yeah. On the part about a stuck transaction, as Somu pointed out, if a write request comes to an item and the transaction has been stuck for a while, that also will kick off recovery automatically. Plus, I think when we devised these algorithms, we actually thought about how we want to support contention as well. That's why we chose timestamp ordering, where we can do some interesting tricks, which we talked about, and we actually also tried some of those implementations before we went ahead with this approach.
[00:32:19] Alex: Yeah, okay. For a transaction that's stuck, what happens to the client there? Is that just hanging until it times out at 30 seconds, or whatever the client timeout is? If something picks it up, is it going to be able to respond back to that client? Or is it basically just like, “Hey, we'll clean it up,” but the client is sort of on their own at that point?
Transaction idempotency
[00:32:37] Akshat: The TransactWriteItems requests are actually idempotent. If, let's say, a request took longer than the client timeout, clients can just retry using the same client token, which is the idempotency token. That token is used to identify the transaction. Based on that, we can tell you that, hey, this transaction actually succeeded if you come back, or this transaction failed. Again, most of these transactions are still finishing in milliseconds -- we're not talking seconds -- and clients are getting an acknowledgement back on these.
[00:33:16] Alex: Yeah. You mentioned the idempotency and the client request token on a transact request, right? Should I always include a client request token? There's no, I mean, not cost, but there's no latency cost on that, or any cost, is there –
[00:33:33] Akshat: No. That's a recommendation from DynamoDB. If you're using a TransactWriteItems request, use the client token so that you can recover really easily and retry as many times as you need. There is a time limit for which this client idempotency token will work, because beyond that you might be trying to do a different transaction. After that, it won't. Yeah, it is recommended to use it.
[00:33:58] Somu: The nice thing about the client request token, Alex, is that let's say your client for some reason timed out, but the request was executed successfully on the Dynamo side. You can come back with the same token and Dynamo will say, “Hey, this transaction was successful. You don't have to execute it again.” I think that's a super nice thing about the client request token. Also, let's say that for some reason you come back after the idempotency token has expired -- I think that window is 10 minutes at this point in time -- we would try to re-execute the transaction, right? But most transactions usually have conditions in them, and the conditions will fail. Then we will say, “Okay. No, this transaction has a condition failure, so we won't be able to execute it.”
[00:34:43] Akshat: Yeah. This client token actually was not something we initially planned to add. Again, when we built transactions, we gave them to a few customers. They tried them out and they were like, “Hey, for this particular use case, we don't know if this transaction succeeded or failed, because we timed out.” So this was, I would say, in the later part of the project that we designed it, implemented it and launched it. Quite a flexible and iterative process there.
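A minimal sketch of the retry pattern being described, again using boto3 with invented table and attribute names: generate one ClientRequestToken per logical transaction and reuse it verbatim on every retry, so a retry after a timeout cannot apply the writes twice.

```python
import uuid

import boto3
from botocore.exceptions import BotoCoreError, ClientError

dynamodb = boto3.client("dynamodb")

token = str(uuid.uuid4())  # one token per logical transaction, reused on retries
transact_items = [
    {
        "Update": {
            "TableName": "Accounts",  # hypothetical table
            "Key": {"AccountId": {"S": "a-1"}},
            "UpdateExpression": "ADD Balance :delta",
            "ExpressionAttributeValues": {":delta": {"N": "-100"}},
        }
    },
    {
        "Update": {
            "TableName": "Accounts",
            "Key": {"AccountId": {"S": "a-2"}},
            "UpdateExpression": "ADD Balance :delta",
            "ExpressionAttributeValues": {":delta": {"N": "100"}},
        }
    },
]

for attempt in range(3):
    try:
        dynamodb.transact_write_items(
            TransactItems=transact_items,
            ClientRequestToken=token,  # makes the retries idempotent
        )
        break
    except dynamodb.exceptions.TransactionCanceledException:
        raise      # a condition actually failed; retrying will not help
    except (BotoCoreError, ClientError):
        continue   # timeout, throttle, or unknown outcome; retry with the same token
```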
[00:35:06] Alex: Yeah, cool. Okay, so you mentioned that there's the 10-minute window where that request is guaranteed to be idempotent if you're including that token. Are you just keeping records in that transaction ledger for 10 minutes, expiring them at some point, but at least they're hanging around for ten minutes -- is that the point there? Okay.
Then you mentioned looking for stalled transactions. Is that just like, you're just brute force scanning the table, taking all the transaction coordinators? Each one's taking a segment and just continually running scans against it?
[00:35:38] Akshat: It's a parallel. Parallel scan.
[00:35:41] Somu: The ledger is a DynamoDB table. I think we talked about this before. It's very heavily sharded, to put it nicely. You can do a lot of scans on this table, and it's a pay-per-request table, right? We have all the transaction coordinators. They can each pick a small segment of it and say, “I need to scan a 1,000 items.” They all can scan it quite quickly and figure out if any transactions are stalled.
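At the API level, that kind of distributed sweep is just DynamoDB's segmented parallel Scan. Here is a sketch of one worker's segment using boto3; the ledger table name and the StartedAt attribute are invented, since the real ledger's schema is internal.

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

TOTAL_SEGMENTS = 8  # e.g. one segment per recovery worker


def scan_segment_for_stalled(segment, cutoff_seconds=60):
    """Scan one segment of a hypothetical ledger table for old transactions."""
    cutoff = str(int(time.time()) - cutoff_seconds)
    stalled = []
    paginator = dynamodb.get_paginator("scan")
    for page in paginator.paginate(
        TableName="TransactionLedger",           # hypothetical table name
        Segment=segment,                         # this worker's slice of the table
        TotalSegments=TOTAL_SEGMENTS,
        FilterExpression="StartedAt < :cutoff",  # hypothetical attribute
        ExpressionAttributeValues={":cutoff": {"N": cutoff}},
    ):
        stalled.extend(page.get("Items", []))
    return stalled
```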
[00:36:12] Alex: Yeah. Okay. Tell me about that DynamoDB table that's used for the ledger. Is there a different Dynamo instance somewhere that's used for these internal type things, like the ledger, or is it just a loop of writing back to itself?
Dynamo as a service is a multi-tenant service. All these customers across a region, or a lot of customers within a region, are using the exact same Dynamo service. I guess, how does that foundational Dynamo instance work? Is that a separate instance that's different and special, or anything like that?
[00:36:48] Somu: No. This is a normal user-level table. The transaction coordinators are just another user, and it is a normal user-level table at this point in time. As you mentioned, there is a circular dependency here, so you can't use transactions on this table, but we don't have any need to use transactions on this table. Because it's a normal user-level table, we get all the other features of Dynamo, which we can use.
Timestamp ordering
[00:37:14] Alex: Yeah. Wow. That's pretty amazing. Okay. All right. You’ve mentioned timestamp ordering a couple of times. I guess, what is timestamp ordering? How do you use it in transactions?
[00:37:26] Akshat: Yeah. Timestamp ordering. We talked a lot about atomicity till now, the two-phase protocol, right? For serializability, we decided to borrow the timestamp ordering technique, which Somu –
[00:37:37] Alex: Hold on. Serializability. This is a confusing topic, but just like, high level.
[00:37:43] Somu: We'll spend ages on that.
[00:37:44] Alex: If you could do what no one else has managed to do and describe that in one or two sentences. What's the high-level idea of serializability?
[00:37:53] Akshat: I think it's mainly around concurrent access. If you have concurrent access to data in a database, you need to define an order in which these transactions are executing. Timestamp ordering has this very nice property that you assign a timestamp to each transaction -- the timestamp basically comes from the clock on the transaction coordinator. The assigned timestamp defines the serial order of all the transactions that are going to execute on the set of tables that you're using.
That basically defines the serial order of the transaction. Even if you have concurrent access from multiple users trying to do transactions on the same set of items, timestamp ordering gives this nice property where we can serialize, or define a serial order of these transactions.
[00:38:40] Somu: It's like kids coming and asking us something, right? Then you say, “Hey, hold on. Your brother asked me something first. I'm going to execute his request first, because there's only one parent at this point in time.” That's exactly what timestamp ordering allows us to do -- to have concurrency control that says, hey, what is the order in which transactions will get executed?
[00:39:01] Alex: Awesome. Awesome. I love that example.
Again, two-phase commit, what other options were there in terms of ordering and serialization that were considered, or things like that?
[00:39:15] Akshat: Two-phase locking is one, right?
Two-phase locking is one where you lock the items on which you're executing the transaction, and then you finish the transaction, then move on to the next one.
But locks means deadlocks. Locks means a lot of things that you have to take care of. We didn't want that.
That's why we chose timestamp ordering, which gives you this nice property: if you assign timestamps, as I said, then the transactions execute, or appear to execute, at their assigned times, and serializability is achieved.
The nice property is that if you have the timestamps assigned, you can accept multiple transactions. Even if, let's say, one transaction is prepared and accepted on a particular storage node, if you send another transaction with a timestamp, you can put it in the right order and execute them, because there is a timestamp associated with each. There are certain rules you have to evaluate to decide whether you should accept that second transaction when there is already a prepared transaction. Yeah, that's the key thing with timestamp ordering.
[00:40:18] Somu: It's also simple in the sense that, let's say I get a transaction with timestamp nine, and I want to say, “You know what? I already accepted something with 10. I'm not going to execute nine anymore. Please go away and come back with a new timestamp.” It's in order. Like anywhere else -- like a DMV, where maybe they accept nine, but they don't accept something which is very old.
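As a toy illustration of the kind of rule Somu is describing, here is a sketch of a storage node deciding whether to accept a prepare under timestamp ordering. The actual acceptance rules in the paper are richer (they distinguish reads from writes, among other things); every name here is invented.

```python
# Toy sketch of a storage node's accept/reject decision under timestamp ordering.
class ItemState:
    def __init__(self):
        self.last_applied_ts = 0   # timestamp of the last write applied to this item
        self.prepared = None       # (txn_id, ts) of a pending prepared transaction, if any


def try_prepare(state, txn_id, txn_ts):
    if txn_ts < state.last_applied_ts:
        return False               # older than an applied write: "come back with a new timestamp"
    if state.prepared is not None:
        return False               # another transaction is already prepared on this item: conflict
    state.prepared = (txn_id, txn_ts)
    return True


def commit(state, txn_id, txn_ts):
    state.last_applied_ts = max(state.last_applied_ts, txn_ts)
    state.prepared = None          # the prepared marker is cleared; no rollback is ever needed
```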
[00:40:43] Alex: Yeah. Yeah. Yeah. I thought that was one of the most interesting parts of the paper, just talking about the different interactions and optimizations on top of that -- how singleton operations, writes or reads, interact with conflicting transactions, or conflicts among transactions, and things like that. I thought that was really interesting.
I guess, one question I had on there. It mentions the transaction coordinator, which is what assigns the timestamp, I believe -- the coordinator node. Okay, so that's using the AWS Time Sync Service, so the clocks should be within a couple of microseconds, or something like that. It also says, hey, synchronized clocks are actually not necessary across these, and there's going to be a little bit of discrepancy. I guess, why aren't they necessary, and then why is it helpful to have them as synchronized as possible?
[00:41:39] Somu: For the correctness of the protocol, synchronized clocks are not necessary, because the clock just acts as a number at this point in time. If there are two transaction coordinators which pick different numbers, then it gets automatically resolved by who comes out first. So clocks don't have to be synchronized for correctness. From an availability perspective, though, you want to have the clocks synchronized as closely as possible. Take the same example I gave you a couple of minutes ago: let's say there's a transaction coordinator whose clock is off by a couple of seconds. It's behind by a couple of seconds. Then its transactions are always going to get rejected for the same items that another transaction coordinator assigns timestamps to, because its time is behind. It's always going to get an "I already executed a transaction at timestamp X and your timestamp is less, so I'm not going to execute your transaction".
From an availability perspective, it's nice to have clocks closely in sync. That's exactly why we have – we use time sync, because we have some guarantees around how much a clock drift is going to be there, and we can control the precision of the clocks.
[00:42:51] Akshat: Yeah. It's to avoid unnecessary cancellations because of these variable timestamps. For load, we have different transaction coordinators, so timestamps could vary. We also have guardrails in the system, where, if we identify that a particular coordinator's time is drifting, we just excommunicate that node from the fleet.
Also, the storage node has checks in place where, if a transaction coordinator sends a request which is way out in the future, it will say, “Hey, what are you doing? I'm not going to accept this transaction.” We have guardrails in place across different levels to ensure that we keep high availability for these transactions.
[00:43:33] Alex: I was just going to ask that, because it seems like everywhere in Dynamo, it's sort of like, everyone's checking on each other all the time. It's just like, hey, if I get something goofy, I'm going to send that back in and also, tell them to get rid of that node.
[00:43:46] Akshat: When I joined the SimpleDB team, I was working with a guy, David Lutz, and I asked him -- I had not built a distributed system before. He's like, “One thing you need to learn, and this will go throughout your career: never trust anyone in a distributed system. That's the default rule.”
Isolation levels in DynamoDB
[00:44:02] Alex: That's amazing. Yeah. I love to see it. Okay. We talked about serializability. I know one thing that comes up a lot around this is isolation levels, which again is a whole other level of depth. Tell me a little bit about, I guess, the isolation levels we'll get, especially across different types of operations in Dynamo.
[00:44:28] Akshat: Yeah. I think, if you think about it, TransactGetItems and TransactWriteItems -- there is actually a documentation page on this as well -- they are serialized. For get items, if you do a consistent get request, you are essentially getting read-committed data. You always get read-committed data. There is nothing you can get which is not committed. If you're doing, let's say, a non-transactional read on an item which already has transactions going on, as Somu pointed out, those requests will be serialized with that transaction. If you have a transactional workload and you do a normal get item, those will also be serialized.
But they are also giving you read-committed data. Your get request won't actually be rejected. You will get the answer back with whatever is the committed state of that item at that particular time. Then, I think, with batch writes -- for batch writes and TransactWriteItems -- you have the same serializability at the individual item level.
[00:45:46] Somu: I think that's a key part -- it's very hard to define these in some ways, because there are certain Dynamo APIs, like batch writes, that can span different items and are provided just as a convenience for customers, so customers don't have to make multiple round trips. How do you define serializability of a single batch write against a transactional write? It's hard to do that, because each of the individual writes is serializable by itself, but the entire batch write operation is probably not serializable with the TransactWriteItems. Helping customers understand that nuance is very, very tricky.
That's where we have this whole lengthy documentation-based explanation: yes, each individual write within the batch write is serializable, but the entire operation is not serializable against a single TransactWriteItems. I think the nuance is there for batch write. Likewise, even for scan -- when you're doing a scan or a query, you're always going to get read-committed data. If a transaction is executing across the same items in the scan, then you're going to get the latest committed data, always.
[00:46:55] Alex: Yup. Yup. Absolutely. Yeah, and just so I understand it and maybe put into practical terms, if I do a batch get item in Dynamo, let's just say I'm retrieving two items, and at the same time, there's a transaction that's acting on those two items, each one of those get operations within batch get will be serializable with respect to that transaction, but it's possible that my batch get result has one item before the transaction and one item after the transaction.
[00:47:22] Somu: Yes.
[00:47:24] Alex: Okay. Yup. Then there's the issue, I guess, potentially of read committed – oh, man. Okay, so read committed. I always get tied up on this stuff. I think some people see read committed, especially in the query respect, or also the batch get respect -- I'm getting read committed, it's not serializable here. Then I think of, okay, what are the isolation levels and what anomalies can I get, if I think of the database literature?
The thing that comes out to me is yes, that is true, but you don't see the anomalies that you might, from my point of view, in a relational database where you have a long-running transaction. If you look at the read committed isolation level there, you can have, well, phantom reads and non-repeatable reads. But that's within the context of a transaction, and that's not going to happen in Dynamo, because you have that single-shot, single-request transaction. You don't have a BEGIN, run a bunch of stuff, and then a commit, that type of thing. You don't see those types of anomalies, just because you can't do that type of operation. Am I missing something?
[00:48:33] Akshat: You're right. Just to reiterate, as you pointed out: between any write operations, you have serializable isolation, and a standard read operation is also serialized with TransactWriteItems and TransactGetItems. If you care about what you were saying, where I did a transactional write and then I want a fully serializable read, where a bunch of items should come back as the unit I wrote them as, TransactGetItems is what you should use to ensure that you're getting isolation as a unit as well. If you do batch write and batch get, you get serializable isolation at the individual item level, but not as a unit.
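To make that concrete, here is the difference at the API level, sketched with boto3 and an invented Accounts table: both calls return committed data, but only TransactGetItems returns the two items as one serializable unit.

```python
import boto3

dynamodb = boto3.client("dynamodb")

keys = [
    {"AccountId": {"S": "a-1"}},
    {"AccountId": {"S": "a-2"}},
]

# BatchGetItem: each read is individually serialized against concurrent
# transactions, but the two reads together are not a unit. One may reflect the
# state before a TransactWriteItems and the other the state after it.
batch = dynamodb.batch_get_item(
    RequestItems={"Accounts": {"Keys": keys, "ConsistentRead": True}}
)

# TransactGetItems: both reads are serialized as a single unit with respect to
# any concurrent TransactWriteItems touching these items.
unit = dynamodb.transact_get_items(
    TransactItems=[{"Get": {"TableName": "Accounts", "Key": k}} for k in keys]
)
```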
Should you ever use TransactGetItems?
[00:49:16] Alex: Yeah, got you. Okay. On that same note, TransactGetItems. I almost never tell people to use it. What do you see people using it for? What are the core needs there? I'm not saying it's not a useful thing, but I think it's one of those things, like the strongly consistent read from the DynamoDB leader, where maybe you need it less than you think you do. I guess, where are you seeing the TransactGetItems use cases?
[00:49:47] Akshat: I agree with you that in most of the cases, you can actually model it with just consistent reads, or eventually consistent reads. But there are certain use cases where you really want, as I said, to read as a unit, because you did an operation as a unit. Let's say you are moving state in a state machine in a control plane that you're building, where you have three items which together define the final thing that you want to show to the customer. You don't want to read any of those items in an individual manner and show something to the user. That's where I think it makes sense to use TransactGetItems, where you cannot accept even one of the items you read being merely read committed. That's when you use TransactGetItems, but the space is very narrow. I agree with you.
[00:50:39] Somu: A classic example would be, Alex -- this happened to me a couple of days ago -- you're transferring money between your two accounts, right? Then you want to view both balances together. If you end up doing a batch get, you may be in a temporary state of euphoria, or surprise. You want to use TransactGetItems to say, “Okay, I did the transfer. I need to know what happened,” so you use TransactGetItems there. Control planes have such use cases, banks have such use cases, where you finally want to display this stuff. Those are cases where TransactGetItems is super useful.
[00:51:10] Alex: It's almost about preventing end-user, or user-facing, confusion, rather than protecting your application and some of the business processes. If it's a background process, you almost don't need to use TransactGetItems.
[00:51:22] Akshat: Yes. If you're depending on both of them to be consistent in the database -- that is the key word, right? Let's say that I see that the order status has gone from in-warehouse to shipped; then I expect something else to have been done. That consistency you will not get with the batch get. If you want that consistent read, then you want to do the TransactGetItems to read both items together.
Why no Y-axis?
[00:51:51] Alex: Yeah. All right, cool. I want to stir some stuff up a little bit, because there was some consternation on Twitter. At the end of this DynamoDB transactions paper, and also the DynamoDB paper last year, there are some charts showing different benchmarks and things like that, which I think are really useful, showing, I guess, how latency changes as the number of operations against your table increases, the number of transactions you're running against your table increases, or with more items in your transactions, or more contention, all those things. All those charts don't have labels on the Y-axis showing how many milliseconds it takes at all these different levels. Why no labels?
[00:52:38] Somu: We just forgot it. No, I'm kidding. I think –
[00:52:41] Alex: Actually forgot how to put them in.
[00:52:42] Somu: I actually forgot to put them in, even the last checker. No, I think we could have done a little bit better job there. The point was not to show the numbers as such, right? I mean, the numbers are something I think anybody can grok. It's a very simple test to go run, and everybody can run the test and grok the numbers. The point was to show the relative difference between, for example, a singleton write versus a transactional write -- what the latency cost is, and whether it's X amount or more. I think that was the whole point, and we didn't want to give absolute numbers, which don't make sense.
One of the lessons was we could have done a little bit better job of normalizing the numbers and presenting the normalized number on Y-axis, but I think that's a lesson for us to take away next time.
[00:53:27] Alex: Yeah. I like it. I agree. Last year, when I first read that Dynamo paper, I was like, “Why are the numbers on the – Why wouldn't they show that?” Then the more you think about it, Dynamo's whole point is consistent performance, no matter what, right? It doesn't matter how many items you have in your table, how many concurrent requests you're making, all those different things. I think these benchmarks are trying to show that at different levels like, hey, it's still the same, whether you're doing one item, whether you're doing a million transactions per second.
[00:53:59] Akshat: We keep making all these optimizations in the stack to improve performance across the board as well. I think, again, as Somu pointed out, these numbers would be more of a distraction than an actual help, because you might run the experiment 10 years later and the performance will be even better, right? What's the point? The key point is that you get consistent performance as you're scaling your operations. That's the key message we wanted people to take away from that, not that this transaction operated at 5 milliseconds, or 10 milliseconds, or 20 milliseconds, or whatever that is.
Continual improvements in DynamoDB
[00:54:32] Alex: Yup, exactly. Yeah, because a lot of those benchmarks can be gamed, or who knows what's going on and just, are they representative of things? I think, yeah, showing, like you're saying, it doesn't really matter what those other – those other factors are mostly unimportant to the scale you're going to get there.
I guess, consistent performance with Dynamo is just so interesting and such a key tenet in just everything, in terms of the APIs and features that are developed and all that stuff. I guess, how far does that go? I don't know if this is even easy to think about, but what if you had some change that would reduce latency for your P50, your P90, something like that, but it would maybe increase your P99 by 10%, 20%, something like that? Is that something that's like, “No. Hey, we don't want to increase that spread. We don't want to degrade our P99 at any cost”? Maybe that thing just never comes up. I guess, how front of mind is that consistent performance for Dynamo?
[00:55:39] Akshat: I think it is, like I said, one of the core tenets from the beginning. Whenever we do a new thing in Dynamo, we have to ensure that. Whenever we look through the lens of improving latencies, I think we start from entitlements: if we have to do this operation, how much latency is each hop in the overall stack attributed, or allowed to take, out of the full request? We go from there. If there is network distance between two hops, that's one of the entitlements, right?
It varies depending on the problem you're looking at, but if you find an opportunity to improve the latency at P50, the goal is to make sure the variance between P50 and P99 also doesn't get too high, because consistent performance is about this: at any time when you make a call, you get the same performance on the read and write operations that you're doing.
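To make the "entitlement" idea a bit more concrete, here is a minimal, hypothetical sketch of budgeting latency per hop and flagging hops that exceed their share; the hop names and numbers are invented for illustration and do not reflect DynamoDB's actual internals:

```python
# Purely illustrative: a per-hop latency "entitlement" budget.
# Hop names and numbers are hypothetical, not DynamoDB internals.
TOTAL_BUDGET_MS = 10.0  # assumed end-to-end latency target for one request

hop_entitlements_ms = {
    "request_router": 1.0,
    "authentication": 0.5,
    "metadata_lookup": 1.5,
    "network_hop_to_storage": 2.0,
    "storage_node_write": 3.0,
    "response_path": 2.0,
}

# The per-hop entitlements must fit within the end-to-end target.
assert sum(hop_entitlements_ms.values()) <= TOTAL_BUDGET_MS

def hops_over_entitlement(measured_ms: dict) -> dict:
    """Return the hops whose measured latency exceeds their entitlement."""
    return {
        hop: measured_ms[hop]
        for hop, budget in hop_entitlements_ms.items()
        if measured_ms.get(hop, 0.0) > budget
    }

print(hops_over_entitlement({"metadata_lookup": 2.2, "storage_node_write": 2.5}))
# -> {'metadata_lookup': 2.2}
```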
[00:56:41] Alex: Very cool. Okay, one thing on latency I wanted to look at was on one of the charts, specifically the one showing how latency changes as you increase the number of transactions you're running. There was a spike at the end in P99 for very high request rates. If you're doing lots of transactions per second, there was a little bit of a spike in P99 compared to even slightly lower request rates. You mentioned it was a Java garbage collection issue. I guess, is that something that when you see it, you're like, "Hey, we need to –" if it's a GC issue, do we need – I know you are doing some stuff in Rust. Is that something where you're like, "Hey, we need to change that because that tail latency is so unacceptable"?
Or is it also like, it only shows up at, I think it was, a million ops per second, with three ops per transaction, so about 333,000 transactions per second. Do you not have that many users doing that to where it's a big issue, and that P99 is okay at that point? Or is that something you're actively thinking about?
[00:57:42] Somu: That one was a very interesting one, because I know we went back and forth on those numbers and on what the issues were. That was specifically with the 100-item transactions. When you're doing a 100-item transaction, the transaction coordinator is holding on to those objects for a longer period of time, ensuring that it's talking to 100 different nodes. The P99 there has been higher. We do want to address the P99 issue, but the number of customers using 100-item transactions is also very – the number of applications using 100-item transactions is also fairly low, right? We will address that, though.
If those customers, those applications, are using 100-item transactions, they're already paying the latency penalty at this point in time; you have 100 items in the transaction. As long as it's consistent, we're okay. We will address it, maybe not as soon, but we will definitely address it. We don't want that to regress, right? We want to keep it where it is at this point in time, measure it, and see what happens.
[00:58:43] Akshat: We actually run canaries across all the different AZs, all the different endpoints that we expose, to find issues in latency before our customers do. We have canaries running all the time, acting like customers, doing these variable-size transactions to identify if there is any issue in a particular stack, in a particular region, or anywhere else. If there is, we get paged, figure out what the issue is, and resolve it. Yeah, we don't take this lightly.
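A canary along the lines Akshat describes could, in spirit, look something like the sketch below: a loop that issues variable-size TransactWriteItems calls through boto3 and tracks latency percentiles. The table name, item shape, and the simplistic percentile math are assumptions for illustration, not DynamoDB's actual canary code:

```python
# Hedged sketch of a latency canary exercising variable-size transactions
# with boto3's transact_write_items. Table name and item shape are hypothetical.
import time
import uuid

import boto3

client = boto3.client("dynamodb")
TABLE = "canary-table"  # hypothetical table with partition key "pk"

def run_canary(num_items: int) -> float:
    """Issue one TransactWriteItems call with num_items Puts; return latency in ms."""
    items = [
        {
            "Put": {
                "TableName": TABLE,
                "Item": {"pk": {"S": f"canary#{uuid.uuid4()}"}, "n": {"N": str(i)}},
            }
        }
        for i in range(num_items)
    ]
    start = time.perf_counter()
    client.transact_write_items(
        TransactItems=items,
        ClientRequestToken=str(uuid.uuid4()),  # idempotency token
    )
    return (time.perf_counter() - start) * 1000.0

if __name__ == "__main__":
    # Crude percentile estimate over 100 samples; a real canary would emit
    # metrics to a monitoring system and alarm on regressions.
    samples = sorted(run_canary(num_items=3) for _ in range(100))
    p50, p99 = samples[49], samples[98]
    print(f"p50={p50:.1f}ms p99={p99:.1f}ms")
```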
[00:59:15] Alex: Yeah, very cool. I remember that from last year's paper, about how you do monitoring and the performance degradation tests, and you have all those canaries like you were saying, but also, I think, some of the high-traffic Amazon.com tables, right? You showed you get direct access to their monitoring and are able to pick up latency degradation there, if any.
Yeah, pretty cool to see. That's it. Cool. Okay, transactions. That's great stuff. I just want to close here. You've both been working on Dynamo since it was released. What does it look like to, I guess, not do new feature development, but maintain or update the foundations of Dynamo? How much does some of that stuff change? You would know this better than me, but just, I don't know, as we've seen changes from hard disk drives to SSDs to NVMe, is that a regular change, or even the storage engines you're using? How much of that foundational work gets updated every couple of years, or is it constant maintenance? What does that look like?
[01:00:21] Somu: Our architecture is constantly evolving. We're finding new things, right? The best part about Dynamo is customers don't have to worry about that stuff. That's the best thing. There are a lot of things in the background changing all the time. Our key tenet is that customer availability or latency should not regress because we're doing something behind the scenes. We do a lot of things. A classic example would be when I worked on encryption-at-rest, back in 2018, I would say. I keep forgetting these numbers. But anyways, 2018, right?
That was a whole effort where we integrated everybody with KMS under the covers. It was a whole sweep, and the customers' numbers never saw a blip. Yes, there are things constantly changing in the background. We're trying to improve latency. We're trying to make things more efficient. All of this, customers don't get to see. That's the best part of being a fully managed service. To answer your question, it's constantly happening, but nobody gets to know about this stuff.
[01:01:18] Akshat: I think a lot of developers, like the ones interviewing with our team, also ask me this question: hey, you have been here for that long, are you not bored? I'm like, no. Every year, there is some fun problem that we have to solve and launch. The best part is, as soon as you launch, you don't get one customer. You get so many customers who want to use your feature. Traffic also, you don't get one request or two requests. You get millions of requests. You have this fun challenge that you have to solve. Dynamo has so many fun problems that it still keeps us excited.
[01:01:50] Alex: Yup. Yep. Do you get the same thrill releasing a public, very visible feature like transactions as when you're releasing something like adaptive capacity, which, for those listening, was more about how Dynamo splits your provisioned throughput across the different partitions in your table? It was something that was mostly under the hood; you didn't even know about it until you all published a really good blog post on it, and then further improvements including on-demand mode and stuff like that. Do you still get the same thrill when those sorts of releases come out and you're like, "Man, we just solved a huge problem for a lot of people and they might not even know for a little while"?
[01:02:29] Somu: That one specifically, yes, because a lot of the customers were complaining about it as well. We did it right away, and I think everyone was super excited about it. Yeah, I think everything we do in Dynamo is very exciting at the end of the day, right? Because you have direct customer impact one way or the other. It just boils down to what the impact is.
[01:02:50] Akshat: Yeah, I remember once, I don't know which year it was, but Somu and I actually worked on a problem which reduced the number of operational tickets we used to get by a really big dent, like a 10X improvement. Yeah, I think we get the same thrill. It's about where you want to put your mind and solve the problem. As I said, Dynamo has so many fun problems to solve.
[01:03:11] Alex: Yeah. Okay. Cool. The last two years, you've written some really great papers, the DynamoDB paper last year, transactions this year. What are we getting next year? What's the next paper coming down the pipe? Do you have one?
[01:03:22] Akshat: We have to think about it. First, we have to build something, and then we – Yeah, there is definitely a lot more we are thinking about and evaluating what we should do. We have also started doing a lot of talks at different venues and different conferences, and getting feedback from customers. The transactions paper, actually, the way we decided on it was also, I would say, customer-driven. We wrote the paper on DynamoDB, and I was just looking at how the response had been on different blogs. A lot of blogs had this theme where people were asking, "Oh, I wish there were details about how transactions are implemented in DynamoDB."
A bunch of people had left that comment. That's when we picked it up and wrote this paper. We'll see how the response to this paper is, figure out what customers want, and write that.
[01:04:13] Somu: Yeah, like Akshat said, it's mostly, what's going to be the next takeaway message here, right? For example, with Dynamo, we said these are the learnings from the past 10 years. With transactions, we said you don't always need a long-running transaction on a NoSQL database; you can build fast, scalable transactions with single-request transactions. The next one is going to be, what's the next takeaway message from us to the community in general? That's what we'll be focusing on. Hopefully, soon.
[01:04:40] Alex: Yeah, I agree. I hope we see it. That point you're making about what we can take away from transactions: I think both papers are very good at really thinking about user needs from first principles and being like, okay, you know what? Other things might have all these features, but if you cut off this 5% of features, you actually eliminate a whole host of problems. As long as you're fine with that constraint, you can get a lot of other benefits as well. I think just the framing of user needs upfront in both papers is so good and helpful in understanding how this works. I love that.
Hey, Akshat, Somu, thank you for coming on. I respect you both a bunch. I love Dynamo and I'm really grateful for you coming on to talk today.
[01:05:22] Somu: Alex, super thanks for having us, by the way. You're one of the biggest DynamoDB proponents. Your book is probably a reference for a lot of people. Super thanks for having us. It's a privilege to talk to you about the transactions paper.
[01:05:39] Akshat: Yeah, same here. I think you have been doing amazing work and I've been following you for a long time. Thanks for all the great work that you do.
[01:05:48] Alex: Cool. Thank you. I'll link to the paper, but everyone be sure to check out the paper, because there's a lot of great stuff we didn't even get into here. Make sure you check that out.