Scaling MySQL with Sam Lambert from PlanetScale
What would it look like if databases were built for developers rather than operators?
Sam Lambert is the CEO of PlanetScale, a company that provides a managed MySQL database solution. PlanetScale uses Vitess, a database clustering system that allows for horizontal scaling of MySQL. MySQL powers an incredible amount of the internet, and Vitess is behind enormous MySQL installs at YouTube, Slack, GitHub, and more.
In this show, we talk about the architecture of Vitess, what it's like to manage upgrades and releases of high-scale databases, and how to maintain a high-performance culture.
Links
Developer experience is not a list of features -- Sam Lambert blog
gh-ost: GitHub's online schema migration tool
One million connections -- PlanetScale blog
Storing time series data in sharded MySQL to power Query Insights -- PlanetScale blog
Cloud-Prem and Climbing the Engineering Ladder -- Future.com interview
Segments
The Two Parts of PlanetScale [00:02:50]
Part A: Managed Vitess -- horizontally scalable MySQL for enormous users
Part B: Developer-friendly database tooling for all -- better schema change management workflow, PlanetScale Boost (materialized cache with automatic incremental updates), PlanetScale Insights (query visibility + metrics)
History & current maintenance of Vitess [00:09:10]
Originally developed at YouTube
'Cloud-native': built for hostile environment (Borg) where machines could come and go for any reason
Only graduated database in CNCF (Note: TiKV, a storage backend for databases, is also graduated)
Maintenance and release process for Vitess project
Vitess architecture [00:14:54]
Core components:
VTGate: Proxy layer to storage nodes
VTTablet: Sidecar process for MySQL storage nodes
VTOrchestrator: Admin process to monitor + repair cluster
VStream: Changelog of write operations across the cluster
Data is sharded across VTTablets
Each shard has a single writer node with 1+ replicas
Similar to DynamoDB + others; contrast with Cassandra
Vitess limitations + tradeoffs [00:33:44]
Will naturally see a small bit of extra latency (~1ms) as compared to single-node MySQL to account for the extra hop through the VTGate proxy
But: allows for unlimited connection scaling, high availability, etc.
Foreign keys: Not currently supported but in development
Transactions: single-shard preferred, but cross-shard is allowed
Consistency: repeatable read in transactions
Many-to-many queries: Not terrible if using the same shard
From Vitess to PlanetScale [00:42:10]
All PlanetScale users get a Vitess instance, even if not sharding
Online migration process from MySQL to PlanetScale
Cloud-prem delivery model [00:48:55]
"Cloud-prem": SaaS but in your own cloud account
Why PlanetScale does not offer on-prem
Future of cloud-prem
Pricing of infrastructure services [00:56:50]
PlanetScale offers both -- a usage-based model based on rows read + written or a fixed model based on instance size
Why different companies like each plan
Future of infrastructure pricing
PlanetScale as a high-performance culture [01:02:12]
Outside evidence of high-performance:
Cadence of shipping major features
Reliability of service
High-quality educational content (blog posts, Aaron Francis's Database for Developers course)
Good design aesthetic
Maintaining a high talent bar
"No passengers."
"Experts leading experts."
Large companies and a better metric of success
Bad managers and their effects
Sam's talents + career evolution [01:10:08]
Staying technical while climbing the executive ladder
Appreciation for design
Talking about competitors [01:16:02]
Sam is challenged to say one nice thing about some competitors, including:
PostgreSQL
Amazon RDS
Amazon Aurora
Neon
Edge-based SQLite databases (Turso, LiteFS on Fly.io)
Transcript
[0:01:17] Alex: Sam. Welcome to the show.
[0:01:19] Sam: Hey, thank you for having me.
[0:01:21] Alex: Yeah, absolutely. Glad to have you here. You are the CEO of PlanetScale. You've got a lot of great experience at GitHub and other places. Can you give listeners just a bit of your background and what you do, your history, things like that?
[0:01:35] Sam: Yeah. Like you said, CEO of PlanetScale. Before this, I worked at two companies that were very significant in tech. I did a short stint at Facebook on the traffic engineering team, which was just an awesome experience, seeing the scale of Facebook. You can imagine the scale of Facebook and then quadruple it, and it's still not there. It was just absolutely incredible.
Then before that, I was at GitHub, which was also another site at scale. I think around the time I was there, it was the 32nd largest website on the internet. I was there for about eight years. I started as the first database engineer at the company when the company was still small, back in the no-managers culture, a very flat organization, and just a very disruptive, interesting and cool tech company to be at. That's, I would say, where the formative years of my career were spent. Now I'm at PlanetScale, trying to solve some of the problems I've seen plague every company that I've ever been at.
The Two Parts of PlanetScale
[0:02:50] Alex: Yeah, absolutely. I'm sure you've seen some pretty amazing stuff. I'm excited to talk about PlanetScale. Just for the listeners, I think of PlanetScale as two different things, right? Originally, I think of it as managed Vitess, which is a horizontally scalable MySQL solution. You have these hyper-scalers like GitHub, Facebook, YouTube, somebody that has terabytes and up of data, and now they need to horizontally scale their MySQL. That's what Vitess is doing as an open-source solution, and PlanetScale is providing a managed version of that.
Then in addition to that, there's just a bunch of other delightful database add-ons that seem to have nothing to do with Vitess, but are available to all your users. That would be, hey, a better workflow for schema changes, thinking about non-blocking changes, branches, deployments, reverts, things like that. Also, PlanetScale Boost, which came out recently-ish, right? Which is a materialized cache with automatic incremental updates all the time.
PlanetScale Insights, which just gives you very good query visibility compared to other solutions out there. I guess, my question is, was it always this way? Or did you start off as like, “Hey, we're going to manage Vitess and then realized we can provide a lot of stuff to other downstream customers as well”? What's the story there?
[0:04:10] Sam: Yeah, the original vision for the company was that Vitess itself is just extremely powerful database technology, of which there's not much like it on the market that isn't locked inside a single cloud platform. There's a lot of power in being able to take a technology like this and deploy it anywhere, across clouds. It was originally about managing that. That picked up some really great logos and customers who were Vitess users.
If you sent a Slack message today, that's gone into a Vitess database. Slack have been very vocal about how they're a huge Vitess shop. GitHub's primary database is Vitess. Then we have people like HubSpot, Roblox, Etsy, all of these companies that have been very vocal about their use of Vitess. That's one thing that, first of all, is just really cool to see: contributions from major companies running this technology at scale.
It means that the rising tide lifts all boats. You get bug fixes, you get decent incremental features built on top of the platform that are useful to a broad audience, built by very, very talented engineers and maintainers. Then we wanted to see way past that vision. Our first tagline was "the database for developers." Everyone, I think, was like, well, what were databases for before? So it was slightly tongue in cheek. Truthfully, I don't think many databases have been built for developers. They've been built to be operated and built for operators, but the developer experience and the daily life of using most databases sucks. We really wanted to improve it. I did a little post on this recently talking about developer experience, and developer experience is not really just flashy, fun little things and a little avatar.
[0:06:19] Alex: Dark mode.
[0:06:20] Sam: Yeah, dark mode or whatever. It's doing what you're supposed to do for years and years and years on end. Bringing a database product to market that has great developer experience requires being a great and proven, tested database and doing what it's supposed to do. Having the Vitess foundation enables us to do that. Then we built all these incredibly cool features on top that make operating and using the database really joyful, even. We came out with database branching. We even patented that. It was just crazy that we were able to patent it in 2021. Not that we would probably have used the patent, but it's just marking a territory, saying this was us. It was crazy that it was done so late.
I put my head up after being at GitHub and realized, databases just have not moved. They were still just doing the fundamental things. I'm proud that we've brought a huge shift in terms of the minimum bar for features. You can see that in how pretty much every other competitive company is scrambling to add these features, while we build the next step.
[0:06:08] Alex: Yeah. When you were at GitHub, or for teams at Facebook, I imagine, did they have branching-like workflows? I remember gh-ost coming out from GitHub to help with database migrations. Did big teams have those tools that make branching pretty easy, and it's just not available for the masses?
[0:07:25] Sam: We never quite went as far as branching, but we did get to good staging environments. I think the impression that GitHub left on PlanetScale is the unrelenting drive towards shipping software constantly and deploying software constantly. When I joined the company, the company had grown really rapidly. This was around the time that we raised that record-beating Series A round; the company had been bootstrapped and was winning hearts and minds of developers everywhere. Things had grown really quickly in the database.
There were the usual patterns and things that come together inside the database. That had to be fixed, but we never had the option to fix infrastructure problems while slowing development down. That to me was such a tough but amazing lesson to learn, because it taught you an appreciation for shipping velocity and what shipping velocity does for a company. When you see things like branching and deploy requests, all these things in PlanetScale, it's all in service of you shipping really quickly and taking away fear of the database. We really believe in just push it to prod, push it to prod. If your database allows you to do that, then you will gain huge speedups and impact for you and your user base.
History & current maintenance of Vitess
[0:09:10] Alex: Yeah. What's the history of Vitess? Is it YouTube that developed it? Is that right?
[0:09:15] Sam: Yeah. YouTube was growing rapidly. It's now the second most visited website on the internet, the second most trafficked search engine, and it has what, two and a half billion monthly active users? They were using MySQL, naturally. I mean, if you look at the top 100, the internet is pretty much exclusively powered by MySQL. They were scaling up and they needed a solution to go and shard. It was originally a proxying project, then became sharding, and all these things got built into it.
Yeah, they just did an outstanding job. It was built on the earliest version of Go. If you look at the Go history, they talk about the early Vitess creators being instrumental to the Go project, even, because they were giving feedback on Go 0.1. What people don't realize is it's one of the oldest Go projects in existence. It was also built on Borg, which is the predecessor to Kubernetes. It was built in this highly hostile environment. Everyone was used to running databases with RAID controllers, where it's like, "Oh, we're going to recover that machine and bring it back." Not in the Kubernetes, or Borg, world. That meant that they built an extremely resilient, "cloud native" database technology, which could recover from pretty much any type of failure. Then they put very smart Google engineers on it, spending nearly a decade to go and build it. Now we build on top of that extremely strong foundation. It trickles down in so many ways, and that's why it's important. Yeah, it was picked up by Slack, picked up by GitHub, picked up by so many other companies, and became the ubiquitous MySQL scaling solution out there.
[0:11:03] Alex: At GitHub, before pulling in Vitess, did you have a home-built horizontal scaling solution? I guess, what did that look like?
[0:11:13] Sam: We were doing the classic pattern that everyone does, which hopefully is dying. Not at GitHub, but I mean, in general, as an industry. Well, first of all, we ran our own data centers, which was still quite a good advantage, because we were able to buy very expensive, fast database servers. When I joined, we had three database servers. Now it's a significant amount more. It got partitioned. We would end up taking the largest tables that were always causing problems. The notifications table was a pain, the stars table was a pain, the statuses table was a pain. The statuses table: at the bottom of a pull request, you get all those checks, all of the different statuses a pull request can be in. It just hammered the database constantly.
We moved them out into their own clusters, which freed up the rest of the database infrastructure in terms of buffer and contention and resources. This is a really common pattern. We see this a lot. A lot of people come to us having split up their main database cluster. The problem is it's not always better for availability, right? Even if you have high availability in those separate clusters, you're pinning a lot on three or four masters across three or four clusters. We were doing that. We built some awesome tooling, right? We continued the maintenance of orchestrator with [inaudible 0:11:24]. We built gh-ost. We built lots of really cool open-source projects on top of MySQL. Yeah, like you mentioned, you see the DNA of those now in Vitess, and in PlanetScale, the product itself. We've been fortunate enough to take it way further than we even imagined at GitHub.
[0:13:01] Alex: Yeah. What does active work look like on Vitess right now? I imagine PlanetScale employees are major contributors. Are you the leading contributor? Is that still some YouTube folks? How does that break down?
[0:13:14] Sam: The project leads are primarily at PlanetScale. We craft the project. We manage the releases. We maintain Vitess, essentially, even though it is a CNCF project. I think it still remains the only graduated database in the CNCF, and it was a very early project to do so, alongside Kubernetes. Yeah, we maintain it. It's a lot of work. We have to manage releases very carefully. When we deploy our software, we know it's going out to billions of users. One of the biggest sites in China runs on Vitess. It's serious to go and release database software. You can't mess around. You can't just yeet it out to production. It goes through rigorous testing. Every push gets tested against every modern framework.
We have all of these acceptance and regression tests, and then it's rolled out. Companies like Slack have maintainers there; they're still very active maintainers. Even beyond the hyper-scalers, we still have a Slack with 3,000 people in it. We have the monthly maintainer calls, weekly check-ins with all the maintainers. This is a real, big, active open-source project that has a huge impact at a lot of companies. We get a great set of folks who show up every time to manage the project. Then we have the features in it that get released, or setups for new features that are coming to PlanetScale. It's a big operation to maintain Vitess.
Vitess architecture
[0:14:54] Alex: Yeah, absolutely. Okay, let's talk Vitess a little bit. Walk me through the high-level architecture. What are the components in a Vitess cluster?
[0:14:59] Sam: There are a few different components of Vitess. Firstly, VTGate, which could be known as a proxy, I guess, is the easiest way of describing it. VTGate is what terminates your connection and determines the query routing, and where it has to go. Vitess sits on top of MySQL. We run real MySQL under the hood. That has a number of benefits. First of all, MySQL is exceptionally proven and trusted. It's, what, 26, 27 years old now? Incredibly reliable. Even a decade ago when I was playing around, I slammed MySQL servers with writes, ripped the power cord out of the server, and everything that was acknowledged was still there. It's incredibly robust and good and trustworthy.
We build on top of MySQL, but we have a proxy layer on top, which means we can't support everything that MySQL supports. We're working on the last few things that keep us from being 100% compatible. Foreign keys is one of the major ones. We get a lot of noses turned up at us for not having foreign keys. But we've solved it. We just have to keep ramping up towards being production grade, which is a very high bar.
[0:16:25] Alex: That VTGate, is that just emulating the MySQL parser? Or is it using the MySQL parser at all under the hood and hooking into that? Do you have to basically replicate all that MySQL parsing logic?
[0:16:38] Sam: Yeah, fully emulated in Go.
[0:16:41] Alex: Okay.
[0:16:43] Sam: It's not an easy task. Luckily, a lot of time was spent at Google just running down all of those edge cases. In the majority of cases, unless you're doing something weird, your app is pretty much going to work. Then certainly for the benefits you get from sharding, it becomes worth it. On the PostgreSQL side, there are some really interesting databases that are PostgreSQL wire compatible, but not fully compatible, which causes people some weird edge cases in the long run. We try and avoid that. Like I said, we test all of the frameworks that are out there and popular to make sure they work well, and foreign keys is really one of the last things that we have to fix.
VTGate looks at the query and looks at what has to be done to serve that query, then breaks it up among shards if necessary to go and retrieve that data. You can have VTGates horizontally scaled. That's one reason we can claim unlimited connection scalability. What we mean by unlimited is, if you keep adding VTGate nodes, you can keep adding connection resources, basically. We have some people running literally millions and millions of connections to their databases. We talk about developer experience; that itself is a subtly, extremely difficult problem. Most people have to run proxies in front of their databases to terminate and handle connections. Then if you're using Lambda, or various worker architectures, you can really thrash the database with connections, which can be very painful, and can just exhaust and waste a load of resources on your database, basically.
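Because VTGate terminates standard MySQL-protocol connections, any stock MySQL driver can talk to it. A minimal sketch in Go, assuming a hypothetical PlanetScale-style hostname and the go-sql-driver/mysql driver; the DSN and pool sizes here are illustrative, not recommendations:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // stock MySQL driver; VTGate speaks the MySQL protocol
)

func main() {
	// Hypothetical DSN; PlanetScale endpoints require TLS in practice.
	db, err := sql.Open("mysql", "user:pass@tcp(example.us-east.psdb.cloud:3306)/mydb?tls=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Client-side pool limits still apply: each pooled connection is one
	// MySQL-protocol connection, terminated at a VTGate rather than at a
	// storage node, which is why the connection count can scale out.
	db.SetMaxOpenConns(50)
	db.SetMaxIdleConns(10)
	db.SetConnMaxLifetime(5 * time.Minute)

	var version string
	if err := db.QueryRow("SELECT VERSION()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	fmt.Println("connected to:", version)
}
```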
[0:18:46] Alex: Yeah, absolutely. When I'm in the PlanetScale dashboard, and I look at my database, and it shows load balancers alongside my database, are those load balancers essentially the VTGates?
[0:18:56] Sam: Yeah.
[0:18:57] Alex: Okay.
[0:19:00] Sam: Then we have an edge network that routes you to the correct places. You'll see there are a bunch of benchmarks out there that show us to be the fastest, or one of the fastest, because we have a fully deployed global edge that then routes you to the right VTGates.
[0:19:16] Alex: Then what are the storage nodes themselves? What are they doing? What are they called?
[0:19:19] Sam: Storage nodes, you have VTTablet. It's a sidecar process to MySQL that receives all sorts of events and serves queries, and can rewrite queries and can protect and buffer the database, all next to MySQL within a shard.
[0:19:47] Alex: Got you. Okay, and that's running MySQL. Then is it running the latest version of MySQL, or is it pinned to some previous version? What's it running there?
[0:19:54] Sam: Oh, that's one thing we're quite proud of. Because of how advanced the architecture is, we can fail over and manipulate the cluster really simply, without outages. We keep people very up to date. We manage rollouts for very large customers to help make sure that they don't have regressions. Most of the time, you're going to be running on the latest, if not very close to the latest, version of MySQL.
It's so funny. If you're an RDS user right now, the only online way to migrate from 5.7 to 8 is to move to PlanetScale. Within Amazon, you can't migrate to the latest version online, without an outage, without downtime, which is bonkers.
[0:20:43] Alex: That's quite amazing to me. Given how many RDS users there are, I can't believe they couldn't build some of that migration tooling themselves. Yup.
[0:20:51] Sam: Well, it's the power of being the incumbent. You have everyone trapped.
[0:20:55] Alex: Yeah. If I'm a typical user, say I'm on the Scaler plan, I'm not an enterprise that has something more managed and custom, are you just automatically upgrading Vitess and MySQL for me, or am I clicking buttons to say, upgrade to the latest?
[0:21:09] Sam: We do it all for you. Any PlanetScale user will not have realized that we continually upgrade their software, for performance improvements and all sorts of things, without them ever noticing the impact. It's not easy. It takes a lot of work and a lot of automation to do these upgrades online. We have customers that have come to us purely because they couldn't solve that themselves, right? That's great. That's cool. That's value we provide. We do tens of thousands of failovers every week, not because the software is failing, but because we use them as a mechanism to upgrade, resize, repack, and do all of these various things that we do for our infrastructure. It just works very well.
[0:21:53] Alex: Okay, that VTTablet, or sorry, VTGate, which is the proxy layer: it's parsing that query and figuring out which shards it needs to go to. Does it rewrite that as a MySQL query again, or is it able to pass the parsed query to the MySQL instance?
[0:22:08] Sam: It depends. It sometimes does rewriting, because it might have to scatter across a number of shards to go and aggregate the data and pull it together. Often, a very common rewrite is that it adds limit clauses, so that you don't just go and thrash the database with more rows than you need. It does all these things that help guard against errant application behavior. ORMs, looking at you. We love our ORMs, but they can get a bit unruly. There's a whole number of things that it does. The team that builds the Vitess query engine and the parser are just geniuses, and I wish I understood it with the depth that they do, but the work they do is absolutely outstanding. Writing a query planner is very, very hard.
[0:23:04] Alex: No kidding. No kidding. I can't even imagine. Okay, so in that high-level architecture, we basically have the stateless proxy layer that has some config about where the shards are assigned, and then all those shard layers that run the storage nodes. Is that a pretty common setup? If we're talking about any of these distributed databases, whether it's Cockroach, or Amazon Aurora, or Dynamo, or Cassandra, is that the same high-level architecture? Are there any major differences between Vitess and some of those other ones, or even groupings of patterns there?
[0:23:36] Sam: I think at a very high level, they're all going to have that same push down to storage nodes, with a load balancer somewhere. We've chosen a shared-nothing sharded model. Some folks do an automated sharding model where they spread a key-value space around a lot of nodes, which has certain tradeoffs. We choose to have a single writer node per shard, rather than multi-master. That's an interesting tradeoff. It's a tradeoff made for scale and safety, primarily, right? If you have multiple masters, then you have to do conflict resolution. If you're doing that across a network with latency, you can dramatically slow down the throughput of your database.
[0:24:27] Alex: What are the ones that are doing multi-master? Would that be Cassandra? Are there many others?
[0:24:31] Sam: Cassandra do it, but then they have various consistency tweaks. Cockroach do it. Spanner does. But again, they all have these tradeoffs under the hood about really understanding where you're placing your data and how you're doing it. With us, each shard has a single primary machine, basically. That means you can have very, very fast writes. They get sent over semi-synchronously, which means they have to be acknowledged by at least one other node in the cluster, so you know the data is safe, but you're not trying to gain quorum among three, or six, or whatever nodes.
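For a sense of what semi-synchronous replication means at the MySQL level: on a stock MySQL primary with the semisync plugin loaded, status variables report whether commits are waiting on a replica's acknowledgment. A minimal sketch, assuming a local MySQL you manage yourself (PlanetScale handles all of this for you); the DSN is hypothetical:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Hypothetical DSN pointing at a MySQL primary you operate.
	db, err := sql.Open("mysql", "root:pass@tcp(127.0.0.1:3306)/")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// With the semisync plugin loaded, these status variables show whether
	// writes are being acknowledged by at least one replica before commit
	// returns, which is the guarantee described above.
	rows, err := db.Query("SHOW STATUS LIKE 'Rpl_semi_sync%'")
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var name, value string
		if err := rows.Scan(&name, &value); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s = %s\n", name, value)
	}
}
```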
[0:25:14] Alex: Yeah. Tell me about the flow of a write request. When it comes in, it's going to hit the VTGate, which routes it to, ideally, one VTTablet and –
[0:25:20] Sam: Straight to my – yeah. Straight to the MySQL node.
[0:24:22] Alex: The primary.
[0:25:28] Sam: Another part of the architecture that's really important is VTOrchestrator, which is always watching your cluster and making sure that it's always in the correct state to serve queries. It always has a master, and always has at least one replica, or as many as you've configured. It makes sure that if that master disappears, a replica gets promoted and a new one is created. It's always making sure that your cluster stays in the correct state to go and do the job you need. This is also where it's simpler, much easier to reason around: we wrote to one node and within a few milliseconds it appears on another node, versus, we have two copies of our application talking to two separate databases, and now they are conflicting.
Reasoning around those types of problems inside a code base is much, much harder, and is really not needed by many people. If you have a low-volume metadata store that has to be strongly consistent absolutely everywhere, you can choose some of these architectures and they'll probably work for you.
[0:26:42] Alex: What does the typical topology look like for a big service? Especially within a shard, how many replicas are we talking behind that primary? Are they same region? Are they same data center? Are they cross-region? What does that look like?
[0:26:59] Sam: By default, every PlanetScale database is multi-AZ. We put replicas across availability zones, should an availability zone go down. We have a very multi-layered approach. We don't trust EBS fully, right? Aurora, for example, relies on the underlying storage replication to replicate the data. We have multiple nodes that each have their own EBS volumes. We're not sharing single volumes across the database. We are making sure there is an isolated copy on EBS, or the Google Cloud equivalent, per AZ. Meaning, you can lose an availability zone, or an EBS volume inside your cluster, completely, and your data is completely safe and replicated elsewhere.
[0:27:50] Alex: Yeah. You mentioned semi-synchronous replication there. When that write comes in and hits the master, it has to be acknowledged by at least one other one. If all my shards have, say, 10 replicas or something like that, are all of them in the candidate group for semi-synchronous replication? Or is there a group of three, the primary and the replicas responsible for that, and the other ones are purely read replicas?
[0:28:19] Sam: It's tunable. Really, it only needs to get to one other server normally. It's not waiting for all 10 to receive it. This is the beauty of sharding. Most people's sharded setups have one or two other replicas, because you're breaking your data out across so many servers, and it's horizontal across these shards. Your failure domain gets to be much smaller. If you've got hundreds of terabytes of data, but it's strung across a hundred shards, for example, when a failover happens, it's a tiny blip for a very small subset of your users, and it comes back online, rather than sharing tons of infrastructure, which means you can have global outages. That's bad.
[0:29:13] Alex: Yeah. When you see someone that has lots of replicas per shard, do you say, "Hey, you should actually be sharding more, with smaller shards"? Or is it just different patterns? Sometimes more replicas is better. Sometimes more shards is better.
[0:29:28] Sam: Sharding smaller is what we'd recommend. Then, everyone has their version of the Justin Bieber shard. The Justin Bieber shard is a shard setup that Instagram had. He basically was on his own infrastructure, because he was the top user, until a bunch of issues were solved. Everyone's got that shard that has that really important customer on it, or whatever, that they may beef up a little bit. Normally, we recommend keeping it uniform and sharding smaller, which is not a difficult operation.
[0:29:59] Alex: With Vitess, can I do isolation of shards that small, if I have a major customer? It's not like I have a key space that has to be equally divided; I can make a really narrow one?
[0:30:12] Sam: You can get very, very granular in terms of how you want to pin and place shards and place data. You can even shard by geographical regions and have data pinned to various regions. People use that for GDPR and nightmare situations like that.
[0:30:33] Alex: Then can I make queries that don't use my shard key?
[0:30:37] Sam: Yeah.
[0:30:39] Alex: That will just scatter-gather, hitting every node?
[0:30:42] Sam: Correct. Not the most ideal, but it's completely possible. Yeah. For some people, it's not even ever a problem. We help the larger customers tune their queries to hit shards more uniformly.
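To make the routing tradeoff concrete, here is an illustrative sketch, assuming a hypothetical `users` table sharded on `customer_id` (table, DSN, and columns are all assumptions): a filter on the shard key lets VTGate route to a single shard, while a query without it must fan out to all of them.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	// Hypothetical DSN and schema: a `users` table sharded on customer_id.
	db, err := sql.Open("mysql", "user:pass@tcp(example.us-east.psdb.cloud:3306)/mydb?tls=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Single-shard: the shard key is in the WHERE clause, so VTGate can
	// hash customer_id and route the query to exactly one shard.
	var email string
	if err := db.QueryRow(
		"SELECT email FROM users WHERE customer_id = ?", 42,
	).Scan(&email); err != nil {
		log.Fatal(err)
	}

	// Scatter-gather: no shard key, so VTGate must fan out to every shard
	// and merge the results. Possible, but a pattern to design away from.
	rows, err := db.Query("SELECT id FROM users WHERE email = ?", email)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id int64
		if err := rows.Scan(&id); err != nil {
			log.Fatal(err)
		}
		fmt.Println("user id:", id)
	}
}
```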
[0:30:00] Alex: Yeah. Okay. Tell me about VStream. How does that come in? What is it? How do you all use it?
[0:31:10] Sam: VStream is an incredible technology. VStream powers our Connect product. It powers parts of Boost. VStream, basically, is an enhancement to MySQL replication that allows us to stream data pretty much across shards. It's not built into MySQL itself; it's part of the Go application. If you imagine a re-shard, a massive re-shard where you have to completely change your shard key, it's a tough operation, because it means you have to stream data across disparate shards into a different cluster and re-shard it. Some people do that routinely for their query plans.
It was also the message queue infrastructure at YouTube as well. It's just a very robust streaming architecture that allows us to be incredibly flexible with the data. We haven't even got close to surfacing the power of things like VStream in our product. We will get there in terms of making even more powerful primitives. There are some extremely cool things coming that even Vitess users don't know about yet.
[0:32:30] Alex: When can we get that? Yeah. Okay. VStream is basically just streaming out any write updates to my tables. Where do I, or where do you, hook into VStream? Is that at the VTGate level? Is that at each VTTablet level, or maybe the primary for a shard? If I want to process that, how am I connecting to it?
[0:32:50] Sam: You can connect to it. If you want to connect to the VStream right now, we actually send it out to you via Connect, which can give you – it's an API that can give you constant streaming. You can put that into Kafka, you can put it anywhere. That's how we do Insights as well, which is how we capture queries and stream them to our data warehouse. We just expose that for you. You can listen to the aggregated stream of the whole cluster. That's one thing that's really, really important. It's not even per shard. It's the entire cluster's change stream, which is, again, really powerful when you have a sharded system.
[0:33:26] Alex: Wait, and so when you're talking about Insights, do read queries also end up in the stream, so I could look at – is that right?
[0:33:36] Sam: No. I don't think reads are part of it.
[0:33:38] Alex: Okay. Just inserts, or updates. Yeah. All updates.
Vitess limitations & tradeoffs
Okay, very nice. I want to talk a little bit about Vitess limitations, almost, or just the fact that now it's a distributed system that has multiple nodes: what can and can't be supported, or what changes? Tell me about transactions. Are transactions supported in Vitess? What pattern? Are they single-shard only, or are they cross-shard? What can you do there?
[0:34:04] Sam: Single-shard is easy and well supported. There is a cross-shard transaction implementation that is certainly slower and could do with a bunch of improvements. It's not something we're focusing on heavily right now, because most people, if they're at scale, don't need it, and they're much better off localizing the transactions within the shard anyway. But it is possible, although not completely recommended.
I think the limitation that is probably most apparent to the widest group of folks is that you get a tiny bit more latency, because of the extra hops that we have to put you through. If you're comparing a straight-up, freshly created RDS database on Amazon with PlanetScale, there's going to be a little extra latency on PlanetScale, just because of those extra hops.
Now, it's an unfair comparison, because with that RDS setup, you don't have anything that's highly available. You don't have a proxy in front of it. You don't have real connection handling going on. You don't have any of those things. When you compare like for like, it's the same, but you do have this default extra hop that we can't really take away. But in return for that hop, you get unlimited connection scaling, high availability, all of these things that make it very easy to not think about your database.
It's a thing. If you're doing a thousand really bad queries, or have a horrible N+1 on your page, you might notice it. I think the idea is that you should really architect away from that being a problem.
[0:35:47] Alex: Yeah. What does that look like? Say I'm running a query that is just going to hit a single shard; it's not going to be a cross-shard transaction, or a scatter-gather. If it's just hitting a single shard, are we talking a sub-millisecond addition for that hop?
[0:36:01] Sam: Yeah. Yeah, it's going to add about a millisecond, I think. That's just the connection coming in to get to the database server. Everything you do on the database server then isn't taking extra hops.
[0:36:15] Alex: Yeah, okay. Okay, what sort of consistency does PlanetScale provide? Yeah, if I'm reading from a primary, or also if I'm reading from a –
[0:36:27] Sam: You get repeatable read isolation level. We don't do serializable isolation, obviously, because it's very slow, and it wouldn't be necessary for the way that we're replicating the data. It pretty much mirrors what you get from MySQL.
[0:36:48] Alex: Yeah. Also, just single-item read consistency: that'd be read-after-write consistency if I'm hitting the primary, but also, if I'm using a replica, then some eventual consistency.
[0:36:59] Sam: Yeah. If it's been committed anywhere across the cluster, it has been committed. You're not reading anything dirty, or anything that is uncommitted.
[0:37:09] Alex: Yeah, got you. Are there certain patterns that you recommend people avoid, like scatter-gather type queries? Also, many-to-many, is that something you try to get people away from, or any other types of patterns, given that you could hit those cross-shard queries?
[0:37:15] Sam: Many-to-many is not terrible when you're, again, within the same shard. Scatter-gather is the main one we try to move people away from. Sometimes people also rely a lot on single rows; updating single counters is obviously very, very painful for any database server, with the lock contention. Normally, it's just helping people get away from some nasty query patterns that can be very detrimental to the database. We make that really easy to find through PlanetScale Insights. Yeah.
[0:37:56] Alex: Yeah, let's move on to Insights, because I think it's so interesting. With a lot of other databases it's like, okay, only turn on logging for super slow queries, because you might take a performance hit, but you have all the queries basically available for that.
[0:38:09] Sam: Correct.
[0:38:11] Alex: Yeah, what's going on there? Is that because of VStream?
[0:38:14] Sam: Yeah, we stream that data out to a data warehouse. We capture every single query. It means that we can do really solid analysis. We just did a really good blog post about how we store time series data, actually in sharded MySQL. We basically keep a sketch, like a pattern, of eight days trailing of the performance of each individual query. We've seen trillions of these at this point. It means, like you said, if you attach Datadog or whatever, it's sampling your database, right? It's not getting every single query.
Anyone who has spent enough time scaling anything has spent time hunting for a specific database query that they can't find. Because when you have a server that's doing, I don't know, 50,000 queries a second, finding those one or two that are really detrimental is really, really tough. Well, we see all of them. We can then surface the ones that really, really matter. It's very unique, what we do. Actually, so much of the current roadmap is about making Insights incredible.
Part of our strategy as a product is that we have to have a number of things in place in terms of primitives, one of the primitives being deploy requests. We had to have a mechanism by which we can suggest changes to your database. We needed something where we can get in the loop and there's a workflow attached. We didn't just go and say, we're building a database backend; we're building a whole load of workflows that you could only build when you have the database backend. Insights is a key way of informing where those workflows begin. It is the favorite feature, even among giant customers. One of our customers, a very, very large one, runs hundreds and hundreds and hundreds of terabytes on PlanetScale. They have one guy who wakes up and has his morning workflow: open the Insights page, find the slowest query, fix it.
Now, if you're lucky enough to have a tool that just tells you, this is the worst query, you're already winning. Then, just by having people that fix them routinely, you constantly keep your app feeling extremely quick. It's just great. I mean, it's one of the best ways to continually optimize your database, and it will just get more and more optimized over time. We're nearly done with a bunch of features that will make database administration just a thing of the past.
[0:40:46] Alex: Yeah, that's amazing. Is it hard to do aggregation by query signature, to say, "Hey, this query is actually the same as this query, even though it has some different parameters"? Or is that pretty straightforward?
[0:40:54] Sam: Yeah, it's not easy. The post talks about that a little bit, actually. You have to take the fingerprint of these queries, and we're even learning similar queries by intent as well, in terms of what they're actually trying to gain. It's not easy. But it surfaces really amazing data when you have it, like showing people when a query showed up and how many times. If we're debugging something, instead of looking through code to find out where and why a query showed up, we can show which connection ran that query and where it came from.
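As a toy illustration of the fingerprinting idea (the real approach, described in the PlanetScale blog post, parses the SQL rather than pattern-matching it), normalizing away literals makes parameter-only variants collapse to one signature:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fingerprint is a toy normalizer: it replaces literals with placeholders so
// that queries differing only in parameters share one signature. A production
// fingerprinter would parse the SQL instead of using regular expressions.
func fingerprint(q string) string {
	q = strings.Join(strings.Fields(q), " ")                   // collapse whitespace
	q = regexp.MustCompile(`'[^']*'`).ReplaceAllString(q, "?") // string literals
	q = regexp.MustCompile(`\b\d+\b`).ReplaceAllString(q, "?") // numeric literals
	return strings.ToLower(q)
}

func main() {
	a := fingerprint("SELECT * FROM users WHERE id = 42")
	b := fingerprint("SELECT * FROM users  WHERE id = 7")
	fmt.Println(a == b, a) // true select * from users where id = ?
}
```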
There are also cool things that we support. You can use a key-value format in a query comment. I don't know if people realize you can add comments to SQL queries. We then parse those key-values and allow you to tag in Insights. You might find a slow query in Insights and see that it's coming from some worker, because it's tagged that way, and people even tag down to request IDs and things.
[0:42:01] Alex: Or even a method. Just be like, "Hey, it's this method. It's this request ID," all that stuff, and it's so much easier to find. Yeah.
[0:42:04] Sam: Yeah, we do exactly that. With the Marginalia gem, you can see the action and the view and the controller. It's great.
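A rough sketch of the tagging idea in Go (the key names and comment format here are illustrative, not a PlanetScale API): prepend a Marginalia-style key:value comment to the SQL, and a tool like Insights can surface it next to the query.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tag prepends a Marginalia-style key:value comment to a SQL statement, so a
// slow query can be traced back to the code path that issued it. The keys
// below (controller, request_id) are hypothetical examples.
func tag(query string, kv map[string]string) string {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic ordering keeps the comment stable

	parts := make([]string, 0, len(kv))
	for _, k := range keys {
		parts = append(parts, k+":"+kv[k])
	}
	return "/* " + strings.Join(parts, ",") + " */ " + query
}

func main() {
	q := tag("SELECT id FROM jobs WHERE state = ?", map[string]string{
		"controller": "worker",
		"request_id": "abc-123",
	})
	fmt.Println(q)
	// Output: /* controller:worker,request_id:abc-123 */ SELECT id FROM jobs WHERE state = ?
}
```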
From Vitess to PlanetScale
[0:42:10] Alex: That's amazing. Okay, this is a good time to switch to PlanetScale from pure Vitess. Just to be clear, if I come in, whether I'm using Hobby, or Scaler, or Scaler Pro, no matter what, I'm going to be using Vitess under the hood, even if I don't have my cluster sharded. Okay, how hard is it if I do want to switch up to sharding later? Now I'm at terabytes of data or something like that, and I want to shard. Is it pretty easy to just make this a sharded system now? Is that a button click? Is that a "Hey, we've got to call you"? What does that look like at the moment?
[0:42:44] Sam: At the moment, it's not a button click, but it's going to become one. Right now, the majority of customers that need sharding are quite large, so we're usually talking to them anyway. If they're migrating from a legacy architecture, I mean, some people are still moving out of data centers onto things like PlanetScale, we have to help them with some of the edge cases. We've got to the stage where it's pretty much scripted in terms of discovering what the best shard key would be for you by looking at your queries. Then we basically help you. Yeah, it's very cool.
[0:43:16] Alex: Insights again. Yeah.
[0:43:17] Sam: Yeah. We'll get to the point where it will be a button click. Right now, our bar for user experience is very, very high. Because it's, at the moment, a fairly personal process to go through, based on your application, we spend a bit of time with you figuring it out. A large amount of the proofs of concept we do with customers is just, we deliver them a sharding scheme: this is how your data will be bucketed. But it's not terrible. What's way worse is switching databases. Right now, if you're listening to this, and you've gone and picked a database that hasn't run at huge scale, you might be selling your business short, or you might have a complete database rewrite in your future. That's usually the path people go through, and it's hell. When you're doing that, you're spending a year not shipping. You're having outages. This is why we've done PlanetScale. I mean, that database switching operation is agony.
[0:44:32] Alex: Yeah. You mentioned earlier moving from RDS MySQL 5.7 to 8, or something like that, with zero downtime. Can you do other migrations very easily? I'm thinking especially non-MySQL migrations. Can you do a PostgreSQL to MySQL migration? Is that a more custom process that takes a little bit –
[0:44:51] Sam: It's custom. One of our biggest customers actually came from PostgreSQL. It's not crazily difficult if they're not using some of the edge-case features of PostgreSQL. The data import normally has to be done manually. Well, not manually, but by nibbling out the data and putting it into a PlanetScale database. If it's from a MySQL database, we can do it, like I said, fully online.
The online migration process from MySQL is really cool. Basically, we connect to the MySQL node. You give us the credentials to the MySQL server. It's all encrypted. We wouldn't be crazy enough to allow your data to go over the open internet without encryption. Basically, it's VStream again. For every other database provider you see, you go to their documentation and the migration process starts with: dump your database. We didn't want this for our users. Not everyone, especially a lot of developers that haven't spent time there, knows the tradeoffs of just dumping a database. They do not realize that you have to capture a point in time that you can replicate from.
It's a difficult process. It's just not easy to go and do. People want their import docs to look as easy and clean as possible, and they don't describe some of the dangerous tradeoffs that can be made by doing things this way. We didn't want that. We wanted it to be seamless and truly an easy migration. The way you migrate to PlanetScale is you connect it to, say, an RDS database. We nibble the data out. We never dump. There's no dumping of data involved in this process. This, again, is VStream, which can incrementally build replicas and copies. It just nibbles the data slowly out of your database until it's completely caught up and replicating.
Then we tell you, “Ta-da. PlanetScale is up to date with your previous infrastructure.” This is when it gets really good. We create you some credentials on PlanetScale, and you connect your application to those. You redeploy your app using these new credentials.
[0:47:12] Alex: New connection string. Yeah.
[0:47:13] Sam: What we're doing is proxying writes back to your RDS database. You're hitting PlanetScale. You're reading from us, but we're proxying writes back to your old database. That's step two. Then step three, you just say, it's time to fail over. What we will do is swap the roles. We will make that RDS database a replica of us. We will then, using our proxy, buffer and transfer your writes to the primary on PlanetScale. Within three steps, you have achieved a fully zero-downtime migration, without doing really anything besides giving us some credentials and changing your application's credentials.
[0:48:01] Alex: That's pretty amazing. You talked about the dump process and people not understanding it. Is that mostly a performance issue that they don't understand? Like, hey, if you go try to do a full dump on your production database, you're going to be really screwing it up? Or whatever issues are there with that?
[0:48:15] Sam: Well, yeah. Very often, you'll just lock your database. You have to make sure you capture the right checkpoint for replication, basically. You have to discover that you're in the right place in the binlogs, for example. If you're not, and you replay from there, you could lose data. You can have all of these catastrophic problems, or you can get to a point where it's just arduous and difficult. If the binlogs aren't fully turned on, or aren't retained long enough to replicate from, then you're going to miss certain transactions. It's really, really painful.
Cloud-prem delivery model
[0:48:52] Alex: Yeah, absolutely. Okay, earlier you talked about how you have some customers that are moving from on-prem. I also saw an article from last year with future.com talking about this cloud-prem delivery model. Tell me a little bit about that.
[0:49:12] Sam: There are multiple reasons cloud-prem is really cool. To take on-prem first: there are a lot of databases in data centers, and we don't want to provide an on-premise version of our product, for a myriad of reasons. Number one being, we want to provide the absolute best database service we can possibly provide. Without our engineers and people being able to remediate the infrastructure, that's not possible. Upgrades, for example: if you want the latest PlanetScale feature, you get it, because we roll it out. If you have an on-prem PlanetScale, you have to beg your system administrator to go and upgrade your on-prem PlanetScale. We just don't want to get into that world. We want the magic of a SaaS platform. You just log in, and it's gotten better.
I love to see the tweets where someone's like, "Oh, PlanetScale added this thing." It's like, great. Without any user having to mess around. That's one reason we aren't doing an on-prem product. It also makes you ship very slowly. If you're doing on-prem and cloud at the same time, it's really difficult to match them up: you're doing proper versioned releases here, and continuous releases there. It just becomes a mess and a bad experience for everybody. Then, we see a huge trend of lots of people having databases in the data center and wanting to migrate to the cloud.
There are very few one-hop ways to get into the cloud. What a lot of people do, and you'll be familiar with this, is they take a data center architecture, and they just scoop it up and drop it into the cloud. They're just like, "Oh, it's just other people's servers." It's just terrible. When you've got a database server next to you in a rack, or just two racks over, the latency difference and the availability profile are much, much different, right? When you're running RAID on machines that you can go and recover and fix problems on, it's very different to volumes that may never come back.
[0:51:11]Alex:Just gone.
[0:51:12] Sam: Or across a network that you don't control. Because Vitess is built so natively for the hostile environment of the cloud, and because it's compatible with MySQL, and we can just replicate from one to the other, you get this olive branch from the old world that helps pull them into the new world. A bridge from the old to the new. Whereas most people migrating huge amounts of data out of the data center are having to re-architect at the same time to meet the needs of the cloud, or they don't, and things just get worse and more expensive, and they probably should never have done it in the first place. That's why we have a lot of customers that move from the data center, because we give them a way of just replicating from one into a completely modern architecture by default. It's obviously very, very attractive to them.
We also have customers in regulated industries, or doing finance work that is regulated. But just generally, they would rather data be secured in their own account. What does that leave? We could either spend decades of just constant trust building to get to that stage, or we can provide a cloud-prem solution. What the cloud-prem solution means is you give us a sub-account in whichever cloud you use. Our backend gets provisioned there. You still use app.planetscale.com to create deploy requests, to delete a branch, or whatever you need to do, to look at your Insights, but the data lives inside your account.
The customers love it, because their security teams are like, yeah, it's all inside our cloud account. We can monitor access. We can see what's happening. Then the operators and the developers love it too, because they get a fully managed database service. It just feels like another part of the AWS ecosystem, but usable by everybody.
[0:51:53] Alex: Yeah. Do you think this cloud-prem idea is a particular moment in time, where you have these on-prem companies, and they either have the regulatory requirements that you're talking about, or maybe just the cultural desire to be able to see it and touch it in their AWS account? Or do you think in 30 years, cloud-prem is still going to be a thing that people want, where they want it in their account where they can see it?
[0:53:32] Sam: I think they're going to still want it. They also have massive commitments. Some of our customers have a billion dollars of commitment to Amazon, or whatever, right? This helps, right? The more they can put into their account, the better they can get in terms of discounting. Also, when this is possible, they're getting all the benefits of cloud, and it still runs inside their account for security, you know what I mean?
Yeah, I don't know. It'd be nice. But tech moves wonderfully quickly in some areas and incredibly slowly in others. It's going to take a long time to convince government agencies, or banks, to store their data in just anyone's cloud. Quite honestly, I would rather not have my data just shoved in some random database startup's infrastructure. There's a lot of them out there. We go very heavily on trust. We have all of the certifications, and we make trusting us really easy with all of these various layers and primitives.
A lot of the features that larger customers experience, you don't see us tweeting about. We have a lot of features around security and securing people's data. Everything is encrypted at rest and in transit. You can't connect to a PlanetScale database without a key. Everything is incredibly tailored towards security. It's one of our number one jobs.
[0:55:10] Alex: When you're running in that cloud-prem model, are you running your own Kubernetes cluster? Will you do it on their Kubernetes cluster if they have one?
[0:55:15] Sam: We want the account to stay isolated, so we don't run on their Kubernetes. We usually run on EKS, or whichever cloud's equivalent primitive is there. It's isolated within that account and orchestrated. They're not really supposed to touch it. It's not theirs to try and administrate. The operator itself handles that. It still remains highly available, with all of the great things about the self-healing nature of Vitess, but the residency of that data is with them.
[0:55:54] Alex: Yeah. What about, I guess, multi-tenancy generally for PlanetScale? Everyone gets their own Vitess cluster. Are those running on shared Kubernetes clusters, or things within your own infrastructure?
[0:56:08] Sam: Yeah, it depends which. They're all obviously very isolated. The chance is next to zero that people can arbitrarily run code, or break out of the container isolation. If you just sign up, and you run Scaler, or Scaler Pro, you're going to be defaulted into multi-tenancy, but again, exceptionally isolated. If you talk to sales, you can have single tenancy, where we spin up an entire stack just for you. Then obviously, our managed customers that are using the cloud-prem architecture, that is also single-tenant.
Pricing of infrastructure services
[0:56:50] Alex: Okay, let's switch to pricing a bit. I know you have the Scaler model that's based on rows read and written, like a DynamoDB, or FaunaDB type model there. Then now, more recently, you have Scaler Pro, which is more like, I'd say, the traditional, here's how much compute and RAM you have. Was this a customer ask, like people didn't want to make that leap yet? Or how did this come about?
[0:57:18] Sam: The serverless pricing is very exciting to a certain generation of engineer and horrifying to others. We went out there with serverless pricing for a number of reasons. It's where modern generations are going, right? The serverless crew, they need a database that can scale to crazy heights, unlimited, all the things we do. However, we found that when you went to our pricing page and you were coming from a more traditional architecture, say running on RDS, you'd look at that pricing page and be like, "Oh, there's nothing here for me." Which is not true. There's lots there.
We wanted to communicate that if you look at your RDS database and it has this many resources, you make this many resources on PlanetScale, and it will be fine, and this is roughly how much it would cost. It doesn't get talked about much in serverless, but truly, businesses don't buy that way. Pretty much no business buys that way. They have a budget. You budget for your infrastructure at the beginning of the year. If you're paying per request, budgeting is so difficult, and think of the pressure that puts back on the engineering team. It's like, well, what do you need to ship this thing? Well, I can't plan this feature's query output ahead of time, right? People want to set a budget, buy that amount, and then that's it. That's doable and much easier when you're doing resource-based pricing, which is what Scaler Pro is. Actually, we've been really surprised: it's only been out for three weeks, and it's mind-blowing to me how many people have bought Scaler Pro databases.
[0:59:13] Alex: Yeah. Do you think we'll move more towards the usage model, or do you think it's always just going to be the case that, hey, people want some predictability, especially around these large infrastructure costs, like your database?
[0:59:22] Sam: I think we may get to usage-based fully everywhere, but not for a long time. It's about where the abstraction layers are, right? Yeah. I mean, this new generation of developer, if you've been a developer for two or three years, and this is not pejorative, a lot of people say these things negatively and I don't think they're negative, I think it's progress that this is the case. People have never racked servers. They don't think in CPUs. Really, what does a vCPU mean when you're buying a managed service? It doesn't always mean something to everybody.
I think as that generation grows and matures and becomes leaders themselves in software, yeah, we'll see this proliferation of usage-based. We'll see tooling come alongside to help people predict their usage. But there are all of these edge cases. Auto scaling is almost mythological in some ways. It's not truly real in the way that people believe. You can't paper over that with pricing and that dream of, "Oh, my infrastructure can burst to 5 million QPS and back to zero." It is not real. No software has achieved that.
The serverless pricing hints at that being possible. Maybe as services get more and more abstracted and talk less about the underlying infrastructure being servers, things will get more usage-based. Right now, resources are just a sensible abstraction that people remember and understand.
[01:01:01]Alex: Yeah. Is there an architectural difference behind the scenes when someone signs up for Scaler versus Scaler Pro? Do you have to set up that Vitess cluster differently?
Sam: No. It's just how you're paying us.
Alex: It's just the billing abstraction. Yeah.
Sam: We can help you figure it out. Most people figure out on their own which is cheaper, right? Some people have constant, randomly bursty workloads. It's good for them to just provision a bunch of hardware that's way above that burst. Other people have workloads that don't burst as much, that are lower traffic, or are just very consistent and predictable, and it might just be cheaper to pay the Scaler way.
Alex: Can I switch my database from one to the other?
Sam: Yup. Yeah, and we've even seen people switching, right? Just to see how they're billed. I guess they're checking how their bill shakes out if they just pay for the hardware to sit around. It will save some people money. It will be more expensive for others. We've proactively worked with customers to make sure they're on the cheapest and fairest version.
PlanetScale as a high-performance culture
[1:02:01]Alex: Yeah, sure. Okay, I want to switch gears a little bit into some, I don't know, softer stuff, a little less tech. You've written on the importance of maintaining a high-performance culture. I think it's hard to understand the culture of a company from the outside, but you can look for clues on whether they're high performance or not. Some good clues from PlanetScale would just be the cadence of shipping major features, like Boost and Insights, things like that. I would also say, just from what I can see on the outside, really, really good reliability. You don't see the Hacker News stories about PlanetScale that you do about some other providers.
I would also say a really strong and delightful design aesthetic, around your site and your social stuff, all of that. Then just really good educational content, whether that's regular blog posts or Aaron Francis's course; all of that, I would say, is a really good sign. How did you get that, or maintain that, and what principles do you have there for other people looking to do something similar?
[1:03:10]Sam: If you want a really high-performing culture, it has to be an active choice from the top, because no one else can push it from inside your organization. It can't just be a nice-to-have. It has to be a non-negotiable part of the culture. Every company has norms people align around, how they speak to each other, how they work. Every company also has a talent bar and a performance bar. You just have to set that, and it's completely arbitrary how you set it. Literally, it comes down to what you're willing to tolerate or enforce. You set it high and you try not to deviate from it at any point.
Every company I've ever seen that's gone off the rails has done so because they lowered the talent bar. Average people with too much time can just run amok around an organization and ruin it. Everyone's worked on a team with that lazy person who undermines you. You don't then follow your passion and work as hard as you wish you could, or want to, because it's really hard to know that someone's going to just coast along and share in the win.
One of our company values is "no passengers." We mean that. We're here to build a software company and a business and be wildly successful, and we want everyone who works at PlanetScale to be wildly successful. That means you can't tolerate passengers and people sitting around. You have to be very careful. It has to be part of the culture. We get this feedback routinely from employees. They say, "It just feels good knowing that we will really try and address any problems that come up, and we don't tolerate messing around." It's not what we're here to do. It doesn't serve our users. You talked about our reliability; we take things like that extremely seriously. To do that, you have to have exceptional people. We work really hard at retaining exceptional people.
[1:05:17]Alex: Is maintaining that talent bar hard, or does the fact that you're working at enormous scale on hard technical problems make it easier than other companies have it?
[1:05:31]Sam: It's hard at the start. These types of problems usually interest smart people. They gravitate to difficult infrastructure problems, and databases are very, very difficult. They're extra difficult when you try to do what you were very complimentary about: our taste and design and our style. Most database companies do the backend piece. I'm not diminishing how difficult that is. But then they pat themselves on the back and don't go far enough with the rest.
We do both. You have to have that rock-solid foundation, and it has to be delivered beautifully, and so it makes the challenge even harder. People love that challenge. It's tough. It's fun, so that attracts good people. When good people work somewhere and they enjoy it, they tell other good people, who come along as well. Every time we open a role, we just get an incredible number of people applying to work here, because you get that reputation.
In the long run, I think it's the laziest and best way to run a company. What could be easier than just a small group of incredibly talented people? I see companies that have nowhere near the customer base we have with engineering teams four or five times the size of ours. They must just be a flabby wasteland of nonsense to have that many people. What are they doing?
[1:06:58]Alex:Yeah, yeah. How many people work for PlanetScale now?
[1:07:00]Sam:We have 85 people at the whole company.
[1:07:02]Alex: Under a hundred. Given some of the workloads you're running and just the shipping cadence, that's pretty incredible.
[1:07:13]Sam: Most people want to run giant, big organizations and puff their chest up. I am more proud of every dollar we make per head, right? Having the best ratio of dollars per head of any company in the world would be my mission. It's just the best way of doing it. You produce better products. You produce better experiences. Yeah, the company will grow and will continue to grow, but very, very deliberately. At the end of the day, people have to judge us by our service and the quality of the work that we do. I think the easiest way to get that done is to have fewer people, fewer heads in the room.
[1:07:56]Alex:Yeah. You've published some notes on how you do management at PlanetScale. You've also mentioned that managers ruin companies. How does management ruin companies? Is it by pulling in average people, or tolerating average results? Or what does that look like?
[1:08:15]Sam: A manager can wage a war of attrition against a company, no matter what its size. They're normally not busy enough. They're normally not experts, craft experts, at what they do, so they don't understand the work. They try and measure everything by this notion of "impact." I know people are going to be like, "What? It is about impact." Sure, work is definitely about impact. But another one of our company values is "experts leading experts": you have to be an expert in what you do. All of our managers are outstanding. The reason they are managers is because they love the work their team does, and they know that by being a really good manager, more of that work gets done. It's just a proxy for doing more of the really cool thing.
When you get non-practitioner managers, and they're just professional management, the manager class, they just do management. Your org goes into disarray, because they can't produce the work. Then they have to control the people who do produce the work. They have to make up arbitrary evaluation criteria for those people, which leaves those people feeling disenfranchised. If you work for someone who doesn't truly understand how you reasoned about something, or the tradeoffs you made, you just get a little pat on the head and then get asked what impact you had.
I've seen some incredible engineers at companies get completely railroaded, because their work couldn't be quantified by their boss. That's tragic. You just see a lot of companies completely fail. I always upset people when I share my opinions on management on Twitter. People get really grumpy. Also, a lot of people agree. Yeah, I have my opinion.
Sam's talents + career evolution
[1:10:08]Alex: Yeah, yeah. When you talk about experts leading experts, as you've continued to climb the executive ladder, like VP of Engineering, and then chief product officer, chief executive, how do you still stay sharp and technical? Do you do some hands-on stuff, even if it's in your spare time? Do you keep up with design docs? Or do you just know your spots more, where you aren't an expert leading experts anymore?
Sam: Yeah. I would hate to horrify our engineering team by submitting any of my work to them. I try and acknowledge the things I don't know, right? I do not spend as much time in the weeds, technically, as I could at PlanetScale, because I have other things I should do, right? We're here to build a business, and there are unique things about my role that I should do. But I keep a technical appreciation. I love technology. I still code in my free time for little personal projects, or build silly things, or websites; automation for my home is one thing I mess around with. I still write code at least once a week.
You've got to be careful, though. There's another profile, the "I was technical once" style of leader, who says, "I was an engineer once," and tries to write code for their team. It just becomes embarrassing. I try not to do that. I also just hold technical conversations. Our VP of Engineering is outstanding at communicating the changes we're making, so you can just track along. Once you understand this stuff, you track along with the changes. You understand the tradeoffs.
The other part of the role I have to play in the organization is understanding the very technical things we have to do for a database, and then translating that into design and visual design and marketing and brand. Because I'm always in the middle of that, I very much appreciate brand. I very much appreciate our marketing and beautiful design. Again, that only comes from it being cared about at the top, because otherwise, at very technical companies, you can tell that their designers are getting steamrolled. You have to hold all of those things in balance and appreciate all of them, which means you think about them a lot and you learn about them a lot.
[1:11:15]Alex: Yeah. Is that something that's always come naturally? I agree, PlanetScale has really good design for an infrastructure company. If you told me, "Hey, a former MySQL DBA is their CEO," I just wouldn't expect great design, no offense to you. But you really do have a great design aesthetic. Is that a talent you've picked up over the years, or are you just able to find great designers who can pull that out? Because I've worked at some technical companies that did not have that feeling.
[1:11:49]Sam: I've always appreciated design and style, ever since I was young. I did best at art in school. I've loved art. My house is absolutely teeming with art. I've always loved and appreciated aesthetics and style and fashion, even though nowadays all I wear is black t-shirts and a database hat. I've always had a strong –
[1:12:14]Alex:Even that is great fashion, right?
[1:12:15]Sam: Yeah, maybe, maybe. I've just always appreciated – I love bags, for example. I have a ridiculous collection of bags and shoes and all of these various things. I just love aesthetic things, things like the iPhone, which is a technical masterpiece presented so simply. I've got this obsession with power and simplicity, and how hard it is to present incredibly powerful things simply. It's always fascinated me. Then GitHub fully solidified this for me: it is possible to build a thriving and exciting company while caring about aesthetics. The early engineers and designers of GitHub had the most perfect taste.
I don't think I could quantify what that magic was, because you don't have to when you have taste. Seeing the lengths that we went to to not ship things that were ugly or shoddy was truly inspiring. We carried that over to PlanetScale, and it's something we all really care about. It's also, again, why you have to be careful with the talent bar, because you can't democratize taste. That's why arguing with people on the internet is pointless; they just don't see it the way you see it, right? You can hold it in front of their faces, like, is this ugly or bad, yes or no, and some people just won't see it.
I mean, I'm just very proud of PlanetScale. We can take you to petabytes of data and you'll never have to write YAML. The things people put their users through are abysmal. We just try really hard not to do that. Over the long run, that really pays off. It's put MySQL in places it historically never would have been.
Well, first of all, I see our website, very often, in these lists of the top 10 best websites, or best technical websites. It's Linear, us, Notion, Stripe, GitHub, and you're just immensely proud. We have a technology that started off as being most relevant to enterprises with giant scaling problems. The fact that we've now got hundreds of thousands of developers using the product for the most hard-core projects is because we delivered incredible developer experience and taste and style. That's all changed. It's how things are done nowadays.
Talking about competitors
[1:16:02]Alex: Yeah, yeah. I love it. That's super inspiring. Cool. I want to close off with a fun segment. I like your Twitter persona. It's very blunt and to the point, and it's been that way on this podcast as well. What I want to do is name a competitor, and I want you to say one nice thing about each one as I say it. Let's start off with an easy one. Say one nice thing about PostgreSQL.
Sam: Incredible community.
[1:16:37]Alex: We talked about this a little bit before we got on, but why do we see so many more managed PostgreSQL providers, or forks, or PostgreSQL-compatible things compared to MySQL, even though MySQL runs so much more of the internet, ranks higher in the DB-Engines rankings, and things like that?
[1:16:54]Sam: Well, I think it's because people want to consolidate. They want to consolidate around a protocol. You're seeing a lot of PostgreSQL-compatible databases; I think that's the difference. There are very few that are running pure hosted community PostgreSQL, like Supabase does, right? A lot of it is new-generation databases thinking, well, let's go where it's most applicable to have the compatibility, and they mostly choose PostgreSQL. They'll then implement the parser, or they'll implement the wire protocol.
The thing is, it's heavily fractured, and it makes choice really, really difficult. I think that's probably something to be concerned about in the long run for the community, because the PostgreSQL community is outstanding. If it's getting torn in millions of directions, because every company cynically wants to use the protocol because it's popular, I don't know what that makes PostgreSQL in a decade's time, right? Especially as we shift into the cloud. I think it's very interesting. They have a great community. The plugin community is fantastic. They've just made tradeoffs that I wouldn't make building a database.
[1:18:11]Alex: Yeah. Okay. All right, next one. You have to say something nice about Amazon RDS.
[1:18:13]Sam:The payments always work.
[1:18:16]Alex:What about Amazon Aurora?
[1:18:25]Sam:Good swing at a fundamentally bad architecture. That wasn't a compliment. Sorry. It's a good swing.
[1:18:31]Alex: Good swing. Yeah.
[1:18:33]Sam:At a certain level, I think it's got a number of very nice features. When you get to the upper limits, it's horrible though.
[1:18:43]Alex:Can you tell briefly what's the issue there?
[1:18:45]Sam: Well, that architecture, the shared-everything storage across the file system, is great when you want an instant read replica. But when you want that 16th instant read replica, you're in agony, because it doesn't scale past 16 nodes, or whatever. Or when you want to scale writers: it's a single writer. Or when you want the latest version of MySQL: it takes forever for them to update it, because they're not running stock MySQL. You know what I mean? It's these little things.
They do a lot for you. There's a lot of really cool stuff, but the edge cases are brutal. We won a customer that found out the wrong way that Aurora cross-region failover was one-way. The documentation buries that, you know what I mean? It's like, does Aurora have multi-region? Yes. Everyone assumes, surely that would be two-way, right? Then they find out it's not. It's all these weird edge cases you get with shared file systems like that. It's not like we don't have edge cases; we just try and communicate them a bit more loudly.
[1:19:41]Alex: Yeah. Yeah, sure thing. Okay, another shared-everything infrastructure, or at least shared storage: Neon, which is serverless PostgreSQL.
[1:19:51]Sam:Again, I think it's an interesting swing at an approach I wouldn't take.
[1:20:00]Alex:What about, last one, the edge-based SQLite databases? Turso, LiteFS, Fly.
[1:20:07]Sam: I think these are really exciting. Again, I wouldn't use them for a big, serious application, but there are a lot of smaller applications where people should just be using SQLite. I think it could be really interesting. There's a lot of database abuse that happens, people wedging models into poor traditional databases, with all the constraints of high availability and data consistency, all these things they probably don't need. They could just use SQLite.
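As a rough illustration of that point (a minimal sketch, not anything Turso, LiteFS, or Fly ships), an embedded SQLite database in Python's standard library is a single local file with no server to provision. The file and table names here are made up.

```python
# A minimal sketch of the "just use SQLite" point: the whole database is one
# local file plus a standard-library import, with no server, connection pool,
# or replication to operate. Illustrative only; names are hypothetical.
import sqlite3

conn = sqlite3.connect("app.db")  # the entire database lives in this file
conn.execute(
    """CREATE TABLE IF NOT EXISTS notes (
           id   INTEGER PRIMARY KEY,
           body TEXT NOT NULL
       )"""
)
conn.execute("INSERT INTO notes (body) VALUES (?)", ("hello from the edge",))
conn.commit()

for row in conn.execute("SELECT id, body FROM notes"):
    print(row)

conn.close()
```

For a small app with modest write concurrency, that is the entire operational footprint, which is the contrast Sam is drawing with heavyweight traditional databases.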
[1:20:38]Alex:Yeah, yeah. Absolutely true. Well, Sam, I appreciate you coming on. It has been a great conversation. If people want to find out more about PlanetScale, about you, where can they find you?
[1:20:49]Sam:@isamlambert on Twitter, or planetscale.com's blog is probably the best way to see what I have to say.
[1:20:58]Alex: Awesome. Yeah, I highly recommend a follow for Sam on Twitter, if you want some hot takes and real talk on databases, and also to see the cool stuff that comes out of PlanetScale.
[1:21:11]Sam: Thank you.
[1:21:12]Alex:Sam, thanks for joining today. I really, really enjoyed the conversation.
[1:21:15]Sam:Thanks, Alex. Thanks for having me.