Like others, Yugabyte is a database company building high-performance distributed databases to support large, geographically distributed cloud workloads. However, Yugabyte didn’t start entirely from scratch. At the core of its code is PostgreSQL, an open source database with a history spanning several decades. But PostgreSQL was originally designed to run on a single computer, so Yugabyte’s team re-engineered it.
VentureBeat sat down with CTO and cofounder Karthik Ranganathan to understand what the company borrowed and what its team built to create the tool. Ranganathan was closely involved in the first wave of modern NoSQL development as an engineering lead at Facebook.
This interview has been edited for length and clarity.
VentureBeat: In effect, you are building a massively replicated and sharded version of PostgreSQL. Why Postgres?
Karthik Ranganathan: We see that Postgres is actually the fastest-growing database. That’s happening for a number of reasons, but I’ll focus on just three. The No. 1 reason is that it is completely open. The community is very transparent about the roadmap and features. The second reason is that it is a truly open source database that any modern cloud company can pick up and run without having to worry about paying for Oracle, DB2, or SQL Server. And the No. 3 reason is that it is a fairly feature-rich open source database. It has features that really match other databases like Oracle, DB2, or SQL Server.
VentureBeat: So how did Yugabyte set about changing that?
Ranganathan: Modernizing apps is really easy. That’s why we chose Postgres. We are also completely open source. We reuse the upper half of Postgres completely, so we are almost Postgres-compatible by default, in the sense that if you have an app running on Postgres, it just runs. But you need to figure out how to run it well on a distributed substrate. So the message we’re trying to get across is that if people choose Postgres to run apps in the cloud, we’ve done the work to run Postgres in the cloud. If you expect to grow an application in the cloud, or you have high availability requirements or need built-in replication of your data, these are things we can take care of exceptionally well.
VentureBeat: I remember, 20 to 30 years ago, Postgres and MySQL were both leaders. But MySQL really took off and became the foundation of the LAMP stack, which spread widely. Then it seems that in recent years, Postgres has come into its own and begun to generate more interest and more excitement. Why do you think that is?
Ranganathan: First, 30 years ago, open source [databases were not well understood]. If you told people, “Hey, here’s an open source database,” they’d say, “Okay? What does that mean? What does it really mean? And why should I get excited?” And so on. I remember because at Facebook I was part of the team that created an open source database called Cassandra, and we had no idea what would happen. We thought, “Well, here’s what we’re putting out in open source, and let’s see what happens.” And this was 2007.
Back in the day, it was important to use restrictive licenses such as the GPL to encourage people to contribute, and not just take from open source without ever giving back. That’s why so many projects ended up with licenses like the GPL.
Now, MySQL did a really good job of latching onto web workloads. Those were initially tier-two workloads. They weren’t too complicated at first, but over time they became very complicated, and the MySQL community got really well organized, which gave it momentum.
But over time, as you know, open source became key, and most pieces of infrastructure started becoming open source. The more open, the better, right? And [fewer] restrictions mean that anyone can shape the roadmap and anyone can contribute to it. If a big company wants an improvement and no one has the time, it can invest in building a team around it. All of this is made much easier with a very transparent and open community.
Postgres is actually having its day in the sun because of that, but also because Postgres has an incredibly strong set of features. When you compare it with Oracle, SQL Server, and DB2, with things like triggers, stored procedures, and partial indexes, a lot of complex features are built into it. Meanwhile, people are moving off these existing databases, which mostly run on premises. If you want to run those applications in the cloud, you need to find a similar database that can support them. And that just happened to be Postgres. If you associate the rise of MySQL with the rise of the LAMP stack, you can associate the rise of PostgreSQL with the rise of the cloud movement.
VentureBeat: You mentioned that at the upper level, you are completely Postgres-compatible. Does that mean you replaced the storage engine at the bottom?
Ranganathan: It’s more than that, really. We replaced the storage engine, among other things, and we also made the database fully replicated and highly available. So node failures are not really an issue.
You can break the upper half of Postgres down into the pieces that receive the query, do security checks, and work out how the query should be executed. And then, you know, go ahead and execute it. We have retained all of that.
What we changed is not just the storage engine. It is also the replication engine. Your data may be sitting on one node or spread across a bunch of other nodes, right? So a node doesn’t just need to understand that the data lives in a separate storage engine. It also needs to know where the different pieces of the data are located. The second bit is that now that your data is replicated, if a node fails you want some other node to take over immediately. So you need to know how to fail over to the right node. It’s almost a dynamic routing problem. And the third bit is around the system catalog, which is where the set of tables you created is stored. In Postgres, it is simply stored as a set of files. We really need to replicate it and make it highly available too.
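To make the first two problems concrete (finding where a key lives, and rerouting when a node fails), here is a minimal Python sketch. It assumes simple hash partitioning into a fixed number of ranges with replicas placed round-robin; the names `Tablet`, `locate`, and `route`, the 16-bit hash space, and the MD5-based hash are illustrative assumptions, not YugabyteDB’s internals. It also skips real leader election, which distributed databases typically handle with a consensus protocol such as Raft.

```python
import hashlib
from dataclasses import dataclass

# Assumption: a 16-bit hash space split into contiguous ranges ("tablets").
HASH_SPACE = 65536


@dataclass
class Tablet:
    start: int       # inclusive lower bound of the hash range
    end: int         # exclusive upper bound of the hash range
    replicas: list   # nodes holding a copy; index 0 is treated as the leader


def key_hash(key: str) -> int:
    # Stand-in hash (first two bytes of MD5), not a real partitioning function.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")


def split_tablets(count: int, nodes: list, rf: int = 3) -> list:
    # Divide the hash space evenly and place rf replicas round-robin across nodes.
    size = HASH_SPACE // count
    tablets = []
    for i in range(count):
        end = HASH_SPACE if i == count - 1 else (i + 1) * size
        replicas = [nodes[(i + j) % len(nodes)] for j in range(rf)]
        tablets.append(Tablet(i * size, end, replicas))
    return tablets


def locate(tablets: list, key: str) -> Tablet:
    # Problem 1: find which tablet (and therefore which nodes) own the key.
    h = key_hash(key)
    for tablet in tablets:
        if tablet.start <= h < tablet.end:
            return tablet
    raise KeyError(key)


def route(tablets: list, key: str, down: set) -> str:
    # Problem 2: skip failed nodes and send the request to the next live replica.
    for node in locate(tablets, key).replicas:
        if node not in down:
            return node
    raise RuntimeError(f"no live replica for {key!r}")


# Usage: route a key normally, then again with its current leader marked as down.
nodes = ["node-a", "node-b", "node-c"]
tablets = split_tablets(4, nodes)
leader = route(tablets, "user:42", down=set())
fallback = route(tablets, "user:42", down={leader})
```

Picking “the first live replica” is the simplest possible failover rule; it shows why every node needs location metadata, but it ignores the harder question of agreeing on a single new leader.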
And finally, we faced the problem of keeping schema changes consistent across nodes. When you create a table on machine No. 1, machine No. 2 should recognize it immediately. You cannot have a lag where machine No. 2 says the table is not there, or where an ALTER TABLE fails. We had to do all this kind of work when we changed the bottom layer.
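One way to get this behavior is to version the catalog: every DDL change bumps a version number, and each node compares its cached metadata against the latest version before using it. The sketch below is a hypothetical illustration of that idea; `CatalogMaster` and `QueryNode` are invented names, and a single in-process object stands in for what would really be a replicated catalog reached over the network.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogMaster:
    # Hypothetical authoritative catalog; in a real cluster it would itself be replicated.
    version: int = 0
    tables: dict = field(default_factory=dict)


@dataclass
class QueryNode:
    master: CatalogMaster
    cached_version: int = -1
    cached_tables: dict = field(default_factory=dict)

    def create_table(self, name: str, columns: list) -> None:
        # DDL goes to the shared catalog, and every change bumps the version.
        self.master.tables[name] = list(columns)
        self.master.version += 1

    def alter_add_column(self, name: str, column: str) -> None:
        self._refresh_if_stale()
        if name not in self.cached_tables:
            raise KeyError(f"table {name!r} does not exist")
        self.master.tables[name].append(column)
        self.master.version += 1

    def _refresh_if_stale(self) -> None:
        # Compare the cached version with the catalog's and reload metadata if behind.
        if self.cached_version != self.master.version:
            self.cached_tables = {t: list(c) for t, c in self.master.tables.items()}
            self.cached_version = self.master.version

    def columns(self, table: str) -> list:
        self._refresh_if_stale()
        if table not in self.cached_tables:
            raise KeyError(f"table {table!r} does not exist")
        return self.cached_tables[table]
```

With two `QueryNode` objects sharing one `CatalogMaster`, a table created through the first is visible to the second on its very next lookup, because the version mismatch forces a reload instead of serving stale metadata.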
VentureBeat: When I look at a lot of your literature, you push YugabyteDB as a SQL database. But you also have a NoSQL API. How does that work? Is the NoSQL API just a layer that translates into SQL underneath? Or are they independent?
Ranganathan: They sit side by side. That’s another major piece of IP for us. Half of our core team has a database background from Oracle, and the other half comes from Facebook, where we actually built some of the first NoSQL databases, including Cassandra. I think our “aha!” moment, after building both sides, was that it is possible to build a storage engine where the data format is the same, while the query format in which you access the data can be independent.
Our goal is to make it easy to build cloud-native applications. Naturally, we don’t want to take sides. We don’t want to say, “Look, we’re just SQL. All you NoSQL [folks] are doing it wrong. You need to move to SQL.” That message never works.
We said there is real benefit to doing both. There are some things NoSQL does really well. So we said that to build a complete database, we have to fully hybridize both sides. Taking the SQL API and bringing all the NoSQL-isms into it will take a very long time. That is going to be a years-long effort.
Let me give you a simple example. A SQL client driver, such as the JDBC driver, is aware of only one node: you say, “Connect to this node,” and it does everything through that node. A NoSQL client is a smart client: once you connect to a node, it finds all the other nodes. It detects the nodes you add or remove. It learns the locations of these various nodes, so it can say, “Look, this one is in U.S. West. That one is in U.S. Central. This one is in U.S. East. I’m only going to read from U.S. West.” You can do all sorts of really powerful things with a NoSQL client.
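The difference between the two kinds of clients can be sketched in a few lines. The toy client below learns cluster membership from whichever node it contacts, refreshes that list to pick up added or removed nodes, and prefers nodes in its own region for reads. `SmartClient`, `Node`, and the `fetch_membership` callback are hypothetical; real Cassandra-style drivers expose this behavior through their own load-balancing policies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Node:
    host: str
    region: str


class SmartClient:
    """Toy topology-aware client: discovers the cluster and prefers local-region nodes."""

    def __init__(self, seed: Node, fetch_membership: Callable[[Node], Iterable[Node]],
                 local_region: str):
        # fetch_membership is a hypothetical hook: ask a node for the current member list.
        self._fetch = fetch_membership
        self.local_region = local_region
        self.nodes = set(fetch_membership(seed))

    def refresh(self) -> None:
        # Ask any known node for the membership list to pick up added or removed nodes.
        contact = next(iter(self.nodes))
        self.nodes = set(self._fetch(contact))

    def read_targets(self) -> list:
        # Prefer nodes in the client's own region; fall back to every known node.
        local = sorted(n.host for n in self.nodes if n.region == self.local_region)
        return local or sorted(n.host for n in self.nodes)
```

A single-node SQL driver, by contrast, would only ever know its seed host, which is why adding this kind of awareness on the SQL side requires driver-level changes.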
Now it’s hard to hybridize these two, because you need driver-level changes on the SQL side, and drivers are a key part of any database. That’s hard for one company to do while playing catch-up. So we took an alternative approach, where we offer multiple APIs on top of the database. We built an extensible query layer that goes beyond the Postgres query layer. Of course, Postgres is the main API we have, but we also support an Apache Cassandra-compatible API. It is a completely different API, but the data is stored in the same storage engine. The replication methods are similar, but the access patterns have been optimized for NoSQL.
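Here is a minimal sketch of the “multiple APIs, one storage layer” design. A plain dict stands in for the shared storage engine, and two hypothetical front ends, `SqlApi` and `CqlLikeApi`, encode their data into it with different key layouts. The key encodings are invented for illustration. Each API writes under its own key prefix, so the two kinds of tables share storage without being able to read each other’s data.

```python
class SharedStorage:
    """A dict standing in for one storage engine that both APIs write to."""

    def __init__(self):
        self._data = {}

    def put(self, key: tuple, value) -> None:
        self._data[key] = value

    def get(self, key: tuple):
        return self._data.get(key)

    def scan(self, prefix: tuple) -> list:
        # Return every (key, value) pair whose key starts with the given prefix.
        return [(k, v) for k, v in self._data.items() if k[:len(prefix)] == prefix]


class SqlApi:
    """Hypothetical relational front end: one entry per (table, primary key, column)."""

    def __init__(self, storage: SharedStorage):
        self.storage = storage

    def insert(self, table: str, pk, row: dict) -> None:
        for column, value in row.items():
            self.storage.put(("sql", table, pk, column), value)

    def select(self, table: str, pk):
        cells = self.storage.scan(("sql", table, pk))
        return {key[3]: value for key, value in cells} or None


class CqlLikeApi:
    """Hypothetical wide-column front end: one value per (table, partition key)."""

    def __init__(self, storage: SharedStorage):
        self.storage = storage

    def put(self, table: str, partition_key, value) -> None:
        self.storage.put(("cql", table, partition_key), value)

    def get(self, table: str, partition_key):
        return self.storage.get(("cql", table, partition_key))
```

Keeping the API-specific logic in the key encoding means anything built into the storage layer is written once and serves both front ends.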
VentureBeat: Does that mean I can run a SQL query, select from a specific table, and have it find the right columns, and then turn around and query that same table as if it were Cassandra?
Ranganathan: Not the same table. You can have a SQL table sitting right next to a NoSQL table, and both of them are practically compatible. Replication, encryption, and the rest are all taken care of for you. But not on the same table.
Our aim is to serve microservices that need either tremendous scale and distribution, or great scale along with a tremendous amount of correctness. We can go both ways. But the reality is that your app will fit neatly into one or the other: either SQL or NoSQL.
VentureBeat: You talked about practical compatibility. How can you maintain it across two different styles of tables, with the Cassandra-style Yugabyte Cloud Query Language (YCQL) on one side and SQL on the other?
Ranganathan: Tables can be multi-row transactional or single-row. On the NoSQL side, you can choose whether to do multi-row or multi-table transactions. We’re adding to that world: you can have indexes, and those are net-new things we bring to it. But on the SQL side, all tables are transactional by default, to the highest degree. You can’t really opt out of that on the SQL side.
These two kinds of tables are silos with their own respective APIs. But you can use Postgres foreign data wrappers to connect them, and you can do interesting things with that. For example, you could declare a foreign table on the Postgres side to say, “Look, this is an external table that you can access.” You can do things like that. But other than that, you can’t cross-access the data, because we want to build best-of-breed on both sides.
VentureBeat: PostgreSQL has many extensions, like geographic information system (GIS) tools. Do those work with YugabyteDB?
Ranganathan: They do. At least at the query layer, all extensions work. Not the ones that hit the Postgres storage layer, because we replaced the storage engine. So geographic information works, but we’re still building the GiST index. You can run your queries, but they will not be efficient today because we do not have GiST index support. That’s more of a lower-half thing, right? We have to organize the data according to the GIS functions, but once we do that, it will work beautifully. The upper half is already functional.
VentureBeat: Do you see people using one API more than the other?
Ranganathan: Postgres has caught fire. It’s not even close. The YCQL side [NoSQL side] is big, but the sheer amount of usage, the number of apps, and the number of people using the Postgres side are just incredible. It’s just amazing.