

Was it just a few years ago that a terabyte counted as a huge dataset? Now that every random gadget on the Internet of Things is “phoning home” a few hundred bytes at a time and every website wants to track everything we do, terabytes no longer seem like the right unit. Log files keep getting bigger, and the best way to improve performance is to study these endless records of every event.

Rockset is one company tackling this problem. It is dedicated to bringing real-time analytics to the stack so that companies can absorb all the information from event streams as they happen. The company’s service is built on top of RocksDB, an open source key-value database designed for low-latency ingestion. Rockset has tuned it to handle the endless stream of bits that modern, interaction-heavy websites must watch and understand to ensure they are performing properly.

VentureBeat sat down with Venkat Venkataramani, CEO of Rockset, to talk about the technical challenges involved in building this solution. His approach to data was largely forged in engineering leadership roles at Facebook, where a huge number of data management innovations occurred. In the conversation, we focused specifically on the database that sits at the center of the Rockset stack.

VentureBeat: When I look at your webpage, I don’t really see the word “database” very often. There are words like “querying” and other verbs that you usually associate with databases. Does Rockset think of itself as a database?

Venkat Venkataramani: Yes, we are a database built for real-time analytics in the cloud. There was only one type of database when databases came into existence in the 1980s. It was the relational database, and it was used only for transaction processing.

About 20 years later, companies had enough data that they wanted more powerful analytics to run their businesses better. So the data warehouse and the data lake were born. Now fast-forward another 20 years from there. Every year, every enterprise generates more data than Google had to index in 2000. Every enterprise is now sitting on a lot of data, and they need real-time insight to build better products. Their end users are demanding interactive, real-time analytics. They need business operations to iterate in real time. And that is what I would consider our focus. We call ourselves a real-time analytics database or a real-time indexing database, essentially a database built from scratch in the cloud to power real-time analytics.

VentureBeat: What’s the difference between traditional transactional processing and your version?

Venkataramani: Transaction processing systems are usually fast, but they don’t [excel at] complex analytical queries. They perform simple operations. They just create a bunch of records. I can update the records. I can make it the system of record for my business. They’re fast, but they’re not really built for compute scaling, right? They are built for reliability. You know: don’t lose my data. This is my source of truth and my system of record. It provides point-in-time recovery and transactional consistency.

But if everything requires transactional consistency, a single-node transactional database cannot run much faster than about 100 writes per second. And we are talking about data torrents that carry millions of events per second. They’re not even in the ballpark.

Then you go to the warehouses. They give you scalability, but they’re too slow. It takes too long for data to get into the system. It’s like living in the past. They are often hours or even days behind.

Warehouses and lakes give you scale, but they don’t give you the speed you expect from a system of record. Real-time databases are the ones that demand both. Data never stops coming, and it is going to keep coming in torrents, at millions of events per second. That is the goal here. That is the end goal. This is what the market is demanding: speed, scale and simplicity.

VentureBeat: So you are able to add indexing to the mix, but at the cost of giving up some transaction processing. Is choosing that tradeoff the solution, at least for some users?

Venkataramani: Correct. We’re saying we’ll give you the speed of the old databases, but you give up transactions because you’re doing real-time analytics anyway. You don’t need transactions, and that allows us to scale. The combination of a converged index with a distributed SQL engine is what makes Rockset fast, scalable and quite simple to operate.

Another thing about real-time analytics is that the speed of queries is also very important. Speed matters in terms of data latency, meaning how quickly data gets into the system and becomes available for query processing. But more than that, query processing itself also has to be fast. Let’s say you are able to build a system where you can collect data in real time, but whenever you ask a question, it takes 40 minutes to come back. There’s no point. My data ingestion is fast but my queries are slow. I’m still not able to get visibility into the data in real time, so it doesn’t matter. That’s why indexing is a means to an end. The end is very fast query performance and very short data latency. So fast queries on fresh data is the real goal of real-time analytics. If you only have fast queries on stale data, it’s not real-time analytics.

VentureBeat: When you look at the world of log-file processing and real-time solutions, you often find Elasticsearch. And at its core is Lucene, a text search engine like Google. I’ve always thought that Elasticsearch was a kind of overkill for log data. How much do you imitate Lucene and other text-search algorithms?

Venkataramani: I think the technology you see in Lucene is very impressive given when it was created and how far it has come. But it wasn’t really built for this type of real-time analytics. So the biggest difference between Elasticsearch and Rockset comes from the fact that we support full-featured SQL, including JOINs, GROUP BY, ORDER BY, window functions and everything else you would expect from a SQL database. Rockset can do this. Elasticsearch can’t.

When you cannot join datasets at query time, a large amount of operational complexity is pushed onto the operator. That’s why people don’t use Elasticsearch much for business analytics and use it mainly for log analytics. One of the great properties of log analytics is that you don’t need joins. You have a bunch of logs and you need to search through those logs. There are no joins.

VentureBeat: The problem becomes more complicated when you want to do more, right?

Venkataramani: Exactly. For business data, everything is a join with this or a join with that. If you cannot join datasets at query time, then you are forced to denormalize the data at ingestion time, which is operationally difficult to deal with. Data consistency is hard to achieve. And it also comes with a lot of storage and compute overhead. So Lucene and Elasticsearch have a few things in common with Rockset, such as the idea of using indexes for efficient data retrieval. But we’ve built our real-time indexing software from scratch in the cloud, using new algorithms. The implementation is entirely in C++.
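
To make the consistency concern concrete, here is a minimal Python sketch of the two options: copying related data into each record at ingestion time (denormalization) versus joining at query time. The data and function names are hypothetical and exist only for illustration; nothing here reflects how Rockset or Elasticsearch is implemented.

```python
# System of record: one row per user (hypothetical data).
users = {"u1": {"plan": "free"}}

# Option A: denormalize at ingestion by copying the user's plan into each event.
denormalized_events = []

def ingest_denormalized(event):
    plan = users[event["user_id"]]["plan"]
    denormalized_events.append({**event, "plan": plan})

# Option B: store the raw event and join against users at query time.
raw_events = []

def ingest_raw(event):
    raw_events.append(event)

def query_time_join():
    # Look up the current user row for every event when the query runs.
    return [{**e, "plan": users[e["user_id"]]["plan"]} for e in raw_events]

click = {"user_id": "u1", "page": "/pricing"}
ingest_denormalized(click)
ingest_raw(click)

# The user upgrades after the event was ingested.
users["u1"]["plan"] = "pro"

stale_plan = denormalized_events[0]["plan"]   # copy made at ingestion time
fresh_plan = query_time_join()[0]["plan"]     # value read at query time
```

In this toy, the denormalized copy still carries the old plan after the update, and every event stores its own copy of the user data, which is one way to picture the consistency and storage overhead described above.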

We use converged indexes, which deliver both what you get from a database index and what you get from an inverted search index in the same data structure. Lucene gives you half of what a converged index gives you. A data warehouse or columnar database gives you the other half. Converged indexing is a very efficient way to build both.
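
As a rough illustration of serving search-style and column-style access from the same write path, the following Python toy stores each document three ways at once. It is a sketch under loud assumptions: the class `ToyConvergedIndex` and its methods are invented for this example and are not Rockset's data structure, storage layout or API.

```python
from collections import defaultdict

class ToyConvergedIndex:
    """Illustrative only: each document is written to three structures."""

    def __init__(self):
        self.rows = {}                      # doc_id -> full document (row store)
        self.columns = defaultdict(dict)    # field -> {doc_id: value} (columnar store)
        self.inverted = defaultdict(set)    # (field, value) -> {doc_id} (search index)

    def add(self, doc_id, doc):
        self.rows[doc_id] = doc
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.inverted[(field, value)].add(doc_id)

    def lookup(self, field, value):
        """Selective filter: answered from the inverted index, no scan."""
        return sorted(self.inverted.get((field, value), set()))

    def column_sum(self, field):
        """Aggregate: answered by reading a single column."""
        return sum(self.columns[field].values())


index = ToyConvergedIndex()
index.add(1, {"user": "ana", "country": "BR", "amount": 30})
index.add(2, {"user": "bo", "country": "US", "amount": 12})
index.add(3, {"user": "cy", "country": "BR", "amount": 8})

br_docs = index.lookup("country", "BR")   # uses the search-style index
total = index.column_sum("amount")        # uses the column-style store
```

The toy also shows the tradeoff in miniature: every value is stored more than once, so storage grows, while both selective lookups and aggregates avoid scanning full documents.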

VentureBeat: Does this converged index span multiple columns? Is that the secret?

Venkataramani: A converged index is a general-purpose index that has all the advantages of both a search index and a columnar index. Columnar formats are what data warehouses use. They work really well for batch analytics. But the minute you get into real-time apps, you have to keep compute and storage spinning 24/7. When that happens, you need a compute-optimized system, not a storage-optimized system. Rockset is compute-optimized. We can give you 100 times better query performance because we are indexing. We build a whole bunch of indexes on your data and, byte for byte, the same dataset will take up more storage in Rockset, but you get extreme compute efficiency.

VentureBeat: I noticed that you talk about connecting to traditional databases as well as event backbones like Kafka streams. Does that mean you can separate data storage from indexing?

Venkataramani: Yes, that is our approach. For real-time analytics, there will be some data sources such as Kafka or Kinesis where the data does not necessarily reside anywhere else. It is coming in large volumes. But for real-time analytics you need to join these event streams with some system of record.

Some of your clickstream data may come from Kafka and then be turned into a fast SQL table in Rockset. But it contains user IDs, product IDs and other information that needs to be joined with your device data, product data, user data and other records that come from your system of record.

That’s why Rockset also has real-time data connectors for transactional systems such as Amazon DynamoDB, MongoDB, MySQL and PostgreSQL. You can continue to make your changes in your system of record, and those changes will be reflected in Rockset in real time. So now that you have real-time tables in Rockset, one coming from Kafka and one coming from your transactional system, you can join them and do analytics on them. That is the promise.
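
The kind of query described here, joining an event table with a table mirrored from a system of record, can be sketched with Python's built-in sqlite3 module as a local stand-in. This is an assumption-laden illustration: the table names `clicks` and `users`, the columns and the data are hypothetical, and SQLite is not a streaming system and says nothing about Rockset's connectors or SQL dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a table fed by an event stream such as Kafka.
    CREATE TABLE clicks (user_id TEXT, product_id TEXT);
    -- Stand-in for a table synced from a transactional system of record.
    CREATE TABLE users (user_id TEXT PRIMARY KEY, country TEXT);
""")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [("u1", "p1"), ("u2", "p1"), ("u1", "p2")],
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("u1", "BR"), ("u2", "US")],
)

# Join the event table with the record table at query time and aggregate.
rows = conn.execute("""
    SELECT u.country, COUNT(*) AS click_count
    FROM clicks AS c
    JOIN users AS u ON u.user_id = c.user_id
    GROUP BY u.country
    ORDER BY click_count DESC, u.country
""").fetchall()
```

Because the join happens when the query runs, a later change to a row in `users` would be picked up by the next query without rewriting any click events.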

VentureBeat: That’s the technologist’s answer. How does this help non-tech staff?

Venkataramani: A lot of people say, “I don’t really need real time because my team looks at these reports once a week and my marketing team doesn’t look at them at all.” The reason you don’t need it now is that your current systems and processes don’t expect real-time insight. Once you go real time, nobody needs to look at these reports once a week anymore. If any anomalies occur, you get paged immediately. You do not have to wait for the weekly meeting. Once people go real time, they never go back.

The real value prop of real-time analytics is accelerating the growth of your business. Your business does not run on a weekly or monthly basis. Your business is actually innovating and responding all the time. There are windows of opportunity to improve something or take advantage of an opening, and you need to respond to them in real time.

This often gets lost when the conversation turns to technology and databases. But the value of real-time analytics is so great that people are simply turning around and embracing it.
