While data lakes and data warehouses are similar in concept, they are ultimately very different animals. If a company wants structured data that is easy for anyone to query, a data warehouse is probably its best bet. Conversely, if a company wants to leverage big data in its purest, most flexible form, it is most likely looking for a data lake: because the data is stored in its original raw format, there are unlimited ways to query it as business needs evolve.

However, large data lakes holding petabytes of varied datasets can be unwieldy and difficult to manage. This is the problem a startup called Treeverse wants to solve with LakeFS, a new open source platform designed to help enterprises manage their data lakes the way they manage their code, turning object storage into a “Git-like repository,” as the company puts it. This means version control and other Git-like operations such as branch, commit, merge, and revert, along with full reproducibility of all data and code.
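To make the Git-like operations named above concrete, here is a minimal in-memory sketch. It is a toy model written for illustration only: the `ToyRepo` and `Commit` classes and all method names are hypothetical and are not the LakeFS API, and the merge rule (the source branch wins on conflicting paths) is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """An immutable snapshot mapping object paths to their contents."""
    snapshot: Dict[str, bytes]
    message: str
    parent: Optional["Commit"] = None


class ToyRepo:
    """Hypothetical in-memory model of a versioned object store (not LakeFS)."""

    def __init__(self) -> None:
        root = Commit(snapshot={}, message="initial")
        self.branches: Dict[str, Commit] = {"main": root}
        self.staged: Dict[str, Dict[str, bytes]] = {"main": {}}

    def branch(self, name: str, source: str = "main") -> None:
        # A branch is only a new pointer to an existing commit; no data is copied.
        self.branches[name] = self.branches[source]
        self.staged[name] = {}

    def put(self, branch: str, path: str, data: bytes) -> None:
        # Stage an uncommitted write on one branch only.
        self.staged[branch][path] = data

    def commit(self, branch: str, message: str) -> Commit:
        head = self.branches[branch]
        snapshot = {**head.snapshot, **self.staged[branch]}
        new = Commit(snapshot=snapshot, message=message, parent=head)
        self.branches[branch] = new
        self.staged[branch] = {}
        return new

    def merge(self, source: str, target: str) -> Commit:
        # Simplification: on conflicting paths the source branch wins.
        src, dst = self.branches[source], self.branches[target]
        merged = {**dst.snapshot, **src.snapshot}
        new = Commit(snapshot=merged, message=f"merge {source} into {target}", parent=dst)
        self.branches[target] = new
        return new

    def revert(self, branch: str) -> Commit:
        # Undo the latest commit by re-committing its parent's snapshot.
        head = self.branches[branch]
        if head.parent is None:
            raise ValueError("nothing to revert")
        new = Commit(
            snapshot=dict(head.parent.snapshot),
            message=f"revert '{head.message}'",
            parent=head,
        )
        self.branches[branch] = new
        return new


repo = ToyRepo()
repo.put("main", "sales/2021.csv", b"id,amount\n1,10\n")
repo.commit("main", "load sales")

repo.branch("experiment")
repo.put("experiment", "sales/2021.csv", b"id,amount\n1,10\n2,20\n")
repo.commit("experiment", "add a row")
main_before_merge = repo.branches["main"].snapshot  # main is untouched by the branch

repo.merge("experiment", "main")
main_after_merge = repo.branches["main"].snapshot  # main now has the two-row file

repo.revert("main")
main_after_revert = repo.branches["main"].snapshot  # main is back to the one-row file
```

The point of the sketch is that every state of the data stays addressable, so a change can be isolated on a branch, promoted with a merge, or undone with a revert. A real system would track metadata rather than hold whole objects in memory.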

“The number one problem solved by LakeFS is the manageability of large-scale data lakes featuring many datasets that are maintained by many people. At this scale, many workflows that are familiar to people from the world of code start to break down,” Treeverse cofounder and CEO Einat Orr told VentureBeat. “Git-like operations exposed by LakeFS solve these problems, in the same way that Git allows many developers to collaborate on a large codebase without causing code quality issues.”

Founded in Tel Aviv in 2020, Treeverse has largely flown under the radar, but today the Israeli company announced that it has raised $23 million in a series A round from Dell Technologies Capital, Norwest Venture Partners, and Zeev Ventures. The funds will be used to accelerate both the development and adoption of LakeFS among enterprise data teams. The platform already has users at companies such as Volvo, Intuit, and SimilarWeb.

Above: LakeFS Data Lake “Repositories”

How it works

As an open source platform, LakeFS is flexible and can be deployed in the cloud (AWS, Azure, or Google Cloud) or on-premises. It also works out of the box with most modern data frameworks, including Kafka, Apache Spark, Amazon Athena, Delta Lake, Databricks, Presto, and Hadoop.

But where, exactly, does LakeFS sit in the data stack? And what other tools fit into that stack?

Modern enterprise data stacks typically include a variety of tools, including data integration tools from companies like Fivetran and cloud-based data lakes or data warehouses such as Snowflake or Google’s BigQuery. The process of pooling data from multiple sources (e.g. CRM and marketing tools) and consolidating it into a standard format, so that it is easier to run queries and analytics against it, is usually done through “extract, transform, and load” (ETL), where data is transformed before it enters the warehouse, or through “extract, load, and transform” (ELT), where the transformation happens on demand inside the data warehouse or lake.
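The difference between ETL and ELT comes down to when the transformation runs. The following is a minimal sketch with made-up sample rows; the function names and the email-normalizing transform are illustrative assumptions, not part of any tool mentioned here.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical raw records pulled from two source systems.
raw_rows: List[Row] = [
    {"source": "crm", "email": " Ada@Example.com "},
    {"source": "marketing", "email": "grace@example.com"},
]


def transform(row: Row) -> Row:
    """Normalize a record into the standard format used for analytics."""
    return {**row, "email": row["email"].strip().lower()}


def etl(rows: List[Row]) -> List[Row]:
    """ETL: transform first, so only cleaned rows are ever stored."""
    return [transform(r) for r in rows]


def elt_load(rows: List[Row]) -> List[Row]:
    """ELT, step 1: store the raw rows exactly as they arrived."""
    return list(rows)


def elt_query(stored: List[Row]) -> List[Row]:
    """ELT, step 2: transform on demand, inside the warehouse or lake."""
    return [transform(r) for r in stored]


warehouse = etl(raw_rows)    # holds cleaned rows only
lake = elt_load(raw_rows)    # holds raw rows, original formatting preserved
report = elt_query(lake)     # cleaned view computed at query time
```

Both paths yield the same cleaned view, but only the ELT path keeps the raw data around, which is what leaves room for new transformations later.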

LakeFS sits between the ELT technology and the data lake. “Integrating ELT technologies with LakeFS enables you to write new data to a designated branch and test it to ensure quality before exposing it to consumers,” the CEO said. “This provides important guarantees about production data to the consumers of that data.”
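The write-to-a-branch, test, then expose pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions: the lake is modeled as a plain dictionary of branch names to rows, and `publish_if_valid` and `no_missing_amounts` are hypothetical helpers, not LakeFS functions.

```python
from typing import Callable, Dict, List

Table = List[Dict[str, object]]


def publish_if_valid(
    lake: Dict[str, Table],
    branch: str,
    new_rows: Table,
    check: Callable[[Table], bool],
) -> bool:
    """Write new rows to an isolated branch and publish only if the check passes."""
    # Toy model: the branch is a copy of main plus the incoming rows.
    lake[branch] = lake["main"] + new_rows
    if not check(lake[branch]):
        del lake[branch]  # discard the bad data; readers of main never saw it
        return False
    lake["main"] = lake.pop(branch)  # "merge": main now points at the tested data
    return True


def no_missing_amounts(rows: Table) -> bool:
    """Example quality check: every row must carry an amount."""
    return all(r.get("amount") is not None for r in rows)


lake: Dict[str, Table] = {"main": [{"id": 1, "amount": 10}]}

rejected = publish_if_valid(lake, "ingest", [{"id": 2, "amount": None}], no_missing_amounts)
rows_after_reject = len(lake["main"])

accepted = publish_if_valid(lake, "ingest", [{"id": 2, "amount": 20}], no_missing_amounts)
rows_after_accept = len(lake["main"])
```

In this sketch the failed load never touches `main`, which is the guarantee the quote refers to: consumers only ever read data that has passed the checks.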

Above: Where LakeFS sits in the stack

Products in a market comparable to LakeFS include machine learning operations (MLOps) tools such as DVC, developed by a company called Iterative.ai, which raised $20 million last month. However, these are primarily aimed at data scientists building machine learning models. “LakeFS adopts a holistic approach and provides data version control capabilities to all data producers and consumers, through whichever application they use,” the CEO said.

Elsewhere, open table storage formats such as Databricks’ Delta Lake offer something similar by allowing “time travel” (returning to data in a previous form) on a per-table basis, whereas LakeFS enables this across a full data repository that can span thousands of tables.

Data play

There has been significant activity across the broader data engineering space of late. Fishtown Analytics recently renamed itself dbt Labs and raised $150 million in funding, at a $1.5 billion valuation, to help analysts transform data in the warehouse. And GitLab recently spun out a new data integration platform called Meltano as an independent company.

One thing all these commercial companies have in common is that they are built on open source projects. And so, when any young VC-backed company touts its open source credentials, the most obvious question remains: what is your business model? For Treeverse, the answer is that there are no immediate plans to monetize, although the long-term plan is, of course, to build a commercial product on top of LakeFS.

“Our goal is to develop the open source project and foster a thriving community around it,” the CEO said. “Once we achieve our goals there, we will focus on providing an enterprise version of LakeFS that offers general premium features such as managed hosting and predefined workflows that bring best practices and ensure high-quality data and resilient pipelines.”
