Anirban Ghoshal
Senior Writer

Snowflake adopts open source strategy to grab data catalog mind share

news
Jun 3, 2024

With its plan to make its Polaris data catalog open source, Snowflake hopes the new offering will be seen as vendor-neutral, boosting its attractiveness when compared to Databricks’ Unity Catalog.

Snowflake says it will open up the source code to its new Polaris Catalog, a strategy that suggests it wants to lure data catalog users away from rival Databricks’ Unity Catalog while bolstering the attractiveness of its own offering, analysts said.

“The move to launch Polaris Catalog provides a competitive response to Databricks’s Unity Catalog, thereby enhancing Snowflake’s value proposition, attracting a broader range of customers, and fostering a vibrant community around the new data catalog,” said Jayesh Chaurasia, analyst at research and advisory services firm Forrester.

How Polaris Catalog is different from Databricks’ Unity Catalog

Databricks’ Unity Catalog, which was made generally available in June 2022 and was updated with Okera’s capabilities the following year, is a closed-source unified governance offering that provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.

Polaris Catalog, released during Snowflake’s annual conference this week, offers similar capabilities to Unity Catalog, but is built atop the popular open source Apache Iceberg data table format.

“With Polaris Catalog, users now gain a single, centralized place for any engine to find and access an organization’s Iceberg tables with consistent security and full, open interoperability,” Snowflake said in a statement. The company added that Polaris Catalog relies on Iceberg’s open source REST protocol, an open standard that lets users access and retrieve data from any engine that supports the Iceberg REST API, including Apache Flink, Apache Spark, Dremio, Python, and Trino, among others.
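What “any engine that supports the Iceberg REST API” means in practice: the protocol is a set of plain HTTP endpoints defined in the Apache Iceberg REST catalog OpenAPI specification, so every client locates tables through the same URLs. The minimal Python sketch below builds a few of those request URLs with only the standard library. It is an illustration under stated assumptions, not Polaris code: the base URI, the `my_catalog` prefix, the `analytics.sales` namespace, the bearer-token authentication, and all helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# The Iceberg REST catalog spec joins multi-level namespace parts with the
# ASCII unit separator (0x1F) before placing them in a single URL path segment.
NAMESPACE_SEPARATOR = "\x1f"


def namespace_path(namespace: tuple) -> str:
    """Encode a (possibly multi-level) namespace as one path segment."""
    # safe="" percent-encodes reserved characters, including "/" and 0x1F
    return quote(NAMESPACE_SEPARATOR.join(namespace), safe="")


def _endpoint(base_uri: str, prefix: str, *segments: str) -> str:
    """Join the base URI, the /v1 version segment, an optional prefix, and segments."""
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts.extend(segments)
    return "/".join(parts)


def config_url(base_uri: str, warehouse: Optional[str] = None) -> str:
    """GET /v1/config: clients call this first to learn server-side defaults."""
    url = _endpoint(base_uri, "", "config")
    if warehouse:
        url += "?" + urlencode({"warehouse": warehouse})
    return url


def list_tables_url(base_uri: str, namespace: tuple, prefix: str = "") -> str:
    """GET /v1/{prefix}/namespaces/{namespace}/tables"""
    return _endpoint(base_uri, prefix, "namespaces", namespace_path(namespace), "tables")


def load_table_url(base_uri: str, namespace: tuple, table: str, prefix: str = "") -> str:
    """GET /v1/{prefix}/namespaces/{namespace}/tables/{table}"""
    return _endpoint(
        base_uri, prefix, "namespaces", namespace_path(namespace), "tables", quote(table, safe="")
    )


def fetch_json(url: str, token: str) -> dict:
    """Issue a GET request; assumes (hypothetically) the server accepts a bearer token."""
    request = Request(url, headers={"Authorization": f"Bearer {token}"})
    with urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    base = "https://polaris.example.com/api/catalog"  # hypothetical endpoint
    print(config_url(base, warehouse="my_catalog"))
    print(list_tables_url(base, ("analytics", "sales"), prefix="my_catalog"))
```

Because the paths are fixed by the specification rather than by any one vendor, a Spark, Flink, Trino, or Python client would request the same URLs; in principle, moving a catalog to different hosting infrastructure changes only the base URI.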

The complexity and diversity of data systems, coupled with organizations’ near-universal desire to leverage AI, necessitate an interoperable data catalog, which is likely to be open source in nature, according to Chaurasia.

“An open-source data catalog addresses interoperability and other needs, such as scalability, especially if it is built on top of a popular table format [such as] Iceberg. This approach facilitates data management across various platforms and cloud environments,” Chaurasia said.

Separately, Stewart Bond, research vice president at market research firm IDC, suggested that Polaris Catalog may build on Apache Iceberg’s native Iceberg catalogs and add enterprise-grade capabilities to them, such as managing multiple distributed instances of Iceberg repositories, providing data lineage, search capability for data utilities, and data description capabilities, among others.

Polaris Catalog, which Snowflake expects to open source in the next 90 days, can either be hosted in Snowflake’s proprietary AI Data Cloud or be self-hosted in an enterprise’s own infrastructure using container technologies such as Docker or Kubernetes.

“Since Polaris Catalog’s backend implementation will be open source, organizations can freely swap the hosting infrastructure while retaining all security controls and eliminating vendor lock-in,” the company said, adding that Polaris Catalog inside Snowflake’s AI Data Cloud is currently in public preview.

Is Polaris Snowflake’s ticket to garnering community goodwill?

While experts such as Forrester’s Chaurasia and dbInsight’s Tony Baer see Polaris Catalog as part of a broader strategy for the company to extend its reach and acquire new customers, The Futurum Group’s research vice president Steven Dickens thinks it is a “desperate” attempt to garner “goodwill” from customers and the open source community.

The soon-to-be-open-sourced data catalog, according to Dickens, is a direct consequence of Snowflake’s shortcomings and limitations, including poor interoperability, vendor lock-in, exorbitant costs, lack of innovation, and dependency on partnerships.

“Snowflake is notoriously expensive, and its cost structure has driven many customers to seek alternatives. Polaris can be seen as a last-ditch effort to retain customers by offering a potentially cheaper, open-source alternative,” Dickens said.

Further, Dickens sees Snowflake’s move to open-source Polaris Catalog as a way to counter its “slower, insular development pace.”

“Polaris is an attempt to leverage external innovation to compensate for Snowflake’s internal stagnation,” Dickens explained.

Polaris Catalog has open source rivals

Chaurasia and Dickens also pointed out that Polaris Catalog isn’t the only open source data catalog available in the market.

“There are several other open-source projects in the data cataloguing and metadata management space, including Apache Atlas, Amundsen, and LinkedIn’s DataHub. Each provides capabilities for data discovery, governance, and metadata management,” Chaurasia said.

Apache Atlas is designed for governance and compliance within Apache Hadoop environments, offering scalable metadata management, lineage, and governance capabilities for Hadoop and associated big data technologies. Amundsen, which originated at Lyft, aims to make data analysts, scientists, and engineers more productive by indexing data resources (metadata) and facilitating the discovery and exploration of datasets based on usage and relevance.

Another alternative is LinkedIn’s DataHub, which provides a real-time metadata architecture that supports various data systems and environments through pluggable integrations.

“It focuses on metadata ingestion, indexing, data discovery, and governance,” Chaurasia said, adding that amid growing demand for efficient data management offerings, Amundsen and DataHub have become popular thanks to their emphasis on user experience, support for multiple integrations (both real-time and batch), and data discovery capabilities.