With its plan to make its Polaris data catalog open source, Snowflake hopes the new offering will be seen as vendor-neutral, boosting its attractiveness when compared to Databricks’ Unity Catalog.
Snowflake says it will open up the source code to its new Polaris Catalog, a strategy that suggests it wants to lure data catalog users away from rival Databricks’ Unity Catalog while bolstering the attractiveness of its own offering, analysts said.
“The move to launch Polaris Catalog provides a competitive response to Databricks’s Unity Catalog, thereby enhancing Snowflake’s value proposition, attracting a broader range of customers, and fostering a vibrant community around the new data catalog,” said Jayesh Chaurasia, analyst at research and advisory services firm Forrester.
How Polaris Catalog is different from Databricks’ Unity Catalog
Databricks’ Unity Catalog, which was made generally available in June 2022 and was updated with Okera’s capabilities the following year, is a closed-source unified governance offering that provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces.
Polaris Catalog, released during Snowflake’s annual conference this week, offers similar capabilities to Unity Catalog, but is built atop the popular open source Apache Iceberg data table format.
“With Polaris Catalog, users now gain a single, centralized place for any engine to find and access an organization’s Iceberg tables with consistent security and full, open interoperability,” Snowflake said in a statement, adding that Polaris Catalog relies on Iceberg’s open source REST protocol, which provides an open standard for users to access and retrieve data from any engine that supports the Iceberg REST API, including Apache Flink, Apache Spark, Dremio, Python, and Trino, among others.
The complexity and diversity of data systems, coupled with the universal desire of organizations to leverage AI, necessitate the use of an interoperable data catalog, which is likely to be open source in nature, according to Chaurasia.
“An open-source data catalog addresses interoperability and other needs, such as scalability, especially if it is built on top of a popular table format as Iceberg. This approach facilitates data management across various platforms and cloud environments,” Chaurasia said.
Separately, Stewart Bond, research vice president at market research firm IDC, pointed out that Polaris Catalog may have leveraged Apache Iceberg’s native Iceberg Catalogs and added enterprise-grade capabilities to them, such as managing multiple distributed instances of Iceberg repositories, providing data lineage, search capability for data utilities, and data description capabilities, among others.
Polaris Catalog, which Snowflake expects to open source in the next 90 days, can either be hosted in Snowflake’s proprietary AI Data Cloud or be self-hosted in an enterprise’s own infrastructure using container technologies such as Docker or Kubernetes.
“Since Polaris Catalog’s backend implementation will be open source, organizations can freely swap the hosting infrastructure while retaining all security controls and eliminating vendor lock-in,” the company said, adding that Polaris Catalog inside Snowflake’s AI Data Cloud is currently in public preview.
Is Polaris Snowflake’s ticket to garnering community goodwill?
While experts such as Forrester’s Chaurasia and dbInsight’s Tony Baer think that Polaris Catalog is an extended strategy for the company to broaden its reach to acquire new customers, The Futurum Group’s research vice president Steven Dickens thinks it is a “desperate” attempt to garner “goodwill” from customers and the open source community.
The soon-to-be-open-sourced data catalog, according to Dickens, is a direct consequence of Snowflake’s shortcomings and limitations, including poor interoperability, vendor lock-in, exorbitant costs, lack of innovation, and dependency on partnerships.
“Snowflake is notoriously expensive, and its cost structure has driven many customers to seek alternatives. Polaris can be seen as a last-ditch effort to retain customers by offering a potentially cheaper, open-source alternative,” Dickens said.
Further, Dickens sees Snowflake’s move to open-source Polaris Catalog as a way to counter its “slower, insular development pace.”
“Polaris is an attempt to leverage external innovation to compensate for Snowflake’s internal stagnation,” Dickens explained.
Polaris Catalog has open source rivals
Chaurasia and Dickens also pointed out that Polaris Catalog isn’t the only open source data catalog available in the market.
“There are several other open-source projects in the data cataloguing and metadata management space, including Apache Atlas, Amundsen, and LinkedIn’s DataHub. Each provides capabilities for data discovery, governance, and metadata management,” Chaurasia said.
Apache Atlas is designed for governance and compliance within Apache Hadoop environments, offering scalable metadata management, lineage, and governance capabilities for Hadoop and associated big data technologies. Amundsen, which originated at Lyft, aims to enhance the productivity of data analysts, scientists, and engineers by indexing data resources (metadata) and facilitating the discovery and exploration of datasets based on usage and relevance.
Another alternative is LinkedIn’s DataHub, which provides a real-time metadata architecture that supports various data systems and environments through pluggable integrations.
“It focuses on metadata ingestion, indexing, data discovery, and governance,” Chaurasia said, adding that Amundsen and DataHub have become popular due to their emphasis on user experience, support for multiple integrations (both real-time and batch), and data discovery capabilities in the wake of demand for efficient data management offerings.


