Now in public preview, Snowpark Connect promises to reduce latency and complexity by moving analytics workloads to where the data is.
Snowflake is preparing to run Apache Spark analytics workloads directly on its infrastructure, saving enterprises the trouble of hosting an Apache Spark instance elsewhere, and eliminating data transfer delays between it and the Snowflake Data Cloud.
The new offering will rely on a feature of Apache Spark introduced in version 3.4, Spark Connect, which enables users' code to run separately from the Apache Spark cluster doing the hard work.
With Spark Connect, "Your application, whether it's a Python script or a data notebook, simply sends the unresolved logical plan to a remote Spark cluster. The cluster does all the heavy lifting and sends back the results," Snowflake explained in a blog post.
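In practice, the client-side DataFrame API is unchanged under Spark Connect; only the session setup differs, pointing at a remote endpoint instead of a local cluster. A minimal sketch of that setup (the host name and port are placeholders, and it assumes pyspark 3.4+ with a reachable Spark Connect server):

```python
# Hypothetical sketch of a Spark Connect client session (not Snowflake-specific).
# "sc://host:port" is the Spark Connect URI scheme; the endpoint below is a placeholder.
from pyspark.sql import SparkSession

# The client builds DataFrame operations locally as an unresolved logical plan...
spark = SparkSession.builder.remote("sc://spark-cluster.example.com:15002").getOrCreate()

df = spark.range(100).filter("id % 2 = 0")

# ...and only on an action (count, collect, show) is that plan shipped to the
# remote cluster, which executes it and streams the results back.
print(df.count())
```

Snowpark Connect for Apache Spark plays the role of the server side in this model, so existing client code like the above can stay as-is while execution moves into Snowflake.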
Snowpark Connect for Apache Spark is Snowflake's take on that technology, enabling enterprises to run their Spark code on Snowflake's vectorized engine in the Data Cloud.
For Snowflake customers, the new capability will make it easy for developers to move their code to Snowpark, essentially offering a combination of Spark's familiarity and Snowflake's simplicity, according to Sanjeev Mohan, chief analyst at SanjMo.
Snowpark Connect for Spark will also help enterprises lower their total cost of ownership, as developers can take advantage of Snowflake's serverless engine and no longer have to tune Spark, Mohan said.
Other benefits include faster processing due to Snowflake's vectorized engine and bypassing challenges such as finding staff with Spark expertise, he said.
Everest Group senior analyst Shubham Yadav sees the launch as timely, as enterprises seek to simplify infrastructure and lower costs amid growing AI and ML adoption.
Risk of confusion
Enterprises should take care not to confuse Snowpark Connect for Apache Spark with Snowflake's existing Snowflake Connector for Spark.
If the Connector is like a bridge connecting two cities, allowing data to travel back and forth, Snowpark Connect is more like relocating the entire Spark city into Snowflake.
The Connector's approach has inherent limitations: because Spark jobs run outside Snowflake, data must move back and forth between the two systems, adding latency and cost.
Migrating from the existing Connector to the newer Snowpark Connect for Apache Spark can be done without needing to convert any code, Snowflake said.
The new capability is in public preview, and works with Spark 3.5.
Rival Databricks offers similar capabilities via its Databricks Connect offering.