Paul Krill
Editor at Large

AWS Glue upgrades Spark engines, backs Ray framework

Nov 29, 2022

Serverless data integration service in the Amazon cloud also adds support for built-in Pandas APIs and the Apache Hudi, Apache Iceberg, and Delta Lake formats.


AWS Glue, a serverless data integration service provided by Amazon Web Services, upgrades its Python and Apache Spark engines in version 4.0, released this week.

The upgrade adds engines for Python 3.10 and Apache Spark 3.3.0. Both engines include performance enhancements and bug fixes, with Spark offering capabilities such as row-level runtime filtering and improved error messages.

New engine plugins in Glue 4.0 support the Ray compute framework, the Cloud Shuffle Service for Spark, and Adaptive Query Execution. Glue 4.0 also supports Pandas, the data analysis and manipulation library built on top of Python. New data format support covers Apache Hudi, Apache Iceberg, and Delta Lake. In addition, the release includes the Parquet vectorized reader, with support for additional encodings and data types.

AWS Glue provides data discovery, data preparation, data transformation, and data integration capabilities, with autoscaling based on workload size. AWS said Glue now also offers visual transforms that let customers use and share business-specific ETL logic among teams.

AWS announced a preview of AWS Glue for Ray as a new engine option. Data engineers can use AWS Glue for Ray to process large data sets with Python and popular Python libraries, with the Python code distributed for processing across multi-node clusters.

Glue 4.0 is available now in several AWS regions, including the US regions in Ohio, Northern Virginia, and Northern California.


Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
