Databricks framework for managing machine learning projects will go to an open governance model
Databricks, the company behind the commercial development of Apache Spark, is placing its machine learning lifecycle project MLflow under the stewardship of the Linux Foundation.
MLflow provides a programmatic way to deal with all the pieces of a machine learning project through all its phases: construction, training, fine-tuning, deployment, management, and revision. It tracks and manages the datasets, model instances, model parameters, and algorithms used in machine learning projects, so they can be versioned, stored in a central repository, and repackaged easily for reuse by other data scientists.
MLflow's source is already available under the Apache 2.0 license, so this isn't about open sourcing a previously proprietary project. Instead, it's about giving the project "a vendor neutral home with an open governance model," according to Databricks's press release.
Projects for managing entire machine learning pipelines have taken shape over the past couple of years, providing single overarching tools for governing what is typically a sprawling and complex process involving multiple moving parts. Among them is a Google project, TensorFlow Extended, but better known is its descendant project Kubeflow, which uses Kubernetes to manage machine learning pipelines.
MLflow differs from Kubeflow in several key ways. For one, it doesn't require Kubernetes as a component; it runs on local machines by way of simple Python scripts, or in Databricks's hosted environment. And while Kubeflow focuses on TensorFlow and PyTorch as its learning systems, MLflow is agnostic: it can work with models from those frameworks and many others.
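To give a rough sense of what tracking parameters and results from a simple local Python script can look like, here is a minimal sketch that uses only the Python standard library. It is not MLflow's API: the `RunStore` class, its methods, and the example parameter and metric names are all hypothetical, chosen purely to illustrate the idea of recording each training run so it can be stored and compared later.

```python
import json
import tempfile
import uuid
from pathlib import Path


class RunStore:
    """Toy file-based run store (hypothetical; not MLflow's API)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def log_run(self, params, metrics):
        """Save one run's parameters and metrics; return its generated ID."""
        run_id = uuid.uuid4().hex
        record = {"run_id": run_id, "params": params, "metrics": metrics}
        (self.root / f"{run_id}.json").write_text(json.dumps(record))
        return run_id

    def load_runs(self):
        """Read every stored run back from disk."""
        return [json.loads(p.read_text()) for p in self.root.glob("*.json")]

    def best_run(self, metric):
        """Return the stored run with the highest value for the given metric."""
        return max(self.load_runs(), key=lambda r: r["metrics"][metric])


# Record two made-up runs in a temporary directory, then pick the better one.
store = RunStore(tempfile.mkdtemp())
store.log_run({"learning_rate": 0.1}, {"accuracy": 0.81})
store.log_run({"learning_rate": 0.01}, {"accuracy": 0.88})
best = store.best_run("accuracy")
print(best["params"])
```

A real lifecycle tool covers far more than this, such as datasets, packaged models, and deployment, but the core idea of keeping one record per run in a shared location is the same.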


