Microsoft Releases SynapseML 0.10.0 with .NET and Cognitive Services Support

Edin Kapic

Article originally posted on InfoQ.

Microsoft announced the first .NET-compatible version of SynapseML, a new machine learning (ML) library for the Apache Spark distributed processing platform. Version 0.10.0 of SynapseML adds .NET bindings, allowing .NET developers to write ML pipelines in their preferred language.

SynapseML, formerly known as MMLSpark (Microsoft Machine Learning for Apache Spark), is a library that integrates several ML algorithms into a coherent API for building heterogeneous machine learning solutions that run on top of Apache Spark. For example, it allows developers to leverage other powerful libraries such as OpenCV (for computer vision), VowpalWabbit (for fast online and reinforcement learning), or LightGBM (for gradient-boosted decision-tree models). It is an ongoing initiative at Microsoft, outlined in a 2019 paper by Mark Hamilton, one of the Microsoft engineers behind the SynapseML project.

SynapseML runs on Apache Spark and therefore requires a Java installation, as Spark runs on the JVM and is written in Scala. It also offers bindings for other languages, such as Python and R, and the current 0.10.0 version adds bindings for .NET languages.

.NET support for SynapseML is built on top of the .NET for Apache Spark library and is distributed as a set of SynapseML NuGet packages. The packages haven't been published to the main NuGet feed, so their package source has to be added manually. Once the packages are installed, the SynapseML API can be called from .NET applications.

The following code fragment illustrates how the SynapseML API can be called from a C# application to train a LightGBM binary classifier on one DataFrame and score another.

// Create LightGBMClassifier
var lightGBMClassifier = new LightGBMClassifier()
    .SetFeaturesCol("features")
    .SetRawPredictionCol("rawPrediction")
    .SetObjective("binary")
    .SetNumLeaves(30)
    .SetNumIterations(200)
    .SetLabelCol("label")
    .SetLeafPredictionCol("leafPrediction")
    .SetFeaturesShapCol("featuresShap");

// Fit the model (trainDf is an existing DataFrame with "features" and "label" columns)
var lightGBMClassificationModel = lightGBMClassifier.Fit(trainDf);

// Apply the model to the test DataFrame and display the first 50 rows
lightGBMClassificationModel.Transform(testDf).Show(50);
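The fragment follows the estimator/model contract of Spark ML, on which SynapseML builds: Fit consumes training data and returns a separate fitted model, and that model's Transform scores new data. As a minimal sketch that runs without Spark, the following plain C# code mimics that contract and the chained Set... configuration style with a toy one-feature threshold classifier. Row, MidpointClassifier, MidpointModel and SetWeight are hypothetical names invented for illustration; they are not part of SynapseML or .NET for Apache Spark.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: configure the hypothetical estimator fluently, fit it, then score new values.
var train = new List<Row> { new(1.0, 0), new(2.0, 0), new(6.0, 1), new(8.0, 1) };
var model = new MidpointClassifier().SetWeight(0.5).Fit(train);

// Class means are 1.5 and 7.0, so the learned threshold is 4.25.
foreach (var (x, prediction) in model.Transform(new[] { 0.5, 4.0, 5.0, 9.0 }))
    Console.WriteLine($"{x} -> {prediction}");

// One labelled training example: a single numeric feature and a 0/1 label.
public record Row(double Feature, int Label);

// Hypothetical estimator (not a SynapseML type): holds configuration only
// and produces a separate model object from Fit().
public sealed class MidpointClassifier
{
    private double _weight = 0.5;

    // 0.0 puts the threshold at the class-0 mean, 1.0 at the class-1 mean,
    // 0.5 halfway between them. Returns `this` so setters can be chained.
    public MidpointClassifier SetWeight(double weight)
    {
        if (weight < 0.0 || weight > 1.0)
            throw new ArgumentOutOfRangeException(nameof(weight));
        _weight = weight;
        return this;
    }

    // "Training": average the feature per class and interpolate a threshold.
    // Assumes both labels occur in the training data (Average throws otherwise).
    public MidpointModel Fit(IReadOnlyList<Row> train)
    {
        double mean0 = train.Where(r => r.Label == 0).Average(r => r.Feature);
        double mean1 = train.Where(r => r.Label == 1).Average(r => r.Feature);
        double threshold = mean0 + _weight * (mean1 - mean0);
        return new MidpointModel(threshold, positiveAbove: mean1 >= mean0);
    }
}

// Hypothetical fitted model: immutable, it only scores new data.
public sealed class MidpointModel
{
    private readonly bool _positiveAbove;

    public MidpointModel(double threshold, bool positiveAbove)
    {
        Threshold = threshold;
        _positiveAbove = positiveAbove;
    }

    public double Threshold { get; }

    // Pairs each input value with a predicted label, loosely analogous to
    // Transform() appending a prediction column to a DataFrame.
    public IEnumerable<(double Feature, int Prediction)> Transform(IEnumerable<double> features) =>
        features.Select(x => (x, (x >= Threshold) == _positiveAbove ? 1 : 0));
}
```

Keeping the learned state in a separate, immutable model object means the same configured estimator can be refitted on new data and the fitted model applied to any number of test sets, which is the same split between lightGBMClassifier and lightGBMClassificationModel in the fragment above.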

SynapseML also lets developers call external services from their pipelines. The library supports Microsoft's own Cognitive Services, a set of general-purpose AI services powered by models trained by Microsoft. In addition, the current release of SynapseML lets developers use pre-trained OpenAI models in their solutions, such as GPT-3 for natural language understanding and generation and Codex for code generation. Using OpenAI models currently requires access to the Azure OpenAI Service.

Finally, the current version adds support for MLflow, a platform for managing the ML lifecycle. Developers can use it to save and load models and to log information while the models run.

Notably, Microsoft has renamed the library to match the name of its existing Azure Synapse Analytics service, which, according to the product description, is “a unified experience to ingest, explore, prepare, transform, manage, and serve data for immediate BI and machine learning needs.”

SynapseML joins the community of .NET machine-learning libraries:

  • ML.NET is a .NET library for running single-machine workloads using .NET languages.
  • Microsoft Cognitive Toolkit (CNTK) is a Microsoft ML library that stopped being actively developed last month. It also has a .NET API.
  • Accord.NET is an ML library for .NET geared towards vision and audio processing.
  • Other popular general ML libraries have .NET versions:

In the .NET community, there is confusion among developers about how all these libraries compare against each other or whether they replace one another. SynapseML project members appear to actively answer those questions on Reddit.
