MMS • Robert Krzaczynski
Article originally posted on InfoQ.
Microsoft recently announced the addition of a Sentence Similarity scenario to Model Builder. The scenario allows users to train custom sentence similarity models. With this release, installing the separate Model Builder GPU extension is no longer necessary. Microsoft also outlined its plans for the coming months, which cover deep learning, the LightGBM algorithm, and AutoML.
A few months ago, Microsoft released a preview of the Sentence Similarity API, which trains a sentence similarity machine learning model on custom data. It does this by integrating TorchSharp's NAS-BERT implementation with ML.NET, the same transformer-based model that underlies the Text Classification API. Starting from a pre-trained version of this model, the Sentence Similarity API fine-tunes it on the user's custom data.
To use this scenario, users must install or upgrade Model Builder to version 16.14.4, the latest release. The Sentence Similarity scenario supports local training on both CPU and GPU; GPU training requires a CUDA-compatible GPU. More information on configuring the GPU is available in the ML.NET GPU guide.
Starting with version 16.14.4, Model Builder no longer requires a separate GPU extension. In previous versions, GPU support required not only meeting the hardware requirements and installing the appropriate drivers, but also installing the Model Builder GPU extension.
The new scenario has been received positively by the community. For instance, one Reddit user wrote that he had been working on this kind of solution for some time and would use the scenario in his project.
Microsoft also shared its plans for machine learning development in the coming months. First, it will expand its deep learning support: following the existing scenario APIs for text classification and sentence similarity, new scenario APIs are planned for object detection, question answering, and named entity recognition. Second, Microsoft plans to update the version of LightGBM supported in ML.NET and to improve interoperability by allowing LightGBM models to be loaded in their native format. Finally, further improvements to the AutoML API are planned over the next year to enable new scenarios and customizations that simplify machine learning workflows.
The full list of changes is available in the Model Builder release notes.