ML.NET 2.0 improves text classification

Microsoft has introduced ML.NET 2.0, a new version of its open source, cross-platform machine learning framework for .NET. The upgrade adds text classification and expanded automated machine learning (AutoML) capabilities.

Unveiled on November 10, ML.NET 2.0 is coupled with a new version of ML.NET Model Builder, a visual developer tool for building machine learning models for .NET applications. Model Builder introduces a text classification scenario powered by the ML.NET Text Classification API.

Previewed in June, the Text Classification API enables developers to train custom models to classify raw text data. The API takes a pre-trained TorchSharp NAS-BERT model from Microsoft Research and fine-tunes it on the developer’s own data. Model Builder scenarios support local training on CPUs or CUDA-compatible GPUs.

Also in ML.NET 2.0:

  • Preconfigured automated machine learning pipelines for binary classification, multiclass classification, and regression make it easy to get started with machine learning.
  • Data preprocessing can be automated using the AutoML featurizer.
  • Developers can choose which trainers to use as part of the training process. They can also choose the tuning algorithms used to find the optimal hyperparameters.
  • Advanced AutoML training options are introduced to select the trainer and choose an evaluation metric to optimize.
  • A sentence similarity API, using the same underlying TorchSharp NAS-BERT model, computes a numeric value representing the similarity of two sentences.
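
To make the idea of "a numeric value representing the similarity of two sentences" concrete, here is a minimal Python sketch. It is not the ML.NET API and does not use the NAS-BERT model; it swaps in a much simpler technique, cosine similarity over word counts, purely for illustration. The function names `tokenize` and `cosine_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase, then keep runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine_similarity(sentence_a, sentence_b):
    # Illustrative stand-in only: a bag-of-words score, not a learned model.
    counts_a = Counter(tokenize(sentence_a))
    counts_b = Counter(tokenize(sentence_b))

    # Dot product over the words the two sentences share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)

    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # A sentence with no tokens has no direction to compare.
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1 means the two sentences use nearly the same words, while 0 means they share none. A model-based approach such as the one described above would presumably also capture similarity of meaning between sentences that share few words, which this word-count sketch cannot.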

Future plans for ML.NET include expanding the coverage of deep learning and emphasizing the use of the LightGBM framework for classical machine learning tasks such as regression and classification. The developers behind ML.NET also intend to enhance the AutoML API to enable new scenarios and customizations and to simplify machine learning workflows.

Copyright © 2022 IDG Communications, Inc.