Comparing Machine Learning as a Service: Amazon, Microsoft Azure, Google Cloud AI, IBM Watson
MMS • RSS
Article originally posted on Data Science Central. Visit Data Science Central
For most businesses, machine learning seems close to rocket science, appearing expensive and talent demanding. And, if you’re aiming at building another Netflix recommendation system, it really is. But the trend of making everything-as-a-service has affected this sophisticated sphere, too. You can jump-start an ML initiative without much investment, which would be the right move if you are new to data science and just want to grab the low hanging fruit.
One of ML’s most inspiring stories is the one about a Japanese farmer who decided to sort cucumbers automatically to help his parents with this painstaking operation. Unlike the stories that abound about large enterprises, the guy had neither expertise in machine learning, nor a big budget. But he did manage to get familiar with TensorFlow and employed deep learning to recognize different classes of cucumbers.
By using machine learning cloud services, you can start building your first working models, yielding valuable insights from predictions with a relatively small team. We’ve already discussed machine learning strategy. Now let’s have a look at the best machine learning platforms on the market and consider some of the infrastructural decisions to be made.
What is machine learning as a service
Machine learning as a service (MLaaS) is an umbrella definition of various cloud-based platforms that cover most infrastructure issues such as data pre-processing, model training, and model evaluation, with further prediction. Prediction results can be bridged with your internal IT infrastructure through REST APIs.
Amazon Machine Learning services, Azure Machine Learning, Google Cloud AI, and IBM Watson are four leading cloud MLaaS services that allow for fast model training and deployment. These should be considered first if you assemble a homegrown data science team out of available software engineers. Have a look at our data science team structures story to have a better idea of roles distribution.
Within this article, we’ll first give an overview of the main machine-learning-as-a-service platforms by Amazon, Google, Microsoft, and IBM, and will follow it by comparing machine learning APIs that these vendors support. Please note that this overview isn’t intended to provide exhaustive instructions on when and how to use these platforms, but rather what to look for before you start reading through their documentation.
Machine learning services for custom predictive analytics tasks
If you’re in the market for automation and drag-and-drop interface, first check Microsoft ML Studio. In terms of platforms for custom modeling, all four providers above suggest similar products
Predictive analytics with Amazon ML
Amazon Machine Learning services are available on two levels: predictive analytics with Amazon ML and the SageMaker tool for data scientists.
Amazon Machine Learning for predictive analytics is one of the most automated solutions on the market and the best fit for deadline-sensitive operations. The service can load data from multiple sources, including Amazon RDS, Amazon Redshift, CSV files, etc. All data preprocessing operations are performed automatically: The service identifies which fields are categorical and which are numerical, and it doesn’t ask a user to choose the methods of further data preprocessing (dimensionality reduction and whitening).
Prediction capacities of Amazon ML are limited to three options: binary classification, multiclass classification, and regression. That said, this Amazon ML service doesn’t support any unsupervised learning methods, and a user must select a target variable to label it in a training set. Also, a user isn’t required to know any machine learning methods because Amazon chooses them automatically after looking at the provided data.
This high automation level acts both as an advantage and disadvantage for Amazon ML use. If you need a fully automated yet limited solution, the service can match your expectations. If not, there’s SageMaker.
Amazon SageMaker and frameworks-based services
SageMaker is a machine learning environment that’s supposed to simplify the work of a fellow data scientist by providing tools for quick model building and deployment. For instance, it provides Jupyter, an authoring notebook, to simplify data exploration and analysis without server management hassle. Amazon also has built-in algorithms that are optimized for large datasets and computations in distributed systems. These include:
- Linear learner is a supervised method for classification and regression
- Factorization machines is for classification and regression designed for sparse datasets
- XGBoost is a supervised boosted trees algorithm that increases prediction accuracy in classification, regression, and ranking by combining the predictions of simpler algorithms
- Image classification is based on ResNet, which can also be applied for transfer learning
- Seq2seq is a supervised algorithm for predicting sequences (e.g. translating sentences, converting strings of words into shorter ones as a summary, etc.)
- K-means is an unsupervised learning method for clustering tasks
- Principal component analysis is used for dimensionality reduction
- Latent Dirichlet allocation is an unsupervised method used for finding categories in documents
- Neural topic model (NTM) is an unsupervised method that explores documents, reveals top ranking words, and defines the topics (users can’t predefine topics, but they can set the expected number of them)
- DeepAR forecasting is a supervised learning algorithm used for forecasting time series that employs recurrent neural networks (RNN)
- BlazingText is a natural language processing (NLP) algorithm built on the Word2vec basis, which allows it to map words in large collections of texts with vector representations
- Random Cut Forest is an anomaly detection unsupervised algorithm capable of assigning anomaly scores to each data point
Built-in SageMaker methods largely intersect with the ML APIs that Amazon suggests, but here it allows data scientists to play with them and use their own datasets.
If you don’t want to use these, you can add your own methods and run models via SageMaker leveraging its deployment features. Or you can integrate SageMaker with TensorFlow, Keras, Gluon, Caffe2, Torch, MXNet, and other machine learning libraries.
Generally, Amazon machine learning services provide enough freedom for both experienced data scientists and those who just need things done without digging deeper into dataset preparations and modeling. This would be a solid choice for companies that already use Amazon cloud services and don’t plan to transition to another cloud provider.
Microsoft Azure Machine Learning Studio
Azure Machine Learning platform is aimed at setting a powerful playground both for newcomers and experienced data scientists. The roster of Microsoft machine learning products is similar to the ones from Amazon, but Azure, as of today, seems more flexible in terms of out-of-the-box algorithms.
Services from Azure can be divided into two main categories: Azure Machine Learning Studio and Bot Service. Let’s find out what’s under the hood of Azure ML Studio. We’ll return to Bot Service in the section dedicated to specific APIs and tools.
ML Studio is the main MLaaS package to look at. Almost all operations in Azure ML Studio must be completed using a graphical drag-and-drop interface. This includes data exploration, preprocessing, choosing methods, and validating modeling results.
Approaching machine learning with Azure entails some learning curve. But it eventually leads to a deeper understanding of all major techniques in the field. The Azure ML graphical interface visualizes each step within the workflow and supports newcomers. Perhaps the main benefit of using Azure is the variety of algorithms available to play with. The Studio supports around 100 methods that address classification (binary+multiclass), anomaly detection, regression, recommendation, and text analysis. It’s worth mentioning that the platform has one clustering algorithm (K-means).
Another big part of Azure ML is Cortana Intelligence Gallery. It’s a collection of machine learning solutions provided by the community to be explored and reused by data scientists. The Azure product is a powerful tool for starting with machine learning and introducing its capabilities to new employees.
Microsoft Azure Machine Learning Services
September 2017 saw Microsoft introduce a new set of ML-focused products that received the umbrella name Azure Machine Learning Services. The release has been responsible for some confusion in the Azure developer community as it required engineers to choose between the two different platforms that can’t be cross-integrated. So, we asked Matt Winkler, a group program manager at Microsoft, who works with Azure AI products to give us some inside info about their platform:
“Azure ML Services is our next generation infrastructure for building and deploying models at scale, using any tool or framework. Azure ML Services provides end-to-end lifecycle management, keeping track of all of your experiments across your entire team, storing code, config, parameter settings, and environment details to make it easy to rank, search, and replicate any experiment done by your team. Once you have a model you like, you can easily encapsulate it in a container and deploy to Azure, on-prem, or IOT devices, and it’s easy to scale and manage as it is “just another container” that runs on Kubernetes. Azure ML Services makes it easy to start locally, in a Python editor or notebook of your choice, and then easily compute in Azure and scale up/out when you need it.” says Matt.
Basically, the services suggest a support environment to build models, experiment with them, and use a broad variety of open source components and frameworks. Unlike ML Studio, it doesn’t have built-in methods and requires custom model engineering. The platform is aimed at rather experienced data scientists to operate it. And if you have the right team, ML Services offers a powerful toolset to manage ML experiments, use popular frameworks like TensorFlow, scikit-learn, etc. (which isn’t available with ML Studio), and deploy models into production in a third-party service like Docker.
Let’s have a closer look at what the ML Services platform suggests.
Python packages. These proprietary packages have libraries and functions that aim at four main groups of tasks: computer vision, forecasting, text analysis, and hardware acceleration.
Experimentation. With any Python tools and frameworks, engineers can build different models, compare them, set the project to specific historic configuration, and continue development from any moment in history.
Model management. The tool provides an environment to host, version, manage, and monitor models that run on Azure, on-premises, or even Edge devices.
Workbench. This product is a convenient desktop and command-line environment with dashboards and evaluation tools to track model development.
Visual Studio Tools for AI. Basically, this extension adds tools to the VS IDE to work with deep learning and other AI products.
If you still don’t know whether you should stick with Azure ML Studio or ML Services, Matt Winkler suggests, “We think of them as two different capabilities of the same service – Azure Machine Learning – that serve different types of customers. People newer to the ML space, and those not familiar with coding (analysts, data folks, etc.) love the convenience of Azure ML Studio, whereas professional data scientists and AI developers who are comfortable with Python prefer the capabilities in Azure ML services.”
Google Prediction API
Google provides AI services on two levels: a machine learning engine for savvy data scientists and highly automated Google Prediction API. Unfortunately, Google Prediction API has been deprecated recently and Google has pulled the plug on April 30, 2018.
The Prediction API resembled Amazon ML. Its minimalistic approach narrowed down to solving two main issues: classification (both binary and multiclass) and regression. Trained models could be deployed through the REST API interface.
Google doesn’t disclose exactly which algorithms were utilized for drawing predictions and didn’t allow engineers to customize models. On the other hand, Google’s environment was the best fit for running machine learning within tight deadlines and the early launch of the ML initiative. But it seems that the product wasn’t nearly as popular as Google expected. It’s a shame that those who were using Prediction API will have to “recreate existing models” using other platforms as the end-of-life FAQ suggests.
So, what’s coming instead?
Google is testing Cloud AutoML. The product is currently in alpha, so it doesn’t even have documentation yet. The early pieces of information claim that AutoML will allow people without data science expertise to train models on their data in an automated manner. The first product to be launched will be AutoML vision capable of building custom image recognition models. In the longer run, Google expects to cover even more areas.
Google Cloud Machine Learning Engine
High automation of Prediction API was available at the cost of flexibility. Google ML Engine is the direct opposite. It caters to experienced data scientists, it’s very flexible, and it suggests using cloud infrastructure with TensorFlow as a machine learning driver. Additionally, Google is testing a number of other popular frameworks like XGBoost, scikit-leran, and Keras. So, ML Engine is pretty similar to SageMaker in principle.
TensorFlow is another Google product, which is an open source machine learning library of various data science tools rather than ML-as-a-service. It doesn’t have visual interface and the learning curve for TensorFlow would be quite steep. However, the library is also targeted at software engineers that plan transitioning to data science. Google TensorFlow is quite powerful, but aimed mostly at deep neural network tasks.
Basically, the combination of TensorFlow and Google Cloud service suggests infrastructure-as-a-service and platform-as-a-service solutions according to the three-tier model of cloud services. We talked about this concept in our whitepaper on digital transformation. Have a look, if you aren’t familiar with it.
IBM Watson Machine Learning Studio
IBM suggests a single machine learning platform both for experienced data scientists and newcomers in the sphere. Technically, the system offers two approaches: automated and manual – for expert practitioners. Similar to a deprecated Google Prediction API or still operational Amazon ML, Watson Studio has a model builder which brings to mind a fully automated data processing and model building interface that needs little to no training to start processing data, preparing models, and deploying them into production.
The automated part can solve three main types of tasks: binary classification, multiclass classification, and regression. You can choose either a fully automated approach or manually pick the ML method to be used. Currently, IBM has ten methods to cover these three groups of tasks:
- Logistic regression
- Decision tree classifier
- Random forest classifier
- Gradient boosted tree classifier
- Naive Bayes
- Linear regression
- Decision tree regressor
- Random forest regressor
- Gradient boosted tree regressor
- Isotonic regression
Separately, IBM offers deep neural network training workflow with flow editor interface similar to the one used in Azure ML Studio.
If you’re looking for advanced capabilities, IBM ML has notebooks such as Jupiter to program models manually using popular frameworks like TensorFlow, scikit-learn, PyTorch, and others.
To wrap up with machine learning as a service (MLaaS) platforms, it seems that Azure has currently the most versatile toolset on the MLaaS market. It covers the majority of ML-related tasks, provides two distinct products for building custom models, and has a solid set of APIs for those who don’t want to attack data science with their bare hands.
Machine learning APIs from Amazon, Microsoft, Google, and IBM comparison
Besides full-blown platforms, you can use high-level APIs. These are the services with trained models under the hood that you can feed your data into and get results. APIs don’t require machine learning expertise at all. Currently, the APIs from these four vendors can be broadly divided into three large groups:
1) text recognition, translation, and textual analysis
2) image + video recognition and related analysis
3) other, that includes specific uncategorized services