Doris Xin
Article originally posted on InfoQ.
Key Takeaways
- Automating the ML pipeline could put machine learning into the hands of more people, including those who do not have years of data science training and a data center at their disposal. Unfortunately, true end-to-end automation is not yet available.
- AutoML is currently most frequently used in hyperparameter tuning and model selection. The failure of AutoML to be end-to-end can actually cut into the efficiency improvements; moving aspects of the project between platforms and processes is time-consuming, risks mistakes, and takes up the mindshare of the operator.
- AutoML systems should seek to balance human control with automation: focusing on the best opportunities for automation and letting humans perform the parts that are most suited for humans.
- The collaboration between humans and automation tooling can be better than either one working alone. By combining the manual and automated strategies, practitioners can increase confidence and trust in the final product.
- ML practitioners are most open to automating the mechanical engineering tasks involved in ML operations (MLOps), such as data pipeline building, model serving, and model monitoring. AutoML should focus on these tasks: they are high-leverage, and automating them has a much lower chance of introducing bias.
The current conversation about automated machine learning (AutoML) is a blend of hope and frustration.
Automation to improve machine learning projects comes from a noble goal. By streamlining development, ML projects can be put in the hands of more people, including those who do not have years of data science training and a data center at their disposal.
End-to-end automation, though some providers may promise it, is not available yet. There are capabilities in AutoML, particularly in modeling tasks, that practitioners from novices to advanced data scientists are using today to enhance their work. However, given how automation and human operators interact, it may never be optimal for most data science projects to automate the process completely. In addition, AutoML tools struggle to address fairness, explainability, and bias, arguably the central issues in artificial intelligence (AI) and ML today. AutoML systems should emphasize ways to decrease the amount of time data engineers and ML practitioners spend on mechanistic tasks, and should interact well with practitioners of different skill levels.
Along with my colleagues Eva Yiwei Wu, Doris Jung-Lin Lee, Niloufar Salehi, and Aditya Parameswaran, I presented a paper at the ACM CHI Conference on Human Factors in Computing Systems studying how AutoML fits into machine learning workflows. The paper details an in-depth qualitative study of sixteen AutoML users, ranging from hobbyists to industry researchers, across a diverse set of domains.
Our questions aimed to uncover their experiences working with and without AutoML tools and their perceptions of those tools, so that we could ultimately gain a better understanding of AutoML user needs; the strengths and weaknesses of AutoML tools; and the respective roles of the human developer and automation in ML development.
The many faces of AutoML
The idea of automating some elements of a machine learning project is sensible. Many data scientists have had the dream of democratizing data science through an easy-to-use process that puts those capabilities in more hands. Today’s AutoML solutions fall into three categories: cloud provider solutions, commercial platforms, and open source projects.
Cloud provider solutions
The hosted compute resources of Google Cloud AutoML, Microsoft Azure Automated ML, and Amazon SageMaker Autopilot provide all the familiar benefits of cloud solutions: integration with the cloud data ecosystem for more end-to-end support, pay-as-you-go pricing models, low barrier to entry, and no infrastructure concerns. They also bring with them common cloud service frustrations a project may face: lack of customizability, vendor lock-in, and an opaque process. They all offer no-code solutions that appeal to the citizen scientist or innovator. But for the more advanced user, the lack of transparency is problematic for many scenarios; as of this writing, only Amazon SageMaker Autopilot allows the user to export code and intermediate results. Many users find themselves frustrated by the lack of control over model selection and limited visibility into the models utilized.
It is a familiar story with the Google, Amazon, and Microsoft solutions in other categories: cloud products seek to provide as much of an end-to-end solution as possible and an easy-to-use experience. The design tradeoff is low visibility and fewer customization options. For example, Google AutoML only provides the final model; it yields neither the training code nor any intermediate results during training. Microsoft Azure gives much more information about the training process itself, but it does not give the code for the models it trains.
Commercial platforms
The most frequently discussed commercial AutoML products are DataRobot and H2O Driverless AI. The companies behind these products seek to provide end-to-end AI and ML platforms for business users as well as data scientists, particularly those using on-premises compute. They focus on operationalizing the model: launching the application or its results for real use.
Open source projects
There are many open source projects in the machine learning space; some of the better-known are Auto-sklearn, AutoKeras, and TPOT. The open source process allows the best of data science and software development to come together. However, these projects often lack the post-processing and deployment support that is a primary focus of the commercial platforms.
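To make this concrete, here is a minimal sketch of what working with one of these libraries looks like, using TPOT’s scikit-learn-style interface on a bundled toy dataset (the dataset choice and search budgets are illustrative, not recommendations):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from tpot import TPOTClassifier

# Toy dataset; any (X, y) pair in scikit-learn format works.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# TPOT searches over whole pipelines (preprocessing + model +
# hyperparameters) using genetic programming; budgets kept small here.
automl = TPOTClassifier(generations=5, population_size=20,
                        random_state=42, verbosity=2)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))

# Unlike most hosted services, the winning pipeline is exported as
# plain scikit-learn code that can be inspected and edited.
automl.export("best_pipeline.py")
```

The export step is worth noting: the result comes back as ordinary code rather than an opaque artifact, which addresses some of the visibility complaints raised about the cloud offerings above.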
What AutoML brings to a project
Those we surveyed had frustrations, but they were still able to use AutoML tools to make their work more fruitful. In particular, AutoML tools are used for modeling tasks. Although most AutoML providers state that their products are “end-to-end,” the pre-processing and post-processing phases are not greatly impacted by AutoML.
The data collection, data tagging, and data wrangling of pre-processing are still tedious, manual processes. There are utilities that provide some time savings and aid in simple feature engineering, but overall, most practitioners do not make use of AutoML as they prepare data.
In post-processing, AutoML offerings have some deployment capabilities, but deployment is famously a problematic interaction between MLOps and DevOps in need of automation. Take, for example, one of the most common post-processing tasks: generating reports and sharing results. While cloud-hosted AutoML tools are able to auto-generate reports and visualizations, our findings show that users still adopt manual approaches to modify the default reports. The second most common post-processing task is deploying models. Automated deployment was available only to users of hosted AutoML tools, and even then limitations remained for security or end-user experience reasons.
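As a rough illustration of the gap, the sketch below produces the kind of baseline report a tool can generate automatically; the stakeholder-specific framing and custom visuals that practitioners told us about still happen by hand (the model and output file name are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
preds = model.predict(X_test)

# Auto-generated reports cover roughly this much; anything beyond it
# (audience-specific framing, custom charts) remains manual work.
report = classification_report(y_test, preds)
print(report)
print(confusion_matrix(y_test, preds))

with open("model_report.txt", "w") as f:
    f.write(report)
```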
The failure of AutoML to be end-to-end can actually cut into the efficiency improvements; moving aspects of the project between platforms and processes is time-consuming, risks mistakes, and takes up the mindshare of the operator.
AutoML is most frequently and enthusiastically used in hyperparameter tuning and model selection.
Hyperparameter tuning and model selection
When the possible configurations quickly explode into the billions, as hyperparameter combinations do in many projects, automation is a welcome aid. AutoML tools can try possibilities and score them to accelerate the process and improve outcomes. This was the first AutoML feature to become available, and perhaps its maturity is why it is so popular: eleven of the sixteen users we interviewed used AutoML hyperparameter-tuning capabilities.
Improving key foundations of the modeling phase saves a lot of time. Still, users found that the AutoML solution could not be left entirely alone with the task. This is a theme across AutoML features: knowledge gained from the human’s prior experience and the ability to understand context can, if the user interface allows, eliminate dead ends. By limiting the scope of the AutoML process, a human significantly reduces the cost of the step.
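A minimal sketch of this human-in-the-loop pattern, using scikit-learn’s RandomizedSearchCV: the practitioner encodes prior experience by pruning the search space and capping the budget before handing the rest to automation (the ranges and budget below are illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_digits(return_X_y=True)

# Human judgment: restrict the search to ranges known from experience
# to work for this kind of data, instead of roaming the full space.
param_distributions = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 15),
    "min_samples_leaf": randint(1, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=25,          # explicit budget caps the cost of the step
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```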
AutoML assistance to model selection can aid the experienced ML practitioner in overcoming their own assumptions by quickly testing other models. One practitioner in our survey called the AutoML model selection process “the no assumptions approach” that overcame their own familiarity and preferences.
AutoML for model selection is a more efficient way of developing and testing models. It also improves, though perhaps not dramatically, the effectiveness of model development.
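A sketch of that “no assumptions” sweep, comparing several model families under identical cross-validation (the candidate list and dataset are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Trying model families side by side counters a practitioner's habit
# of reaching for a single favorite algorithm.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```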
The potential drawback is that, left unchecked, model selection is very resource-heavy, encompassing ingesting and transforming data and validating data distributions. It is essential to weed out models that are not viable for deployment; otherwise valuable compute resources are wasted, which can even crash pipelines. Many AutoML tools are unaware of the resources available to them, which causes system failures. The human operator can pare down the possible models, though this practice does fly in the face of the idea of overcoming operator assumptions. The scientist or operator has an important role to play in discerning which experiments are worth handing to the AutoML system and which are a waste of time.
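Some open source tools do let the operator state a budget explicitly; a sketch using auto-sklearn’s time and memory limits (the values are illustrative):

```python
import autosklearn.classification
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Explicit budgets keep the search from exhausting the machine: a
# wall-clock limit for the whole search, a per-candidate time limit,
# and a per-job memory cap.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=600,   # seconds for the entire search
    per_run_time_limit=60,         # seconds per candidate model
    memory_limit=3072,             # MB per job
)
automl.fit(X_train, y_train)
print(automl.score(X_test, y_test))
```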
Informing manual development
Despite any marketing or names that suggest an AutoML platform is automatic, that is not what users experience. There is nothing entirely push-button, and the human operator is still called upon to make decisions. However, the tool and the human working together can be better than either one working alone.
By combining the manual and automated strategies, practitioners can increase confidence and trust in the final product. Practitioners shared that running the automated process validated that they were on the right track. They also like to use it for rapid prototyping. It works the other way as well; by performing a manual process in parallel with the AutoML process, practitioners have built trust in their AutoML pipeline by showing that its results are akin to a manual effort.
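A sketch of that mutual validation, assuming a scikit-learn workflow: score a hand-configured baseline alongside an automated search over the same model family and check that the results agree (the models and grid are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Manual strategy: a model configured from prior experience.
manual = RandomForestClassifier(n_estimators=200, max_depth=8, random_state=0)
manual_score = cross_val_score(manual, X, y, cv=5).mean()

# Automated strategy: a small grid search over the same family.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [100, 200, 400], "max_depth": [4, 8, None]},
    cv=5,
)
search.fit(X, y)

# If the two results are close, each raises confidence in the other.
print(f"manual: {manual_score:.3f}, automated: {search.best_score_:.3f}")
```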
Reproducibility
Data scientists do tend to be somewhat artisanal. Since an automated process can be reproduced and can standardize code, organizations can have a more enduring repository of ML and AI projects even as staff turns over. New models can be created more rapidly, and teams can compare models in an apples-to-apples format.
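A minimal sketch of what that standardization can look like in a scikit-learn workflow: fix the random seed, bundle preprocessing and model into one pipeline, and persist the fitted artifact so successors can reload it for apples-to-apples comparisons (the file name is a placeholder):

```python
import joblib
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# A fixed random_state makes the run repeatable; bundling preprocessing
# and model into one Pipeline standardizes what gets stored.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=5000, random_state=42)),
])
pipeline.fit(X, y)

# Persist the exact fitted artifact so a successor (or a future you)
# can reload it and compare new candidates against it.
joblib.dump(pipeline, "project_model_v1.joblib")
restored = joblib.load("project_model_v1.joblib")
print(restored.score(X, y))
```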
In context: The big questions of responsible AI
At this point in the growth of AI, all practitioners should be considering bias and fairness; there is an ethical and social imperative to do so. Documenting efforts to minimize bias is also important in building trust with clients, users, and the public.
Explainability is the key concept in how we as practitioners earn user trust. While AutoML providers are making efforts to build in anti-bias processes and explainability, there is a long way to go; explainability in particular has not been effectively addressed. Being able to inspect the model and identify the source of suspect outcomes helps build trust and, should there be a problem, points toward a repair. Simply having a human involved does not guarantee the absence of bias, but without visibility, the likelihood of bias grows. Transparency mechanisms like visualization increase user understanding of and trust in AutoML, but they do not suffice on their own; humans need agency in order to establish the level of understanding required to trust AutoML.
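One concrete form that visibility can take is a model-agnostic importance check; the sketch below uses scikit-learn’s permutation importance on a stand-in model (in practice the model would be whatever the AutoML tool produced):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Permutation importance asks: how much does the score drop when one
# feature is shuffled? Features with outsized influence are leads for
# a bias or fairness investigation.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"{data.feature_names[i]}: {result.importances_mean[i]:.4f}")
```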
In our interviews, we discovered that practitioners switch to completely manual development to gain the agency, interpretability, and trust their projects require.
Human-automation partnership
A contemporary end-to-end AutoML system is by necessity highly complex. Its users come from different backgrounds and have diverse goals but are all likely doing difficult work under a lot of pressure. AutoML can be a significant force multiplier for ML projects, as we learned in our interviews.
While the term “automated ML” might imply a push-button, one-stop-shop nature, ML practitioners are valuable contributors and supervisors to AutoML. Humans improve the efficiency, effectiveness and safety of AutoML tools by instructing, advising and safeguarding them. With this in mind, the research study also identified guidelines to consider for successful human-automation partnerships:
- Focus on automating the parts of the process that consume valuable data engineering time. Currently there are only 2.5 applicants for every data engineer position (compare this to 4.76 candidates per data scientist job listing or 10.8 candidates per web developer listing, according to the DICE 2020 Tech Job Report); there is much demand and little supply for this skill set.
- Establish trust with users as well as agency for those users. When AutoML systems deprive users of visibility and control, those systems cannot become part of the real-world use cases where they could have an impact.
- Develop user experiences and interfaces that adapt to user proficiency. The trend for AutoML systems is to develop more capabilities for novice and low-code users. For those users, exporting raw model-training code, where available, is often a distraction from their workflow, leading to wasted time exploring low-impact code. Further, offering novices the option to see raw model-training code forces them to self-select the appropriate modality (the information and customizability provided to the user). Matching expert users with the “novice modality” causes frustration due to lack of control; in the other direction, novices get distracted by things they don’t understand.
- Provide an interactive, multi-touch exploration for more advanced users. Iteration is the habitual and necessary work style of ML practitioners, and the tool should aid in the process rather than interrupt it.
- Let the platform be a translator and shared source of truth for varied stakeholders. Conversations between business users and ML practitioners would greatly benefit from a shared platform and vocabulary. Providing a trusted space for that conversation would remove a translation burden that frequently falls on ML engineers.
AutoML systems should seek to balance human control with automation: focusing on the best opportunities for automation while letting humans perform the parts for which they are best suited, applying contextual awareness and domain expertise.
Is AutoML ready?
As a collection of tools, AutoML capabilities have proven value but need to be vetted more thoroughly. Segments of these products can help the data scientist improve outcomes and efficiencies. AutoML can provide rapid prototyping. Citizen scientists, innovators, and organizations with fewer resources can employ machine learning. However, there is not an effective end-to-end AutoML solution available now.
Are we focusing on the right things in the automation of ML today? Should it be hyperparameter tuning or data engineering? Are the expectations appropriate?
In our survey, we found that practitioners are much more open to automating the mechanical parts of ML development and deployment to accelerate the iteration speed for ML models. However, AutoML systems seem to focus on the processes over which the users want to have the most agency. The issues around bias in particular suggest that AutoML providers and practitioners should be aware of the deficiencies of AutoML for model development.
The tasks that are more amenable to automation are those involved in machine learning operations (MLOps): data pipeline building, model serving, and model monitoring. What’s different about these tasks is that they help users build models rather than removing the user from the model-building process (which, as we noted earlier, harms transparency and invites bias).
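As an example of this kind of mechanical, automatable MLOps work, the sketch below implements a simple monitoring check: comparing a live feature’s distribution to its training distribution with a two-sample Kolmogorov-Smirnov test (the data and alert threshold are placeholders):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # stand-in
live_feature = rng.normal(loc=0.3, scale=1.0, size=1000)      # drifted

# A distribution shift in inputs often degrades a model silently;
# a scheduled statistical test is cheap to automate per feature.
stat, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:  # alert threshold chosen by the team
    print(f"Possible drift detected (KS={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift detected")
```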
MLOps is a nascent field that requires repetitive engineering work and uses tooling and procedures that are relatively ill-defined. With a talent shortage for data engineering (on average, it takes 18 months to fill a job position), MLOps represents a tremendous opportunity for automation and standardization.
So rather than focus on AutoML, the future of automation for ML and AI rests in the ability for us to realize the potential of AutoMLOps.