Powering MLOps with Anomaly Detection, Natural Language Processing, and Recommender Systems

Makpar’s AI/ML and MLOps journey has been both difficult and rewarding. Like many other organizations, we found that the lack of a single proven path to machine learning success left us unsure of how to take the next (or even the first) step on our journey. We decided to use the pressure-tested templates and lessons learned from our DevSecOps journey as a lighthouse to guide us through this fog. That prior experience showed that business and technical teams must work together and share the same priorities, and that the machine learning effort must be supported from the highest levels, with goals set by executive champions and an investment in the technology and processes that enable success.


One of the largest cultural changes our organization had to make was becoming tolerant of failure. Machine learning is an iterative process, one that can only succeed through constant experimentation (and the frequent failure of those experiments). When measuring the results of machine learning efforts, the traditional “project ROI” viewpoint is reductive and can be detrimental to the initiative’s success. To combat this, we created a center of excellence that rallies the community and continues to push for new initiatives.


It is common practice in industry for data scientists who developed a machine learning model to oversee deployment, refreshes, maintenance, and monitoring. As they develop more models, the amount of time spent on support and monitoring can become a significant overhead, limiting the new analyses their experience can be applied to. Through automation of these processes, data scientists can spend their time on adding new features to existing models and innovating to solve other business challenges. 
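As a concrete illustration of what this automation could look like, the minimal sketch below (hypothetical metric names, not our production implementation) uses an anomaly detector over a deployed model’s daily operational metrics so that a data scientist is only pulled in when something looks off:

```python
# Minimal sketch, not a production implementation: an anomaly detector watches
# a deployed model's daily operational metrics and flags the model for review,
# replacing routine manual checks. Metric names are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical daily metrics for one deployed model:
# [p95_latency_ms, error_rate, data_drift_score]
history = np.array([
    [42.0, 0.012, 0.08],
    [40.5, 0.011, 0.07],
    [43.1, 0.013, 0.09],
    [41.7, 0.010, 0.08],
    [42.6, 0.012, 0.07],
    # ...more days of normal operation
])

detector = IsolationForest(contamination=0.05, random_state=0).fit(history)

def needs_attention(todays_metrics):
    """Return True if today's metrics look anomalous relative to history."""
    return detector.predict([todays_metrics])[0] == -1

if needs_attention([55.0, 0.045, 0.31]):
    print("Anomaly detected - route this model to a data scientist for review")
```

In practice, such a detector would be retrained as metric history accumulates, and its alerts could feed the orchestration layer described below rather than a manual review queue.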


Provisioning cloud infrastructure can be a challenging process, requiring manual actions, custom scripts, template maintenance, or learning domain-specific languages. Additionally, though cloud computing is often cited as a cheaper alternative to traditional on-premises options, ML/analytics solutions in particular can be a heavy drain on resources. Controlling this cost requires solutions designed intentionally with cloud consumption in mind.


This still leaves one bottleneck in the development lifecycle of machine learning (ML) projects: the siloed functions of data scientists and MLOps engineers. To break down these silos, and to avoid excessive training and spin-up cycles, the skill level required to perform MLOps must be reduced. Once that is accomplished, we can put the power of provisioning and deployment into the hands of the ML developer. Enter MLOps powered by machine learning.
Our aim with this project is threefold: use anomaly detection to automate common MLOps tasks and suggest system improvements, natural language processing to determine development requirements from plain-English requests, and a recommender system to suggest a best fit from our IaC cookbook of optimized MLOps configurations. The machine learning components of this system will be layered over our existing MLOps orchestrator patterns (based on the AWS open-source repository here: https://github.com/aws-solutions/mlops-workload-orchestrator).
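To make that threefold aim more tangible, here is a minimal, heavily simplified sketch of the intended flow: a plain-English request is turned into a recommendation from an IaC cookbook. The cookbook entries, request text, and TF-IDF similarity are placeholders standing in for the trained NLP and recommender components, not the actual models:

```python
# Minimal sketch of the intended flow, with made-up cookbook entries and a
# TF-IDF similarity standing in for the trained NLP and recommender components.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical "IaC cookbook" of pre-optimized MLOps pipeline configurations.
cookbook = {
    "batch-training-gpu": "scheduled batch training pipeline, GPU instances, large datasets",
    "realtime-inference": "low latency real-time inference endpoint with autoscaling",
    "byom-deployment": "deploy an existing pretrained model from a container registry",
}

# A plain-English development request from an ML developer.
request = "We need a low-latency endpoint that scales with traffic for real-time scoring"

# Vectorize the request alongside the cookbook descriptions and recommend the
# closest match as the best-fit configuration.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(cookbook.values()) + [request])
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
best_fit = list(cookbook)[scores.argmax()]
print(f"Recommended configuration: {best_fit}")  # -> realtime-inference
```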
 

  • Data Availability
  • Right Data Models
  • Achieving Pipeline Independence

For government agencies to realize the full potential that machine learning offers every industry, they need a diverse and complex set of skills. To further complicate things, these skills are often specialized, with little crossover. This project aims to simplify and automate every aspect of the machine learning development lifecycle other than developing new models or adding new features to an existing system. Our aim is to augment the productivity of the government's existing machine learning talent so that they can quickly launch new pilot projects and easily scale up existing ones without sacrificing quality.

We are currently in the design, data collection, and data grooming stage.

This solution is built with three primary components: 

  1. The orchestrator component, created by deploying the solution through Terraform to the desired cloud provider.
  2. The machine learning models layered over the orchestrator. These models will perform the monitoring, cloud infrastructure provisioning/configuration recommendations, and cost optimization that would typically be required of the ML developer.
  3. The pipelines, deployed either by calling the solution’s API or by committing a configuration file to a code repository (see the illustrative sketch below).
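As an illustration of the third component, the sketch below shows the two deployment paths side by side: calling the orchestrator’s API with a pipeline configuration, and writing the same configuration to a file for commit into a repository. The endpoint URL, field names, and file name are hypothetical placeholders, not the solution’s actual schema:

```python
# Illustrative sketch only: the endpoint URL, field names, and file name are
# hypothetical placeholders, not the orchestrator's actual schema.
import json
import urllib.request

pipeline_config = {
    "pipeline_type": "byom_realtime_inference",
    "model_artifact_location": "s3://example-bucket/models/example-model.tar.gz",
    "inference_instance": "ml.m5.large",
}

# Path 1: call the solution's API with the configuration.
api_request = urllib.request.Request(
    "https://orchestrator.example.com/provisionpipeline",  # placeholder endpoint
    data=json.dumps(pipeline_config).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(api_request)  # uncomment against a real deployment

# Path 2: write the same configuration to a file and commit it to the code
# repository that the orchestrator watches.
with open("mlops-config.json", "w") as f:
    json.dump(pipeline_config, f, indent=2)
```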

Use Case Business Process Model

Date: Dec 2022
Life Cycle: Ideation
Organization Type: Industry
Vertical Market: Cloud, Data, IT Modernization