Last week I gave a talk to a packed Azure User Group on operationalizing machine learning. It's a subject I've spent a lot of time thinking about over the last couple of years. At Elastacloud I've been building solutions for customers in this space for some time now, so it was fairly timely that this was released at Ignite.
For those of you that don't know, there are some new services in Azure which together represent how to build a model locally and push it into production. Before talking about the services themselves, let's first review why we need this on the Microsoft stack.
Models are trained and then continually retrained as new data arrives. AzureML, Microsoft's original machine learning service, could train a model but took ages to do so. This is because training works across a lot of data, sometimes samples, sometimes the whole population, and inevitably takes a long time unless you have a lot of resources. Apache Spark deals with the scale side of training very well by distributing the workload; however, most machine learning is built using Python or R. The former now has a new library, revoscalepy, which is designed to scale across infrastructure.
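To make that concrete, here is a minimal sketch of what distributed training looks like on Spark using pyspark.ml; the input path, column names and choice of algorithm are illustrative rather than anything prescribed by these services.

```python
# Minimal sketch of distributed training on Spark (pyspark.ml).
# The wasb:// paths and the column names f1/f2/f3/label are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("train").getOrCreate()

df = spark.read.csv("wasb:///data/training.csv", header=True, inferSchema=True)

# Assemble the raw columns into the single vector column MLlib expects
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
train = assembler.transform(df)

# Spark distributes the training work across the cluster for us
model = LogisticRegression(featuresCol="features", labelCol="label").fit(train)
model.save("wasb:///models/lr-model")
```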
The other part of machine learning operations is scoring: the ability to produce a prediction from a trained model given a new set of inputs, or features. AzureML did both things, and the second it did better than many other frameworks.
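Scoring is usually the simpler half: load a persisted model and apply it to new feature rows. A minimal sketch, assuming a scikit-learn model saved with joblib; the file name and the shape of the `run` function are my own, not a prescribed interface.

```python
# Minimal scoring sketch. Assumes a scikit-learn model was previously
# persisted to model.pkl with joblib; the path is illustrative.
import joblib
import pandas as pd

model = joblib.load("model.pkl")

def run(input_df: pd.DataFrame) -> list:
    """Score a batch of new feature rows with the trained model."""
    return model.predict(input_df).tolist()
```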
Microsoft decided to split these responsibilities, providing the Batch AI framework for training and the Azure Machine Learning Experimentation and Model Management services to push trained models into production so that they can be versioned and scored. As part of this initiative a new command line plugin, "az ml", has been written which uses the APIs of the two new services, along with a GUI tool called Workbench which allows models to be run and tested and their results evaluated before they are put into production.
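To give a flavour of the workflow, the commands below are roughly the shapes the preview-era "az ml" plugin exposes; I'm quoting these from memory of the docs, so treat the exact flag names as approximate.

```bash
# Rough command shapes from the preview "az ml" CLI; flag names are
# approximate and may have shifted between releases.

# Run a training script locally, then in a local Docker container
az ml experiment submit -c local train.py
az ml experiment submit -c docker train.py

# Register the trained model, then stand it up as a scoring web service
az ml model register --model model.pkl --name mymodel
az ml service create realtime -n myservice --model-file model.pkl -f score.py -r python
```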
A key part of Workbench is the addition of a data flow and data preparation tool which allows datasets to be shaped using scripts and built-in transformers and inspectors. This produces a dataprep package which can be scaled and parallelised when a model is trained and scored in production.
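For illustration, consuming a dataprep package from a training or scoring script looks roughly like the snippet Workbench generates for you; the package file name and dataflow index here are placeholders.

```python
# Roughly the snippet Workbench generates to consume a .dprep package;
# 'mydata.dprep' and the dataflow index are placeholders.
from azureml.dataprep import package

# Run the prepared dataflow and get the result back as a pandas DataFrame
df = package.run("mydata.dprep", dataflow_idx=0)
print(df.head())
```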
There are several local and remote modes of operation for these new services:
- Local Python
- Local Docker
- Remote Kubernetes cluster
- HDInsight
The first two require local scripts and libraries, such as Python's scikit-learn, whereas the third can use revoscalepy or another distributed Python framework if distribution is needed (it probably isn't for scoring), and the fourth will use Spark's MLlib or Microsoft's mmlspark. A local training script for the first mode is sketched below.
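As a concrete example of the Local Python mode, a training script might look like this: train with scikit-learn, then persist the model so it can be registered and deployed for scoring. The dataset and file names are illustrative.

```python
# Minimal local training sketch (the "Local Python" mode), using scikit-learn.
# The toy dataset and model.pkl file name are illustrative.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=200).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))

# Persist the model so it can be registered and pushed into production
joblib.dump(model, "model.pkl")
```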