I recently had the opportunity to spend some time speaking with Maxim Lukiyanov, Principal Program Manager at Microsoft’s AzureML team. I asked him what exciting things he’d been working on and he gave me an in-depth overview of the new Notebook VMs in the Azure ML Workspace service. I previously touched on these and how they share a heritage with the Data Science VMs and give Data Scientists a simple, standard and manageable way to provision compute and access the resources of their ML workspace.
Even though the VMs are still in preview, he mentioned the significant investment going into them and touched on how it was possible to boost the development experience for a Python-based Data Scientist by using one of my favourite recent Microsoft announcements, Visual Studio Code Remote SSH. This lets people develop against remote machines while still using the VS Code features. I had to try it myself because this represents another level of tooling for Data Scientists: the combination of a leading editor like VS Code and interoperability with Jupyter. Here are the three steps to using Visual Studio Code to create a great environment for your ML workspace. If code's not your thing, keep scrolling for a transcript of my interview with Maxim:
1. Creating the Notebook VM
Getting started is really easy with Azure ML Notebook VMs; you go into the Azure portal, access your Azure ML Workspace and create a new Notebook VM. This provisions the Azure VM and installs all of the required tooling, configuring Jupyter and some samples. It also includes some Quickstarts to let you get going. When I did this, it took 14 minutes from creating the new Azure ML Workspace to having a running Jupyter notebook completing a quickstart.
2. Configuring the connection securely
Once the VM is up and running you can use it as a Jupyter host, but if you want to try out the Visual Studio Code integration, start by setting up the VM that you created in the Azure portal as an SSH host in your local OpenSSH configuration.
Back in the Azure portal, you can find the private key for the VM and information such as its IP address, port and username. Note these down and save them locally as described below, and you’re pretty much ready.
For reference, you copy the private key to a file in ~/.ssh/ and then add an entry into ~/.ssh/config with the format:
Host [what I want to call my VM as host name]
HostName [ip address from Azure portal]
Port [port number from Azure portal]
User azureuser
IdentityFile [the file I created with the PK in]
You should then be able to SSH straight to the machine by executing the following in a terminal (Ctrl+` in VS Code):
ssh [what I want to call my VM as host name]
That’s then a shell session executing on the Notebook VM. That’s powerful because I can interact with that VM with SSH however I want.
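If you would rather script the local setup than edit the files by hand, the step above can be sketched in Python. This is a minimal sketch rather than anything the Azure ML team ships: the function names, the alias, the example IP address and the key file name are all placeholders I made up, and the only values taken from the post are the config field names and the azureuser username. It also restricts the key file to owner read/write, because OpenSSH refuses private keys that other users can read.

```python
import os
import stat
from pathlib import Path


def render_ssh_config_entry(alias, hostname, port, identity_file, user="azureuser"):
    """Return an OpenSSH client config stanza for one host."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Port {port}",
        f"    User {user}",
        f"    IdentityFile {identity_file}",
    ]
    return "\n".join(lines) + "\n"


def install_private_key(key_text, key_path):
    """Write the private key to key_path and set its mode to 0600."""
    key_path = Path(key_path)
    key_path.parent.mkdir(parents=True, exist_ok=True)
    key_path.write_text(key_text)
    # Owner read/write only; ssh rejects keys with wider permissions.
    os.chmod(key_path, stat.S_IRUSR | stat.S_IWUSR)
    return key_path


if __name__ == "__main__":
    # Placeholder values (assumptions); use the details from the Azure portal.
    entry = render_ssh_config_entry(
        alias="my-notebook-vm",
        hostname="203.0.113.10",
        port=22,
        identity_file="~/.ssh/my-notebook-vm.key",
    )
    print(entry)
```

The printed stanza can be pasted into ~/.ssh/config; I deliberately left the script printing rather than appending to the file, so it cannot clobber an existing configuration.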
Maxim pointed out that in the SSH session, if you go to the “code” area at ~/cloudfiles/code, you'll find that his team has used azure-storage-fuse (https://github.com/Azure/azure-storage-fuse) to mount an Azure blob storage account as a virtual file system, so you can work with files in that location as if they were local and collaborate with your team members. This is again very powerful.
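Because the mount behaves like an ordinary directory, plain Python file handling works on it. Here is a small sketch that lists the notebooks under the shared folder; the function name and the *.ipynb pattern are my own choices for illustration, and the default path simply assumes the ~/cloudfiles/code location mentioned above exists on your VM.

```python
from pathlib import Path


def list_shared_files(root="~/cloudfiles/code", pattern="*.ipynb"):
    """Return sorted file paths under root that match pattern, searched recursively."""
    base = Path(root).expanduser()
    if not base.is_dir():
        # The folder is missing, e.g. when this runs somewhere other than the VM.
        return []
    return sorted(p for p in base.rglob(pattern) if p.is_file())


if __name__ == "__main__":
    for path in list_shared_files():
        print(path)
```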
3. Using Visual Studio Code
Once you’ve achieved the SSH connection as above, you can go into Visual Studio Code and use the Remote-SSH extension to connect to the Notebook VM and interact with it - with all the features of Visual Studio Code.
Find the host name you gave the VM in the SSH setup earlier in the extension's list of SSH hosts, then simply right-click it and connect to the VM. You can then interact with the VM through Visual Studio Code.
I created a very simple Python module, tested it out in Python at the shell, then went into Jupyter, imported the module and invoked it. This example is very contrived but works out of the box. There are also some really interesting and more advanced features, such as remote debugging, that I am keen to try out.
I think using Visual Studio Code is going to be super useful for Data Scientists - it’s a great tool for easily accessing the resources of an ML workspace.
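To give a flavour of that kind of test, here is a comparable contrived module. It is not the one I actually wrote: the file name hello_nbvm.py and the function name are hypothetical. It reports the host name, which makes it easy to confirm the code really is running on the Notebook VM and not on your laptop.

```python
# hello_nbvm.py (hypothetical file name)
import platform
import socket


def describe_host():
    """Return a one-line summary of the machine running this code."""
    return f"{socket.gethostname()} running Python {platform.python_version()}"


if __name__ == "__main__":
    print(describe_host())
```

Run it from the remote shell with python hello_nbvm.py, or from a Jupyter notebook in the same folder with from hello_nbvm import describe_host followed by describe_host().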
Interview with Maxim!
I asked Maxim Lukiyanov, Principal Program Manager at Microsoft’s AzureML team, a few questions about why they built this functionality and who it might help most of all.
AC: What is the top feature of Visual Studio Code for Data Scientists? Do you think any other roles will use this tool?
ML: When we talk to data scientists we hear different opinions. Some like Jupyter interactivity, others prefer a full-featured Python IDE and many use both. VS Code is one of the most popular IDE choices today and it really completes the code authoring story of Notebook VMs. Another aspect of Notebook VM that is valuable in an enterprise setting is its improved security and compliance with IT policies. It works really well as a cloud workstation and as such can also be used by engineers, data analysts and the new role of ML engineers.
AC: Do you think there's an ongoing trend towards Jupyter, or will you support other tools?
ML: Jupyter and VS Code style editors are trending, and both are popular. It doesn’t seem one is overtaking the other, so we support a combination of them. RStudio is also popular within the R community; this is something we will look at in the future.
AC: There's a lot of notebooks on Azure now - Notebooks, ML Notebook VMs, stuff in HDInsight, stuff in Databricks ... can you give me a view on which is best of what or any insight on them?
ML: Azure Notebooks are designed for sharing in an academic setting and work really well in those scenarios. The enterprise setting, with more locked down and more powerful compute scenarios, is where NBVM [Notebook Virtual Machines] shines. Databricks and HDI are for scale-out analytics, not purely ML. It’s much more natural to do deep learning with native framework support in NBVM/AzML than in Azure Databricks. Azure Databricks requires you to change your code; it’s very opinionated in that sense. Notebook VM also brings full customizability of the VM, something which is not available in other offerings.
AC: Does this plug in to the MLOps trend well? Can I create these via the az ml cli?
ML: MLOps is a new feature we recently introduced in Azure ML. You might want to check out our blog post we recently published on that topic. But Notebook VM is still in Preview and lacks some of the integration points. It is certainly on our roadmap though, we receive frequent asks from our customers to support scripting for Notebook VM.
That's it for now!
Thanks for reading. Let me know if you’ve used Visual Studio Code and what you think of it in the comments below.