In this post I walk you through one of the preview services provided by the (for me) new Azure Machine Learning service workspace: Visual Interface. The aim is to give you an overview of how to use the service in the scope of a project you might already have been working on, and to show you the differences and similarities between it and Azure Machine Learning Studio. There are some open questions on the MSDN forums that I refer to throughout the post; check them out for news and details! 🙂
For testing, we use the same dataset as in the previous posts about the Nepal earthquakes project:
Or simply find the dataset (new_dataset) on GitHub.
Introduction and getting started
The Visual interface of the Azure Machine Learning service becomes available once you create a workspace on the Azure Portal. You can try some of these services for free, or with an Azure subscription. Get started with some useful tutorials in the Azure documentation.
If you have worked with Azure ML Studio before, you will see that the Visual interface has basically the same UI, with more intense colors and a cleaner left-side panel. You just drag and drop the modules you want to use, and run the experiment to train your model. As of the date this post is released, some modules and features are missing, but the dev team is working very hard to provide the same, or even better, experience than you are used to. Feel free to reach out in a comment on this post if you have any issues or inquiries regarding this service.
Let’s build an experiment to predict the earthquake damage in Nepal and investigate the differences between the two machine learning services, step by step. I will mention some of my personal opinions as well; I will be very happy if you reflect on these, as I’m interested in your view!
Start up the Visual interface via the Azure Portal: open your Machine Learning service workspace and, in the menu on the left, choose Visual interface.
The data we are going to use is in the new_dataset folder in this repo. Now go to the Visual interface, choose the Datasets menu, and click on the New button.
The first thing you notice when you click on New is that it opens up like this, even though you chose Datasets. Studio offers to create either a new dataset or a new experiment, depending on which menu you are in. I prefer the latter behavior, because when you want to add a new dataset in the Visual interface, you must click twice. So add the new values (demographic data of buildings) and labels (damage_grade).
Now click on the New button again and create a blank experiment. We are going to build a model similar to the one from the Predicting earthquake damages with Azure ML post. It may be worth revisiting that post if you want to follow the next sections more easily, as I might exclude some details that I wrote about already.
Build the experiment
Pull in the two datasets from the left-side panel, along with the Join data module. Join the two datasets on building_id. Next, drag and drop the Select columns in dataset module, connect it to the joined data, and click on the module. On the right-side panel, choose the Edit columns option and the With rules tab. You should now see a window that looks something like this:
Begin with all columns, and exclude the following columns.
Exclude column names: has_superstructure_other,geo_level_1_id,geo_level_2_id,geo_level_3_id,has_secondary_use_agriculture,has_secondary_use_hotel,has_secondary_use_rental,has_secondary_use_institution,has_secondary_use_school,has_secondary_use_industry,has_secondary_use_health_post,has_secondary_use_gov_office,has_secondary_use_use_police,has_secondary_use_other,building_id,building_id (2),land_surface_condition,foundation_type,roof_type,ground_floor_type,other_floor_type,position,plan_configuration,legal_ownership_status
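If you prefer to sanity-check these steps outside the Visual interface, the Join data and Select columns in dataset modules map directly onto pandas operations. Here is a minimal sketch with a toy stand-in for the real dataset (the actual data has many more columns than the two shown here):

```python
import pandas as pd

# Toy stand-ins for the values and labels datasets.
values = pd.DataFrame({
    "building_id": [101, 102, 103],
    "age": [10, 25, 5],
    "geo_level_1_id": [6, 8, 21],
})
labels = pd.DataFrame({
    "building_id": [101, 102, 103],
    "damage_grade": [2, 3, 1],
})

# Join data module: inner join on building_id.
joined = values.merge(labels, on="building_id")

# Select columns in dataset module: begin with all columns, then exclude.
exclude = ["building_id", "geo_level_1_id"]
selected = joined.drop(columns=exclude)

print(list(selected.columns))  # ['age', 'damage_grade']
```

The real exclusion list is of course the long one above; only the idea (merge, then drop) matters here.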
The next step is to pull in the Normalize data module and choose ZScore as the transformation, making sure you exclude damage_grade (Edit columns). Then the Split data module is needed; connect everything together and set the fraction to 0.75. This is how your experiment should look now:
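For reference, the Normalize data (ZScore) and Split data steps are equivalent to the following pandas operations; the tiny frame below is just a placeholder for the real feature columns:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [10.0, 25.0, 5.0, 40.0],
    "height": [5.0, 7.0, 4.0, 6.0],
    "damage_grade": [2, 3, 1, 3],
})

# Normalize data (ZScore), excluding the damage_grade label column.
features = df.columns.drop("damage_grade")
df[features] = (df[features] - df[features].mean()) / df[features].std()

# Split data: a 0.75 fraction for training, the rest for scoring.
train = df.sample(frac=0.75, random_state=42)
test = df.drop(train.index)

print(len(train), len(test))  # 3 1
```

After this, each feature column has mean 0 and standard deviation 1, while the label is left untouched, which is exactly why damage_grade is excluded from the module.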
Now it’s time to choose our algorithms. When we used Studio, we worked with Neural Networks and Decision Jungle. As I mentioned before, some of the modules are not available yet in the Visual interface. We need multiclass classification algorithms, because we have to predict the damage_grade, which has more than two categories (1 – low damage, 2 – medium damage, 3 – high damage).
Multiclass classification models in Visual interface
You can see that there is a difference between the options you can choose from. At this point I suggest you use the environment that provides the algorithms you need for your case. Before preparing this post I tried out the new algorithms available in the Visual interface, and ended up using only the two that performed best with this dataset.
So, to continue on to the next cool feature of the Visual interface, let’s use the Neural Network and Logistic Regression algorithms. Add a Train model module for both, click on Edit columns, and choose damage_grade as the column to train on. Drag and drop the Score model module for both sides as well, and then connect the Evaluate model module. Save your experiment, which should look like this:
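As a rough offline equivalent of the two training branches, here is a sketch using scikit-learn's MLPClassifier and LogisticRegression on synthetic three-class data; the sample sizes and features are made up, not the real Nepal dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 3-class data standing in for damage_grade (1-3).
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0)

# Two trained models, mirroring the two branches of the experiment.
nn = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000,
                   random_state=0).fit(X_train, y_train)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score model + Evaluate model: overall accuracy on the held-out 25%.
print(f"Neural network accuracy:      {nn.score(X_test, y_test):.3f}")
print(f"Logistic regression accuracy: {lr.score(X_test, y_test):.3f}")
```

This is only to show the shape of the comparison; the modules in the Visual interface handle the training loop and evaluation for you.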
Run the experiment
In Azure Machine Learning Studio, running your experiment was simple: you clicked Run, or chose whether to run only the selected modules or the whole experiment. The partial-run functionality is currently missing, which makes it very painful when the training has to run more than once, because running everything from beginning to end takes a while.
With the Visual interface you can choose a machine type (size, cores, CPU, etc.) to train your model on. This is a really good feature: you can easily scale from a small compute to a bigger one to fit your model. Click on Run and create a new compute target. The predefined compute target may be a bit slow for this experiment, but you can create another one in the workspace on the Azure Portal; just go to the Compute menu. (I use this size: STANDARD_DS5_V2.)
When the experiment has to run more than once, I found only one slightly annoying behavior: every time you run the experiment, you have to choose a compute target. What if I want to use the same target for the next few days, or even longer? I wish there were a way to set a default size and use that unless I want to scale up. Without this, every time you want to run the experiment, you have to make three clicks instead of the one click inside Studio. Is this only annoying to me?
To decide how and where to improve our model, the best approach is to look at the result of the evaluation. If you think back to how this looked for classification algorithms in Studio, it showed the accuracy and a confusion matrix for each observed algorithm. In the Visual interface, when you visualize the evaluation, it looks like the following:
The first issue I have with this is that I’m not sure which row represents which algorithm. Beyond that, I really miss the confusion matrix and some extra details about the training results. These 5 values don’t tell me much, except maybe that the accuracy is not high enough.
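Until the Visual interface adds a confusion matrix, one workaround is to export the scored labels and compute it yourself, for example with scikit-learn. The label vectors below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and scored damage_grade labels exported from the experiment.
y_true = np.array([1, 2, 3, 2, 3, 1, 2, 3, 3, 2])
y_pred = np.array([1, 2, 2, 2, 3, 1, 3, 3, 3, 2])

# Rows are the true grades (1-3), columns the predicted grades.
cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3])
print(cm)
print("accuracy:", accuracy_score(y_true, y_pred))
```

The off-diagonal cells show exactly which grades get confused with which, which is the detail the five summary values above can't give you.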
This post is basically an intro to this service, and I will improve the content as new features are released, or when an inquiry relating to the steps we took in this project is added to the forums. Before you start working with this service, make sure it fits the solution you are building; if you are unsure, stick with Azure ML Studio.
Follow me 🙂