Hey! If you haven’t created an Azure Machine Learning resource within the Azure Portal yet, you’ll need to do that to follow along with this. Here’s a quick guide.

Introduction

Today, we’ll learn how to run Python scripts in the cloud on Azure ML. All Azure ML is doing is taking some code we wrote, spinning up a Virtual Machine (VM), installing any dependencies we need, and running the script we asked it to run.

So, naturally, the first step is to write a Python script that works on your local machine. We are going to write a script that:

  • Accepts an optional command line argument --message.
    • The default will be “Hello, world!”.
  • Takes the value of --message and saves it to a file logs/message.txt
    • If the logs/ directory doesn’t exist, it’ll create it.

On your local computer, the name of the logs/ directory is meaningless. But on Azure ML, this is a reserved directory specifically for saving things from your Run. So, don't change it unless you wanna have a bad time. 😉

Writing a Simple Example

Let's set up a very simple example to understand the basics. First, we'll make a new directory called azureml-examples and prepare some files within it. Ignore run_on_azureml.py for now, but create the rest of the files shown below.

azureml-examples
├── config.json        # AzureML Resource config.json file from the portal 
├── requirements.txt   # File that lists Python dependencies
├── run_on_azureml.py  # The file we'll use to execute code on AzureML
└── simple_example     # Directory holding your script + any related files
    └── run.py         # The script we'll run
  • config.json

      {
          "subscription_id": "<YOUR SUBSCRIPTION ID>",
          "resource_group": "<YOUR RESOURCE GROUP>",
          "workspace_name": "<YOUR WORKSPACE NAME>"
      }
    
  • requirements.txt

      azureml-defaults
      mlflow
      azureml-mlflow
      torch
      torchvision
      pytorch-lightning
      cmake
    
  • simple_example/run.py

      import os
      from argparse import ArgumentParser
      from pathlib import Path
    
      from pprint import pprint
    
      if __name__ == '__main__':
          parser = ArgumentParser()
          parser.add_argument('--message', type=str, default='Hello, world!')
          args = parser.parse_args()
            
          logdir = Path('./logs')
          logdir.mkdir(exist_ok=True, parents=True)
          outfile = logdir / 'message.txt'
          outfile.write_text(args.message)
          print(f"Message: {args.message}")
          print('-'*40)
          print()
          pprint(dict(os.environ))
          print('-'*40)
          print(f"Current Directory: {Path.cwd()}")
          print('-'*40)
    

Note - we won't be using the dependencies in requirements.txt in this first example. But, by specifying them now, we can avoid the lengthy environment preparation phase as we move on to more complex examples. More on this later!

Run it locally

First, I suggest setting up a virtual environment and installing the requirements.txt file we defined by running pip install -r requirements.txt.
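If you haven't set one up before, a minimal sketch using Python's built-in venv module might look like this (macOS/Linux shown; on Windows, activate with .venv\Scripts\activate instead):

```shell
# Create and activate an isolated environment for this project
python -m venv .venv
source .venv/bin/activate

# Install the dependencies listed in requirements.txt
pip install -r requirements.txt
```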

Then, from the azureml-examples directory, you should be able to run:

python simple_example/run.py

You should see a new folder appear called logs/ with your file logs/message.txt saved within. If you open up that file, you should see:

Hello, world!

Run the file again, this time supplying a different value to our message argument:

python simple_example/run.py --message "Howdy!"

Open message.txt again, and it should now contain the message you passed:

Howdy!
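If you want to sanity-check the argument handling without touching the filesystem (or Azure), argparse lets you pass an argument list to parse_args directly. This is just an illustrative snippet, not one of the tutorial files:

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--message', type=str, default='Hello, world!')

# With no arguments supplied, the default applies
assert parser.parse_args([]).message == 'Hello, world!'

# An explicit --message overrides the default
assert parser.parse_args(['--message', 'Howdy!']).message == 'Howdy!'
```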

Cool, now that that's working, let's try to run it on Azure ML.

Running Scripts via the azureml Python SDK

Logical Flow

Here’s the logic for taking that script you just ran locally and running it on AzureML via the azureml Python SDK:

  • Authenticate with your AzureML Workspace via the config.json file.
  • Create or reference a Compute Target. These can be CPU-only or GPU-enabled machines. We’ll use cheap CPU-only instances in this tutorial, since our code isn’t using the GPU.
  • Define the script you’d like to run on Azure. In our case, that’s simple_example/run.py.
  • Define the folder that holds your code and any other related helper files. In our case, that’s simple_example/.
  • Define the Environment in which your code will run. This is where you can point to a requirements.txt file if ya have one (which we do).

Below, you can see a high-level overview of the components involved in our task today. We aren't going to use the components marked in grey for now.

https://nateraw.com/images/azml-diagram.png

Writing the runner script

This script can technically be run from anywhere. We’ll run it locally, but you could also run this from a function app/logic app, another VM, the backend code of your API, etc…

Keep in mind that you’ll need your config.json file in the same directory as this script in order for it to work.

First, let's start off with some imports:

from pathlib import Path

from azureml.core import (
    Environment,
    Experiment,
    Run,
    ScriptRunConfig,
    Workspace,
)
from azureml.core.compute import AmlCompute, ComputeTarget

We can authenticate with our AzureML workspace by creating a Workspace object from the config.json file we downloaded earlier.

workspace_config = "config.json"  # the path to your config.json file
ws = Workspace.from_config(workspace_config)

Next, we can either create or reference a ComputeTarget. This is where we specify the size of the VM we want to use for our runs. We do that by supplying an instance type, which is an identifier for one of the VM sizes Azure offers. You can see a list of them and their prices here.

Let’s say we want to name our target "my-compute". We can first check whether a target with that name exists. If it does, we’ll use it. If not, we’ll create a new ComputeTarget with that name and the instance type we want. Here’s what that looks like:

def find_or_create_compute_target(
    workspace,
    name,
    vm_size="Standard_D8_v3",
    min_nodes=0,
    max_nodes=1,
    idle_seconds_before_scaledown=900,
    vm_priority="lowpriority",
):

    if name in workspace.compute_targets:
        return ComputeTarget(workspace=workspace, name=name)
    else:
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size,
            min_nodes=min_nodes,
            max_nodes=max_nodes,
            vm_priority=vm_priority,
            idle_seconds_before_scaledown=idle_seconds_before_scaledown,
        )
        target = ComputeTarget.create(workspace, name, config)
        target.wait_for_completion(show_output=True)
    return target

compute_target_name = "my-compute"
compute_target = find_or_create_compute_target(ws, compute_target_name)

Next, we can define any dependencies needed by our script by creating an Environment and supplying the path to our requirements.txt file.

requirements_file = 'requirements.txt'  # Assumes it's in the current directory
env = Environment.from_pip_requirements("my-pip-env", requirements_file)

Now we can configure the run by creating a ScriptRunConfig.

  • source_directory - The directory that holds your script and any related files it may need. We use pathlib to grab the parent directory of our script, simple_example/.
  • script - The path of the script you’d like to run, relative to source_directory. Here, pathlib gives us just the file name, run.py.
  • arguments - Any args to pass to your script when it executes on Azure. These are passed in argparse format, so you supply a list like ["--message", "Howdy!"].

script_path = 'simple_example/run.py'
script_args = ['--message', 'Howdy!']

run_config = ScriptRunConfig(
    source_directory=Path(script_path).parent,
    script=Path(script_path).name,
    arguments=script_args,
    compute_target=compute_target,
    environment=env,
)

Finally, we submit the run - grouping it under an Experiment. Experiments let you group related runs together. In our case, we’ll submit the run under an experiment called simple-example. Don’t worry - if it doesn’t exist, Azure will create it for you.

experiment_name = 'simple-example'
exp = Experiment(ws, experiment_name)
run = exp.submit(run_config)

Optionally, we can stream the logs of the run directly to our terminal using the run.wait_for_completion() method, and then delete the cluster so it doesn’t keep costing us money.

# Wait for the run to finish
run.wait_for_completion(show_output=True)

# Delete our compute target so it doesn't cost us money
compute_target.delete()

Putting it all together

  • The full script

      from pathlib import Path
    
      from azureml.core import (
          Environment,
          Experiment,
          Run,
          ScriptRunConfig,
          Workspace,
      )
      from azureml.core.compute import AmlCompute, ComputeTarget
    
      def find_or_create_compute_target(
          workspace,
          name,
          vm_size="Standard_D8_v3",
          min_nodes=0,
          max_nodes=1,
          idle_seconds_before_scaledown=900,
          vm_priority="lowpriority",
      ):
    
          if name in workspace.compute_targets:
              return ComputeTarget(workspace=workspace, name=name)
          else:
              config = AmlCompute.provisioning_configuration(
                  vm_size=vm_size,
                  min_nodes=min_nodes,
                  max_nodes=max_nodes,
                  vm_priority=vm_priority,
                  idle_seconds_before_scaledown=idle_seconds_before_scaledown,
              )
              target = ComputeTarget.create(workspace, name, config)
              target.wait_for_completion(show_output=True)
    
          return target
    
      workspace_config = 'config.json'
      requirements_file = 'requirements.txt'
      compute_target_name = 'my-compute'
      experiment_name = 'simple-example'
      script_path = 'simple_example/run.py'
      script_args = ['--message', 'Howdy!']
    
      # Authenticate with your AzureML Resource via its config.json file
      ws = Workspace.from_config(workspace_config)
    
      # The experiment in this workspace under which our runs will be grouped
      # If an experiment with the given name doesn't exist, it will be created
      exp = Experiment(ws, experiment_name)
    
      # The compute cluster you want to run on and its settings.
      # If it doesn't exist, it'll be created.
      compute_target = find_or_create_compute_target(ws, compute_target_name)
    
      # The Environment lets us define any dependencies needed to make our script run
      env = Environment.from_pip_requirements("my-pip-env", requirements_file)
    
      # A run configuration is how you define what you'd like to run
      # We give it the directory where our code is, the script we want to run, the environment, and the compute info
      run_config = ScriptRunConfig(
          source_directory=Path(script_path).parent,
          script=Path(script_path).name,
          arguments=script_args,
          compute_target=compute_target,
          environment=env,
      )
    
      # Submit our configured run under our experiment
      run = exp.submit(run_config)
    
      # Wait for the run to finish
      run.wait_for_completion(show_output=True)
    
      # Delete our compute target so it doesn't cost us money
      compute_target.delete()
    

What happens when you run it?

It’ll first spin up a cluster for you. You can watch its status in the portal from the compute tab.

https://nateraw.com/images/azml1.png

Once a node is up, your environment should be done preparing and your code will start running. Check out the experiment and the run created for you from the Experiments tab.

https://nateraw.com/images/azml2.png

Once it finishes running, you should be able to see the output file we wrote to the ./logs directory via the “Outputs + Logs” tab of the run. It should contain the message you supplied via the --message arg.

https://nateraw.com/images/azml5.png


Conclusion

In this tutorial, you learned:

  • The basics of the Azure ML Python SDK
  • How to run a simple Python script on Azure ML
  • How to save files from your runs so you can view/download/use them later

In the next tutorial, we’ll learn the basics of deploying models to API endpoints on AzureML.