Introduction
An Azure Machine Learning (Azure ML) Workspace is a centralized platform within Azure Machine Learning Services that enables data scientists and developers to efficiently manage machine learning (ML) projects. It acts as a collaborative environment for building, training, deploying, and monitoring ML models while ensuring security and scalability.
Use the OpsRamp Azure Public Cloud Integration to discover Machine Learning Services Workspaces and collect metrics from them.
Setup
To set up the Azure integration and discover the Azure Machine Learning Services Workspaces resources, do the following:
- Create an Azure Integration if one is not already among your installed integrations. For more information on how to install the Azure Integration, refer to Install Azure Integration.
- Create a discovery profile. For more information on how to create a discovery profile, refer to Create Discovery Profile.
- Select Machine Learning Services Workspaces under the Filter Criteria in the Edit Discovery Profile page.
- Save the discovery profile to make it available in the list of Discovery Profiles.
- Scan to discover the resources at any time independent of the predefined schedule.
- Once the scan is completed, you can view the Machine Learning Services Workspaces resources under the Infrastructure > Resources > Microsoft Azure category.
Event support
OpsRamp supports Azure events for Machine Learning Services Workspaces. Configure Azure Events in the OpsRamp Azure integration discovery profile.
See Process Azure Events for more information on how to configure Azure events.
Supported metrics
OpsRamp Metric | Azure Metric | Metric Display Name | Unit | Description | Aggregation Type |
---|---|---|---|---|---|
azure_ml_services_workspaces_Active_Cores | Active Cores | Active Cores | Count | Number of active cores in the Azure ML workspace. | Average |
azure_ml_services_workspaces_Active_Nodes | Active Nodes | Active Nodes | Count | Number of active nodes in the Azure ML workspace. | Average |
azure_ml_services_workspaces_Cancel_Requested_Runs | Cancel Requested Runs | Cancel Requested Runs | Count | Number of runs where cancelation was requested. | Total |
azure_ml_services_workspaces_Cancelled_Runs | Cancelled Runs | Cancelled Runs | Count | Number of runs canceled in the workspace. | Total |
azure_ml_services_workspaces_Completed_Runs | Completed Runs | Completed Runs | Count | Number of successfully completed runs. | Total |
azure_ml_services_workspaces_CpuUtilization | CpuUtilization | CPU Utilization | Percent | Percentage of utilization on a CPU node. | Average |
azure_ml_services_workspaces_Errors | Errors | Errors | Count | Number of run errors in this workspace. | Total |
azure_ml_services_workspaces_Failed_Runs | Failed Runs | Failed Runs | Count | Number of failed runs. | Total |
azure_ml_services_workspaces_Finalizing_Runs | Finalizing Runs | Finalizing Runs | Count | Number of runs in the finalizing state. | Total |
azure_ml_services_workspaces_GpuUtilization | GpuUtilization | GPU Utilization | Percent | Percentage of utilization on a GPU node. | Average |
azure_ml_services_workspaces_Idle_Cores | Idle Cores | Idle Cores | Count | Number of idle cores. | Average |
azure_ml_services_workspaces_Idle_Nodes | Idle Nodes | Idle Nodes | Count | Number of idle nodes. | Average |
azure_ml_services_workspaces_Leaving_Cores | Leaving Cores | Leaving Cores | Count | Number of cores on nodes that are leaving the compute cluster. | Average |
azure_ml_services_workspaces_Model_Deploy_Failed | Model Deploy Failed | Model Deploy Failed | Count | Number of failed model deployments. | Total |
azure_ml_services_workspaces_Model_Deploy_Started | Model Deploy Started | Model Deploy Started | Count | Number of started model deployments. | Total |
azure_ml_services_workspaces_Model_Deploy_Succeeded | Model Deploy Succeeded | Model Deploy Succeeded | Count | Number of successful model deployments. | Total |
azure_ml_services_workspaces_Model_Register_Failed | Model Register Failed | Model Registration Failure | Count | Counts the total instances of model registration failures in this workspace. | Total |
azure_ml_services_workspaces_Model_Register_Succeeded | Model Register Succeeded | Model Registration Success | Count | Counts the total instances of successful model registrations in this workspace. | Total |
azure_ml_services_workspaces_Not_Responding_Runs | Not Responding Runs | Unresponsive Runs | Count | Indicates the total number of runs that are unresponsive for this workspace. | Total |
azure_ml_services_workspaces_Not_Started_Runs | Not Started Runs | Pending Runs | Count | Counts the number of runs that are in a Not Started state for this workspace. | Total |
azure_ml_services_workspaces_Preempted_Cores | Preempted Cores | Preempted Cores | Count | Indicates the number of cores that were preempted. | Average |
azure_ml_services_workspaces_Preempted_Nodes | Preempted Nodes | Preempted Nodes | Count | Indicates the number of nodes that were preempted. | Average |
azure_ml_services_workspaces_Preparing_Runs | Preparing Runs | Preparing Runs | Count | Counts the total number of runs currently in preparation for this workspace. | Total |
azure_ml_services_workspaces_Provisioning_Runs | Provisioning Runs | Provisioning Runs | Count | Counts the total number of runs that are currently provisioning in this workspace. | Total |
azure_ml_services_workspaces_Queued_Runs | Queued Runs | Queued Runs | Count | Number of runs in the queue. | Total |
azure_ml_services_workspaces_Quota_Utilization_Percentage | Quota Utilization Percentage | Quota Utilization Percentage | Percent | Percentage of quota utilized in the workspace. | Average |
azure_ml_services_workspaces_Started_Runs | Started Runs | Active Runs | Count | Counts the number of runs that are actively running for this workspace. | Total |
azure_ml_services_workspaces_Starting_Runs | Starting Runs | Starting Runs | Count | Counts the total number of runs that have been initiated for this workspace. | Total |
azure_ml_services_workspaces_Total_Cores | Total Cores | Total Cores | Count | Total number of cores available. | Average |
azure_ml_services_workspaces_Total_Nodes | Total Nodes | Total Nodes | Count | Total number of nodes available. | Average |
azure_ml_services_workspaces_Unusable_Cores | Unusable Cores | Unusable Cores | Count | Number of unusable cores in the workspace. | Average |
azure_ml_services_workspaces_Unusable_Nodes | Unusable Nodes | Unusable Nodes | Count | Number of unusable nodes in the workspace. | Average |
azure_ml_services_workspaces_Warnings | Warnings | Warnings | Count | Number of warnings related to runs in this workspace. | Total |
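
In the table above, each OpsRamp metric name appears to be the Azure metric name with spaces replaced by underscores and the prefix `azure_ml_services_workspaces_` added. The following Python sketch illustrates that pattern for use in scripts or dashboards. The helper names `opsramp_metric_name` and `azure_metric_name` are hypothetical and are not part of any OpsRamp or Azure API; the mapping is inferred from the table only.

```python
# Prefix shared by every OpsRamp metric name in the table above.
PREFIX = "azure_ml_services_workspaces_"


def opsramp_metric_name(azure_metric: str) -> str:
    """Build the OpsRamp metric name from an Azure metric name.

    Assumption: the pattern seen in the table (prefix + spaces -> underscores)
    holds for every metric of this resource type.
    """
    return PREFIX + azure_metric.strip().replace(" ", "_")


def azure_metric_name(opsramp_metric: str) -> str:
    """Recover the Azure metric name from an OpsRamp metric name.

    This works for the metrics listed here because none of the Azure
    metric names in the table contains an underscore.
    """
    if not opsramp_metric.startswith(PREFIX):
        raise ValueError(f"not a workspace metric: {opsramp_metric!r}")
    return opsramp_metric[len(PREFIX):].replace("_", " ")


if __name__ == "__main__":
    print(opsramp_metric_name("Active Cores"))
    print(azure_metric_name("azure_ml_services_workspaces_Failed_Runs"))
```

Treat the table as the source of truth; if a metric is added whose Azure name contains an underscore, the reverse mapping would need an explicit lookup instead.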