Creating Learning-based Escalation Policy

Describes how to create a learning-based escalation policy.

Introduction

Learning-based alert escalation automatically assigns incidents to the appropriate group and sets their priority and category. The model requires a training file (in CSV format) that contains example incident assignments for different types of alerts.

Machine learning learns patterns from the input training file and uses them to drive learning-based alert escalation. The learned models are applied to incoming alerts.
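
To make "learn patterns from a training file, then apply them to incoming alerts" concrete, here is a deliberately simple sketch. It is not OpsRamp's algorithm (this page does not describe the model); it is a stand-in frequency classifier that records, for each value of one input column, the most common assignee group, and predicts that group for new alerts. The column names `alert_keyword` and `assignee_group`, and all values, are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical training file: column names and values are illustrative only.
TRAINING_CSV = """alert_keyword,assignee_group
disk,Storage Team
disk,Storage Team
cpu,Compute Team
disk,Linux Team
network,Network Team
"""

def train(rows, input_col, output_col):
    """Count how often each output label occurs for each input value."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[input_col]][row[output_col]] += 1
    # Keep only the most frequent label per input value.
    return {value: labels.most_common(1)[0][0] for value, labels in counts.items()}

def predict(model, value, default=None):
    """Return the learned label for an incoming alert, or a default if unseen."""
    return model.get(value, default)

rows = list(csv.DictReader(io.StringIO(TRAINING_CSV)))
model = train(rows, "alert_keyword", "assignee_group")
print(predict(model, "disk"))                   # Storage Team
print(predict(model, "memory", default="NOC"))  # NOC
```

A real model generalizes across many input columns; the lookup table only shows the train-then-apply flow.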

Configuring alert escalation manually requires incident information, such as the assignee name, category, and priority, as mandatory fields.

Prerequisites

Before defining rules to escalate alerts as incidents, configure the name, scope, and resources for the alert escalation policy.

To configure the alert escalation policy:

Step 1: Define name and scope

To define name and scope:

  1. From All Clients, select the client.
  2. Select Setup > Alerts > Alert Escalation.
  3. Click the Add button and provide Name and Description.
  4. From the Mode drop-down list, select one of the options.
    • ON: The alert escalation policy is created and escalation is performed.
    • Observed: An alert escalated in Observed mode is for information purposes only.
      • Observed mode lets you see which alerts the policy would escalate without performing a real escalation.
      • In Observed mode, an observed alert is created for each alert that the policy escalates. You can view these alerts, indicated with the Observed status, in the Alerts browser.
      • The only action you can perform on an observed alert is Close. Closing an observed alert does NOT affect the actual alert.
    • OFF: The alert escalation policy is created, but no escalation is performed.
  5. Select the organization whose users will receive the escalations from this policy. For example, if you choose the partner organization, only partner users can receive escalations.
  6. Click Next: Select Resources.
  7. Select resources for the escalation policy.
    • Resources can be selected from one or more clients.
    • Add up to 100 resources.
    • When escalating alerts for users of a specific client, add only resources from that client.
  8. (Optional) Filter the resources by:
    • Resource Name
    • Resource Type
    • Service Group
    • Device Group
    • Site
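
The three modes and the resource rules in Step 1 can be summarized in a small sketch. The `Mode` enum, the `handle_alert` and `validate_resources` functions, and the dictionary shape of alerts and resources are hypothetical; they model the documented behavior (ON escalates, Observed only records an informational observed alert, OFF does nothing, at most 100 resources, resources limited to one client when escalating for that client's users) and are not an OpsRamp API.

```python
from enum import Enum

MAX_RESOURCES = 100  # documented per-policy limit

class Mode(Enum):
    ON = "ON"
    OBSERVED = "Observed"
    OFF = "OFF"

def handle_alert(mode, alert):
    """Return the (hypothetical) action a policy in `mode` takes for a matching alert."""
    if mode is Mode.ON:
        return {"action": "escalate", "alert_id": alert["id"]}
    if mode is Mode.OBSERVED:
        # Only an informational observed alert is created; the real alert is untouched.
        return {"action": "create_observed_alert", "alert_id": alert["id"], "status": "Observed"}
    # OFF: the policy exists but performs no escalation.
    return {"action": "none", "alert_id": alert["id"]}

def validate_resources(resources, client=None):
    """Check the documented resource rules; `client` restricts resources to one client."""
    if len(resources) > MAX_RESOURCES:
        raise ValueError(f"at most {MAX_RESOURCES} resources are allowed")
    if client is not None:
        foreign = [r["name"] for r in resources if r["client"] != client]
        if foreign:
            raise ValueError(f"resources from other clients: {foreign}")
```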

Step 2: Define escalation rules

In this step, you define escalation rules that escalate alerts as incidents.

To define the escalation rule:

  1. In the Define Escalation Rules section, select Escalate automatically as follows, and then select Escalate as Incident.
  2. Enable Continuous Learning so that machine learning continuously learns patterns from alert data.
    • If continuous learning is enabled, the machine-learning models are retrained weekly on recent alert data (the past three months of alerts).
    • Continuous learning complements the user-provided training data supplied through training files. Patterns learned from both the user-provided training data and continuous learning are incorporated into automatic incident creation.
    • In the combined data set (recent alert data and user-provided training data), the user-provided training data is considered first, followed by recent alert data.
      Continuous Learning Option

  3. Select Modify to modify the incident subject or description.
  4. Click the toggle next to an attribute so that machine learning learns its values from the training file. Machine learning is available for the following attributes:
    • Assignee Group
    • Category
    • Sub-Category
    • Business Impact
    • Priority
    • Cc
  5. Click Import a Dataset and Train Model.
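
One way to picture how continuous learning combines the two data sources: alert data from roughly the past three months is appended after the user-provided training rows, so the user-provided rows come first. The `combine_training_data` and `enabled_outputs` helpers, the `created` field, and approximating "three months" as 90 days are assumptions for illustration; this page does not describe how OpsRamp stores or weights the rows. `ML_ATTRIBUTES` mirrors the attribute list in step 4.

```python
from datetime import datetime, timedelta

# Attributes for which machine learning can be toggled on (step 4).
ML_ATTRIBUTES = {"Assignee Group", "Category", "Sub-Category", "Business Impact", "Priority", "Cc"}

RECENT_WINDOW = timedelta(days=90)  # assumption: "past three months" approximated as 90 days

def combine_training_data(user_rows, alert_rows, now):
    """User-provided rows first, followed by alert rows inside the recent window."""
    cutoff = now - RECENT_WINDOW
    recent = [row for row in alert_rows if row["created"] >= cutoff]
    return list(user_rows) + recent

def enabled_outputs(toggles):
    """Return the attributes whose ML toggle is on, rejecting unknown attribute names."""
    unknown = set(toggles) - ML_ATTRIBUTES
    if unknown:
        raise ValueError(f"machine learning is not available for: {sorted(unknown)}")
    return sorted(name for name, on in toggles.items() if on)

now = datetime(2024, 6, 30)
combined = combine_training_data(
    [{"source": "training file"}],
    [{"source": "alert", "created": datetime(2024, 6, 1)},   # inside the window, kept
     {"source": "alert", "created": datetime(2024, 1, 1)}],  # too old, dropped
    now,
)
```

With weekly retraining, this combination would simply be recomputed each week as the window moves forward.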

Step 3: Import datasets and apply training model

Upload a training file. Only one training file can be uploaded per client.

  • You can create multiple alert escalation policies by filtering specific alert and resource attributes in the Resources and Alert Condition tabs, but there is only one machine-learning model, which simplifies the alert escalation configuration.
  • Changing the training file affects all learned policies of the client.
  • To change the training file, delete the existing file and upload the updated file to OpsRamp.
  • Alerts that are already escalated are NOT affected by these changes.
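
The training-file rules above (one file per client, replacing a file means deleting it and uploading again, and every learned policy of the client picks up the change) can be modeled with a small registry. `TrainingFileRegistry` and its methods are hypothetical and only illustrate the documented constraints; they are not OpsRamp API calls.

```python
class TrainingFileRegistry:
    """Hypothetical model of the one-training-file-per-client rule."""

    def __init__(self):
        self._files = {}     # client -> training file name
        self._policies = {}  # client -> learned policy names

    def add_policy(self, client, policy):
        self._policies.setdefault(client, []).append(policy)

    def upload(self, client, file_name):
        """Store the file and return the learned policies it affects."""
        if client in self._files:
            raise ValueError(f"{client} already has a training file; delete it first")
        self._files[client] = file_name
        # Every learned policy of the client uses this file.
        # Alerts that were already escalated are not touched here.
        return list(self._policies.get(client, []))

    def delete(self, client):
        self._files.pop(client, None)
```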

To import the dataset and apply the training model:

  1. Click Drop the training data file here or browse to upload the training file.
  2. Select the file from your local folder. After the file is uploaded to OpsRamp, click Manage Data and Train Model.
  3. Select the input and output columns for model training.
    • Input columns are the columns specified in the training file.
    • Output columns are the incident attributes for which machine learning is enabled.
  4. Click Continue to Model Training. The accuracy of the trained alert escalation model appears in the Summary section.
  5. Click Train Model and then click Review.
Training Summary
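
Steps 3 and 4 split the training file's columns into inputs and outputs and then report an accuracy. The sketch below shows one plausible reading: every chosen column must exist in the file header, output columns must be ML-enabled attributes, and accuracy is the fraction of held-out rows whose output label is predicted correctly. The column names, the `split_columns` and `accuracy` helpers, and this accuracy definition are assumptions; this page does not say how OpsRamp computes the reported accuracy.

```python
import csv
import io

# Hypothetical training-file header.
HEADER_CSV = "Alert Subject,Resource Type,Assignee Group,Priority\n"

def split_columns(csv_text, inputs, outputs, ml_enabled):
    """Validate the chosen input and output columns against the file header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    missing = [c for c in inputs + outputs if c not in header]
    if missing:
        raise ValueError(f"columns not in training file: {missing}")
    not_enabled = [c for c in outputs if c not in ml_enabled]
    if not_enabled:
        raise ValueError(f"machine learning is not enabled for: {not_enabled}")
    return inputs, outputs

def accuracy(predicted, actual):
    """Fraction of held-out rows whose predicted label matches the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```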

Viewing alert escalation policies

The alert escalation policy is created and appears on the Alert Escalation Policies page. The ML icon indicates that the policy is based on a machine-learning model rather than a manually defined configuration.

  • If the ML icon is blue, the accuracy of the trained alert escalation model is above 80% and the policy is used for alert escalation.
  • If the ML icon is red, the accuracy of the trained alert escalation model is below 80%.

In addition:

  • If the accuracy is below 80%, the policy is temporarily disabled until the accuracy of the model moves above 80% after the next training.
  • If model accuracy is low, OpsRamp creates an incident using the default values specified in the escalation policy. For example, suppose an alert escalation policy has Continuous Learning enabled and High as the default value for the Priority field.
  • In that case, if the accuracy of the trained model is low, OpsRamp uses the default value when creating the incident, so Priority is set to High.
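
The 80% threshold and the fallback to policy defaults can be sketched as two small functions. `ml_icon_color` and `resolve_incident_fields` are hypothetical names; the page does not say what happens at exactly 80%, so this sketch treats 80% as below the threshold, and using a default for any field the model did not predict is also an assumption.

```python
THRESHOLD = 0.80  # documented accuracy threshold

def ml_icon_color(accuracy):
    """Blue when accuracy is above the threshold, red otherwise."""
    # Assumption: exactly 80% is treated as red; the document does not specify it.
    return "blue" if accuracy > THRESHOLD else "red"

def resolve_incident_fields(accuracy, predictions, defaults):
    """Use learned values above the threshold; otherwise use the policy defaults."""
    if accuracy > THRESHOLD:
        # Assumption: a field the model did not predict keeps its default value.
        return {field: predictions.get(field, value) for field, value in defaults.items()}
    return dict(defaults)
```

In the Priority example above, `resolve_incident_fields(0.65, {"Priority": "Low"}, {"Priority": "High"})` returns `{"Priority": "High"}`, matching the documented fallback.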