Introduction

The alert definition feature allows you to set alerts on a metric using a PromQL query.

The alert definition can currently be defined at the client level.

Note: To create and manage alert definitions, the user must have the following permissions:

  • Administrator permission
  • Manage Alerts

This functionality is enabled by a feature flag. Contact OpsRamp Support for assistance.

Create an Alert Definition

Follow these steps to create an alert definition:

  1. Navigate to Setup > Account. The Account Details page is displayed.

  2. Click the Alert Definitions tile. The Alert Definitions page is displayed.

  3. Click Add.

  4. Enter the following information on the Definition Details page:

    1. Name: Provide a unique name for the alert definition.
    2. Metric Query: Build a valid PromQL query using a metric. Use the filters and operations for the query as needed. See PromQL for more information.
      You can change the time frame using the calendar icon.

    The query result (time series) is displayed in the form of a graph.
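As a rough illustration of how a metric, filters, and an operation combine into a PromQL query, the Python sketch below assembles a query string. The helper `build_query`, the metric name `system_cpu_utilization`, and the label `host` are hypothetical and are not part of the product. Only the PromQL selector syntax (`metric{label="value"}`) and the `avg` aggregation operator are standard PromQL.

```python
def build_query(metric, filters=None, operation=None):
    """Compose a PromQL selector, optionally wrapped in an aggregation.

    Hypothetical helper for illustration; the product builds the query in its UI.
    """
    filters = filters or {}
    # Label matchers go inside braces: metric{key="value", ...}
    matchers = ", ".join(f'{key}="{value}"' for key, value in sorted(filters.items()))
    selector = f"{metric}{{{matchers}}}" if matchers else metric
    # An operation such as avg wraps the selector: avg(metric{...})
    return f"{operation}({selector})" if operation else selector


query = build_query("system_cpu_utilization", {"host": "web-01"}, operation="avg")
print(query)  # avg(system_cpu_utilization{host="web-01"})
```

Because operations are not supported for Dynamic Change Detection and Forecast, a query for those conditions would omit the `operation` argument in this sketch.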

    Alert Conditions

      Static Threshold: The Static Threshold feature allows you to set thresholds for a metric value. You can also set the conditions under which alerts are triggered.

      • Critical Threshold: Enter a critical threshold value. Enter a number or a range.
        Examples: <3, >1, 2-5, 10-15
      • Warning Threshold: Enter a warning threshold value. Enter a number or a range.
        Examples: <3, >1, 2-5, 10-15
      • Note: You can set both critical and warning thresholds or set only one threshold based on your requirements.
      • Trigger alert if conditions persist for: To avoid alerts caused by anomalous spikes, you can configure an alert to trigger only if the metric value breaches the threshold continuously for a specified time.

        The default is 60 seconds. Enter the value in either seconds or minutes.

        Example:
        In the example screenshot, the latest data point is 53.2. The alert is triggered only if the metric value stays above the threshold continuously for 60 seconds.
        • Set the critical threshold to 50 and the warning threshold to 40.
          If the metric value spikes to 80 and then drops back to 45, a warning alert is triggered, because the value stays above only the warning threshold.
      • If there is no data: If no data is coming in, you can choose one of the following options:
        • Do not trigger alert - No alert is triggered if no data comes in.
        • Trigger critical alert - A critical alert is triggered if no data comes in.
        • Trigger warning alert - A warning alert is triggered if no data comes in.
        If the device stops sending data for some reason, for example because the agent is offline, the graph is empty because there is no data to plot.
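The static threshold behavior described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the product's implementation: the function names `parse_threshold` and `evaluate` are hypothetical, a bare number such as `50` is assumed to mean "value above 50", and an alert is assumed to fire only when every sample in the persistence window breaches the threshold.

```python
import re


def parse_threshold(expr):
    """Turn '<3', '>1', '2-5', or a bare number into a predicate on a value."""
    expr = expr.strip()
    range_match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)", expr)
    if range_match:
        low, high = float(range_match.group(1)), float(range_match.group(2))
        return lambda v: low <= v <= high
    if expr.startswith("<"):
        limit = float(expr[1:])
        return lambda v: v < limit
    # Assumption: a bare number such as '50' means "value above 50".
    limit = float(expr.lstrip(">"))
    return lambda v: v > limit


def evaluate(samples, critical=None, warning=None, persist_seconds=60,
             no_data="none"):
    """Return 'critical', 'warning', or None for sorted (timestamp, value) samples.

    no_data is 'none', 'critical', or 'warning', mirroring the three options
    offered when no data comes in.
    """
    if not samples:
        return None if no_data == "none" else no_data
    latest_ts = samples[-1][0]
    if latest_ts - samples[0][0] < persist_seconds:
        return None  # not enough history to cover the persistence period
    window = [v for ts, v in samples if ts >= latest_ts - persist_seconds]
    # Check the more severe threshold first; every sample must breach it.
    for severity, expr in (("critical", critical), ("warning", warning)):
        if expr is not None and all(parse_threshold(expr)(v) for v in window):
            return severity
    return None


# A spike to 80 that drops back to 45 and stays there for 60 seconds.
samples = [(0, 80.0), (15, 45.0), (30, 45.0), (45, 45.0), (60, 45.0), (75, 45.0)]
print(evaluate(samples, critical="50", warning="40"))   # warning
print(evaluate([], critical="50", no_data="critical"))  # critical
```

The first call reproduces the example above: the spike to 80 does not persist, so the critical threshold is not met, while the value stays above 40 for the whole window.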

      Dynamic Change Detection: The Dynamic Change Detection feature allows you to trigger alerts when a metric value deviates from its recent behavior. The data is evaluated over a learning period, which you can specify in either hours or days.

      • Provide the information in the fields.
      • Example: Trigger alert when an increase of more than 5 standard deviations away from the mean is detected.
        Evaluate the data over a learning period of the last 4 HOURS.
        How it works: The feature examines the last 4 hours of the time series (the default value, which you can change according to your requirements) and checks whether the metric value has deviated from the mean. It triggers an alert when an increase of more than 5 standard deviations away from the mean is detected.

        Note: Operations are not supported while building a query.
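The deviation check described above can be sketched in Python as follows. This is a minimal sketch, assuming the learning period is available as a plain list of values and that the population standard deviation is used. The function name `detect_increase` is hypothetical, and the product's actual algorithm may differ.

```python
from statistics import mean, pstdev


def detect_increase(learning_values, latest, num_std=5.0):
    """Return True if `latest` is more than num_std standard deviations above the mean."""
    if len(learning_values) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(learning_values)
    sigma = pstdev(learning_values)
    if sigma == 0:
        # Assumption: with a perfectly flat history, any increase counts.
        return latest > mu
    return (latest - mu) / sigma > num_std


# Values collected over the learning period (for example, the last 4 hours).
history = [10, 11, 9, 10, 10, 11, 9, 10]
print(detect_increase(history, latest=14))  # True: about 5.7 deviations above the mean
print(detect_increase(history, latest=12))  # False: about 2.8 deviations above the mean
```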

      Forecast: Forecasting predicts or estimates potential issues or events that might trigger an alert. It analyzes historical data, patterns, and trends to anticipate situations that could lead to issues or other predefined conditions.

      • Provide the information in the fields.
      • Example: Trigger alert when metric is projected to exceed a limit of 1 within a forecast period.
        Critical Threshold: Enter a critical threshold value. Enter a number.
        Example: 3 days
        Warning Threshold: Enter a warning threshold value. Enter a number.
        Example: 5 days
        How it works: The feature predicts when the specified limit will be reached and triggers an alert based on the time frame specified in the critical or warning threshold.
        Forecasting runs once daily, starting from the creation of the alert definition.

        Note: Operations are not supported while building a query.
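To make the forecast thresholds concrete, the Python sketch below fits a straight line to daily samples and estimates how many days remain before the limit is reached. The source does not describe the forecasting model, so the linear trend is an assumption made purely for illustration. The function names `days_until_limit` and `forecast_severity` and the sample data are hypothetical.

```python
def days_until_limit(history, limit):
    """Estimate days from the last sample until a linear trend reaches `limit`.

    history is a list of (day, value) pairs. Returns 0.0 if the limit is
    already exceeded, or None if the trend is flat or decreasing.
    """
    n = len(history)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in history) / n
    mean_y = sum(y for _, y in history) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in history)
    if sxx == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in history) / sxx
    intercept = mean_y - slope * mean_x
    last_day, last_value = history[-1]
    if last_value > limit:
        return 0.0
    if slope <= 0:
        return None
    crossing_day = (limit - intercept) / slope
    return max(crossing_day - last_day, 0.0)


def forecast_severity(history, limit, critical_days=3, warning_days=5):
    """Map the remaining days onto the critical and warning thresholds."""
    remaining = days_until_limit(history, limit)
    if remaining is None:
        return None
    if remaining <= critical_days:
        return "critical"
    if remaining <= warning_days:
        return "warning"
    return None


# Hypothetical metric growing by 5 units per day toward a limit of 100.
week = [(day, 50 + 5 * day) for day in range(7)]   # last value is 80
print(days_until_limit(week, limit=100))           # 4.0
print(forecast_severity(week, limit=100))          # warning
```

With a critical threshold of 3 days and a warning threshold of 5 days, a limit projected to be reached in 4 days falls inside the warning window only.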

    Notification Format

    The Subject and Description entered here are reflected on the alert details page.

    • Subject: Enter the subject for the alert.
    • Description: Enter the alert description.

    Alert Identification

    The alert identification section defines the scope of the alert.

    • Entity Type: Select either Resource or Client. An alert can apply to a specific resource, such as a server, or to the client as a whole.
      Note: For Dynamic Change Detection, you can select only Resource as the Entity Type.
    • Component: Select a component. The component is used to identify the alert.

    • Resource Attributes: Define resource attributes for the alert. These attributes are added to the alert.
      Note: Resource attributes can be defined only for the Resource entity type.
      • Select the attribute key and the attribute value from the dropdown boxes. These attributes can be seen in the alert details.
      • Note: The maximum number of attributes you can select is 4, that is, host, name, UUID, and IP.
        If you select $name as the attribute value, the value of name is read from the metric and displayed on the alert details page.
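The `$name` substitution described above can be pictured with the following Python sketch. The function `resolve_attributes` and the sample label values are hypothetical. The sketch assumes that an attribute value starting with `$` names a field on the metric, and that an unknown placeholder is left unchanged.

```python
def resolve_attributes(attributes, metric_labels):
    """Replace '$field' attribute values with the matching value from the metric."""
    resolved = {}
    for key, value in attributes.items():
        if value.startswith("$"):
            # '$name' looks up 'name' on the metric. Assumption: if the metric
            # has no such field, the placeholder is kept as-is.
            resolved[key] = metric_labels.get(value[1:], value)
        else:
            resolved[key] = value
    return resolved


metric_labels = {"name": "web-01", "ip": "10.0.0.5", "host": "web-01.example.com"}
attributes = {"name": "$name", "ip": "$ip"}
print(resolve_attributes(attributes, metric_labels))
# {'name': 'web-01', 'ip': '10.0.0.5'}
```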

    • Labels: Assign a value to a label. The label is reflected on the alert details page.
      • Enter the name of the label in the Name box.
      • Enter the value of the label in the Value box.
      • Example: If the name is id and the value is 10, the label id is set to 10.

  5. Click Save. The alert definition is saved.
    You can enable or disable an alert definition from the Alert Definitions listing page.

Actions on an Alert Definition

Below are the actions you can perform on an alert definition.

  • Search: To search for an alert definition:
    1. Click the search icon on the Alert Definitions listing page.
    2. Type the alert definition name in the search box. The search result is displayed.

  • Filter: To filter alert definitions by Entity Type and Status:
    1. Click the Filter dropdown on the Alert Definitions listing page.
    2. Select the Entity Type and Status, and then click Filter. The alert definitions matching the filter criteria are displayed.

  • View and Edit: To view an alert definition:
    1. Search for the alert definition you want to view.
    2. Hover over the row and click the action menu (three dots).
    3. Select View Details.
    Alternatively, click the alert definition name to view the details.

    To edit an alert definition:
    1. Open the Definition Details page as described above and make the necessary changes.
    2. Click Save. The changes are saved.
    Note: If you changed the Metric Query or the Alert Identification section, a popup dialog box is displayed.
      1. Select the checkbox to mark the alerts associated with the alert definition as Obsolete. The alerts remain available for three months from the current date.
      2. Click Save to save the changes.

  • View Failure Logs: To view failure logs:
    1. Search for the alert definition for which you want to view the failure logs.
    2. Hover over the row and click the action menu (three dots).
    3. Select View Failure Logs. The failure logs are displayed with the date and time.
    Alternatively, click the alert definition name and then click Failure Logs on the Definition Details page to view the failure logs.

  • Remove: To remove an alert definition:
    1. Search for the alert definition you want to remove.
    2. Hover over the row and click the action menu (three dots).
    3. Select Remove. A confirmation popup is displayed.
    4. Click Remove to remove the alert definition.
      Alerts associated with the alert definition are marked as Obsolete. They remain available for three months from the current date.