Overview
Resource availability is the state of a resource. It is determined from alerts on the resource's availability metrics once the resource is onboarded to OpsRamp.
OpsRamp continuously monitors resources and keeps track of all metric samples. Whenever an availability metric reaches its critical threshold limit, an alert is raised, and the availability state of the resource changes based on that alert.
Availability Calculation
Most OpsRamp out-of-the-box templates include one or two metrics for availability calculation. These metrics help identify the availability state of a resource.
How to Configure the Availability
Follow these steps to configure availability:
- Select a template of your choice and edit it.
- Go to the Metrics section and select the metric that is most important for the resource.
- Select the Availability checkbox and save the template.
- Apply the template to the resources.
Note
You can select more than one metric so that the availability calculation considers several metrics instead of one. Ideally, irrespective of the number of templates, two or three key metrics should be sufficient for identifying the availability state of a resource.
Availability States
Availability State | Description | Color Indication |
---|---|---|
UP | No critical alert on availability metrics. | GREEN |
DOWN | Critical alert on availability metrics. | RED |
UNKNOWN | Data samples are not available for the availability metrics. | GREY |
UNDEFINED | No availability metric on the resource. | BROWN |
UNMONITORED | These resources are not supported for monitoring. | LIGHT TEAL |
Every onboarded resource in your client falls into one of the above states.
Availability Rules
When you apply a template, the first option, ALL, is selected by default, but you can change it to ANY if you prefer. To change it, select the resource, click the Monitors tab on the right side, and then click Availability Rule.
Note
The Availability rule applies only to resources with more than one availability metric. If a resource has only one availability metric, the ALL/ANY rule does not apply.
The availability rule has two options:
- ALL: If none of the availability metrics has a critical alert, the resource is considered UP (OK). If any of the availability metrics has a critical alert, the resource is considered DOWN.
- ANY: If at least one of the availability metrics has no critical alert, the resource is considered UP (OK). If all of the availability metrics have a critical alert, the resource is considered DOWN.
The Availability Rule setting shows the following options, and you can switch between them:
- Resource is UP, if ALL availability metrics are OK. Otherwise, the resource is DOWN.
- Resource is UP, if ANY availability metric is OK. Otherwise, the resource is DOWN.
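The two options can be sketched in a few lines of Python. This is only an illustration of the rule as worded above, not OpsRamp's implementation. The function name `resource_is_up` and its arguments are hypothetical, and the sketch deliberately ignores missing samples (the UNKNOWN state).

```python
def resource_is_up(critical_alerts, rule="ALL"):
    """Return True if the resource is UP under the given rule.

    critical_alerts: one bool per availability metric,
                     True if that metric has a critical alert.
    rule: "ALL" or "ANY" (hypothetical names mirroring the UI options).
    """
    if not critical_alerts:
        # No availability metric at all: the rule has nothing to evaluate.
        raise ValueError("at least one availability metric is required")

    metric_ok = [not critical for critical in critical_alerts]
    if rule == "ALL":
        return all(metric_ok)   # UP only if every metric is OK
    if rule == "ANY":
        return any(metric_ok)   # UP if at least one metric is OK
    raise ValueError(f"unknown rule: {rule!r}")


# One metric is critical, the other is not.
print(resource_is_up([True, False], rule="ALL"))  # False (DOWN)
print(resource_is_up([True, False], rule="ANY"))  # True (UP)
```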
Possible States for Availability Rule
The following tables show the state of a resource for the possible combinations of availability metrics, assuming the resource has two availability metrics.
How is the state calculated for the ALL rule?
Resource is UP, if ALL availability metrics are OK. Otherwise, the resource is DOWN.
Resource | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability |
---|---|---|---|---|---|
Resource A | Collected | Collected | No | No | UP |
Resource A | Collected | Collected | Yes | Yes | DOWN |
Resource A | Collected | Collected | Yes | No | DOWN |
Resource A | Collected | Not collected | Yes | N/A | DOWN |
Resource A | Collected | Not collected | No | N/A | UNKNOWN |
Resource A | Not collected | Not collected | Yes | N/A | DOWN |
Resource A | Not collected | Not collected | N/A | N/A | UNKNOWN |
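The rows of the ALL table are consistent with a simple precedence: a critical alert on any metric wins, then a missing sample, then UP. The sketch below is inferred from the table only and is not OpsRamp's actual logic. The function name `availability_all` and the `(collected, critical)` tuples are hypothetical, and N/A is treated as "no critical alert".

```python
def availability_all(metrics):
    """metrics: list of (collected, critical) booleans, one per availability metric."""
    if any(critical for _, critical in metrics):
        return "DOWN"      # any critical alert takes the resource DOWN
    if not all(collected for collected, _ in metrics):
        return "UNKNOWN"   # no critical alert, but at least one sample is missing
    return "UP"            # every sample collected, no critical alert


# The seven rows of the ALL table; N/A is encoded as critical=False.
ALL_TABLE = [
    ([(True, False), (True, False)], "UP"),
    ([(True, True), (True, True)], "DOWN"),
    ([(True, True), (True, False)], "DOWN"),
    ([(True, True), (False, False)], "DOWN"),
    ([(True, False), (False, False)], "UNKNOWN"),
    ([(False, True), (False, False)], "DOWN"),
    ([(False, False), (False, False)], "UNKNOWN"),
]

for metrics, expected in ALL_TABLE:
    assert availability_all(metrics) == expected
```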
How is the state calculated for the ANY rule?
Resource is UP, if ANY availability metric is OK. Otherwise, the resource is DOWN.
Resource | Metric Sample#1 | Metric Sample#2 | Sample#1 Critical Alert? | Sample#2 Critical Alert? | Availability |
---|---|---|---|---|---|
Resource A | Collected | Collected | No | No | UP |
Resource A | Collected | Collected | Yes | Yes | DOWN |
Resource A | Collected | Collected | Yes | No | UP |
Resource A | Collected | Not collected | Yes | N/A | UP |
Resource A | Collected | Not collected | No | N/A | UP |
Resource A | Not collected | Not collected | Yes | N/A | UNKNOWN |
Resource A | Not collected | Not collected | N/A | N/A | UNKNOWN |
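The rows of the ANY table are consistent with a different precedence: if no metric has a sample the state is UNKNOWN, otherwise the resource is DOWN only when every metric has a critical alert. The sketch below is inferred from the table only and is not OpsRamp's actual logic. The function name `availability_any` and the `(collected, critical)` tuples are hypothetical, and N/A is treated as "no critical alert".

```python
def availability_any(metrics):
    """metrics: list of (collected, critical) booleans, one per availability metric."""
    if not any(collected for collected, _ in metrics):
        return "UNKNOWN"   # no samples at all for any availability metric
    if all(critical for _, critical in metrics):
        return "DOWN"      # every metric has a critical alert
    return "UP"            # at least one metric has no critical alert


# The seven rows of the ANY table; N/A is encoded as critical=False.
ANY_TABLE = [
    ([(True, False), (True, False)], "UP"),
    ([(True, True), (True, True)], "DOWN"),
    ([(True, True), (True, False)], "UP"),
    ([(True, True), (False, False)], "UP"),
    ([(True, False), (False, False)], "UP"),
    ([(False, True), (False, False)], "UNKNOWN"),
    ([(False, False), (False, False)], "UNKNOWN"),
]

for metrics, expected in ANY_TABLE:
    assert availability_any(metrics) == expected
```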
When to use the ALL availability rule?
Use this rule if you care about all of the availability metrics and expect every one of them to stay healthy, that is, with metric samples below the critical threshold limits.
For the resource to be in the UP state, all availability metrics must be below the critical threshold limit.
When to use the ANY availability rule?
Use this rule if it is enough for any one of the availability metrics to be healthy, that is, with its metric sample below the critical threshold limit.
For the resource to be in the UP state, at least one of the availability metrics must be below the critical threshold limit.
Resource Availability Score
The resource availability score is calculated from the state of the availability metrics.
Example: If the availability of a resource is DOWN for some time, the overall resource availability score is reduced.
Availability Score (%) = 100 - (Downtime Score)
The downtime score is determined by the critical alerts on the availability metrics, in combination with the availability rule on the resource.
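As a worked illustration of the formula, the sketch below assumes that the downtime score is the percentage of the reporting window that the resource spent in the DOWN state. That definition, the function name `availability_score`, and the `(state, minutes)` input format are assumptions made for this example. The source states only the subtraction itself. Only DOWN time is counted as downtime.

```python
def availability_score(intervals):
    """intervals: list of (state, minutes) pairs covering the reporting window."""
    total = sum(minutes for _, minutes in intervals)
    if total <= 0:
        raise ValueError("the reporting window must have a positive length")

    # Assumption: downtime score = share of the window spent DOWN, in percent.
    down = sum(minutes for state, minutes in intervals if state == "DOWN")
    downtime_score = 100.0 * down / total
    return 100.0 - downtime_score


# 90 minutes UP and 10 minutes DOWN in a 100-minute window.
print(availability_score([("UP", 90), ("DOWN", 10)]))  # 90.0
```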
Note
The UNKNOWN state of the resource is excluded when computing the downtime period, because OpsRamp cannot tell whether the resource is truly down or has simply stopped transmitting metric data. This exclusion is intentional.
Generate alert for the resources with unknown availability state
When does a resource go into an Unknown Availability State?
When a template that has at least one availability metric is applied to a resource, the resource goes into the UNKNOWN state if no data sample has been collected for the metric(s) in the last 30 minutes.
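The 30-minute condition can be sketched as a per-metric staleness check. This is only an illustration: the function name `sample_is_stale` and its parameters are hypothetical, and the sketch covers a single metric, not how several metrics combine under the availability rule.

```python
from datetime import datetime, timedelta

# Window taken from the text above: no sample for the last 30 minutes.
STALE_AFTER = timedelta(minutes=30)


def sample_is_stale(last_sample_at, now, window=STALE_AFTER):
    """Return True if a metric has no sample within the window.

    last_sample_at: datetime of the most recent sample, or None if none exists.
    now: the current time as a datetime.
    """
    if last_sample_at is None:
        return True
    return now - last_sample_at > window


now = datetime(2024, 1, 1, 12, 0)
print(sample_is_stale(datetime(2024, 1, 1, 11, 50), now))  # False: sample is 10 minutes old
print(sample_is_stale(datetime(2024, 1, 1, 11, 20), now))  # True: sample is 40 minutes old
```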
How will the user know if a resource goes into an UNKNOWN availability state?
A client-level critical alert is generated every 30 minutes if the availability state of a resource changes to UNKNOWN.
The critical alert will contain the link to the list of resources that have no monitoring data for the last 30 minutes. When you click the link, the Infrastructure > Resources page is displayed, with the list of UNKNOWN resources.
The alert heals automatically when all the resources in the alert move out of the UNKNOWN state.
This alerting option is disabled by default. You can enable or disable it from the Setup > Accounts > Clients page.
See Create a Client for more information.