The agent self-monitoring feature notifies you of changes in the availability of agent-monitored metrics. It generates an alert when metric monitoring is interrupted and another alert when metric monitoring is restored.

This provides client visibility into the window when the monitored metrics might differ from what is expected.

Operation

Periodically, every six hours by default, the agent compares the metrics it is configured to collect with the metrics it actually collected. If the collected metrics change over the interval, either because fewer metrics than expected are collected or because metric collection is fully or partially restored, the agent alerts on the change in status. If there is no change in status over the interval, no additional alert is generated, even if the same metrics continue to be unmonitored. If communication with the agent is interrupted, the agent sends the unmonitored metrics alert when communication is restored.

If metric collection fails and is restored within the same interval (six hours by default), no alert is generated. You can change the self-monitoring interval to a value between one and 12 hours.

Self-monitoring generates one alert per device or agent resource, not an alert per metric. The alert is one of the following types:

Alert type | Condition | Alert message
critical | missing metrics or a change in missing metrics since the last audit | Agent: no metric samples collected from some monitors
healed | no missing metrics | Agent: metric samples collected from all monitors
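
The audit behavior described in this section can be illustrated with a minimal Python sketch. It is not the agent's implementation: the function name `audit` and its parameters are hypothetical, and the logic only restates the documented rules (alert on a change in status, stay silent when nothing changed).

```python
from typing import Optional, Set


def audit(expected: Set[str], collected: Set[str],
          previously_missing: Set[str]) -> Optional[str]:
    """Return 'critical', 'healed', or None for one audit interval.

    Hypothetical helper: names and signature are assumptions for illustration.
    """
    missing = expected - collected
    if missing == previously_missing:
        # No change in status: no additional alert,
        # even if the same metrics are still unmonitored.
        return None
    if missing:
        # Metrics are missing, or the set of missing metrics changed.
        return "critical"
    # Nothing is missing now, but something was missing at the last audit.
    return "healed"
```

For example, `audit({"cpu", "disk"}, {"cpu"}, {"disk"})` returns `None` because `disk` was already reported as missing at the previous audit.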

Critical- and warning-level alerts include a list of metrics that are expected but not currently monitored. Metrics are grouped by agent template categories. For example, all missing performance metrics are grouped in the Performance Monitoring category. Each application or custom monitor is a separate category with the application monitor name as the group name.

Monitoring agent template categories:

  • Performance Monitoring
  • Process Monitor
  • Windows Services Monitor
  • Other Monitors
  • custom monitor name
  • G2/application monitor name

The minimum metrics sampling interval is one hour.

Constraints

  • If a metric is defined for alert notification only and not for graph data, self-monitoring can generate a false alert because the agent does not send graph data for these metrics. This constraint primarily applies to custom monitors.

  • Self-monitoring works at the metric level, not the component level, so a metric is assumed to be working if any one of its components returns data.

    For example, for a disk.utilization metric with three components, C:, D:, and F:, if one component collects data and the other two components fail, self-monitoring does not send an alert.

  • For KVM and Docker monitoring, each virtual machine or container is considered to be a component. As in the previous constraint, if only some virtual machines or containers are missing graph data, self-monitoring does not generate an alert.

  • If the wrong template is applied, such as a Windows-based template applied to Linux or, conversely, a Linux-based template applied to Windows, self-monitoring sends an alert.
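
The metric-level constraint above can be sketched in Python as follows. This is an illustration only: the function `metric_is_collected` and the dictionary layout are assumptions, not part of the agent.

```python
from typing import Mapping


def metric_is_collected(component_has_data: Mapping[str, bool]) -> bool:
    """A metric counts as collected if at least one component returned data.

    Hypothetical helper illustrating the documented metric-level check.
    """
    return any(component_has_data.values())


# disk.utilization with three components; only C: returned data.
disk_utilization = {"C:": True, "D:": False, "F:": False}

# The metric is still treated as collected, so no alert is raised for it.
print(metric_is_collected(disk_utilization))
```

Only when every component fails to return data is the metric treated as missing.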

Metric monitoring status alert format

  • Alert Type: MONITORING
  • Sub-Alert Type: agent_self_monitoring_error

Field | Content | Description
Subject | Critical alert: Agent: no metric samples collected from some monitors. | Alert status
 | Warning alert: |
 | Healed alert: Agent: metric samples collected from all monitors. |
Date Created | formatted time | Time alert created.
Created Time At Source | formatted time | Time alert created at source.
Description | For each alert category, the monitor category name and a comma-separated list of the metrics not collected. | This field contains an aggregated list of metrics not collected, categorized by agent template category. It is not an alert on a metric.

Description example:

performance Monitoring: CPU, DISK, FREEDISK, MEMORY;

Monitor category names:

  • Performance Monitoring
  • Process Monitor
  • Windows Services Monitor
  • Other Monitors
  • custom monitor name
  • G2 or application monitor name
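
As a rough illustration of the Description field layout, the following Python sketch builds a string from missing metrics grouped by category. The function name `format_description` is hypothetical, and the single space used to join multiple categories is an assumption; the source shows only a single-category example.

```python
from typing import Dict, List


def format_description(missing_by_category: Dict[str, List[str]]) -> str:
    """Build 'Category: M1, M2;' entries for categories with missing metrics.

    Hypothetical helper; the separator between categories is an assumption.
    """
    parts = []
    for category, metrics in missing_by_category.items():
        if metrics:  # skip categories with nothing missing
            parts.append(f"{category}: {', '.join(metrics)};")
    return " ".join(parts)


print(format_description(
    {"Performance Monitoring": ["CPU", "DISK", "FREEDISK", "MEMORY"]}
))
```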

You can also view the alert in the agent log.

Enable agent self-monitoring when the client is created

  1. Navigate to Setup > Accounts > Clients.
  2. On the CLIENTS page, click + Add.
  3. In the Agent Monitoring Capabilities section, select Yes to enable agent self-monitoring. Agent self-monitoring is disabled by default.

All client agents are notified when self-monitoring is enabled.

Enable or disable agent self-monitoring

  1. Navigate to Setup > Accounts > Clients.
  2. On the CLIENTS page, select the client you want to change the agent self-monitoring status for.
  3. Click Edit.
  4. In the Agent Monitoring Capabilities section, select Yes to enable agent self-monitoring. Select No to disable agent self-monitoring.
  5. Click Finish.

All client agents are notified when self-monitoring is disabled.

Change the self-monitoring frequency

You can change the agent self-monitoring frequency at the device level, in the agent configuration. The value is specified in minutes.

  • default frequency: 360 minutes (six hours)
  • minimum frequency: 60 minutes (one hour)
  • maximum frequency: 720 minutes (12 hours)

You can set the self-monitoring frequency for each client agent, and each agent can be set to a different frequency:

  1. Open the configuration.properties file in the opsramp/agent/conf folder.
  2. In the Misc section, find the self_monitor_timer_min key.
  3. Change the value to the frequency you want, in minutes.
  4. Save the file.
  5. Restart the agent service.
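
If you script this change, a minimal Python sketch such as the following can validate the value against the documented 60 to 720 minute range before rewriting the key. The function name is hypothetical, and the `key = value` line syntax is an assumption about the file format; check your configuration.properties file before relying on it. The sketch works on the file contents as text and does not restart the agent service.

```python
import re

KEY = "self_monitor_timer_min"
MIN_MINUTES, MAX_MINUTES = 60, 720  # documented limits, in minutes


def set_self_monitor_timer(config_text: str, minutes: int) -> str:
    """Return config_text with the self-monitoring frequency replaced.

    Hypothetical helper; assumes the key is stored as 'key = value' or 'key: value'.
    """
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"{KEY} must be between {MIN_MINUTES} and {MAX_MINUTES} minutes"
        )
    # Match the key at the start of a line and keep its prefix unchanged.
    pattern = re.compile(
        rf"^([ \t]*{re.escape(KEY)}[ \t]*[=:][ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: m.group(1) + str(minutes), config_text
    )
    if count == 0:
        raise KeyError(f"{KEY} not found in configuration text")
    return new_text
```

To apply it, read the file into a string, pass it through `set_self_monitor_timer`, write the result back, and then restart the agent service as described in the steps above.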