What to do when an agent is offline?

When an agent is installed or restarted, it first sends a web POST request to api.opsramp.com on port 443 to obtain the server it should connect to.

  1. Verify if DNS resolution is working.
  2. Telnet to port 443 on api.opsramp.com.
  3. In the browser, enter https://api.opsramp.com and verify that it redirects to OpsRamp.
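The first two checks above can also be scripted. The following is a minimal Python sketch, not an OpsRamp tool. The function names `resolves` and `tcp_reachable` are hypothetical, and the sketch only confirms that the name resolves and that a TCP connection to port 443 opens, which is what a telnet test shows.

```python
import socket


def resolves(host: str) -> bool:
    """Return True if DNS resolution for the host succeeds."""
    try:
        socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False


def tcp_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens (like a telnet test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `resolves("api.opsramp.com")` followed by `tcp_reachable("api.opsramp.com")` covers steps 1 and 2. Step 3 (the browser redirect) still needs to be checked by hand.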

What if the agent cannot reach api.opsramp.com?

If api.opsramp.com is not reachable after six (6) checks, the agent falls back to cn01-sjc.opsramp.com to get the server URL to connect to.
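The fallback rule can be pictured with a short Python sketch. This illustrates the behavior described above and is not the agent's actual code: `choose_endpoint` and the `is_reachable` callback are hypothetical names, and the delay between checks is not documented here, so it is left out.

```python
from typing import Callable

PRIMARY = "api.opsramp.com"
FALLBACK = "cn01-sjc.opsramp.com"
MAX_CHECKS = 6  # "six (6) checks" per the text above


def choose_endpoint(is_reachable: Callable[[str], bool]) -> str:
    """Return the host to ask for the server URL.

    Tries PRIMARY up to MAX_CHECKS times and falls back to FALLBACK
    only if every check fails.
    """
    for _ in range(MAX_CHECKS):
        if is_reachable(PRIMARY):
            return PRIMARY
    return FALLBACK
```

Passing the reachability test as a callback keeps the rule itself easy to verify without any network access.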

After receiving the first response from api.opsramp.com, the agent connects to the server for the node that the client is connected to. The agent log file shows the DNS name of the server that the agent is connected to.

How do I validate the agent if it is displayed offline?

  1. Make sure DNS resolution is working.
  2. Telnet to port 443 on whichever of the above URLs the agent is connected to.
  3. In the browser, enter https://<above URL>/stats. A page containing the text "Agent count" is displayed.
  4. If the above validations succeed and the agent remains offline, create a support ticket with the AgentLog file attached. Mention whether the agent is a proxy agent or a direct agent.
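Step 3 can be automated with the Python standard library. The sketch below is a hedged example: the helper names are hypothetical, and matching the "Agent count" text case-insensitively is an assumption, since the exact wording of the page is not given here.

```python
import urllib.request


def stats_url(host: str) -> str:
    """Build the /stats URL for the host the agent is connected to."""
    return f"https://{host}/stats"


def has_agent_count(page_text: str) -> bool:
    """Return True if the page contains the expected 'Agent count' text."""
    return "agent count" in page_text.lower()


def check_stats(host: str, timeout: float = 10.0) -> bool:
    """Fetch the /stats page and look for the expected text."""
    try:
        with urllib.request.urlopen(stats_url(host), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False
    return has_agent_count(body)
```

Call `check_stats` with the host name found in the agent log. A `False` result means either the page could not be fetched or the expected text was missing.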

What steps does Agent Shield take for a direct agent and a proxy agent?

Problem: Agent process does not run
Solution: Agent Shield restarts the agent.

Problem: Agent process does not run
Solution:
  • For Direct Agent: Agent Shield checks the reachability of the API server URL and the connection node URL. If both URLs are reachable, Agent Shield restarts the agent service.
    If the agent cannot connect to the cloud, Agent Shield records the details in a log file, `shield.log`, and does not restart the agent.
  • For Proxy Agent: Agent Shield checks the proxy IP URL. If the URL is reachable, Agent Shield restarts the agent service.

Problem: Agent process consumes more than 250 MB of memory
Solution: Agent Shield restarts the agent.

Problem: Agent process uses more than 2500 handles
Solution: Agent Shield restarts the agent.

Problem: Agent process uses more than 250 threads
Solution: Agent Shield restarts the agent.
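The thresholds and the direct-agent rule listed above can be summarized as a small decision sketch in Python. This is only one reading of the rules, not Agent Shield's implementation: the class and function names are hypothetical. The proxy-agent case is left out because the text does not say what happens when the proxy URL is unreachable.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the problem list above.
MAX_MEMORY_MB = 250
MAX_HANDLES = 2500
MAX_THREADS = 250


@dataclass
class ProcessUsage:
    memory_mb: float
    handles: int
    threads: int


def over_limits(usage: ProcessUsage) -> List[str]:
    """Return the names of the limits that the agent process exceeds."""
    breached = []
    if usage.memory_mb > MAX_MEMORY_MB:
        breached.append("memory")
    if usage.handles > MAX_HANDLES:
        breached.append("handles")
    if usage.threads > MAX_THREADS:
        breached.append("threads")
    return breached


def direct_agent_action(api_reachable: bool, node_reachable: bool) -> str:
    """Direct agent: restart only if both URLs are reachable, otherwise log."""
    if api_reachable and node_reachable:
        return "restart"
    return "log to shield.log"
```

Any non-empty result from `over_limits` corresponds to a restart in the list above. The comparisons are strict because the text says "more than".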

How to find the agent connection status directly from the host?

A built-in script in the agent reports the agent’s online/offline status and a few other details.

Log in to the host where the agent is installed and run the script for its operating system:

Operating System: Windows
  • Script type: PowerShell
  • Default Installation: C:\Program Files (x86)\OpsRamp\Agent\plugins\
    Custom Installation: {custom_dir}\OpsRamp\Agent\plugins\
  • Script Name: agentstatusinformation_windows.ps1

Operating System: Linux
  • Script type: Shell
  • Default Installation: /opt/opsramp/agent/plugins/
    Custom Installation: {custom_dir}/opsramp/agent/plugins/
  • Script Name: agentstatusinformation_linux.sh

Operating System: FreeBSD
  • Script: Not available

Here is an example of the output:

 {  
 "name": "DESKTOP-6CQTB02",
 "hostName": "DESKTOP-6CQTB02",
 "ipAddress": "192.168.43.232",
 "macAddress": "08:00:27:EC:8D:F9",
 "osName": "Microsoft Windows 10 Home 10.0.19042 Build 19042.1466",
 "osArchitecture": "64-bit",
 "agentInstalled": true,
 "agentStatus": "ONLINE",
 "agentLastConnectedTime": "2022-02-18T05:21:15"  
 }
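Because the output is JSON, it can be checked programmatically. The following Python sketch assumes that the script prints exactly one JSON object like the one above. The function name `is_agent_online` is hypothetical, and only the `ONLINE` value shown in the example is treated as online.

```python
import json


def is_agent_online(script_output: str) -> bool:
    """Return True if the status output reports an installed, online agent."""
    info = json.loads(script_output)
    return bool(info.get("agentInstalled")) and info.get("agentStatus") == "ONLINE"
```

Feed it the captured output of the status script. A `json.JSONDecodeError` indicates that the script printed something other than the expected JSON.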

Why did the patch scan fail for a Windows agent with the “Missing Patch Scan Job failed” alert?

You may receive a missing patch scan alert while scanning a Windows agent. To troubleshoot the “Missing Patch Scan Job failed” alert, follow the steps below:

  1. Open Alert Description: Open the alert description to access more information about the issue.
  2. Examine Hexadecimal Error Code: Look for the hexadecimal error code provided in the alert description. These error codes are typically Microsoft-related.
  3. Search the Error Code: Copy the hexadecimal error code (e.g., 0x80072ee2, 0x80080005) and perform a Google search. Refer to the documentation in the search results for more details about the error and possible solutions.
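If the alert description is long, the error codes in step 2 can be pulled out with a regular expression. This Python sketch assumes the codes appear as `0x` followed by eight hexadecimal digits, as in the examples above. The function name is hypothetical.

```python
import re
from typing import List

# Matches codes such as 0x80072ee2: "0x" followed by exactly 8 hex digits.
HEX_CODE = re.compile(r"\b0x[0-9A-Fa-f]{8}\b")


def extract_error_codes(description: str) -> List[str]:
    """Return the distinct error codes in the text, lowercased, in order."""
    codes = []
    for match in HEX_CODE.findall(description):
        code = match.lower()
        if code not in codes:
            codes.append(code)
    return codes
```

Each returned code can then be searched for as described in step 3.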

Graphs Not Displaying for Metrics Data

Issue:
Metrics data is being collected by the agent, but the graphs are not populating in the OpsRamp portal.

Reason:
This issue may arise due to discrepancies in the device’s time settings. If the device’s clock is set to a future or past time, the collected metric data might not be uploaded to the database properly, resulting in graphs not displaying on the portal.

Solution:

  1. Check Time Settings: Ensure the device’s time zone is set correctly according to its geographical location.
  2. Synchronize with an NTP Server: Use a Network Time Protocol (NTP) server to automatically synchronize the device’s time with a standard reference clock. This helps ensure the time is accurate and consistent.
  3. Manual Adjustment: If NTP synchronization is not available, manually adjust the time and date settings on the device.
  4. Verify and Restart: After adjusting the time settings, verify that the device’s time is correct. Restart the device if necessary to ensure the changes take effect.
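To see how far off a device clock is before and after these steps, it can be compared against an NTP server. The Python sketch below sends a single SNTP client request over UDP port 123. It is a rough check under stated assumptions: the function names are hypothetical, the server name must be supplied by the caller, and network delay is ignored, so the result is only approximate.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def parse_transmit_time(packet: bytes) -> float:
    """Return the transmit timestamp of an NTP reply as Unix time."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    # Bytes 40-47: 32-bit seconds since 1900 plus 32-bit binary fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 5.0) -> float:
    """Send one SNTP client request and return the server time (Unix time)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply)


def clock_offset(server: str) -> float:
    """Approximate local clock error in seconds (positive: local clock ahead)."""
    return time.time() - query_ntp(server)
```

An offset of more than a few seconds suggests that the time settings still need attention.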

How to Enable Container Discovery?

Solution:

Container discovery is disabled by default. To enable it, pass -d false as an additional argument to the following configure command:

sudo /opt/opsramp/agent/bin/configure -K [Key] -S [Server] -s [API Key]