Troubleshooting VMware discovery and monitoring

The following are troubleshooting steps for VMware discovery and monitoring.

  1. If discovery of the VMware integration fails, verify the following:
    • Check whether the vCenter/ESXi host responds to ping from the gateway used for discovery:
      ping <ip_address of the vcenter/esxi host>
    • Check whether the gateway can connect to port 443 on the host using telnet:
      telnet <ip_address of the vcenter/esxi host> 443
    • Check whether the gateway can log in to the vCenter SOAP API using the following curl command:
      curl -vvv -k -X POST -H "Content-Type: text/xml; charset=utf-8" -H "SOAPAction: urn:vim25/5.5" -d @data.xml https://<ip_address>/sdk
      The data.xml file must contain the following SOAP Login request, with (User_name) and (Password) replaced by the vCenter credentials:
      <?xml version="1.0" encoding="UTF-8"?>
      <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:vim25">
        <soapenv:Header/>
        <soapenv:Body>
          <urn:Login>
            <urn:_this type="SessionManager">SessionManager</urn:_this>
            <urn:userName>(User_name)</urn:userName>
            <urn:password>(Password)</urn:password>
          </urn:Login>
        </soapenv:Body>
      </soapenv:Envelope>

      A successful login returns HTTP response code 200.

    • Ensure the vCenter meets the prerequisites for performance monitoring, appliance monitoring, and CIM monitoring. For more information, see Prerequisites.
    • To enable additional discovery and monitoring logs and collect VMware discovery data and logs from the gateway, follow these steps:
      Step 1:
      Log in to the gateway and enter the gateway command line (GCLI) using the following command: gcli
      Step 2:
      • Enable the following GCLI flags to collect the inventory bin/JSON from the cloud, that is, the VMware discovery information.
        VMware discovery flags
        flag add vcenter.inventory.bin.log on 60
        flag add vcenter.inventory.json.log on 60
        flag add vcenter.inventory.log on 60
        flag add vxrail.log on 60
        flag add vxrail.inventory.log on 60
        flag add vxrail.inventory.log1 on 60
        flag add vxrail.inventory.log2 on 60
      • Enable the following GCLI flags to collect monitoring logs.
        VMware monitoring flags
        flag add vmware.log4 on 60
        flag add vmware.log5 on 60
        flag add vmware.log6 on 60
        flag add vmware.log7 on 60
        flag add vmware.log8 on 60
        flag add vmware.alrm.log on 60
        flag add vmware.evt.log on 60
        flag add vmware.evt.log2 on 60
        flag add vmware.alarm.info on 60
        flag add vmware.alarm.def on 60
        flag add vmware.validate.auth.alert on 60
        flag add vxrail.mon.log on 60
        flag add vsan.mon.log on 60
        flag add vmware.exception.log on 60
        flag add vmware.monitor.log on 60
        flag add vmware.metric.log on 60
        flag add vmware.event.log on 60
        flag add vmware.log on 60
        flag add vmware.collector.mgr.log on 60
        flag add vmware.collector.mgr.log1 on 60
        flag add vmware.alarm.log on 60
        flag add vmware.newalarm on 60
        flag add vmware.dbalarm on 60

      Step 3:
      Exit GCLI using the exit command.
      Step 4:
      Scan the VMware discovery profile.
      Monitor the vprobe log using the following command:
      tail -n 100 -f /var/log/app/vprobe.log
      Check the log to confirm whether the bin/JSON files were generated. If they were, find them in /var/log/app/tmp. Verify that the generated JSON file contains all the required inventory information, and then share /var/log/app/vprobe.log and all the files in /var/log/app/tmp/.

      • If required, set the log level from GCLI:
        Syntax: loglevel set <absolute path of the class> <log level> <number of minutes>
        Example: loglevel set com.vistara.gateway.plugin.smis.ibm.IBMSMISCommunicator INFO 60
  2. If there is an issue with hardware monitoring, check the hardware component states on the affected host:
    • Log in to the host through the VMware Host Client (UI) and navigate to Monitor > Hardware to view the hardware components and their states.

    • Also check the component states in the vSphere Client: select the host and navigate to Monitor > Hardware Health.

  3. If there is an issue with metric data (for example, a customer reports that an OpsRamp graph shows incorrect metric values), check the same metrics in the vSphere Client graphs and compare them with the OpsRamp graphs:
    1. Log in to the vSphere Client using the vCenter credentials.
    2. Select the required resource (for example, a host).
    3. Navigate to Monitor > Performance > Overview to view the basic metric graphs.
    4. Navigate to Monitor > Performance > Advanced to view all metric graphs. Use Chart Options to filter the metrics and the components shown for each metric.

  4. To check alarm and event data in vCenter:
    1. Log in to the vSphere Client using the vCenter credentials.
    2. Select the vCenter and navigate to Monitor > Issues and Alarms to check alarms, or to Monitor > Tasks and Events > Events to check events.
  5. If a large number of sessions with the user agent "pyvmomi" are created on vCenter, check whether vSAN and VxRail templates are assigned to resources (hosts, clusters, VMs) in the OpsRamp portal, and unassign them if they are not required (that is, if vSAN and VxRail are not available in the vCenter).
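The telnet check in item 1 can be approximated in a script when telnet is not installed on the gateway. The following is a minimal Python sketch, assuming Python 3 is available where you run it; the function name `can_connect` is hypothetical. It only tests whether a TCP connection to port 443 succeeds, which is what the telnet check shows. It sends no ICMP, so it does not replace the ping check.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Like the telnet check, this only proves the port is reachable from this
    machine; it says nothing about whether the vCenter/ESXi credentials work.
    """
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `can_connect("<ip_address of the vcenter/esxi host>")` returning False points to a firewall, routing, or listener problem between the gateway and the host rather than a credential problem.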
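The SOAP login check in item 1 can also be run from Python, which avoids editing data.xml by hand. This is a sketch under stated assumptions: it sends the same Login envelope and the same Content-Type and SOAPAction headers as the curl command, the helper names `build_login_envelope` and `soap_login_status` are hypothetical, and certificate verification is disabled to mirror `curl -k`. The credentials are XML-escaped because a password containing `&` or `<` would otherwise produce an invalid request.

```python
import ssl
import urllib.error
import urllib.request
from xml.sax.saxutils import escape

# Same Login request as data.xml, with placeholders for the credentials.
ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:vim25">
  <soapenv:Header/>
  <soapenv:Body>
    <urn:Login>
      <urn:_this type="SessionManager">SessionManager</urn:_this>
      <urn:userName>{user}</urn:userName>
      <urn:password>{password}</urn:password>
    </urn:Login>
  </soapenv:Body>
</soapenv:Envelope>
"""


def build_login_envelope(user: str, password: str) -> bytes:
    """Return the SOAP Login body with the credentials XML-escaped."""
    return ENVELOPE.format(user=escape(user), password=escape(password)).encode("utf-8")


def soap_login_status(host: str, user: str, password: str, timeout: float = 15.0) -> int:
    """POST the Login request to https://<host>/sdk and return the HTTP status code."""
    request = urllib.request.Request(
        f"https://{host}/sdk",
        data=build_login_envelope(user, password),
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": "urn:vim25/5.5",
        },
        method="POST",
    )
    # Skip certificate verification like curl -k; use only for this diagnostic.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example, a SOAP fault for bad credentials) arrive here.
        return err.code
```

A return value of 200 corresponds to a successful login. Connection failures raise `urllib.error.URLError` instead of returning a status code. Like the curl command, the sketch does not send a Logout request, so the session it opens stays on vCenter until vCenter times it out.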
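Collecting the files requested in Step 4 of item 1 (vprobe.log plus everything in /var/log/app/tmp) can be scripted so that no inventory file is missed. A minimal Python sketch, assuming Python 3 is available on the gateway and that the archive is written outside /var/log/app/tmp; `bundle_vmware_logs` is a hypothetical helper, not a gateway command.

```python
import tarfile
from pathlib import Path

# Locations named in the troubleshooting steps.
VPROBE_LOG = Path("/var/log/app/vprobe.log")
INVENTORY_DIR = Path("/var/log/app/tmp")


def bundle_vmware_logs(output: Path, log_file: Path = VPROBE_LOG,
                       inventory_dir: Path = INVENTORY_DIR) -> list[str]:
    """Archive vprobe.log and every file under the inventory directory as .tar.gz.

    Write `output` outside `inventory_dir` so the archive is not picked up by the scan.
    Returns the member names so you can confirm the bin/JSON files are present.
    """
    with tarfile.open(str(output), "w:gz") as archive:
        if log_file.is_file():
            archive.add(str(log_file), arcname=log_file.name)
        if inventory_dir.is_dir():
            for path in sorted(inventory_dir.rglob("*")):
                if path.is_file():
                    # Keep a tmp/ prefix so the inventory files stay grouped.
                    relative = path.relative_to(inventory_dir).as_posix()
                    archive.add(str(path), arcname=f"tmp/{relative}")
    with tarfile.open(str(output), "r:gz") as archive:
        return archive.getnames()
```

If Python is not available, `tar czf vmware-logs.tar.gz -C /var/log/app vprobe.log tmp` produces an equivalent archive with standard tar.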
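For the metric comparison in item 3, it can help to note the values read from both graphs at the same timestamps and compare them with a tolerance, since the two tools may sample or average over different intervals and rarely match exactly. The following Python sketch assumes you have collected such values by hand or from exports; the function name `find_mismatches` and the 5% default tolerance are illustrative assumptions, not OpsRamp or vSphere settings.

```python
import math


def find_mismatches(opsramp: dict[str, float], vsphere: dict[str, float],
                    rel_tol: float = 0.05) -> list[str]:
    """Return the timestamps where the OpsRamp and vSphere values disagree.

    A timestamp counts as a mismatch when the two values differ by more than
    `rel_tol` relative to the larger one, or when only one source has a value.
    """
    mismatches = []
    for ts in sorted(opsramp.keys() | vsphere.keys()):
        if ts not in opsramp or ts not in vsphere:
            # A gap in one graph is worth investigating on its own.
            mismatches.append(ts)
        elif not math.isclose(opsramp[ts], vsphere[ts], rel_tol=rel_tol):
            mismatches.append(ts)
    return mismatches
```

For example, `find_mismatches({"10:00": 40.0, "10:05": 55.0}, {"10:00": 41.0, "10:05": 30.0})` returns `["10:05"]`: the 10:00 values are within 5% of each other, while the 10:05 values are not.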
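To confirm the session build-up described in item 5, before and after unassigning templates, you can list vCenter sessions with pyVmomi, the VMware Python SDK. This sketch assumes pyVmomi is installed (`pip install pyvmomi`) and that the account can read `SessionManager.sessionList`, which typically needs administrative privileges to show other users' sessions. The helper names are hypothetical, and matching on the substring "pyvmomi" is an assumption about how the user agent appears in your environment.

```python
import ssl
from collections import Counter
from typing import Iterable


def pyvmomi_sessions_by_user(sessions: Iterable, needle: str = "pyvmomi") -> Counter:
    """Count sessions whose user agent contains `needle`, grouped by user name.

    `sessions` can be the vim.UserSession objects from SessionManager.sessionList
    or any objects exposing `userName` and `userAgent` attributes.
    """
    counts: Counter = Counter()
    for session in sessions:
        agent = getattr(session, "userAgent", None) or ""
        if needle.lower() in agent.lower():
            counts[getattr(session, "userName", "<unknown>")] += 1
    return counts


def fetch_session_list(host: str, user: str, password: str) -> list:
    """Return the current vCenter sessions using pyVmomi."""
    # Imported here so the counting helper works without pyVmomi installed.
    from pyVim.connect import Disconnect, SmartConnect

    # Certificate checks are disabled, mirroring curl -k; diagnostic use only.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    si = SmartConnect(host=host, user=user, pwd=password, sslContext=context)
    try:
        return list(si.RetrieveContent().sessionManager.sessionList)
    finally:
        Disconnect(si)
```

Usage: `print(pyvmomi_sessions_by_user(fetch_session_list("<ip_address>", "<user>", "<password>")))`. The script's own connection also goes through pyVmomi, so expect it to add one session to the count.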

FAQs

  1. What are the steps to be taken when vCenter shows high CPU usage during VMware integration monitoring?

Answer: To resolve high CPU usage on vCenter during VMware integration monitoring:

  • If the gateway version is earlier than 20.0.0, unassign the vCenter Performance template from the vCenter resource, if it is assigned.
  • The 20.0.0 gateway release includes a permanent fix for this issue.