Risks, Limitations & Assumptions

  • The application can handle Critical/Recovery failure notifications for the following two cases when the user enables App Failure Notifications in the configuration:
    • Connectivity Exception (ConnectTimeoutException, HttpHostConnectException, UnknownHostException).
    • Authentication Exception (UnauthorizedException).
  • The application will not send duplicate or repeated failure alert notifications until the existing critical alert has recovered.
  • The application cannot control monitoring pause/resume actions based on the above alerts.
  • Metrics can be used to monitor NVIDIA Bright Cluster Manager resources and to generate alerts based on threshold values.
  • The Template Applied Time will only be displayed if the collector profile (Classic and NextGen Gateway) is version 18.1.0 or higher.
  • This application supports both Classic Gateway and NextGen Gateway.
  • The application does not return any data when there are SSH connectivity issues. It attempts to establish SSH connections to the target device at the configured monitoring and discovery frequency.
  • The application works only with SSH credentials and requires SSH port 22 to be open.
  • Discovery and monitoring of virtual nodes are not supported.
  • To discover and monitor the NVIDIA BCM Linux Server and its metrics, sshpass should be installed on the BCM cluster. If the sshpass package is not available, ensure that the BCM cluster and the Linux Server use the same credentials.
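
The duplicate-suppression behavior described in the list above (one Critical notification per failure case, no repeats while it remains open, then one Recovery) can be sketched as a small state tracker. This is a minimal illustration in Python, not the application's actual implementation: the FailureNotifier class, its method names, the category strings, and the message format are all hypothetical.

```python
from typing import Optional, Set

# Hypothetical category labels for the two failure cases listed above.
CONNECTIVITY = "connectivity"      # e.g. ConnectTimeoutException, UnknownHostException
AUTHENTICATION = "authentication"  # e.g. UnauthorizedException


class FailureNotifier:
    """Remembers which failure categories have an open critical alert."""

    def __init__(self) -> None:
        self._open: Set[str] = set()

    def on_failure(self, category: str) -> Optional[str]:
        """Return a Critical message, or None if one is already open."""
        if category in self._open:
            return None  # suppress the duplicate until recovery
        self._open.add(category)
        return f"CRITICAL: {category}"

    def on_success(self, category: str) -> Optional[str]:
        """Return a Recovery message, or None if nothing was open."""
        if category not in self._open:
            return None
        self._open.remove(category)
        return f"RECOVERY: {category}"
```

In this sketch, a second consecutive call to on_failure for the same category returns None, and a new Critical message is produced only after on_success has closed the earlier alert.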