The HPE Performance Cluster Manager (HPCM) integration provides centralized monitoring, discovery, and management of large-scale HPC (High-Performance Computing) environments. Designed for scientific, AI, ML, and enterprise HPC workloads, HPCM helps IT teams efficiently monitor thousands of nodes with high availability and performance.

With this integration, you can:

  • Automatically discover hardware, nodes, network interfaces, and firmware details.
  • Monitor critical metrics, track performance, and generate real-time alerts.
  • Organize resources hierarchically and visualize the cluster structure within OpsRamp.

Here is the highlevel architecture of HPE Performance Cluster Manager (HPCM):


Supported Target Version

HPE Performance Cluster Manager (HPCM) v1.12

Resource Hierarchy of HPCM

  • HPE Performance Cluster Manager

    • HPC Nodes
      • HPC NICs

Key Use Cases

Discovery Use cases

Gain visibility into all HPCM components with relationship mapping:

  • Automatically detects compute, management, and storage nodes.
  • Identifies network interfaces, switches, and fabric topology.
  • Discovers hardware (CPU, memory, GPU, disk), BIOS, and firmware versions.
  • Groups system components for streamlined management.

Monitoring Use cases

Track system health and performance with actionable insights:

  • Collects time-series metric data to detect anomalies and threshold breaches.
  • Triggers alerts to support fast issue resolution and reduce downtime.
  • Monitors job scheduling time, status, and related performance indicators.

Version History

Application VersionBug fixes / Enhancements
1.1.0This implementation introduces third-party CI alert mapping and OpsQL-based enhancements.
Configure alert mappings to target CI systems through the application configuration. After configuration, the system automatically forwards alerts to the corresponding third-party platforms and maps them to the specified CIs, ensuring consistent integration and efficient alert management.
Previously, resource filters in the app configuration required manual entry of resource core and custom attributes. With this enhancement, the configuration is moved to OpsQL-based filtering, where users can see the keys auto-populated as needed.
1.0.1Added metricGroup and Units as metric labels.
1.0.0Initial Version with discovery and monitoring features.