The following figure represents the basic system components or building blocks, which cooperate to implement the ITOM feature set:

[Figure: Documentation Information Model]

The arrows indicate a generalized workflow. Resources are discovered and resource management is updated accordingly. Monitoring uses the managed-resource information to scan or wait for resource fault and recovery alerts, and forwards each alert condition for alert correlation. The alert is resolved to a context-sensitive event, and alert management applies the management logic to remediate the alert condition, using automation to carry out the appropriate response.
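The generalized workflow can be sketched in a few lines of Python. This is an illustrative model only, not platform code: the names `Resource`, `Alert`, `monitor`, `correlate`, and `remediate`, the `cpu_high` condition, the 90% limit, and the `restart_service` action are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    cpu_percent: float  # hypothetical metric attached to a discovered resource


@dataclass(frozen=True)
class Alert:
    resource: str
    condition: str


def monitor(resources, cpu_limit=90.0):
    """Return a fault alert for every resource above the (assumed) CPU limit."""
    return [Alert(r.name, "cpu_high") for r in resources if r.cpu_percent > cpu_limit]


def correlate(alerts):
    """Resolve alerts into events: one event per resource, listing its conditions."""
    events = {}
    for alert in alerts:
        events.setdefault(alert.resource, []).append(alert.condition)
    return events


def remediate(events, actions):
    """Choose an automated response per event; default to notifying an operator."""
    return {
        resource: actions.get(conditions[0], "notify_operator")
        for resource, conditions in events.items()
    }


# Discovery result -> monitoring -> correlation -> remediation
inventory = [Resource("web-01", 97.0), Resource("db-01", 41.5)]
events = correlate(monitor(inventory))
responses = remediate(events, {"cpu_high": "restart_service"})
```

Each function stands in for one building block in the figure, so the chain of calls at the bottom mirrors the direction of the arrows.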

Integrations are provided for the platform layer and, in the solution layer, for discovery and monitoring and for event management. They provide interactivity with external devices and services, extending the functionality, compatibility, and scalability of the core platform.

Platform layer

The platform layer implements the core functionality on which the higher-level functions of the solution layer are built. In general, configuration and policy are set in the platform layer and govern the operation of the solution layer.

The following integrations are provided to support platform functionality:

  • Password Management
  • SSO
  • Duo Security
  • Stream exports

The platform layer includes the following elements supporting enterprise resource management:

  • resource management
  • dashboards
  • ticketing
  • reporting

In addition to these elements, the platform layer supports a multi-tenancy model for managing system accounts and users, which is a key construct for scoping the platform. Tenancy partitions the platform into a hierarchy: a partner entity with multiple client entities, each client hosting multiple users. Authorizations, permissions, and roles provide complete, user-based management functionality, shown as Users, Groups, and RBAC in the figure.
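As a rough illustration of the tenancy hierarchy and role-based access, the sketch below models a partner, its clients, and their users as plain data classes. The class names, the role names, and the permission strings are hypothetical and are not taken from the platform.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)  # role names assigned to the user


@dataclass
class Client:
    name: str
    users: list = field(default_factory=list)


@dataclass
class Partner:
    name: str
    clients: list = field(default_factory=list)


# Hypothetical role-to-permission table; real role names will differ.
ROLE_PERMISSIONS = {
    "admin": {"view_resources", "edit_resources", "manage_users"},
    "viewer": {"view_resources"},
}


def is_allowed(user, permission):
    """Return True if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


def users_in_scope(partner):
    """List (client, user) name pairs visible from the partner level."""
    return [(c.name, u.name) for c in partner.clients for u in c.users]


alice = User("alice", {"admin"})
bob = User("bob", {"viewer"})
partner = Partner("msp", [Client("acme", [alice]), Client("globex", [bob])])
```

Here `users_in_scope(partner)` walks the partner, client, user hierarchy, while `is_allowed` checks a single user's roles, which keeps scoping and authorization as separate concerns.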

Agent and gateway components enable distributed operation in a cloud environment. Coresident with the service or network resource they monitor, they aggregate and forward data from the managed devices to the cloud. They can also be configured to run automated scripts and perform other housekeeping functions.

Finally, the API provides a full-featured REST interface to automate management operations at scale. The following figure gives a generalized view of the API:

[Figure: API]

Solution layer

The solution layer, building on the services of the platform layer, implements the functionality needed for ITOM and AIOps. It consists of hybrid discovery and monitoring, event and incident management, and remediation and automation.

Integrations are provided to support the functionality of each of these areas:

  • discovery and monitoring integrations:

    • Public cloud
    • Cloud native
    • Compute
    • Data exports
    • Network
    • Storage
  • event management integrations:

    • Collaboration
    • Configuration automation
    • Custom integration
    • Patch management
    • Third-party events
    • Ticketing and ITSM

Hybrid discovery and monitoring

A broad range of IT resources across datacenter, public cloud, and cloud native environments can be discovered and monitored with agent-based and agentless monitors. These include:

  • Datacenter applications, URLs, containers, servers, and network resources.
  • Public cloud environments of compute instances, databases, load balancers, and PaaS services.
  • Cloud native environments with containers and orchestrators.

Built-in monitors are provided that capture availability and performance metrics and observe optimal threshold limits for supported resources. You can extend the platform to monitor any kind of IT resource by writing custom monitor scripts.
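To make the idea of a monitor with threshold limits concrete, here is a minimal standalone sketch that measures disk usage and classifies it against warning and critical limits. It does not follow the platform's custom monitor script format, which is not described here. The function names, the 80/90 thresholds, and the output line format are assumptions chosen for illustration.

```python
import shutil


def classify(value, warning, critical):
    """Map a metric value to a state using two threshold limits."""
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"


def disk_used_percent(path="."):
    """Performance metric: percentage of disk space in use for the given path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = disk_used_percent()
    # Assumed thresholds: warn at 80% used, critical at 90% used.
    print(f"disk.used_percent={pct:.1f} state={classify(pct, 80, 90)}")
```

Keeping the measurement (`disk_used_percent`) separate from the threshold logic (`classify`) means the same classification can be reused for any other numeric metric.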

Event and incident management

Events represent business-impacting issues that require a response. Event and incident management uses escalation policies to aggregate, interpret, and act on events detected by monitors, resource diagnostics, and third-party integrations.

Using service maps, you can visualize the relationship between monitored resources and assess business and user impact based on resource health.

Event interpretation and response can be automated. Automation correlates and suppresses alerts, notifies users, and creates incident tickets for alerts that need operator intervention.
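One simple way to picture this automation is a rule that suppresses repeats of the same alert within a time window, opens a ticket for critical alerts, and notifies users about the rest. The sketch below is an assumption-laden illustration, not the platform's correlation logic: the `Alert` fields, the 300-second window, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    resource: str
    metric: str
    severity: str     # "warning" or "critical"
    timestamp: float  # seconds since an arbitrary start


def process(alerts, window=300.0):
    """Suppress repeats of the same alert inside the window; ticket critical ones."""
    last_seen = {}
    actions = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.resource, alert.metric)
        previous = last_seen.get(key)
        # The window slides: a suppressed repeat also resets the timer.
        last_seen[key] = alert.timestamp
        if previous is not None and alert.timestamp - previous < window:
            actions.append(("suppress", alert))
        elif alert.severity == "critical":
            actions.append(("create_ticket", alert))
        else:
            actions.append(("notify", alert))
    return actions


sample = [
    Alert("web-01", "cpu", "critical", 0),
    Alert("web-01", "cpu", "critical", 60),
    Alert("db-01", "disk", "warning", 10),
    Alert("web-01", "cpu", "critical", 500),
]
decisions = [kind for kind, _ in process(sample)]
```

In this sample the second `web-01` alert arrives 60 seconds after the first and is suppressed, while the one at 500 seconds falls outside the window and opens a new ticket.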

Remediation and automation

Event remediation can also be automated by composing workflows to handle events. Workflows can send SMS, voice, and email notifications. Remote SSH is also supported for alert resolution.