Access controls provide a mechanism for authorizing user access to the platform and involve:
Authenticating users using one of the following authentication options:
- Native user management and authentication
- Single Sign-On (SSO)
- Two-factor authentication
Role-based Access Control (RBAC) to grant users permissions based on their assigned role.
Using RBAC, you can control which actions an authenticated user is permitted, restricting access to:
- The resources a user can manage, such as network resources only.
- Credentials to which a user has access, such as non-administrator credentials on servers.
- The actions a user is permitted to take, such as accessing remote consoles.
- The locations, or domains, from which a user is permitted access to the platform.
Agents and gateways
Agents and gateways are distributed platform components that discover and monitor infrastructure resources. Agents monitor servers and applications, and gateways monitor non-server devices, such as network and storage devices.
An agent is an executable application that runs on managed resources within both on-premises and cloud infrastructures.
A gateway is a virtual appliance that provides secure communication, non-server resource monitoring, and limited data storage in the event of connectivity failure.
You can use automation to act on resource faults automatically, remediate issues in response to events, or perform routine maintenance tasks. There are two automation models:
|Automate discrete tasks||Use this model for a single task that needs to be executed on multiple servers.|
|Automate a sequence of tasks||Use this model to execute a sequence of tasks across multiple resources. This model is called a process automation workflow.|
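The difference between the two models can be illustrated with a minimal Python sketch. The names `run_on_all`, `run_workflow`, `restart_service`, and `clear_temp_files` are hypothetical and are not platform APIs. The sketch only shows one task fanned out to many servers versus an ordered sequence of tasks applied across resources.

```python
from typing import Callable, Dict, List

# Hypothetical task type: takes a resource name and returns a result message.
Task = Callable[[str], str]


def run_on_all(task: Task, servers: List[str]) -> Dict[str, str]:
    """Discrete-task model: execute a single task on every server."""
    return {server: task(server) for server in servers}


def run_workflow(steps: List[Task], resources: List[str]) -> List[str]:
    """Sequence model: execute each step, in order, across all resources."""
    log: List[str] = []
    for step in steps:
        for resource in resources:
            log.append(step(resource))
    return log


def restart_service(resource: str) -> str:
    return f"restarted service on {resource}"


def clear_temp_files(resource: str) -> str:
    return f"cleared temp files on {resource}"


# One task, many servers.
discrete = run_on_all(restart_service, ["web-1", "web-2"])

# An ordered sequence of tasks across several resources.
workflow = run_workflow([clear_temp_files, restart_service], ["web-1", "db-1"])
```

In this sketch the workflow finishes one step on all resources before starting the next. A real process automation workflow might order or branch its steps differently.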
An up/down state indicates whether a resource is available to provide its prescribed service. The up/down state can be determined by evaluating metrics or by receiving a simple acknowledgment from the resource.
Alerts that are likely due to the same cause are automatically grouped. Two types of alert correlation are performed:
|Deduplication||Repeated alerting occurs for an alert that is currently unresolved, such as network devices sending SNMP traps for as long as an issue persists. Repeated alerts are deduplicated.|
|Inferencing||Different alerts originate from different IT resources, but the alerts can be inferred to be due to the same cause.|
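Deduplication can be sketched as collapsing repeated alerts into a single entry with a repeat count. The example below is illustrative only. The `Alert` fields and the choice of `(resource, metric)` as the alert identity are assumptions, and the platform may key alerts differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Alert:
    resource: str
    metric: str
    message: str


def deduplicate(alerts: List[Alert]) -> Dict[Tuple[str, str], int]:
    """Collapse repeats of the same unresolved alert into one entry with a count."""
    counts: Dict[Tuple[str, str], int] = {}
    for alert in alerts:
        key = (alert.resource, alert.metric)  # assumed alert identity
        counts[key] = counts.get(key, 0) + 1
    return counts


# A network device keeps sending the same trap while the issue persists.
incoming = [
    Alert("switch-7", "link_state", "port 3 down"),
    Alert("switch-7", "link_state", "port 3 down"),
    Alert("switch-7", "link_state", "port 3 down"),
    Alert("db-1", "cpu", "utilization high"),
]
unique = deduplicate(incoming)
```

Here four incoming alerts reduce to two entries, with the repeat count kept so that the persistence of the issue is not lost.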
A dashboard is a collection of widgets that provide visualizations of collected metrics.
Partner-scoped dashboards are visible only to users defined for the partner. Client-scoped dashboards are visible only to users who are client members.
Discovery is the process of finding resources deployed in the enterprise. Resources need to be discovered before they can be monitored and metrics collected. During discovery, a model that includes all resources is dynamically built and is used to interpret and present the state of the environment.
Events are activities of operational significance that occur on a monitored resource. Examples of events include:
- Hardware failures
- Server CPU utilization thresholds exceeded
- Application failures
- Configuration changes
The following mechanisms are used to detect events:
- Native instrumentation
- Reporting by integrated third-party monitors
The goal of event management is to minimize the time spent responding to an event. The following event management lifecycle standardizes and automates the efficient handling of events:
- First Response
The initial alert response can be governed by:
- Inferred seasonal patterns, so the alert might be automatically suppressed if it remains open past a historical norm.
- Learning algorithms, which can be trained to suppress alerts that match specific patterns.
Metrics can be evaluated against threshold limits. Two types of thresholds are supported. A static threshold is a fixed value that represents a fault condition when exceeded. A change-based threshold is an automatically computed value that detects unexpected changes in the metric value. Change-based thresholds are better suited to metrics for which a static value is difficult to determine.
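The two threshold types can be compared with a short sketch. The static check follows directly from the definition above. The change-based check is only one possible interpretation: it assumes the baseline is the mean of recent samples and flags a value that deviates by more than `k` standard deviations. The function names, the parameter `k`, and the sample data are hypothetical. The way the platform actually computes change-based thresholds is not described here.

```python
from statistics import mean, pstdev
from typing import Sequence


def breaches_static(value: float, limit: float) -> bool:
    """Static threshold: a fault condition exists when the fixed limit is exceeded."""
    return value > limit


def breaches_change_based(history: Sequence[float], value: float, k: float = 3.0) -> bool:
    """Change-based threshold (assumed form): flag a value that deviates from
    the recent mean by more than k population standard deviations."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unexpected.
        return value != baseline
    return abs(value - baseline) > k * spread


cpu_history = [40.0, 42.0, 41.0, 39.0, 43.0]

# 70% CPU is below a static limit of 90%, but it is a sharp change
# relative to a history that hovers around 41%.
static_result = breaches_static(70.0, 90.0)
change_result = breaches_change_based(cpu_history, 70.0)
```

In this example the static check stays quiet while the change-based check fires. This shows why a change-based threshold can be useful when a meaningful fixed limit is hard to choose.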
The goal of monitoring is to assess the availability and performance of managed resources. This is done by collecting, storing, and evaluating resource metrics.
Resource performance is the measure of whether the resource is operating within user-defined limits. Fault conditions such as exceeding predefined thresholds can indicate performance issues.
Service maps organize resources into a hierarchical structure. This makes it possible to associate resource health with the level of user and business impact.
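One way to picture how a hierarchy associates resource health with business impact is a tree in which each service takes the worst state of anything beneath it. The sketch below assumes that worst-state rollup rule. The `ServiceNode` class, the three state names, and the example services are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed state ordering, from least to most severe.
SEVERITY = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class ServiceNode:
    name: str
    state: str = "ok"  # the node's own state; meaningful mainly for leaf resources
    children: List["ServiceNode"] = field(default_factory=list)

    def health(self) -> str:
        """Return the most severe state found in this node or any descendant."""
        states = [self.state] + [child.health() for child in self.children]
        return max(states, key=SEVERITY.__getitem__)


web = ServiceNode("web-1", "ok")
db = ServiceNode("db-1", "critical")
search = ServiceNode("search", "warning")
checkout = ServiceNode("checkout", children=[web, db])
storefront = ServiceNode("storefront", children=[checkout, search])
```

With this rule, a critical database surfaces as a critical `checkout` service and a critical `storefront`, while `search` on its own reports only a warning.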
Tenancy divides the enterprise into independent management domains, called tenants, where each tenant is a logical container of managed resources. Dashboards, management policies, and integrations are scoped to a tenant.
The tenancy model defines two core constructs:
- A partner is a master tenant and is associated with your account.
- A client is a partner sub-tenant. Different management policies can be applied to different clients.
Partners and clients can each have separate sets of user accounts, and a user account can be part of one and only one partner or client tenant.
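The partner/client relationship and the rule that a user account belongs to exactly one tenant can be sketched as follows. The `Tenant` and `Directory` classes and the example names are hypothetical and serve only to make the constraint concrete.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tenant:
    name: str
    parent: Optional["Tenant"] = None  # None for a partner; the partner for a client
    users: List[str] = field(default_factory=list)

    @property
    def kind(self) -> str:
        return "partner" if self.parent is None else "client"


class Directory:
    """Tracks which single tenant each user account belongs to."""

    def __init__(self) -> None:
        self._tenant_of: Dict[str, Tenant] = {}

    def add_user(self, user: str, tenant: Tenant) -> None:
        if user in self._tenant_of:
            # A user account can be part of one and only one tenant.
            raise ValueError(f"{user} already belongs to {self._tenant_of[user].name}")
        self._tenant_of[user] = tenant
        tenant.users.append(user)

    def tenant_of(self, user: str) -> Tenant:
        return self._tenant_of[user]


partner = Tenant("acme-msp")
client = Tenant("acme-retail", parent=partner)

directory = Directory()
directory.add_user("alice", partner)
directory.add_user("bob", client)
```

Attempting `directory.add_user("alice", client)` after the calls above raises `ValueError`, because `alice` is already a member of the partner tenant.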
User privileges within a tenant can be specified using the following RBAC criteria:
|User||An account within a tenant.|
|User Group||A group of users.|
|Permission||Authorization controls limiting user access and activities.|
|Role||An association of a user or user group with permissions against managed resources. A user or user group can be permitted specific actions on specific resources.|
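The relationship among these four criteria can be sketched as a permission check. The example is a simplified illustration: the `Role` fields, the `is_permitted` function, and the action and resource names are hypothetical and do not reflect the platform's data model or API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Role:
    members: FrozenSet[str]    # users or user groups the role is assigned to
    actions: FrozenSet[str]    # permitted actions
    resources: FrozenSet[str]  # managed resources the actions apply to


def is_permitted(
    user: str,
    groups: Iterable[str],
    action: str,
    resource: str,
    roles: Iterable[Role],
) -> bool:
    """Allow the action if any role links the user (or one of the user's
    groups) to that action on that resource."""
    principals = {user, *groups}
    return any(
        bool(principals & role.members)
        and action in role.actions
        and resource in role.resources
        for role in roles
    )


network_operator = Role(
    members=frozenset({"network-team"}),
    actions=frozenset({"view", "open_remote_console"}),
    resources=frozenset({"switch-7", "router-2"}),
)
roles = [network_operator]
```

For example, `is_permitted("alice", ["network-team"], "open_remote_console", "switch-7", roles)` is allowed, while the same request against `db-1` is denied because that resource is outside the role.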
A topology map is automatically built from relationships determined during discovery. Each node in a topology map represents a managed resource and an edge between nodes represents the type of connection between those resources. With a topology map, you can visualize and explore your infrastructure, drilling down to an increasingly greater level of detail. Topology maps can also be used to model the impact of planned changes.
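A topology map can be thought of as a graph, and modeling the impact of a planned change then becomes a graph traversal. The sketch below assumes a hypothetical adjacency structure in which each edge carries a connection type and points to a resource that depends on the source node. The node names, connection labels, and the `impacted_by` function are illustrative only.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical topology: node -> list of (connection type, dependent node).
Topology = Dict[str, List[Tuple[str, str]]]


def impacted_by(topology: Topology, start: str) -> Set[str]:
    """Breadth-first search for every resource reachable from `start`,
    that is, every resource a change to `start` could affect."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _connection, neighbor in topology.get(node, []):
            if neighbor not in seen and neighbor != start:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


topology: Topology = {
    "core-switch": [("ethernet", "hypervisor-1"), ("ethernet", "storage-1")],
    "hypervisor-1": [("hosts", "vm-web"), ("hosts", "vm-db")],
    "storage-1": [("iscsi", "vm-db")],
}

switch_impact = impacted_by(topology, "core-switch")
storage_impact = impacted_by(topology, "storage-1")
```

In this sketch, taking `core-switch` offline could affect every other resource, whereas maintenance on `storage-1` reaches only `vm-db`.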