Introduction

The Agent Observability dashboard provides centralized visibility into AI agent operations, LLM usage, cost metrics, and performance indicators. It enables administrators and BYOLLM (Bring Your Own LLM) customers to monitor, optimize, and control OpsPilot usage.


Dashboard Sections

Executive Summary

Provides a high-level overview of key performance indicators (KPIs) for overall OpsPilot activity.

Metric | Description
Total Agent Requests | Total number of agent invocations
Total LLM Calls | Number of LLM API calls made by all agents
Success Rate | Percentage of agent requests completed successfully
Total Tokens Used | Total token consumption (input and output combined)
Avg Tokens per Request | Average token usage per agent request
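The Executive Summary KPIs are simple aggregates over individual agent requests. The following Python sketch shows one way to derive them. The `AgentRequest` record and its field names are hypothetical, since this document does not describe the underlying data model.

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    # Hypothetical per-request record; field names are illustrative only.
    succeeded: bool
    llm_calls: int
    input_tokens: int
    output_tokens: int


def executive_summary(requests):
    """Compute the Executive Summary KPIs from a list of AgentRequest records."""
    total_requests = len(requests)
    successes = sum(1 for r in requests if r.succeeded)
    # Total Tokens Used combines input and output tokens.
    total_tokens = sum(r.input_tokens + r.output_tokens for r in requests)
    return {
        "total_agent_requests": total_requests,
        "total_llm_calls": sum(r.llm_calls for r in requests),
        # Guard against division by zero when the selected range has no requests.
        "success_rate_pct": 100.0 * successes / total_requests if total_requests else 0.0,
        "total_tokens_used": total_tokens,
        "avg_tokens_per_request": total_tokens / total_requests if total_requests else 0.0,
    }


sample = [
    AgentRequest(succeeded=True, llm_calls=3, input_tokens=1200, output_tokens=300),
    AgentRequest(succeeded=True, llm_calls=2, input_tokens=800, output_tokens=200),
    AgentRequest(succeeded=False, llm_calls=1, input_tokens=400, output_tokens=100),
    AgentRequest(succeeded=True, llm_calls=4, input_tokens=1600, output_tokens=400),
]
print(executive_summary(sample))
```

With this sample, three of four requests succeed (a 75% success rate), and 5,000 tokens across four requests give an average of 1,250 tokens per request.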

Agent Usage and Efficiency

Displays usage distribution and efficiency metrics by agent type (for example, ALERT, AUTOMATION, PRC).

Metric | Description
Requests per Agent | Number of invocations per agent type
LLM Calls per Agent | Number of LLM API calls per agent
Tokens per Agent | Total tokens consumed by each agent
Avg Tokens / Request per Agent | Average token usage per request for each agent
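Per-agent metrics are the same aggregates grouped by agent type. A minimal Python sketch, assuming each request is available as a hypothetical `(agent_type, llm_calls, tokens)` tuple:

```python
from collections import defaultdict


def per_agent_metrics(requests):
    """Group (agent_type, llm_calls, tokens) tuples by agent type."""
    totals = defaultdict(lambda: {"requests": 0, "llm_calls": 0, "tokens": 0})
    for agent_type, llm_calls, tokens in requests:
        bucket = totals[agent_type]
        bucket["requests"] += 1
        bucket["llm_calls"] += llm_calls
        bucket["tokens"] += tokens
    # Derive the average after aggregation so it is weighted by request count.
    return {
        agent: {**b, "avg_tokens_per_request": b["tokens"] / b["requests"]}
        for agent, b in totals.items()
    }


records = [
    ("ALERT", 2, 900),
    ("ALERT", 1, 500),
    ("AUTOMATION", 5, 4000),
    ("PRC", 3, 2100),
]
for agent, metrics in sorted(per_agent_metrics(records).items()):
    print(agent, metrics)
```

Computing the average after summing keeps it consistent with the Tokens per Agent and Requests per Agent values (here ALERT has 1,400 tokens over 2 requests, or 700 tokens per request).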

LLM Provider and Model Insights

Displays usage patterns across LLM providers and models.

Metric | Description
LLM Calls by Provider | Distribution of LLM calls by provider (for example, GCP, Azure, AWS)
Tokens by Provider | Token consumption per provider
LLM Calls by Model | Number of calls per model
Tokens by Model | Token consumption per model
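The provider and model charts show each group's share of calls or tokens. A Python sketch, assuming each LLM call is a dict with hypothetical `provider`, `model`, and `tokens` keys; the provider and model pairings in the sample data are illustrative only:

```python
from collections import Counter


def share_by(calls, key, weight=None):
    """Return each group's percentage share, grouped by `key`.

    Counts calls by default; pass weight="tokens" to weight by token usage.
    """
    totals = Counter()
    for call in calls:
        totals[call[key]] += call[weight] if weight else 1
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {group: 100.0 * value / grand_total for group, value in totals.items()}


calls = [
    {"provider": "gcp", "model": "gemini-2.5-flash", "tokens": 1000},
    {"provider": "gcp", "model": "gemini-2.5-flash", "tokens": 1000},
    {"provider": "azure", "model": "gpt-4o", "tokens": 1500},
    {"provider": "aws", "model": "claude-sonnet-4-6", "tokens": 500},
]
print(share_by(calls, "provider"))                   # LLM Calls by Provider
print(share_by(calls, "model", weight="tokens"))     # Tokens by Model
```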

User Adoption and Behavior

Tracks individual user activity and efficiency.

Metric | Description
Top Users by Tokens | Users ranked by total token consumption
Top Users by LLM Calls | Users ranked by number of LLM API calls
User Success % | Success rate per user
Avg Tokens / Request (User) | Average token usage per request per user
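User rankings combine a per-user aggregation with a top-N selection. This Python sketch assumes a hypothetical `(user_id, succeeded, llm_calls, tokens)` record per request and uses `heapq.nlargest` for the ranking:

```python
import heapq
from collections import defaultdict


def user_stats(requests):
    """Aggregate (user_id, succeeded, llm_calls, tokens) records per user."""
    stats = defaultdict(lambda: {"requests": 0, "successes": 0, "llm_calls": 0, "tokens": 0})
    for user_id, succeeded, llm_calls, tokens in requests:
        s = stats[user_id]
        s["requests"] += 1
        s["successes"] += int(succeeded)
        s["llm_calls"] += llm_calls
        s["tokens"] += tokens
    # Derived per-user metrics: User Success % and Avg Tokens / Request.
    for s in stats.values():
        s["success_pct"] = 100.0 * s["successes"] / s["requests"]
        s["avg_tokens_per_request"] = s["tokens"] / s["requests"]
    return dict(stats)


def top_users(stats, metric, n=3):
    """Return up to n user IDs ranked by `metric` (e.g. "tokens"), highest first."""
    return heapq.nlargest(n, stats, key=lambda user: stats[user][metric])


requests = [
    ("user-1", True, 2, 1000),
    ("user-2", False, 6, 3000),
    ("user-1", True, 1, 500),
    ("user-3", True, 1, 200),
]
stats = user_stats(requests)
print(top_users(stats, "tokens", n=2))
print(top_users(stats, "llm_calls", n=2))
```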

Enhanced Metrics

The following sections provide additional visibility into cost, performance, reliability, and optimization.


Cost and Billing

Tracks LLM usage costs for accurate attribution and optimization.

Metric | Description
Total Cost (USD) | Total LLM cost per conversation
Cost Per Turn | Average cost per agent reasoning turn
Input Token Cost | Cost of input tokens
Output Token Cost | Cost of output tokens
Reasoning Cost | Cost of reasoning or extended processing tokens

Dashboard Widgets:

  • Cost Trend
  • Cost Breakdown
  • Top Conversations by Cost
  • Cost by Model
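Cost metrics are commonly derived by multiplying token counts by per-token prices. The sketch below assumes hypothetical per-million-token USD prices and a hypothetical `TurnUsage` record; this document does not state how OpsPilot obtains pricing, and real rates vary by provider and model.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical USD prices per one million tokens.
    input_per_m: float
    output_per_m: float
    reasoning_per_m: float


@dataclass
class TurnUsage:
    # Hypothetical token counts for one reasoning turn.
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


def conversation_cost(turns, price):
    """Return the cost breakdown (USD) for one conversation."""
    input_cost = sum(t.input_tokens for t in turns) * price.input_per_m / 1_000_000
    output_cost = sum(t.output_tokens for t in turns) * price.output_per_m / 1_000_000
    reasoning_cost = sum(t.reasoning_tokens for t in turns) * price.reasoning_per_m / 1_000_000
    total = input_cost + output_cost + reasoning_cost
    return {
        "input_token_cost": input_cost,
        "output_token_cost": output_cost,
        "reasoning_cost": reasoning_cost,
        "total_cost": total,
        # Cost Per Turn: total cost averaged over the conversation's turns.
        "cost_per_turn": total / len(turns) if turns else 0.0,
    }


price = Pricing(input_per_m=3.0, output_per_m=15.0, reasoning_per_m=15.0)
turns = [TurnUsage(10_000, 1_000, 2_000), TurnUsage(20_000, 1_000)]
print(conversation_cost(turns, price))
```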

Cache Efficiency

Measures the effectiveness of prompt caching in reducing token usage and cost.

Metric | Description
Cache Read Tokens | Tokens served from cache
Cache Creation Tokens | Tokens written to cache
Cache Hit Rate (%) | Percentage of tokens served from cache
Cache Savings (USD) | Cost savings achieved through caching

Dashboard Widgets:

  • Cache Hit Rate
  • Cache Savings Counter
  • Cache vs Uncached Token Trend
  • Non-cached Conversations
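A sketch of the cache metrics, assuming a usage report that splits input tokens into uncached, cache-read, and cache-creation counts (some provider APIs report usage this way), and assuming hypothetical per-million-token prices where cache reads are discounted and cache writes carry a premium. Savings here are net of that write premium; if a provider does not charge extra for writes, set the write price equal to the base price.

```python
def cache_metrics(uncached_input, cache_read, cache_creation,
                  base_price_per_m, cache_read_price_per_m, cache_write_price_per_m):
    """Return Cache Hit Rate (%) and net Cache Savings (USD) under the assumed pricing."""
    total_input = uncached_input + cache_read + cache_creation
    hit_rate = 100.0 * cache_read / total_input if total_input else 0.0
    # Reads cost less than uncached input; that difference is the gross saving.
    read_savings = cache_read * (base_price_per_m - cache_read_price_per_m) / 1_000_000
    # Writing entries into the cache may cost more than plain input.
    write_premium = cache_creation * (cache_write_price_per_m - base_price_per_m) / 1_000_000
    return {"cache_hit_rate_pct": hit_rate, "cache_savings_usd": read_savings - write_premium}


result = cache_metrics(
    uncached_input=20_000, cache_read=70_000, cache_creation=10_000,
    base_price_per_m=3.00, cache_read_price_per_m=0.30, cache_write_price_per_m=3.75,
)
print(result)
```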

Performance and Latency

Monitors response time, system performance, and SLA compliance.

Metric | Description
Total Latency | End-to-end duration of a conversation
Turn Count | Number of reasoning turns per conversation
Avg Turn Duration | Average duration per turn
Max Turn Duration | Longest turn duration
LLM Call Count | Number of LLM calls per conversation
Avg LLM Latency | Average LLM response time

Dashboard Widgets:

  • Latency Percentiles (P50, P95, P99)
  • Turn Distribution
  • Slow Conversations
  • Turn Duration Trend
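Latency percentiles summarize the distribution of conversation durations. The sketch below applies the nearest-rank method to raw latency samples; this is an illustrative choice, and a PromQL dashboard reading histogram metrics with `histogram_quantile()` interpolates within buckets, so its values can differ.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    # Multiply before dividing so integer p and n avoid float rounding surprises.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [120, 340, 95, 1800, 410, 220, 150, 600, 2900, 175]
for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
```

With only ten samples, P95 and P99 both resolve to the slowest conversation, which is why tail percentiles are most meaningful over larger time ranges.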

Reliability and Error Tracking

Provides insights into failures, errors, and system reliability.

Metric | Description
Request Status | Outcome of each request (success or error)
Tool Calls Total | Total number of tool executions
Tool Errors Total | Number of failed tool executions
Tool Error Rate (%) | Percentage of tool executions that failed
Consecutive Errors | Maximum number of consecutive errors
Stop Reason | Reason the agent terminated

Dashboard Widgets:

  • Success Rate
  • Error Rate Trend
  • Errors by Tool
  • Stop Reason Distribution
  • Failed Conversations
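Tool Error Rate and Consecutive Errors can both be computed from the ordered outcomes of tool executions. A minimal sketch, assuming outcomes are available as a list of booleans in execution order (`True` for success):

```python
def tool_error_rate(outcomes):
    """Percentage of tool executions that failed."""
    if not outcomes:
        return 0.0
    failures = sum(1 for ok in outcomes if not ok)
    return 100.0 * failures / len(outcomes)


def max_consecutive_errors(outcomes):
    """Length of the longest run of back-to-back failures."""
    longest = current = 0
    for ok in outcomes:
        # Reset the run on success, extend it on failure.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


outcomes = [True, False, False, True, False, False, False, True]
print(tool_error_rate(outcomes))         # 62.5
print(max_consecutive_errors(outcomes))  # 3
```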

Budget Management

Tracks resource usage against configured limits.

Metric | Description
Max Token Budget | Configured token limit
Tokens Used | Actual token usage
Budget Utilization (%) | Percentage of the token budget consumed
Budget Exceeded | Indicates whether the token limit has been reached
Max Turn Limit | Maximum number of allowed turns
Turn Utilization (%) | Percentage of the turn limit used

Dashboard Widgets:

  • Budget Utilization Heatmap
  • Budget Exceeded Alerts
  • Budget Efficiency
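The budget metrics are ratios of usage to configured limits. A Python sketch, assuming both limits are positive and treating "limit is reached" as usage greater than or equal to the budget; the function name and parameters are hypothetical:

```python
def budget_status(tokens_used, max_token_budget, turns_used, max_turn_limit):
    """Compare usage against configured limits (limits must be positive)."""
    if max_token_budget <= 0 or max_turn_limit <= 0:
        raise ValueError("limits must be positive")
    return {
        "budget_utilization_pct": 100.0 * tokens_used / max_token_budget,
        # "Reached" is interpreted as usage at or above the budget.
        "budget_exceeded": tokens_used >= max_token_budget,
        "turn_utilization_pct": 100.0 * turns_used / max_turn_limit,
    }


print(budget_status(45_000, 50_000, turns_used=10, max_turn_limit=25))
print(budget_status(52_000, 50_000, turns_used=25, max_turn_limit=25))
```

Utilization above 100% (as in the second call) is what the Budget Exceeded Alerts widget is meant to surface.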

Context Health

Monitors context usage and identifies when conversations require context pruning.

Metric | Description
Context Compressions | Number of context pruning events
Messages Compressed | Total number of messages removed by pruning
Compression Rate (%) | Percentage of conversations requiring pruning
Avg Messages Per Turn | Average number of messages per turn, indicating the rate of context growth

Dashboard Widgets:

  • Compression Trend
  • Context Distribution
  • Long Conversations
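A sketch of the Context Health aggregates over a set of conversations. The `ConversationContext` record is hypothetical, and computing Avg Messages Per Turn as total messages divided by total turns (a turn-weighted average) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ConversationContext:
    # Hypothetical per-conversation record.
    turns: int
    messages: int
    compressions: int
    messages_compressed: int


def context_health(conversations):
    """Aggregate context pruning metrics across conversations."""
    n = len(conversations)
    pruned = sum(1 for c in conversations if c.compressions > 0)
    total_turns = sum(c.turns for c in conversations)
    total_messages = sum(c.messages for c in conversations)
    return {
        "context_compressions": sum(c.compressions for c in conversations),
        "messages_compressed": sum(c.messages_compressed for c in conversations),
        # Share of conversations that needed at least one pruning event.
        "compression_rate_pct": 100.0 * pruned / n if n else 0.0,
        "avg_messages_per_turn": total_messages / total_turns if total_turns else 0.0,
    }


conversations = [
    ConversationContext(turns=4, messages=10, compressions=0, messages_compressed=0),
    ConversationContext(turns=12, messages=40, compressions=2, messages_compressed=18),
    ConversationContext(turns=6, messages=14, compressions=0, messages_compressed=0),
    ConversationContext(turns=18, messages=56, compressions=3, messages_compressed=30),
]
print(context_health(conversations))
```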

Tool Analytics

Evaluates usage and performance of integrated tools.

Metric | Description
Tool Call Count | Number of executions per tool
Tool Duration | Average execution time
Tool Success Rate (%) | Success rate per tool
Sandbox Tool Calls | Tool executions in sandbox
Sandbox Latency | Execution overhead in sandbox

Dashboard Widgets:

  • Tool Usage
  • Tool Performance
  • Sandbox Comparison
  • Failing Tools Alerts
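The tool metrics group executions by tool name. In this sketch each execution is a hypothetical `(tool_name, duration_ms, succeeded, in_sandbox)` tuple, the tool names are made up, and Sandbox Latency is approximated, as an assumption, as the mean sandboxed duration minus the mean non-sandboxed duration for the same tool.

```python
from collections import defaultdict
from statistics import mean


def tool_analytics(calls):
    """Per-tool count, average duration, success rate, and sandbox overhead."""
    per_tool = defaultdict(list)
    for name, duration_ms, ok, sandboxed in calls:
        per_tool[name].append((duration_ms, ok, sandboxed))
    report = {}
    for name, rows in per_tool.items():
        sandboxed = [d for d, _, s in rows if s]
        direct = [d for d, _, s in rows if not s]
        report[name] = {
            "call_count": len(rows),
            "avg_duration_ms": mean([d for d, _, _ in rows]),
            "success_pct": 100.0 * sum(ok for _, ok, _ in rows) / len(rows),
            "sandbox_calls": len(sandboxed),
            # Overhead is only defined when the tool ran both ways.
            "sandbox_overhead_ms": (mean(sandboxed) - mean(direct)
                                    if sandboxed and direct else None),
        }
    return report


calls = [
    ("kubectl", 200, True, False),
    ("kubectl", 300, True, True),
    ("kubectl", 400, False, True),
    ("query_metrics", 100, True, False),
]
for tool, metrics in tool_analytics(calls).items():
    print(tool, metrics)
```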

Filtering and Time Range

The dashboard supports filtering to refine data analysis.

Filter | Description
Time Range | Predefined or custom time ranges
Agent Type | Filter by agent type
Model | Filter by LLM model
Provider | Filter by LLM provider
User | Filter by user
Status | Filter by request outcome
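Filters combine with AND semantics in this sketch: a record is kept only if it falls in the time range and matches every filter that is set. The `Record` type, the half-open `[start, end)` range, and the AND behavior are assumptions for illustration; the document does not specify how the dashboard combines filters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    # Hypothetical record carrying the filterable fields.
    timestamp: datetime
    agent_type: str
    model: str
    provider: str
    user: str
    status: str


def apply_filters(records, start, end, agent_type=None, model=None,
                  provider=None, user=None, status=None):
    """Keep records in [start, end) that match every filter that is not None."""
    criteria = {"agent_type": agent_type, "model": model, "provider": provider,
                "user": user, "status": status}
    active = {field: value for field, value in criteria.items() if value is not None}
    return [
        r for r in records
        if start <= r.timestamp < end
        and all(getattr(r, field) == value for field, value in active.items())
    ]


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    Record(now - timedelta(hours=1), "ALERT", "gpt-4o", "azure", "user-123", "success"),
    Record(now - timedelta(hours=5), "ALERT", "gpt-4o", "azure", "user-123", "error"),
    Record(now - timedelta(days=3), "ALERT", "gpt-4o", "azure", "user-123", "success"),
    Record(now - timedelta(hours=2), "PRC", "gemini-2.5-flash", "gcp", "user-456", "success"),
]
# A predefined "last 24 hours" range is simply start = end - 24h.
alerts_24h = apply_filters(records, now - timedelta(hours=24), now, agent_type="ALERT")
print(len(alerts_24h))
```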

Metrics Infrastructure

All metrics are collected using OpenTelemetry instrumentation and processed through the following pipeline:

Agent Runtime -> POST /v2/tenants/{tenantId}/aiAgentMetrics -> Kafka -> Cortex -> Dashboard (PromQL)
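The ingestion step can be sketched as building an HTTP POST to the path shown above. The base URL, the JSON body shape, and the metric name below are hypothetical, and authentication is omitted; the snippet only constructs the request with Python's standard `urllib.request` and does not send it.

```python
import json
import urllib.parse
import urllib.request


def build_metrics_request(base_url, tenant_id, metrics):
    """Build (but do not send) a POST to the aiAgentMetrics endpoint.

    Only the path comes from the pipeline description; the body shape is assumed.
    """
    tenant = urllib.parse.quote(tenant_id, safe="")
    url = f"{base_url.rstrip('/')}/v2/tenants/{tenant}/aiAgentMetrics"
    body = json.dumps({"metrics": metrics}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_metrics_request(
    "https://opspilot.example.com",  # hypothetical host
    "client1234",
    [{"name": "agent_requests_total",  # hypothetical metric name
      "value": 1,
      "labels": {"conversation_id": "conv-abc-123", "status": "success"}}],
)
print(req.get_method(), req.full_url)
```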

Common Labels

Metrics include standard labels for filtering and grouping:

Label | Example Values
tenant_id | client1234
conversation_id | conv-abc-123
model_name | claude-sonnet-4-6, gemini-2.5-flash, gpt-4o
model_provider | gcp, azure, aws
status | success, error
stop_reason | agent_decided, max_turns, token_budget, max_errors
user_id | user-123
partner_id | partner-xyz
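Grouping by labels is what lets one metric feed several charts. The Python sketch below mimics the idea of a PromQL `sum by (...)` aggregation over labeled samples; it is an illustration, not the dashboard's actual query, and the sample values are made up. As in Prometheus, a missing label is treated as an empty string.

```python
from collections import defaultdict


def sum_by(samples, *group_labels):
    """Sum sample values grouped by the given label names."""
    out = defaultdict(float)
    for labels, value in samples:
        # Missing labels group under "" (Prometheus treats them as empty).
        key = tuple(labels.get(name, "") for name in group_labels)
        out[key] += value
    return dict(out)


samples = [
    ({"tenant_id": "client1234", "model_name": "gpt-4o",
      "model_provider": "azure", "status": "success"}, 1200),
    ({"tenant_id": "client1234", "model_name": "gpt-4o",
      "model_provider": "azure", "status": "error"}, 300),
    ({"tenant_id": "client1234", "model_name": "gemini-2.5-flash",
      "model_provider": "gcp", "status": "success"}, 800),
]
print(sum_by(samples, "model_name"))
print(sum_by(samples, "model_provider", "status"))
```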