Introduction
The Agent Observability dashboard provides centralized visibility into AI agent operations, LLM usage, cost metrics, and performance indicators. It enables administrators and BYOLLM (Bring Your Own LLM) customers to monitor, optimize, and control OpsPilot usage.
Dashboard Sections
Executive Summary
Provides a high-level overview of key performance indicators (KPIs) for overall OpsPilot activity.
| Metric | Description |
|---|---|
| Total Agent Requests | Total number of agent invocations |
| Total LLM Calls | Number of LLM API calls made by all agents |
| Success Rate | Percentage of agent requests completed successfully |
| Total Tokens Used | Total token consumption (input and output combined) |
| Avg Tokens per Request | Average token usage per agent request |
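
The derived KPIs above follow directly from per-request counts. The following is a minimal sketch of that arithmetic, not the dashboard's actual implementation; the `AgentRequest` record and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    """One agent invocation (hypothetical record shape, for illustration only)."""
    succeeded: bool
    llm_calls: int
    input_tokens: int
    output_tokens: int


def executive_summary(requests: list[AgentRequest]) -> dict[str, float]:
    """Aggregate request records into the five Executive Summary KPIs."""
    total_requests = len(requests)
    # Total Tokens Used combines input and output tokens.
    total_tokens = sum(r.input_tokens + r.output_tokens for r in requests)
    successes = sum(1 for r in requests if r.succeeded)
    return {
        "total_agent_requests": total_requests,
        "total_llm_calls": sum(r.llm_calls for r in requests),
        # Guard against division by zero when no requests were recorded.
        "success_rate_pct": 100.0 * successes / total_requests if total_requests else 0.0,
        "total_tokens_used": total_tokens,
        "avg_tokens_per_request": total_tokens / total_requests if total_requests else 0.0,
    }
```

Returning zero for an empty window is a design choice made here so that the sketch never raises; a dashboard could equally show "no data".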
Agent Usage and Efficiency
Displays usage distribution and efficiency metrics by agent type (for example, ALERT, AUTOMATION, PRC).
| Metric | Description |
|---|---|
| Requests per Agent | Number of invocations per agent type |
| LLM Calls per Agent | Number of LLM API calls per agent |
| Tokens per Agent | Total tokens consumed by each agent |
| Avg Tokens / Request per Agent | Average token usage per request for each agent |
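
The per-agent metrics above are the same counts, grouped by agent type. A rough sketch of that grouping is shown below; the record keys `agent_type`, `llm_calls`, and `tokens` are assumed names, not a documented schema.

```python
from collections import defaultdict


def usage_by_agent(requests: list[dict]) -> dict[str, dict[str, float]]:
    """Group request records by agent type and derive per-agent metrics."""
    totals = defaultdict(lambda: {"requests": 0, "llm_calls": 0, "tokens": 0})
    for r in requests:
        agent = totals[r["agent_type"]]
        agent["requests"] += 1
        agent["llm_calls"] += r["llm_calls"]
        agent["tokens"] += r["tokens"]
    # Every group holds at least one request, so the division is safe.
    return {
        name: {**t, "avg_tokens_per_request": t["tokens"] / t["requests"]}
        for name, t in totals.items()
    }
```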
LLM Provider and Model Insights
Displays usage patterns across LLM providers and models.
| Metric | Description |
|---|---|
| LLM Calls by Provider | Distribution of LLM calls by provider (for example, GCP, Azure, AWS) |
| Tokens by Provider | Token consumption per provider |
| LLM Calls by Model | Number of calls per model |
| Tokens by Model | Token consumption per model |
User Adoption and Behavior
Tracks individual user activity and efficiency.
| Metric | Description |
|---|---|
| Top Users by Tokens | Users ranked by total token consumption |
| Top Users by LLM Calls | Users ranked by number of LLM API calls |
| User Success % | Success rate per user |
| Avg Tokens / Request (User) | Average token usage per request per user |
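
A "Top Users" ranking amounts to summing a measure per user and sorting in descending order. The sketch below illustrates this for tokens; the record keys `user_id` and `tokens` are assumptions made for the example.

```python
from collections import Counter


def top_users_by_tokens(requests: list[dict], limit: int = 5) -> list[tuple[str, int]]:
    """Return (user_id, total_tokens) pairs, highest consumers first."""
    tokens_per_user = Counter()
    for r in requests:
        tokens_per_user[r["user_id"]] += r["tokens"]
    # most_common sorts by count in descending order and truncates to `limit`.
    return tokens_per_user.most_common(limit)
```

Replacing `r["tokens"]` with a per-request LLM call count would give the "Top Users by LLM Calls" ranking in the same way.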
Enhanced Metrics
The following sections provide additional visibility into cost, performance, reliability, and optimization.
Cost and Billing
Tracks LLM usage costs for accurate attribution and optimization.
| Metric | Description |
|---|---|
| Total Cost (USD) | Total LLM cost per conversation |
| Cost Per Turn | Average cost per agent reasoning turn |
| Input Token Cost | Cost of input tokens |
| Output Token Cost | Cost of output tokens |
| Reasoning Cost | Cost of reasoning or extended processing tokens |
Dashboard Widgets:
- Cost Trend
- Cost Breakdown
- Top Conversations by Cost
- Cost by Model
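
As an illustration of how the cost metrics relate to one another, the sketch below derives each cost component from token counts and a price table. It assumes that reasoning tokens are priced separately from output tokens and that prices are quoted per one million tokens; both are assumptions, and the `ModelPricing` type is hypothetical. Actual provider pricing should come from the provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """USD per one million tokens (hypothetical type; supply real prices yourself)."""
    input_per_mtok: float
    output_per_mtok: float
    reasoning_per_mtok: float


def conversation_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                      turns: int, pricing: ModelPricing) -> dict[str, float]:
    """Break one conversation's LLM cost into the components shown in the table."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_mtok
    output_cost = output_tokens / 1_000_000 * pricing.output_per_mtok
    reasoning_cost = reasoning_tokens / 1_000_000 * pricing.reasoning_per_mtok
    total = input_cost + output_cost + reasoning_cost
    return {
        "input_token_cost": input_cost,
        "output_token_cost": output_cost,
        "reasoning_cost": reasoning_cost,
        "total_cost_usd": total,
        # Cost Per Turn is the total averaged over reasoning turns.
        "cost_per_turn": total / turns if turns else 0.0,
    }
```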
Cache Efficiency
Measures the effectiveness of prompt caching in reducing token usage and cost.
| Metric | Description |
|---|---|
| Cache Read Tokens | Tokens served from cache |
| Cache Creation Tokens | Tokens written to cache |
| Cache Hit Rate (%) | Percentage of tokens served from cache |
| Cache Savings (USD) | Cost savings achieved through caching |
Dashboard Widgets:
- Cache Hit Rate
- Cache Savings Counter
- Cached vs Uncached Token Trend
- Non-cached Conversations
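
One plausible way to compute the two derived cache metrics is sketched below. It assumes that the hit rate is measured against all input tokens (cached plus uncached) and that savings equal the price difference between a normal input token and a cached read; the dashboard's exact definitions may differ, and the function names are hypothetical.

```python
def cache_hit_rate(cache_read_tokens: int, uncached_input_tokens: int) -> float:
    """Percentage of input tokens served from cache (assumed denominator: all input tokens)."""
    total = cache_read_tokens + uncached_input_tokens
    return 100.0 * cache_read_tokens / total if total else 0.0


def cache_savings_usd(cache_read_tokens: int, input_price_per_mtok: float,
                      cached_price_per_mtok: float) -> float:
    """Savings from reading tokens at the cached price instead of the full input price.

    Ignores any surcharge for cache creation tokens, which some providers bill.
    """
    price_difference = input_price_per_mtok - cached_price_per_mtok
    return cache_read_tokens / 1_000_000 * price_difference
```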
Performance and Latency
Monitors response time, system performance, and SLA compliance.
| Metric | Description |
|---|---|
| Total Latency | End-to-end duration of a conversation |
| Turn Count | Number of reasoning turns per conversation |
| Avg Turn Duration | Average duration per turn |
| Max Turn Duration | Longest turn duration |
| LLM Call Count | Number of LLM calls per conversation |
| Avg LLM Latency | Average LLM response time |
Dashboard Widgets:
- Latency Percentiles (P50, P95, P99)
- Turn Distribution
- Slow Conversations
- Turn Duration Trend
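
To make the percentile and turn metrics concrete, the sketch below computes them from raw samples with Python's standard `statistics` module. How the dashboard itself derives these values is not specified here, so treat this as an illustration of what the numbers mean rather than a description of the backend.

```python
import statistics


def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Return P50, P95, and P99 of end-to-end conversation latencies."""
    if len(latencies_ms) < 2:
        raise ValueError("at least two samples are needed to estimate percentiles")
    # n=100 yields 99 cut points; index k-1 holds the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def turn_stats(turn_durations_ms: list[float]) -> dict[str, float]:
    """Turn Count, Avg Turn Duration, and Max Turn Duration for one conversation."""
    if not turn_durations_ms:
        return {"turn_count": 0, "avg_turn_duration": 0.0, "max_turn_duration": 0.0}
    return {
        "turn_count": len(turn_durations_ms),
        "avg_turn_duration": statistics.fmean(turn_durations_ms),
        "max_turn_duration": max(turn_durations_ms),
    }
```

P95 and P99 are usually more useful than the average for SLA checks, because a small number of slow conversations can hide behind a healthy mean.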
Reliability and Error Tracking
Provides insights into failures, errors, and system reliability.
| Metric | Description |
|---|---|
| Request Status | Outcome of requests (success or error) |
| Tool Calls Total | Total tool executions |
| Tool Errors Total | Number of failed tool executions |
| Tool Error Rate (%) | Percentage of failures in tool execution |
| Consecutive Errors | Maximum number of consecutive errors |
| Stop Reason | Reason for agent termination (for example, agent_decided, max_turns, token_budget, max_errors) |
Dashboard Widgets:
- Success Rate
- Error Rate Trend
- Errors by Tool
- Stop Reason Distribution
- Failed Conversations
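
The two derived reliability metrics can be sketched as follows. The input shape for `max_consecutive_errors` (an ordered list of booleans, `True` for success) is an assumption chosen for the example.

```python
def tool_error_rate(tool_calls_total: int, tool_errors_total: int) -> float:
    """Tool Error Rate (%): failed tool executions as a share of all tool executions."""
    if tool_calls_total == 0:
        return 0.0
    return 100.0 * tool_errors_total / tool_calls_total


def max_consecutive_errors(outcomes: list[bool]) -> int:
    """Longest run of errors in an ordered list of outcomes (True = success)."""
    longest = current = 0
    for ok in outcomes:
        # A success resets the streak; an error extends it.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest
```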
Budget Management
Tracks resource usage against configured limits.
| Metric | Description |
|---|---|
| Max Token Budget | Configured token limit |
| Tokens Used | Actual token usage |
| Budget Utilization (%) | Percentage of budget consumed |
| Budget Exceeded | Indicates whether the configured limit has been reached |
| Max Turn Limit | Maximum allowed turns |
| Turn Utilization (%) | Percentage of the maximum turn limit used |
Dashboard Widgets:
- Budget Utilization Heatmap
- Budget Exceeded Alerts
- Budget Efficiency
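
The utilization metrics are ratios of actual usage to the configured limits. A minimal sketch, assuming that reaching the token limit exactly already counts as "Budget Exceeded" (the function name and return keys are hypothetical):

```python
def budget_status(tokens_used: int, max_token_budget: int,
                  turns_used: int, max_turn_limit: int) -> dict[str, object]:
    """Compare actual token and turn usage against the configured limits."""
    if max_token_budget <= 0 or max_turn_limit <= 0:
        raise ValueError("limits must be positive")
    return {
        "budget_utilization_pct": 100.0 * tokens_used / max_token_budget,
        # Assumption: the flag is set as soon as the limit is reached.
        "budget_exceeded": tokens_used >= max_token_budget,
        "turn_utilization_pct": 100.0 * turns_used / max_turn_limit,
    }
```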
Context Health
Monitors context usage and identifies pruning requirements.
| Metric | Description |
|---|---|
| Context Compressions | Number of context pruning events |
| Messages Compressed | Total messages removed |
| Compression Rate (%) | Percentage of conversations requiring pruning |
| Avg Messages Per Turn | Average number of messages added per turn, indicating the rate of context growth |
Dashboard Widgets:
- Compression Trend
- Context Distribution
- Long Conversations
Tool Analytics
Evaluates usage and performance of integrated tools.
| Metric | Description |
|---|---|
| Tool Call Count | Number of executions per tool |
| Tool Duration | Average execution time per tool |
| Tool Success Rate (%) | Success rate per tool |
| Sandbox Tool Calls | Tool executions in sandbox |
| Sandbox Latency | Latency overhead added by sandbox execution |
Dashboard Widgets:
- Tool Usage
- Tool Performance
- Sandbox Comparison
- Failing Tools Alerts
Filtering and Time Range
The dashboard supports the following filters to narrow the displayed data.
| Filter | Description |
|---|---|
| Time Range | Predefined or custom time ranges |
| Agent Type | Filter by agent type |
| Model | Filter by LLM model |
| Provider | Filter by LLM provider |
| User | Filter by user |
| Status | Filter by request outcome |
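
Conceptually, the filters combine as a time window plus a set of equality conditions, all of which must hold. The sketch below applies that logic to in-memory records. The record keys (`timestamp`, `agent_type`, `model_name`, `model_provider`, `user_id`, `status`) are assumed names for illustration, and the half-open time window is a choice made for the example.

```python
from datetime import datetime


def apply_filters(records: list[dict], start: datetime, end: datetime,
                  **equals: object) -> list[dict]:
    """Keep records inside [start, end) that match every filter that is set.

    Filters passed as None are treated as "not set" and ignored.
    """
    active = {key: value for key, value in equals.items() if value is not None}
    return [
        r for r in records
        if start <= r["timestamp"] < end
        and all(r.get(key) == value for key, value in active.items())
    ]
```

For example, `apply_filters(records, start, end, agent_type="ALERT", status="error")` would keep only failed ALERT requests in the selected window.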
Metrics Infrastructure
All metrics are collected using OpenTelemetry instrumentation and processed through the following pipeline:
Agent Runtime -> POST /v2/tenants/{tenantId}/aiAgentMetrics -> Kafka -> Cortex -> Dashboard (PromQL)
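The first hop of this pipeline is an HTTP POST to the tenant-scoped endpoint. The sketch below only constructs such a request and does not send it. The endpoint path comes from the pipeline above; the base URL, the JSON content type, and every field in `EXAMPLE_PAYLOAD` are assumptions, because the request schema and authentication scheme are not documented here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical payload shape; the real schema is not documented in this page.
EXAMPLE_PAYLOAD = {
    "conversation_id": "conv-abc-123",
    "model_name": "gpt-4o",
    "status": "success",
}


def build_metrics_request(base_url: str, tenant_id: str,
                          payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to /v2/tenants/{tenantId}/aiAgentMetrics."""
    # Percent-encode the tenant ID so it cannot alter the path structure.
    tenant = urllib.parse.quote(tenant_id, safe="")
    url = f"{base_url.rstrip('/')}/v2/tenants/{tenant}/aiAgentMetrics"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        # Assumption: the endpoint accepts JSON. Authentication headers are omitted.
        headers={"Content-Type": "application/json"},
    )
```

A request built this way could be sent with `urllib.request.urlopen`, after adding whatever authentication the deployment requires.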
Common Labels
Metrics include standard labels for filtering and grouping:
| Label | Example Values |
|---|---|
| tenant_id | client1234 |
| conversation_id | conv-abc-123 |
| model_name | claude-sonnet-4-6, gemini-2.5-flash, gpt-4o |
| model_provider | gcp, azure, aws |
| status | success, error |
| stop_reason | agent_decided, max_turns, token_budget, max_errors |
| user_id | user-123 |
| partner_id | partner-xyz |
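
Because the dashboard queries metrics with PromQL, these labels typically appear as label matchers in a selector. The helper below is a small sketch that builds such a selector from a label dictionary. The metric name `ai_agent_tokens_total` used in the usage note is hypothetical; this page does not list the actual metric names.

```python
def promql_selector(metric: str, labels: dict[str, str]) -> str:
    """Build a PromQL instant vector selector with exact-match label matchers."""

    def escape(value: str) -> str:
        # Escape backslashes first, then quotes and newlines, for a quoted label value.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    # Sort matchers so the generated query text is deterministic.
    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}" if matchers else metric
```

For example, `promql_selector("ai_agent_tokens_total", {"tenant_id": "client1234", "status": "success"})` produces `ai_agent_tokens_total{status="success",tenant_id="client1234"}`, which can then be wrapped in an aggregation such as `sum by (model_name) (...)`.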