Agent Observability

Introduction

The Agent Observability dashboard provides centralized visibility into AI agent operations, LLM usage, cost metrics, and performance indicators. It enables administrators and BYOLLM (Bring Your Own LLM) customers to monitor, optimize, and control OpsPilot usage.

Dashboard Sections

Executive Summary

Provides a high-level overview of key performance indicators (KPIs) for overall OpsPilot activity.

Metric	Description
Total Agent Requests	Total number of agent invocations
Total LLM Calls	Number of LLM API calls made by all agents
Success Rate	Percentage of agent requests completed successfully
Total Tokens Used	Total token consumption (input and output combined)
Avg Tokens per Request	Average token usage per agent request
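
The KPIs in this table are simple aggregates over individual agent requests. The following is a minimal sketch of how they relate to one another, not the dashboard's actual implementation. The `AgentRequest` record and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentRequest:
    """One agent invocation (hypothetical record shape, for illustration only)."""
    status: str          # "success" or "error"
    llm_calls: int       # LLM API calls made while serving the request
    input_tokens: int
    output_tokens: int


def executive_summary(requests: list[AgentRequest]) -> dict[str, float]:
    """Aggregate the five Executive Summary KPIs from raw request records."""
    total_requests = len(requests)
    total_tokens = sum(r.input_tokens + r.output_tokens for r in requests)
    succeeded = sum(1 for r in requests if r.status == "success")
    return {
        "total_agent_requests": total_requests,
        "total_llm_calls": sum(r.llm_calls for r in requests),
        # Guard against division by zero when no requests were recorded.
        "success_rate_pct": 100.0 * succeeded / total_requests if total_requests else 0.0,
        "total_tokens_used": total_tokens,
        "avg_tokens_per_request": total_tokens / total_requests if total_requests else 0.0,
    }


sample = [
    AgentRequest("success", 3, 1200, 300),
    AgentRequest("success", 2, 800, 200),
    AgentRequest("error", 1, 400, 100),
    AgentRequest("success", 2, 900, 100),
]
summary = executive_summary(sample)
```

For the four sample requests, three succeed, so the success rate is 75%, and 4,000 tokens across four requests gives an average of 1,000 tokens per request.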

Agent Usage and Efficiency

Displays usage distribution and efficiency metrics by agent type (for example, ALERT, AUTOMATION, PRC).

Metric	Description
Requests per Agent	Number of invocations per agent type
LLM Calls per Agent	Number of LLM API calls per agent
Tokens per Agent	Total tokens consumed by each agent
Avg Tokens / Request per Agent	Average token usage per request for each agent

LLM Provider and Model Insights

Displays usage patterns across LLM providers and models.

Metric	Description
LLM Calls by Provider	Distribution of LLM calls by provider (for example, GCP, Azure, AWS)
Tokens by Provider	Token consumption per provider
LLM Calls by Model	Number of calls per model
Tokens by Model	Token consumption per model

User Adoption and Behavior

Tracks individual user activity and efficiency.

Metric	Description
Top Users by Tokens	Users ranked by total token consumption
Top Users by LLM Calls	Users ranked by number of LLM API calls
User Success %	Success rate per user
Avg Tokens / Request (User)	Average token usage per request per user

Enhanced Metrics

The following sections provide additional visibility into cost, performance, reliability, and optimization.

Cost and Billing

Tracks LLM usage costs for accurate attribution and optimization.

Metric	Description
Total Cost (USD)	Total LLM cost per conversation
Cost Per Turn	Average cost per agent reasoning turn
Input Token Cost	Cost of input tokens
Output Token Cost	Cost of output tokens
Reasoning Cost	Cost of reasoning or extended processing tokens

Dashboard Widgets:

Cost Trend
Cost Breakdown
Top Conversations by Cost
Cost by Model
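
The cost metrics above can be read as token counts multiplied by per-token prices. The sketch below illustrates that relationship under stated assumptions: the `Turn` record is hypothetical, and the prices in `PRICE_PER_MILLION` are placeholder values, not actual provider rates.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Token counts for one reasoning turn (hypothetical record shape)."""
    input_tokens: int
    output_tokens: int
    reasoning_tokens: int = 0


# Placeholder prices in USD per one million tokens; real rates depend on the model.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00, "reasoning": 15.00}


def conversation_cost(turns: list[Turn]) -> dict[str, float]:
    """Split a conversation's cost into the components listed in the table."""
    def usd(tokens: int, kind: str) -> float:
        return tokens * PRICE_PER_MILLION[kind] / 1_000_000

    input_cost = usd(sum(t.input_tokens for t in turns), "input")
    output_cost = usd(sum(t.output_tokens for t in turns), "output")
    reasoning_cost = usd(sum(t.reasoning_tokens for t in turns), "reasoning")
    total = input_cost + output_cost + reasoning_cost
    return {
        "input_token_cost": input_cost,
        "output_token_cost": output_cost,
        "reasoning_cost": reasoning_cost,
        "total_cost_usd": total,
        # Average cost per reasoning turn; zero when the conversation has no turns.
        "cost_per_turn": total / len(turns) if turns else 0.0,
    }


costs = conversation_cost([Turn(10_000, 1_000, 500), Turn(20_000, 2_000)])
```

Keeping the three cost components separate makes it possible to see whether spend is driven by long prompts, long responses, or extended reasoning.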

Cache Efficiency

Measures the effectiveness of prompt caching in reducing token usage and cost.

Metric	Description
Cache Read Tokens	Tokens served from cache
Cache Creation Tokens	Tokens written to cache
Cache Hit Rate (%)	Percentage of input tokens served from cache
Cache Savings (USD)	Cost savings achieved through caching

Dashboard Widgets:

Cache Hit Rate
Cache Savings Counter
Cache vs Uncached Token Trend
Non-cached Conversations
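
One plausible way to derive the hit rate and savings from the token counters is sketched below. Two assumptions are made that this document does not confirm: the hit-rate denominator is the sum of cache reads, cache writes, and uncached input tokens, and savings are the difference between the regular input price and a discounted cached-read price. The prices used are placeholders.

```python
def cache_hit_rate_pct(cache_read_tokens: int, cache_creation_tokens: int,
                       uncached_input_tokens: int) -> float:
    """Share of input tokens served from cache, as a percentage.

    Assumption: the denominator is all input tokens, i.e. cache reads plus
    cache writes plus input tokens that bypassed the cache entirely.
    """
    total_input = cache_read_tokens + cache_creation_tokens + uncached_input_tokens
    return 100.0 * cache_read_tokens / total_input if total_input else 0.0


def cache_savings_usd(cache_read_tokens: int, input_price_per_million: float,
                      cached_price_per_million: float) -> float:
    """Savings from paying the cached-read price instead of the full input price."""
    saved_per_token = (input_price_per_million - cached_price_per_million) / 1_000_000
    return cache_read_tokens * saved_per_token


hit_rate = cache_hit_rate_pct(cache_read_tokens=6_000, cache_creation_tokens=1_000,
                              uncached_input_tokens=3_000)
savings = cache_savings_usd(6_000, input_price_per_million=3.00,
                            cached_price_per_million=0.30)
```

With 6,000 of 10,000 input tokens read from cache, the hit rate in this example is 60%.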

Performance and Latency

Monitors response time, system performance, and SLA compliance.

Metric	Description
Total Latency	End-to-end duration of a conversation
Turn Count	Number of reasoning turns per conversation
Avg Turn Duration	Average duration per turn
Max Turn Duration	Longest turn duration
LLM Call Count	Number of LLM calls per conversation
Avg LLM Latency	Average LLM response time

Dashboard Widgets:

Latency Percentiles (P50, P95, P99)
Turn Distribution
Slow Conversations
Turn Duration Trend
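
To make the percentile widget concrete, the sketch below computes P50, P95, and P99 with the nearest-rank method. This is one common definition; the dashboard may use a different estimator (for example, histogram-based quantiles), so treat it as an illustration only. The latency values are made up.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    # Rank is 1-based; multiply before dividing to limit floating-point error.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical end-to-end conversation latencies, in seconds.
latencies = [1.2, 0.8, 2.5, 1.0, 9.7, 1.4, 1.1, 0.9, 3.2, 1.3]
p50, p95, p99 = (percentile(latencies, p) for p in (50, 95, 99))
```

In this sample a single 9.7-second conversation determines both P95 and P99 while P50 stays at 1.2 seconds, which is why tail percentiles are more useful than averages for SLA tracking.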

Reliability and Error Tracking

Provides insights into failures, errors, and system reliability.

Metric	Description
Request Status	Outcome of requests (success or error)
Tool Calls Total	Total tool executions
Tool Errors Total	Number of failed tool executions
Tool Error Rate (%)	Percentage of tool executions that failed
Consecutive Errors	Maximum number of consecutive errors
Stop Reason	Reason for agent termination

Dashboard Widgets:

Success Rate
Error Rate Trend
Errors by Tool
Stop Reason Distribution
Failed Conversations
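
The tool error rate and the consecutive-error count can both be derived from an ordered list of tool call outcomes. The sketch below assumes such a list is available; the function names and the sample outcomes are hypothetical.

```python
def tool_error_rate_pct(tool_calls_total: int, tool_errors_total: int) -> float:
    """Failed tool executions as a percentage of all tool executions."""
    return 100.0 * tool_errors_total / tool_calls_total if tool_calls_total else 0.0


def max_consecutive_errors(outcomes: list[str]) -> int:
    """Length of the longest unbroken run of "error" outcomes, in call order."""
    longest = current = 0
    for outcome in outcomes:
        # A success resets the running streak; an error extends it.
        current = current + 1 if outcome == "error" else 0
        longest = max(longest, current)
    return longest


outcomes = ["success", "error", "error", "success", "error", "error", "error", "success"]
error_rate = tool_error_rate_pct(len(outcomes), outcomes.count("error"))
longest_streak = max_consecutive_errors(outcomes)
```

Here five of eight calls fail (62.5%), and the longest streak is three errors. A streak counter like this is what a `max_errors` stop reason would typically be checked against, although the exact threshold logic is not described in this document.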

Budget Management

Tracks resource usage against configured limits.

Metric	Description
Max Token Budget	Configured token limit
Tokens Used	Actual token usage
Budget Utilization (%)	Percentage of budget consumed
Budget Exceeded	Indicates whether the configured limit has been reached
Max Turn Limit	Maximum allowed turns
Turn Utilization (%)	Percentage of turns used

Dashboard Widgets:

Budget Utilization Heatmap
Budget Exceeded Alerts
Budget Efficiency
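
The utilization percentages are ratios of actual usage to the configured limits. A minimal sketch follows, assuming that a limit of zero or less means no limit is configured and that the budget counts as exceeded once usage reaches the limit. Both are assumptions, and the function name is hypothetical.

```python
def budget_status(tokens_used: int, max_token_budget: int,
                  turns_used: int, max_turn_limit: int) -> dict[str, object]:
    """Compute budget and turn utilization for one conversation."""
    def pct(used: int, limit: int) -> float:
        # Assumption: a non-positive limit means "no limit configured".
        return 100.0 * used / limit if limit > 0 else 0.0

    return {
        "budget_utilization_pct": pct(tokens_used, max_token_budget),
        "budget_exceeded": max_token_budget > 0 and tokens_used >= max_token_budget,
        "turn_utilization_pct": pct(turns_used, max_turn_limit),
    }


status = budget_status(tokens_used=45_000, max_token_budget=50_000,
                       turns_used=12, max_turn_limit=20)
```

In this example the conversation has consumed 90% of its token budget but only 60% of its allowed turns, which suggests the token budget is the tighter constraint.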

Context Health

Monitors context usage and identifies pruning requirements.

Metric	Description
Context Compressions	Number of context pruning events
Messages Compressed	Total messages removed
Compression Rate (%)	Percentage of conversations requiring pruning
Avg Messages Per Turn	Average number of messages added per turn, indicating how quickly context grows

Dashboard Widgets:

Compression Trend
Context Distribution
Long Conversations
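
As a small illustration of the compression rate, the sketch below counts the share of conversations that experienced at least one pruning event. The input, a list of per-conversation compression counts, is a hypothetical shape chosen for the example.

```python
def compression_rate_pct(compressions_per_conversation: list[int]) -> float:
    """Percentage of conversations with at least one context pruning event."""
    total = len(compressions_per_conversation)
    pruned = sum(1 for count in compressions_per_conversation if count > 0)
    return 100.0 * pruned / total if total else 0.0


# Five hypothetical conversations; two of them required pruning.
compression_rate = compression_rate_pct([0, 2, 0, 1, 0])
```

Note that a conversation pruned twice still counts once, so this rate measures how widespread pruning is, not how often it happens.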

Tool Analytics

Evaluates usage and performance of integrated tools.

Metric	Description
Tool Call Count	Number of executions per tool
Tool Duration	Average execution time per tool
Tool Success Rate (%)	Success rate per tool
Sandbox Tool Calls	Tool executions in sandbox
Sandbox Latency	Execution overhead in sandbox

Dashboard Widgets:

Tool Usage
Tool Performance
Sandbox Comparison
Failing Tools Alerts

Filtering and Time Range

Use the following filters to narrow the data shown on the dashboard.

Filter	Description
Time Range	Predefined or custom time ranges
Agent Type	Filter by agent type
Model	Filter by LLM model
Provider	Filter by LLM provider
User	Filter by user
Status	Filter by request outcome

Metrics Infrastructure

All metrics are collected using OpenTelemetry instrumentation and processed through the following pipeline:

Agent Runtime -> POST /v2/tenants/{tenantId}/aiAgentMetrics -> Kafka -> Cortex -> Dashboard (PromQL)
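
The first hop of the pipeline is an HTTP POST to the tenant-scoped endpoint shown above. The sketch below only constructs such a request and does not send it. The base URL, the JSON payload fields, and the content type are assumptions made for illustration; authentication is omitted, and the actual request schema is not described in this document.

```python
import json
import urllib.parse
import urllib.request


def build_metrics_request(base_url: str, tenant_id: str,
                          payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the aiAgentMetrics endpoint."""
    # Percent-encode the tenant ID so it is safe to place in the URL path.
    tenant = urllib.parse.quote(tenant_id, safe="")
    url = f"{base_url.rstrip('/')}/v2/tenants/{tenant}/aiAgentMetrics"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),        # hypothetical JSON body
        headers={"Content-Type": "application/json"},    # assumed content type
        method="POST",
    )


request = build_metrics_request(
    "https://api.example.com",                           # placeholder base URL
    "client1234",
    {"conversation_id": "conv-abc-123", "status": "success"},
)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out because the real endpoint requires details not covered here.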

Common Labels

Metrics include standard labels for filtering and grouping:

Label	Example Values
tenant_id	client1234
conversation_id	conv-abc-123
model_name	claude-sonnet-4-6, gemini-2.5-flash, gpt-4o
model_provider	gcp, azure, aws
status	success, error
stop_reason	agent_decided, max_turns, token_budget, max_errors
user_id	user-123
partner_id	partner-xyz
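
Because the dashboard queries metrics with PromQL, these labels can be used as label matchers. The sketch below builds a PromQL selector from a dictionary of labels. The label names come from the table above, but the metric name `ai_agent_requests_total` is hypothetical; substitute the metric names actually exposed in your environment.

```python
def promql_selector(metric: str, labels: dict[str, str]) -> str:
    """Render metric{label="value",...} with basic PromQL string escaping."""
    def escape(value: str) -> str:
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    # Sort labels so the generated query text is deterministic.
    matchers = ",".join(
        f'{name}="{escape(value)}"' for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


selector = promql_selector(
    "ai_agent_requests_total",  # hypothetical metric name
    {"tenant_id": "client1234", "status": "error", "model_provider": "gcp"},
)
# Per-second rate of failed requests over 5 minutes, grouped by stop reason.
query = f"sum by (stop_reason) (rate({selector}[5m]))"
```

Grouping by `stop_reason` in this way corresponds to the kind of breakdown shown in the Stop Reason Distribution widget, restricted here to failed requests for one tenant and provider.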