Total Requests:       1,247
Avg Latency:          1180ms
Total Tokens:         89.5K
Error Rate:           2.1%
Est. Cost (Session):  $0.45
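The session-cost figure can be derived from the token total. A minimal sketch, assuming a flat per-token price (the dashboard does not show the actual rate or the input/output token split; `estimate_cost` and the $0.005-per-1K rate are illustrative, chosen so the numbers line up with the card above):

```python
def estimate_cost(total_tokens: int, price_per_1k_tokens: float = 0.005) -> float:
    """Estimate session cost from total token usage (flat-rate assumption)."""
    return total_tokens / 1000 * price_per_1k_tokens

# 89.5K tokens at an assumed $0.005 per 1K tokens comes to roughly $0.45
print(round(estimate_cost(89_500), 2))
```

A real estimator would price input and output tokens separately and per model, since those rates typically differ.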

Recent Traces

Time        Endpoint        Query                            Tokens  Latency  Status
5:09:54 PM  agent           Optimize route Vienna to Milan      390    1.24s  success
5:08:54 PM  multi-agent     Ship 10 tons from Munich...        1899    5.32s  success
5:07:54 PM  knowledge       What are driving time limits?       501    980ms  success
5:06:54 PM  optimize/route  Route optimization API call         321    890ms  success
5:05:54 PM  agent           Invalid cargo type test             499    1.45s  error

Endpoints

Endpoint          Calls  Avg Latency
/api/agent          423  ~1.2s
/api/multi-agent    187  ~5.3s
/api/knowledge      312  ~0.9s
/api/optimize/*     325  ~0.8s
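Per-endpoint stats like these are just an aggregation over the trace log. A sketch of that roll-up, assuming trace records shaped like the Recent Traces rows (`aggregate_endpoints` and the field names are hypothetical, not the dashboard's actual schema):

```python
from collections import defaultdict

def aggregate_endpoints(traces):
    """Group trace records by endpoint: call count and mean latency per endpoint."""
    totals = defaultdict(lambda: {"calls": 0, "latency_sum": 0.0})
    for t in traces:
        bucket = totals[t["endpoint"]]
        bucket["calls"] += 1
        bucket["latency_sum"] += t["latency_s"]
    return {
        ep: {"calls": b["calls"], "avg_latency_s": b["latency_sum"] / b["calls"]}
        for ep, b in totals.items()
    }

traces = [
    {"endpoint": "/api/agent", "latency_s": 1.24},
    {"endpoint": "/api/agent", "latency_s": 1.45},
    {"endpoint": "/api/knowledge", "latency_s": 0.98},
]
stats = aggregate_endpoints(traces)
print(stats["/api/agent"]["calls"])
```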

Model Usage

llama-3.1-8b-instruct
100%

A production deployment would also include model comparison (8B vs. 70B, Mistral, etc.) and automatic fallback logic.
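The fallback idea mentioned above can be sketched as trying models in preference order and moving to the next on failure. This is a minimal illustration, not the dashboard's implementation; `call_with_fallback`, the stub client, and the 70B model name are assumptions:

```python
def call_with_fallback(prompt, models, call_model):
    """Try each model in order; on failure, fall back to the next one."""
    last_error = None
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # production code would catch specific API errors
            last_error = exc
    raise RuntimeError(f"All models failed: {last_error}")

# Demo with a stub client: the first model "times out", the second succeeds.
def stub_client(model, prompt):
    if model == "llama-3.1-8b-instruct":
        raise TimeoutError("upstream timeout")
    return f"{model}: ok"

model, result = call_with_fallback(
    "Optimize route Vienna to Milan",
    ["llama-3.1-8b-instruct", "llama-3.1-70b-instruct"],
    stub_client,
)
print(model)
```

A fuller version would distinguish retryable errors (timeouts, rate limits) from permanent ones and log each fallback as a trace event.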

💡 LLMOps in Practice

This dashboard demonstrates the observability patterns used in production LLM systems (similar to LangFuse or LangSmith): request tracing, latency monitoring, token-usage tracking, cost estimation, and quality scoring (burstiness, perplexity). These capabilities are essential for operating AI at scale.
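The request-tracing pattern behind a dashboard like this can be sketched as a decorator that records latency, token usage, and status for every handler call. A minimal, self-contained illustration; `traced`, the `TRACES` list, and the record fields are assumptions modeled on the Recent Traces columns, not the actual instrumentation:

```python
import time

TRACES = []  # in production this would go to a tracing backend, not a list

def traced(endpoint):
    """Decorator that logs latency, tokens, and status for each request."""
    def wrap(fn):
        def inner(query):
            start = time.perf_counter()
            status, tokens = "success", 0
            try:
                result = fn(query)
                tokens = result.get("tokens", 0)
                return result
            except Exception:
                status = "error"
                raise
            finally:
                TRACES.append({
                    "endpoint": endpoint,
                    "query": query,
                    "tokens": tokens,
                    "latency_s": time.perf_counter() - start,
                    "status": status,
                })
        return inner
    return wrap

@traced("agent")
def handle_agent(query):
    # stand-in for the real LLM call
    return {"answer": "...", "tokens": 390}

handle_agent("Optimize route Vienna to Milan")
print(TRACES[0]["status"])
```

Aggregating these records over time yields exactly the dashboard's metrics: request counts, average latency, token totals, and error rate.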