📈 Total Requests: 1,247
⏱ Avg Latency: 1,180 ms
🔤 Total Tokens: 89.5K
⚠️ Error Rate: 2.1%
💰 Est. Cost (Session): $0.45

Recent Traces

● Live
Time       | Endpoint       | Query                           | Tokens | Latency | Status
4:03:52 PM | agent          | Optimize route Vienna to Milan  | 390    | 1.24s   | success
4:02:52 PM | multi-agent    | Ship 10 tons from Munich...     | 1899   | 5.32s   | success
4:01:52 PM | knowledge      | What are driving time limits?   | 501    | 980ms   | success
4:00:52 PM | optimize/route | Route optimization API call     | 321    | 890ms   | success
3:59:52 PM | agent          | Invalid cargo type test         | 499    | 1.45s   | error

Endpoints

/api/agent       | 423 calls | ~1.2s
/api/multi-agent | 187 calls | ~5.3s
/api/knowledge   | 312 calls | ~0.9s
/api/optimize/*  | 325 calls | ~0.8s

Model Usage

llama-3.1-8b-instruct
100%

A production deployment would also include model comparison (e.g., 8B vs. 70B, Mistral) and automatic fallback logic when the primary model fails or degrades.
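The fallback logic mentioned above can be sketched as an ordered model cascade: try the cheapest model first, retry transient errors, then escalate to larger models. This is a minimal illustration — the model names are taken from the dashboard, but call_model is a hypothetical stand-in for a real inference client, not an actual API.

```python
# Sketch of automatic model fallback. Assumes a hypothetical call_model()
# inference function; model names are illustrative tiers, cheapest first.
MODELS = ["llama-3.1-8b-instruct", "llama-3.1-70b-instruct", "mistral-large"]

def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real inference call; raises RuntimeError on failure."""
    # Simulated failure so the fallback path is exercised in this sketch.
    if model == "llama-3.1-8b-instruct" and "hard" in prompt:
        raise RuntimeError("model overloaded")
    return f"[{model}] answer to: {prompt}"

def complete_with_fallback(prompt: str, retries_per_model: int = 2) -> str:
    """Try each model in order, retrying transient errors before escalating."""
    last_error = None
    for model in MODELS:
        for _attempt in range(retries_per_model):
            try:
                return call_model(model, prompt)
            except RuntimeError as exc:
                last_error = exc  # a real system would back off here
    raise RuntimeError(f"all models failed: {last_error}")
```

In practice the escalation trigger could also be a quality score (e.g., low answer confidence) rather than an exception, and each attempt would be recorded as a trace for the dashboard above.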

💡 LLMOps in Practice

This dashboard demonstrates the observability patterns used in production LLM systems, similar to those provided by LangFuse or LangSmith. Key capabilities: request tracing, latency monitoring, token usage tracking, cost estimation, and quality scoring (e.g., burstiness and perplexity). These capabilities are essential for operating AI systems at scale.
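The core of such a dashboard is a per-request trace record plus an aggregator that derives the headline metrics (total requests, average latency, token totals, error rate, estimated cost). A minimal sketch, assuming a flat per-1K-token price — real pricing varies by provider and model:

```python
import statistics
from dataclasses import dataclass, field

# Illustrative price; actual per-token cost depends on the provider/model.
COST_PER_1K_TOKENS_USD = 0.005

@dataclass
class Trace:
    """One request, as shown in the Recent Traces table."""
    endpoint: str
    tokens: int
    latency_ms: float
    status: str  # "success" or "error"

@dataclass
class Metrics:
    traces: list = field(default_factory=list)

    def record(self, trace: Trace) -> None:
        self.traces.append(trace)

    def summary(self) -> dict:
        """Aggregate the recorded traces into dashboard stat-card values."""
        n = len(self.traces)
        total_tokens = sum(t.tokens for t in self.traces)
        return {
            "total_requests": n,
            "avg_latency_ms": statistics.mean(t.latency_ms for t in self.traces),
            "total_tokens": total_tokens,
            "error_rate": sum(t.status == "error" for t in self.traces) / n,
            "est_cost_usd": total_tokens / 1000 * COST_PER_1K_TOKENS_USD,
        }
```

Recording the two agent traces from the table above (390 tokens/1.24s success, 499 tokens/1.45s error) would yield an error rate of 0.5 and a token total of 889; production systems like LangFuse persist these records and compute the same aggregates over time windows.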