AI Agent Observability¶
LumiBot’s AI agent runtime is only useful if every run is fully inspectable. The observability system records everything the agent did so you can audit reasoning, validate data integrity, and debug surprising behavior.
Default Logs¶
Every agent run emits a compact summary log line that includes:
Agent name
Mode (backtest or live)
Model used
Cache hit or miss
Tool call count
Warning count
The agent’s summary conclusion
Path to the trace file
The runtime also emits detailed lines for:
Individual tool calls and their arguments
Tool results
Model text output
Observability warnings
Trace Files¶
Each run writes a structured JSON trace that is the source of truth for debugging. The trace records:
The full composed prompt surface (base prompt + system prompt + context)
Every tool call with arguments
Every tool result
The agent’s summary and reasoning
Observability warnings
Cache hit/miss metadata
DuckDB query metrics
Current datetime and timezone at the time of the run
From strategy code, the trace path is available on the result object:
trace_path = (result.payload or {}).get("trace_path")
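Once you have the path, the trace is ordinary JSON, so a few lines of Python can pull out the fields you care about. The key names below (tool_calls, warnings, cache_hit) are illustrative, not a documented schema; open a real trace file to confirm the exact layout:

```python
import json

def summarize_trace(trace: dict) -> str:
    # Key names are assumptions for this sketch -- check a real trace file.
    calls = trace.get("tool_calls", [])
    warnings = trace.get("warnings", [])
    return (f"{len(calls)} tool call(s), {len(warnings)} warning(s), "
            f"cache_hit={trace.get('cache_hit', False)}")

# Tiny inline dict standing in for json.load(open(trace_path)):
trace = {
    "tool_calls": [{"name": "get_price", "args": {"symbol": "SPY"}}],
    "tool_results": [{"name": "get_price", "result": 512.3}],
    "warnings": [],
    "cache_hit": True,
}
print(summarize_trace(trace))
```

In practice you would replace the inline dict with `json.load(open(trace_path))`.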
Replay Cache¶
In backtests, LumiBot caches every agent run. When a subsequent run hits the same combination of prompt, context, model, tool surface, and simulated timestamp, the cached result is returned without making any LLM or MCP calls.
Warm reruns show:
cache_hit=True in the result
Zero model API calls
Zero external MCP calls
Identical outputs to the original run
Replay caching makes agentic backtests:
Deterministic – same inputs always produce same outputs
Fast – warm reruns complete in seconds instead of minutes
Cost-effective – no duplicate LLM or MCP API charges
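The mechanism behind this can be pictured as a content-addressed cache key: hash the full input surface, and identical inputs always map to the same cached entry. This is a conceptual sketch, not LumiBot's actual implementation:

```python
import hashlib
import json

def replay_cache_key(prompt, context, model, tools, sim_ts):
    # Illustrative only: serialize every input that defines a run, then hash.
    # Same prompt + context + model + tool surface + simulated timestamp
    # always yields the same key, so the cached result can be reused.
    payload = json.dumps(
        {"prompt": prompt, "context": context, "model": model,
         "tools": sorted(tools), "ts": sim_ts},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = replay_cache_key("Buy or hold?", {"symbol": "SPY"}, "example-model",
                      ["get_price"], "2024-01-02T09:30:00")
k2 = replay_cache_key("Buy or hold?", {"symbol": "SPY"}, "example-model",
                      ["get_price"], "2024-01-02T09:30:00")
print(k1 == k2)  # identical inputs -> identical key -> warm replay
```

Changing any single input (for example the simulated timestamp) produces a different key, which is why a new bar in a backtest triggers a fresh run rather than a stale replay.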
Per-Run Summary Artifacts¶
LumiBot writes machine-readable artifacts for agent runs:
agent_run_summaries.jsonl – one JSON line per agent run with summary, warnings, and metadata
agent_traces.zip – packaged trace files for the full backtest
These artifacts support downstream tooling, dashboards, and run history display without reparsing raw logs.
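Because the summaries file is JSON Lines, downstream tooling can consume it with nothing but the standard library. The field names below are illustrative; inspect an actual agent_run_summaries.jsonl for the real schema:

```python
import io
import json

# In a real backtest you would open "agent_run_summaries.jsonl" from the
# output directory; a StringIO with sample lines stands in here.
sample = io.StringIO(
    '{"agent": "news_agent", "cache_hit": false, "warnings": ["tool_error"], "summary": "hold"}\n'
    '{"agent": "news_agent", "cache_hit": true, "warnings": [], "summary": "hold"}\n'
)
runs = [json.loads(line) for line in sample]
flagged = [r for r in runs if r["warnings"]]
print(f"{len(runs)} runs, {len(flagged)} with warnings")
```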
Observability Warnings¶
Warnings are diagnostics, not hard enforcement rules. They flag suspicious conditions so you can investigate:
No tools called – the agent made a decision without consulting any tools
Tool error – a tool returned an error
Future-dated data – a tool result references data published after the simulated backtest time
Unsupported order – an order was submitted without visible supporting evidence in the trace
Warnings appear in:
The summary log line
The structured JSON trace
result.warning_messages on the result object
Warnings do not automatically invalidate a run, but they are a strong signal that the run should be reviewed.
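Since warnings are signals rather than hard failures, it can be useful to triage them. The helper below is a simple heuristic of our own, not part of LumiBot: it surfaces the data-integrity warnings (future-dated data, unsupported orders) that most often indicate a run worth reviewing:

```python
def needs_review(warning_messages):
    # Heuristic triage (not a LumiBot API): flag data-integrity warnings first.
    priority = ("future", "unsupported")
    return [w for w in warning_messages
            if any(p in w.lower() for p in priority)]

warnings = [
    "Future-dated data: article published after simulated time",
    "Tool error: rate limited",
]
print(needs_review(warnings))
```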
Recommended Debugging Workflow¶
When a run looks wrong:
Read the summary line in the logs. Check the agent name, cache status, tool count, and warning count.
Inspect tool calls and results in the logs. Did the agent call the right tools? Did the tools return useful data?
Open the trace JSON. This is the full record of everything the agent saw and did.
Check cache status. Was this a fresh run or a replay? If it was a replay, the issue is in the original run, not this one.
Review warnings. Are there future-dated data warnings? Missing tool usage? Unsupported orders?
Compare the summary to the outcome. Does the agent’s stated reasoning match the actual trade or no-trade decision?
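The first few steps of this checklist can be automated over a run summary. The field names here are illustrative placeholders for whatever the summary artifacts actually contain:

```python
def triage(run: dict) -> list[str]:
    # Walks the early checklist steps: warnings, cache status, tool usage.
    # Field names are assumptions for this sketch.
    findings = []
    if run.get("warnings"):
        findings.append(f"review warnings: {run['warnings']}")
    if run.get("cache_hit"):
        findings.append("replay run: debug the original run, not this one")
    if run.get("tool_call_count", 0) == 0:
        findings.append("no tools called before deciding")
    return findings

run = {"agent": "news_agent", "cache_hit": True,
       "tool_call_count": 0, "warnings": ["no_tools_called"]}
for finding in triage(run):
    print(finding)
```

Anything the helper flags still needs the manual steps: open the trace JSON and compare the agent's stated reasoning to the trade it actually made.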