Trajectory Logging for Agent Training
Complete system for recording agent decisions for reinforcement learning training (Atropos/GRPO).
Overview
The trajectory logging system captures every decision an autonomous agent makes, enabling:
- Reinforcement Learning: Train agents on successful strategies
- Decision Analysis: Understand why agents make specific choices
- Performance Optimization: Identify winning vs losing patterns
- Model Training: Generate datasets for Atropos GRPO training
Status: Production Ready - 43 automated tests
Architecture
The system records three critical data points for each agent decision:
Data Captured
1. Provider Data (Context)
- What data the agent accessed
- Why it accessed that data
- When it was accessed
2. LLM Call (Decision)
- System prompt used
- User prompt/query
- Model response
- Reasoning/thinking process
3. Action Result (Outcome)
- What action was taken
- Parameters used
- Success/failure result
- Reward signal
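Together these three pieces make up a single trajectory record, which is the object reward functions receive and the export pipeline serializes. A minimal TypeScript sketch of that aggregate (interface names are illustrative, not the package's actual exports; the exact per-part fields are listed under Data Requirements below):
// Illustrative shape of one logged decision; the real types live in @babylon/agents
interface ProviderLog {
  providerName: string // e.g. "market_data"
  data: unknown        // the data the agent accessed
  purpose: string      // why it was accessed
  timestamp: number    // when it was accessed
}
interface LLMCallLog {
  model: string
  systemPrompt: string
  userPrompt: string
  response: string
  thinking?: string    // reasoning process, if available
  timestamp: number
}
interface ActionResultLog {
  action: string
  params: unknown
  result: unknown
  success: boolean
  timestamp: number
}
interface Trajectory {
  providers: ProviderLog[]  // context the agent consulted
  llmCall: LLMCallLog       // the decision itself
  result: ActionResultLog   // the outcome
  reward?: number           // filled in later by a reward function
}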
Quick Start
1. Database Setup
Add trajectory tables to your database schema:
# Tables are already included in packages/db/src/schema/training.ts
bun run db:push
2. Wrap Your Actions
import {
wrapActionWithLogging,
logLLMCallFromAction,
logProviderFromAction
} from '@babylon/agents'
const BUY_SHARES = wrapActionWithLogging({
name: 'BUY_SHARES',
description: 'Buy prediction market shares',
handler: async (runtime, message, state, options, callback) => {
// 1. Log data access (provider)
const markets = await getMarkets()
logProviderFromAction(state, {
providerName: 'market_data',
data: markets,
purpose: 'Analyze available markets for trading opportunities'
})
// 2. Log LLM decision
const systemPrompt = 'You are a trading agent...'
const userPrompt = `Analyze these markets: ${JSON.stringify(markets)}`
const decision = await runtime.useModel({ systemPrompt, userPrompt })
logLLMCallFromAction(state, {
model: 'gpt-5.1',
systemPrompt,
userPrompt,
response: decision,
thinking: decision.reasoning
})
// 3. Execute action (result logged automatically)
const result = await executeTrade(decision.params)
return result
}
})
3. Compute Rewards
Define how to score agent decisions:
import { defineRewardFunction } from '@babylon/agents'
export const tradingReward = defineRewardFunction({
name: 'trading_performance',
compute: (trajectory) => {
const { result } = trajectory
// Reward based on P&L
if (result.realizedPnL > 0) {
return result.realizedPnL / 100 // Scale P&L toward the 0-1 range
}
return -0.5 // Penalty for losses
}
})
4. Export for Training
import { exportToHuggingFace } from '@babylon/agents'
const dataset = await exportToHuggingFace({
minReward: 0.5,
maxTrajectories: 1000,
format: 'parquet'
})
// Upload to HuggingFace
await dataset.upload('your-org/babylon-trading-trajectories')
Testing
Run Comprehensive Tests
cd packages/agents
bun test
Expected: 43 tests passing
Test Coverage:
- Database schema validation
- Provider logging
- LLM call logging
- Result logging
- Reward computation
- Export functionality
- Data quality checks
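As a rough illustration of the reward-computation coverage, one such test might look like the sketch below (hypothetical: it assumes bun:test, an illustrative import path, and that the object returned by defineRewardFunction exposes its compute function; the real suite lives in packages/agents):
import { describe, test, expect } from 'bun:test'
import { tradingReward } from '../src/rewards' // illustrative path, not the actual suite layout

describe('reward computation', () => {
  test('profitable trades earn a positive, scaled reward', () => {
    const trajectory = { result: { realizedPnL: 50 }, llmCall: {} } as any
    expect(tradingReward.compute(trajectory)).toBeCloseTo(0.5)
  })

  test('losing trades receive the fixed penalty', () => {
    const trajectory = { result: { realizedPnL: -20 }, llmCall: {} } as any
    expect(tradingReward.compute(trajectory)).toBe(-0.5)
  })
})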
Verify Data Quality
npx tsx scripts/verify-trajectory-data.ts
Checks:
- All required fields present
- Data types correct
- Timestamps valid
- Rewards computed
- Export format correct
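In spirit, these checks reduce to something like the sketch below (hypothetical: the trajectory-loading helper is invented for illustration; validateTrajectory is the export described under Data Quality Checks below, and the real logic lives in scripts/verify-trajectory-data.ts):
import { validateTrajectory } from '@babylon/agents'
import { loadRecentTrajectories } from './db' // hypothetical helper for this sketch

const trajectories = await loadRecentTrajectories({ limit: 100 })
for (const trajectory of trajectories) {
  // Structural checks: required fields, data types, timestamps
  const check = validateTrajectory(trajectory)
  if (!check.valid) {
    console.error('Invalid trajectory:', check.errors)
  }
  // Reward check: every decision should eventually be scored
  if (trajectory.reward == null) {
    console.warn('Reward not yet computed for a trajectory')
  }
}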
Data Requirements
Required for Each Decision
Provider Data:
{
providerName: string // e.g., "market_data"
data: any // The actual data accessed
purpose: string // Why this data was needed
timestamp: number
}
LLM Call:
{
model: string // e.g., "gpt-5.1"
systemPrompt: string // Agent's system instructions
userPrompt: string // The query/request
response: string // Model's response
thinking?: string // Reasoning process
timestamp: number
}
Action Result:
{
action: string // Action name
params: any // Action parameters
result: any // Execution result
success: boolean // Did it work?
timestamp: number
}
Training Pipeline Integration
Export to HuggingFace
// Export successful trajectories
const dataset = await exportToHuggingFace({
minReward: 0.7, // Only successful trades
timeRange: '7d', // Last 7 days
format: 'parquet', // Efficient format
includeMetadata: true
})
// Dataset structure:
{
prompt: string, // System + user prompts
response: string, // LLM response
reward: number, // Computed reward
context: object, // Provider data
metadata: object // Timestamps, agent ID, etc.
}
Training with Atropos
from training.atropos_trainer import BabylonAtroposTrainer, AtroposTrainingConfig
config = AtroposTrainingConfig(
model_name="Qwen/Qwen2.5-3B-Instruct",
database_url="postgresql://user:pass@host/babylon",
api_url="http://localhost:8000",
vllm_port=9001,
learning_rate=1e-5,
judge_model="gpt-4o-mini"
)
trainer = BabylonAtroposTrainer(config)
trainer.setup()
trainer.train()
Best Practices
1. Log Everything
// Good: Fetch the data, then log the access
const balance = await getBalance()
logProviderFromAction(state, {
providerName: 'balance_check',
data: balance,
purpose: 'Verify sufficient funds before trading'
})
// Bad: Skip logging
const balance = await getBalance()
2. Capture Reasoning
// Good: Include thinking process
logLLMCallFromAction(state, {
model,
systemPrompt,
userPrompt,
response: decision,
thinking: decision.reasoning // Important!
})
// Bad: Skip reasoning
logLLMCallFromAction(state, { model, systemPrompt, userPrompt, response })
3. Compute Meaningful Rewards
// Good: Reward based on actual performance
const reward = (pnl > 0) ? pnl / initialInvestment : -0.5
// Bad: Binary rewards
const reward = success ? 1 : 0
Integration with Babylon
Example: Trading Agent
import { wrapActionWithLogging, logProviderFromAction, logLLMCallFromAction } from '@babylon/agents'
const BABYLON_TRADE = wrapActionWithLogging({
name: 'BABYLON_TRADE',
handler: async (runtime, message, state, options) => {
// Log market data access
const markets = await runtime.a2aClient.getMarkets()
logProviderFromAction(state, {
providerName: 'babylon_markets',
data: markets,
purpose: 'Analyze prediction markets for trading'
})
// Log trading decision
const userPrompt = `Analyze these markets and decide: ${JSON.stringify(markets)}`
const decision = await runtime.useModel({
systemPrompt: runtime.character.system,
userPrompt
})
logLLMCallFromAction(state, {
model: 'gpt-5.1',
systemPrompt: runtime.character.system,
userPrompt,
response: decision,
thinking: decision.reasoning
})
// Execute trade
const result = await runtime.a2aClient.buyShares({
marketId: decision.marketId,
outcome: decision.outcome,
amount: decision.amount
})
// Result logged automatically
// Reward computed based on eventual P&L
return result
}
})
Advanced Features
Reward Functions
Define custom reward functions:
import { defineRewardFunction } from '@babylon/agents'
export const multiFactorReward = defineRewardFunction({
name: 'multi_factor_trading',
compute: (trajectory) => {
const { result, llmCall } = trajectory
let reward = 0
// P&L component (50%)
if (result.pnl > 0) {
reward += 0.5 * (result.pnl / result.investment)
}
// Speed component (25%)
if (result.executionTime < 1000) {
reward += 0.25
}
// Confidence component (25%)
if (llmCall.thinking?.includes('high confidence')) {
reward += 0.25
}
return Math.min(1, Math.max(-1, reward))
}
})
Data Quality Checks
import { validateTrajectory } from '@babylon/agents'
const validation = validateTrajectory(trajectory)
if (!validation.valid) {
console.error('Invalid trajectory:', validation.errors)
// - Missing required fields
// - Invalid data types
// - Timestamp issues
// - Malformed JSON
}
Resources
- Plugin Source: packages/agents/src/plugins/plugin-trajectory-logger/
- Trajectory Recorder: packages/training/src/training/TrajectoryRecorder.ts
- Database Schema: packages/db/src/schema/training.ts
Next Steps
- Autonomous Agent Guide
- ElizaOS Plugin
- Python RL Training - Train agents with reinforcement learning