# HuggingFace Integration

Complete dataset-publishing system for Babylon game data and agent trajectories.
## Overview

**Dataset:** elizaos/babylon-game-data
**Updated:** daily at 2 AM UTC via GitHub Actions

**Contains:**

- Agent trajectories (complete gameplay with decisions + environment)
- Benchmark scenarios (game simulations with ground truth)
- Model performance results
- Data organized by month for easy access
## Quick Start

### Download Dataset
```python
from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("elizaos/babylon-game-data")

# Access trajectories
trajectories = dataset['train']
```

### Run Offline Simulation
```bash
# Download the dataset
huggingface-cli download elizaos/babylon-game-data

# Run a faster-than-real-time simulation
npx tsx scripts/run-offline-simulation.ts --month=2025-11 --agent=my-agent
```

Speed: 100-1000x faster than real time ⚡
## Dataset Structure
```
elizaos/babylon-game-data/
├── README.md                  - Dataset documentation
├── index.json                 - Metadata
├── summary.json               - Statistics
├── trajectories.jsonl         - All agent trajectories (up to 1,000)
├── benchmarks-metadata.json   - Benchmark file info
└── monthly-data/
    ├── 2025-10.json           - October: worlds + trajectories + benchmarks
    ├── 2025-11.json           - November: worlds + trajectories + benchmarks
    └── YYYY-MM.json           - Monthly aggregated data
```

### Trajectory Format
Each trajectory includes:
```json
{
  "trajectoryId": "...",
  "agentId": "...",
  "month": "2025-11",
  "scenario": "trading-scenario",
  "steps": [
    {
      "stepNumber": 1,
      "environmentState": {
        "agentBalance": 10000,
        "agentPnL": 0,
        "openPositions": 0,
        "activeMarkets": 15
      },
      "observation": {
        "markets": [...],
        "prices": {...},
        "feed": [...]
      },
      "llm_calls": [
        {
          "model": "babylon-agent-v1",
          "user_prompt": "Analyze market...",
          "response": "I should buy...",
          "reasoning": "Based on momentum..."
        }
      ],
      "action": {
        "type": "BUY_SHARES",
        "parameters": { "marketId": "...", "amount": 100 },
        "success": true,
        "result": { "pnl": 50 }
      },
      "reward": 50
    }
  ],
  "totalReward": 1500,
  "finalPnL": 1500,
  "metrics": {
    "tradesExecuted": 15,
    "postsCreated": 5
  }
}
```
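As a sanity check when consuming this format, here is a minimal sketch that parses one trajectory (e.g. a line from `trajectories.jsonl`) and verifies that the per-step rewards sum to `totalReward`. The helper name `summarize_trajectory` is illustrative, not part of the actual tooling; the field names follow the format shown above.

```python
import json

def summarize_trajectory(raw: str) -> dict:
    """Parse one trajectory JSON string and summarize its steps.

    Field names ("steps", "reward", "totalReward") follow the
    trajectory format documented above.
    """
    traj = json.loads(raw)
    step_rewards = [step.get("reward", 0) for step in traj.get("steps", [])]
    return {
        "trajectoryId": traj.get("trajectoryId"),
        "steps": len(step_rewards),
        "rewardSum": sum(step_rewards),
        "matchesTotal": sum(step_rewards) == traj.get("totalReward"),
    }

# Tiny synthetic example in the same shape
example = json.dumps({
    "trajectoryId": "t-1",
    "steps": [{"stepNumber": 1, "reward": 50}, {"stepNumber": 2, "reward": -10}],
    "totalReward": 40,
})
print(summarize_trajectory(example))
```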
## Automation

### GitHub Actions (Automatic)
**Workflow:** `.github/workflows/daily-dataset-upload.yml`
**Schedule:** daily at 2 AM UTC (`0 2 * * *`)

**Process:**

1. Collect game data from the database (memory-safe batching)
2. Organize by month
3. Upload to HuggingFace
4. Verify the upload
5. Save artifacts
**Setup:** add these GitHub secrets:

```
HUGGING_FACE_TOKEN = hf_your_token_here
DATABASE_URL = your_postgres_connection
```

Then push to GitHub; the workflow runs automatically.
### Manual Upload
```bash
# Collect data
npm run hf:collect

# Upload
npm run hf:upload

# Verify
npm run hf:verify
```

## Offline Simulation
Download the dataset and run simulations locally (no API calls needed):
```bash
# Download
huggingface-cli download elizaos/babylon-game-data

# Run a simulation (100-1000x faster than real time)
npm run hf:offline -- --month=2025-11 --agent=my-agent

# Or with a specific data file
npx tsx scripts/run-offline-simulation.ts \
  --data=monthly-data/2025-11.json \
  --agent=my-agent \
  --fast-forward
```

Features:
- No network required
- Deterministic replay
- Perfect for testing
- Faster-than-real-time mode
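The deterministic-replay idea above can be sketched as follows: a hypothetical replayer that steps through recorded observations and replays the logged actions, so no LLM or network call is ever needed and the same trajectory always yields the same result. The function is illustrative, not the actual simulator; field names follow the trajectory format documented earlier.

```python
from typing import Iterator

def replay_steps(trajectory: dict) -> Iterator[tuple[dict, dict, float]]:
    """Yield (observation, action, reward) tuples from a recorded trajectory.

    Every observation and action was logged during live play, so replay
    is offline and deterministic: no API calls, identical output each run.
    """
    for step in trajectory.get("steps", []):
        yield step.get("observation", {}), step.get("action", {}), step.get("reward", 0)

# Replay a tiny recorded trajectory and accumulate reward
trajectory = {
    "steps": [
        {"observation": {"prices": {"m1": 0.4}}, "action": {"type": "BUY_SHARES"}, "reward": 50},
        {"observation": {"prices": {"m1": 0.6}}, "action": {"type": "SELL_SHARES"}, "reward": 20},
    ]
}
total = sum(reward for _, _, reward in replay_steps(trajectory))
print(total)  # 70
```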
## Memory Safety

All scripts use memory-safe collection:

- **Batch processing:** 100 items at a time
- **Hard limits:** max 1,000 trajectories, 500 benchmarks
- **Streaming:** JSONL format (line-by-line processing)
- **Per-month files:** separate files, no giant JSON
- **Forced GC:** release memory between batches

**Result:** no OOM crashes ✅
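The batching-plus-hard-limit pattern described above can be sketched as follows. This is an illustrative Python rendering of the idea (the actual collectors are TypeScript); `BATCH_SIZE`, `MAX_ITEMS`, and the function names are assumptions chosen to match the limits stated in the list.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator

BATCH_SIZE = 100   # items processed per batch
MAX_ITEMS = 1000   # hard cap, matching the trajectory limit above

def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group an iterable into lists of at most `size` items."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def collect_to_jsonl(source: Iterable[dict], path: str) -> int:
    """Stream items to a JSONL file in batches, stopping at the hard limit.

    Writing line by line means only one batch is ever held in memory.
    """
    written = 0
    with open(path, "w") as f:
        for batch in batched(source, BATCH_SIZE):
            for item in batch:
                if written >= MAX_ITEMS:
                    return written
                f.write(json.dumps(item) + "\n")
                written += 1
    return written

# 1,500 synthetic items: the hard cap stops collection at 1,000
tmp = os.path.join(tempfile.gettempdir(), "hf-sample.jsonl")
count = collect_to_jsonl(({"i": i} for i in range(1500)), tmp)
print(count)  # 1000
```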
## Data Quality

### What's Included
**Agent Trajectories:**
- ✅ Complete decision sequence
- ✅ LLM calls (prompts + responses)
- ✅ Environment state at each step
- ✅ Actions taken
- ✅ Outcomes and rewards
- ✅ Ground truth (for training)
**Benchmark Data:**
- ✅ Complete game scenarios
- ✅ Ground truth outcomes
- ✅ Optimal actions
- ✅ Market dynamics
- ✅ Tick-by-tick progression
**Game Worlds:**
- ✅ Prediction market questions
- ✅ Events and timelines
- ✅ NPC interactions
- ✅ Feed posts
- ✅ Outcomes
## npm Commands
```bash
npm run hf:collect    # Collect game data (memory-safe)
npm run hf:upload     # Upload to HuggingFace
npm run hf:verify     # Verify upload succeeded
npm run hf:offline    # Run offline simulation
npm run hf:test-flow  # Test the complete flow
```

## Training Pipeline Integration
The HuggingFace dataset integrates with the Python training pipeline:
```bash
# Option 1: Train from the database (default)
python src/training/babylon_trainer.py
```

```python
# Option 2: Train from HuggingFace (for reproducibility)
from datasets import load_dataset

dataset = load_dataset("elizaos/babylon-game-data")
trajectories = dataset['train']
# Use for training
```
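For RL training, each trajectory step can be flattened into (state, action, reward) transitions, the shape most training loops consume. A minimal sketch, assuming the trajectory format documented above; the helper name `to_transitions` is illustrative, not part of the actual trainer.

```python
def to_transitions(trajectory: dict) -> list[tuple[dict, str, float]]:
    """Flatten a trajectory into (environment_state, action_type, reward)
    tuples, using the field names from the trajectory format above."""
    return [
        (
            step.get("environmentState", {}),
            step.get("action", {}).get("type", "NOOP"),
            step.get("reward", 0),
        )
        for step in trajectory.get("steps", [])
    ]

# One-step example in the documented shape
trajectory = {
    "steps": [
        {
            "environmentState": {"agentBalance": 10000},
            "action": {"type": "BUY_SHARES"},
            "reward": 50,
        }
    ]
}
print(to_transitions(trajectory))
```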
## Scripts

- `collect-game-data-for-hf.ts` - Collect data (memory-safe batching)
- `upload-to-huggingface.ts` - Upload to HuggingFace
- `verify-hf-upload.ts` - Verify the upload
- `run-offline-simulation.ts` - Offline simulator
- `prepare-real-dataset-for-hf.ts` - Verify real data before upload
## Architecture
```
Gameplay → TrajectoryRecorder → Database
                 ↓
  GitHub Actions (daily, 2 AM UTC)
                 ↓
  Collect (memory-safe batching)
                 ↓
        Organize by month
                 ↓
      Upload to HuggingFace
                 ↓
elizaos/babylon-game-data (PUBLIC)
                 ↓
     Download & use offline
```

### Why GitHub Actions (Not Vercel)
| Feature | Vercel CRON | GitHub Actions |
|---|---|---|
| Timeout | 10 seconds ❌ | 60 minutes ✅ |
| Memory | Limited (~1GB) ❌ | 7GB ✅ |
| OOM Risk | High ❌ | Protected ✅ |
| Cost | Paid ❌ | Free ✅ |
| Dataset Size | Small only ❌ | Large ✅ |
**Decision:** GitHub Actions is the better fit for dataset uploads.
## Troubleshooting

### "No data found"
```bash
# Generate test trajectories
npx tsx scripts/generate-test-trajectories.ts

# Or collect from the database
npm run hf:collect
```

### "Schema conflict"
The upload script separates files by type to avoid schema conflicts:
- Root: JSONL files (consistent schemas)
- Subdirectory: Monthly aggregates
### "Upload fails"
Check:
- Valid HuggingFace token with “Write” permission
- Token in GitHub secrets or environment
- Network connectivity
## Resources
- Live Dataset: elizaos/babylon-game-data
- GitHub Workflow: `.github/workflows/daily-dataset-upload.yml`
- Python Training: Python RL Training
- Trajectory Logging: Trajectory Logging
## Next Steps

1. View the dataset: visit elizaos/babylon-game-data
2. Download it: `huggingface-cli download elizaos/babylon-game-data`
3. Use it offline: `npm run hf:offline -- --month=2025-11`
4. Train: use the dataset for RL training
**Status:** production-ready, updating daily ✅