HuggingFace Integration

Complete dataset publishing system for Babylon game data and agent trajectories.

Overview

Dataset: elizaos/babylon-game-data 

Updated: Daily at 2 AM UTC via GitHub Actions

Contains:

  • Agent trajectories (complete gameplay with decisions + environment)
  • Benchmark scenarios (game simulations with ground truth)
  • Model performance results
  • Organized by month for easy access

Quick Start

Download Dataset

```python
from datasets import load_dataset

# Load complete dataset
dataset = load_dataset("elizaos/babylon-game-data")

# Access trajectories
trajectories = dataset['train']
```

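If you prefer to work with the raw `trajectories.jsonl` file directly, it can be streamed one record at a time rather than loaded whole. This is a minimal sketch, assuming the file has been downloaded locally; the inline sample stands in for the real file:

```python
import io
import json

def iter_trajectories(fileobj):
    """Yield one trajectory dict per JSONL line (memory-safe streaming)."""
    for line in fileobj:
        line = line.strip()
        if line:
            yield json.loads(line)

# Tiny inline sample standing in for the downloaded trajectories.jsonl
sample = io.StringIO(
    '{"trajectoryId": "t1", "finalPnL": 1500}\n'
    '{"trajectoryId": "t2", "finalPnL": -200}\n'
)
records = list(iter_trajectories(sample))
```

With a real download, pass an open file handle instead of the `StringIO` sample.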
Run Offline Simulation

```bash
# Download dataset
huggingface-cli download elizaos/babylon-game-data

# Run faster-than-real-time simulation
npx tsx scripts/run-offline-simulation.ts --month=2025-11 --agent=my-agent
```

Speed: 100-1000x faster than real-time! ⚡

Dataset Structure

```
elizaos/babylon-game-data/
├── README.md                 - Dataset documentation
├── index.json                - Metadata
├── summary.json              - Statistics
├── trajectories.jsonl        - All agent trajectories (up to 1,000)
├── benchmarks-metadata.json  - Benchmark file info
└── monthly-data/
    ├── 2025-10.json          - October: worlds + trajectories + benchmarks
    ├── 2025-11.json          - November: worlds + trajectories + benchmarks
    └── YYYY-MM.json          - Monthly aggregated data
```

Trajectory Format

Each trajectory includes:

```json
{
  "trajectoryId": "...",
  "agentId": "...",
  "month": "2025-11",
  "scenario": "trading-scenario",
  "steps": [
    {
      "stepNumber": 1,
      "environmentState": {
        "agentBalance": 10000,
        "agentPnL": 0,
        "openPositions": 0,
        "activeMarkets": 15
      },
      "observation": {
        "markets": [...],
        "prices": {...},
        "feed": [...]
      },
      "llm_calls": [
        {
          "model": "babylon-agent-v1",
          "user_prompt": "Analyze market...",
          "response": "I should buy...",
          "reasoning": "Based on momentum..."
        }
      ],
      "action": {
        "type": "BUY_SHARES",
        "parameters": { "marketId": "...", "amount": 100 },
        "success": true,
        "result": { "pnl": 50 }
      },
      "reward": 50
    }
  ],
  "totalReward": 1500,
  "finalPnL": 1500,
  "metrics": {
    "tradesExecuted": 15,
    "postsCreated": 5
  }
}
```
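As a quick sanity check when consuming trajectories, the per-step rewards should sum to `totalReward`. This is an illustrative sketch, assuming that relationship holds as the example format suggests; `check_rewards` is a hypothetical helper, not part of the dataset tooling:

```python
def check_rewards(trajectory):
    """Verify that per-step rewards sum to the recorded totalReward."""
    step_sum = sum(step.get("reward", 0) for step in trajectory["steps"])
    return step_sum == trajectory["totalReward"]

# Minimal trajectory using the field names from the format above
traj = {
    "trajectoryId": "demo",
    "steps": [{"reward": 50}, {"reward": 1450}],
    "totalReward": 1500,
}
assert check_rewards(traj)
```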

Automation

GitHub Actions (Automatic)

Workflow: .github/workflows/daily-dataset-upload.yml

Schedule: Daily at 2 AM UTC (0 2 * * *)

Process:

  1. Collect game data from database (memory-safe batching)
  2. Organize by month
  3. Upload to HuggingFace
  4. Verify upload
  5. Save artifacts
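The memory-safe batching in step 1 can be sketched as follows. This is an illustrative Python outline, not the actual collector (which is `collect-game-data-for-hf.ts`); `fetch_batch` and `sink` are hypothetical stand-ins for the database query and output writer:

```python
import gc

BATCH_SIZE = 100         # items per batch, matching the documented batching
MAX_TRAJECTORIES = 1000  # hard cap to bound memory use

def collect(fetch_batch, sink):
    """Pull rows in fixed-size batches, write each batch out, then release it."""
    offset = 0
    total = 0
    while total < MAX_TRAJECTORIES:
        batch = fetch_batch(offset, BATCH_SIZE)
        if not batch:
            break  # no more rows in the source
        for row in batch[: MAX_TRAJECTORIES - total]:
            sink(row)  # stream out immediately instead of accumulating
            total += 1
        offset += BATCH_SIZE
        gc.collect()  # forced GC between batches, per the memory-safety notes
    return total
```

The same pattern (fixed batch size, hard limit, stream-out, GC between batches) is what keeps the daily job within the runner's memory budget.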

Setup:

Add GitHub secrets:

```
HUGGING_FACE_TOKEN = hf_your_token_here
DATABASE_URL       = your_postgres_connection
```

Then push to GitHub; the workflow runs automatically.

Manual Upload

```bash
# Collect data
npm run hf:collect

# Upload
npm run hf:upload

# Verify
npm run hf:verify
```

Offline Simulation

Download the dataset and run simulations locally (no API calls needed):

```bash
# Download
huggingface-cli download elizaos/babylon-game-data

# Run simulation (100-1000x faster than real-time)
npm run hf:offline -- --month=2025-11 --agent=my-agent

# Or with a specific data file
npx tsx scripts/run-offline-simulation.ts \
  --data=monthly-data/2025-11.json \
  --agent=my-agent \
  --fast-forward
```

Features:

  • No network required
  • Deterministic replay
  • Perfect for testing
  • Faster-than-real-time mode

Memory Safety

All scripts use memory-safe collection:

  • Batch Processing: 100 items at a time
  • Hard Limits: Max 1,000 trajectories, 500 benchmarks
  • Streaming: JSONL format (line-by-line processing)
  • Per-Month Files: Separate files, no giant JSON
  • Forced GC: Release memory between batches

Result: No OOM crashes ✅

Data Quality

What’s Included

Agent Trajectories:

  • ✅ Complete decision sequence
  • ✅ LLM calls (prompts + responses)
  • ✅ Environment state at each step
  • ✅ Actions taken
  • ✅ Outcomes and rewards
  • ✅ Ground truth (for training)

Benchmark Data:

  • ✅ Complete game scenarios
  • ✅ Ground truth outcomes
  • ✅ Optimal actions
  • ✅ Market dynamics
  • ✅ Tick-by-tick progression

Game Worlds:

  • ✅ Prediction market questions
  • ✅ Events and timelines
  • ✅ NPC interactions
  • ✅ Feed posts
  • ✅ Outcomes

npm Commands

```bash
npm run hf:collect    # Collect game data (memory-safe)
npm run hf:upload     # Upload to HuggingFace
npm run hf:verify     # Verify upload succeeded
npm run hf:offline    # Run offline simulation
npm run hf:test-flow  # Test complete flow
```

Training Pipeline Integration

The HuggingFace dataset integrates with the Python training pipeline:

```bash
# Option 1: Train from database (default)
python src/training/babylon_trainer.py
```

```python
# Option 2: Train from HuggingFace (for reproducibility)
from datasets import load_dataset

dataset = load_dataset("elizaos/babylon-game-data")
trajectories = dataset['train']  # Use for training
```

Scripts

  • collect-game-data-for-hf.ts - Collect data (memory-safe batching)
  • upload-to-huggingface.ts - Upload to HuggingFace
  • verify-hf-upload.ts - Verify upload
  • run-offline-simulation.ts - Offline simulator
  • prepare-real-dataset-for-hf.ts - Verify real data before upload

Architecture

```
Gameplay → TrajectoryRecorder → Database
                  ↓
GitHub Actions (Daily 2 AM UTC)
  1. Collect (memory-safe batching)
  2. Organize by month
  3. Upload to HuggingFace
                  ↓
elizaos/babylon-game-data (PUBLIC)
                  ↓
Download & Use Offline
```

Why GitHub Actions (Not Vercel)

| Feature      | Vercel CRON       | GitHub Actions |
| ------------ | ----------------- | -------------- |
| Timeout      | 10 seconds ❌     | 60 minutes ✅  |
| Memory       | Limited (~1GB) ❌ | 7GB ✅         |
| OOM Risk     | High ❌           | Protected ✅   |
| Cost         | Paid ❌           | Free ✅        |
| Dataset Size | Small only ❌     | Large ✅       |

Decision: GitHub Actions is perfect for dataset uploads!

Troubleshooting

"No data found"

```bash
# Generate test trajectories
npx tsx scripts/generate-test-trajectories.ts

# Or collect from database
npm run hf:collect
```

"Schema conflict"

The upload script separates files by type to avoid schema conflicts:

  • Root: JSONL files (consistent schemas)
  • Subdirectory: Monthly aggregates

"Upload fails"

Check:

  • Valid HuggingFace token with "Write" permission
  • Token in GitHub secrets or environment
  • Network connectivity

Next Steps

  1. View Dataset: Visit elizaos/babylon-game-data 
  2. Download: huggingface-cli download elizaos/babylon-game-data
  3. Use Offline: npm run hf:offline -- --month=2025-11
  4. Train: Use dataset for RL training

Status: Production-ready, updating daily ✅
