# HuggingFace Integration

Complete dataset-publishing system for Babylon game data and agent trajectories.
## Overview

**Dataset:** elizaos/babylon-game-data
**Updated:** daily at 2 AM UTC via GitHub Actions

**Contains:**

- Agent trajectories (complete gameplay with decisions + environment)
- Benchmark scenarios (game simulations with ground truth)
- Model performance results
- Data organized by month for easy access
## Quick Start

### Download Dataset
```python
from datasets import load_dataset

# Load the complete dataset
dataset = load_dataset("elizaos/babylon-game-data")

# Access trajectories
trajectories = dataset['train']
```

### Run Offline Simulation
```bash
# Download the dataset
huggingface-cli download elizaos/babylon-game-data

# Run a faster-than-real-time simulation
npx tsx scripts/run-offline-simulation.ts --month=2025-11 --agent=my-agent
```

Speed: 100-1000x faster than real time ⚡
## Dataset Structure
```
elizaos/babylon-game-data/
├── README.md                  - Dataset documentation
├── index.json                 - Metadata
├── summary.json               - Statistics
├── trajectories.jsonl         - All agent trajectories (up to 1,000)
├── benchmarks-metadata.json   - Benchmark file info
└── monthly-data/
    ├── 2025-10.json           - October: worlds + trajectories + benchmarks
    ├── 2025-11.json           - November: worlds + trajectories + benchmarks
    └── YYYY-MM.json           - Monthly aggregated data
```

### Trajectory Format
Each trajectory includes:
```json
{
  "trajectoryId": "...",
  "agentId": "...",
  "month": "2025-11",
  "scenario": "trading-scenario",
  "steps": [
    {
      "stepNumber": 1,
      "environmentState": {
        "agentBalance": 10000,
        "agentPnL": 0,
        "openPositions": 0,
        "activeMarkets": 15
      },
      "observation": {
        "markets": [...],
        "prices": {...},
        "feed": [...]
      },
      "llm_calls": [
        {
          "model": "babylon-agent-v1",
          "user_prompt": "Analyze market...",
          "response": "I should buy...",
          "reasoning": "Based on momentum..."
        }
      ],
      "action": {
        "type": "BUY_SHARES",
        "parameters": { "marketId": "...", "amount": 100 },
        "success": true,
        "result": { "pnl": 50 }
      },
      "reward": 50
    }
  ],
  "totalReward": 1500,
  "finalPnL": 1500,
  "metrics": {
    "tradesExecuted": 15,
    "postsCreated": 5
  }
}
```
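As a sanity check when consuming this format, here is a minimal sketch that parses one trajectory (e.g. a line from `trajectories.jsonl`) and verifies that the per-step rewards sum to `totalReward`. The helper name `summarize_trajectory` is illustrative, not part of the actual tooling; the field names follow the format shown above.

```python
import json

def summarize_trajectory(raw: str) -> dict:
    """Parse one trajectory JSON string and summarize its steps.

    Field names ("steps", "reward", "totalReward") follow the
    trajectory format documented above.
    """
    traj = json.loads(raw)
    step_rewards = [step.get("reward", 0) for step in traj.get("steps", [])]
    return {
        "trajectoryId": traj.get("trajectoryId"),
        "steps": len(step_rewards),
        "rewardSum": sum(step_rewards),
        "matchesTotal": sum(step_rewards) == traj.get("totalReward"),
    }

# Tiny synthetic example in the same shape
example = json.dumps({
    "trajectoryId": "t-1",
    "steps": [{"stepNumber": 1, "reward": 50}, {"stepNumber": 2, "reward": -10}],
    "totalReward": 40,
})
print(summarize_trajectory(example))
```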
## Automation

### GitHub Actions (Automatic)
**Workflow:** `.github/workflows/daily-dataset-upload.yml`
**Schedule:** daily at 2 AM UTC (`0 2 * * *`)

**Process:**

1. Collect game data from the database (memory-safe batching)
2. Organize by month
3. Upload to HuggingFace
4. Verify the upload
5. Save artifacts
**Setup:** add these GitHub secrets:

```
HUGGING_FACE_TOKEN = hf_your_token_here
DATABASE_URL = your_postgres_connection
```

Then push to GitHub; the workflow runs automatically.
### Manual Upload
```bash
# Collect data
npm run hf:collect

# Upload
npm run hf:upload

# Verify
npm run hf:verify
```

## Offline Simulation
Download the dataset and run simulations locally (no API calls needed):
```bash
# Download
huggingface-cli download elizaos/babylon-game-data

# Run a simulation (100-1000x faster than real time)
npm run hf:offline -- --month=2025-11 --agent=my-agent

# Or with a specific data file
npx tsx scripts/run-offline-simulation.ts \
  --data=monthly-data/2025-11.json \
  --agent=my-agent \
  --fast-forward
```

Features:
- No network required
- Deterministic replay
- Perfect for testing
- Faster-than-real-time mode
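The deterministic-replay idea above can be sketched as follows: a hypothetical replayer that steps through recorded observations and replays the logged actions, so no LLM or network call is ever needed and the same trajectory always yields the same result. The function is illustrative, not the actual simulator; field names follow the trajectory format documented earlier.

```python
from typing import Iterator

def replay_steps(trajectory: dict) -> Iterator[tuple[dict, dict, float]]:
    """Yield (observation, action, reward) tuples from a recorded trajectory.

    Every observation and action was logged during live play, so replay
    is offline and deterministic: no API calls, identical output each run.
    """
    for step in trajectory.get("steps", []):
        yield step.get("observation", {}), step.get("action", {}), step.get("reward", 0)

# Replay a tiny recorded trajectory and accumulate reward
trajectory = {
    "steps": [
        {"observation": {"prices": {"m1": 0.4}}, "action": {"type": "BUY_SHARES"}, "reward": 50},
        {"observation": {"prices": {"m1": 0.6}}, "action": {"type": "SELL_SHARES"}, "reward": 20},
    ]
}
total = sum(reward for _, _, reward in replay_steps(trajectory))
print(total)  # 70
```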
## Memory Safety

All scripts use memory-safe collection:

- **Batch processing:** 100 items at a time
- **Hard limits:** max 1,000 trajectories, 500 benchmarks
- **Streaming:** JSONL format (line-by-line processing)
- **Per-month files:** separate files, no giant JSON
- **Forced GC:** release memory between batches

**Result:** no OOM crashes ✅
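The batching-plus-hard-limit pattern described above can be sketched as follows. This is an illustrative Python rendering of the idea (the actual collectors are TypeScript); `BATCH_SIZE`, `MAX_ITEMS`, and the function names are assumptions chosen to match the limits stated in the list.

```python
import json
import os
import tempfile
from typing import Iterable, Iterator

BATCH_SIZE = 100   # items processed per batch
MAX_ITEMS = 1000   # hard cap, matching the trajectory limit above

def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Group an iterable into lists of at most `size` items."""
    batch: list[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def collect_to_jsonl(source: Iterable[dict], path: str) -> int:
    """Stream items to a JSONL file in batches, stopping at the hard limit.

    Writing line by line means only one batch is ever held in memory.
    """
    written = 0
    with open(path, "w") as f:
        for batch in batched(source, BATCH_SIZE):
            for item in batch:
                if written >= MAX_ITEMS:
                    return written
                f.write(json.dumps(item) + "\n")
                written += 1
    return written

# 1,500 synthetic items: the hard cap stops collection at 1,000
tmp = os.path.join(tempfile.gettempdir(), "hf-sample.jsonl")
count = collect_to_jsonl(({"i": i} for i in range(1500)), tmp)
print(count)  # 1000
```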
## Data Quality

### What's Included
**Agent Trajectories:**
- ✅ Complete decision sequence
- ✅ LLM calls (prompts + responses)
- ✅ Environment state at each step
- ✅ Actions taken
- ✅ Outcomes and rewards
- ✅ Ground truth (for training)
**Benchmark Data:**
- ✅ Complete game scenarios
- ✅ Ground truth outcomes
- ✅ Optimal actions
- ✅ Market dynamics
- ✅ Tick-by-tick progression
**Game Worlds:**
- ✅ Prediction market questions
- ✅ Events and timelines
- ✅ NPC interactions
- ✅ Feed posts
- ✅ Outcomes
## npm Commands
```bash
npm run hf:collect    # Collect game data (memory-safe)
npm run hf:upload     # Upload to HuggingFace
npm run hf:verify     # Verify upload succeeded
npm run hf:offline    # Run offline simulation
npm run hf:test-flow  # Test the complete flow
```

## Training Pipeline Integration
The HuggingFace dataset integrates with the Python training pipeline:
```bash
# Option 1: Train from the database (default)
python src/training/babylon_trainer.py
```

```python
# Option 2: Train from HuggingFace (for reproducibility)
from datasets import load_dataset

dataset = load_dataset("elizaos/babylon-game-data")
trajectories = dataset['train']
# Use for training
```
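For RL training, each trajectory step can be flattened into (state, action, reward) transitions, the shape most training loops consume. A minimal sketch, assuming the trajectory format documented above; the helper name `to_transitions` is illustrative, not part of the actual trainer.

```python
def to_transitions(trajectory: dict) -> list[tuple[dict, str, float]]:
    """Flatten a trajectory into (environment_state, action_type, reward)
    tuples, using the field names from the trajectory format above."""
    return [
        (
            step.get("environmentState", {}),
            step.get("action", {}).get("type", "NOOP"),
            step.get("reward", 0),
        )
        for step in trajectory.get("steps", [])
    ]

# One-step example in the documented shape
trajectory = {
    "steps": [
        {
            "environmentState": {"agentBalance": 10000},
            "action": {"type": "BUY_SHARES"},
            "reward": 50,
        }
    ]
}
print(to_transitions(trajectory))
```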
## Scripts

- `collect-game-data-for-hf.ts` - Collect data (memory-safe batching)
- `upload-to-huggingface.ts` - Upload to HuggingFace
- `verify-hf-upload.ts` - Verify the upload
- `run-offline-simulation.ts` - Offline simulator
- `prepare-real-dataset-for-hf.ts` - Verify real data before upload
## Architecture
```
Gameplay → TrajectoryRecorder → Database
                 ↓
  GitHub Actions (daily, 2 AM UTC)
                 ↓
  Collect (memory-safe batching)
                 ↓
        Organize by month
                 ↓
      Upload to HuggingFace
                 ↓
elizaos/babylon-game-data (PUBLIC)
                 ↓
     Download & use offline
```

### Why GitHub Actions (Not Vercel)
| Feature | Vercel CRON | GitHub Actions |
|---|---|---|
| Timeout | 10 seconds ❌ | 60 minutes ✅ |
| Memory | Limited (~1GB) ❌ | 7GB ✅ |
| OOM Risk | High ❌ | Protected ✅ |
| Cost | Paid ❌ | Free ✅ |
| Dataset Size | Small only ❌ | Large ✅ |
**Decision:** GitHub Actions is the better fit for dataset uploads.
## Troubleshooting

### "No data found"
```bash
# Generate test trajectories
npx tsx scripts/generate-test-trajectories.ts

# Or collect from the database
npm run hf:collect
```

### "Schema conflict"
The upload script separates files by type to avoid schema conflicts:
- Root: JSONL files (consistent schemas)
- Subdirectory: Monthly aggregates
### "Upload fails"
Check:
- Valid HuggingFace token with “Write” permission
- Token in GitHub secrets or environment
- Network connectivity
## Resources
- Live Dataset: elizaos/babylon-game-data
- GitHub Workflow: `.github/workflows/daily-dataset-upload.yml`
- Python Training: Python RL Training
- Trajectory Logging: Trajectory Logging
## Next Steps

1. View the dataset: visit elizaos/babylon-game-data
2. Download it: `huggingface-cli download elizaos/babylon-game-data`
3. Use it offline: `npm run hf:offline -- --month=2025-11`
4. Train: use the dataset for RL training
**Status:** production-ready, updating daily ✅