InfiniteWeb

Scalable Web Environment Synthesis for GUI Agent Training

Ziyun Zhang1*   Zezhou Wang2*   Xiaoyi Zhang3†   Zongyu Guo3   Jiahao Li3   Bin Li3   Yan Lu3
1Peking University   2Nanjing University   3Microsoft Research Asia
*Equal contribution and work done during the internship at Microsoft Research Asia.  †Project lead.

Generate self-contained web environments with dense evaluators for scalable GUI agent training.

Examples of Generated Websites

All websites are fully functional, self-contained, and resettable — no server required.

Overview

InfiniteWeb automatically synthesizes fully functional web environments for GUI agent training. Given a simple website seed (a website type description and a design image), it generates multi-page websites with realistic data, business logic, user tasks, and dense reward evaluators — all without any manual effort.

End-to-End Automation

From a simple seed to a fully functional website, no manual coding needed.

TDD Validation & Auto-Fix

Auto-generated tests with iterative fix loops ensure functional correctness.

Dense Reward Evaluators

Checkpoint-based 0.0–1.0 scoring with anti-reward-hacking design for RL training.

OSWorld Compatible

Fully compatible with OSWorld for deployment and evaluation.

Self-Contained

All data in localStorage, no external database or server required.

Scalable Batch Generation

Generate hundreds of diverse websites in parallel. $1.93 per website.

Why InfiniteWeb?

Existing approaches to GUI agent training rely on fixed benchmarks (limited task diversity) or static datasets (no interactive exploration). InfiniteWeb introduces a fundamentally different paradigm: on-demand environment synthesis that produces resettable, RL-ready training environments at scale.

|                 | Benchmarks (WebArena, OSWorld) | Datasets (Mind2Web, AITW)    | InfiniteWeb                     |
|-----------------|--------------------------------|------------------------------|---------------------------------|
| Primary Value   | Standardized evaluation        | Offline trajectories         | On-demand environment synthesis |
| Scalability     | Limited by fixed sites/tasks   | Limited by collection effort | Generate more sites & tasks     |
| RL Friendliness | Often sparse rewards           | Offline only                 | Dense evaluators + resettable   |
| Reproducibility | Live sites may change          | Fixed logs                   | Self-contained artifacts        |
| Cost            | High manual effort             | Crowdsourcing costs          | $1.93 per website               |

How It Works

A four-stage pipeline from a simple seed to a complete, training-ready web environment.

[Figure: InfiniteWeb architecture pipeline]

1. Unified Specification

From a website seed (type + design image), the system generates tasks, extracts data models, designs interfaces, and creates a unified architecture specification that guides all subsequent stages.

2. Task-Centric TDD Backend

Generates realistic data, implements business logic in JavaScript, and validates everything through auto-generated tests with an iterative fix loop (up to 8 iterations) ensuring functional correctness.
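
The fix loop described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the InfiniteWeb implementation: `run_tests` and `propose_fix` are hypothetical callables standing in for the test runner and the model-driven repair step, and only the cap of 8 iterations comes from the text.

```python
from typing import Callable, List, Tuple

MAX_FIX_ITERATIONS = 8  # cap stated in the text above


def tdd_fix_loop(
    code: str,
    run_tests: Callable[[str], List[str]],         # hypothetical: returns names of failing tests
    propose_fix: Callable[[str, List[str]], str],  # hypothetical: returns revised code
    max_iterations: int = MAX_FIX_ITERATIONS,
) -> Tuple[str, bool, int]:
    """Run tests, repair on failure, and stop when tests pass or the cap is hit.

    Returns (final_code, all_tests_passed, fix_rounds_used).
    """
    failures = run_tests(code)
    rounds = 0
    while failures and rounds < max_iterations:
        code = propose_fix(code, failures)
        rounds += 1
        failures = run_tests(code)
    return code, not failures, rounds


if __name__ == "__main__":
    # Toy stand-ins: the "code" passes once it mentions both features.
    def toy_tests(code: str) -> List[str]:
        return [name for name in ("cart", "checkout") if name not in code]

    def toy_fix(code: str, failures: List[str]) -> str:
        return code + " " + failures[0]  # repair one failing test per round

    print(tdd_fix_loop("", toy_tests, toy_fix))
```

Returning the pass flag alongside the code lets a caller discard or flag sites that still fail after the final round instead of silently shipping them.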

3. Design-Guided Frontend

Analyzes the reference design image, creates layout structures, and generates HTML/CSS pages that match the visual style — running in parallel with the backend stage.

4. Evaluator Generation

Creates dense reward evaluators with checkpoint-based scoring (0.0–1.0), anti-reward-hacking design, and full instrumentation for RL training.

Output Artifact

Each generated website is a self-contained package that runs entirely in the browser. All data is stored in localStorage, and all backend logic runs in business_logic.js — no external database or server is needed.

File Structure

results/generated/
├── index.html                 # Main entry point
├── [page].html/css            # Individual pages
├── business_logic.js          # Backend SDK
├── website_data.json          # Generated data
├── rewritten_tasks.json       # Task definitions
├── evaluators.json            # Reward evaluators
├── test_flows.js              # Automated tests
└── test_results.json          # TDD results
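
A quick way to sanity-check a generated package is to confirm that the files in the listing above are present. The helper below is an illustrative sketch: `missing_artifacts` is a hypothetical name, and the per-page HTML/CSS files are skipped because their names vary by site.

```python
from pathlib import Path
from typing import List

# File names taken from the listing above; per-page files are not checked.
EXPECTED_FILES = [
    "index.html",
    "business_logic.js",
    "website_data.json",
    "rewritten_tasks.json",
    "evaluators.json",
    "test_flows.js",
    "test_results.json",
]


def missing_artifacts(site_dir: str) -> List[str]:
    """Return the expected file names that are absent from site_dir."""
    root = Path(site_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # An empty list means every expected file was found.
    print(missing_artifacts("results/generated"))
```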

Dense Reward Evaluators

  • Checkpoint-based scoring: Each task has multiple checkpoints with weighted 0.0–1.0 scores
  • Anti-reward-hacking: Evaluators check actual state changes, not just UI clicks
  • Resettable: Clear localStorage and reload for a fresh environment
  • OSWorld-ready: Direct integration with OSWorld evaluation harness
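
To make the checkpoint idea concrete, here is a minimal sketch of weighted scoring over final state. The `Checkpoint` shape, the `score` function, and the bookstore keys are assumptions for illustration; the actual schema of evaluators.json is not shown on this page. The `state` dict stands in for the site's persisted data (for example, parsed localStorage contents).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

State = Dict[str, Any]  # stand-in for the site's persisted state


@dataclass
class Checkpoint:
    name: str
    weight: float
    check: Callable[[State], bool]  # inspects state, never the click history


def score(checkpoints: List[Checkpoint], state: State) -> float:
    """Weighted fraction of checkpoints satisfied by the state, in [0.0, 1.0]."""
    total = sum(c.weight for c in checkpoints)
    if total <= 0:
        return 0.0
    earned = sum(c.weight for c in checkpoints if c.check(state))
    return earned / total


if __name__ == "__main__":
    # Hypothetical task: add a specific book to the cart, then place an order.
    checkpoints = [
        Checkpoint("book_in_cart", 1.0, lambda s: "book-42" in s.get("cart", [])),
        Checkpoint("order_placed", 3.0, lambda s: len(s.get("orders", [])) > 0),
    ]
    partial_state = {"cart": ["book-42"], "orders": []}
    print(score(checkpoints, partial_state))
```

Because each check reads state rather than UI events, an agent that clicks the right buttons without changing the underlying data earns no credit, while partial progress still yields a nonzero reward.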

OSWorld Compatibility

InfiniteWeb is fully compatible with OSWorld's infrastructure. All generated websites and tasks can be directly deployed in OSWorld for agent testing and training.

1. Generate websites: run InfiniteWeb to synthesize web environments.
2. Generate task JSONs: convert to OSWorld-compatible format with generate_task_jsons.py.
3. Deploy to the OSWorld VM: copy the files and load them via the file:// protocol.
4. Run evaluation: use OSWorld's evaluation harness with InfiniteWeb's dense reward evaluators.
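
Because the sites load over file://, the entry URL depends only on where the files were copied. The sketch below builds that URL with the standard library; `site_entry_url` is a hypothetical helper and the directory is a placeholder, not a path that InfiniteWeb or OSWorld prescribes.

```python
from pathlib import Path


def site_entry_url(site_dir: str) -> str:
    """Return a file:// URL for the site's index.html.

    The directory is resolved to an absolute path first, because
    Path.as_uri() raises ValueError for relative paths.
    """
    index = Path(site_dir).resolve() / "index.html"
    return index.as_uri()


if __name__ == "__main__":
    # Placeholder directory; run this on the machine that holds the copied files.
    print(site_entry_url("results/generated"))
```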

Results

GUI agents trained on InfiniteWeb-generated environments improve by 3.9 to 6.9 percentage points across web, desktop, and mobile benchmarks, indicating that training on synthesized websites transfers across domains.

Training Performance (UI-TARS-1.5-7B + GRPO)

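| Benchmark             | Baseline | + InfiniteWeb (600 tasks) | Improvement |
|-----------------------|----------|---------------------------|-------------|
| OSWorld (Desktop)     | 24.5%    | 31.4%                     | +6.9pp      |
| Online-Mind2Web (Web) | 23.0%    | 28.7%                     | +5.7pp      |
| MobileWorld (Mobile)  | 6.4%     | 10.3%                     | +3.9pp      |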
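
The improvement column is an absolute difference in percentage points. The short sketch below recomputes it from the table and also derives the relative gain over baseline; the relative figures are computed here for illustration and are not reported on this page.

```python
from typing import Dict, Tuple

# (baseline %, after training %) copied from the table above.
RESULTS: Dict[str, Tuple[float, float]] = {
    "OSWorld": (24.5, 31.4),
    "Online-Mind2Web": (23.0, 28.7),
    "MobileWorld": (6.4, 10.3),
}


def gains(baseline: float, trained: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    delta = trained - baseline
    return round(delta, 1), round(100 * delta / baseline, 1)


if __name__ == "__main__":
    for name, (base, trained) in RESULTS.items():
        pp, rel = gains(base, trained)
        print(f"{name}: +{pp}pp absolute, +{rel}% relative to baseline")
```

The two views rank the benchmarks differently: MobileWorld has the smallest absolute gain but the largest relative one, because its baseline is the lowest.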

Data Quality Matters: InfiniteWeb vs. Claude Code

Same 600-task GRPO training setup. InfiniteWeb's specification-first approach produces higher-quality environments.

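| Benchmark       | Baseline | Claude Code (600) | InfiniteWeb (600) |
|-----------------|----------|-------------------|-------------------|
| Online-Mind2Web | 23.0%    | 23.7% (+0.7)      | 28.7% (+5.7)      |
| OSWorld         | 24.5%    | 25.2% (+0.7)      | 31.4% (+6.9)      |
| MobileWorld     | 6.4%     | 5.1% (-1.3)       | 10.3% (+3.9)      |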

  • 85.6% functional correctness (vs. Codex 81.2%, Claude Code 74.3%)
  • $1.93 per website: orders of magnitude lower than manual
  • ~125 min for 600 websites: embarrassingly parallel generation
  • 6.9% of training time: generation overhead is minimal
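
As a back-of-the-envelope check, the figures above can be combined as follows. This sketch assumes the $1.93 per-website cost applies uniformly to the 600-website batch, which the page does not state explicitly; the function names are illustrative.

```python
COST_PER_SITE_USD = 1.93  # per-website cost quoted above
NUM_SITES = 600           # batch size quoted above
WALL_CLOCK_MIN = 125      # approximate wall-clock time quoted above


def batch_cost_usd(num_sites: int, cost_per_site: float = COST_PER_SITE_USD) -> float:
    """Total generation cost, assuming a uniform per-site cost."""
    return round(num_sites * cost_per_site, 2)


def amortized_seconds_per_site(num_sites: int, wall_clock_minutes: float) -> float:
    """Wall-clock seconds per site, averaged over a parallel batch."""
    return wall_clock_minutes * 60 / num_sites


if __name__ == "__main__":
    print(batch_cost_usd(NUM_SITES))
    print(amortized_seconds_per_site(NUM_SITES, WALL_CLOCK_MIN))
```

The amortized time reflects batch throughput under parallel generation, not how long a single website takes to generate.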

Getting Started

1. Install

conda create -n infiniteweb python=3.10 nodejs -y
conda activate infiniteweb
pip install -r requirements.txt

2. Configure

cp config/config_template.json config/my_config.json
# Edit my_config.json with your Azure OpenAI credentials

3. Generate a Website

python src/tdd_generator.py --config config/my_config.json \
    --website-type "online_bookstore_website" \
    --design-image "resource/example.jpg"
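
To script the same command from Python (for instance, to try several website types in turn), one option is to build the argument list programmatically. This sketch uses only the flags shown above; `build_generate_command` is a hypothetical helper, not part of the repository.

```python
import sys
from typing import List


def build_generate_command(config: str, website_type: str, design_image: str) -> List[str]:
    """Build the argv list for src/tdd_generator.py using the flags shown above."""
    return [
        sys.executable, "src/tdd_generator.py",
        "--config", config,
        "--website-type", website_type,
        "--design-image", design_image,
    ]


if __name__ == "__main__":
    cmd = build_generate_command(
        "config/my_config.json",
        "online_bookstore_website",
        "resource/example.jpg",
    )
    # Pass cmd to subprocess.run(cmd, check=True) from the repository root to execute it.
    print(" ".join(cmd))
```

Passing arguments as a list rather than a shell string avoids quoting problems if a website type or path contains spaces.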

4. Batch Generation

python src/batch_generate.py \
    --config config/my_config.json \
    --websites-config config/website_seeds_template.json \
    --concurrent 3

For detailed instructions, see the full documentation in our paper.

Citation

If you find InfiniteWeb useful for your research, please cite our paper:

@misc{zhang2026infinitewebscalablewebenvironment,
      title={InfiniteWeb: Scalable Web Environment Synthesis
             for GUI Agent Training},
      author={Ziyun Zhang and Zezhou Wang and Xiaoyi Zhang
              and Zongyu Guo and Jiahao Li and Bin Li
              and Yan Lu},
      year={2026},
      eprint={2601.04126},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.04126},
}