BrowseComp
L2
OOD — General Tools
TaskSet
Web browsing comprehension benchmark requiring agents to navigate, extract, and reason over web content to answer complex questions.
Tool Pool
Search / Browse
Toolset
Uniform
Protocol
OpenAI Function Calling
Environment
Stateless
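To make the configuration above concrete, here is a minimal sketch of what a Search / Browse tool pool might look like when exposed through the OpenAI function-calling format in a stateless environment. The tool names `search` and `browse`, their `query` and `url` parameters, and the stub handlers are assumptions for illustration only; the benchmark's actual tool definitions are not given here.

```python
import json

# Hypothetical tool definitions in the OpenAI function-calling "tools" format.
# The names and parameters are illustrative assumptions, not the benchmark's own.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Run a web search and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "browse",
            "description": "Fetch a page and return its text content.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
]


def search(query):
    # Stub handler: a real backend would query a search engine.
    return {"results": [f"stub result for {query!r}"]}


def browse(url):
    # Stub handler: a real backend would fetch and clean the page.
    return {"text": f"stub page text for {url}"}


HANDLERS = {"search": search, "browse": browse}


def dispatch(tool_call):
    """Execute one tool call.

    Stateless: the result depends only on the call's own arguments,
    and nothing is carried over between calls.
    """
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    return json.dumps(HANDLERS[fn["name"]](**args))
```

In this sketch, an agent loop would pass `TOOLS` to the model, receive tool calls whose `function.arguments` field is a JSON string, and feed each `dispatch` result back as a tool message. Because the handlers keep no state, repeating a call with the same arguments yields the same result.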
Performance (Success Rate %)
Base, Best 8B Baseline, DIVE (SFT), DIVE (RL)