Xbench-DeepSearch
L2
OOD — General Tools
TaskSet
Cross-lingual deep search benchmark (DeepSearch subset) that requires multilingual information retrieval and synthesis across web sources.
Tool Pool
Search / Browse
Toolset
Uniform
Protocol
OpenAI Function Calling
Environment
Stateless
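As an illustration of the settings above, here is a minimal sketch of how a Search / Browse tool pool might be declared in the OpenAI function-calling format and served statelessly. The tool names (`search`, `browse`), their parameters, and the stubbed handlers are assumptions made for this example, not the benchmark's actual tool definitions.

```python
import json

# Hypothetical tool schemas in the OpenAI function-calling format.
# The names and parameters are assumptions, not taken from the benchmark.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search",
            "description": "Run a web search and return result snippets.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "browse",
            "description": "Fetch a page and return its text content.",
            "parameters": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
    },
]


def search(query: str) -> dict:
    # Stub: a real handler would call a search backend here.
    return {"query": query, "results": []}


def browse(url: str) -> dict:
    # Stub: a real handler would fetch and clean the page here.
    return {"url": url, "text": ""}


HANDLERS = {"search": search, "browse": browse}


def dispatch(tool_call: dict) -> dict:
    """Execute one tool call and return a tool message.

    Stateless: nothing is kept between calls, so the result depends
    only on the arguments carried by the call itself.
    """
    name = tool_call["function"]["name"]
    if name not in HANDLERS:
        result = {"error": f"unknown tool: {name}"}
    else:
        # Arguments arrive as a JSON-encoded string in this format.
        args = json.loads(tool_call["function"]["arguments"])
        result = HANDLERS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }
```

Under these assumptions, every task sees the same two schemas (a uniform toolset), and each tool message can be computed from the call alone, with no session or page state carried over between calls.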
Performance (Success Rate %)
Base
Best 8B Baseline
DIVE (SFT)
DIVE (RL)