FinSearchComp (Global)

L2 OOD — General Tools TaskSet

A financial deep-research competition in which an agent must combine search, browsing, and code execution to answer complex financial questions. Models are evaluated on the Tier-2 and Tier-3 difficulty levels.

Tool Pool: Search / Browse / Code Execution
Toolset: Uniform
Protocol: OpenAI Function Calling
Environment: Stateless
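
To make the setup above concrete, here is a minimal sketch of how three tools could be declared in the OpenAI function-calling format and dispatched without any state carried between calls. The tool names (`search`, `browse`, `execute_code`), their parameters, and the `make_tool` and `dispatch` helpers are hypothetical and chosen for illustration; the benchmark's actual tool definitions are not given here. It also assumes that "Uniform" means every task is offered the same tool list.

```python
import json


def make_tool(name, description, properties, required):
    """Build one tool entry in the OpenAI function-calling schema."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Hypothetical tool pool: the same list would be passed for every task.
TOOLS = [
    make_tool("search", "Run a web search and return result snippets.",
              {"query": {"type": "string"}}, ["query"]),
    make_tool("browse", "Fetch a page and return its text.",
              {"url": {"type": "string"}}, ["url"]),
    make_tool("execute_code", "Run a Python snippet and return its output.",
              {"code": {"type": "string"}}, ["code"]),
]


def dispatch(tool_call, handlers):
    """Resolve one tool call from its own name and arguments only.

    Nothing is remembered between calls, which is what a stateless
    environment implies for the dispatcher.
    """
    fn = tool_call["function"]
    args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
    return handlers[fn["name"]](**args)
```

A model response would carry tool calls shaped like `{"function": {"name": "search", "arguments": "{\"query\": \"...\"}"}}`; each one is passed to `dispatch` along with a dictionary mapping tool names to handler functions.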

Performance (Success Rate %)

Qwen3-8B (base): 17.9
EnvScaler-8B (best 8B baseline): 25.8
DIVE-8B (SFT): 47.6
DIVE-8B (RL): 52.3