
Finance Agent Benchmark

L3 OOD — Specialized Tools TaskPoolSet

Financial specialist tasks that use EDGAR filing retrieval, web parsing, and domain-specific financial APIs. The tasks require understanding financial documents and invoking specialized tools in patterns not seen during training.

Tool Pool: EDGAR / Web / Parse / Retrieve
Toolset: Uniform
Protocol: OpenAI Function Calling
Environment: Stateless
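
To make the protocol and environment settings above concrete, here is a minimal sketch of what a tool definition and a stateless dispatcher could look like under OpenAI-style function calling. The tool name `edgar_retrieve_filing`, its parameters, and the stub handler are hypothetical and are not taken from the benchmark; only the outer shape (`type`/`function`/`parameters`, with arguments passed as a JSON string) follows the function-calling convention.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling "tools" shape.
# The name and parameters are illustrative assumptions, not the benchmark's.
EDGAR_TOOL = {
    "type": "function",
    "function": {
        "name": "edgar_retrieve_filing",
        "description": "Retrieve a company filing from EDGAR (illustrative stub).",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string"},
                "form_type": {"type": "string", "enum": ["10-K", "10-Q", "8-K"]},
                "fiscal_year": {"type": "integer"},
            },
            "required": ["ticker", "form_type"],
        },
    },
}


def edgar_retrieve_filing(ticker, form_type, fiscal_year=None):
    """Stub handler: a real tool would query EDGAR; this returns a placeholder."""
    return {
        "ticker": ticker.upper(),
        "form_type": form_type,
        "fiscal_year": fiscal_year,
        "text": "<filing text placeholder>",
    }


HANDLERS = {"edgar_retrieve_filing": edgar_retrieve_filing}


def dispatch(tool_call):
    """Route one tool call to its handler.

    Stateless: the result depends only on the call itself, and nothing is
    stored between calls.
    """
    fn = tool_call["function"]
    name = fn["name"]
    if name not in HANDLERS:
        return {"error": f"unknown tool: {name}"}
    try:
        # Function-calling arguments arrive as a JSON-encoded string.
        args = json.loads(fn["arguments"])
    except json.JSONDecodeError as exc:
        return {"error": f"invalid JSON arguments: {exc}"}

    schema = EDGAR_TOOL["function"]["parameters"]
    missing = [k for k in schema["required"] if k not in args]
    unknown = [k for k in args if k not in schema["properties"]]
    if missing or unknown:
        return {"error": f"missing={missing} unknown={unknown}"}
    return HANDLERS[name](**args)


example_call = {
    "id": "call_1",
    "type": "function",
    "function": {
        "name": "edgar_retrieve_filing",
        "arguments": json.dumps({"ticker": "aapl", "form_type": "10-K"}),
    },
}
print(dispatch(example_call))
```

Returning errors as data rather than raising keeps the dispatcher usable inside an agent loop, where a malformed call is fed back to the model as a tool result.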
Performance (Success Rate %)
Qwen3-8B (base): 2.0
EnvScaler-8B: 14.0
DIVE-8B (SFT): 28.0
DIVE-8B (RL): 34.0
Chart legend: Base, Best 8B Baseline, DIVE (SFT), DIVE (RL)
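
As a reading aid for the success rates reported above, the following short sketch computes the differences between models in percentage points. The scores are copied from this page; the helper `point_gain` is a hypothetical name introduced only for illustration.

```python
# Success rates (%) as reported on this page.
scores = {
    "Qwen3-8B (base)": 2.0,
    "EnvScaler-8B": 14.0,
    "DIVE-8B (SFT)": 28.0,
    "DIVE-8B (RL)": 34.0,
}


def point_gain(model, reference):
    """Absolute difference in success rate, in percentage points."""
    return scores[model] - scores[reference]


comparisons = [
    ("DIVE-8B (SFT)", "EnvScaler-8B"),
    ("DIVE-8B (RL)", "DIVE-8B (SFT)"),
    ("DIVE-8B (RL)", "EnvScaler-8B"),
    ("DIVE-8B (RL)", "Qwen3-8B (base)"),
]
for model, reference in comparisons:
    print(f"{model} vs {reference}: {point_gain(model, reference):+.1f} points")
```

Under these numbers, DIVE-8B (RL) is 20.0 points above EnvScaler-8B and 32.0 points above the base model, with the RL stage adding 6.0 points over SFT.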