Deep Research Agents
Pokee builds deep research agents that think, search, and synthesize like a human analyst. Choose our flagship hosted product, or run our open-source 7B agent yourself.
Hosted agent
Built for teams that need fast, accurate research without the OpenAI price tag.
| Metric | OpenAI Deep Research | PokeeResearch |
|---|---|---|
| Cost per query | 1× (baseline) | 4× cheaper |
| Throughput | 1× (baseline) | 5× higher |
“Deep research is so easy to use, fast and reliable. We are using it in production today.”
“This is way better than OpenAI and Gemini deep research plus way lower cost.”
“Pokee Deep Research actually outputs consulting grade reports directly while coming at a fraction of cost.”
Open-source
A state-of-the-art 7B-parameter deep research agent, released open-source for the community to study, fine-tune, and deploy.
Train on semantic correctness from a cheap LLM judge, not brittle string-match scores.
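To make the contrast concrete, here is a minimal sketch of a string-match reward next to a judge-based reward. The prompt template, the `ask_judge` callable, and `toy_judge` are all hypothetical; `toy_judge` is a deterministic stand-in for a real LLM call and is not how PokeeResearch's judge works.

```python
from typing import Callable

# Hypothetical prompt template; the real judge prompt is not given on this page.
JUDGE_TEMPLATE = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Model answer: {prediction}\n"
    "Is the model answer semantically equivalent to the reference? "
    "Reply CORRECT or INCORRECT."
)


def exact_match_reward(prediction: str, reference: str) -> float:
    """Brittle baseline: 1.0 only if the normalized strings are identical."""
    return float(prediction.strip().lower() == reference.strip().lower())


def judge_reward(question: str, prediction: str, reference: str,
                 ask_judge: Callable[[str], str]) -> float:
    """Reward 1.0 when the judge's reply starts with CORRECT, else 0.0."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, prediction=prediction
    )
    verdict = ask_judge(prompt).strip().upper()
    return 1.0 if verdict.startswith("CORRECT") else 0.0


def toy_judge(prompt: str) -> str:
    """Stand-in for an LLM judge (assumption): accepts any answer that
    contains the reference text. A real judge would be a model call."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines()[:3])
    reference = fields["Reference answer"].lower()
    answer = fields["Model answer"].lower()
    return "CORRECT" if reference in answer else "INCORRECT"


question = "Who wrote Hamlet?"
prediction = "It was William Shakespeare."
reference = "William Shakespeare"

strict = exact_match_reward(prediction, reference)                   # 0.0
lenient = judge_reward(question, prediction, reference, toy_judge)   # 1.0
```

The answer is right but phrased differently from the reference, so the string-match reward discards it while the judge-based reward credits it.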
A genuinely on-policy RL algorithm gives higher sample efficiency than the off-policy methods most agents use.
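This page does not name the algorithm. As one illustration of an on-policy setup, the sketch below samples several answers per prompt from the current policy and scores each against a leave-one-out baseline. The estimator choice and the function name are assumptions made for this example, not a description of Pokee's training code.

```python
def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each sample relative to the mean reward of the others.

    `rewards` holds the rewards of k answers sampled from the *current*
    policy for a single prompt. In an on-policy loop these samples are
    used for one update and then discarded.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two samples per prompt")
    total = sum(rewards)
    # Baseline for sample i is the average of the other k - 1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]


# Four sampled answers to one prompt: two judged correct, two incorrect.
advantages = leave_one_out_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, and the advantages for a prompt sum to zero.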
Pre-filter prompts by the initial policy's pass rate and train only on questions that actually teach.
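A minimal sketch of pass-rate filtering follows. It assumes that "questions that actually teach" means prompts the initial policy neither always solves nor always fails; the thresholds, sample count, and the fake policy are hypothetical and chosen only for illustration.

```python
from typing import Callable


def pass_rate(prompt: str, attempt: Callable[[str], bool], n_samples: int) -> float:
    """Fraction of n_samples attempts on `prompt` that were judged correct."""
    return sum(attempt(prompt) for _ in range(n_samples)) / n_samples


def filter_prompts(prompts: list[str], attempt: Callable[[str], bool],
                   n_samples: int = 8, low: float = 0.0, high: float = 1.0) -> list[str]:
    """Keep prompts whose pass rate lies strictly between `low` and `high`.

    With the (assumed) defaults this drops prompts the initial policy always
    solves or never solves.
    """
    return [p for p in prompts if low < pass_rate(p, attempt, n_samples) < high]


def make_fake_policy(outcomes: dict[str, list[bool]]) -> Callable[[str], bool]:
    """Deterministic stand-in for sampling the initial policy and judging it."""
    cursors = {prompt: iter(results) for prompt, results in outcomes.items()}
    return lambda prompt: next(cursors[prompt])


attempt = make_fake_policy({
    "too easy": [True, True, True, True],
    "too hard": [False, False, False, False],
    "useful": [True, False, False, True],
})
kept = filter_prompts(["too easy", "too hard", "useful"], attempt, n_samples=4)
# kept == ["useful"]
```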
At inference, recover from malformed tool calls instead of throwing away the episode.
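One way such recovery could look is sketched below: when a tool call fails to parse, the error is appended to the conversation and the model gets another attempt instead of the episode being dropped. The JSON call format, the `generate` callable, the retry limit, and the error message wording are all assumptions for this example.

```python
import json
from typing import Callable


def parse_tool_call(text: str) -> dict:
    """Parse a tool call, assumed here to be a JSON object with
    'name' and 'arguments' keys. Raises ValueError if it is malformed."""
    call = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
        raise ValueError("tool call must be a JSON object with 'name' and 'arguments'")
    return call


def call_with_recovery(generate: Callable[[list[str]], str],
                       history: list[str], max_retries: int = 2) -> dict:
    """Ask the model for a tool call, feeding parse errors back on failure.

    `history` is mutated in place so the model can see its rejected output
    and the reason it was rejected.
    """
    for _ in range(max_retries + 1):
        raw = generate(history)
        try:
            return parse_tool_call(raw)
        except ValueError as err:
            history.append(raw)
            history.append(f"Tool call rejected: {err}. Please emit valid JSON.")
    raise RuntimeError("model did not produce a valid tool call")


# Fake model (assumption): first reply is truncated JSON, second is valid.
replies = iter([
    '{"name": "search", "arguments": {"query": "pokee"',
    '{"name": "search", "arguments": {"query": "pokee"}}',
])
history = ["user: find information about pokee"]
call = call_with_recovery(lambda _history: next(replies), history)
```

After the run, `call` holds the parsed second reply, and `history` has grown by the rejected output and the error message that prompted the retry.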
PokeeResearch-7B achieves the best average score across ten benchmarks among open-source 7B deep research agents and leads on 7 of the 10. Scores are evaluation reward × 100. Bold marks the best result in each column.
| Method | 2Wiki | TriviaQA | NQ | Bamboogle | PopQA | MuSiQue | HotpotQA | HLE | GAIA | BrowseComp | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R1-Searcher | 61.6 | 65.0 | 66.2 | 62.4 | 65.1 | 51.5 | 62.6 | 4.13 | 4.89 | 0.80 | 40.78 |
| Search-R1 | 78.4 | 74.2 | 79.2 | 75.3 | 77.2 | 61.0 | 72.8 | 11.10 | 18.69 | 0.60 | 50.87 |
| ZeroSearch | 17.6 | 31.4 | 30.0 | 53.9 | 39.7 | 11.4 | 13.8 | 6.96 | 8.37 | 0.40 | 18.76 |
| ASearcher | 84.4 | 84.6 | 87.2 | 74.4 | 81.9 | 64.9 | 84.8 | 11.40 | 16.91 | 2.61 | 57.57 |
| DeepResearcher | 85.40 | 79.80 | 89.60 | 78.31 | 81.05 | 62.78 | 79.80 | 10.22 | 20.63 | 2.20 | 56.64 |
| WebSailor | 88.8 | **92.8** | 97.6 | 86.8 | **87.9** | 69.0 | **92.8** | 12.8 | 34.0 | 5.6 | 66.8 |
| PokeeResearch-7B | **90.8** | 92.6 | **97.8** | **92.8** | 86.3 | **81.0** | 92.0 | **17.6** | **49.2** | **6.2** | **71.07** |
Evaluated on 1,176 questions across 10 benchmarks, 4 independent runs per question, judged by Gemini-2.5-Flash.
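For readers reproducing a per-benchmark score, the arithmetic can be sketched as follows. The page states only that scores are evaluation reward × 100 with 4 runs per question; averaging the runs per question and then averaging across questions is an assumption here, and the function name is hypothetical.

```python
from statistics import mean


def benchmark_score(rewards_per_question: list[list[float]]) -> float:
    """Mean judge reward over runs, then over questions, scaled by 100.

    Each inner list holds the rewards (1.0 = judged correct, 0.0 = judged
    incorrect) from the independent runs on one question.
    """
    per_question = [mean(runs) for runs in rewards_per_question]
    return 100 * mean(per_question)


# Two questions, four runs each: 3 of 4 correct, then 0 of 4 correct.
score = benchmark_score([[1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]])
# score == 37.5
```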
Try PokeeResearch in your stack today, or run the open-source 7B model in your own environment.