Deep Research Agents
Research at the speed of thought.
Pokee builds deep research agents that think, search, and synthesize like a human analyst. Choose our flagship hosted product, or run our open-source 7B agent yourself.
PokeeResearch
Our hosted agent. Built for teams that need fast, accurate research without the OpenAI price tag.
PokeeResearch-7B
A state-of-the-art 7B-parameter deep research agent, released as open source for the community to study, fine-tune, and deploy.
Hosted agent
PokeeResearch
Built for teams that need fast, accurate research without the OpenAI price tag.
Pricing & speed vs. OpenAI Deep Research
| Metric | OpenAI Deep Research | PokeeResearch |
|---|---|---|
| Cost per query | 1× (baseline) | 4× cheaper |
| Throughput | 1× (baseline) | 5× higher |
What customers are saying
“Deep research is so easy to use, fast and reliable. We are using it in production today.”
“This is way better than OpenAI and Gemini deep research plus way lower cost.”
“Pokee Deep Research actually outputs consulting grade reports directly while coming at a fraction of cost.”
Open-source
PokeeResearch-7B
A state-of-the-art 7B-parameter deep research agent.
What makes it work
LLM-judge rewards
Train on semantic correctness from a cheap LLM judge, not brittle string-match scores.
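To illustrate the difference, here is a minimal sketch that contrasts a string-match reward with a judge-based reward. The function names and the `toy_judge` stand-in are hypothetical and not part of PokeeResearch; in practice the judge would be a call to an LLM, which is not shown here.

```python
from typing import Callable


def exact_match_reward(prediction: str, reference: str) -> float:
    """Brittle baseline: 1.0 only when the normalized strings are identical."""
    return float(prediction.strip().lower() == reference.strip().lower())


def judge_reward(
    question: str,
    prediction: str,
    reference: str,
    judge: Callable[[str, str, str], str],
) -> float:
    """Reward 1.0 when the judge's verdict starts with CORRECT, else 0.0."""
    verdict = judge(question, prediction, reference)
    return float(verdict.strip().upper().startswith("CORRECT"))


def toy_judge(question: str, prediction: str, reference: str) -> str:
    # Hypothetical stand-in for an LLM judge: it accepts any prediction that
    # contains the reference answer. A real judge would compare meaning.
    return "CORRECT" if reference.lower() in prediction.lower() else "INCORRECT"


question = "What is the capital of France?"
prediction = "The capital is Paris."
print(exact_match_reward(prediction, "Paris"))                   # string match fails
print(judge_reward(question, prediction, "Paris", toy_judge))    # judge accepts it
```

A full-sentence answer that contains the right fact scores zero under string matching but is accepted by the judge, which is the failure mode the judge-based reward is meant to avoid.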
True on-policy training
A genuinely on-policy RL algorithm gives higher sample efficiency than the off-policy methods most agents use.
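This page does not name the algorithm, so the following is only a sketch of one well-known on-policy estimator: a REINFORCE-style update with a leave-one-out baseline. It assumes several answers are sampled from the current policy for the same prompt and used for a single update. The function name is hypothetical.

```python
from typing import List


def leave_one_out_advantages(rewards: List[float]) -> List[float]:
    """Advantage of each sample = its reward minus the mean reward of the others.

    All rewards are assumed to come from completions sampled by the current
    policy for one prompt, which is what keeps the estimate on-policy.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("need at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 (correct) or 0.0 (incorrect).
advantages = leave_one_out_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct samples receive a positive advantage and incorrect ones a negative advantage, and the advantages sum to zero across the group.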
Difficulty-filtered data
Pre-filter prompts by the initial policy's pass rate, so training uses only the questions that actually teach.
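A minimal sketch of such a filter follows. The thresholds, function names, and data layout are assumptions for illustration; the page does not state which pass-rate range is kept. Here a prompt is dropped when the initial policy always solves it or never solves it.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of rollouts from the initial policy that were judged correct."""
    return sum(outcomes) / len(outcomes)


def filter_prompts(
    outcomes_by_prompt: Dict[str, List[bool]],
    min_rate: float = 0.0,  # hypothetical threshold
    max_rate: float = 1.0,  # hypothetical threshold
) -> List[str]:
    """Keep prompts whose pass rate lies strictly between the two thresholds."""
    return [
        prompt
        for prompt, outcomes in outcomes_by_prompt.items()
        if min_rate < pass_rate(outcomes) < max_rate
    ]


rollouts = {
    "always solved": [True, True, True, True],
    "never solved": [False, False, False, False],
    "sometimes solved": [True, False, False, True],
}
print(filter_prompts(rollouts))
```

Only the prompt with mixed outcomes survives, since it is the one where the policy's answers can still be told apart by reward.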
Error-tolerant rollouts
At inference, recover from malformed tool calls instead of throwing away the episode.
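One way to picture this is a loop that feeds the parse error back to the model as an observation and lets it retry. This is a sketch under assumptions: the JSON tool-call format, the `answer` tool name, and `scripted_model` are all hypothetical, and tool execution is stubbed out.

```python
import json
from typing import Callable, List, Optional


def parse_tool_call(text: str) -> dict:
    """Parse a JSON tool call; raise ValueError if it is malformed."""
    call = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(call, dict) or "tool" not in call:
        raise ValueError("missing 'tool' field")
    return call


def run_episode(
    model_step: Callable[[List[str]], str], max_turns: int = 4
) -> Optional[str]:
    """Return the final answer, feeding parse errors back instead of aborting."""
    history: List[str] = []
    for _ in range(max_turns):
        output = model_step(history)
        history.append(output)
        try:
            call = parse_tool_call(output)
        except ValueError as err:
            # Recover: report the problem to the model and continue the episode.
            history.append(f"ERROR: malformed tool call ({err}). Please retry.")
            continue
        if call["tool"] == "answer":
            return call.get("text")
        history.append(f"RESULT: (stubbed output of tool {call['tool']!r})")
    return None


def scripted_model(history: List[str]) -> str:
    # Hypothetical stand-in for the agent: it first emits truncated JSON,
    # then answers once it has seen the error message.
    if any(item.startswith("ERROR") for item in history):
        return '{"tool": "answer", "text": "42"}'
    return '{"tool": "search", "query": '


print(run_episode(scripted_model))
```

The first turn produces invalid JSON, yet the episode continues and the second turn returns an answer instead of the whole run being discarded.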
Highest average among open-source 7B research agents
PokeeResearch-7B achieves the best average across ten benchmarks among open-source 7B deep research agents, leading on 7 of 10 benchmarks. Numbers are evaluation reward × 100. Bold = best in column.
| Method | 2Wiki | TQ | NQ | BAM | POP | MUS | HOT | HLE | GAIA | BC | AVG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R1-Searcher | 61.6 | 65.0 | 66.2 | 62.4 | 65.1 | 51.5 | 62.6 | 4.13 | 4.89 | 0.80 | 40.78 |
| Search-R1 | 78.4 | 74.2 | 79.2 | 75.3 | 77.2 | 61.0 | 72.8 | 11.10 | 18.69 | 0.60 | 50.87 |
| ZeroSearch | 17.6 | 31.4 | 30.0 | 53.9 | 39.7 | 11.4 | 13.8 | 6.96 | 8.37 | 0.40 | 18.76 |
| ASearcher | 84.4 | 84.6 | 87.2 | 74.4 | 81.9 | 64.9 | 84.8 | 11.40 | 16.91 | 2.61 | 57.57 |
| DeepResearcher | 85.40 | 79.80 | 89.60 | 78.31 | 81.05 | 62.78 | 79.80 | 10.22 | 20.63 | 2.20 | 56.64 |
| WebSailor | 88.8 | **92.8** | 97.6 | 86.8 | **87.9** | 69.0 | **92.8** | 12.8 | 34.0 | 5.6 | 66.8 |
| PokeeResearch-7B | **90.8** | 92.6 | **97.8** | **92.8** | 86.3 | **81.0** | 92.0 | **17.6** | **49.2** | **6.2** | **71.07** |
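Evaluated on 1,176 questions across 10 benchmarks, 4 independent runs per question, judged by Gemini-2.5-Flash. Benchmark abbreviations: 2Wiki = 2WikiMultiHopQA, TQ = TriviaQA, NQ = Natural Questions, BAM = Bamboogle, POP = PopQA, MUS = MuSiQue, HOT = HotpotQA, HLE = Humanity's Last Exam, BC = BrowseComp.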
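As a rough sketch of how a per-benchmark number of this kind could be computed, the snippet below averages the judge's rewards over the runs for each question, then over questions, and scales by 100. The exact aggregation used for the table, including how the AVG column is weighted, is not stated on this page, so treat the function as an assumption.

```python
from statistics import mean
from typing import List


def benchmark_score(rewards_per_question: List[List[float]]) -> float:
    """Mean reward over runs, then over questions, scaled to 0-100."""
    return 100 * mean(mean(runs) for runs in rewards_per_question)


# Two questions, four independent runs each; 1.0 means the judge accepted the answer.
rewards = [
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(benchmark_score(rewards))
```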
Ready to put deep research to work?
Try PokeeResearch in your stack today, or run the open-source 7B model in your own environment.