Deep Research Agents

Research at the speed of thought.

Pokee builds deep research agents that think, search, and synthesize like a human analyst. Choose our flagship hosted product, or run our open-source 7B agent yourself.

Closed-Source

PokeeResearch

Our hosted agent. Built for teams that need fast, accurate research without the OpenAI price tag.

Open-Source

PokeeResearch-7B

A state-of-the-art 7B-sized deep research agent — released open-source for the community to study, fine-tune, and deploy.

Hosted agent

PokeeResearch

Built for teams that need fast, accurate research without the OpenAI price tag.

Pricing & speed vs. OpenAI Deep Research

Metric	OpenAI Deep Research	PokeeResearch
Cost per query	1×	4× cheaper
Throughput	1×	5× higher

What customers are saying

“Deep research is so easy to use, fast and reliable. We are using it in production today.”

“This is way better than OpenAI and Gemini deep research plus way lower cost.”

“Pokee Deep Research actually outputs consulting grade reports directly while coming at a fraction of cost.”

Open-source

PokeeResearch-7B

A state-of-the-art 7B-sized deep research agent.

What makes it work

LLM-judge rewards

Train on semantic correctness from a cheap LLM judge, not brittle string-match scores.
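To make the contrast concrete, here is a minimal sketch of the two reward styles. The function names and the `toy_judge` stand-in are hypothetical and written for illustration; a real setup would replace `toy_judge` with a call to an actual LLM judge, and nothing here is taken from the PokeeResearch training code.

```python
from typing import Callable

# A judge takes (question, prediction, reference) and says whether the
# prediction is semantically correct. Hypothetical interface.
Judge = Callable[[str, str, str], bool]


def exact_match_reward(prediction: str, reference: str) -> float:
    """Brittle baseline: reward only a normalized exact string match."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def judge_reward(question: str, prediction: str, reference: str, judge: Judge) -> float:
    """Reward semantic correctness as decided by a judge callable."""
    return 1.0 if judge(question, prediction, reference) else 0.0


def toy_judge(question: str, prediction: str, reference: str) -> bool:
    """Stand-in for an LLM judge: accepts any answer that contains the reference."""
    return reference.strip().lower() in prediction.lower()
```

With the question "What is the capital of France?" and reference "Paris", the prediction "The capital of France is Paris." earns 0.0 from `exact_match_reward` but 1.0 from `judge_reward` with `toy_judge`.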

True on-policy training

A genuinely on-policy RL algorithm gives higher sample efficiency than the off-policy methods most agents use.

Difficulty-filtered data

Pre-filter prompts by the initial policy's pass rate — train only on questions that actually teach.
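A rough sketch of what pass-rate filtering could look like. It assumes you have already sampled several rollouts per prompt from the initial policy and recorded whether each one passed. The `low` and `high` thresholds are assumptions chosen for illustration, not values from the PokeeResearch recipe.

```python
from typing import Dict, List


def pass_rate(outcomes: List[bool]) -> float:
    """Fraction of sampled rollouts that answered correctly."""
    return sum(outcomes) / len(outcomes)


def filter_by_difficulty(
    outcomes_by_prompt: Dict[str, List[bool]],
    low: float = 0.0,
    high: float = 1.0,
) -> List[str]:
    """Keep prompts whose pass rate lies strictly between low and high.

    With the default thresholds this drops prompts the policy always solves
    and prompts it never solves (an assumed, illustrative rule).
    """
    kept = []
    for prompt, outcomes in outcomes_by_prompt.items():
        if not outcomes:
            continue  # no rollouts sampled, so no pass rate to judge by
        if low < pass_rate(outcomes) < high:
            kept.append(prompt)
    return kept
```

Tightening the band, for example `low=0.2, high=0.8`, would trade dataset size for a stronger focus on mid-difficulty questions.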

Error-tolerant rollouts

At inference, recover from malformed tool calls instead of throwing away the episode.
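One way to picture this is the rollout loop below. It is a sketch under stated assumptions: the JSON tool-call format, the `ANSWER:` prefix, and every function name are hypothetical and are not the PokeeResearch-7B interface. The point is only that a malformed call becomes an error message the policy can react to, rather than ending the episode.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple


def parse_tool_call(text: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (call, None) on success or (None, error_message) on failure."""
    try:
        call = json.loads(text)
    except json.JSONDecodeError as exc:
        return None, f"Tool call was not valid JSON: {exc.msg}"
    if not isinstance(call, dict) or "name" not in call:
        return None, "Tool call must be a JSON object with a 'name' field."
    if not isinstance(call.get("arguments", {}), dict):
        return None, "'arguments' must be a JSON object."
    return call, None


def run_episode(
    policy: Callable[[List[str]], str],
    tools: Dict[str, Callable[..., str]],
    question: str,
    max_turns: int = 8,
) -> Optional[str]:
    """Run one rollout; malformed calls are reported back, not fatal."""
    history = [question]
    for _ in range(max_turns):
        output = policy(history)
        if output.startswith("ANSWER:"):
            return output[len("ANSWER:"):].strip()
        call, error = parse_tool_call(output)
        if error is None and call["name"] not in tools:
            error = f"Unknown tool: {call['name']}"
        if error is not None:
            # Feed the error back so the policy can retry on the next turn.
            history.append(f"ERROR: {error}")
            continue
        try:
            result = tools[call["name"]](**call.get("arguments", {}))
        except Exception as exc:  # tool failures are also recoverable
            result = f"ERROR: tool raised {type(exc).__name__}"
        history.append(str(result))
    return None  # turn budget exhausted without a final answer
```

The `max_turns` cap keeps a policy that never recovers from looping forever.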

Highest average among open-source 7B research agents

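PokeeResearch-7B achieves the best average across ten benchmarks among open-source 7B deep research agents and leads on 7 of 10: 2Wiki, NQ, BAM, MUS, HLE, GAIA, and BC. WebSailor leads on TQ, POP, and HOT. Numbers are evaluation reward × 100. Column key: 2Wiki = 2WikiMultiHopQA, TQ = TriviaQA, NQ = Natural Questions, BAM = Bamboogle, POP = PopQA, MUS = MuSiQue, HOT = HotpotQA, HLE = Humanity's Last Exam, BC = BrowseComp.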

Method	2Wiki	TQ	NQ	BAM	POP	MUS	HOT	HLE	GAIA	BC	AVG
R1-Searcher	61.6	65.0	66.2	62.4	65.1	51.5	62.6	4.13	4.89	0.80	40.78
Search-R1	78.4	74.2	79.2	75.3	77.2	61.0	72.8	11.10	18.69	0.60	50.87
ZeroSearch	17.6	31.4	30.0	53.9	39.7	11.4	13.8	6.96	8.37	0.40	18.76
ASearcher	84.4	84.6	87.2	74.4	81.9	64.9	84.8	11.40	16.91	2.61	57.57
DeepResearcher	85.40	79.80	89.60	78.31	81.05	62.78	79.80	10.22	20.63	2.20	56.64
WebSailor	88.8	92.8	97.6	86.8	87.9	69.0	92.8	12.8	34.0	5.6	66.8
PokeeResearch-7B	90.8	92.6	97.8	92.8	86.3	81.0	92.0	17.6	49.2	6.2	71.07

Evaluated on 1,176 questions across 10 benchmarks, 4 independent runs per question, judged by Gemini-2.5-Flash.
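For readers reproducing a per-benchmark number, the sketch below shows one plausible way to turn per-run judge rewards into a score on the 0 to 100 scale used above. Averaging over runs and then over questions is an assumption made for illustration. The page does not spell out the exact aggregation, and this sketch does not attempt to reproduce the AVG column.

```python
from typing import Dict, List


def benchmark_score(rewards_by_question: Dict[str, List[float]]) -> float:
    """Mean reward over runs, then over questions, scaled by 100.

    rewards_by_question maps a question id to the judge rewards (0.0 or 1.0)
    from its independent runs. The aggregation order is an assumption.
    """
    per_question = [
        sum(rewards) / len(rewards) for rewards in rewards_by_question.values()
    ]
    return 100.0 * sum(per_question) / len(per_question)
```

For example, a benchmark with two questions whose four runs score `[1, 1, 0, 1]` and `[0, 0, 0, 0]` gets per-question means of 0.75 and 0.0, for a score of 37.5.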

Read the paper (arXiv) View on GitHub

Ready to put deep research to work?

Try PokeeResearch in your stack today, or run the open-source 7B model in your own environment.
