OpenAI tool-cost detail
This page answers the OpenAI web search pricing query directly. It keeps the current tool-call rates, search-content token rules, fixed-token exception, and practical workload patterns in one source-linked view.
Cost anatomy
OpenAI web search is not one uniform meter. The call rate, the search-content token rule, and the fixed-token exception all change depending on which search path you pick.
Workload examples
These examples use OpenAI's current published search rates and simple arithmetic. They isolate the search-specific burden so the call meter and search-content rule are visible before normal prompt and output tokens are folded back in.
Worked example
This comparison isolates search-specific cost; regular prompt and output tokens still sit outside these figures.
Monthly workload
30,000 Responses API web searches on the same product surface.
Compared options
Standard web search with gpt-5 versus standard web search with gpt-4.1-mini.
Search-content assumption
The gpt-5 path sees about 3,000 search content tokens per call; the gpt-4.1-mini path instead bills the published fixed 8,000-token block.
Scope of estimate
This sample prices only the search-specific meters: tool calls and search content tokens.
Model option: gpt-5
~$412.50 in search-specific cost
Tool-call meter
30,000 calls at $10 per 1K calls = $300.
Search content tokens
30,000 calls x 3,000 tokens = 90M input tokens. At $1.25 per 1M input tokens, that is $112.50.
Decision read
This path keeps the lower call price, but search content volume still pushes the bill upward as results get richer.
Recommended next check
Confirm whether the richer model is needed on searched turns, or whether the search path can move to a cheaper model without losing fit.
Model option: gpt-4.1-mini
~$396 in search-specific cost
Tool-call meter
30,000 calls at $10 per 1K calls = $300.
Search content tokens
30,000 calls x 8,000 billed tokens = 240M input tokens. At $0.40 per 1M input tokens, that is $96.
Decision read
The cheaper model row helps, but the fixed 8,000-token block means the savings are much smaller than the base model pricing alone would suggest.
Recommended next check
Use this path only after confirming the smaller model is good enough and the fixed-block search billing still beats the richer-model path for your call volume.
Estimated search-specific cost
The cheaper model only saves about $16.50 in this sample because the $300 call meter dominates both options and the mini path keeps a fixed search-token floor.
What matters first
On frequent web search, choosing the search path and understanding its token rule matter more than headline model input pricing.
Recommended next check
Before switching downmarket, compare actual call volume and the effective search-token rule, not just the base model row.
This is a sample comparison, not a live calculator. It combines the current published web-search tool rates with current model input rates to show how the search-specific burden can flatten apparent model savings.
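The worked comparison above reduces to a few lines of arithmetic. The sketch below hardcodes the rates as quoted on this page ($10 per 1K standard web-search calls, $1.25 per 1M gpt-5 input tokens, $0.40 per 1M gpt-4.1-mini input tokens); treat them as assumptions and verify against the live pricing page before relying on the output.

```python
# Search-specific cost sketch for the worked example.
# Rates below are the ones quoted on this page, not live values.
TOOL_RATE_PER_1K = 10.00  # standard web search, $ per 1,000 calls

def search_cost(calls, tokens_per_call, input_rate_per_1m):
    """Tool-call meter plus search-content token cost, in dollars."""
    tool_meter = calls / 1_000 * TOOL_RATE_PER_1K
    token_cost = calls * tokens_per_call / 1_000_000 * input_rate_per_1m
    return tool_meter + token_cost

# gpt-5: ~3,000 observed search content tokens per call at $1.25/1M.
gpt5 = search_cost(30_000, 3_000, 1.25)
# gpt-4.1-mini: fixed 8,000-token billed block per call at $0.40/1M.
mini = search_cost(30_000, 8_000, 0.40)

print(f"gpt-5 path:        ${gpt5:,.2f}")         # $412.50
print(f"gpt-4.1-mini path: ${mini:,.2f}")         # $396.00
print(f"mini saves:        ${gpt5 - mini:,.2f}")  # $16.50
```

Both options share the $300 call meter, which is why the model swap only moves the bill by $16.50.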
Additional examples
With the worked example framing the main decision, use the table below to see how the same tool changes shape under different workload patterns.
| Scenario | Workload | Tool-call meter | Search content tokens | Model-row pressure | Decision read | Sources |
|---|---|---|---|---|---|---|
| Light lookup helper | 5,000 standard web searches in the Responses API on gpt-5, with about 1,000 search content tokens per call. | 5,000 calls at $10 per 1K = $50 | 5M search content tokens x $1.25 per 1M = $6.25 | Regular prompt and output tokens still sit on the normal gpt-5 row outside this search-specific estimate. | On light lookups, standard web search is usually call-driven before search content tokens become the main concern. | |
| Mini fixed-block path | 20,000 standard web searches in the Responses API on gpt-4.1-mini. | 20,000 calls at $10 per 1K = $200 | 20,000 x 8,000 billed tokens = 160M tokens x $0.40 per 1M = $64 | The fixed block means short searches do not collapse to near-zero search-token cost even on a mini model row. | This path can still be economical, but the fixed block creates a cost floor that the base model row does not show by itself. | |
| Preview non-reasoning search model | 20,000 web lookups using gpt-4o-search-preview in Chat Completions. | 20,000 calls at $25 per 1K = $500 | Search content tokens are free on this preview non-reasoning path | Regular prompt and output tokens still use the selected preview model's token rates. | This path simplifies search-token math, but the higher call price dominates quickly on frequent search. | |
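The same arithmetic reproduces the table rows. The function below is a sketch under this page's quoted rates; the preview non-reasoning row is modeled by zeroing the token meter, since that path bills no search content tokens.

```python
# Sketch reproducing the scenario table under this page's quoted rates.
def search_cost(calls, tool_rate_per_1k, tokens_per_call, rate_per_1m):
    """Return (tool-call meter, search-content token cost) in dollars."""
    tool_meter = calls / 1_000 * tool_rate_per_1k
    token_cost = calls * tokens_per_call / 1_000_000 * rate_per_1m
    return tool_meter, token_cost

# Light lookup helper: gpt-5, ~1,000 search content tokens per call.
light = search_cost(5_000, 10.00, 1_000, 1.25)    # ($50.00, $6.25)
# Mini fixed-block path: 8,000 billed tokens per call on gpt-4.1-mini.
mini = search_cost(20_000, 10.00, 8_000, 0.40)    # ($200.00, $64.00)
# Preview non-reasoning: $25 per 1K calls, search content tokens free.
preview = search_cost(20_000, 25.00, 0, 0.0)      # ($500.00, $0.00)
```

Separating the two meters in the return value keeps the call-driven and token-driven pressure visible per scenario, which is the distinction the table is built around.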
Control levers
OpenAI already exposes path choices and usage patterns that materially change the search bill. These are the ones that most often move the estimate.
The web search guide shows the Responses API using a `web_search` tool, while the Chat Completions preview search models always retrieve from the web before responding. That means conditional search belongs on the Responses path when not every turn needs live web data.
Standard web search, preview reasoning web search, and preview non-reasoning web search do not share one price shape. If the path is not fixed first, the estimate is still comparing the wrong surfaces.
For gpt-4o-mini and gpt-4.1-mini with standard web search, every search call carries the fixed billed token block. This is the main reason the mini path can save less than the base model pricing suggests.
Web search introduces its own tool-call and search-content logic. Keeping those lines separate from the ordinary model bill is the only reliable way to see whether search is the real cost problem.
Decision signals
Use these signals when deciding whether to keep the current search path, switch paths, or push harder on call-volume control.
At both $10 and $25 per 1K calls, frequent web search can dominate before search-content tokens do. Start by estimating search frequency, then price the token rule for the chosen path.
The fixed 8,000-token block on gpt-4o-mini and gpt-4.1-mini standard web search creates a billed floor per call. Cheap base input pricing does not erase that search-specific floor.
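Under this page's quoted token rates, that floor has a concrete crossover: below roughly 2,560 search content tokens per call, the gpt-5 token line is actually cheaper per call than the mini fixed block. A sketch of that break-even, assuming the quoted rates hold:

```python
# Break-even between gpt-5 metered search tokens ($1.25/1M) and the
# gpt-4.1-mini fixed 8,000-token block ($0.40/1M), per call.
# Rates are this page's quoted figures, not live values.
mini_block_cost = 8_000 * 0.40 / 1_000_000   # $0.0032 per call
gpt5_rate_per_token = 1.25 / 1_000_000
break_even_tokens = mini_block_cost / gpt5_rate_per_token
print(round(break_even_tokens))  # 2560
```

The worked example assumed about 3,000 tokens per call on the gpt-5 path, above this crossover, which is why its search-token line came out more expensive than the mini block there.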
If a team prices only the 'free search content tokens' note and ignores the $25 per 1K call line, it will probably underestimate the preview non-reasoning path.
Because the guide separates conditional web search in Responses API from always-searching preview models in Chat Completions, teams should choose the search path before treating this as a normal model-pricing comparison.
Official sources
This page keeps the source set narrow so the cost brief can stay auditable instead of drifting into guesswork.
Source of record for standard web search, preview reasoning web search, preview non-reasoning web search, free search content token treatment on preview non-reasoning, and the fixed 8,000-token block on gpt-4o-mini and gpt-4.1-mini standard web search.
Shows how to use web search in the Responses API, documents that Chat Completions preview search models always retrieve from the web before responding, and gives the current tool interface context.
Continue the site
Use the groups below to move laterally through the decision, not back out into another doc hunt.
Related pages
Stay in the same decision neighborhood instead of backing out to search.
Model pricing, hosted-tool costs, and fit constraints that materially change the operating estimate.
Tool-cost brief for file search pricing across storage, tool calls, and model-token exposure.
Tool-cost brief for code interpreter container runtime and how it stacks with model spend.
Compare pages
Open the pages that turn this topic into a side-by-side decision.
Replacement pages
Use the likely substitutes, migration targets, or fallback choices as the next click.
Source category pages
Trace the source families behind this page instead of opening random docs in isolation.