OpenAI
GPT-4.1 pricing
Previous-generation flagship with a 1M context window. Cached input at a 75% discount - repeated prompt prefixes cost a quarter of the standard rate.
Input
$2.00 / 1M tok
Output
$8.00 / 1M tok
- Context window: 1.0M
- Max output: 33K
- Cached input: $0.50 / 1M
- Verified: 2026-04-06
GPT-4.1 was OpenAI's headline long-context release before the GPT-5 family. At $2 per 1M input and $8 per 1M output, it's more expensive than GPT-5 on input ($2 vs $1.25) but cheaper on output ($8 vs $10). The 1M context window did not carry over to GPT-5, which tops out at 400K, so GPT-4.1 remains a reasonable choice for long-document workloads - especially ones that were benchmarked on this model.
The cached input rate of $0.50 per 1M is a 75% discount on the standard rate - less aggressive than the 90% discount OpenAI applies on GPT-5 and later, but still substantial on long repeated prefixes. Caching is automatic when prefixes repeat within a 5-minute window.
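To see how the cached rate plays out, here is a minimal sketch of the blended input cost when part of a prompt hits the cache. The rates come from the table above; the prompt and cache-hit sizes are made-up examples.

```python
# Sketch: effective GPT-4.1 input cost with prompt caching.
# Rates from the rate card above; token counts are illustrative.
STANDARD_INPUT = 2.00 / 1_000_000  # $ per fresh input token
CACHED_INPUT = 0.50 / 1_000_000    # $ per cached input token (75% off)

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Cost when `cached_tokens` of the prompt are served from cache."""
    fresh = total_tokens - cached_tokens
    return fresh * STANDARD_INPUT + cached_tokens * CACHED_INPUT

# A 100K-token prompt where a 90K-token prefix repeats within 5 minutes:
print(round(input_cost(100_000, 90_000), 4))  # 0.065, vs 0.2 fully uncached
```

With a 90% cache-hit prefix, the input bill drops from $0.20 to $0.065 - roughly a 3x saving on this prompt shape.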
Calcis counts GPT-4.1 input tokens with o200k_base (tiktoken), matching the tokenizer OpenAI bills against, so the token numbers you see are exactly what lands on your invoice.
Estimate your cost on GPT-4.1
Paste your prompt into the estimator, pick GPT-4.1, and see the exact dollar cost - input tokens counted with the provider's own tokenizer, output tokens predicted by our regression model.
Frequently asked
- Should I migrate from GPT-4.1 to GPT-5?
- Usually yes. GPT-5 is cheaper on input ($1.25 vs $2) and marginally more expensive on output ($10 vs $8), but its context window is 400K rather than 1M. For long-document work where you need the full 1M context, stay on 4.1 or move to GPT-4.1-mini; otherwise migrate to GPT-5.
- How much does GPT-4.1 cost per request?
- A 1,000-token prompt with a 500-token reply costs about $0.006 ($0.002 input + $0.004 output). The 4:1 output-to-input ratio is lighter than the 8:1 on GPT-5, so prompt-heavy workloads benefit more on 4.1.
- Does GPT-4.1 have a long-context surcharge?
- No. Flat per-token rates across the full 1M context window - no threshold like Gemini 2.5 Pro.
- What's the cached input discount on GPT-4.1?
- $0.50 per 1M cached tokens - a 75% discount on the standard $2 input rate. Automatic when prompt prefixes repeat within 5 minutes.
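The per-request arithmetic from the FAQ above can be reproduced in a few lines. Rates are from the rate card; token counts are the FAQ's example.

```python
# Sketch: GPT-4.1 cost for a single request at the standard rates.
INPUT_RATE = 2.00 / 1_000_000   # $ per input token
OUTPUT_RATE = 8.00 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total dollar cost for one uncached request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# The FAQ example: 1,000-token prompt, 500-token reply.
print(round(request_cost(1_000, 500), 6))  # 0.006
```

At these rates, output tokens cost 4x input tokens, which is why trimming reply length pays off faster than trimming the prompt.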
Pricing verified 2026-04-06 from the provider's rate card.