Methodology
How we estimate consumer-tier session quotas
The subscription-quota panel on /session predicts how much of each consumer chat plan a simulated session would consume. Numbers come from the formula below, calibrated against primary-source provider documentation and community measurement. This page is the audit trail.
Last verified 25 April 2026.
Why this is hard
None of the three major providers publish a clean “X messages per month” number. Anthropic states explicitly that its limits are tied to compute usage rather than raw message volume and that the number you can send varies with message length, attached files, conversation history, tool usage, model choice, and artefacts. OpenAI's help centre uses the phrase “may vary based on system conditions” and ships multiple stacked counters across 3-hour, weekly, and monthly windows. Google publishes the cleanest integer caps for Gemini but explicitly reserves the right to flex them under capacity pressure.
So Calcis cannot give you a single guaranteed number. What we can do is give you a defensible band that combines published policy, community measurement, and time-of-day throttling effects.
The formula
For each (plan, model) combination we identify the applicable counters. For example, Anthropic Pro has a 5-hour rolling token bucket, a 7-day Sonnet/Haiku message cap, and a separate weekly Opus cap. For each counter we compute a low / typical / high share of the bucket that the simulated session consumes:
```
base_usage       = (turns or estimated session tokens) × model_multiplier
peak_multiplier  = 1.0 (off-peak) or provider-specific (peak)
share_low        = base_usage × peak.low     / bucket.high
share_typical    = base_usage × peak.typical / bucket.typical
share_high       = base_usage × peak.high    / bucket.low
dominant_counter = argmax(share_typical) across applicable counters
```
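A minimal TypeScript sketch of the share calculation above, assuming each counter's bucket and the provider's peak multiplier are both stored as low / typical / high bands. The Band and CounterInput types, their field names, and the example session (about 11,000 tokens over 10 turns, off-peak) are illustrative assumptions; the real shapes in lib/subscription-quotas.ts may differ.

```typescript
// Illustrative band type: low / typical / high estimates for one quantity.
interface Band {
  low: number;
  typical: number;
  high: number;
}

// Hypothetical per-counter input; field names are not taken from the real source.
interface CounterInput {
  name: string;
  bucket: Band;      // bucket size in the counter's unit (tokens or msgs)
  baseUsage: number; // (turns or estimated session tokens) × model_multiplier
  peak: Band;        // 1.0 everywhere off-peak; provider-specific band at peak
}

interface CounterShare {
  name: string;
  share: Band;
}

const OFF_PEAK: Band = { low: 1, typical: 1, high: 1 };

// Low share uses the largest bucket, high share the smallest,
// so the band runs from best case to worst case.
function shareFor(c: CounterInput): CounterShare {
  return {
    name: c.name,
    share: {
      low: (c.baseUsage * c.peak.low) / c.bucket.high,
      typical: (c.baseUsage * c.peak.typical) / c.bucket.typical,
      high: (c.baseUsage * c.peak.high) / c.bucket.low,
    },
  };
}

// Dominant counter = argmax of the typical share (assumes a non-empty list).
function dominantCounter(counters: CounterInput[]): CounterShare {
  return counters
    .map(shareFor)
    .reduce((best, s) => (s.share.typical > best.share.typical ? s : best));
}

// Example: Pro 5-hour session (tokens) vs weekly Sonnet/Haiku cap (msgs), off-peak.
const example = dominantCounter([
  { name: "5-hour session", bucket: { low: 35_000, typical: 44_000, high: 55_000 }, baseUsage: 11_000, peak: OFF_PEAK },
  { name: "Weekly (Sonnet/Haiku)", bucket: { low: 225, typical: 600, high: 1_100 }, baseUsage: 10, peak: OFF_PEAK },
]);
console.log(example.name, example.share.typical); // 5-hour session 0.25
```

Pairing the smallest usage with the largest bucket (and the reverse) is what guarantees the low and high shares bracket the typical one.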
For token-based counters (the Anthropic 5-hour session) we model conversation-history compounding: the input cost of turn N ≈ N × per-message input tokens, because the entire conversation is re-sent on each turn. This is why Pro's 5-hour session bucket exhausts after ~20 substantive turns rather than 200 short ones. IntuitionLabs measurements support this empirically: by message 206, the user's actual prompt was only 1.3% of the ~118,000 tokens being processed.
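A sketch of the compounding arithmetic, assuming each turn adds roughly the same number of new input tokens p. Turn k then re-sends about k × p tokens, so cumulative input after N turns is about p × N(N + 1) / 2. The 200-tokens-per-turn figure is an illustrative assumption rather than a measured value, and output tokens are ignored.

```typescript
// Cumulative input tokens after `turns` turns when the whole conversation is
// re-sent each turn: p + 2p + ... + Np = p × N(N + 1) / 2 (output tokens ignored).
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}

// Largest number of turns whose cumulative input still fits in the bucket.
function turnsUntilExhausted(bucketTokens: number, tokensPerTurn: number): number {
  if (tokensPerTurn <= 0) throw new Error("tokensPerTurn must be positive");
  let n = 0;
  while (cumulativeInputTokens(n + 1, tokensPerTurn) <= bucketTokens) {
    n++;
  }
  return n;
}

// Illustrative: 44,000-token typical Pro bucket, ~200 new input tokens per turn.
console.log(turnsUntilExhausted(44_000, 200)); // 20
// Without compounding the same bucket would cover 44,000 / 200 = 220 turns.
```

Because N grows roughly as √(2B / p), doubling the per-turn size cuts the turn count by about √2 rather than by half (turnsUntilExhausted(44_000, 400) gives 14).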
Model multipliers reflect that Claude Opus drains quota roughly 5× faster than Sonnet for an equivalent prompt, while ChatGPT's mini and nano variants consume only a fraction of an Instant message slot.
Bucket sizes by plan
The numbers below are what we encode in lib/subscription-quotas.ts. Each carries a primary-source URL and a verified-at date; the canonical machine-readable list lives in the source file.
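One way such an entry could be represented, as a hypothetical TypeScript shape: apart from verifiedAt, the field names, the counter id, and the placeholder URL are assumptions and may not match the source file. The values are the 35,000 / 44,000 / 55,000-token Pro 5-hour session band.

```typescript
type Unit = "tokens" | "msgs";

// Hypothetical shape for one counter entry.
interface QuotaCounter {
  id: string;
  label: string;
  window: string;     // human-readable reset rule
  low: number;
  typical: number;
  high: number;
  unit: Unit;
  sourceUrl: string;  // primary-source URL
  verifiedAt: string; // ISO date, "YYYY-MM-DD"
}

// The URL is a placeholder, not the real source; the date is illustrative.
const claudeProSession: QuotaCounter = {
  id: "claude-pro-5h-session",
  label: "5-hour session",
  window: "Rolling 5 hours from your first message",
  low: 35_000,
  typical: 44_000,
  high: 55_000,
  unit: "tokens",
  sourceUrl: "https://example.com/placeholder",
  verifiedAt: "2026-04-25",
};

// Sanity check for an entry: the band must be ordered low ≤ typical ≤ high.
function isWellOrdered(c: QuotaCounter): boolean {
  return c.low <= c.typical && c.typical <= c.high;
}

console.log(isWellOrdered(claudeProSession)); // true
```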
Claude
Free
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 5-hour session: rolling 5 hours from your first message | 8,000 | 12,000 | 20,000 | tokens | link |
Pro
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 5-hour session: rolling 5 hours from your first message | 35,000 | 44,000 | 55,000 | tokens | link |
| Weekly (Sonnet/Haiku): rolling 7 days from your first usage in the cycle | 225 | 600 | 1,100 | msgs | link |
| Weekly Opus: Opus access on Pro is heavily limited | 20 | 50 | 100 | msgs | link |
Max 5×
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 5-hour session: rolling 5 hours from your first message | 70,000 | 88,000 | 110,000 | tokens | link |
| Weekly (Sonnet/Haiku): rolling 7 days from your first usage in the cycle | 1,100 | 3,000 | 5,000 | msgs | link |
| Weekly Opus: separate 7-day Opus sub-cap (visible in Settings → Usage) | 200 | 500 | 900 | msgs | link |
ChatGPT
Free
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 3-hour Instant: rolling 3 hours; after exhaustion, auto-routes to nano | 25 | 35 | 50 | msgs | link |
| 5-hour Thinking: rolling 5 hours when Thinking is manually selected | 6 | 10 | 15 | msgs | link |
Plus
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 3-hour Instant: rolling 3 hours for GPT-5 Instant and the GPT-4 family | 80 | 160 | 240 | msgs | link |
| Weekly Thinking: 3,000 manually selected Thinking msgs/week; auto-routed Thinking is exempt | 2,000 | 3,000 | 4,000 | msgs | link |
| Weekly reasoning (o3 / o4-mini): o3 and o4-mini share a separate weekly cap | 30 | 50 | 100 | msgs | link |
Pro
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| 3-hour Instant: OpenAI describes Pro as near-unlimited; modelled as a large but finite bucket so percentages stay meaningful | 800 | 1,500 | 3,000 | msgs | link |
| Weekly Thinking: the GPT-5 Thinking weekly cap remains 3,000 even on Pro | 2,000 | 3,000 | 4,000 | msgs | link |
| Weekly reasoning (o3 / o4-mini): Pro retains a higher reasoning cap than Plus | 200 | 250 | 400 | msgs | link |
Gemini
| Counter | Low | Typical | High | Unit | Source |
|---|---|---|---|---|---|
| Daily Pro: resets at midnight Pacific; 5 Pro prompts/day | 3 | 5 | 8 | msgs | link |
| Daily Flash: resets at midnight Pacific | 50 | 100 | 200 | msgs | link |
Peak-hour windows
Anthropic confirmed in March 2026 that during weekday US business hours (05:00–11:00 PT, ≈ 13:00–19:00 UTC; 12:00–18:00 UTC while US daylight saving time is in effect) Claude users move through their 5-hour session limits faster than off-peak, with about 7% of users hitting limits they previously would not have. Community estimates put the multiplier at roughly 2×. The effect applies to the 5-hour session counter only; weekly caps are unaffected.
OpenAI's File Uploads FAQ states that caps may be lowered during peak hours but does not specify a window. We model US business hours (13:00–22:00 UTC weekdays) at a softer 1.3× multiplier because the public confirmation is weaker.
Google publishes no peak window. The Gemini help text only notes that Free users may be throttled before paid users when capacity is constrained.
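The modelled windows can be expressed as a function of UTC time. A sketch under stated assumptions: weekdays are judged in UTC, each window is half-open [start, end), the Anthropic value is the ~2× community point estimate rather than a low / typical / high band, and the OpenAI multiplier is applied to every OpenAI counter (the FAQ does not say which counters are affected). The provider keys and function names are hypothetical.

```typescript
type Provider = "anthropic" | "openai" | "google";

interface PeakWindow {
  startHourUtc: number; // inclusive
  endHourUtc: number;   // exclusive
  multiplier: number;   // typical value only
}

// Windows as modelled on this page; Google publishes none.
const PEAK_WINDOWS: Record<Provider, PeakWindow | null> = {
  anthropic: { startHourUtc: 13, endHourUtc: 19, multiplier: 2.0 }, // community ~2× estimate
  openai: { startHourUtc: 13, endHourUtc: 22, multiplier: 1.3 },
  google: null,
};

// Assumption: "weekday" is judged in UTC (Monday to Friday).
function isWeekdayUtc(at: Date): boolean {
  const day = at.getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day >= 1 && day <= 5;
}

// Typical peak multiplier for one counter at one instant.
function peakMultiplier(provider: Provider, at: Date, isSessionCounter: boolean): number {
  const w = PEAK_WINDOWS[provider];
  if (w === null || !isWeekdayUtc(at)) return 1.0;
  // Anthropic's peak effect applies to the 5-hour session counter only.
  if (provider === "anthropic" && !isSessionCounter) return 1.0;
  const hour = at.getUTCHours();
  return hour >= w.startHourUtc && hour < w.endHourUtc ? w.multiplier : 1.0;
}

const wednesday15Utc = new Date(Date.UTC(2026, 3, 29, 15)); // Wed 29 April 2026, 15:00 UTC
console.log(
  peakMultiplier("anthropic", wednesday15Utc, true),  // 2
  peakMultiplier("anthropic", wednesday15Utc, false), // 1 (weekly caps unaffected)
  peakMultiplier("openai", wednesday15Utc, false),    // 1.3
);
```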
For Australian and APAC users specifically: the Anthropic peak window (13:00–19:00 UTC) maps to 00:00–06:00 AEDT (UTC+11) or 23:00–05:00 AEST (UTC+10). Because the Australian working day falls naturally off-peak, you should see roughly 2× more headroom on Claude's 5-hour session counter than US-based users do during their working hours.
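The mapping can be checked with the standard Intl API instead of by hand, which also handles the AEDT/AEST switch. A sketch, assuming a runtime with ICU time-zone data (current Node.js builds include it); the two dates are arbitrary examples on either side of Sydney's daylight-saving change, and the helper name is hypothetical.

```typescript
// Hour of day (0-23) for a UTC instant in an IANA time zone.
function localHour(utc: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(utc);
  const hourPart = parts.find((p) => p.type === "hour");
  if (hourPart === undefined) throw new Error("formatter returned no hour");
  // Some engines render midnight as "24" with hour12: false; normalise to 0.
  return Number(hourPart.value) % 24;
}

// Start and end of the 13:00–19:00 UTC window, in January (AEDT) and July (AEST).
const summer = [13, 19].map((h) => localHour(new Date(Date.UTC(2026, 0, 15, h)), "Australia/Sydney"));
const winter = [13, 19].map((h) => localHour(new Date(Date.UTC(2026, 6, 15, h)), "Australia/Sydney"));
console.log(summer, winter); // [ 0, 6 ] [ 23, 5 ]
```

Queensland does not observe daylight saving, so "Australia/Brisbane" gives the AEST mapping all year.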
What we don't know
The honest list of limitations.
- Tier bucket sizes for Claude. Community P90 estimates (~44k / 88k / 220k tokens per 5-hour window for Pro / Max 5× / Max 20×) come from Claude-Code-Usage-Monitor and have not been confirmed by Anthropic.
- Peak-hour magnitude. Anthropic communicates the effect qualitatively. The 2× multiplier is a community estimate; the true figure may be lower or higher and may shift with policy changes.
- Silent policy changes. GitHub issue #9094 documents an unannounced limit reduction in late September 2025 affecting roughly 30 reporting users. Providers re-tune limits without notice; we re-verify buckets against help-centre pages but there is always lag.
- Hidden context. System prompts, project knowledge, memory features, automatic compaction, and CLAUDE.md/instruction files all add tokens the user never sees. The simulator takes turn count only, so real sessions with attached PDFs or large project context can land outside the band.
- Tokeniser drift. Simon Willison measured a 1.46× token inflation on the Opus 4.7 system prompt versus Opus 4.6 with no change to API pricing, so the same prompt can consume more quota on a newer model version.
- Tool-call recursion. A single user prompt in agentic mode can spawn dozens of internal tool calls; the simulator does not yet model this and the band will under-estimate for agentic workloads.
When in doubt, treat the typical share as a midpoint and the high share as the worst case for capacity planning. The band exists because the underlying numbers are uncertain; please do not treat any single percentage as a guarantee.
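To make that advice concrete: the number of comparable sessions that fit in one window is roughly the reciprocal of the share, rounded down. A sketch with a hypothetical share band; the helper names are illustrative.

```typescript
interface ShareBand {
  low: number;
  typical: number;
  high: number;
}

// Whole sessions of this size that fit in one window before the counter runs out.
function sessionsThatFit(share: number): number {
  if (share <= 0) return Infinity;
  return Math.floor(1 / share);
}

// Plan capacity on the high share (worst case); report the typical as a midpoint.
function planningSummary(band: ShareBand): { worstCase: number; midpoint: number } {
  return {
    worstCase: sessionsThatFit(band.high),
    midpoint: sessionsThatFit(band.typical),
  };
}

// Hypothetical band: 20% / 25% / 32% of the bucket per session.
console.log(planningSummary({ low: 0.2, typical: 0.25, high: 0.32 })); // { worstCase: 3, midpoint: 4 }
```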
Bayesian posterior model (v1)
Counters with sufficient evidence now carry a posterior distribution over their bucket size, which replaces the hand-coded band at runtime. The model is a Normal-Normal conjugate update on log(B), where B is the bucket size, with three v1 refinements:
- Student-t band inflation. Derived from a Normal-Inverse-Gamma prior on σ², so counters with sparse evidence honestly widen their bands instead of tightening overconfidently.
- Time-decay weighting. Observations carry a 1-year half-life, so a complaint from before Anthropic introduced weekly caps (August 2025) or peak-hour throttling (March 2026) weighs less than one from this quarter.
- Three evidence sources. Reddit complaint medians, Hacker News complaint medians (via the Algolia API), and the Maciek-roboblog community P90, plus optional GitHub Issues evidence when the local gh CLI is authenticated.
```
# Per observation (Reddit / HN / GitHub claim):
weight_i = exp(−ln(2) × age_in_years)        # 1-year half-life
y_i      = log(claim × 1.5)                  # bias correction
σ_i      = base_σ_for_source / √n_eff_pooled

# Conjugate update on μ (Normal-Normal):
τ_post = τ_prior + Σ τ_i                     # τ = 1/σ²
μ_post = (μ_prior τ_prior + Σ y_i τ_i) / τ_post
σ_post = 1 / √τ_post

# NIG-inspired Student-t band:
df        = 2 (α₀ + n_eff_total/2)           # α₀ = 2
inflation = √(df / (df − 2))                 # Welch-style
σ_band    = σ_post × inflation               # widen for σ uncertainty
typical   = exp(μ_post)
low       = exp(μ_post − 1.645 × σ_band)     # 5th percentile
high      = exp(μ_post + 1.645 × σ_band)     # 95th percentile
```
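A TypeScript sketch of the update above, for readers who want to reproduce the mechanics. Two details are assumptions that go beyond the pseudocode: the time-decay weight is applied by scaling each observation's precision (τ_i = weight_i / σ_i²), and n_eff_total is taken as the sum of those weights. The prior and observation in the example are made up; research/QUOTA_MODEL.md remains the authoritative description.

```typescript
interface Observation {
  claim: number;      // claimed bucket size, in the counter's unit
  ageYears: number;   // age of the observation in years
  baseSigma: number;  // hand-set per-source σ on the log scale
  nEffPooled: number; // number of posts pooled into this observation
}

interface PosteriorBand {
  typical: number;
  low: number;
  high: number;
  df: number;
  inflation: number;
}

const ALPHA0 = 2;  // α₀ of the NIG-inspired prior
const BIAS = 1.5;  // memory-drift correction applied to user claims
const Z90 = 1.645; // ±1.645σ spans the 5th to 95th percentile

// Normal-Normal update on μ = log(bucket), then a Student-t-style widening.
function updateLogBucket(muPrior: number, sigmaPrior: number, obs: Observation[]): PosteriorBand {
  let tauPost = 1 / sigmaPrior ** 2;   // τ = 1/σ²
  let weightedSum = muPrior * tauPost; // μ_prior τ_prior
  let nEffTotal = 0;
  for (const o of obs) {
    const weight = Math.exp(-Math.LN2 * o.ageYears); // 1-year half-life
    const y = Math.log(o.claim * BIAS);
    const sigma = o.baseSigma / Math.sqrt(o.nEffPooled);
    const tau = weight / sigma ** 2;   // assumption: decay scales precision
    tauPost += tau;
    weightedSum += y * tau;
    nEffTotal += weight;               // assumption: n_eff_total = Σ weights
  }
  const muPost = weightedSum / tauPost;
  const sigmaPost = 1 / Math.sqrt(tauPost);
  const df = 2 * (ALPHA0 + nEffTotal / 2);
  const inflation = Math.sqrt(df / (df - 2)); // df ≥ 4 here, so this is finite
  const sigmaBand = sigmaPost * inflation;
  return {
    typical: Math.exp(muPost),
    low: Math.exp(muPost - Z90 * sigmaBand),
    high: Math.exp(muPost + Z90 * sigmaBand),
    df,
    inflation,
  };
}

// Made-up example: prior typical 50 msgs (σ 0.5 on the log scale), one fresh Reddit median of 20.
const post = updateLogBucket(Math.log(50), 0.5, [{ claim: 20, ageYears: 0, baseSigma: 0.6, nEffPooled: 1 }]);
console.log(post.df, post.inflation.toFixed(2)); // 5 1.29
```

With a single fresh observation, df = 5 and the inflation is √(5/3) ≈ 1.29, the same values the trained-posteriors table reports for counters with n_eff = 1.0.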
Per-counter posteriors are stored in data/quota-posteriors.json and re-derived offline by research/scripts/train-quota-bayesian.mts. The full math and sources are in research/QUOTA_MODEL.md.
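At runtime a posterior replaces the hand-coded band only where one exists. A sketch of that fallback, assuming a JSON object keyed by counter id; the schema and the ids are hypothetical, not the actual format of data/quota-posteriors.json. The example values come from the Claude Pro 5-hour session row of the trained-posteriors table.

```typescript
interface Band {
  low: number;
  typical: number;
  high: number;
}

// Hypothetical schema: counter id -> posterior band plus effective evidence count.
type PosteriorFile = Record<string, Band & { nEff: number }>;

// Use the trained posterior when present, otherwise the hand-coded prior band.
function resolveBand(
  counterId: string,
  prior: Band,
  posteriors: PosteriorFile,
): Band & { source: "posterior" | "prior" } {
  const post = posteriors[counterId];
  if (post === undefined) return { ...prior, source: "prior" };
  return { low: post.low, typical: post.typical, high: post.high, source: "posterior" };
}

// In practice this object would come from JSON.parse on the posterior file's contents.
const posteriors: PosteriorFile = JSON.parse(
  '{"claude-pro-5h-session": {"low": 32202, "typical": 38805, "high": 46763, "nEff": 6}}',
);
const prior: Band = { low: 35_000, typical: 44_000, high: 55_000 };
console.log(resolveBand("claude-pro-5h-session", prior, posteriors).typical); // 38805
console.log(resolveBand("claude-max5x-weekly-opus", { low: 200, typical: 500, high: 900 }, posteriors).source); // prior
```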
Trained posteriors
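For each counter, the table shows the prior (hand-coded) typical value, the posterior (trained) typical value and its 90% band (5th–95th percentile), the effective evidence count n_eff, the Student-t degrees of freedom df, and the band inflation factor. Wide posterior bands indicate sparse data; tight bands indicate strong evidence.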
| Counter | Prior typical | Posterior typical | Posterior 5th–95th | n_eff | df | Inflation |
|---|---|---|---|---|---|---|
| Claude Pro 5-hour session | 44,000 | 38,805 | 32,202–46,763 | 6.0 | 10.0 | 1.12× |
| Claude Pro Weekly Opus | 50 | 24 | 12–48 | 1.0 | 5.0 | 1.29× |
| Claude Max 5× 5-hour session | 88,000 | 91,133 | 73,782–112,565 | 2.0 | 6.0 | 1.22× |
| Claude Max 20× 5-hour session | 220,000 | 220,000 | 176,531–274,173 | 1.0 | 5.0 | 1.29× |
| ChatGPT Free 3-hour Instant | 35 | 31 | 21–44 | 1.0 | 5.0 | 1.29× |
| ChatGPT Plus 3-hour Instant | 160 | 32 | 22–46 | 4.9 | 8.9 | 1.14× |
| ChatGPT Plus Weekly Thinking | 3,000 | 1,995 | 1,401–2,841 | 1.0 | 5.0 | 1.29× |
| Google AI Pro Daily Pro | 100 | 105 | 67–166 | 1.0 | 5.0 | 1.29× |
| Google AI Ultra Daily Pro | 500 | 274 | 169–444 | 1.0 | 5.0 | 1.29× |
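Each row can be sanity-checked by hand. df and the inflation factor follow directly from n_eff, and because the band is symmetric on the log scale, σ_band can be recovered from the posterior typical value and either end of the band. A sketch using the Claude Pro 5-hour session row; the helper names are hypothetical.

```typescript
const ALPHA0 = 2;
const Z90 = 1.645;

// Degrees of freedom and Student-t band inflation from the effective evidence count.
function dfFromNEff(nEff: number): number {
  return 2 * (ALPHA0 + nEff / 2);
}
function inflationFromDf(df: number): number {
  return Math.sqrt(df / (df - 2));
}

// Log-scale σ_band implied by the upper end of a 90% band.
function impliedSigmaBand(typical: number, high: number): number {
  return Math.log(high / typical) / Z90;
}

// Claude Pro 5-hour session row: typical 38,805, band 32,202–46,763, n_eff 6.0.
const df = dfFromNEff(6);                           // 10
const inflation = inflationFromDf(df);              // ≈ 1.118, shown as 1.12×
const sigmaBand = impliedSigmaBand(38_805, 46_763); // ≈ 0.113
const sigmaPost = sigmaBand / inflation;            // ≈ 0.101, before inflation
console.log(df, inflation.toFixed(2), sigmaBand.toFixed(3), sigmaPost.toFixed(3)); // 10 1.12 0.113 0.101
```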
v1 closed two of the v0 caveats: σ uncertainty is now modelled via the Student-t inflation, and time decay downweights stale complaints with a 1-year half-life. Still open, and documented in QUOTA_MODEL.md:
- The 1.5× memory-drift correction on user claims is hand-set and unvalidated.
- Complaint-to-counter routing is regex-based and noisy.
- Per-source σ is hand-set (Reddit 0.6, tokens 0.8, community P90 0.3) rather than learned from cross-validation residuals.
- Reddit's self-selection bias (only complaints get posted) means posterior medians may be systematically lower than typical usage, even after time decay and inflation.
v2 candidates: a counter-routing classifier, cross-validation σ calibration, and a posterior on modelMultipliers (Opus drain rate, etc.).
Cross-validation against Reddit + HN + GitHub complaints
Every hand-coded bucket on this page is calibrated against provider documentation, news coverage, and community P90 measurements (Maciek-roboblog, IntuitionLabs). Those sources skew toward developers using Claude Code, a narrower population than the consumer chat audience that pays for Claude Pro, ChatGPT Plus, or Google AI Pro. Reddit is the largest publicly searchable pool of consumer-tier complaints, with a much larger sample.
We run a periodic scrape against r/ClaudeAI, r/Anthropic, r/ChatGPT, r/OpenAI, r/Bard, r/GeminiAI, and r/singularity for posts that match six rate-limit queries, extract structured fields (provider, plan tier guess, claimed message count, peak-window flag, complaint type), drop usernames, and check whether each plan's typical bucket is consistent with the median complainer's claim. The script and methodology live in /research/ so the data is auditable; the corpus is regenerated by maintainers, not at request time, to respect Reddit's anti-scrape posture.
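A sketch of the extraction and consistency check described above. The regular expressions, the example posts, and the rule that a band is consistent when the median claim falls inside its low–high range are illustrative assumptions; the maintained script under /research/ is the reference.

```typescript
// Pull the first "<number> messages / msgs / prompts" claim out of a post body.
function extractClaimedCount(text: string): number | null {
  const m = /(\d{1,5})\s*(?:messages|message|msgs|prompts)\b/i.exec(text);
  return m ? Number(m[1]) : null;
}

// Rough plan-family guess; real posts are messier than this.
function guessPlan(text: string): string | null {
  const m = /\b(free|plus|pro|max|ultra)\b/i.exec(text);
  return m ? m[1].toLowerCase() : null;
}

function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of an empty list");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Assumption: a band is "consistent" when the median claim lies inside it.
function isConsistent(claims: number[], band: { low: number; high: number }): boolean {
  const m = median(claims);
  return m >= band.low && m <= band.high;
}

// Made-up posts; no usernames are kept.
const posts = [
  "Hit the limit after 30 messages on Plus again",
  "Plus user, locked out after 45 msgs",
  "only got 20 prompts before the cap",
];
const claims = posts
  .map(extractClaimedCount)
  .filter((n): n is number => n !== null);
console.log(posts.map(guessPlan)); // [ 'plus', 'plus', null ]
console.log(median(claims), isConsistent(claims, { low: 80, high: 240 })); // 30 false
```

On these made-up posts the median claim (30) sits well below the 80–240 band of the GPT-5 Instant 3-hour counter, which is the kind of gap the self-selection caveat warns about.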
The corpus carries explicit caveats: self-selection bias (people only post when something breaks, so the median marks a floor on usage rather than typical usage), reporting bias (memory drift in claims like “I only sent 8 messages”), time clustering around outage events, and imperfect, regex-based plan-tier extraction. Treat it as one signal alongside provider documentation, not as ground truth.
Update cadence
Bucket sizes and peak windows are re-checked manually against provider help centres on a rolling cadence. Each counter carries its own verifiedAt date. When a provider announces a policy change, we update the relevant counter and bump its date. Material changes land in the release notes.
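Because each counter carries its own verifiedAt date, stale entries can be flagged mechanically ahead of a manual re-check. A sketch; the 90-day threshold, the entry shape, and the counter ids are assumptions, not a documented policy.

```typescript
interface VerifiedCounter {
  id: string;
  verifiedAt: string; // ISO date "YYYY-MM-DD", parsed as UTC midnight
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Counters whose verifiedAt is more than maxAgeDays before `now`, oldest first.
function staleCounters(counters: VerifiedCounter[], now: Date, maxAgeDays = 90): VerifiedCounter[] {
  return counters
    .filter((c) => (now.getTime() - Date.parse(c.verifiedAt)) / MS_PER_DAY > maxAgeDays)
    .sort((a, b) => Date.parse(a.verifiedAt) - Date.parse(b.verifiedAt));
}

const now = new Date(Date.UTC(2026, 3, 25)); // 25 April 2026
console.log(
  staleCounters(
    [
      { id: "claude-pro-5h-session", verifiedAt: "2026-04-25" },
      { id: "gemini-daily-pro", verifiedAt: "2025-12-01" },
    ],
    now,
  ).map((c) => c.id),
); // [ 'gemini-daily-pro' ]
```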