Methodology

How we estimate consumer-tier session quotas

The subscription-quota panel on /session predicts how much of each consumer chat plan a simulated session would consume. Numbers come from the formula below, calibrated against primary-source provider documentation and community measurement. This page is the audit trail.

Last verified 25 April 2026.

Why this is hard

None of the three major providers publish a clean “X messages per month” number. Anthropic states explicitly that its limits are tied to compute usage rather than raw message volume and that the number you can send varies with message length, attached files, conversation history, tool usage, model choice, and artefacts. OpenAI's help centre uses the phrase “may vary based on system conditions” and ships multiple stacked counters across 3-hour, weekly, and monthly windows. Google publishes the cleanest integer caps for Gemini but explicitly reserves the right to flex them under capacity pressure.

So Calcis cannot give you a single guaranteed number. What we can do is give you a defensible band that combines published policy, community measurement, and time-of-day throttling effects.

The formula

For each (plan, model) combination we identify the applicable counters. For example, Anthropic's Claude Pro plan has a 5-hour rolling token bucket, a 7-day Sonnet/Haiku message cap, and a separate weekly Opus cap. For each counter we compute a low / typical / high share of the bucket that the simulated session consumes:

base_usage      = (turns or estimated session tokens) × model_multiplier
peak            = 1.0 (off-peak) or provider-specific (peak), as a low / typical / high band
share_low       = base_usage × peak.low  / bucket.high
share_typical   = base_usage × peak.typical / bucket.typical
share_high      = base_usage × peak.high / bucket.low

dominant_counter = argmax(share_typical) across applicable counters
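
The share calculation above can be sketched in TypeScript. This is a minimal illustration rather than the code in lib/subscription-quotas.ts: the type names (Band, CounterUsage), the function names, and the example session (22,000 estimated tokens over 10 turns) are assumptions made for the example. The bucket sizes are the Claude Pro values listed further down this page.

```typescript
// Hypothetical types for illustration; the real source file may use different shapes.
interface Band {
  low: number;
  typical: number;
  high: number;
}

interface CounterUsage {
  label: string;
  baseUsage: number; // turns or estimated session tokens, already × model_multiplier
  peak: Band;        // peak multiplier band; all 1.0 when off-peak
  bucket: Band;      // bucket size in the same unit as baseUsage
}

const OFF_PEAK: Band = { low: 1.0, typical: 1.0, high: 1.0 };

// The optimistic share pairs the lowest usage with the largest bucket;
// the pessimistic share pairs the highest usage with the smallest bucket.
function counterShare(u: CounterUsage): Band {
  return {
    low: (u.baseUsage * u.peak.low) / u.bucket.high,
    typical: (u.baseUsage * u.peak.typical) / u.bucket.typical,
    high: (u.baseUsage * u.peak.high) / u.bucket.low,
  };
}

// dominant_counter = argmax(share_typical) across applicable counters.
function dominantCounter(usages: CounterUsage[]): CounterUsage {
  if (usages.length === 0) {
    throw new Error("at least one applicable counter is required");
  }
  return usages.reduce((best, u) =>
    counterShare(u).typical > counterShare(best).typical ? u : best
  );
}

// Example inputs (made up): a 10-turn off-peak Sonnet session on Claude Pro.
const proSession: CounterUsage = {
  label: "5-hour session",
  baseUsage: 22000,
  peak: OFF_PEAK,
  bucket: { low: 35000, typical: 44000, high: 55000 },
};
const proWeekly: CounterUsage = {
  label: "Weekly (Sonnet/Haiku)",
  baseUsage: 10,
  peak: OFF_PEAK,
  bucket: { low: 225, typical: 600, high: 1100 },
};

const proSessionShare = counterShare(proSession);
const proDominant = dominantCounter([proSession, proWeekly]);
```

With these inputs the 5-hour session counter dominates: its typical share is 22,000 / 44,000 = 50%, while the weekly counter sits below 2%.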

For token-based counters (Anthropic 5-hour session) we model conversation history compounding: turn N input cost ≈ N × per-message input tokens, because the entire conversation is re-sent on each turn. This is why Pro's 5-hour session bucket exhausts after ~20 substantive turns rather than 200 short ones. The effect is empirically validated against an IntuitionLabs measurement showing that by message 206 a user's actual prompt was 1.3% of the ~118,000 tokens being processed.
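
A small sketch of the compounding arithmetic, assuming every turn adds the same number of tokens to the history (a simplification; the function names and the 200-token figure are illustrative assumptions, not values from the simulator). If turn n costs n × p input tokens, the total over n turns is p × n(n + 1) / 2, so cost grows quadratically with turn count.

```typescript
// Input tokens processed on turn n when the whole history is re-sent.
function turnInputTokens(n: number, perMessageTokens: number): number {
  return n * perMessageTokens;
}

// Total input tokens across turns 1..n: p * n(n + 1) / 2.
function cumulativeInputTokens(n: number, perMessageTokens: number): number {
  return (perMessageTokens * n * (n + 1)) / 2;
}

// Largest number of turns whose cumulative input cost still fits in the bucket.
function turnsUntilExhausted(bucketTokens: number, perMessageTokens: number): number {
  if (perMessageTokens <= 0 || bucketTokens < 0) {
    throw new Error("perMessageTokens must be positive and bucketTokens non-negative");
  }
  let n = 0;
  while (cumulativeInputTokens(n + 1, perMessageTokens) <= bucketTokens) {
    n++;
  }
  return n;
}

// Claude Pro typical 5-hour bucket (44,000 tokens) with an assumed 200 tokens per turn.
const turnsAt200 = turnsUntilExhausted(44000, 200); // 20
// Messages ten times shorter do not buy ten times as many turns.
const turnsAt20 = turnsUntilExhausted(44000, 20); // 65
```

Under these assumptions, cutting message length by 10× raises the turn budget only from 20 to 65, which is the quadratic effect the paragraph above describes.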

Model multipliers reflect that Anthropic Opus drains roughly 5× faster than Sonnet per equivalent prompt, and ChatGPT's mini and nano variants drain a fraction of an Instant message slot.

Bucket sizes by plan

The numbers below are what we encode in lib/subscription-quotas.ts. Each carries a primary-source URL and a verified-at date; the canonical machine-readable list lives in the source file.
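
For orientation, one counter entry might look like the sketch below. The record shape, field names other than verifiedAt, and the validation helper are hypothetical and may not match lib/subscription-quotas.ts; the source URL is a placeholder, not the real reference.

```typescript
// Hypothetical record shape for one counter; the real file may differ.
type QuotaUnit = "tokens" | "msgs";

interface QuotaCounter {
  plan: string;
  counter: string;
  window: string;     // human-readable reset rule
  low: number;
  typical: number;
  high: number;
  unit: QuotaUnit;
  sourceUrl: string;  // primary-source URL
  verifiedAt: string; // ISO date, YYYY-MM-DD
}

// Returns a list of problems; an empty list means the entry looks sane.
function validateCounter(c: QuotaCounter): string[] {
  const problems: string[] = [];
  if (!(c.low > 0 && c.low <= c.typical && c.typical <= c.high)) {
    problems.push("band must satisfy 0 < low <= typical <= high");
  }
  if (isNaN(Date.parse(c.verifiedAt))) {
    problems.push("verifiedAt is not a parseable date");
  }
  if (c.sourceUrl.indexOf("https://") !== 0) {
    problems.push("sourceUrl must be an https URL");
  }
  return problems;
}

const claudeProSession: QuotaCounter = {
  plan: "Claude Pro",
  counter: "5-hour session",
  window: "Rolling 5 hours from your first message.",
  low: 35000,
  typical: 44000,
  high: 55000,
  unit: "tokens",
  sourceUrl: "https://example.com/placeholder-source", // placeholder only
  verifiedAt: "2026-04-25",
};

const claudeProProblems = validateCounter(claudeProSession); // []
```

A check of this kind is cheap to run in CI and catches inverted bands or unparseable dates before they reach the panel.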

Claude

Claude Free (Free)

Counter                           Low       Typical   High      Unit    Source
5-hour session                    8,000     12,000    20,000    tokens  link
    Rolling 5 hours from your first message.

Claude Pro ($20/mo)

Counter                           Low       Typical   High      Unit    Source
5-hour session                    35,000    44,000    55,000    tokens  link
    Rolling 5 hours from your first message.
Weekly (Sonnet/Haiku)             225       600       1,100     msgs    link
    Rolling 7 days from your first usage in the cycle.
Weekly Opus                       20        50        100       msgs    link
    Pro Opus access is heavily limited.

Claude Max 5× ($100/mo)

Counter                           Low       Typical   High      Unit    Source
5-hour session                    70,000    88,000    110,000   tokens  link
    Rolling 5 hours from your first message.
Weekly (Sonnet/Haiku)             1,100     3,000     5,000     msgs    link
    Rolling 7 days from your first usage in the cycle.
Weekly Opus                       200       500       900       msgs    link
    Separate 7-day Opus sub-cap (visible in Settings → Usage).

Claude Max 20× ($200/mo)

Counter                           Low       Typical   High      Unit    Source
5-hour session                    180,000   220,000   280,000   tokens  link
    Rolling 5 hours from your first message.
Weekly (Sonnet/Haiku)             5,000     12,000    20,000    msgs    link
    Rolling 7 days from your first usage in the cycle.
Weekly Opus                       600       1,500     2,500     msgs    link
    Separate 7-day Opus sub-cap.

ChatGPT

ChatGPT Free (Free)

Counter                           Low       Typical   High      Unit    Source
3-hour Instant                    25        35        50        msgs    link
    Rolling 3 hours; after exhaustion, auto-routes to nano.
5-hour Thinking                   6         10        15        msgs    link
    Rolling 5 hours when Thinking is manually selected.

ChatGPT Plus ($20/mo)

Counter                           Low       Typical   High      Unit    Source
3-hour Instant                    80        160       240       msgs    link
    Rolling 3 hours for GPT-5 Instant + GPT-4 family.
Weekly Thinking                   2,000     3,000     4,000     msgs    link
    3,000 manually-selected Thinking msgs/week. Auto-routed Thinking is exempt.
Weekly reasoning (o3 / o4-mini)   30        50        100       msgs    link
    o3 and o4-mini share a separate weekly cap.

ChatGPT Pro ($200/mo)

Counter                           Low       Typical   High      Unit    Source
3-hour Instant                    800       1,500     3,000     msgs    link
    OpenAI describes Pro as near-unlimited; modeled as a large but finite bucket so percentages stay meaningful.
Weekly Thinking                   2,000     3,000     4,000     msgs    link
    GPT-5 Thinking weekly cap remains 3,000 even on Pro.
Weekly reasoning (o3 / o4-mini)   200       250       400       msgs    link
    Pro retains a higher reasoning cap than Plus.

Gemini

Gemini Free (Free)

Counter                           Low       Typical   High      Unit    Source
Daily Pro                         3         5         8         msgs    link
    Resets at midnight Pacific; 5 Pro prompts/day.
Daily Flash                       50        100       200       msgs    link
    Resets at midnight Pacific.

Google AI Pro ($20/mo)

Counter                           Low       Typical   High      Unit    Source
Daily Pro                         60        100       150       msgs    link
    100 Pro prompts/day. Resets at midnight Pacific.
Daily Flash                       500       1,000     2,000     msgs    link
    Soft daily Flash cap. Heavily distributed across the day.

Google AI Ultra ($250/mo)

Counter                           Low       Typical   High      Unit    Source
Daily Pro                         300       500       800       msgs    link
    500 Pro prompts/day. Resets at midnight Pacific.
Daily Flash                       2,000     5,000     10,000    msgs    link
    Highest published Flash allowance.

Peak-hour windows

Anthropic confirmed in March 2026 that during weekday US business hours (05:00–11:00 PT, ≈ 13:00–19:00 UTC) Claude users move through their 5-hour session limits faster than off-peak, and that about 7% of users hit limits they previously would not. Community estimates put the multiplier at roughly 2×. The effect applies to the 5-hour session counter only; weekly caps are unaffected.

OpenAI's File Uploads FAQ states caps may be lowered during peak hours without specifying a window. We model US business hours (13:00–22:00 UTC weekdays) at a softer 1.3× multiplier since the public confirmation is weaker.

Google publishes no peak window. The Gemini help text only notes that Free users may be throttled before paid users when capacity is constrained.

For Australian and APAC users specifically: Anthropic peak (13:00–19:00 UTC) maps to 00:00–06:00 AEDT or 23:00–05:00 AEST. The Australian working day is naturally off-peak, so you should see ~2× more headroom on Claude's 5-hour session counter than US-based users do during their working hours.
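
The peak windows described in this section can be expressed as a small lookup. This is a sketch under stated assumptions: the PeakWindow type, the constant names, and the function name are hypothetical, and the weekday is judged in UTC, which coincides with the US Pacific calendar day for the hours involved. The multipliers are the community estimate (2×) and the softer modelled value (1.3×) quoted above.

```typescript
// Hypothetical shape for a provider's weekday peak window, in UTC hours.
interface PeakWindow {
  startHourUtc: number; // inclusive
  endHourUtc: number;   // exclusive
  multiplier: number;
}

const ANTHROPIC_PEAK: PeakWindow = { startHourUtc: 13, endHourUtc: 19, multiplier: 2.0 };
const OPENAI_PEAK: PeakWindow = { startHourUtc: 13, endHourUtc: 22, multiplier: 1.3 };
const GOOGLE_PEAK: PeakWindow | null = null; // no published peak window

// Returns the typical peak multiplier for a moment in time; 1.0 when off-peak.
function peakMultiplierAt(at: Date, peakWindow: PeakWindow | null): number {
  if (peakWindow === null) {
    return 1.0;
  }
  const day = at.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const isWeekday = day >= 1 && day <= 5;
  const hour = at.getUTCHours();
  const inWindow = hour >= peakWindow.startHourUtc && hour < peakWindow.endHourUtc;
  return isWeekday && inWindow ? peakWindow.multiplier : 1.0;
}

// Wednesday 22 April 2026, 14:00 UTC: inside the Anthropic window.
const usMorning = peakMultiplierAt(new Date("2026-04-22T14:00:00Z"), ANTHROPIC_PEAK); // 2
// Wednesday 22 April 2026, 10:00 AEST (00:00 UTC): off-peak for Claude.
const sydneyMorning = peakMultiplierAt(new Date("2026-04-22T00:00:00Z"), ANTHROPIC_PEAK); // 1
```

A mid-morning session in Sydney therefore runs at the off-peak multiplier, while the same session started mid-morning in San Francisco does not.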

What we don't know

The honest list of limitations.

  • Tier bucket sizes for Claude. Community P90 estimates (~44k / 88k / 220k tokens per 5-hour window for Pro / Max 5× / Max 20×) come from Claude-Code-Usage-Monitor and have not been confirmed by Anthropic.
  • Peak-hour magnitude. Anthropic communicates the effect qualitatively. The 2× multiplier is a community estimate; the true figure may be lower or higher and may shift with policy changes.
  • Silent policy changes. GitHub issue #9094 documents an unannounced limit reduction in late September 2025 affecting roughly 30 reporting users. Providers re-tune limits without notice; we re-verify buckets against help-centre pages but there is always lag.
  • Hidden context. System prompts, project knowledge, memory features, automatic compaction and CLAUDE.md/instruction files all add tokens the user never sees. The simulator passes turn count only; real sessions with attached PDFs or large project context can land outside the band.
  • Tokeniser drift. Simon Willison measured a 1.46× token inflation on the Opus 4.7 system prompt vs Opus 4.6 with no API-pricing change, so the same prompt costs more on a newer model version.
  • Tool-call recursion. A single user prompt in agentic mode can spawn dozens of internal tool calls; the simulator does not yet model this and the band will under-estimate for agentic workloads.

When in doubt, treat the typical share as a midpoint and the high share as a worst case for capacity planning. The band exists because the underlying numbers are uncertain; please do not treat any single percentage as a guarantee.

Bayesian posterior model (v1)

Counters with sufficient evidence now carry a posterior distribution over their bucket size, which replaces the hand-coded band at runtime. The model is a Normal-Normal conjugate update on log(B) with three v1 refinements: (1) Student-t band inflation derived from a Normal-Inverse-Gamma prior on σ², so sparse-evidence counters honestly widen their bands instead of overconfidently tightening; (2) time-decay weighting on observations with a 1-year half-life, so a complaint from before Anthropic introduced weekly caps (Aug 2025) or peak-hour throttling (Mar 2026) does not weigh as heavily as one from this quarter; and (3) three evidence sources: Reddit complaint medians, Hacker News complaint medians (Algolia API), and the Maciek-roboblog community P90, plus optional GitHub Issues evidence when the local gh CLI is authenticated.

# Per observation (Reddit / HN / GitHub claim):
weight_i = exp(−ln(2) × age_in_years)        # 1-year half-life
y_i      = log(claim × 1.5)                   # bias correction
σ_i      = base_σ_for_source / √n_eff_pooled

# Conjugate update on μ (Normal-Normal):
τ_post   = τ_prior + Σ τ_i                    # τ = 1/σ²
μ_post   = (μ_prior τ_prior + Σ y_i τ_i) / τ_post
σ_post   = 1 / √τ_post

# NIG-inspired Student-t band:
df         = 2 (α₀ + n_eff_total/2)            # α₀ = 2
inflation  = √(df / (df − 2))                  # Welch-style
σ_band     = σ_post × inflation                # widen for σ uncertainty

typical = exp(μ_post)
low     = exp(μ_post − 1.645 × σ_band)         # 5th percentile
high    = exp(μ_post + 1.645 × σ_band)         # 95th percentile
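
The update above can be sketched in TypeScript as follows. This is not the training script; it is a minimal reading of the formulas with several assumptions that the pseudocode leaves open: the prior mean is taken as log(prior typical), the time-decay weight scales each observation's precision, n_eff_total is the sum of the weights, and σ_i is passed in already pooled. All type and function names are hypothetical.

```typescript
interface QuotaObservation {
  claim: number;    // claimed bucket size, in the counter's unit
  ageYears: number; // age of the observation in years
  sigma: number;    // σ_i on the log scale (assumed already pooled)
}

interface QuotaPosterior {
  typical: number;
  low: number;  // 5th percentile
  high: number; // 95th percentile
  nEff: number;
  df: number;
  inflation: number;
}

const BIAS_CORRECTION = 1.5; // hand-set correction applied to user claims
const ALPHA_0 = 2;
const Z_90 = 1.645;

function updateQuotaPosterior(
  priorTypical: number,
  priorSigma: number,
  observations: QuotaObservation[]
): QuotaPosterior {
  // Precision-weighted running sums, starting from the prior on log(B).
  let tau = 1 / (priorSigma * priorSigma);
  let weightedSum = Math.log(priorTypical) * tau;
  let nEff = 0;

  for (const o of observations) {
    const weight = Math.exp(-Math.LN2 * o.ageYears); // 1-year half-life
    const tauI = weight / (o.sigma * o.sigma);       // ASSUMPTION: weight scales precision
    tau += tauI;
    weightedSum += Math.log(o.claim * BIAS_CORRECTION) * tauI;
    nEff += weight;                                  // ASSUMPTION: n_eff is the sum of weights
  }

  const muPost = weightedSum / tau;
  const sigmaPost = 1 / Math.sqrt(tau);

  // Student-t band inflation: widen the band when evidence is sparse.
  const df = 2 * (ALPHA_0 + nEff / 2);
  const inflation = Math.sqrt(df / (df - 2));
  const sigmaBand = sigmaPost * inflation;

  return {
    typical: Math.exp(muPost),
    low: Math.exp(muPost - Z_90 * sigmaBand),
    high: Math.exp(muPost + Z_90 * sigmaBand),
    nEff,
    df,
    inflation,
  };
}

// One fresh complaint claiming 40 msgs against a prior typical of 100 (made-up inputs).
const freshPosterior = updateQuotaPosterior(100, 0.5, [{ claim: 40, ageYears: 0, sigma: 0.6 }]);
// The same complaint, one year old, pulls the estimate less.
const stalePosterior = updateQuotaPosterior(100, 0.5, [{ claim: 40, ageYears: 1, sigma: 0.6 }]);
```

With a single fresh observation n_eff is 1, so df is 5 and the inflation is √(5/3) ≈ 1.29. The year-old observation carries half the weight, so its posterior typical stays closer to the prior.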

Per-counter posteriors are stored in data/quota-posteriors.json and re-derived offline by research/scripts/train-quota-bayesian.mts. Full math + sources at research/QUOTA_MODEL.md.

Trained posteriors

For each counter the table shows its prior (hardcoded) vs posterior (trained) typical and 90% band, plus the evidence count. Wide posterior bands indicate sparse data; tight bands indicate strong evidence.

Plan              Counter           Prior typical   Posterior typical   Posterior 5th–95th    n_eff   df      infl
Claude Pro        5-hour session    44,000          38,805              32,202–46,763         6.0     10.0    1.12×
Claude Pro        Weekly Opus       50              24                  12–48                 1.0     5.0     1.29×
Claude Max 5×     5-hour session    88,000          91,133              73,782–112,565        2.0     6.0     1.22×
Claude Max 20×    5-hour session    220,000         220,000             176,531–274,173       1.0     5.0     1.29×
ChatGPT Free      3-hour Instant    35              31                  21–44                 1.0     5.0     1.29×
ChatGPT Plus      3-hour Instant    160             32                  22–46                 4.9     8.9     1.14×
ChatGPT Plus      Weekly Thinking   3,000           1,995               1,401–2,841           1.0     5.0     1.29×
Google AI Pro     Daily Pro         100             105                 67–166                1.0     5.0     1.29×
Google AI Ultra   Daily Pro         500             274                 169–444               1.0     5.0     1.29×
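
Two quick checks help when reading the table, sketched here with hypothetical helper names. Because the band is symmetric on the log scale, the posterior typical should sit near the geometric mean of the 5th and 95th percentiles, and the band's log-scale σ can be recovered from its endpoints. The infl column follows from n_eff alone via df = 2(α₀ + n_eff/2) with α₀ = 2.

```typescript
// Geometric midpoint of a band that is symmetric in log space.
function geometricMidpoint(low: number, high: number): number {
  return Math.sqrt(low * high);
}

// Log-scale σ_band implied by a 5th–95th percentile band (z = 1.645 each side).
function impliedSigmaBand(low: number, high: number): number {
  return Math.log(high / low) / (2 * 1.645);
}

// Band inflation factor as a function of effective sample size.
function inflationFromNEff(nEff: number, alpha0: number = 2): number {
  const df = 2 * (alpha0 + nEff / 2);
  return Math.sqrt(df / (df - 2));
}

// Claude Pro, 5-hour session row: band 32,202–46,763, n_eff 6.0.
const proMidpoint = geometricMidpoint(32202, 46763);   // close to the 38,805 in the table
const proSigmaBand = impliedSigmaBand(32202, 46763);   // about 0.11 on the log scale
const proInflation = inflationFromNEff(6).toFixed(2);  // "1.12"
```

Applying inflationFromNEff to the other n_eff values in the table (1.0, 2.0, 4.9) reproduces the listed 1.29×, 1.22×, and 1.14×.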

v1 closed two of the v0 caveats: σ uncertainty is now modelled via the Student-t inflation, and time-decay downweights stale complaints with a 1-year half-life. Still open and documented in QUOTA_MODEL.md: (1) the 1.5× memory-drift correction on user claims is hand-set and unvalidated; (2) complaint-vs-counter routing is regex-based and noisy; (3) per-source σ is hand-set (Reddit 0.6, tokens 0.8, community P90 0.3) rather than learned from cross-validation residuals; (4) Reddit's self-selection bias (only complaints get posted) means posterior medians may be systematically lower than typical usage even after time-decay and inflation. v2 candidates: counter-routing classifier, cross-validation σ calibration, posterior on modelMultipliers (Opus drain rate, etc.).

Cross-validation against Reddit + HN + GitHub complaints

Every bucket in the tables above is calibrated against provider documentation, news coverage, and community P90 measurements (Maciek-roboblog, IntuitionLabs). Those sources skew toward developer Claude Code users, a narrower population than the consumer chat audience that pays for Claude Pro, ChatGPT Plus, or Google AI Pro. Reddit is the largest publicly searchable pool of consumer-tier complaints, with a much larger n.

We run a periodic scrape against r/ClaudeAI, r/Anthropic, r/ChatGPT, r/OpenAI, r/Bard, r/GeminiAI, and r/singularity for posts that match six rate-limit queries, extract structured fields (provider, plan tier guess, claimed message count, peak-window flag, complaint type), drop usernames, and check whether each plan's typical bucket is consistent with the median complainer's claim. The script and methodology live in /research/ so the data is auditable; the corpus is regenerated by maintainers, not at request time, to respect Reddit's anti-scrape posture.

The corpus carries explicit caveats: self-selection bias (people only post when something breaks, so the median is a floor on usage, not a typical value), reporting bias (memory drift on “I only sent 8 messages”), time clustering around outage events, and regex-based, imperfect plan-tier extraction. Treat it as one signal alongside provider documentation, not as ground truth.

Update cadence

Bucket sizes and peak windows are re-checked manually against provider help centres on a rolling cadence. Each counter carries its own verifiedAt date. When a provider announces a policy change, we update the relevant counter and bump its date. Material changes land in the release notes.

Spot a stale source or a bucket size you can verify against your own usage data? Tell us. Methodology improvements ship in the open.