Methodology

How we estimate consumer-tier session quotas

The subscription-quota panel on /session predicts how much of each consumer chat plan a simulated session would consume. Numbers come from the formula below, calibrated against primary-source provider documentation and community measurement. This page is the audit trail.

Last verified 25 April 2026.

Why this is hard

None of the three major providers publish a clean “X messages per month” number. Anthropic states explicitly that its limits are tied to compute usage rather than raw message volume and that the number you can send varies with message length, attached files, conversation history, tool usage, model choice, and artefacts. OpenAI's help centre uses the phrase “may vary based on system conditions” and ships multiple stacked counters across 3-hour, weekly, and monthly windows. Google publishes the cleanest integer caps for Gemini but explicitly reserves the right to flex them under capacity pressure.

So Calcis cannot give you a single guaranteed number. What we can do is give you a defensible band that combines published policy, community measurement, and time-of-day throttling effects.

The formula

For each (plan, model) combination we identify the applicable counters. Anthropic Pro, for example, has a 5-hour rolling token bucket, a 7-day Sonnet/Haiku message cap, and a separate weekly Opus cap. For each counter we compute the low / typical / high share of the bucket that the simulated session consumes:

base_usage      = (turns or estimated session tokens) × model_multiplier
peak            = {low, typical, high} multipliers: all 1.0 off-peak, provider-specific at peak
share_low       = base_usage × peak.low  / bucket.high
share_typical   = base_usage × peak.typical / bucket.typical
share_high      = base_usage × peak.high / bucket.low

dominant_counter = argmax(share_typical) across applicable counters
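
The formula above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in lib/subscription-quotas.ts: the Band and Counter types, the function names, and the example usage figures (an 11,000-token, 10-message session) are all assumptions made for the example. Only the bucket sizes are taken from the Claude Pro table on this page.

```typescript
// Hypothetical types: the real shapes in lib/subscription-quotas.ts may differ.
interface Band {
  low: number;
  typical: number;
  high: number;
}

interface Counter {
  name: string;
  bucket: Band;      // bucket size in the counter's own unit (tokens or msgs)
  baseUsage: number; // session usage in that unit, after the model multiplier
}

// Share of one bucket consumed by the session. The optimistic (low) share
// pairs the smallest peak multiplier with the largest bucket; the
// pessimistic (high) share pairs the largest multiplier with the smallest.
function shareBand(baseUsage: number, peak: Band, bucket: Band): Band {
  return {
    low: (baseUsage * peak.low) / bucket.high,
    typical: (baseUsage * peak.typical) / bucket.typical,
    high: (baseUsage * peak.high) / bucket.low,
  };
}

// The dominant counter is the one with the largest typical share.
function dominantCounter(
  counters: Counter[],
  peak: Band,
): { name: string; share: Band } | undefined {
  let best: { name: string; share: Band } | undefined;
  for (const counter of counters) {
    const share = shareBand(counter.baseUsage, peak, counter.bucket);
    if (best === undefined || share.typical > best.share.typical) {
      best = { name: counter.name, share };
    }
  }
  return best;
}

// Off-peak: every multiplier is 1.0.
const offPeak: Band = { low: 1, typical: 1, high: 1 };

// Bucket sizes from the Claude Pro table; usage figures are illustrative.
const claudePro: Counter[] = [
  {
    name: "5-hour session",
    bucket: { low: 35000, typical: 44000, high: 55000 },
    baseUsage: 11000,
  },
  {
    name: "Weekly (Sonnet/Haiku)",
    bucket: { low: 225, typical: 600, high: 1100 },
    baseUsage: 10,
  },
];

const dominant = dominantCounter(claudePro, offPeak);
console.log(dominant);
```

In this example the 5-hour session counter dominates: the session uses a quarter of the typical token bucket but under 2% of the typical weekly message cap.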

For token-based counters (Anthropic 5-hour session) we model conversation history compounding: turn N input cost ≈ N × per-message input tokens, because the entire conversation is re-sent on each turn. This is why Pro's 5-hour session bucket exhausts after ~20 substantive turns rather than 200 short ones. The model is empirically validated against an IntuitionLabs measurement showing that by message 206 a user's actual prompt was 1.3% of the ~118,000 tokens being processed.
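
A short sketch of the compounding arithmetic, under simplifying assumptions: every turn adds the same number of tokens to the history, and only input tokens are counted. The function names and the 200-token and 20-token message sizes are hypothetical and chosen for illustration; the 44,000-token bucket is the Claude Pro typical value from this page.

```typescript
// Cumulative input tokens after `turns` turns when each turn re-sends the
// whole history: sum over n = 1..turns of n * perMessageTokens.
function cumulativeInputTokens(turns: number, perMessageTokens: number): number {
  return (perMessageTokens * turns * (turns + 1)) / 2;
}

// Largest number of turns whose cumulative input cost still fits the bucket.
function turnsUntilExhausted(bucketTokens: number, perMessageTokens: number): number {
  if (perMessageTokens <= 0) {
    throw new RangeError("perMessageTokens must be positive");
  }
  let turns = 0;
  while (cumulativeInputTokens(turns + 1, perMessageTokens) <= bucketTokens) {
    turns += 1;
  }
  return turns;
}

const proTypicalBucket = 44000; // Claude Pro 5-hour typical bucket, in tokens

console.log(turnsUntilExhausted(proTypicalBucket, 200)); // 20
console.log(turnsUntilExhausted(proTypicalBucket, 20));  // 65
```

Because the cumulative cost grows with the square of the turn count, cutting the per-turn size tenfold (200 to 20 tokens) raises the turn budget only from 20 to 65 under these assumptions.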

Model multipliers reflect that Anthropic Opus drains roughly 5× faster than Sonnet per equivalent prompt, and ChatGPT's mini and nano variants drain a fraction of an Instant message slot.

Bucket sizes by plan

The numbers below are what we encode in lib/subscription-quotas.ts. Each carries a primary-source URL and a verified-at date; the canonical machine-readable list lives in the source file.
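
For orientation, one counter entry might look roughly like the sketch below. This is an assumed shape, not a copy of lib/subscription-quotas.ts: every field name except verifiedAt is hypothetical, the URL is a placeholder, and the date shown is this page's last-verified date rather than a real per-counter value.

```typescript
// Hypothetical entry shape; the canonical definition lives in the source file.
type QuotaUnit = "tokens" | "msgs";

interface QuotaCounter {
  label: string;
  low: number;
  typical: number;
  high: number;
  unit: QuotaUnit;
  sourceUrl: string;  // primary-source URL (placeholder below)
  verifiedAt: string; // ISO date of the last manual check
}

// Values from the Claude Pro table on this page.
const claudeProSession: QuotaCounter = {
  label: "5-hour session",
  low: 35000,
  typical: 44000,
  high: 55000,
  unit: "tokens",
  sourceUrl: "https://example.invalid/placeholder", // not a real source
  verifiedAt: "2026-04-25", // assumed; each counter carries its own date
};

// Basic sanity check: a positive, ordered band and a parseable date.
function isWellFormed(counter: QuotaCounter): boolean {
  const ordered =
    counter.low > 0 &&
    counter.low <= counter.typical &&
    counter.typical <= counter.high;
  return ordered && !isNaN(Date.parse(counter.verifiedAt));
}

console.log(isWellFormed(claudeProSession));
```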

Claude

Claude Free (Free)

Counter	Notes	Low	Typical	High	Unit	Source
5-hour session	Rolling 5 hours from your first message.	8,000	12,000	20,000	tokens	link

Claude Pro ($20/mo)

Counter	Notes	Low	Typical	High	Unit	Source
5-hour session	Rolling 5 hours from your first message.	35,000	44,000	55,000	tokens	link
Weekly (Sonnet/Haiku)	Rolling 7 days from your first usage in the cycle.	225	600	1,100	msgs	link
Weekly Opus	Pro Opus access is heavily limited.	20	50	100	msgs	link

Claude Max 5× ($100/mo)

Counter	Notes	Low	Typical	High	Unit	Source
5-hour session	Rolling 5 hours from your first message.	70,000	88,000	110,000	tokens	link
Weekly (Sonnet/Haiku)	Rolling 7 days from your first usage in the cycle.	1,100	3,000	5,000	msgs	link
Weekly Opus	Separate 7-day Opus sub-cap (visible in Settings → Usage).	200	500	900	msgs	link

Claude Max 20× ($200/mo)

Counter	Notes	Low	Typical	High	Unit	Source
5-hour session	Rolling 5 hours from your first message.	180,000	220,000	280,000	tokens	link
Weekly (Sonnet/Haiku)	Rolling 7 days from your first usage in the cycle.	5,000	12,000	20,000	msgs	link
Weekly Opus	Separate 7-day Opus sub-cap.	600	1,500	2,500	msgs	link

ChatGPT

ChatGPT Free (Free)

Counter	Notes	Low	Typical	High	Unit	Source
3-hour Instant	Rolling 3 hours; after exhaustion, auto-routes to nano.	25	35	50	msgs	link
5-hour Thinking	Rolling 5 hours when Thinking is manually selected.	6	10	15	msgs	link

ChatGPT Plus ($20/mo)

Counter	Notes	Low	Typical	High	Unit	Source
3-hour Instant	Rolling 3 hours for GPT-5 Instant + GPT-4 family.	80	160	240	msgs	link
Weekly Thinking	3,000 manually selected Thinking msgs/week. Auto-routed Thinking is exempt.	2,000	3,000	4,000	msgs	link
Weekly reasoning (o3 / o4-mini)	o3 and o4-mini share a separate weekly cap.	30	50	100	msgs	link

ChatGPT Pro ($200/mo)

Counter	Notes	Low	Typical	High	Unit	Source
3-hour Instant	OpenAI describes Pro as near-unlimited; modeled as a large but finite bucket so percentages stay meaningful.	800	1,500	3,000	msgs	link
Weekly Thinking	GPT-5 Thinking weekly cap remains 3,000 even on Pro.	2,000	3,000	4,000	msgs	link
Weekly reasoning (o3 / o4-mini)	Pro retains a higher reasoning cap than Plus.	200	250	400	msgs	link

Gemini

Gemini Free (Free)

Counter	Notes	Low	Typical	High	Unit	Source
Daily Pro	Resets at midnight Pacific; 5 Pro prompts/day.	3	5	8	msgs	link
Daily Flash	Resets at midnight Pacific.	50	100	200	msgs	link

Google AI Pro ($20/mo)

Counter	Notes	Low	Typical	High	Unit	Source
Daily Pro	100 Pro prompts/day. Resets at midnight Pacific.	60	100	150	msgs	link
Daily Flash	Soft daily Flash cap. Heavily distributed across the day.	500	1,000	2,000	msgs	link

Google AI Ultra ($250/mo)

Counter	Notes	Low	Typical	High	Unit	Source
Daily Pro	500 Pro prompts/day. Resets at midnight Pacific.	300	500	800	msgs	link
Daily Flash	Highest published Flash allowance.	2,000	5,000	10,000	msgs	link

Peak-hour windows

Anthropic confirmed in March 2026 that during weekday US business hours (05:00–11:00 PT, ≈ 13:00–19:00 UTC) Claude users move through their 5-hour session limits faster than off-peak; about 7% of users hit limits they previously would not. Community estimates put the multiplier at roughly 2×. The effect applies to the 5-hour session counter only; weekly caps are unaffected.

OpenAI's File Uploads FAQ states caps may be lowered during peak hours without specifying a window. We model US business hours (13:00–22:00 UTC weekdays) at a softer 1.3× multiplier since the public confirmation is weaker.

Google publishes no peak window. The Gemini help text only notes that Free users may be throttled before paid users when capacity is constrained.

For Australian and APAC users specifically: Anthropic peak (13:00–19:00 UTC) maps to 00:00–06:00 AEDT or 23:00–05:00 AEST. The Australian working day is naturally off-peak, so you should see ~2× more headroom on Claude's 5-hour session counter than US-based users see during their working hours.
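
The window arithmetic can be checked with a small sketch. It assumes the Anthropic peak window is modelled as weekdays 13:00–19:00 UTC, as described above, and uses fixed UTC offsets for AEST (+10) and AEDT (+11). The function names are hypothetical and this is not the simulator's own code.

```typescript
// True when `at` falls inside the modelled Anthropic peak window:
// Monday to Friday, 13:00 (inclusive) to 19:00 (exclusive) UTC.
function isAnthropicPeak(at: Date): boolean {
  const day = at.getUTCDay(); // 0 = Sunday, 6 = Saturday
  const hour = at.getUTCHours();
  const isWeekday = day >= 1 && day <= 5;
  return isWeekday && hour >= 13 && hour < 19;
}

// Convert a UTC hour to a local hour for a fixed whole-hour offset.
function toLocalHour(utcHour: number, offsetHours: number): number {
  return (((utcHour + offsetHours) % 24) + 24) % 24;
}

// Peak window boundaries expressed in Australian eastern time.
console.log("AEDT:", toLocalHour(13, 11), "to", toLocalHour(19, 11)); // 0 to 6
console.log("AEST:", toLocalHour(13, 10), "to", toLocalHour(19, 10)); // 23 to 5

// Midday on a Wednesday in Sydney (AEST) is 02:00 UTC, which is off-peak.
console.log(isAnthropicPeak(new Date("2026-04-22T02:00:00Z"))); // false
```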

What we don't know

The honest list of limitations.

Tier bucket sizes for Claude. Community P90 estimates (~44k / 88k / 220k tokens per 5-hour window for Pro / Max 5× / Max 20×) come from Claude-Code-Usage-Monitor and have not been confirmed by Anthropic.
Peak-hour magnitude. Anthropic communicates the effect qualitatively. The 2× multiplier is a community estimate; the true figure may be lower or higher and may shift with policy changes.
Silent policy changes. GitHub issue #9094 documents an unannounced limit reduction in late September 2025 affecting roughly 30 reporting users. Providers re-tune limits without notice; we re-verify buckets against help-centre pages but there is always lag.
Hidden context. System prompts, project knowledge, memory features, automatic compaction and CLAUDE.md/instruction files all add tokens the user never sees. The simulator passes turn count only; real sessions with attached PDFs or large project context can land outside the band.
Tokeniser drift. Simon Willison measured a 1.46× token inflation on the Opus 4.7 system prompt vs Opus 4.6 with no API-pricing change, so the same prompt costs more on a newer model version.
Tool-call recursion. A single user prompt in agentic mode can spawn dozens of internal tool calls; the simulator does not yet model this and the band will under-estimate for agentic workloads.

When in doubt, treat the typical share as a midpoint and the high share as a worst case for capacity planning. The band exists because the underlying numbers are uncertain; please do not treat any single percentage as a guarantee.

Bayesian posterior model (v1)

Counters with sufficient evidence now carry a posterior distribution over their bucket size, replacing the hand-coded band at runtime. The model is a Normal-Normal conjugate update on log(B) with three v1 refinements: (1) Student-t band inflation derived from a Normal-Inverse-Gamma prior on σ², so sparse-evidence counters honestly widen their bands instead of overconfidently tightening; (2) time-decay weighting on observations with a 1-year half-life, so a complaint from before Anthropic introduced weekly caps (Aug 2025) or peak-hour throttling (Mar 2026) doesn't weigh as heavily as one from this quarter; and (3) three evidence sources (Reddit complaint medians, Hacker News complaint medians via the Algolia API, and the Maciek-roboblog community P90), plus optional GitHub Issues evidence when the local gh CLI is authenticated.

# Per observation (Reddit / HN / GitHub claim):
weight_i = exp(−ln(2) × age_in_years)        # 1-year half-life
y_i      = log(claim × 1.5)                   # bias correction
σ_i      = base_σ_for_source / √n_eff_pooled

# Conjugate update on μ (Normal-Normal):
τ_post   = τ_prior + Σ τ_i                    # τ = 1/σ²
μ_post   = (μ_prior τ_prior + Σ y_i τ_i) / τ_post
σ_post   = 1 / √τ_post

# NIG-inspired Student-t band:
df         = 2 (α₀ + n_eff_total/2)            # α₀ = 2
inflation  = √(df / (df − 2))                  # Welch-style
σ_band     = σ_post × inflation                # widen for σ uncertainty

typical = exp(μ_post)
low     = exp(μ_post − 1.645 × σ_band)         # 5th percentile
high    = exp(μ_post + 1.645 × σ_band)         # 95th percentile

Per-counter posteriors are stored in data/quota-posteriors.json and re-derived offline by research/scripts/train-quota-bayesian.mts. Full math + sources at research/QUOTA_MODEL.md.
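
The update above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the training script. The pseudocode does not say how weight_i enters the update, so the sketch assumes it scales each observation's precision (τ_i = weight_i / σ_i²). It also assumes μ_prior = log(prior typical), takes the prior σ and n_eff_total as caller-supplied inputs, and uses made-up example values. All type and function names are hypothetical.

```typescript
interface Observation {
  claim: number;    // claimed bucket size (msgs or tokens)
  ageYears: number; // age of the observation in years
  sigma: number;    // per-observation σ on the log scale
}

interface Posterior {
  typical: number;
  low: number;  // 5th percentile
  high: number; // 95th percentile
  df: number;
  inflation: number;
}

const Z_90 = 1.645;          // normal quantile for a 5th to 95th percentile band
const BIAS_CORRECTION = 1.5; // hand-set correction on user claims

function updatePosterior(
  priorTypical: number,
  priorSigma: number,
  observations: Observation[],
  nEffTotal: number,
  alpha0: number = 2,
): Posterior {
  // Start from the prior on the log scale; τ = 1/σ².
  let tau = 1 / (priorSigma * priorSigma);
  let weightedSum = Math.log(priorTypical) * tau;

  for (const obs of observations) {
    const weight = Math.exp(-Math.LN2 * obs.ageYears); // 1-year half-life
    const y = Math.log(obs.claim * BIAS_CORRECTION);
    // ASSUMPTION: the decay weight scales the observation's precision.
    const tauI = weight / (obs.sigma * obs.sigma);
    tau += tauI;
    weightedSum += y * tauI;
  }

  const mu = weightedSum / tau;
  const sigmaPost = 1 / Math.sqrt(tau);

  // Student-t band inflation, widening the band when evidence is sparse.
  const df = 2 * (alpha0 + nEffTotal / 2);
  const inflation = Math.sqrt(df / (df - 2));
  const sigmaBand = sigmaPost * inflation;

  return {
    typical: Math.exp(mu),
    low: Math.exp(mu - Z_90 * sigmaBand),
    high: Math.exp(mu + Z_90 * sigmaBand),
    df,
    inflation,
  };
}

// Made-up inputs for illustration only.
const example = updatePosterior(
  44000,
  0.3,
  [{ claim: 30000, ageYears: 0.5, sigma: 0.6 }],
  1,
);
console.log(example);
```

With α₀ = 2, df reduces to 4 + n_eff_total, so n_eff = 1 gives df = 5 and an inflation of √(5/3) ≈ 1.29, matching the df and infl columns reported for single-observation counters.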

Trained posteriors

For each counter the table shows its prior (hardcoded) vs posterior (trained) typical and 90% band, plus the evidence count. Wide posterior bands indicate sparse data; tight bands indicate strong evidence.

Counter	Prior typical	Posterior typical	Posterior 5th–95th	n_eff	df	infl
Claude Pro 5-hour session	44,000	38,805	32,202–46,763	6.0	10.0	1.12×
Claude Pro Weekly Opus	50	24	12–48	1.0	5.0	1.29×
Claude Max 5× 5-hour session	88,000	91,133	73,782–112,565	2.0	6.0	1.22×
Claude Max 20× 5-hour session	220,000	220,000	176,531–274,173	1.0	5.0	1.29×
ChatGPT Free 3-hour Instant	35	31	21–44	1.0	5.0	1.29×
ChatGPT Plus 3-hour Instant	160	32	22–46	4.9	8.9	1.14×
ChatGPT Plus Weekly Thinking	3,000	1,995	1,401–2,841	1.0	5.0	1.29×
Google AI Pro Daily Pro	100	105	67–166	1.0	5.0	1.29×
Google AI Ultra Daily Pro	500	274	169–444	1.0	5.0	1.29×

v1 closed two of the v0 caveats: σ uncertainty is now modelled via the Student-t inflation, and time-decay downweights stale complaints with a 1-year half-life. Still open and documented in QUOTA_MODEL.md: (1) the 1.5× memory-drift correction on user claims is hand-set and unvalidated; (2) complaint-vs-counter routing is regex-based and noisy; (3) per-source σ is hand-set (Reddit 0.6, tokens 0.8, community P90 0.3) rather than learned from cross-validation residuals; (4) Reddit's self-selection bias (only complaints get posted) means posterior medians may be systematically lower than typical usage even after time-decay and inflation. v2 candidates: counter-routing classifier, cross-validation σ calibration, posterior on modelMultipliers (Opus drain rate, etc.).

Cross-validation against Reddit + HN + GitHub complaints

Every bucket in the tables above is calibrated against provider documentation, news coverage, and community P90 measurements (Maciek-roboblog, IntuitionLabs). Those sources skew toward developer Claude Code users, a narrower population than the consumer chat audience that pays for Claude Pro, ChatGPT Plus, or Google AI Pro. Reddit is the largest publicly searchable pool of consumer-tier complaints, with a much larger n.

We run a periodic scrape against r/ClaudeAI, r/Anthropic, r/ChatGPT, r/OpenAI, r/Bard, r/GeminiAI, and r/singularity for posts that match six rate-limit queries, extract structured fields (provider, plan tier guess, claimed message count, peak-window flag, complaint type), drop usernames, and check whether each plan's typical bucket is consistent with the median complainer's claim. The script and methodology live in /research/ so the data is auditable; the corpus is regenerated by maintainers, not at request time, to respect Reddit's anti-scrape posture.
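
A rough sketch of the consistency check described above. The record fields mirror the extracted fields listed in the paragraph, but the field names, the function names, and the definition of "consistent" (the median claim falls inside the plan's low to high band) are assumptions made for illustration. The sample complaints are invented and are not drawn from the corpus.

```typescript
// Hypothetical record shape for one extracted complaint (usernames dropped).
interface Complaint {
  provider: "anthropic" | "openai" | "google";
  planGuess: string;
  claimedMessages: number | null; // null when no count could be extracted
  peakWindow: boolean;
  complaintType: string;
}

interface BucketBand {
  low: number;
  typical: number;
  high: number;
}

function median(values: number[]): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] as number;
  if (sorted.length % 2 === 1) return upper;
  const lower = sorted[mid - 1] as number;
  return (lower + upper) / 2;
}

// Median claimed message count for one plan, ignoring posts without a count.
function medianClaim(complaints: Complaint[], plan: string): number | undefined {
  const claims: number[] = [];
  for (const c of complaints) {
    if (c.planGuess === plan && c.claimedMessages !== null) {
      claims.push(c.claimedMessages);
    }
  }
  return median(claims);
}

// ASSUMPTION: "consistent" means the median claim lies within [low, high].
function isConsistent(bucket: BucketBand, claim: number | undefined): boolean {
  return claim !== undefined && claim >= bucket.low && claim <= bucket.high;
}

// Invented sample records, for illustration only.
const sample: Complaint[] = [
  { provider: "openai", planGuess: "ChatGPT Plus", claimedMessages: 30, peakWindow: false, complaintType: "limit-hit" },
  { provider: "openai", planGuess: "ChatGPT Plus", claimedMessages: 40, peakWindow: true, complaintType: "limit-hit" },
  { provider: "openai", planGuess: "ChatGPT Plus", claimedMessages: 25, peakWindow: false, complaintType: "limit-hit" },
  { provider: "openai", planGuess: "ChatGPT Plus", claimedMessages: null, peakWindow: false, complaintType: "other" },
];

const plusInstant: BucketBand = { low: 80, typical: 160, high: 240 };
const plusMedian = medianClaim(sample, "ChatGPT Plus");
console.log(plusMedian, isConsistent(plusInstant, plusMedian));
```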

Caveats the corpus carries explicitly: self-selection bias (people only post when something breaks, so the median is a floor on usage, not a typical value), reporting bias (memory drift on “I only sent 8 messages”), time clustering around outage events, and imperfect regex-based plan-tier extraction. Treat it as one signal alongside provider documentation, not as ground truth.

Update cadence

Bucket sizes and peak windows are re-checked manually against provider help centres on a rolling cadence. Each counter carries its own verifiedAt date. When a provider announces a policy change, we update the relevant counter and bump its date. Material changes land in the release notes.

Spot a stale source or a bucket size you can verify against your own usage data? Tell us. Methodology improvements ship in the open.