Ratelimit Architecture #352
Replies: 4 comments
Without special-casing rate limiting, I thought to include the Envoy rate limiter in the policy engine as a Policy Implementation (a user can include a gRPC or REST service as a policy implementation as well), so this fits well with our Policy story. In this case, envoy/ratelimit works as a library for the Policy Engine: during request processing, the Policy Engine would invoke it via an in-process function call instead of making an external gRPC request. If we are not using the Envoy rate limit filter (i.e., Envoy does not call the envoy/ratelimit service directly), there is no value in running envoy/ratelimit as a service; under this approach it becomes a library for the Policy Engine in any case. If we go with our own implementation, we can also use any rate limit algorithm; envoy/ratelimit uses the Fixed Window algorithm. In my opinion, going with a rate limit library or our own implementation is better.
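As a sketch of how rate limiting could plug in as just another policy implementation invoked in-process, here is a minimal Python illustration. All names here (`FixedWindowLimiter`, `RateLimitPolicy`, `check`) are hypothetical, not the actual Policy Engine API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class FixedWindowLimiter:
    """In-process rate limiter used as a library rather than a remote gRPC service."""
    limit: int
    window_seconds: float
    counts: dict = field(default_factory=dict)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        count = self.counts.get((key, window), 0)
        if count >= self.limit:
            return False
        self.counts[(key, window)] = count + 1
        return True

class RateLimitPolicy:
    """Hypothetical policy implementation: the Policy Engine calls check()
    in-process during request processing instead of issuing a gRPC request."""
    def __init__(self, limiter):
        self.limiter = limiter

    def check(self, request, now=None):
        # Key extraction is simplified to the client IP for illustration.
        return self.limiter.allow(request.get("client_ip", "unknown"), now)

policy = RateLimitPolicy(FixedWindowLimiter(limit=2, window_seconds=60))
results = [policy.check({"client_ip": "10.0.0.1"}, now=1000.0) for _ in range(3)]
print(results)  # [True, True, False]
```

The key point of the design is the call path: the rate limit check is an ordinary function call inside the Policy Engine, so it composes with other policies and adds no network hop.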
# Rate Limit Policy Implementation - v0.1.0

Following up on the discussion about the rate limiting architecture, I've implemented the rate limit policy as proposed, integrating it directly into the Policy Engine rather than using the envoy/ratelimit gRPC service.

## Implementation Overview

**1. No Extra Network Latency**

Rate limiting logic runs in-process within the Policy Engine. There is no inter-service gRPC call to an external rate limiter; the check happens directly during request processing via the extproc filter.

**2. No Additional Container Required**

The rate limiter is compiled into the Policy Engine binary. For single-instance deployments, no Redis is needed; it uses in-memory storage.

**3. Multiple Algorithm Support**

Unlike Envoy's built-in Fixed Window approach, this implementation supports selecting the algorithm:

```yaml
# System configuration to select algorithm
systemParameters:
  algorithm: "gcra" # or "fixed-window"
```

**4. Flexible Key Extraction**

Supports multiple key components that can be combined:

```yaml
keyExtraction:
  - type: header
    key: "x-api-key"
  - type: ip
  - type: apiname
  - type: apiversion
  - type: routename
  - type: metadata
    key: "user-id"
```

Multiple components are joined into a single composite key.

**5. Multiple Concurrent Limits**

Supports enforcing multiple limits simultaneously:

```yaml
limits:
  - limit: 10
    duration: "1s"
  - limit: 1000
    duration: "1h"
```

All limits are evaluated, and the most restrictive one is enforced.

**6. Weighted Rate Limiting (Cost Parameter)**

Different operations can consume different amounts of quota:

```yaml
# Expensive operation consumes 10 tokens
- method: POST
  path: /analytics/report
  requestPolicies:
    - name: ratelimit
      params:
        cost: 10
        limits:
          - limit: 100
            duration: "1h"
# Simple query consumes 1 token (default)
- method: GET
  path: /users/{id}
  requestPolicies:
    - name: ratelimit
      params:
        limits:
          - limit: 100
            duration: "1h"
```

**7. Dual Backend Support**

Redis backend configuration:

```yaml
systemParameters:
  backend: "redis"
  redis:
    host: "redis.example.com"
    port: 6379
  failureMode: "open" # Allow requests if Redis is unavailable
```

**8. Comprehensive Rate Limit Headers**

Supports multiple rate limit header standards.
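The `failureMode: "open"` setting can be sketched as a small wrapper around the backend check. This is an illustrative Python sketch of fail-open versus fail-closed semantics, not the actual implementation:

```python
class BackendUnavailable(Exception):
    """Raised when the rate limit backend (e.g. Redis) cannot be reached."""

def check_with_failure_mode(backend_check, key, failure_mode="open"):
    """Apply the configured failureMode when the backend is unreachable.

    failure_mode="open"   -> allow the request (availability over strictness)
    failure_mode="closed" -> deny the request (strictness over availability)
    """
    try:
        return backend_check(key)
    except BackendUnavailable:
        return failure_mode == "open"

def broken_backend(key):
    raise BackendUnavailable("backend unreachable")

print(check_with_failure_mode(broken_backend, "user-1", "open"))    # True
print(check_with_failure_mode(broken_backend, "user-1", "closed"))  # False
```

Fail-open trades strictness for availability: a Redis outage degrades to "no rate limiting" rather than "no traffic".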
## Example Usage

**Basic Rate Limiting (per route)**

```yaml
operations:
  - method: GET
    path: /api/users
    requestPolicies:
      - name: ratelimit
        params:
          limits:
            - limit: 100
              duration: "1m"
```

**Per-User Rate Limiting with API Key**

```yaml
operations:
  - method: GET
    path: /api/users
    requestPolicies:
      - name: ratelimit
        params:
          keyExtraction:
            - type: header
              key: "x-api-key"
          limits:
            - limit: 1000
              duration: "1h"
```

**IP-Based Rate Limiting with Multiple Limits**

```yaml
operations:
  - method: POST
    path: /api/login
    requestPolicies:
      - name: ratelimit
        params:
          keyExtraction:
            - type: ip
          limits:
            - limit: 5
              duration: "1m"
              burst: 10
            - limit: 100
              duration: "1h"
```
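To illustrate how composite keys and multiple concurrent limits could interact, here is a hedged Python sketch. The `:` separator and the helper names are assumptions for illustration, not the policy's actual behavior:

```python
def build_key(request, key_extraction, sep=":"):
    """Join configured key components into one composite rate limit key."""
    parts = []
    for component in key_extraction:
        kind = component["type"]
        if kind == "header":
            parts.append(request["headers"].get(component["key"], ""))
        elif kind == "ip":
            parts.append(request["client_ip"])
        elif kind == "metadata":
            parts.append(request["metadata"].get(component["key"], ""))
        else:  # apiname, apiversion, routename, ...
            parts.append(request.get(kind, ""))
    return sep.join(parts)

def allow_request(counters, key, limits, cost=1):
    """Every configured limit is evaluated; a request passes only if all allow
    it, so the most restrictive limit is the one effectively enforced."""
    buckets = [(key, lim["duration"]) for lim in limits]
    if any(counters.get(b, 0) + cost > lim["limit"]
           for b, lim in zip(buckets, limits)):
        return False
    for b in buckets:
        counters[b] = counters.get(b, 0) + cost
    return True

request = {"headers": {"x-api-key": "abc123"}, "client_ip": "10.0.0.1",
           "metadata": {}}
key = build_key(request, [{"type": "header", "key": "x-api-key"},
                          {"type": "ip"}])
print(key)  # abc123:10.0.0.1

counters = {}
limits = [{"limit": 2, "duration": "1s"}, {"limit": 1000, "duration": "1h"}]
print([allow_request(counters, key, limits) for _ in range(3)])  # [True, True, False]
```

Note the two-pass structure in `allow_request`: counters are only incremented once every limit has agreed, so a denied request does not consume quota from the looser limits.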
# Go Distributed Rate Limiting Libraries

## Overview

This document provides a comprehensive comparison of open-source Go libraries that support global distributed rate limiting using Redis or other cache backends.

## Top Recommendations by Use Case

1. Best Overall for Distributed Rate Limiting: ulule/limiter (2.3k ⭐, 158 forks)
2. Best for Redis-Only Solutions: go-redis/redis_rate (970 ⭐, 102 forks)
3. Best for HTTP APIs: throttled/throttled (1.6k ⭐, 97 forks)
4. Best for Microservices Architecture: mailgun/gubernator (960 ⭐, 96 forks)
5. Most Backend Options: mennanov/limiters (604 ⭐, 59 forks)
6. Best for Single-Instance Performance: juju/ratelimit (2.9k ⭐, 317 forks)

## Algorithm Comparison

- GCRA (Generic Cell Rate Algorithm)
- Token Bucket
- Sliding Window
- Leaky Bucket

## Decision Guide

- Choose ulule/limiter if:
- Choose go-redis/redis_rate if:
- Choose throttled/throttled if:
- Choose mailgun/gubernator if:
- Choose mennanov/limiters if:
- Choose juju/ratelimit if:

## Maintenance and Community Health

- Actively Maintained (2024-2025):
- Moderately Active:
- Lower Activity:

## Final Recommendation

- For distributed rate limiting with Redis, the top 3 choices are:
- For cloud-native/Kubernetes deployments without external cache:
- For maximum backend flexibility:
# Rate Limiting Algorithms

## Overview

This document provides an in-depth explanation of the major rate limiting algorithms used in distributed systems, particularly in Go libraries for Redis-backed rate limiting.
## 1. Token Bucket Algorithm

### Description

The Token Bucket algorithm maintains a bucket that holds tokens. Tokens are added to the bucket at a fixed rate up to a maximum capacity. Each request consumes one or more tokens. If tokens are available, the request is allowed; otherwise, it's denied or delayed.

### Characteristics

**Pros:**

**Cons:**

### Use Cases

### Implementation in Go Libraries

Libraries using Token Bucket:

### Example Configuration

```go
// Bucket capacity: 100 tokens
// Refill rate: 10 tokens/second
// Allows bursts of 100 requests
// Sustained rate: 10 req/s
capacity := 100
refillRate := 10 // per second
```

## 2. Leaky Bucket Algorithm

### Description

The Leaky Bucket algorithm models a bucket with a hole at the bottom. Requests are added to the bucket, and they "leak out" at a constant rate. If the bucket overflows, excess requests are discarded. This ensures a smooth, constant output rate.

### Characteristics

**Pros:**

**Cons:**

### Use Cases

### Implementation in Go Libraries

Libraries using Leaky Bucket:

### Leaky Bucket vs Token Bucket

## 3. GCRA (Generic Cell Rate Algorithm)

### Description

GCRA (Generic Cell Rate Algorithm), also known as the Virtual Scheduling Algorithm, is a sophisticated rate limiting algorithm originally designed for ATM networks. It's mathematically equivalent to the leaky bucket but uses a different approach: instead of tracking the bucket level, it tracks the "theoretical arrival time" (TAT) of the next allowed request.

### Characteristics

**Pros:**

**Cons:**

### Use Cases

### Implementation in Go Libraries

Libraries using GCRA:

### GCRA vs Leaky Bucket

While mathematically equivalent, they differ in implementation:
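A minimal runnable sketch of the TAT bookkeeping described above (parameters are illustrative; note that with this common formulation, the `>=` boundary check admits one request beyond the configured burst):

```python
class GCRA:
    """Single-value limiter: tracks only the theoretical arrival time (TAT)."""
    def __init__(self, rate_per_sec, burst):
        self.emission_interval = 1.0 / rate_per_sec
        self.burst_allowance = burst * self.emission_interval
        self.tat = 0.0  # theoretical arrival time of the next request

    def allow(self, now):
        tat = max(self.tat, now)
        # Requests arriving before (TAT - burst allowance) are too early.
        if now >= tat - self.burst_allowance:
            self.tat = tat + self.emission_interval
            return True
        return False

g = GCRA(rate_per_sec=10, burst=3)
# At t=0, the burst allowance (plus the >= boundary) admits 4 requests,
# then throttling kicks in until TAT catches up.
print([g.allow(now=0.0) for _ in range(6)])  # [True, True, True, True, False, False]
```

Unlike a leaky bucket, no per-request queue or counter is needed; a single float per key is the entire state, which is why GCRA maps so well onto a Redis `GET`/`SET`.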
### Redis Implementation

GCRA is particularly well-suited for Redis because it requires storing only a single value, the TAT:

```lua
-- Redis Lua script for GCRA
local tat = redis.call('GET', KEYS[1])
local now = tonumber(ARGV[1])
local emission_interval = tonumber(ARGV[2])
local burst_capacity = tonumber(ARGV[3])

if tat == false then
    tat = now
else
    tat = tonumber(tat)
end

tat = math.max(tat, now)
local allow_at = tat - (burst_capacity * emission_interval)

if now >= allow_at then
    local new_tat = tat + emission_interval
    redis.call('SET', KEYS[1], new_tat)
    return 1 -- allowed
else
    return 0 -- denied
end
```

## 4. Fixed Window Counter

### Description

The Fixed Window Counter divides time into fixed windows (e.g., 1-minute intervals) and counts requests within each window. When a window ends, the counter resets.

### Characteristics

**Pros:**

**Cons:**

### Use Cases
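The window-reset behavior can be demonstrated with a short sketch: a client that straddles a window boundary briefly achieves twice the configured limit (the boundary problem):

```python
def fixed_window_allow(counters, key, now, limit, window_size):
    """Count requests per (key, window id); reset happens implicitly when
    the window id changes."""
    window = int(now // window_size)
    count = counters.get((key, window), 0)
    if count >= limit:
        return False
    counters[(key, window)] = count + 1
    return True

counters = {}
LIMIT, WINDOW = 100, 60.0

# 100 requests at t=59.9 (end of window 0) and 100 more at t=60.1
# (start of window 1): all 200 pass within 0.2 seconds of wall time.
late = sum(fixed_window_allow(counters, "c", 59.9, LIMIT, WINDOW) for _ in range(100))
early = sum(fixed_window_allow(counters, "c", 60.1, LIMIT, WINDOW) for _ in range(100))
print(late + early)  # 200

# Within a single window, the limit does hold:
print(fixed_window_allow(counters, "c", 59.95, LIMIT, WINDOW))  # False
```

This is the accuracy trade-off that the sliding-window variants below are designed to fix.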
## 5. Sliding Window Log

### Description

The Sliding Window Log keeps a timestamp log of all requests within the time window. For each new request, it removes expired timestamps and checks if the count is below the limit.

### Characteristics

**Pros:**

**Cons:**

### Use Cases
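A minimal sketch of the timestamp-log approach, using a deque so expired entries can be dropped from the front in O(1):

```python
from collections import deque

class SlidingWindowLog:
    """Keeps one timestamp per request: exact, but O(limit) memory per key."""
    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.log = deque()

    def allow(self, now):
        # Drop timestamps that have left the window.
        while self.log and self.log[0] <= now - self.window_size:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

swl = SlidingWindowLog(limit=2, window_size=60.0)
print([swl.allow(0.0), swl.allow(1.0), swl.allow(2.0)])  # [True, True, False]
print(swl.allow(61.0))  # True: the entries at t=0.0 and t=1.0 have expired
```

Because every request is stored individually, there is no boundary problem, but the per-key memory cost is what makes this algorithm a poor fit for high-volume keys.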
## 6. Sliding Window Counter

### Description

The Sliding Window Counter (also called Sliding Window with Counter) is a hybrid approach that combines Fixed Window efficiency with Sliding Window accuracy. It uses two counters (current and previous window) and interpolates between them.

### Characteristics

**Pros:**

**Cons:**

### Use Cases

### Implementation in Go Libraries

Libraries using Sliding Window Counter:
### Redis Implementation

```lua
-- Two Redis keys: current and previous window
local current_key = KEYS[1]
local previous_key = KEYS[2]
local limit = tonumber(ARGV[1])
local window_size = tonumber(ARGV[2])
local now = tonumber(ARGV[3])

local current_count = tonumber(redis.call('GET', current_key) or 0)
local previous_count = tonumber(redis.call('GET', previous_key) or 0)

local position = (now % window_size) / window_size
local estimate = previous_count * (1 - position) + current_count

if estimate < limit then
    redis.call('INCR', current_key)
    redis.call('EXPIRE', current_key, window_size * 2)
    return 1 -- allowed
else
    return 0 -- denied
end
```

## 7. Comparison Matrix
## 8. Algorithm Selection Guide

### By Deployment Type

- Single Instance
- Distributed (Redis-backed)
- Kubernetes/Microservices
- Multi-Cloud/Multi-Region
### By Use Case

1. Public API (like Stripe, GitHub)
2. Internal Microservice
3. User Dashboard/Web App
4. Background Job Queue
5. DDoS Protection

### Real-World Examples

- Example 1: GitHub API
- Example 2: Stripe API
- Example 3: Kong API Gateway
- Example 4: Cloudflare
### Testing Rate Limiters

Test scenarios:

1. Burst Test
2. Sustained Load Test
3. Boundary Test
4. Distributed Test

## Summary

### Quick Recommendations

- For most use cases: Sliding Window Counter (ulule/limiter)
- For strict rate limiting: GCRA (go-redis/redis_rate)
- For allowing bursts: Token Bucket (juju/ratelimit)
- For simplicity: Fixed Window
- Avoid: Sliding Window Log (unless very low volume)
## 9. Remaining Request Tracking

### Overview

A critical feature of rate limiting APIs is the ability to inform clients how many requests they have remaining in the current time window. This section explains how each algorithm calculates and exposes this information.

### Why Tracking Remaining Requests Matters

**Benefits:**

Commonly used HTTP headers (the 429 status code comes from RFC 6585; the `X-RateLimit-*` headers are a de-facto convention rather than a standard):

```
X-RateLimit-Limit: 100         # Maximum requests allowed
X-RateLimit-Remaining: 75      # Requests remaining in window
X-RateLimit-Reset: 1638360000  # When the limit resets (Unix timestamp)
Retry-After: 30                # Seconds until next request allowed
```

### Algorithm-Specific Tracking Capabilities

#### 1. Token Bucket - Excellent Tracking

What it tracks:
Calculation:

```python
def get_remaining_info():
    # Calculate current tokens
    elapsed = now() - last_update
    current_tokens = min(capacity,
                         tokens + elapsed * refill_rate)
    return {
        'remaining': floor(current_tokens),
        'limit': capacity,
        'reset': now() + (capacity - current_tokens) / refill_rate,
        'retry_after': 0 if current_tokens >= 1 else
                       (1 - current_tokens) / refill_rate
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 45
X-RateLimit-Reset: 1638360050
Retry-After: 0
```

**Pros:**

**Cons:**

Go library support:

```go
// juju/ratelimit example
bucket := ratelimit.NewBucket(time.Second/10, 100)
remaining := bucket.Available() // Get remaining tokens
waitTime := bucket.Take(1)      // Time to wait for token
```

#### 2. Leaky Bucket - Moderate Tracking

What it tracks:
Calculation:

```python
def get_remaining_info():
    # Calculate current level after leaking
    elapsed = now() - last_update
    current_level = max(0, level - elapsed * leak_rate)
    return {
        'remaining': capacity - current_level,
        'limit': capacity,
        'reset': now() + current_level / leak_rate,
        'retry_after': max(0, (current_level + 1 - capacity) / leak_rate)
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 23
X-RateLimit-Reset: 1638360100
Retry-After: 5
```

**Pros:**

**Cons:**

#### 3. GCRA - Excellent but Complex Tracking

What it tracks:
Calculation:

```python
def get_remaining_info():
    now_time = now()
    tat_current = max(tat, now_time)

    # Calculate how many requests can be made immediately
    allow_at = tat_current - burst_allowance
    time_to_wait = max(0, allow_at - now_time)

    # Remaining burst capacity
    used_burst = max(0, tat_current - now_time)
    remaining_burst = burst_capacity - (used_burst / emission_interval)

    return {
        'remaining': floor(remaining_burst),
        'limit': burst_capacity,
        'reset': tat_current,
        'retry_after': ceil(time_to_wait)
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 12
X-RateLimit-Reset: 1638360075
Retry-After: 0
```

**Pros:**

**Cons:**

Go library support:

```go
// go-redis/redis_rate example
limiter := redis_rate.NewLimiter(redisClient)
res, err := limiter.Allow(ctx, "key", redis_rate.PerMinute(100))
if err != nil {
    panic(err)
}
fmt.Println("Allowed:", res.Allowed)
fmt.Println("Remaining:", res.Remaining)
fmt.Println("Retry after:", res.RetryAfter)
fmt.Println("Reset after:", res.ResetAfter)
```

#### 4. Fixed Window Counter - Simple Tracking

What it tracks:
Calculation:

```python
def get_remaining_info():
    window_id = floor(now() / window_size)
    current_count = counter[window_id]
    window_start = window_id * window_size
    window_end = window_start + window_size
    return {
        'remaining': max(0, limit - current_count),
        'limit': limit,
        'reset': window_end,
        'retry_after': 0 if current_count < limit else
                       (window_end - now())
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 37
X-RateLimit-Reset: 1638360060
Retry-After: 0
```

**Pros:**

**Cons:**

#### 5. Sliding Window Log - Perfect Tracking

What it tracks:
Calculation:

```python
def get_remaining_info():
    now_time = now()
    # Remove expired entries
    valid_log = [ts for ts in log if ts > now_time - window_size]
    remaining = limit - len(valid_log)

    # Time until oldest entry expires (space becomes available)
    if len(valid_log) >= limit:
        oldest = min(valid_log)
        retry_after = oldest + window_size - now_time
    else:
        retry_after = 0

    return {
        'remaining': remaining,
        'limit': limit,
        'reset': now_time + window_size,
        'retry_after': retry_after
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1638360100
Retry-After: 0
```

**Pros:**

**Cons:**

#### 6. Sliding Window Counter - Good Approximation

What it tracks:
Calculation:

```python
def get_remaining_info():
    now_time = now()
    # Calculate position in current window
    position = (now_time % window_size) / window_size

    # Estimate current usage
    estimated_count = previous_count * (1 - position) + current_count
    remaining = max(0, limit - ceil(estimated_count))

    # Next window reset
    current_window_start = floor(now_time / window_size) * window_size
    next_reset = current_window_start + window_size

    return {
        'remaining': remaining,
        'limit': limit,
        'reset': next_reset,
        'retry_after': 0 if remaining > 0 else (next_reset - now_time)
    }
```

Response headers:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 38
X-RateLimit-Reset: 1638360060
Retry-After: 0
```

**Pros:**

**Cons:**

Go library support:

```go
// ulule/limiter example (result variable renamed to avoid shadowing
// the limiter package name)
import "github.com/ulule/limiter/v3"

instance := limiter.New(store, rate)
limiterCtx, err := instance.Get(ctx, "key")
if err != nil {
    panic(err)
}
fmt.Println("Limit:", limiterCtx.Limit)
fmt.Println("Remaining:", limiterCtx.Remaining)
fmt.Println("Reset:", limiterCtx.Reset)
fmt.Println("Reached:", limiterCtx.Reached)
```

### Comparison: Remaining Request Tracking
*Exact within current window, but has boundary issues.

### Implementation Patterns

#### Pattern 1: Return Metadata with Every Request

Best for: all algorithms

```go
type RateLimitResult struct {
    Allowed    bool      // Was request allowed?
    Limit      int       // Total limit
    Remaining  int       // Remaining requests
    Reset      time.Time // When limit resets
    RetryAfter int       // Seconds to wait if denied
}

func CheckRateLimit(key string) RateLimitResult {
    // Algorithm-specific logic
    return RateLimitResult{
        Allowed:    true,
        Limit:      100,
        Remaining:  42,
        Reset:      time.Now().Add(60 * time.Second),
        RetryAfter: 0,
    }
}
```

#### Pattern 2: Separate Query Method

Best for: high-performance scenarios

```go
// Check limit (modifies state)
allowed := limiter.Allow(ctx, key)

// Query status (read-only)
status := limiter.GetStatus(ctx, key)
fmt.Println("Remaining:", status.Remaining)
```

#### Pattern 3: HTTP Middleware

Best for: web APIs

```go
func RateLimitMiddleware(limiter RateLimiter) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            key := getClientKey(r)
            result := limiter.Check(key)

            // Always set headers
            w.Header().Set("X-RateLimit-Limit", strconv.Itoa(result.Limit))
            w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(result.Remaining))
            w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(result.Reset.Unix(), 10))

            if !result.Allowed {
                w.Header().Set("Retry-After", strconv.Itoa(result.RetryAfter))
                http.Error(w, "Rate limit exceeded", http.StatusTooManyRequests)
                return
            }
            next.ServeHTTP(w, r)
        })
    }
}
```

### Library-Specific Examples

#### go-redis/redis_rate (GCRA)

```go
import "github.com/go-redis/redis_rate/v10"

limiter := redis_rate.NewLimiter(rdb)
res, err := limiter.Allow(ctx, "user:123", redis_rate.PerMinute(100))
if err != nil {
    panic(err)
}
fmt.Printf("Allowed: %v\n", res.Allowed)
fmt.Printf("Limit: %d\n", res.Limit)
fmt.Printf("Remaining: %d\n", res.Remaining)
fmt.Printf("Reset after: %v\n", res.ResetAfter)
fmt.Printf("Retry after: %v\n", res.RetryAfter)
// Output:
// Allowed: true
// Limit: 100
// Remaining: 42
// Reset after: 45s
// Retry after: 0s
```

#### ulule/limiter (Sliding Window Counter)

```go
import (
    "github.com/ulule/limiter/v3"
    "github.com/ulule/limiter/v3/drivers/store/memory"
)

rate := limiter.Rate{
    Period: 1 * time.Minute,
    Limit:  100,
}
store := memory.NewStore()
instance := limiter.New(store, rate)

limiterCtx, err := instance.Get(ctx, "user:123")
if err != nil {
    panic(err)
}
fmt.Printf("Limit: %d\n", limiterCtx.Limit)
fmt.Printf("Remaining: %d\n", limiterCtx.Remaining)
fmt.Printf("Reset: %v\n", limiterCtx.Reset)
fmt.Printf("Reached: %v\n", limiterCtx.Reached)
// Output:
// Limit: 100
// Remaining: 38
// Reset: 2025-12-07 15:45:00 +0000 UTC
// Reached: false
```

#### juju/ratelimit (Token Bucket)

```go
import "github.com/juju/ratelimit"

// 100 requests per second with burst of 100
bucket := ratelimit.NewBucketWithRate(100, 100)

// Get remaining tokens
remaining := bucket.Available()
fmt.Printf("Remaining: %d\n", remaining)

// Try to take a token
if bucket.TakeAvailable(1) > 0 {
    fmt.Println("Request allowed")
} else {
    fmt.Println("Rate limited")
}

// Get wait time for next token
waitDuration := bucket.Take(1)
fmt.Printf("Wait time: %v\n", waitDuration)
// Output:
// Remaining: 45
// Request allowed
// Wait time: 0s
```

#### throttled/throttled (GCRA)

```go
import (
    "github.com/throttled/throttled/v2"
    "github.com/throttled/throttled/v2/store/memstore"
)

store, err := memstore.New(65536)
quota := throttled.RateQuota{
    MaxRate:  throttled.PerMin(100),
    MaxBurst: 10,
}
rateLimiter, err := throttled.NewGCRARateLimiter(store, quota)

limited, result, err := rateLimiter.RateLimit("user:123", 1)
fmt.Printf("Limited: %v\n", limited)
fmt.Printf("Limit: %d\n", result.Limit)
fmt.Printf("Remaining: %d\n", result.Remaining)
fmt.Printf("Reset after: %v\n", result.ResetAfter)
fmt.Printf("Retry after: %v\n", result.RetryAfter)
// Output:
// Limited: false
// Limit: 100
// Remaining: 12
// Reset after: 45s
// Retry after: 0s
```

#### mennanov/limiters (multiple algorithms)

```go
import "github.com/mennanov/limiters"

// Token bucket
limiter := limiters.NewTokenBucket(
    100,            // capacity
    10*time.Second, // refill interval
    limiters.NewSystemClock(),
)

// Check capacity
capacity := limiter.Capacity()
fmt.Printf("Remaining: %d\n", capacity)

// Try to take tokens
err := limiter.Limit(context.Background())
if err != nil {
    fmt.Println("Rate limited")
} else {
    fmt.Println("Request allowed")
}
```

### Best Practices for Exposing Remaining Requests

#### 1. Always Include Standard Headers

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1638360060
```

#### 2. Add Retry-After on Rate Limit

```
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1638360060
```

```json
{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit. Please retry after 30 seconds."
}
```

#### 3. Provide Multiple Time Formats

```json
{
  "rate_limit": {
    "limit": 100,
    "remaining": 42,
    "reset": 1638360060,
    "reset_iso": "2025-12-07T15:41:00Z",
    "reset_relative": "in 30 seconds"
  }
}
```

#### 4. Document Conservative vs Exact Counts

#### 5. Handle Clock Skew in Distributed Systems

```go
// Add safety margin for distributed clock differences
safeRemaining := max(0, calculatedRemaining-2)
w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(safeRemaining))
```

### Common Pitfalls

#### Pitfall 1: Race Conditions

Problem: in distributed systems, the remaining count can change between read and write.

Solution: use atomic operations or accept slight inaccuracy.

```go
// Redis atomic operation
remaining, err := rdb.Eval(ctx, `
local current = redis.call('GET', KEYS[1]) or 0
if tonumber(current) < tonumber(ARGV[1]) then
    redis.call('INCR', KEYS[1])
    return tonumber(ARGV[1]) - tonumber(current) - 1
end
return -1
`, []string{key}, limit).Int()
```

#### Pitfall 2: Negative Remaining Counts

Problem: showing -5 remaining is confusing.

Solution: always floor at 0.

```go
remaining := max(0, limit-current)
```

#### Pitfall 3: Misleading Reset Times

Problem: fixed window shows a reset time that might mislead users.

Solution: clarify the semantics in documentation.

#### Pitfall 4: Not Accounting for Burst

Problem: token bucket shows capacity, not the sustained rate.

Solution: expose both metrics.

```
X-RateLimit-Limit: 100
X-RateLimit-Burst: 100
X-RateLimit-Remaining: 45
X-RateLimit-Sustained-Rate: 10
```

### Summary: Which Algorithm to Choose for Remaining Tracking?

- Best tracking accuracy:
- Best for user experience:
- Best for distributed systems:
- Recommendation:
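The tracking patterns above can be condensed into one small, language-agnostic helper. This Python sketch (illustrative, not taken from any of the libraries discussed) builds the de-facto `X-RateLimit-*` headers from a limiter result, flooring negative remaining counts and emitting `Retry-After` only on denial:

```python
def rate_limit_headers(allowed, limit, remaining, reset_unix, retry_after):
    """Build the de-facto standard X-RateLimit-* headers from a limiter result."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        # Never expose a negative remaining count (Pitfall 2).
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(reset_unix),
    }
    if not allowed:
        # Only denied responses need Retry-After.
        headers["Retry-After"] = str(retry_after)
    return headers

print(rate_limit_headers(False, 100, -3, 1638360060, 30))
print(rate_limit_headers(True, 100, 42, 1638360060, 0))
```

Centralizing header construction like this keeps the middleware algorithm-agnostic: any limiter that can produce `(allowed, limit, remaining, reset, retry_after)` plugs in unchanged.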
## Appendix: Algorithm Pseudocode

### Token Bucket Pseudocode

```python
class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.tokens = capacity
        self.refill_rate = refill_rate
        self.last_update = now()

    def allow_request(self, cost=1):
        # Refill tokens
        elapsed = now() - self.last_update
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_rate)
        self.last_update = now()

        # Check and consume
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

### GCRA Pseudocode

```python
class GCRA:
    def __init__(self, rate, burst):
        self.emission_interval = 1.0 / rate
        self.burst_allowance = burst * self.emission_interval
        self.tat = 0  # Theoretical Arrival Time

    def allow_request(self):
        current = now()
        self.tat = max(self.tat, current)
        allow_at = self.tat - self.burst_allowance
        if current >= allow_at:
            self.tat += self.emission_interval
            return True
        return False
```

### Sliding Window Counter Pseudocode

```python
class SlidingWindowCounter:
    def __init__(self, limit, window_size):
        self.limit = limit
        self.window_size = window_size
        self.previous_count = 0
        self.current_count = 0
        self.current_window_start = now()

    def allow_request(self):
        current = now()
        # Check if we need to rotate windows
        if current - self.current_window_start >= self.window_size:
            self.previous_count = self.current_count
            self.current_count = 0
            self.current_window_start = current

        # Calculate position in current window
        position = (current - self.current_window_start) / self.window_size

        # Estimate current count using interpolation
        estimate = (self.previous_count * (1 - position) +
                    self.current_count)

        if estimate < self.limit:
            self.current_count += 1
            return True
        return False
```

*Document Version: 1.0*
Instead of going with the envoy/ratelimit gRPC service, we can use a library or our own implementation. This eliminates the extra network hop and the extra container, and allows rate limiting to be handled like any other policy.