feat: add rate limiting middleware for production stability #46
base: main
Conversation
Pull request overview
This PR adds a rate limiting middleware to protect the Bindu server from abuse and resource exhaustion. The implementation provides three backend options (Memory, Redis, Memcached) with configurable limits and a fail-open design that allows requests to proceed if the rate limiting backend encounters errors.
Key Changes:
- Core rate limiting middleware with pluggable backend support
- Configuration settings for enabling/disabling rate limiting and selecting backends
- Unit tests covering backend implementations and middleware behavior
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| bindu/server/middleware/rate_limit.py | Implements rate limiting middleware with Memory, Redis, and Memcached backend support |
| bindu/settings.py | Adds RateLimitSettings configuration class with backend selection and limits |
| bindu/server/middleware/__init__.py | Exports RateLimitMiddleware for use in applications |
| bindu/server/applications.py | Integrates rate limit middleware into the application middleware stack |
| pyproject.toml | Adds aiomcache dependency for Memcached support |
| uv.lock | Updates lock file with aiomcache package and dependencies |
| tests/unit/server/middleware/test_rate_limit.py | Provides unit tests for backends and middleware functionality |
Comments suppressed due to low confidence (1)
- `bindu/server/middleware/rate_limit.py:104`: except block directly handles `BaseException` (`except:`).
```python
async def increment(self, key: str, window: int) -> int:
    now = int(time.time())
    if key not in self._counts:
        self._counts[key] = []

    # Filter out old requests
    self._counts[key] = [t for t in self._counts[key] if t > now - window]
    self._counts[key].append(now)

    return len(self._counts[key])
```
Copilot AI, Dec 12, 2025
The MemoryBackend is not thread-safe or process-safe. Concurrent requests could lead to race conditions when filtering old timestamps and appending new ones. Consider using threading locks or noting this limitation in the documentation.
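Within a single event loop, one way to address this is to serialize the read-modify-write behind an `asyncio.Lock`. A minimal sketch (the class name is hypothetical, and this still does not help across worker processes):

```python
import asyncio
import time


class LockedMemoryBackend:
    """In-memory sliding-window counter guarded by an asyncio.Lock.

    Illustrative only: safe against concurrent tasks in one event loop,
    but still not shared across processes.
    """

    def __init__(self):
        self._counts: dict[str, list[int]] = {}
        self._lock = asyncio.Lock()

    async def increment(self, key: str, window: int) -> int:
        now = int(time.time())
        async with self._lock:
            # Drop timestamps outside the window, then record this request
            timestamps = [t for t in self._counts.get(key, []) if t > now - window]
            timestamps.append(now)
            self._counts[key] = timestamps
            return len(timestamps)
```

For multi-process deployments the Redis or Memcached backends remain the right choice; the lock only closes the in-process race.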
```python
# Identify client (simple IP based for now, can be extended to API key)
client_ip = request.client.host if request.client else "unknown"
key = f"ratelimit:{client_ip}"
```
Copilot AI, Dec 12, 2025
Using only client IP for rate limiting can be easily bypassed by attackers using proxies or VPNs. Additionally, this could unfairly limit legitimate users behind shared NATs or corporate proxies. Consider adding support for rate limiting based on API keys, authentication tokens, or implementing a more sophisticated key generation strategy that combines multiple factors.
Suggested change:

```diff
-# Identify client (simple IP based for now, can be extended to API key)
-client_ip = request.client.host if request.client else "unknown"
-key = f"ratelimit:{client_ip}"
+# Identify client using API key, auth token, or IP address
+api_key = request.headers.get("X-API-Key")
+auth_header = request.headers.get("Authorization")
+if api_key:
+    identifier = f"apikey:{api_key}"
+elif auth_header:
+    identifier = f"auth:{auth_header}"
+else:
+    client_ip = request.client.host if request.client else "unknown"
+    identifier = f"ip:{client_ip}"
+key = f"ratelimit:{identifier}"
```
```python
try:
    val = await self._client.incr(b_key, 1)
    return val or 1  # Fallback
except:
```
Copilot AI, Dec 12, 2025
Bare except clause will catch all exceptions including SystemExit and KeyboardInterrupt. This should specify the exception type (e.g., 'except Exception:') to avoid catching exceptions that should not be suppressed.
Suggested change:

```diff
-except:
+except Exception:
```
```python
if current_count > self.limit:
    logger.warning(f"Rate limit exceeded for {client_ip}")
    return JSONResponse(
        {"error": "Too Many Requests", "retry_after": self.window},
        status_code=429,
        headers=headers
    )
```
Copilot AI, Dec 12, 2025
The rate limiting check 'current_count > self.limit' runs after the count has already been incremented, so exactly `limit` requests succeed per window, and the rejected request still increments the count. If the intent is to reject the request that reaches the limit, use '>=' instead, or perform the check before incrementing.
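A quick simulation of the increment-then-check pattern (a hypothetical helper, not project code) makes the semantics concrete: `>` admits exactly `limit` requests per window, while `>=` admits `limit - 1`.

```python
def allowed_requests(limit: int, attempts: int, use_gte: bool = False) -> int:
    """Count how many of `attempts` requests succeed in one window.

    Mirrors the middleware's increment-then-check ordering: the counter
    is bumped first, then compared against the limit.
    """
    count = 0
    allowed = 0
    for _ in range(attempts):
        count += 1  # the backend increments before the check
        blocked = (count >= limit) if use_gte else (count > limit)
        if not blocked:
            allowed += 1
    return allowed
```

With `limit=5` and 10 attempts, `>` allows 5 requests and `>=` allows 4, so the choice of operator is really a choice of off-by-one semantics, not a bug fix on its own.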
```python
class MemoryBackend(RateLimitBackend):
    """In-memory rate limit backend (not shared across processes)."""

    def __init__(self):
        self._counts: dict[str, list[int]] = {}

    async def increment(self, key: str, window: int) -> int:
        now = int(time.time())
        if key not in self._counts:
            self._counts[key] = []

        # Filter out old requests
        self._counts[key] = [t for t in self._counts[key] if t > now - window]
        self._counts[key].append(now)

        return len(self._counts[key])


class RedisBackend(RateLimitBackend):
    """Redis-based rate limit backend."""

    def __init__(self, redis_url: str):
        import redis.asyncio as redis
        self._redis = redis.from_url(redis_url, encoding="utf-8", decode_responses=True)

    async def increment(self, key: str, window: int) -> int:
        pipeline = self._redis.pipeline()
        pipeline.incr(key)
        pipeline.expire(key, window)
        result = await pipeline.execute()
        return result[0]


class MemcachedBackend(RateLimitBackend):
    """Memcached-based rate limit backend."""

    def __init__(self, host: str, port: int):
        import aiomcache
        self._client = aiomcache.Client(host, port)

    async def increment(self, key: str, window: int) -> int:
        # Memcached incr doesn't create key if not exists, so we need add/incr logic
        # However, aiomcache implementation of incr is a bit specific
        # We'll use a simple strategy: try incr, if fails (returns None), set to 1

        # Note: aiomcache key must be bytes
        b_key = key.encode('utf-8')

        try:
            val = await self._client.incr(b_key, 1)
        except Exception:
            val = None

        if val is None:
            # Key might not exist, try adding it
            try:
                await self._client.add(b_key, b"1", exptime=window)
                return 1
            except Exception:
                # If add fails, it might have been created concurrently, try incr again
                try:
                    val = await self._client.incr(b_key, 1)
                    return val or 1  # Fallback
                except:
                    return 1

        return val
```
Copilot AI, Dec 12, 2025
Missing documentation for the MemoryBackend, RedisBackend, and MemcachedBackend classes. These public classes should have docstrings explaining their purpose, limitations (e.g., MemoryBackend is not process-safe), and usage examples to help users understand which backend to choose.
```python
response = await middleware.dispatch(request, call_next)
assert response.status_code == 429
assert response.headers["X-RateLimit-Remaining"] == "0"
import json
```
Copilot AI, Dec 12, 2025
The import of 'json' is placed in the middle of test code. It should be moved to the top of the file with other imports following PEP 8 conventions.
```python
pipeline = self._redis.pipeline()
pipeline.incr(key)
pipeline.expire(key, window)
result = await pipeline.execute()
return result[0]
```
Copilot AI, Dec 12, 2025
The RedisBackend.increment method does not handle the race condition where the key might expire between the incr and expire commands. If the key expires naturally between these two operations, the expire command will fail silently. Consider using INCR with EXPIRE in an atomic transaction or using a different approach with GETEX or other atomic operations.
Suggested change:

```diff
-pipeline = self._redis.pipeline()
-pipeline.incr(key)
-pipeline.expire(key, window)
-result = await pipeline.execute()
-return result[0]
+# Use a Lua script to atomically increment and set expiry
+lua_script = """
+local current
+current = redis.call('INCR', KEYS[1])
+if redis.call('TTL', KEYS[1]) == -1 then
+    redis.call('EXPIRE', KEYS[1], ARGV[1])
+end
+return current
+"""
+result = await self._redis.eval(lua_script, 1, key, window)
+return int(result)
```
```python
app_settings.rate_limit.enabled = False
middleware = RateLimitMiddleware(mock_app)

scope = {"type": "http", "client": ("127.0.0.1", 1234)}
```
Copilot AI, Dec 12, 2025
Variable scope is not used.
Suggested change:

```diff
-scope = {"type": "http", "client": ("127.0.0.1", 1234)}
```
```python
import time
from abc import ABC, abstractmethod
from typing import Any
```
Copilot AI, Dec 12, 2025
Import of 'Any' is not used.
Suggested change:

```diff
-from typing import Any
```
```python
from unittest.mock import AsyncMock, MagicMock
from starlette.requests import Request
from starlette.responses import JSONResponse
from starlette.types import ASGIApp, Receive, Scope, Send
```
Copilot AI, Dec 12, 2025
Import of 'ASGIApp' is not used.
Suggested change:

```diff
-from starlette.types import ASGIApp, Receive, Scope, Send
+from starlette.types import Receive, Scope, Send
```
raahulrahl left a comment
Tushar, great progress, but let's keep it simple first.
PR Description
Title: feat: add rate limiting middleware for production stability
Context
Adding a Rate Limiting Middleware to protect the agent server from abuse, brute-force attacks on auth endpoints, and resource exhaustion.
The API currently lacks visible rate limiting. This PR implements a production-ready middleware that supports:
Memcached (via aiomcache for high performance)
Redis (leveraging existing bindu infrastructure)
In-Memory fallback (for development)
This aligns with the goal of making bindu robust and "production-ready" out of the box.
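The backend selection this implies can be sketched as a small factory. The function name and option keys below are assumptions for illustration; in the PR the backends are wired up inside the middleware itself.

```python
def make_backend(name: str, **opts):
    """Select a rate-limit backend by configured name (sketch).

    Falls back to an in-memory store for any unrecognized name,
    mirroring the "in-memory fallback for development" behaviour.
    The option keys (redis_url, host, port) are assumptions.
    """
    if name == "redis":
        import redis.asyncio as aioredis  # deferred: optional dependency
        return aioredis.from_url(opts["redis_url"])
    if name == "memcached":
        import aiomcache  # deferred: optional dependency
        return aiomcache.Client(opts.get("host", "localhost"),
                                opts.get("port", 11211))
    return {}  # stand-in for the in-memory backend's state dict
```

Deferring the imports keeps `redis` and `aiomcache` optional: a development install that only uses the memory backend never touches them.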
Changes
bindu/server/middleware/rate_limit.py : Core middleware logic with pluggable backends.
bindu/settings.py : Added RateLimitSettings configuration.
bindu/server/applications.py : Integrated middleware at the top of the chain (protected endpoints).
pyproject.toml : Added aiomcache dependency.
tests/unit/server/middleware/test_rate_limit.py : Comprehensive unit tests.
Testing Notes
✅ Unit Tests: Added
tests/unit/server/middleware/test_rate_limit.py
covering Memory, Redis (mock), Memcached (mock) backends and middleware logic.
✅ Manual Verification: Verified X-RateLimit-* headers are returned correctly and 429 status code is triggered when limit is exceeded.
✅ Fail-Open: Verified that backend failures do not crash the request (the error is logged and the request is allowed through).
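The fail-open behaviour boils down to wrapping the backend call, logging on failure, and letting the request through. A minimal sketch under those assumptions (the helper name is hypothetical, not the PR's exact code):

```python
import logging

logger = logging.getLogger("rate_limit")


async def count_or_fail_open(backend_increment, key: str, window: int):
    """Return the current request count, or None if the backend errored.

    None tells the caller to skip rate limiting for this request
    (fail-open), so a Redis or Memcached outage never turns into a 500.
    """
    try:
        return await backend_increment(key, window)
    except Exception:
        logger.exception("rate limit backend failed; allowing request")
        return None
```

The trade-off is deliberate: during a backend outage the server is temporarily unprotected, but availability is preserved, which matches the "logs error and allows request" behaviour verified above.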