Rate Limiting with Spring Cloud Gateway and Redis

In this post, we’ll explore how to implement distributed rate limiting with Spring Cloud Gateway, using Redis as the shared rate-limiting store.

· Prerequisites
· Overview
∘ What Is Rate Limiting?
∘ Why is rate limiting necessary for applications and systems?
∘ Why Use Redis for Rate Limiting?
∘ Rate Limiting Algorithms
∘ Algorithm Matrix
· Let’s code
∘ Setting up a Redis Instance
∘ How Rate Limiting Works with Redis
∘ Spring Cloud Gateway
∘ Configuring Redis
∘ Enabling Rate Limiting in Spring Cloud Gateway
∘ Creating a Key Resolver
∘ Testing
· Conclusion


Prerequisites

This is the list of all the prerequisites:

  • Spring Boot 4
  • Maven 3.6.3 or later
  • Java 21 or later
  • Docker / Docker Compose installed (optional if you’ve already downloaded and installed Redis)
  • Postman, Insomnia, or any other API testing tool
  • IntelliJ IDEA, Visual Studio Code, or another IDE

Overview

What Is Rate Limiting?

In modern microservice architectures, APIs are often exposed to the public internet or consumed by multiple internal services. Without proper safeguards, a single misbehaving client — or a malicious actor — can overwhelm your system.

Rate limiting is a technique used in computer systems to control the rate at which requests are sent or processed in order to maintain system stability and security. In web applications, rate limiting restricts the number of requests that a client can make to a server within a given time period to prevent abuse and ensure fair usage of resources among multiple clients.

Why is rate limiting necessary for applications and systems?

Rate limiting is a foundational security and availability mechanism that controls how frequently users or systems can access an application or resource. By limiting the number of requests, companies can protect against various types of attacks, such as:

DDoS attacks: By restricting the number of requests to a reasonable level, organizations can prevent DDoS attacks from overloading their systems and bringing them down.

Credential stuffing: Limiting login attempts from a single IP address or user can prevent credential stuffing attacks, where attackers use automated scripts to try stolen username and password combinations until one works.

Brute force attacks: Limiting the number of requests or attempts to access a resource can help prevent brute force attacks, where attackers try different combinations of characters to gain access to a system or application.

Data scraping: Rate limiting can help prevent data scraping by restricting the number of requests made by a single user or IP address, making it harder for attackers to harvest sensitive data in bulk.

Why Use Redis for Rate Limiting?

  • In-memory performance: Achieves sub-millisecond response times for rapid rate checks.
  • Atomic operations: Redis ensures thread-safe, atomic INCR and DECR operations for accurate counting.
  • Distributed architecture: Multiple instances or gateways can share the same rate-limiting state seamlessly.
  • TTL support: Automatically expires keys, simplifying time-based rate limiting.
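To see why atomic operations matter, here is a minimal Java sketch (in-memory, not using Redis) that increments a shared counter from several threads with AtomicLong. Like Redis INCR, the atomic increment never loses updates under concurrency; a plain `long` counter would. The class name is illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounterDemo {

    // Shared request counter, analogous to a Redis key incremented with INCR.
    static final AtomicLong counter = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        int threads = 8;
        int incrementsPerThread = 10_000;

        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // incrementAndGet is atomic, like Redis INCR:
                    // no updates are lost even under concurrency.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }

        System.out.println("Counted: " + counter.get()); // 80000, never less
    }
}
```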

Rate Limiting Algorithms

Rate limiting algorithms define how requests are counted and restricted over time. Each algorithm has different trade-offs in terms of accuracy, memory usage, and behavior during traffic spikes.

1. Token Bucket

A token bucket is a container with a pre-defined capacity. Tokens are added to the bucket at a preset rate. Once the bucket is full, no more tokens are added. Each request consumes one token; if the bucket is empty, the request is dropped.

2. Leaky Bucket

Requests enter a bucket (queue) and leak out at a constant rate. If the bucket overflows, excess requests are dropped.

3. Fixed Window Counter

The fixed window counter algorithm works as follows:

  • The algorithm divides the timeline into fixed-sized time windows and assigns a counter to each window.
  • Each request increments the counter by one.
  • Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts.

4. Sliding Window Log

Each request timestamp is stored and evaluated against a moving time window. This provides precise rate limiting but requires more memory and processing, making it less suitable for high-throughput systems.

5. Sliding Window Counter

A hybrid approach that combines fixed windows with sliding behavior by weighting counts from the current and previous windows. It offers better accuracy than fixed windows with lower overhead than full sliding logs.
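As a concrete illustration of the token bucket, here is a minimal single-process Java sketch. The class name and refill scheme are illustrative and not taken from any library; a distributed limiter would keep this state in Redis instead of a field.

```java
public class TokenBucket {

    private final long capacity;        // maximum tokens the bucket can hold
    private final double refillPerSec;  // tokens added per second
    private double tokens;              // current token count
    private long lastRefillNanos;       // timestamp of the last refill

    public TokenBucket(long capacity, double refillPerSec) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefillNanos = System.nanoTime();
    }

    /** Try to take one token; returns false if the bucket is empty. */
    public synchronized boolean tryConsume() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false;
    }

    // Add tokens proportional to elapsed time, capped at capacity.
    private void refill() {
        long now = System.nanoTime();
        double elapsedSec = (now - lastRefillNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastRefillNanos = now;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(5, 1.0); // capacity 5, 1 token/sec
        int allowed = 0;
        for (int i = 0; i < 10; i++) {                // 10 near-instant requests
            if (bucket.tryConsume()) allowed++;
        }
        System.out.println("Allowed: " + allowed);    // only the burst of 5 passes
    }
}
```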

Algorithm Matrix

| Algorithm              | Accuracy                  | Memory usage | Allows bursts     | Complexity |
| ---------------------- | ------------------------- | ------------ | ----------------- | ---------- |
| Token Bucket           | Good                      | Low          | Yes               | Low        |
| Leaky Bucket           | Good                      | Low          | No (smooths them) | Low        |
| Fixed Window Counter   | Approximate (edge spikes) | Low          | At window edges   | Low        |
| Sliding Window Log     | Exact                     | High         | No                | High       |
| Sliding Window Counter | Good approximation        | Low          | Limited           | Medium     |

Let’s code

Setting up a Redis Instance

This post uses the Docker approach to create a Redis instance. Here is the full content of the docker-compose file:

version: '3.8'

services:
  redis:
    image: redis:7-alpine
    container_name: gateway-redis
    ports:
      - "6379:6379"
    volumes:
      - redis-limiter-data:/data
    command: redis-server --appendonly yes
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

volumes:
  redis-limiter-data:

How Rate Limiting Works with Redis

  1. User Request: The user sends a request (e.g., an API call) to the system.
  2. API Gateway: The gateway receives the request and checks the rate limit in Redis.
  3. Check Redis Counter:
  • The system queries Redis for the current request count associated with the user (or IP address).
  • Redis performs an atomic INCR operation to track the requests.
  4. Counter < Limit?:
  • If the counter is less than the limit, the request is allowed.
  • If the counter is greater than or equal to the limit, the request is rejected with a 429 Too Many Requests status.
  5. Counter Expiration (TTL): Redis automatically expires the key after the specified time window, resetting the counter to 0.
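The steps above can be sketched in plain Java, simulating the Redis INCR + TTL pattern with an in-memory map (class and method names are illustrative; in production the counter lives in Redis so all gateway instances share it):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FixedWindowLimiter {

    // One counting window per client key: when it started and how many requests.
    private record Window(long startMillis, long count) {}

    private final Map<String, Window> counters = new ConcurrentHashMap<>();
    private final long limit;        // max requests per window
    private final long windowMillis; // window length (the counter's "TTL")

    public FixedWindowLimiter(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the request is allowed, false for a 429. */
    public boolean allow(String clientKey, long nowMillis) {
        Window updated = counters.compute(clientKey, (k, w) -> {
            // Expired or missing window: start a fresh one (Redis would
            // have dropped the key via TTL), resetting the counter to 1.
            if (w == null || nowMillis - w.startMillis() >= windowMillis) {
                return new Window(nowMillis, 1);
            }
            // Otherwise increment atomically, like Redis INCR.
            return new Window(w.startMillis(), w.count() + 1);
        });
        return updated.count() <= limit;
    }

    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter(3, 1000);
        long t = 0;
        for (int i = 1; i <= 5; i++) {
            System.out.println("request " + i + ": "
                    + (limiter.allow("1.2.3.4", t) ? "200 OK" : "429 Too Many Requests"));
        }
        // A request in the next window succeeds again.
        System.out.println("next window: "
                + (limiter.allow("1.2.3.4", t + 1000) ? "200 OK" : "429 Too Many Requests"));
    }
}
```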

Spring Cloud Gateway

We’ll start by creating a simple Spring Boot project from start.spring.io, with the following dependencies: Reactive Gateway, Spring Data Reactive Redis, Lombok.

Configuring Redis

In application.yml, configure the Redis connection:

spring:
application:
name: spring-cloud-gateway-rate-limiter

data:
redis:
host: localhost
port: 6379
timeout: 2000ms

If you’re using Redis in Docker or a cloud provider, adjust the host and port accordingly.

  • If Redis doesn’t respond within 2 seconds, the operation fails.
  • Important: a low timeout prevents the gateway from hanging if Redis is down.

Enabling Rate Limiting in Spring Cloud Gateway

Spring Cloud Gateway’s Redis rate limiter uses the token bucket algorithm.

spring:
  cloud:
    gateway:
      server:
        webflux:
          routes:
            - id: local-demo-service
              uri: http://localhost:8080
              predicates:
                - Path=/auth/**
              filters:
                - RewritePath=/auth/(?<segment>.*), /internal/${segment}
                - name: RequestRateLimiter
                  args:
                    redis-rate-limiter.replenishRate: 10
                    redis-rate-limiter.burstCapacity: 20
                    redis-rate-limiter.requestedTokens: 1
                    key-resolver: "#{@userKeyResolver}"
  • replenishRate: defines how many requests per second to allow (without any dropped requests). This is the rate at which the token bucket is filled.
  • burstCapacity: the maximum number of requests a user is allowed in a single second (without any dropped requests). This is the number of tokens the token bucket can hold. Setting this value to zero blocks all requests.
  • requestedTokens: how many tokens a request costs. This is the number of tokens taken from the bucket for each request and defaults to 1.
  • key-resolver: determines how requests are grouped (e.g., by IP address or user).
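To make the interplay of these parameters concrete, here is a small deterministic Java simulation of a token bucket with replenishRate 10 and burstCapacity 20. This is a sketch of the algorithm only, not Spring’s actual RedisRateLimiter code:

```java
public class RateLimitSimulation {

    public static void main(String[] args) {
        double capacity = 20;     // burstCapacity: tokens the bucket can hold
        double refillPerSec = 10; // replenishRate: tokens added per second
        double tokens = capacity; // bucket starts full

        // 30 requests arrive at the same instant: only the burst passes.
        int allowed = 0;
        for (int i = 0; i < 30; i++) {
            if (tokens >= 1) { tokens -= 1; allowed++; }
        }
        System.out.println("Instant burst of 30: " + allowed + " allowed");

        // One second later, replenishRate tokens have been added back.
        tokens = Math.min(capacity, tokens + refillPerSec);
        allowed = 0;
        for (int i = 0; i < 30; i++) {
            if (tokens >= 1) { tokens -= 1; allowed++; }
        }
        System.out.println("One second later: " + allowed + " allowed");
    }
}
```

So a client can burst up to 20 requests at once, then sustain at most 10 requests per second.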

Creating a Key Resolver

The RequestRateLimiter GatewayFilter factory uses a RateLimiter implementation to determine if the current request is allowed to proceed. If it is not, a status of HTTP 429 - Too Many Requests (by default) is returned.

The key resolver is a bean that implements the KeyResolver interface. In the configuration, you reference the bean by name using SpEL: #{@userKeyResolver} is a SpEL expression that references a bean named userKeyResolver.

import java.util.Objects;

import org.springframework.cloud.gateway.filter.ratelimit.KeyResolver;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import reactor.core.publisher.Mono;

@Configuration
public class RateLimiterConfig {

    /**
     * Rate limiting by IP address.
     */
    @Bean
    public KeyResolver userKeyResolver() {
        return exchange -> Mono.just(
                Objects.requireNonNull(exchange.getRequest().getRemoteAddress())
                        .getAddress()
                        .getHostAddress()
        );
    }
}

The Key Resolver is critical — it determines the rate limiting boundary.
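Behind a load balancer or proxy, the remote address may be the proxy’s rather than the client’s, so you would often prefer a forwarded header. The helper below is a hypothetical standalone sketch of that selection logic, outside Spring (the class and method names are illustrative):

```java
import java.util.Map;
import java.util.Optional;

public class KeyResolution {

    /**
     * Picks the rate-limiting key: the first X-Forwarded-For entry if a
     * proxy set it, otherwise the remote address.
     */
    static String resolveKey(Map<String, String> headers, String remoteAddress) {
        return Optional.ofNullable(headers.get("X-Forwarded-For"))
                .map(v -> v.split(",")[0].trim()) // first hop = original client
                .filter(s -> !s.isEmpty())
                .orElse(remoteAddress);
    }

    public static void main(String[] args) {
        // Request that passed through a proxy: key on the original client IP.
        System.out.println(resolveKey(
                Map.of("X-Forwarded-For", "203.0.113.9, 10.0.0.2"), "10.0.0.2"));
        // Direct request: fall back to the remote address.
        System.out.println(resolveKey(Map.of(), "192.168.1.10"));
    }
}
```

Note that clients can spoof X-Forwarded-For, so only trust it when the gateway sits behind a proxy you control.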

Testing

  1. Start Redis.
  2. Run the Spring Cloud Gateway application.

Scenario 1: Single Call

Send one request to the API endpoint. The request should succeed.

Scenario 2: Multiple Calls

Expected behavior:

  • The first requests (within the rate limit) return HTTP 200 (OK).
  • Once the limit is exceeded, requests return HTTP 429 (Too Many Requests).

Open Runner in Postman and select the collection and the specific request you want to test. Then, set the number of iterations (for example, 20 or 50) to send repeated requests to the same API endpoint.

Click Run to execute the requests and observe the responses. You should see 429 Too Many Requests errors once the limit is exceeded.

Result in Redis:

After 3 seconds, the key will expire automatically, and the token count will reset.

Conclusion

🏁 Well done! In this post, we implemented rate limiting using Spring Cloud Gateway and Redis.

Implementing rate limiting with Spring Cloud Gateway and Redis is straightforward and powerful. It helps protect your APIs from overuse while providing a scalable, distributed solution.

The complete source code is available on GitHub.

Support me through GitHub Sponsors.

Thank you for reading!! See you in the next post.

