Rate Limits
Default rate limits, per-endpoint behavior, and how to configure them.
Appstrate uses `rate-limiter-flexible` for rate limiting. With `REDIS_URL` set, buckets are shared across instances (`RateLimiterRedis`). Without Redis, buckets are process-local (`RateLimiterMemory`) and reset on restart.
How buckets are keyed
Rate limits are per-endpoint, not just per-identity. The key format is:
- Session auth: `{method}:{path}:{userId}`
- API key auth: `{method}:{path}:apikey:{apiKeyId}`
- Unauthenticated / public routes: `ip:{method}:{path}:{ipAddr}`
- Internal bearer tokens: `internal:{path}:{tokenPrefix}`
Two identical identities hitting different paths have separate counters, so a single API key can simultaneously hit its POST /run limit without affecting its POST /end-users limit.
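The key formats above can be sketched as a single helper. This is an illustrative reconstruction, not Appstrate's actual code; the function and type names here are invented for clarity.

```typescript
// Hypothetical sketch of the bucket-key scheme described above.
type Identity =
  | { kind: "session"; userId: string }
  | { kind: "apiKey"; apiKeyId: string }
  | { kind: "ip"; ipAddr: string }
  | { kind: "internal"; tokenPrefix: string };

function rateLimitKey(method: string, path: string, id: Identity): string {
  switch (id.kind) {
    case "session":
      return `${method}:${path}:${id.userId}`;
    case "apiKey":
      return `${method}:${path}:apikey:${id.apiKeyId}`;
    case "ip":
      // Public routes lead with "ip:" so anonymous traffic never
      // collides with authenticated buckets.
      return `ip:${method}:${path}:${id.ipAddr}`;
    case "internal":
      // Internal tokens are keyed by path only, per the format above.
      return `internal:${path}:${id.tokenPrefix}`;
  }
}
```

Because the method and path are part of the key, the same API key gets an independent counter for every endpoint it touches.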
On top of per-endpoint limits, two org-wide rate limits run in parallel:
- The org-wide run rate limit (keyed by `orgId`, configured via `PLATFORM_RUN_LIMITS.per_org_global_rate_per_min`) governs any run launch.
- The org-wide inline rate limit (keyed by `orgId`, `INLINE_RUN_LIMITS.rate_per_min`) is layered on top for `POST /api/runs/inline`.
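Put together, an inline-run launch must clear three buckets. The sketch below is illustrative only: Appstrate keys the two org-wide limiters by `orgId`, and the `run:` / `inline:` prefixes are invented here purely to keep the example unambiguous.

```typescript
// Hypothetical: every bucket an inline-run request must pass. A 429 from
// ANY of these rejects the request.
function bucketsForInlineRun(orgId: string, apiKeyId: string): string[] {
  return [
    `POST:/api/runs/inline:apikey:${apiKeyId}`, // per-endpoint bucket
    `run:${orgId}`,    // org-wide run rate (per_org_global_rate_per_min)
    `inline:${orgId}`, // org-wide inline rate (INLINE_RUN_LIMITS.rate_per_min)
  ];
}
```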
Platform-wide limits (PLATFORM_RUN_LIMITS)
A JSON object environment variable, validated strictly at boot by `apps/api/src/services/run-limits.ts`. Defaults are conservative but production-ready; the system is never "unlimited" out of the box.
```json
{
  "timeout_ceiling_seconds": 1800,
  "per_org_global_rate_per_min": 200,
  "max_concurrent_per_org": 50
}
```

| Key | Default | Meaning |
|---|---|---|
| `timeout_ceiling_seconds` | 1800 (30 min) | Max runtime per run. Clamps any agent-declared timeout. Hitting the ceiling emits a `run.timeout` webhook event. |
| `per_org_global_rate_per_min` | 200 | Runs per minute per organization (agent run + inline run + scheduled run). |
| `max_concurrent_per_org` | 50 | Concurrent runs per organization. Extra runs are rejected with 429 until a slot frees. |
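Strict boot-time validation means a malformed override fails fast instead of silently falling back. The reader below is a hypothetical sketch of that behavior; the real validator lives in `apps/api/src/services/run-limits.ts` and may differ in detail.

```typescript
// Hypothetical: parse PLATFORM_RUN_LIMITS from the environment, applying
// the documented defaults when unset and throwing at boot on bad values.
interface PlatformRunLimits {
  timeout_ceiling_seconds: number;
  per_org_global_rate_per_min: number;
  max_concurrent_per_org: number;
}

const DEFAULTS: PlatformRunLimits = {
  timeout_ceiling_seconds: 1800,
  per_org_global_rate_per_min: 200,
  max_concurrent_per_org: 50,
};

function loadPlatformRunLimits(raw: string | undefined): PlatformRunLimits {
  if (raw === undefined) return DEFAULTS; // defaults apply; never "unlimited"
  const parsed = JSON.parse(raw); // throws at boot on malformed JSON
  for (const key of Object.keys(DEFAULTS) as (keyof PlatformRunLimits)[]) {
    const v = parsed[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v <= 0) {
      throw new Error(`PLATFORM_RUN_LIMITS.${key} must be a positive integer`);
    }
  }
  return parsed as PlatformRunLimits;
}
```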
Inline run limits (INLINE_RUN_LIMITS)
Applies only to `POST /api/runs/inline`.
```json
{
  "rate_per_min": 60,
  "manifest_bytes": 65536,
  "prompt_bytes": 200000,
  "max_skills": 20,
  "max_tools": 20,
  "max_authorized_uris": 50,
  "wildcard_uri_allowed": false,
  "retention_days": 30
}
```

| Key | Default | Meaning |
|---|---|---|
| `rate_per_min` | 60 | Inline runs per minute per org |
| `manifest_bytes` | 65536 | Max size of the inline manifest |
| `prompt_bytes` | 200000 | Max size of the agent prompt |
| `max_skills` | 20 | Max number of skills declared in the inline manifest |
| `max_tools` | 20 | Max number of tools declared in the inline manifest |
| `max_authorized_uris` | 50 | Max entries in each `authorizedUris` allowlist |
| `wildcard_uri_allowed` | false | Allow `*` in `authorizedUris` entries |
| `retention_days` | 30 | Days before the shadow package backing an inline run is garbage-collected |
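Clients can pre-flight a manifest against these caps before submitting. The checker below is a hypothetical client-side mirror of the table above, not Appstrate's server-side validator; the interface and function names are invented.

```typescript
// Hypothetical pre-flight check against INLINE_RUN_LIMITS-style caps.
interface InlineLimits {
  manifest_bytes: number;
  max_skills: number;
  max_tools: number;
  max_authorized_uris: number;
  wildcard_uri_allowed: boolean;
}

interface InlineManifest {
  skills: string[];
  tools: string[];
  authorizedUris: string[];
}

// Returns the names of all limits the manifest would violate.
function violations(m: InlineManifest, lim: InlineLimits): string[] {
  const out: string[] = [];
  // UTF-8 byte length, since manifest_bytes caps size, not char count.
  const bytes = new TextEncoder().encode(JSON.stringify(m)).length;
  if (bytes > lim.manifest_bytes) out.push("manifest_bytes");
  if (m.skills.length > lim.max_skills) out.push("max_skills");
  if (m.tools.length > lim.max_tools) out.push("max_tools");
  if (m.authorizedUris.length > lim.max_authorized_uris) out.push("max_authorized_uris");
  if (!lim.wildcard_uri_allowed && m.authorizedUris.some((u) => u.includes("*")))
    out.push("wildcard_uri_allowed");
  return out;
}
```

Running this locally avoids burning a point of the `rate_per_min` bucket on a request the server would reject anyway.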
Per-endpoint limits
These are set via the `rateLimit(N)` middleware and recorded as `method:path:identity` buckets.
| Endpoint | Limit / min |
|---|---|
| `POST /api/agents/@scope/name/run` | 20 |
| `POST /api/runs/inline`, `/api/runs/inline/validate` | `INLINE_RUN_LIMITS.rate_per_min` (default 60) |
| `POST /api/packages/import`, `/api/packages/import-github` | 10 |
| `GET /api/packages/@scope/name/{version}/download` | 50 |
| `POST /api/end-users` | 60 |
| `GET /api/end-users`, `/api/end-users/:id` | 300 |
| `PATCH /api/end-users/:id`, `DELETE /api/end-users/:id` | 60 |
| `POST /api/proxies/:id/test` | 5 |
| `POST /api/provider-keys/test`, `/api/provider-keys/:id/test` | 5 |
| `POST /api/models/test`, `/api/models/:id/test` | 5 |
| `GET /api/models/openrouter` | 10 |
| `POST /api/connection-profiles` | 10 |
| `POST /api/app-profiles`, `/api/app-profiles/:id/bind` | 10 |
| `POST /api/schedules` | 10 |
| `POST /api/uploads` | 20 |
Unauthenticated routes (signup, login, public health checks) are keyed by IP with equivalent per-route limits. The single global "60/min per IP" bucket described in earlier versions of this doc never existed.
Response on limit
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 15
RateLimit: limit=20, remaining=0, reset=15
RateLimit-Policy: 20;w=60

{
  "type": "https://appstrate.dev/errors/rate-limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "code": "rate_limited",
  "retryAfter": 15,
  "requestId": "req_..."
}
```

The three rate-limit headers (`RateLimit` and `RateLimit-Policy` from the IETF RateLimit header fields draft, plus `Retry-After`) are emitted on every 429 and on non-429 responses from rate-limited routes, so clients can pre-emptively back off. Respect `Retry-After` (seconds) and implement exponential backoff for repeated 429s.
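A client-side retry policy following this advice can be reduced to one small function. This is a hypothetical helper, not an Appstrate SDK API; the surrounding retry loop is sketched in comments.

```typescript
// Hypothetical: compute how long to wait before retrying after a 429.
// Prefers the server's Retry-After (seconds); falls back to exponential
// backoff when the header is missing or unparseable.
function backoffDelayMs(retryAfterHeader: string | null, attempt: number): number {
  const retryAfter = Number(retryAfterHeader);
  return Number.isFinite(retryAfter) && retryAfter > 0
    ? retryAfter * 1000      // the server said exactly when to retry
    : 1000 * 2 ** attempt;   // fallback: 1s, 2s, 4s, 8s, ...
}

// Sketch of the retry loop around a fetch-style client:
// for (let attempt = 0; attempt < 5; attempt++) {
//   const res = await fetch(url, init);
//   if (res.status !== 429) return res;
//   await sleep(backoffDelayMs(res.headers.get("Retry-After"), attempt));
// }
```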
Per-endpoint specifics
Realtime (SSE)
SSE endpoints (`/api/realtime/*`) have no per-message rate limit — once the stream is open, the server fans events out as they arrive. Connection establishment goes through the normal auth pipeline but is not subject to a dedicated rate limiter.
Webhook deliveries (outbound from Appstrate)
Outbound webhook deliveries run in a BullMQ worker outside the HTTP request pipeline, so they are not subject to these limits. Plan your receiver's rate limits to accommodate burst retries (the retry backoff is 30s → 5min → 30min → 1h → 2h → 3h → 4h, see Webhooks).
Write endpoints with Idempotency-Key
A replayed request (same key, same body) returns the cached response without consuming a new rate-limit point. Note that the original request that populated the cache did consume a point when it was first executed.
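For a retry to hit the replay cache, it must carry the same key and body as the original. The helper below is a hypothetical client-side sketch (not an Appstrate SDK function) showing why the key should be generated once and reused, not regenerated per attempt.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical: build a POST request whose retries are safe to replay.
function idempotentPost(body: unknown): {
  method: string;
  headers: Record<string, string>;
  body: string;
} {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Idempotency-Key": randomUUID(), // generate ONCE, reuse on every retry
    },
    body: JSON.stringify(body),
  };
}

// Usage sketch: build the request object once, then pass the SAME object to
// every retry attempt. Calling idempotentPost() again would mint a fresh key
// and defeat the replay cache — each attempt would consume a new point.
```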
Monitoring
Rate-limit hit counts are emitted as structured log lines at DEBUG level. Forward Appstrate logs to your observability stack and alert on sustained rate_limited codes to detect either abusive clients or misconfigured per-endpoint caps.
Bypasses
There is no built-in bypass. Admin role, dev mode, and IP allowlists are all unimplemented. If you need differentiated limits for a specific tenant, either:
- Adjust `PLATFORM_RUN_LIMITS` globally and scale other endpoint caps as needed, or
- Run your API key behind a reverse proxy that applies its own rate policy before reaching Appstrate.