Self-Hosting

Rate Limits

Default rate limits, per-endpoint behavior, and how to configure them.

Appstrate uses rate-limiter-flexible for rate limiting. With REDIS_URL set, buckets are shared across instances (RateLimiterRedis). Without Redis, buckets are process-local (RateLimiterMemory) and reset on restart.

How buckets are keyed

Rate limits are per-endpoint, not just per-identity. The key format is:

  • Session auth: {method}:{path}:{userId}
  • API key auth: {method}:{path}:apikey:{apiKeyId}
  • Unauthenticated / public routes: ip:{method}:{path}:{ipAddr}
  • Internal bearer tokens: internal:{path}:{tokenPrefix}

Because the path is part of the key, the same identity gets an independent counter per endpoint: a single API key can exhaust its POST /run limit without affecting its POST /end-users limit.
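
The key formats above can be sketched as a small helper (illustrative only; paths and the `bucketKey` name are assumptions, the real logic lives inside Appstrate's rate-limit middleware):

```typescript
// Build a rate-limit bucket key from the request method, path, and caller
// identity, following the four formats documented above.
type Identity =
  | { kind: "session"; userId: string }
  | { kind: "apiKey"; apiKeyId: string }
  | { kind: "ip"; ipAddr: string }
  | { kind: "internal"; tokenPrefix: string };

function bucketKey(method: string, path: string, id: Identity): string {
  switch (id.kind) {
    case "session":
      return `${method}:${path}:${id.userId}`;
    case "apiKey":
      return `${method}:${path}:apikey:${id.apiKeyId}`;
    case "ip":
      return `ip:${method}:${path}:${id.ipAddr}`;
    case "internal":
      // Internal bearer tokens are keyed by path only, not method.
      return `internal:${path}:${id.tokenPrefix}`;
  }
}

// The same API key gets independent counters on different paths:
const a = bucketKey("POST", "/api/agents/@acme/bot/run", { kind: "apiKey", apiKeyId: "ak_1" });
const b = bucketKey("POST", "/api/end-users", { kind: "apiKey", apiKeyId: "ak_1" });
// a !== b, so exhausting one bucket never touches the other
```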

On top of per-endpoint limits, two org-wide rate limits run in parallel:

  • The org-wide run rate limit (keyed by orgId, configured via PLATFORM_RUN_LIMITS.per_org_global_rate_per_min) governs any run launch.
  • The org-wide inline rate limit (keyed by orgId, INLINE_RUN_LIMITS.rate_per_min) is layered on top for POST /api/runs/inline.
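
The layering can be sketched with in-memory counters (assumed names; real buckets live in rate-limiter-flexible and expire after 60 s, which this sketch omits). An inline run launch must pass both org-wide buckets:

```typescript
// Fixed-window counter standing in for a real per-minute rate limiter.
class MinuteCounter {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}
  tryConsume(key: string): boolean {
    const n = this.counts.get(key) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(key, n + 1);
    return true;
  }
}

// Defaults from this page: 200 runs/min per org, 60 inline runs/min per org.
const orgRunLimiter = new MinuteCounter(200);
const inlineLimiter = new MinuteCounter(60);

// POST /api/runs/inline must clear BOTH org-wide buckets, each keyed by orgId.
function allowInlineRun(orgId: string): boolean {
  return orgRunLimiter.tryConsume(orgId) && inlineLimiter.tryConsume(orgId);
}

// Exhaust the inline bucket: the 61st launch in the same minute is refused
// even though the org-wide run bucket still has headroom.
let granted = 0;
for (let i = 0; i < 61; i++) if (allowInlineRun("org_1")) granted++;
```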

Platform-wide limits (PLATFORM_RUN_LIMITS)

A JSON object set via the PLATFORM_RUN_LIMITS environment variable, validated strictly at boot by apps/api/src/services/run-limits.ts. Defaults are conservative but production-ready; the system is never "unlimited" out of the box.

{
  "timeout_ceiling_seconds": 1800,
  "per_org_global_rate_per_min": 200,
  "max_concurrent_per_org": 50
}
  • timeout_ceiling_seconds (default 1800, i.e. 30 min): max runtime per run. Clamps any agent-declared timeout; hitting the ceiling emits a run.timeout webhook event.
  • per_org_global_rate_per_min (default 200): runs per minute per organization, counting agent runs, inline runs, and scheduled runs.
  • max_concurrent_per_org (default 50): concurrent runs per organization. Extra runs are rejected with 429 until a slot frees up.
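
The strict boot-time validation can be approximated like this (a sketch; the function name and the exact failure behavior are assumptions, the authoritative validator is apps/api/src/services/run-limits.ts):

```typescript
interface PlatformRunLimits {
  timeout_ceiling_seconds: number;
  per_org_global_rate_per_min: number;
  max_concurrent_per_org: number;
}

const DEFAULTS: PlatformRunLimits = {
  timeout_ceiling_seconds: 1800,
  per_org_global_rate_per_min: 200,
  max_concurrent_per_org: 50,
};

// Parse the env var, falling back to defaults when unset; any malformed or
// non-positive value fails fast at boot instead of silently degrading.
function parsePlatformRunLimits(raw: string | undefined): PlatformRunLimits {
  if (!raw) return DEFAULTS;
  const parsed = JSON.parse(raw);
  for (const key of Object.keys(DEFAULTS) as (keyof PlatformRunLimits)[]) {
    const v = parsed[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v <= 0) {
      throw new Error(`PLATFORM_RUN_LIMITS.${key} must be a positive integer`);
    }
  }
  return parsed as PlatformRunLimits;
}
```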

Inline run limits (INLINE_RUN_LIMITS)

Applies only to POST /api/runs/inline.

{
  "rate_per_min": 60,
  "manifest_bytes": 65536,
  "prompt_bytes": 200000,
  "max_skills": 20,
  "max_tools": 20,
  "max_authorized_uris": 50,
  "wildcard_uri_allowed": false,
  "retention_days": 30
}
  • rate_per_min (default 60): inline runs per minute per org.
  • manifest_bytes (default 65536): max size of the inline manifest, in bytes.
  • prompt_bytes (default 200000): max size of the agent prompt, in bytes.
  • max_skills (default 20): max number of skills declared in the inline manifest.
  • max_tools (default 20): max number of tools declared in the inline manifest.
  • max_authorized_uris (default 50): max entries in each authorizedUris allowlist.
  • wildcard_uri_allowed (default false): whether * is allowed in authorizedUris entries.
  • retention_days (default 30): days before the shadow package backing an inline run is garbage-collected.
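
A client can pre-flight the size and URI limits before submitting (illustrative sketch with assumed names; the authoritative validation happens server-side on POST /api/runs/inline and /api/runs/inline/validate):

```typescript
const limits = { manifest_bytes: 65536, max_authorized_uris: 50, wildcard_uri_allowed: false };

// Return a list of limit violations for an inline manifest payload.
function checkInlineManifest(manifestJson: string, authorizedUris: string[]): string[] {
  const problems: string[] = [];
  // Limits are byte lengths, not character counts: multibyte UTF-8 counts fully.
  if (new TextEncoder().encode(manifestJson).length > limits.manifest_bytes)
    problems.push("manifest exceeds manifest_bytes");
  if (authorizedUris.length > limits.max_authorized_uris)
    problems.push("too many authorizedUris entries");
  if (!limits.wildcard_uri_allowed && authorizedUris.some((u) => u.includes("*")))
    problems.push("wildcard in authorizedUris but wildcard_uri_allowed=false");
  return problems;
}
```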

Per-endpoint limits

These are set via the rateLimit(N) middleware and recorded as method:path:identity buckets.

  • POST /api/agents/@scope/name/run: 20/min
  • POST /api/runs/inline and /api/runs/inline/validate: INLINE_RUN_LIMITS.rate_per_min (default 60/min)
  • POST /api/packages/import and /api/packages/import-github: 10/min
  • GET /api/packages/@scope/name/{version}/download: 50/min
  • POST /api/end-users: 60/min
  • GET /api/end-users and /api/end-users/:id: 300/min
  • PATCH /api/end-users/:id and DELETE /api/end-users/:id: 60/min
  • POST /api/proxies/:id/test: 5/min
  • POST /api/provider-keys/test and /api/provider-keys/:id/test: 5/min
  • POST /api/models/test and /api/models/:id/test: 5/min
  • GET /api/models/openrouter: 10/min
  • POST /api/connection-profiles: 10/min
  • POST /api/app-profiles and /api/app-profiles/:id/bind: 10/min
  • POST /api/schedules: 10/min
  • POST /api/uploads: 20/min

Unauthenticated routes (signup, login, public health checks) are keyed by IP with equivalent per-route limits. Earlier versions of this doc described a single global "60/min per IP" bucket; no such bucket exists.

Response on limit

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 15
RateLimit: limit=20, remaining=0, reset=15
RateLimit-Policy: 20;w=60

{
  "type": "https://appstrate.dev/errors/rate-limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "code": "rate_limited",
  "retryAfter": 15,
  "requestId": "req_..."
}

The RateLimit and RateLimit-Policy headers follow the IETF draft on rate-limit header fields; together with the standard Retry-After header they are emitted on every 429 and on non-429 responses from rate-limited routes, so clients can back off pre-emptively. Respect Retry-After (seconds) and implement exponential backoff for repeated 429s.
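
The recommended client behavior can be sketched as follows (assumed names; `doRequest` stands in for your HTTP client, and the 250 ms base and jitter-free growth are illustrative choices):

```typescript
interface Reply {
  status: number;
  retryAfterSeconds?: number; // parsed from the Retry-After header
}

// Never wait less than the server's Retry-After hint; otherwise grow the
// delay exponentially from a 250 ms base across repeated 429s.
function backoffDelayMs(retryAfterSeconds: number, attempt: number): number {
  return Math.max(retryAfterSeconds * 1000, 2 ** attempt * 250);
}

// Retry a request on 429 until it succeeds or maxRetries is exhausted.
async function withBackoff(doRequest: () => Promise<Reply>, maxRetries = 5): Promise<Reply> {
  for (let attempt = 0; ; attempt++) {
    const reply = await doRequest();
    if (reply.status !== 429 || attempt >= maxRetries) return reply;
    const delay = backoffDelayMs(reply.retryAfterSeconds ?? 1, attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```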

Per-endpoint specifics

Realtime (SSE)

SSE endpoints (/api/realtime/*) have no per-message rate limit; once the stream is open, the server fans events out as they arrive. Connection establishment goes through the normal auth pipeline but has no dedicated rate limiter.

Webhook deliveries (outbound from Appstrate)

Outbound webhook deliveries run in a BullMQ worker outside the HTTP request pipeline, so they are not subject to these limits. Plan your receiver's rate limits to accommodate burst retries (the retry backoff is 30s → 5min → 30min → 1h → 2h → 3h → 4h, see Webhooks).

Write endpoints with Idempotency-Key

A replayed request (same key, same body) returns the cached response without consuming a new rate-limit point. Note that the original request that populated the cache did consume a point when it was first executed.

Monitoring

Rate-limit hit counts are emitted as structured log lines at DEBUG level. Forward Appstrate logs to your observability stack and alert on sustained rate_limited codes to detect either abusive clients or misconfigured per-endpoint caps.

Bypasses

There is no built-in bypass: admin roles, dev mode, and IP allowlists are all unimplemented. If you need differentiated limits for a specific tenant, either:

  • Adjust PLATFORM_RUN_LIMITS globally and scale other endpoint caps as needed, or
  • Put a reverse proxy in front of Appstrate that applies its own rate policy for that tenant's API keys before requests reach Appstrate.
