Self-Hosting

Rate Limits

Default rate limits, per-endpoint behavior, and how to configure them.

Appstrate uses rate-limiter-flexible for rate limiting. With REDIS_URL set, buckets are shared across instances (RateLimiterRedis). Without Redis, buckets are process-local (RateLimiterMemory) and reset on restart.

How buckets are keyed

Rate limits are per-endpoint, not just per-identity. The key format is:

  • Session auth: {method}:{path}:{userId}
  • API key auth: {method}:{path}:apikey:{apiKeyId}
  • Unauthenticated / public routes: ip:{method}:{path}:{ipAddr}
  • Internal bearer tokens: internal:{path}:{tokenPrefix}

Because the path is part of the key, the same identity gets an independent counter per endpoint: a single API key can exhaust its POST /run limit without affecting its POST /end-users limit.
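
The key formats above can be sketched as a small helper (illustrative only; paths and the `bucketKey` name are assumptions, the real logic lives inside Appstrate's rate-limit middleware):

```typescript
// Build a rate-limit bucket key from the request method, path, and caller
// identity, following the four formats documented above.
type Identity =
  | { kind: "session"; userId: string }
  | { kind: "apiKey"; apiKeyId: string }
  | { kind: "ip"; ipAddr: string }
  | { kind: "internal"; tokenPrefix: string };

function bucketKey(method: string, path: string, id: Identity): string {
  switch (id.kind) {
    case "session":
      return `${method}:${path}:${id.userId}`;
    case "apiKey":
      return `${method}:${path}:apikey:${id.apiKeyId}`;
    case "ip":
      return `ip:${method}:${path}:${id.ipAddr}`;
    case "internal":
      // Internal bearer tokens are keyed by path only, not method.
      return `internal:${path}:${id.tokenPrefix}`;
  }
}

// The same API key gets independent counters on different paths:
const a = bucketKey("POST", "/api/agents/@acme/bot/run", { kind: "apiKey", apiKeyId: "ak_1" });
const b = bucketKey("POST", "/api/end-users", { kind: "apiKey", apiKeyId: "ak_1" });
// a !== b, so exhausting one bucket never touches the other
```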

On top of per-endpoint limits, two org-wide rate limits run in parallel:

  • The org-wide run rate limit (keyed by orgId, configured via PLATFORM_RUN_LIMITS.per_org_global_rate_per_min) governs any run launch.
  • The org-wide inline rate limit (keyed by orgId, INLINE_RUN_LIMITS.rate_per_min) is layered on top for POST /api/runs/inline.
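
The layering can be sketched with in-memory counters (assumed names; real buckets live in rate-limiter-flexible and expire after 60 s, which this sketch omits). An inline run launch must pass both org-wide buckets:

```typescript
// Fixed-window counter standing in for a real per-minute rate limiter.
class MinuteCounter {
  private counts = new Map<string, number>();
  constructor(private limit: number) {}
  tryConsume(key: string): boolean {
    const n = this.counts.get(key) ?? 0;
    if (n >= this.limit) return false;
    this.counts.set(key, n + 1);
    return true;
  }
}

// Defaults from this page: 200 runs/min per org, 60 inline runs/min per org.
const orgRunLimiter = new MinuteCounter(200);
const inlineLimiter = new MinuteCounter(60);

// POST /api/runs/inline must clear BOTH org-wide buckets, each keyed by orgId.
function allowInlineRun(orgId: string): boolean {
  return orgRunLimiter.tryConsume(orgId) && inlineLimiter.tryConsume(orgId);
}

// Exhaust the inline bucket: the 61st launch in the same minute is refused
// even though the org-wide run bucket still has headroom.
let granted = 0;
for (let i = 0; i < 61; i++) if (allowInlineRun("org_1")) granted++;
```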

Platform-wide limits (PLATFORM_RUN_LIMITS)

A JSON object set via the PLATFORM_RUN_LIMITS environment variable, validated strictly at boot by apps/api/src/services/run-limits.ts. Defaults are conservative but production-ready; the system is never "unlimited" out of the box.

{
  "timeout_ceiling_seconds": 1800,
  "per_org_global_rate_per_min": 200,
  "max_concurrent_per_org": 50
}
  • timeout_ceiling_seconds (default 1800, i.e. 30 min): max runtime per run. Clamps any agent-declared timeout; hitting the ceiling emits a run.timeout webhook event.
  • per_org_global_rate_per_min (default 200): runs per minute per organization, counting agent runs, inline runs, and scheduled runs.
  • max_concurrent_per_org (default 50): concurrent runs per organization. Extra runs are rejected with 429 until a slot frees up.
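
The strict boot-time validation can be approximated like this (a sketch; the function name and the exact failure behavior are assumptions, the authoritative validator is apps/api/src/services/run-limits.ts):

```typescript
interface PlatformRunLimits {
  timeout_ceiling_seconds: number;
  per_org_global_rate_per_min: number;
  max_concurrent_per_org: number;
}

const DEFAULTS: PlatformRunLimits = {
  timeout_ceiling_seconds: 1800,
  per_org_global_rate_per_min: 200,
  max_concurrent_per_org: 50,
};

// Parse the env var, falling back to defaults when unset; any malformed or
// non-positive value fails fast at boot instead of silently degrading.
function parsePlatformRunLimits(raw: string | undefined): PlatformRunLimits {
  if (!raw) return DEFAULTS;
  const parsed = JSON.parse(raw);
  for (const key of Object.keys(DEFAULTS) as (keyof PlatformRunLimits)[]) {
    const v = parsed[key];
    if (typeof v !== "number" || !Number.isInteger(v) || v <= 0) {
      throw new Error(`PLATFORM_RUN_LIMITS.${key} must be a positive integer`);
    }
  }
  return parsed as PlatformRunLimits;
}
```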

Inline run limits (INLINE_RUN_LIMITS)

Applies only to POST /api/runs/inline.

{
  "rate_per_min": 60,
  "manifest_bytes": 65536,
  "prompt_bytes": 200000,
  "max_skills": 20,
  "max_tools": 20,
  "max_authorized_uris": 50,
  "wildcard_uri_allowed": false,
  "retention_days": 30
}
  • rate_per_min (default 60): inline runs per minute per org.
  • manifest_bytes (default 65536): max size of the inline manifest, in bytes.
  • prompt_bytes (default 200000): max size of the agent prompt, in bytes.
  • max_skills (default 20): max number of skills declared in the inline manifest.
  • max_tools (default 20): max number of tools declared in the inline manifest.
  • max_authorized_uris (default 50): max entries in each authorizedUris allowlist.
  • wildcard_uri_allowed (default false): whether * is allowed in authorizedUris entries.
  • retention_days (default 30): days before the shadow package backing an inline run is garbage-collected.
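
A client can pre-flight the size and URI limits before submitting (illustrative sketch with assumed names; the authoritative validation happens server-side on POST /api/runs/inline and /api/runs/inline/validate):

```typescript
const limits = { manifest_bytes: 65536, max_authorized_uris: 50, wildcard_uri_allowed: false };

// Return a list of limit violations for an inline manifest payload.
function checkInlineManifest(manifestJson: string, authorizedUris: string[]): string[] {
  const problems: string[] = [];
  // Limits are byte lengths, not character counts: multibyte UTF-8 counts fully.
  if (new TextEncoder().encode(manifestJson).length > limits.manifest_bytes)
    problems.push("manifest exceeds manifest_bytes");
  if (authorizedUris.length > limits.max_authorized_uris)
    problems.push("too many authorizedUris entries");
  if (!limits.wildcard_uri_allowed && authorizedUris.some((u) => u.includes("*")))
    problems.push("wildcard in authorizedUris but wildcard_uri_allowed=false");
  return problems;
}
```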

Per-endpoint limits

These are set via the rateLimit(N) middleware and recorded as method:path:identity buckets.

  • POST /api/agents/@scope/name/run: 20/min
  • POST /api/runs/inline and /api/runs/inline/validate: INLINE_RUN_LIMITS.rate_per_min (default 60/min)
  • POST /api/packages/import and /api/packages/import-github: 10/min
  • GET /api/packages/@scope/name/{version}/download: 50/min
  • POST /api/end-users: 60/min
  • GET /api/end-users and /api/end-users/:id: 300/min
  • PATCH /api/end-users/:id and DELETE /api/end-users/:id: 60/min
  • POST /api/proxies/:id/test: 5/min
  • POST /api/provider-keys/test and /api/provider-keys/:id/test: 5/min
  • POST /api/models/test and /api/models/:id/test: 5/min
  • GET /api/models/openrouter: 10/min
  • POST /api/connection-profiles: 10/min
  • POST /api/app-profiles and /api/app-profiles/:id/bind: 10/min
  • POST /api/schedules: 10/min
  • POST /api/uploads: 20/min

Unauthenticated routes (signup, login, public health checks) are keyed by IP with equivalent per-route limits. Earlier versions of this doc described a single global "60/min per IP" bucket; no such bucket exists.

Response on limit

HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
Retry-After: 15
RateLimit: limit=20, remaining=0, reset=15
RateLimit-Policy: 20;w=60

{
  "type": "https://appstrate.dev/errors/rate-limited",
  "title": "Rate limit exceeded",
  "status": 429,
  "code": "rate_limited",
  "retryAfter": 15,
  "requestId": "req_..."
}

The RateLimit and RateLimit-Policy headers follow the IETF draft on rate-limit header fields; together with the standard Retry-After header they are emitted on every 429 and on non-429 responses from rate-limited routes, so clients can back off pre-emptively. Respect Retry-After (seconds) and implement exponential backoff for repeated 429s.
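
The recommended client behavior can be sketched as follows (assumed names; `doRequest` stands in for your HTTP client, and the 250 ms base and jitter-free growth are illustrative choices):

```typescript
interface Reply {
  status: number;
  retryAfterSeconds?: number; // parsed from the Retry-After header
}

// Never wait less than the server's Retry-After hint; otherwise grow the
// delay exponentially from a 250 ms base across repeated 429s.
function backoffDelayMs(retryAfterSeconds: number, attempt: number): number {
  return Math.max(retryAfterSeconds * 1000, 2 ** attempt * 250);
}

// Retry a request on 429 until it succeeds or maxRetries is exhausted.
async function withBackoff(doRequest: () => Promise<Reply>, maxRetries = 5): Promise<Reply> {
  for (let attempt = 0; ; attempt++) {
    const reply = await doRequest();
    if (reply.status !== 429 || attempt >= maxRetries) return reply;
    const delay = backoffDelayMs(reply.retryAfterSeconds ?? 1, attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```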

Per-endpoint specifics

Realtime (SSE)

SSE endpoints (/api/realtime/*) have no per-message rate limit; once the stream is open, the server fans events out as they arrive. Connection establishment goes through the normal auth pipeline but has no dedicated rate limiter.

Webhook deliveries (outbound from Appstrate)

Outbound webhook deliveries run in a BullMQ worker outside the HTTP request pipeline, so they are not subject to these limits. Plan your receiver's rate limits to accommodate burst retries (the retry backoff is 30s → 5min → 30min → 1h → 2h → 3h → 4h, see Webhooks).

Write endpoints with Idempotency-Key

A replayed request (same key, same body) returns the cached response without consuming a new rate-limit point. Note that the original request that populated the cache did consume a point when it was first executed.

Monitoring

Rate-limit hit counts are emitted as structured log lines at DEBUG level. Forward Appstrate logs to your observability stack and alert on sustained rate_limited codes to detect either abusive clients or misconfigured per-endpoint caps.

Bypasses

There is no built-in bypass: admin roles, dev mode, and IP allowlists are all unimplemented. If you need differentiated limits for a specific tenant, either:

  • Adjust PLATFORM_RUN_LIMITS globally and scale other endpoint caps as needed, or
  • Put a reverse proxy in front of Appstrate that applies its own rate policy for that tenant's API keys before requests reach Appstrate.
