Overview
Why we rate-limit
Two things are scarce on our end: the worker pool that runs humanization, and our Claude API budget. Rate limits exist to keep both of those healthy when traffic spikes — yours or someone else’s. The limits are intentionally generous for a sane integration and intentionally strict against runaway scripts.
Three tiers cover everything we serve, each scoped either per IP (auth endpoints and public reads) or per authenticated user (uploads). Everything below applies to the public production API at https://api.inksong.app right now. There’s no “coming soon” in this section.
The limits
What applies where
| Endpoint group | Limit | Scope |
|---|---|---|
| Auth endpoints: `/auth/login`, `/auth/register`, `/auth/refresh` | 10 / minute | per IP |
| Document uploads: `POST /api/v1/documents/upload` | 20 / hour | per authenticated user |
| Public reads (everything else) | 60 / minute | per IP |
Uploads are scoped per authenticated user rather than per IP on purpose. A school computer lab or a corporate VPN can put a dozen of your customers behind a single egress address — counting them all against the same bucket would let one user starve the rest. The per-user-per-hour upload limit is the real cost-protective one: each upload triggers a worker job and a Claude API call, and twenty an hour is well above what any legitimate single-user workflow needs.
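If you would rather not discover the upload limit through 429s, you can pace uploads on your side. Below is a minimal sketch of a client-side throttle, assuming a sliding one-hour window. This section doesn't say whether the server counts in a sliding or a fixed window, so treat the client-side count as a courtesy and still handle 429s. `SlidingWindowThrottle` and `try_acquire` are hypothetical names, not part of any Inksong client library.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Hypothetical client-side throttle: at most `limit` calls per `window` seconds.

    Assumes a sliding window; the server's own counting may differ, so this
    only reduces 429s, it does not replace handling them.
    """

    def __init__(
        self,
        limit: int = 20,          # uploads allowed per window (from the table)
        window: float = 3600.0,   # one hour, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent uploads
        self._lock = threading.Lock()

    def try_acquire(self) -> float:
        """Record an upload and return 0.0 if under the limit.

        Otherwise record nothing and return the seconds to wait before
        the oldest upload ages out of the window.
        """
        with self._lock:
            now = self._clock()
            # Forget uploads that are older than the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.limit:
                self._sent.append(now)
                return 0.0
            return self._sent[0] + self.window - now
```

Keep one instance per authenticated user to match the per-user scope. Call `try_acquire()` before each upload; if it returns a non-zero value, sleep that long and call it again.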
429s
The rate-limit response
When you exceed any of the limits above, we respond with HTTP 429 Too Many Requests, a Retry-After header in whole seconds, and the standard application-error body shape.
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42

{"detail": "Rate limit exceeded"}
```

The number in `Retry-After` is the number of seconds until your bucket has capacity again. Always respect it. Hammering past a 429 will keep your bucket pinned at zero and won’t earn you any throughput; it just makes the next legitimate request take longer.
We’ve seen integrations loop on 429 with a 50ms back-off. Don’t. Your worker pool will idle, our worker pool will idle, and you’ll see no successful responses until traffic naturally drops off.
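The same failure mode shows up when several threads share one bucket (one IP, or one authenticated user): a single thread that honors `Retry-After` isn't enough on its own, because the other threads keep sending into the empty bucket. One way to prevent that, sketched below under the assumption that your workers run in the same process, is a shared "not before" deadline. Any 429 pushes the deadline out by `Retry-After` seconds, and every request waits for it first. `RetryGate`, `block_for`, and `wait` are hypothetical names, not something the API provides.

```python
import threading
import time
from typing import Callable


class RetryGate:
    """Hypothetical shared deadline for all threads that draw on one bucket.

    After any 429, call block_for(<Retry-After seconds>).
    Before every request, call wait() so no thread sends early.
    """

    def __init__(
        self,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._clock = clock
        self._sleep = sleep
        self._not_before = float("-inf")  # no deadline until the first 429
        self._lock = threading.Lock()

    def block_for(self, seconds: float) -> None:
        # Only ever move the deadline later; a shorter Retry-After seen by
        # another thread must not let everyone resume early.
        with self._lock:
            self._not_before = max(self._not_before, self._clock() + seconds)

    def wait(self) -> float:
        """Sleep until the deadline, if one is pending; return the time slept."""
        with self._lock:
            remaining = self._not_before - self._clock()
        if remaining > 0:
            self._sleep(remaining)
            return remaining
        return 0.0
```

The clock and sleep functions are injectable only so the gate can be exercised without real waiting; in production the defaults are what you want.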
Back-off
Recommended pattern
The pattern is simple: send the request; on a 429, sleep for `Retry-After` seconds, then retry. Cap retries at three so a transient outage doesn’t turn into a hung process.
```python
import time

import httpx


def call_with_backoff(
    client: httpx.Client, request: httpx.Request, *, max_retries: int = 3
) -> httpx.Response:
    """Send `request`, retrying up to `max_retries` times on HTTP 429."""
    response = client.send(request)
    for attempt in range(max_retries):
        if response.status_code != 429:
            break
        try:
            # Retry-After is given in whole seconds.
            delay = int(response.headers["Retry-After"])
        except (KeyError, ValueError):
            # Header missing or unparsable: exponential back-off (1s, 2s, 4s).
            delay = 2 ** attempt
        time.sleep(delay)
        response = client.send(request)
    # Success, a non-429 error, or the final 429 once retries are exhausted.
    return response
```

If `Retry-After` is somehow missing (middleware, a proxy, or a bug on our end can strip it), fall back to exponential back-off starting at one second and doubling. Three retries is enough for a transient blip; beyond that, surface the error to your caller and let them decide.