API Docs

Rate limits

Three tiers, three scopes. They’re live on production today — not aspirational — and the values below are the real numbers.

Overview

Why we rate-limit

Two things are scarce on our end: the worker pool that runs humanization, and our Claude API budget. Rate limits exist to keep both of those healthy when traffic spikes — yours or someone else’s. The limits are intentionally generous for a sane integration and intentionally strict against runaway scripts.

Three tiers cover everything we serve, each with its own scope: per IP for auth endpoints, per authenticated user for uploads, and per IP for everything else. Everything below applies to the public production API at https://api.inksong.app right now. There’s no “coming soon” in this section.

The limits

What applies where

Endpoint group                                              | Limit       | Scope
Auth endpoints (/auth/login, /auth/register, /auth/refresh) | 10 / minute | per IP
Document uploads (POST /api/v1/documents/upload)            | 20 / hour   | per authenticated user
Public reads (everything else)                              | 60 / minute | per IP

Uploads are scoped per authenticated user rather than per IP on purpose. A school computer lab or a corporate VPN can put a dozen of your customers behind a single egress address — counting them all against the same bucket would let one user starve the rest. The per-user-per-hour upload limit is the real cost-protective one: each upload triggers a worker job and a Claude API call, and twenty an hour is well above what any legitimate single-user workflow needs.
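If you would rather not hit the upload limit at all, a client can count its own uploads and pause before the twenty-first. The sketch below is one way to do that, not something the API provides: SlidingWindowLimiter and upload_limiter are hypothetical names, and it assumes a sliding one-hour window. The server's actual windowing isn't specified here, so treat this as a conservative guard and still handle 429s.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            return 0.0
        # Window is full: wait until the oldest call ages out.
        return self.window - (now - self._stamps[0])

    def record(self):
        """Note that a call was just made."""
        self._stamps.append(self.clock())


# Assumed values taken from the table above: 20 uploads per hour per user.
upload_limiter = SlidingWindowLimiter(limit=20, window=3600)
```

Before each upload, sleep for upload_limiter.wait_time() seconds, send the request, then call upload_limiter.record().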

429s

The rate-limit response

When you exceed any of the limits above, we respond with HTTP 429 Too Many Requests, a Retry-After header in whole seconds, and the standard application-error body shape.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42

{"detail": "Rate limit exceeded"}

The number in Retry-After is the number of seconds until your bucket has capacity again. Always respect it. Hammering past a 429 will keep your bucket pinned at zero and won’t earn you any throughput; it just makes the next legitimate request take longer.
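As a rough illustration of reading that response, the helper below pulls the wait time and the detail message out of a 429. It uses only the standard library and takes the status, headers, and body as plain values, so it is independent of any HTTP client; describe_rate_limit is a hypothetical name, not part of the API.

```python
import json


def describe_rate_limit(status_code, headers, body):
    """Return (wait_seconds, detail) for a 429, or None for any other status."""
    if status_code != 429:
        return None
    # Header names are case-insensitive, so normalise before looking one up.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        wait = int(lowered.get("retry-after", ""))
    except ValueError:
        wait = None  # header missing or not a whole number of seconds
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, TypeError, AttributeError):
        detail = None  # body was not the JSON object shown above
    return wait, detail


print(describe_rate_limit(
    429,
    {"Retry-After": "42"},
    '{"detail": "Rate limit exceeded"}',
))
# prints (42, 'Rate limit exceeded')
```

A wait of None means the header was absent or unparseable, which is the case where a caller would need its own fallback delay.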

Don’t retry without waiting

We’ve seen integrations loop on 429 with a 50ms back-off. Don’t. Your worker pool will idle, our worker pool will idle, and you’ll see no successful responses until traffic naturally drops off.

Back-off

Recommended pattern

The pattern is simple: try the request; on a 429, sleep for Retry-After seconds, then retry. Cap retries at three so a transient outage doesn’t turn into a hung process.

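import time
import httpx

def call_with_backoff(client, request, *, max_retries=3):
    """Send request; on 429, wait as instructed and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        response = client.send(request)
        if response.status_code != 429 or attempt == max_retries:
            # Success, a non-rate-limit error, or out of retries:
            # hand the response (possibly the final 429) to the caller.
            return response
        try:
            delay = int(response.headers["Retry-After"])
        except (KeyError, ValueError):
            # Header missing or not whole seconds: exponential fallback (1s, 2s, 4s).
            delay = 2 ** attempt
        time.sleep(max(delay, 0))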

If Retry-After is somehow missing — middleware, proxy, a bug on our end — fall back to exponential back-off starting at one second and doubling. Three retries is enough for a transient blip; beyond that, surface the error to your caller and let them decide.