Rate limits — Inksong

Overview

Why we rate-limit

Two things are scarce on our end: the worker pool that runs humanization, and our Claude API budget. Rate limits exist to keep both of those healthy when traffic spikes — yours or someone else’s. The limits are intentionally generous for a sane integration and intentionally strict against runaway scripts.

Three tiers cover everything we serve: auth endpoints and public reads are limited per IP, and document uploads are limited per authenticated user. Everything below applies to the public production API at https://api.inksong.app right now. There’s no “coming soon” in this section.

The limits

What applies where

Endpoint group	Limit	Scope
Auth endpoints: `/auth/login`, `/auth/register`, `/auth/refresh`	10 / minute	per IP
Document uploads: `POST /api/v1/documents/upload`	20 / hour	per authenticated user
Public reads: everything else	60 / minute	per IP

Uploads are scoped per authenticated user rather than per IP on purpose. A school computer lab or a corporate VPN can put a dozen of your customers behind a single egress address — counting them all against the same bucket would let one user starve the rest. The per-user-per-hour upload limit is the real cost-protective one: each upload triggers a worker job and a Claude API call, and twenty an hour is well above what any legitimate single-user workflow needs.
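For client-side planning, the table above can be sketched as a small lookup. This is only an illustration: `Limit`, `AUTH_PATHS`, and `limit_for` are hypothetical names, and the exact-path matching is an assumption about how requests map onto the three groups, not a description of how the server routes them.

```python
from typing import NamedTuple


class Limit(NamedTuple):
    requests: int        # allowed requests per window
    window_seconds: int  # window length in seconds
    scope: str           # "ip" or "user"


# Paths taken from the table; exact-match routing is an assumption.
AUTH_PATHS = {"/auth/login", "/auth/register", "/auth/refresh"}
UPLOAD = ("POST", "/api/v1/documents/upload")


def limit_for(method: str, path: str) -> Limit:
    """Return the documented limit that would apply to a request."""
    if path in AUTH_PATHS:
        return Limit(10, 60, "ip")      # 10 / minute, per IP
    if (method.upper(), path) == UPLOAD:
        return Limit(20, 3600, "user")  # 20 / hour, per authenticated user
    return Limit(60, 60, "ip")          # everything else: 60 / minute, per IP
```

A batch job could use a lookup like this to pace itself below the limit instead of discovering it through 429s.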

429s

The rate-limit response

When you exceed any of the limits above, we respond with HTTP 429 Too Many Requests, a Retry-After header in whole seconds, and the standard application-error body shape.

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42

{"detail": "Rate limit exceeded"}

The number in Retry-After is the number of seconds until your bucket has capacity again. Always respect it. Hammering past a 429 will keep your bucket pinned at zero and won’t earn you any throughput; it just makes the next legitimate request take longer.
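As a rough sketch of reading that response on the client side, the helper below pulls out the wait time and the `detail` message. `RateLimited` and `parse_rate_limit` are hypothetical names, and the function takes a plain status code, header mapping, and body string so that it does not depend on any particular HTTP library.

```python
import json
from typing import Mapping, NamedTuple, Optional


class RateLimited(NamedTuple):
    retry_after: Optional[int]  # whole seconds, or None if absent or unparseable
    detail: str                 # message from the error body, "" if unavailable


def parse_rate_limit(
    status_code: int, headers: Mapping[str, str], body: str
) -> Optional[RateLimited]:
    """Return rate-limit details for a 429 response, or None otherwise."""
    if status_code != 429:
        return None
    # Header names are case-insensitive; normalise for plain dicts.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after", "").strip()
    retry_after = int(raw) if raw.isdigit() else None
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        detail = ""  # body was not a JSON object
    return RateLimited(retry_after, detail)
```

Returning `None` for the wait time when the header is missing leaves the choice of fallback delay to the caller.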

Don’t retry without waiting

We’ve seen integrations loop on 429 with a 50ms back-off. Don’t. Your worker pool will idle, our worker pool will idle, and you’ll see no successful responses until traffic naturally drops off.

Back-off

Recommended pattern

The simplest pattern: try the request, on 429 sleep for Retry-After seconds, then retry. Cap retries at three so a transient outage doesn’t turn into a hung process.

import time
import httpx

def call_with_backoff(client, request, *, max_retries=3):
    """Send `request` on an httpx.Client, waiting out 429s up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        response = client.send(request)
        # Return anything that isn't a 429; on the last attempt,
        # give up and let the caller see the 429.
        if response.status_code != 429 or attempt == max_retries:
            return response
        # Retry-After is in whole seconds; default to 1 second if it is missing.
        retry_after = int(response.headers.get("Retry-After", "1"))
        time.sleep(retry_after)

If Retry-After is somehow missing — middleware, proxy, a bug on our end — fall back to exponential back-off starting at one second and doubling. Three retries is enough for a transient blip; beyond that, surface the error to your caller and let them decide.
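The fallback described above could be sketched as a small delay helper. `retry_delay` and its `base` parameter are hypothetical names; the only behavior taken from this page is "use Retry-After when present, otherwise start at one second and double".

```python
from typing import Optional


def retry_delay(retry_after: Optional[int], attempt: int, base: float = 1.0) -> float:
    """Seconds to sleep before retry number `attempt` (0-based).

    Prefers the server's Retry-After value; if it is missing (None),
    falls back to exponential back-off: base, 2*base, 4*base, ...
    """
    if retry_after is not None:
        return float(retry_after)
    return base * (2 ** attempt)
```

With three retries and no header, the waits are 1, 2, and 4 seconds, so a caller gives up after roughly seven seconds of sleeping.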