The prompt strategy behind document-level humanization

title: "The prompt strategy behind document-level humanization" description: "Most paraphrasers work sentence by sentence. Coherent humanization needs cross-paragraph awareness. Here's how Inksong structures the Claude prompt." date: "2026-05-13" tag: "Engineering" author: "Inksong"

The obvious way to humanize a document is the wrong way. You split it into paragraphs, you fire off an API call for each one, you stitch the responses back together. The math is clean, the latency is parallelizable, and the output reads like it was written by three different people. By paragraph three the voice has drifted. By paragraph six the register has shifted from "academic" to "blog." References to earlier sections no longer resolve. A pronoun that pointed back to a named subject now points nowhere.

We learned this the hard way and rebuilt the pipeline.

The whole-document prompt

Inksong sends the full document to Claude in a single call. There is one system prompt, one user message containing the document, and one response. Everything the model needs to maintain coherence — domain, tone, target humanness, the optional voice profile — lives in the system prompt before the document ever appears.

The system prompt is structured around a few non-negotiables. It states the domain (academic, legal, pastoral, medical, marketing, or technical) so vocabulary doesn't drift mid-paragraph. It states the tone (one of seven: balanced, academic, pastoral, casual, professional, creative, journalistic). It states the target humanness as a number between 0 and 100, which calibrates how aggressively to rewrite sentence structure. If a voice profile is attached, its extracted features — n-gram fingerprints, sentence-length distribution, burstiness, vocabulary specificity, hedging density — are listed as soft constraints, not as a style sheet to copy.

Then come the cross-paragraph rules. Maintain the register established in the opening. Preserve citation markers exactly as they appear. Keep reference resolution intact: if paragraph 2 says "the model," paragraph 4 should not call it "the system." Do not introduce new claims; do not drop existing ones.

The document arrives last, untouched. Claude rewrites it as one continuous artifact.

What the system prompt looks like

The structure is plain enough to describe without showing the literal text. It opens with role framing: you are rewriting a document so it reads as if a human wrote it, while preserving meaning, structure, and citations. It states the rewrite rules: vary sentence length, introduce mild burstiness, shift toward the specified tone, match the voice profile features if any.

Then it lists what not to change. Citations stay verbatim — every [1], every (Smith, 2019), every footnote marker. Technical terms in their canonical form stay canonical. List structures stay as lists. Headings stay as headings. Markdown stays as Markdown. Tables stay as tables.

Finally it specifies output format: return the rewritten document as one block, no commentary, no preamble, no "here is your rewritten version." Claude is good at following that last instruction; we still strip leading whitespace defensively.

The system prompt is about 600 tokens. The document is whatever the document is, capped at 5,000 words on Pro and 10,000 on Enterprise. The response is roughly the same length as the input.

Why one big call beats many small ones

The token cost of a single 5,000-word call is meaningfully higher than the cost of, say, twenty 250-word calls — there's no free lunch on context. But the coherence delta is large enough that we eat the cost on documents up to the Pro cap.

The break-even point we observed is around 3,000 words. Below that, chunked calls produced output that was also coherent, because there weren't enough paragraphs for drift to accumulate. Above 3,000, the chunked version started to read like a collaboration that nobody edited. Above 5,000, it was obvious.

Enterprise documents go up to 10,000 words. At that size we do split, but not by paragraph and not in parallel. We use a sequential chunked path where each chunk's prompt includes an explicit handoff: the closing two sentences of the previous chunk's output, plus a recap of the voice features that were maintained. The model sees what it just produced and continues from there. It is slower and more expensive than the single-call path. It is the right tradeoff for long documents.

Where this still breaks

Very long documents stretch model attention. Even with the chunked handoff, a 9,000-word document occasionally exhibits subtle drift in the last third — the voice features hold but the rhythm flattens. We catch some of this in QA and re-run the affected chunks with a tighter prompt; we don't catch all of it.

Deeply nested formatting confuses the prompt. A footnote inside a table cell inside a section heading is rare in the corpus but real, and the model sometimes flattens the nesting on output. We degrade gracefully — the text survives, the structure does not always.

Mixed-language documents are the hardest. A passage of English with embedded French citations or German technical terms can confuse the rewrite boundary, and the model occasionally over-translates a term we wanted preserved. We pass an explicit "preserve non-English spans verbatim" rule in the system prompt when we detect this, which helps but doesn't eliminate it.

Idioms are a small recurring failure. The model sometimes "corrects" an idiom into a literal phrase. We treat this as a quality bug, not a prompt bug, and keep iterating.

What we don't do

We don't fine-tune. The base instruction-following on Claude is strong enough that prompt engineering is doing the work. Fine-tuning would require a humanization corpus we don't have and don't want to synthesize from competitors.

We don't run an agent loop. There's a tempting design where you humanize, score, and re-humanize until the score crosses a threshold. We tried it. It produces over-rewritten text that has lost the source's actual content. AI-detection scores are heuristics; optimizing against them directly is a way to optimize against meaning.

We don't post-process with rule-based substitutions. No "replace 'utilize' with 'use,' replace 'in order to' with 'to.'" Those rules look helpful and produce identifiable patterns of their own. The single-call approach with a well-structured system prompt is the entire pipeline.

Closer

The roadmap from here is domain-specific prompt variants. Academic and pastoral already have their own system prompts; legal and medical are in progress. The shape of the rewrite differs by domain — what counts as "more human" in a legal brief is not what counts in a sermon — and a generic prompt is leaving quality on the table.

If you want to understand the voice features this prompt consumes, read Voice Cloning under the hood.