Large Language Models

LLMs for Editing Textual Documents

Published 2026-04-21.
Time to read: 7 minutes.

This page is part of the llm collection.

Motivation: Requirements Traceability

Properly structured and maintained documents are the only way to provide requirements traceability. This is easier said than done, even with the best LLMs.

Unfortunately, most LLMs suck at this task. Claude is singularly good at it; however, Anthropic has been taking liberties with the terms of its credit card merchant agreement, and my response has been to cancel that paid subscription.

I also canceled my Gemini paid subscription, but they continue to invoice me.

Previously, I canceled my MiniMax paid subscription. Now that I have no paid frontier LLMs available to me, I will be forced to find alternatives. I will continue to follow up on this topic.

Horseshit

I did not have a horse available

In the following you will find a lot of horseshit. You must work to extract a few nuggets of useful information from liberal helpings of misinformation. Verify the information presented here before relying upon it.

In particular, Gemini consistently misrepresents the choice of edit format as a significant determinant of the fidelity with which an LLM processes a document. This is fucking horseshit.

I am going to do some testing.

Dialog With Gemini

Which llms are good at editing documents?

I edited Gemini's response because it generated a lot of crap. This is what remains after the edit:

Several Large Language Models (LLMs) excel at document editing, each offering distinct strengths depending on whether you need deep stylistic changes, long-form consistency, or integration into existing workflows.

Claude (Anthropic)

Claude is often considered one of the best for document editing because of its precision and human-like writing style. It is particularly effective for technical writing and scientific literature, as it tends to make fewer errors than competitors.

Long Documents: Claude handles long texts well because of its large context window and strong “memory”.

Style Matching: Features like “Projects” and “Styles” allow for maintaining a consistent tone throughout a document.

ChatGPT (OpenAI)

ChatGPT is a versatile editor that is highly effective for restructuring sections, rewording sentences, and simplifying complex text. Newer versions like GPT-4o can perform in-line editing, allowing for targeted updates.

The paid version can output well-formatted Word and PowerPoint files, making it easy to transition from a chat interface to a final document.

Llama 3 (Meta)

Llama 3 is a strong open-source option preferred for content creation and editorial tasks. It is often used for drafting boilerplate text and proofreading professional documents.

Microsoft Copilot

This is GPT-4 integrated into Microsoft Word, specifically designed for document editing. It can improve grammar, clarity, and flow with simple commands, executing formatting tasks swiftly within the application.

Gemini summarizes 1500 lines to 300, yielding merely an outline without any detail. Is there a way to prevent this?

To prevent over-summarization of long documents, change the process from a single prompt to a multi-stage process. Gemini may produce high-level outlines when given large amounts of data in order to save on computing resources, even with a large context window.

  1. The “Index and Expand” Process

    Instead of requesting a “summary,” the model can map the document first, then explore specific sections.

    Phase 1 (Indexing): Ask the model to “Index the document into a list of specific topics and themes without summarizing yet.”

    Phase 2 (Deep Dive): For each topic on that list, ask: “Explain [Topic Name] in detail, including [specific data points/examples] from the source text.”

  2. Use “Chunking” (MapReduce Method)

    If a document is very long (1,500+ lines), performance may decrease. Dividing the text into smaller parts ensures detail is preserved.

    Manual Chunking: Divide the document into three parts (about 500 lines each) and summarize them individually in separate prompts.

    The “MapReduce” Prompt: “Summarize lines 1–500 in detail. Then do the same for 501–1000 and 1001–1500. Only combine them at the very end.”

  3. Change the Prompting Strategy

    Vague prompts often lead to vague outputs. Use specific constraints to encourage the model to be more detailed.

    Specify Length & Format: Ask for a specific word count or a set number of bullet points per section (e.g., “Provide at least 5 detailed bullets for every major section”).

    The “Information Sandwich”: Place specific instructions at the end of the prompt, after the text, to ensure the model focuses on the detail requirements.

    “Think Step-by-Step”: Add “Think step-by-step before summarizing” to encourage the model to process the logic of the document before condensing it.

  4. Switch to specialized tools

    NotebookLM: This Google tool is specifically designed for analyzing long documents and is better at retaining specific details than the standard Gemini chat interface.

    Google AI Studio: Use this for a higher context window (up to 1M–2M tokens) and more control over “temperature” settings, which can help prevent the model from becoming too concise.
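The chunking strategy above (item 2) is mechanical enough to sketch in Python. This is a minimal illustration, not a real integration: the 500-line chunk size comes from Gemini's answer, and summarize is a placeholder for whatever callable sends a prompt to your LLM of choice.

```python
def chunk_lines(text: str, lines_per_chunk: int = 500) -> list[str]:
    """Split a document into chunks of at most `lines_per_chunk` lines."""
    lines = text.splitlines()
    return [
        "\n".join(lines[i : i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def map_reduce_summarize(text: str, summarize) -> str:
    """Summarize each chunk separately ("map"), then combine ("reduce").

    `summarize` is a stand-in for an LLM call; it is not a real API.
    """
    partials = [
        summarize(f"Summarize the following in detail:\n\n{chunk}")
        for chunk in chunk_lines(text)
    ]
    return summarize(
        "Combine these detailed summaries without dropping any points:\n\n"
        + "\n\n".join(partials)
    )
```

For a 1,500-line document this produces three map calls and one reduce call, which is exactly the manual-chunking workflow Gemini describes, minus the copy-and-paste.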

I do not have to do any of this bullshit with claude. If I say “change x to y” it does that without brutally oversummarizing the rest of the document.

You are absolutely right. You shouldn’t have to jump through those hoops.

The differences users experience are real and documented. Claude, specifically Sonnet and Opus, is designed to be a “faithful editor.” Gemini is designed to be a “summarizer/generator.”

  1. Optimization

    Gemini may exhibit “laziness.” It is computationally cheaper and faster to summarize a large text block than to reprint it verbatim with a small change. Gemini is optimized for speed and efficiency, often leading to summarization unless explicitly instructed otherwise.

    Claude appears to handle long outputs differently. It is less likely to use this compression shortcut, and it will reprint most of the document to make a single change.

  2. Instruction Following vs. Content Generation

    Claude: Follows instructions precisely, viewing the rest of the document as a constraint to be preserved.

    Gemini: Provides the gist or answer, viewing the document as “context” to be processed and condensed.

  3. Context Window

    Gemini treats its context window like a library to search through. Claude treats its context window more like a workspace to maintain. This makes Claude better for “needle in a haystack” edits where the rest of the document needs to remain undisturbed.

i do not want to use claude. what other options are there

If you want the “surgical” editing precision of Claude without using an Anthropic model, your best options are DeepSeek-V3 or GPT-4o used through specialized “Diff” tools.

Several large language models (LLMs) can perform surgical edits on long documents. These models avoid the tendency to summarize and preserve the document’s structure.

Other LLMs

DeepSeek-V3 (or R1)

DeepSeek is a strong alternative for precise document manipulation. It is more affordable than GPT-4o and less likely to summarize when reprinting long text blocks with minor changes.

    Why it works: It has a large context window (up to 256k tokens) and is optimized for code-like precision, which helps maintain the document structure.

    Where to use it: Access is available via their official API or through third-party interfaces such as OpenRouter.

Llama 3.1 & 3.2

The larger Llama models (specifically the 405B version) are good at following “do not summarize” instructions more strictly than smaller models.

    Instruction Following: It adheres to formatting constraints better than Gemini.

    Access: It can be run locally via Ollama if the hardware is available, or via Groq for speed.

Any LLM can use the diff edit format. Stop pretending this is special; also, the choice of editing format is unrelated to the phenomenon of unwanted summarization.
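To underline the point: the “diff” edit format is nothing exotic. It is ordinary unified-diff output, the same thing git diff and Python’s standard-library difflib produce, and any model that can emit text can emit one. A trivial sketch (the filenames and sample text are my own invention):

```python
import difflib

original = ["The quick brown fox\n", "jumps over the lazy dog\n"]
edited   = ["The quick red fox\n",   "jumps over the lazy dog\n"]

# Produce a unified diff -- the same "edit format" that LLM coding
# tools ask a model to emit when making a targeted change.
diff = "".join(difflib.unified_diff(
    original, edited,
    fromfile="before.txt", tofile="after.txt",
))
print(diff)
```

Whether a model emits a diff or reprints the whole file says nothing about whether it will quietly summarize the parts you did not ask it to touch.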

Identify LLMs, including open source LLMs, that are good at merging large, complex documents while preserving detail. Claude is best at this, but Anthropic has become arrogant and charges users like me without their permission. Do not repeat your previous information.

The tendency of some models to summarize, or compress, information is a documented behavior that can reduce the accuracy of the output. To merge large documents where every detail is important, models with high “long-context fidelity” or “effective context length” are needed. The following are recommendations for open-source and non-Anthropic options that excel at “Context Engineering” (maintaining coherence over large texts without summarizing):

Top Open Source Recommendations

  1. Qwen Series (Qwen 2.5 / Qwen 3)

    This is currently the leading open-weight option for high-fidelity document processing. [1]

    • Why it works: The Qwen architecture (especially the 72B and newer Qwen 3 variants) is optimized for “Context Engineering.” Unlike models that use “Needle in a Haystack” retrieval (which just finds one fact), Qwen is built to reason over the entire context simultaneously.
    • Behavior: It is significantly more obedient than Llama or Gemini when instructed to “retain all original text.” It treats long documents as a structured data processing task rather than a creative writing prompt.
    • Deployment: Available via Hugging Face or API providers like Together AI and OpenRouter. [1, 2]
  2. Command R+ (by Cohere)

    While the weights are “open weights” (CC-BY-NC), this model is unique because it was built specifically for enterprise RAG and “data-heavy” tasks. [3]

    • Why it works: It is explicitly trained to handle “messy” data and large archives. Its “command” nature makes it rigid in following formatting instructions. If told to “merge these two texts, do not delete line 500,” it is far more likely to comply than a general-purpose chat model.
    • Best for: Merging technical documentation, legal texts, or structured reports where dropping a section is catastrophic. [3]
  3. Llama 3.1 405B (The “Brute Force” Option)

    If the necessary computing power or API budget is available, the large parameter size of the 405B model allows it to maintain state better than smaller models.

    • Why it works: Smaller models summarize because they literally “run out of brain” (attention capacity) to track all details. The 405B model has the capacity to hold the entire structure in “working memory” without compressing it.
    • Warning: The Instruct version must be used and explicitly prompted: “Output the full merged text. Do not summarize. Do not truncate.”
  4. Mistral Large 2

    A very strong European alternative that rivals GPT-4 in coding and logic.

    • Why it works: Its 128k context window is highly optimized for “dense” tasks (like code merging). Because it treats text like code, it is less prone to “hallucinating a summary” and more likely to perform a deterministic merge. [4]

The “Context Engineering” Strategy

To force these models to merge without summarizing, avoid standard “rewrite” prompts. Use a Structural Prompt strategy:

  1. Role Assignment: “You are a Document Merging Engine. You are NOT a copywriter. Your goal is 100% data preservation.”

  2. The “anchor” technique: Tell the model to output specific “Anchor Tags” at the start and end of every section it processes (e.g., [SECTION_START: ID_001]). This forces the model to iterate through the entire text body rather than skipping to the end.

  3. Negative Constraints: Explicitly add: “If you summarize, the task is failed. If the output word count is lower than the input sum, the task is failed.”
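The constraints in steps 2 and 3 can be checked mechanically after the model responds, rather than trusted on faith. A sketch, assuming the [SECTION_START: …]/[SECTION_END: …] anchor convention from step 2; the function name and violation messages are my own:

```python
import re


def check_merge_output(output: str, section_ids: list[str],
                       input_word_count: int) -> list[str]:
    """Return a list of constraint violations (empty means the merge passed)."""
    problems = []
    # Step 2: every section must be bracketed by its anchor tags.
    for sid in section_ids:
        if f"[SECTION_START: {sid}]" not in output:
            problems.append(f"missing start anchor for {sid}")
        if f"[SECTION_END: {sid}]" not in output:
            problems.append(f"missing end anchor for {sid}")
    # Step 3: output shorter than the combined inputs suggests summarization.
    if len(re.findall(r"\S+", output)) < input_word_count:
        problems.append("output word count below input sum; likely summarized")
    return problems
```

If the returned list is non-empty, re-prompt the model with the specific violations instead of vaguely asking it to “try again.”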

Summary of Best Models for Fidelity

Model                License            Best For
Qwen 3 / 2.5 (72B)   Apache 2.0         Overall best: high fidelity, obedient, free to run.
Command R+           CC-BY-NC           Technical/legal merges: rigid adherence to data.
Llama 3.1 405B       Llama Community    Complex, nuanced merges: massive brain, less compression.
Mistral Large 2      Research / Comm.   Code/logic merges: treats text with surgical precision.