Sequential Decomposition

When to reach for this

Reach for sequential-decomposition when the work splits into ordered, separately scoped reasoning tasks:

Two or more LLM steps, each with ONE responsibility. E.g. "identify the word's sense" then "generate examples for that pinned sense" — each block is small, scoped, individually testable.
Step N's output is smaller than its input. This is the lever for token efficiency: distill, don't re-pass the raw user input through every block.
A failure in step N can be diagnosed and fixed in isolation. If you can't reason about block 2 without thinking about block 1's prompt, the split is wrong — collapse them.
The steps have a natural ordering. "First detect, then act" — not "any order, all needed" (that's parallel-fan-out).
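The word-sense example from the list can be sketched in TypeScript as two functions with typed outputs. This is a minimal illustration, not a prescribed implementation. `identifySense`, `generateExamples`, and the output shapes are hypothetical, and both LLM calls are replaced by deterministic stubs so the data flow is visible. Block 2 receives only block 1's small output, never the raw user input, and each block can be called and checked on its own.

```typescript
// Hypothetical output shapes for the two blocks.
interface SenseOutput {
  word: string;
  sense: string; // the pinned sense, e.g. "financial institution"
}

interface ExamplesOutput {
  examples: string[];
}

// Block 1 (stub): identify which sense of the word the user means.
// A real block would prompt an LLM; here a keyword rule stands in for it.
function identifySense(userInput: string, word: string): SenseOutput {
  const text = userInput.toLowerCase();
  const sense = text.includes("river") ? "edge of a river" : "financial institution";
  return { word, sense };
}

// Block 2 (stub): generate examples for the pinned sense.
// It receives only block 1's distilled output, not the raw user input.
function generateExamples(input: SenseOutput): ExamplesOutput {
  return {
    examples: [
      `"${input.word}" (${input.sense}): example 1`,
      `"${input.word}" (${input.sense}): example 2`,
    ],
  };
}

// The flow: block 2 runs on whatever block 1 returns.
function run(userInput: string, word: string): ExamplesOutput {
  return generateExamples(identifySense(userInput, word));
}

const result = run("We had a picnic on the river bank.", "bank");
console.log(result.examples[0]); // prints: "bank" (edge of a river): example 1
```

Because block 2's input is just `{ word, sense }`, a bad example can be reproduced by calling `generateExamples` directly with a pinned sense, without re-running block 1.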

If two steps share most of their context AND share their reasoning surface, single-block is cheaper. If the steps have NO causal dependency, parallel-fan-out is faster.

Example: two blocks, where the second consumes the distilled output of the first.

Flow: "Article → Executive Summary"
├── Block: "Extract Key Points" (llm)
│     Output: { points: string[] }
└── Block: "Write Summary" (llm)
      Input: previous output
      Output: { summary: string }
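A rough TypeScript sketch of the flow diagram above, assuming each block can be modeled as a plain function from its input to a typed output. The `Block` type, the `sequence` helper, and the stub logic inside each block are illustrative assumptions; a real block would call an LLM instead of splitting on periods.

```typescript
// Output shapes taken from the flow diagram.
interface KeyPoints {
  points: string[];
}

interface Summary {
  summary: string;
}

// Hypothetical: a block is modeled as a function from its input to its typed output.
type Block<I, O> = (input: I) => O;

// Stub for "Extract Key Points". A real block would call an LLM;
// here every sentence containing a digit is treated as a key point.
const extractKeyPoints: Block<string, KeyPoints> = (text) => ({
  points: text
    .split(".")
    .map((s) => s.trim())
    .filter((s) => /\d/.test(s)),
});

// Stub for "Write Summary". It consumes only the previous block's output.
const writeSummary: Block<KeyPoints, Summary> = ({ points }) => ({
  summary:
    points.length === 0 ? "No key points found." : `Key points: ${points.join("; ")}.`,
});

// Chain two blocks so the second sees only the first's output.
function sequence<A, B, C>(first: Block<A, B>, second: Block<B, C>): Block<A, C> {
  return (input) => second(first(input));
}

const articleToSummary = sequence(extractKeyPoints, writeSummary);

const articleText =
  "Revenue grew 12% this quarter. The team moved offices. Churn fell to 3%. Lunch was good.";
console.log(articleToSummary(articleText).summary);
// prints: Key points: Revenue grew 12% this quarter; Churn fell to 3%.
```

Since `writeSummary` only accepts `{ points }`, the type signature itself prevents the raw article from leaking into the second block.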

When to use: the task has two distinct concerns that benefit from being separated — e.g., extracting structured data vs. writing prose. Splitting also lets you cache or reuse the intermediate result, and lets each block use a different model if needed.
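One way to get the caching and per-block model choice described above, sketched under stated assumptions: the intermediate points are memoized in a `Map` keyed by the input text, and each block carries its own hypothetical `BlockConfig`. The model names are placeholders, not real model IDs, and both blocks are stubs that only record which model they would have called.

```typescript
// Hypothetical per-block configuration; the model names are placeholders, not real IDs.
interface BlockConfig {
  model: string;
}

const extractConfig: BlockConfig = { model: "small-extraction-model" };
const proseConfig: BlockConfig = { model: "larger-writing-model" };

// Records which block ran and which model it would have called.
const callLog: string[] = [];

// Stub extraction block: one point per non-empty line of the input.
function extractPoints(text: string, cfg: BlockConfig): string[] {
  callLog.push(`extract:${cfg.model}`);
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Cache of intermediate results, keyed by the raw input text.
const pointsCache = new Map<string, string[]>();

function cachedPoints(text: string): string[] {
  const hit = pointsCache.get(text);
  if (hit !== undefined) return hit;
  const points = extractPoints(text, extractConfig);
  pointsCache.set(text, points);
  return points;
}

// Stub prose block: consumes only the points, never the raw text.
function writeProse(points: string[], style: "short" | "long", cfg: BlockConfig): string {
  callLog.push(`write:${cfg.model}`);
  const chosen = style === "short" ? points.slice(0, 1) : points;
  return chosen.join(" ");
}

const doc = "Sales rose.\nCosts fell.";
const shortVersion = writeProse(cachedPoints(doc), "short", proseConfig);
const longVersion = writeProse(cachedPoints(doc), "long", proseConfig);
console.log(callLog.filter((c) => c.startsWith("extract:")).length); // prints: 1
```

Writing a second summary style only re-runs the prose block; the extraction result is reused from the cache.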

Common mistake: passing the original input forward alongside the distilled output. The whole point of decomposition is that the second block works from the condensed representation. Re-passing the raw article means you pay for it twice on every run (once in each block), which roughly doubles input tokens when the extracted points are much shorter than the article.
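A back-of-the-envelope illustration of that cost. It assumes a crude word count as a stand-in for a real tokenizer (`approxTokens` is hypothetical and will not match any model's actual token counts) and a 1000-word placeholder article reduced to three short points.

```typescript
// Crude stand-in for a tokenizer: counts whitespace-separated words.
// Real token counts depend on the model's tokenizer; treat this as a proxy only.
function approxTokens(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Correct: block 2's input is only the distilled points.
function distilledInput(points: string[]): string {
  return points.map((p) => `- ${p}`).join("\n");
}

// Common mistake: the raw article is passed forward alongside the points.
function inputWithRawArticle(article: string, points: string[]): string {
  return `${article}\n\n${distilledInput(points)}`;
}

// Total input across one run: block 1 always reads the article,
// block 2 reads whatever we hand it.
function flowInputTokens(article: string, block2Input: string): number {
  return approxTokens(article) + approxTokens(block2Input);
}

const sampleArticle = "word ".repeat(1000).trim(); // 1000-word placeholder
const keyPoints = ["point one", "point two", "point three"];

const distilledTotal = flowInputTokens(sampleArticle, distilledInput(keyPoints));
const rawTotal = flowInputTokens(sampleArticle, inputWithRawArticle(sampleArticle, keyPoints));
console.log(distilledTotal, rawTotal); // prints: 1009 2009
```

With the distilled input, block 2 adds 9 words to the 1000 that block 1 already read; re-passing the article adds 1009, so the run costs roughly twice as much input.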
