Sequential Decomposition
Two blocks where the second consumes a distilled output of the first.
When to reach for this
Reach for sequential-decomposition when the work splits into ordered reasoning tasks that are each scoped and testable on their own:
- Two or more LLM steps, each with ONE responsibility. E.g. "identify the word's sense" then "generate examples for that pinned sense" — each block is small, scoped, individually testable.
- Step N's output is smaller than its input. This is the lever for token efficiency: distill, don't re-pass the raw user input through every block.
- A failure in step N can be diagnosed and fixed in isolation. If you can't reason about block 2 without thinking about block 1's prompt, the split is wrong — collapse them.
- The steps have a natural ordering. "First detect, then act" — not "any order, all needed" (that's parallel-fan-out).
If two steps share most of their context AND reason over the same material, single-block is cheaper: splitting them would send that shared context twice. If the steps have NO causal dependency, parallel-fan-out is faster, since neither step has to wait for the other.
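A minimal Python sketch of the word-sense example, assuming a generic `Model` callable that maps a prompt string to completion text. `Model`, `identify_sense`, `generate_examples`, `pipeline`, and the offline `fake_model` stand-in are hypothetical names for illustration, not any SDK's API, and the prompt wording is an assumption. Block 2 receives only the word and the distilled gloss, never the user's sentence.

```python
from typing import Callable

# Hypothetical model interface: prompt text in, completion text out.
# Replace with a call to whatever client you actually use.
Model = Callable[[str], str]


def identify_sense(model: Model, word: str, sentence: str) -> str:
    """Block 1: decide which sense of `word` the sentence uses. Returns a short gloss."""
    prompt = (
        f"In the sentence below, which sense of the word '{word}' is used?\n"
        f"Answer with a short gloss only.\n\nSentence: {sentence}"
    )
    return model(prompt).strip()


def generate_examples(model: Model, word: str, sense: str, n: int = 3) -> list[str]:
    """Block 2: write example sentences for an already-pinned sense.

    Sees only the word and the distilled gloss, never the user's sentence.
    """
    prompt = (
        f"Write {n} example sentences using '{word}' in this sense: {sense}\n"
        f"One sentence per line."
    )
    lines = [line.strip() for line in model(prompt).splitlines()]
    return [line for line in lines if line][:n]


def pipeline(model: Model, word: str, sentence: str) -> list[str]:
    sense = identify_sense(model, word, sentence)  # first detect...
    return generate_examples(model, word, sense)   # ...then act on the distilled result


# Offline stand-in for an LLM so the sketch runs without network access.
# It records every prompt so you can check what each block was given.
seen_prompts: list[str] = []


def fake_model(prompt: str) -> str:
    seen_prompts.append(prompt)
    if prompt.startswith("In the sentence below"):
        return "financial institution"
    return (
        "She opened an account at the bank.\n"
        "The bank approved the loan.\n"
        "He works at a bank downtown.\n"
    )


if __name__ == "__main__":
    for example in pipeline(fake_model, "bank", "I deposited cash at the bank."):
        print(example)
```

Because the model is injected, `identify_sense` can be tested against a fake without ever running block 2, and the recorded prompts make it easy to check that the raw sentence never reaches the second block.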
When to use: the task has two distinct concerns that benefit from being separated — e.g., extracting structured data vs. writing prose. Splitting also lets you cache or reuse the intermediate result, and lets each block use a different model if needed.
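The caching and per-block model points can be sketched as follows. The two model functions are hypothetical offline fakes standing in for, say, a cheaper extraction model and a stronger writing model; the prompts, the JSON field names, and the sample article are illustrative assumptions. `functools.lru_cache` memoizes block 1 by article text, so regenerating the prose in a different tone reuses the extracted facts instead of re-reading the article.

```python
import json
from functools import lru_cache

# Counters so the caching behaviour is observable.
call_counts = {"extract": 0, "write": 0}


def extractor_model(prompt: str) -> str:
    # Hypothetical block-1 model (e.g. a small, cheap one), faked so this runs offline.
    call_counts["extract"] += 1
    return json.dumps({"company": "Acme", "quarter": "Q3", "revenue_change": "+12%"})


def writer_model(prompt: str) -> str:
    # Hypothetical block-2 model (e.g. a stronger prose model), also faked.
    call_counts["write"] += 1
    return "Summary of: " + prompt.splitlines()[-1]


@lru_cache(maxsize=128)
def extract_facts(article: str) -> str:
    """Block 1: raw article -> compact JSON string of facts, cached per article.

    Returning an immutable str (not a dict) keeps cached values safe from mutation.
    """
    return extractor_model(
        "Extract company, quarter and revenue_change as JSON:\n" + article
    )


def write_summary(facts_json: str, tone: str) -> str:
    """Block 2: facts -> prose. It never sees the article itself."""
    facts = json.loads(facts_json)  # fail fast if block 1 produced invalid JSON
    return writer_model(
        f"Write a {tone} one-paragraph summary of these facts:\n" + json.dumps(facts)
    )


if __name__ == "__main__":
    article = "Acme said third-quarter revenue rose 12 percent year over year."
    for tone in ("neutral", "upbeat"):
        print(write_summary(extract_facts(article), tone))
    print(call_counts)  # block 1 ran once; block 2 ran once per tone
```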
Common mistake: passing the original input forward alongside the distilled output. The whole point of decomposition is that the second block works from the condensed representation. Re-passing the raw input (the full article, say) means every run pays for it twice, once in each block, which roughly doubles the input-token cost when the distilled output is small.
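To make the cost concrete, here is a rough comparison of the two ways of building block 2's prompt. It assumes whitespace-split words as a stand-in for tokens (real tokenizers count differently) and a synthetic 2,000-word input; the prompt text and function names are illustrative, not from any library.

```python
def approx_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. Real tokenizers differ, but the
    # ratio between the two designs is what this sketch is meant to show.
    return len(text.split())


INSTRUCTIONS = "Summarize the facts below in one paragraph."


def block1_prompt(raw: str) -> str:
    return "Extract the key facts as key=value pairs:\n" + raw


def block2_prompt(distilled: str) -> str:
    # Correct: block 2 works only from the condensed representation.
    return f"{INSTRUCTIONS}\nFacts: {distilled}"


def block2_prompt_with_raw(raw: str, distilled: str) -> str:
    # Common mistake: the raw input rides along next to the distilled output.
    return f"{INSTRUCTIONS}\nArticle: {raw}\nFacts: {distilled}"


def pipeline_input_tokens(raw: str, distilled: str, repass_raw: bool) -> int:
    """Approximate input tokens summed over both blocks for one run."""
    second = block2_prompt_with_raw(raw, distilled) if repass_raw else block2_prompt(distilled)
    return approx_tokens(block1_prompt(raw)) + approx_tokens(second)


if __name__ == "__main__":
    raw = "word " * 2000  # synthetic 2,000-word input
    distilled = "company=Acme quarter=Q3 revenue_change=+12%"
    lean = pipeline_input_tokens(raw, distilled, repass_raw=False)
    heavy = pipeline_input_tokens(raw, distilled, repass_raw=True)
    print(lean, heavy, round(heavy / lean, 2))
```

With these numbers the lean pipeline reads about 2,000 words in total and the re-passing one about 4,000; the gap grows linearly with the size of the raw input, while the distilled output stays small.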