OpenRouter & OpenAI SDK Patterns

Pattern catalog for auditing raw OpenAI/OpenRouter SDK chained LLM code — no framework.

This page is a reference for the Audit & Design phase. It covers chained LLM call patterns built with the raw OpenAI or OpenRouter SDK — no LangChain or other framework.

Imports to Search For

Python

from openai import OpenAI, AsyncOpenAI
import openai                                   # older style
openai.ChatCompletion.create(...)               # legacy SDK (< v1.0)
client = OpenAI(base_url="https://openrouter.ai/api/v1")  # OpenRouter

TypeScript/JavaScript

import OpenAI from "openai";
const openai = new OpenAI({ baseURL: "https://openrouter.ai/api/v1" });

Function Calls to Search For

client.chat.completions.create(
openai.chat.completions.create(
await client.chat.completions.create(
openai.ChatCompletion.create(          # legacy, pre v1.0
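
The import and call signatures above can be turned into a quick audit scan. The following is a minimal sketch, not an official tool: the names `SDK_PATTERNS` and `scan_source` are hypothetical, and the regular expressions are one reasonable reading of the lists above.

```python
import re

# Patterns derived from the import and function-call lists above.
SDK_PATTERNS = {
    "import": re.compile(r"^\s*(?:from\s+openai\s+import\b|import\s+openai\b)"),
    "ts_import": re.compile(r"\bfrom\s+[\"']openai[\"']"),
    "chat_call": re.compile(r"\.chat\.completions\.create\s*\("),
    "legacy_call": re.compile(r"\bopenai\.ChatCompletion\.create\s*\("),
    "openrouter": re.compile(r"openrouter\.ai/api/v1"),
}


def scan_source(source: str) -> dict[str, list[int]]:
    """Return {pattern_name: [1-based line numbers]} for every matching line."""
    hits: dict[str, list[int]] = {name: [] for name in SDK_PATTERNS}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SDK_PATTERNS.items():
            if pattern.search(line):
                hits[name].append(lineno)
    return hits
```

Running `scan_source(path.read_text())` over each file in a repository gives a rough list of call sites to review. Matches inside comments or strings are reported too, so treat the result as a starting point.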

Code Shapes

Single LLM Call (No Chaining)

The simplest pattern. Maps directly to a single LLM block in Noukai.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
)
result = response.choices[0].message.content

What to capture: The system message becomes the block's prompt. The model becomes the block's model config. The user message is the flow input.
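
As an illustration of the capture step, the sketch below splits the arguments of one call into the three items listed above. The function name `capture_single_call` and the dictionary keys are hypothetical audit-note fields, not a Noukai schema.

```python
def capture_single_call(model: str, messages: list[dict]) -> dict:
    """Split one chat.completions.create call into the items to record."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    user_parts = [m["content"] for m in messages if m["role"] == "user"]
    return {
        "block_prompt": "\n".join(system_parts),  # system message -> block prompt
        "block_model": model,                     # model -> block model config
        "flow_input": "\n".join(user_parts),      # user message -> flow input
    }
```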

Sequential Chain (Output Feeds Next Call)

The most common manual chaining pattern. Maps to a sequential Noukai flow.

# Step 1: Classify
classify_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Classify this message into: billing, technical, general"},
        {"role": "user", "content": user_message},
    ],
)
category = classify_response.choices[0].message.content
 
# Step 2: Generate response based on category
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"You are a {category} support specialist. Help the user."},
        {"role": "user", "content": user_message},
    ],
)
answer = response.choices[0].message.content

What to capture: Each client.chat.completions.create call becomes a block. The data passed between calls (here, category) shows the block-to-block data flow. In Noukai, this becomes {{previous_output}} in the second block's prompt.
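
To make the data flow concrete, here is a minimal stand-in for a sequential flow. It is only an illustration of how {{previous_output}} carries one step's result into the next prompt. `run_sequential` and the `call_llm` callable are hypothetical, and plain string replacement is an assumption, not a description of how Noukai renders templates.

```python
from typing import Callable

PLACEHOLDER = "{{previous_output}}"


def run_sequential(
    block_prompts: list[str],
    flow_input: str,
    call_llm: Callable[[str, str], str],
) -> str:
    """Run prompts in order, substituting each step's output into the next prompt."""
    previous_output = ""
    for template in block_prompts:
        system_prompt = template.replace(PLACEHOLDER, previous_output)
        # call_llm(system_prompt, user_message) wraps one chat.completions.create call
        previous_output = call_llm(system_prompt, flow_input).strip()
    return previous_output
```

With the two prompts from the example above, the second template would read "You are a {{previous_output}} support specialist. Help the user." and the classifier's answer fills the placeholder.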

Multi-Turn Conversation Loop

A loop that accumulates message history. Maps to a single LLM block called repeatedly by the user's code: Noukai handles one turn at a time, and the calling code manages history.

messages = [{"role": "system", "content": "You are a helpful assistant."}]
 
while True:
    user_input = input("You: ")
    messages.append({"role": "user", "content": user_input})
 
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
 
    assistant_message = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")

Token bloat alert: This pattern re-sends the entire conversation history on every call. Input tokens per call grow roughly linearly with the number of turns, so total usage over a conversation grows roughly quadratically. When migrating, the calling code should manage history and pass only relevant context to the Noukai flow, or use a separate "summarize history" flow to compress context.
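
One simple way for the calling code to bound history is a sliding window. The sketch below is an assumption-laden example: `trim_history` and `max_turns` are hypothetical names, and it assumes the history contains only system, user, and assistant messages (no tool messages, which must stay paired with their tool calls).

```python
def trim_history(messages: list[dict], max_turns: int = 3) -> list[dict]:
    """Keep system messages plus at most the last max_turns user/assistant exchanges."""
    system = [m for m in messages if m["role"] == "system"]
    if max_turns <= 0:
        return system
    dialogue = [m for m in messages if m["role"] != "system"]
    kept = dialogue[-2 * max_turns:]
    # Make sure the window starts on a user message, not a dangling assistant reply.
    while kept and kept[0]["role"] != "user":
        kept = kept[1:]
    return system + kept
```

In the loop above, passing `messages=trim_history(messages)` to the API call would cap the prompt size while the full list is still kept locally.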

Tool Calling / Function Calling Loop

An LLM calls tools in a loop until it has a final answer. This is the SDK equivalent of a LangChain agent.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
 
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
 
while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message
 
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
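            # call.function.arguments is a JSON-encoded string; dispatch_tool is
            # user-defined, must parse it, and must return a string for "content".
            result = dispatch_tool(call.function.name, call.function.arguments)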
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    else:
        print(msg.content)
        break

Like LangChain agents, tool-calling loops are hard to migrate directly because Noukai flows are DAGs, not loops. Consider: (1) if the tools are simple data lookups, have the calling code run the tools and pass results as input to a single Noukai flow, (2) decompose into a classification block + specialized handler blocks, or (3) keep the tool loop as-is and only migrate the non-loop chains.
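
The loop above calls a `dispatch_tool` helper that the snippet does not define. A minimal sketch of such a helper follows, useful when auditing which tools are simple lookups. `TOOL_REGISTRY` and the body of `get_weather` are placeholders, not part of any SDK.

```python
import json


def get_weather(city: str) -> str:
    # Placeholder lookup; real code would call a weather service here.
    return json.dumps({"city": city, "forecast": "unknown"})


# Maps tool names (as declared in the tools list) to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool(name: str, arguments: str) -> str:
    """Run one tool call; arguments is the raw JSON string from the model."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
        return handler(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the handler signature.
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
```

Returning errors as JSON strings rather than raising lets the model see the failure in the next turn. Whether that is desirable depends on the application.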

Router Pattern (Pick-Next-Prompt)

A first LLM call decides which prompt to use for the second call. Maps to a branching Noukai flow with a router block.

# Step 1: Route
router_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Classify this query into exactly one category: search, calculator, knowledge_base.\n\nQuery: {user_query}",
    }],
)
route = router_response.choices[0].message.content.strip().lower()
 
# Step 2: Dispatch
if route == "search":
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Search the web for: {user_query}"}],
    )
elif route == "calculator":
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Calculate: {user_query}"}],
    )
else:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Answer from knowledge: {user_query}"}],
    )

What to capture: The routing call becomes a router block. Each branch becomes a downstream block. In Noukai, this can be modeled as a sequential flow where the first block classifies and the second block uses {{previous_output}} to condition its response, or as a branching topology.
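
When auditing a router, it helps to note how the raw model output is turned into a branch label, since exact string comparison breaks on answers such as "Search." or "Category: search". A hedged sketch of a more tolerant normalizer is shown below. `normalize_route` and `DEFAULT_ROUTE` are hypothetical names, and falling back to the knowledge-base branch mirrors the `else` branch in the code above.

```python
ROUTES = ("search", "calculator", "knowledge_base")
DEFAULT_ROUTE = "knowledge_base"


def normalize_route(raw: str) -> str:
    """Map free-form router output onto one known route label."""
    cleaned = raw.strip().lower().strip(".\"'` ")
    if cleaned in ROUTES:
        return cleaned
    # Tolerate wrappers like "category: search" when exactly one label appears.
    found = [route for route in ROUTES if route in cleaned]
    return found[0] if len(found) == 1 else DEFAULT_ROUTE
```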

Fan-Out / Aggregate

Multiple independent LLM calls run concurrently, then results are combined. Maps to parallel blocks in a v (parallel) container.

import asyncio
 
# client must be an AsyncOpenAI instance; each create(...) below returns an awaitable
async def analyze(client, text):
    sentiment_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Sentiment of: {text}"}],
    )
    keywords_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract keywords from: {text}"}],
    )
    summary_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    )
 
    sentiment, keywords, summary = await asyncio.gather(
        sentiment_task, keywords_task, summary_task
    )
    return {
        "sentiment": sentiment.choices[0].message.content,
        "keywords": keywords.choices[0].message.content,
        "summary": summary.choices[0].message.content,
    }

What to capture: Each concurrent call becomes a parallel block. The aggregation at the end is handled by Noukai's parallel container output merging — or by a final passthrough or code block if custom merging is needed.
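
The merge step can be sketched without any API calls. The example below uses a stub coroutine in place of the model. `fake_llm` and `fan_out` are hypothetical names, and mapping a failed branch to `None` is one possible policy, not Noukai's merging behavior.

```python
import asyncio


async def fake_llm(prompt: str) -> str:
    # Stand-in for an awaited AsyncOpenAI chat.completions.create call.
    await asyncio.sleep(0)
    return f"result for: {prompt}"


async def fan_out(text: str, call_llm=fake_llm) -> dict:
    """Run independent prompts concurrently and merge results by branch name."""
    prompts = {
        "sentiment": f"Sentiment of: {text}",
        "keywords": f"Extract keywords from: {text}",
        "summary": f"Summarize in one sentence: {text}",
    }
    # return_exceptions=True keeps one failing branch from discarding the others.
    results = await asyncio.gather(
        *(call_llm(p) for p in prompts.values()), return_exceptions=True
    )
    return {
        name: None if isinstance(result, BaseException) else result
        for name, result in zip(prompts, results)
    }
```

Calling `asyncio.run(fan_out("some text"))` returns a dictionary with one entry per branch, which is the shape a downstream merge or code block would receive.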

Common Pain Points

These are the problems that make migration worthwhile. Reference them in the "Structural Improvement" column during audit.

Problem	How to Spot It	Noukai Advantage
History bloat	`messages` array grows unbounded; full history re-sent every call	Block-to-block data passing — only relevant data flows forward
No caching	Identical prompt + input always triggers a fresh API call, even when an earlier result could be reused	Versioned flows enable caching strategies at the infrastructure level
Sequential bottleneck	Independent calls run one after another (`await` in series)	Parallel containers execute independent blocks concurrently
Prompt sprawl	System prompts scattered across files, hardcoded in strings	Centralized, versioned prompts in the Noukai flow editor
No observability	No tracing; logging is ad-hoc `print()` or custom code	Built-in step-level tracing and SSE streaming
Ad-hoc retries	Manual `try/except` with `time.sleep` retry loops	Managed execution with retry policies
No versioning	Prompt changes = code changes = deploy cycle	Flow versioning — publish, rollback, A/B test without code changes
No structured output	Parsing LLM text output with regex or string splitting	Block output schemas enforce JSON structure

Mapping Cheat Sheet

Raw SDK Pattern	Noukai Equivalent
Single `completions.create`	Single LLM block
Sequential calls (output → next input)	Sequential blocks, use `{{previous_output}}`
`asyncio.gather` / `Promise.all` on independent calls	Blocks in a `v` (parallel) container
Router + if/else dispatch	Router block → conditional downstream blocks
Tool-calling while loop	Decompose into classification + handler blocks, or keep as-is
Conversation history array	Calling code manages history, passes relevant context as flow input
JSON mode / `response_format`	Block output schema
System message	Block prompt text
Temperature / max_tokens	Block config via `update_block_config`
