NOUKAI

OpenRouter & OpenAI SDK Patterns

Pattern catalog for auditing raw OpenAI/OpenRouter SDK chained LLM code — no framework.

This page is a reference for the Audit & Design phase. It covers chained LLM call patterns built with the raw OpenAI or OpenRouter SDK — no LangChain or other framework.

Imports to Search For

Python

from openai import OpenAI, AsyncOpenAI
import openai                                   # older style
openai.ChatCompletion.create(...)               # legacy SDK (< v1.0)
client = OpenAI(base_url="https://openrouter.ai/api/v1")  # OpenRouter

TypeScript/JavaScript

import OpenAI from "openai";
const openai = new OpenAI({ baseURL: "https://openrouter.ai/api/v1" });

Function Calls to Search For

client.chat.completions.create(
openai.chat.completions.create(
await client.chat.completions.create(
openai.ChatCompletion.create(          # legacy, pre v1.0
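The search strings above can be turned into a quick scan of a codebase. This is a minimal sketch, assuming Python source files on local disk; `find_llm_call_sites`, `scan_tree`, and `CALL_PATTERNS` are illustrative names, not part of Noukai or the OpenAI SDK.

```python
import re
from pathlib import Path

# Patterns derived from the search strings listed above.
CALL_PATTERNS = [
    re.compile(r"\.chat\.completions\.create\("),     # SDK v1.0 and later
    re.compile(r"openai\.ChatCompletion\.create\("),  # legacy, pre v1.0
]


def find_llm_call_sites(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in CALL_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root; keep only files that contain hits."""
    results = {}
    for path in Path(root).rglob("*.py"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = find_llm_call_sites(text)
        if hits:
            results[str(path)] = hits
    return results
```

Calling `scan_tree(".")` from a repository root gives a per-file list of call sites to walk through during the audit.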

Code Shapes

Single LLM Call (No Chaining)

The simplest pattern. Maps directly to a single LLM block in Noukai.

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
 
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ],
)
result = response.choices[0].message.content

What to capture: The system message becomes the block's prompt. The model becomes the block's model config. The user message is the flow input.
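One way to record these three items during an audit is a small note structure per call. The sketch below is an assumption about how an auditor might take notes; `BlockSpec` and `capture_block` are hypothetical and are not Noukai types or APIs.

```python
from dataclasses import dataclass


@dataclass
class BlockSpec:
    """Hypothetical audit record for one LLM call (not a Noukai type)."""
    model: str          # becomes the block's model config
    prompt: str         # taken from the system message(s)
    user_messages: int  # count of user messages; their content is the flow input


def capture_block(model: str, messages: list[dict]) -> BlockSpec:
    """Extract the fields worth recording from the arguments of a create() call."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    user_count = sum(1 for m in messages if m["role"] == "user")
    return BlockSpec(
        model=model,
        prompt="\n".join(system_parts),
        user_messages=user_count,
    )
```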

Sequential Chain (Output Feeds Next Call)

The most common manual chaining pattern. Maps to a sequential Noukai flow.

# Step 1: Classify
classify_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Classify this message into: billing, technical, general"},
        {"role": "user", "content": user_message},
    ],
)
category = classify_response.choices[0].message.content
 
# Step 2: Generate response based on category
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"You are a {category} support specialist. Help the user."},
        {"role": "user", "content": user_message},
    ],
)
answer = response.choices[0].message.content

What to capture: Each client.chat.completions.create call becomes a block. The data passed between calls (here, category) shows the block-to-block data flow. In Noukai, this becomes {{previous_output}} in the second block's prompt.
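The data flow can be illustrated without any API access. The sketch below treats each block as a prompt template and substitutes the previous block's output by plain string replacement. That substitution is an assumption made for illustration (Noukai's actual rendering may differ), and `render`, `run_chain`, and `call_llm` are hypothetical names.

```python
from typing import Callable

# The two system prompts from the chain above, written as block templates.
BLOCKS = [
    "Classify this message into: billing, technical, general",
    "You are a {{previous_output}} support specialist. Help the user.",
]


def render(prompt_template: str, previous_output: str) -> str:
    """Illustrative stand-in for template substitution."""
    return prompt_template.replace("{{previous_output}}", previous_output)


def run_chain(
    blocks: list[str],
    flow_input: str,
    call_llm: Callable[[str, str], str],
) -> str:
    """Run blocks in order, feeding each block's output into the next prompt."""
    previous = ""
    for template in blocks:
        system_prompt = render(template, previous)
        # call_llm(system_prompt, user_message) wraps one chat completion call.
        previous = call_llm(system_prompt, flow_input)
    return previous
```

Passing a stub for `call_llm` makes it easy to check which prompt each step would receive before touching the real model.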

Multi-Turn Conversation Loop

A loop that accumulates message history. Maps to a single LLM block called repeatedly by the user's code: Noukai handles one turn at a time, and the calling code manages history.

messages = [{"role": "system", "content": "You are a helpful assistant."}]
 
while True:
    user_input = input("You: ")
    messages.append({"role": "user", "content": user_input})
 
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
 
    assistant_message = response.choices[0].message.content
    messages.append({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")

Token bloat alert: This pattern re-sends the entire conversation history on every call. Input tokens per call grow roughly linearly with the number of turns, so the total tokens sent over the whole conversation grow roughly quadratically. When migrating, the calling code should manage history and pass only relevant context to the Noukai flow, or use a separate "summarize history" flow to compress context.
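To make the growth concrete, the sketch below estimates total input tokens under a simplifying assumption that every message has the same token count, and shows one possible way to cap history. Both `cumulative_input_tokens` and `trim_history` are illustrative helpers, not SDK or Noukai functions.

```python
def cumulative_input_tokens(
    turns: int, tokens_per_message: int, system_tokens: int = 0
) -> int:
    """Total input tokens sent over `turns` calls when full history is re-sent."""
    total = 0
    for turn in range(1, turns + 1):
        # Call number `turn` sends `turn` user messages and `turn - 1`
        # earlier assistant replies, plus the system prompt.
        messages_sent = 2 * turn - 1
        total += system_tokens + messages_sent * tokens_per_message
    return total


def trim_history(messages: list[dict], keep_last: int) -> list[dict]:
    """Keep system messages plus the last `keep_last` non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if keep_last <= 0:
        return system
    return system + rest[-keep_last:]
```

With 100 tokens per message, 10 turns send 10,000 input tokens in total and 20 turns send 40,000: doubling the conversation length quadruples the total.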

Tool Calling / Function Calling Loop

An LLM calls tools in a loop until it has a final answer. This is the SDK equivalent of a LangChain agent.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
 
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
 
while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    msg = response.choices[0].message
 
    if msg.tool_calls:
        messages.append(msg)
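        for call in msg.tool_calls:
            # call.function.arguments is a JSON string; dispatch_tool must parse it
            # and return a string to use as the tool message content.
            result = dispatch_tool(call.function.name, call.function.arguments)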
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })
    else:
        print(msg.content)
        break

Like LangChain agents, tool-calling loops are hard to migrate directly because Noukai flows are DAGs, not loops. Consider: (1) if the tools are simple data lookups, have the calling code run the tools and pass results as input to a single Noukai flow, (2) decompose into a classification block + specialized handler blocks, or (3) keep the tool loop as-is and only migrate the non-loop chains.
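The loop above calls `dispatch_tool` without defining it. For option (1), where the calling code runs the tools itself, a dispatcher might look like the sketch below. The registry layout, the stubbed `get_weather`, and the error format are all assumptions made for illustration.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool; a real implementation would call a weather service."""
    return f"Weather for {city}: (stubbed)"


# Maps tool names, as declared in the `tools` list, to Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool(name: str, arguments: str) -> str:
    """Run one tool call; `arguments` is the JSON string sent by the model."""
    if name not in TOOL_REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"invalid arguments: {exc}"})
    # Tool message content must be a string, so coerce the result.
    return str(TOOL_REGISTRY[name](**kwargs))
```

Returning errors as strings rather than raising lets the model see the failure in the next turn; whether that is desirable depends on the application.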

Router Pattern (Pick-Next-Prompt)

A first LLM call decides which prompt to use for the second call. Maps to a branching Noukai flow with a router block.

# Step 1: Route
router_response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Classify this query into exactly one category: search, calculator, knowledge_base.\n\nQuery: {user_query}",
    }],
)
route = router_response.choices[0].message.content.strip().lower()
 
# Step 2: Dispatch
if route == "search":
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Search the web for: {user_query}"}],
    )
elif route == "calculator":
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Calculate: {user_query}"}],
    )
else:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Answer from knowledge: {user_query}"}],
    )

What to capture: The routing call becomes a router block. Each branch becomes a downstream block. In Noukai, this can be modeled as a sequential flow where the first block classifies and the second block uses {{previous_output}} to condition its response, or as a branching topology.
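Router code in the wild often breaks when the model replies with extra punctuation or a full sentence instead of a bare label. When auditing, it helps to note how the route string is normalized. The following is a sketch of one defensive approach; `normalize_route` and its substring fallback are assumptions, not something the SDK or Noukai provides.

```python
ROUTES = ("search", "calculator", "knowledge_base")


def normalize_route(raw: str, default: str = "knowledge_base") -> str:
    """Map a free-text model reply onto one of the known route labels."""
    cleaned = raw.strip().lower().strip(".\"'` ")
    if cleaned in ROUTES:
        return cleaned
    # Crude fallback: take the first known label mentioned anywhere in the
    # reply. This can misfire on words that merely contain a label.
    for route in ROUTES:
        if route in cleaned:
            return route
    return default
```

The `default` argument mirrors the `else` branch in the dispatch code above, which sends anything unrecognized to the knowledge-base prompt.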

Fan-Out / Aggregate

Multiple independent LLM calls run concurrently, then results are combined. Maps to parallel blocks in a v (parallel) container.

import asyncio
 
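async def analyze(client, text):
    # `client` must be an AsyncOpenAI instance. Each create() call below returns
    # a coroutine; nothing is sent until asyncio.gather() schedules them.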
    sentiment_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Sentiment of: {text}"}],
    )
    keywords_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Extract keywords from: {text}"}],
    )
    summary_task = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Summarize in one sentence: {text}"}],
    )
 
    sentiment, keywords, summary = await asyncio.gather(
        sentiment_task, keywords_task, summary_task
    )
    return {
        "sentiment": sentiment.choices[0].message.content,
        "keywords": keywords.choices[0].message.content,
        "summary": summary.choices[0].message.content,
    }

What to capture: Each concurrent call becomes a parallel block. The aggregation at the end is handled by Noukai's parallel container output merging — or by a final passthrough or code block if custom merging is needed.
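The merge step, including what happens when one branch fails, can be tried without API access. In the sketch below, `fake_llm` stands in for an `AsyncOpenAI` call and `fan_out` is a hypothetical helper; the choice to record failures as `"error: ..."` strings is an assumption for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def fake_llm(prompt: str) -> str:
    """Stand-in for a real async completion call; sleeps instead of calling an API."""
    await asyncio.sleep(0.01)
    return f"result for: {prompt}"


async def fan_out(
    text: str, call: Callable[[str], Awaitable[str]] = fake_llm
) -> dict[str, str]:
    """Run three independent prompts concurrently and merge results by key."""
    prompts = {
        "sentiment": f"Sentiment of: {text}",
        "keywords": f"Extract keywords from: {text}",
        "summary": f"Summarize in one sentence: {text}",
    }
    # gather() preserves argument order; return_exceptions=True keeps one
    # failed branch from discarding the results of the others.
    results = await asyncio.gather(
        *(call(p) for p in prompts.values()), return_exceptions=True
    )
    merged = {}
    for key, result in zip(prompts, results):
        merged[key] = f"error: {result}" if isinstance(result, Exception) else result
    return merged
```

Running `asyncio.run(fan_out("some text"))` returns a dict with one entry per branch, which is the shape a downstream merge step would consume.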

Common Pain Points

These are the problems that make migration worthwhile. Reference them in the "Structural Improvement" column during audit.

| Problem | How to Spot It | Noukai Advantage |
| --- | --- | --- |
| History bloat | messages array grows unbounded; full history re-sent every call | Block-to-block data passing: only relevant data flows forward |
| No caching | Same prompt + input produces same output but always re-calls the API | Versioned flows enable caching strategies at the infrastructure level |
| Sequential bottleneck | Independent calls run one after another (await in series) | Parallel containers execute independent blocks concurrently |
| Prompt sprawl | System prompts scattered across files, hardcoded in strings | Centralized, versioned prompts in the Noukai flow editor |
| No observability | No tracing, logging is ad-hoc print() or custom code | Built-in step-level tracing and SSE streaming |
| Ad-hoc retries | Manual try/except with time.sleep retry loops | Managed execution with retry policies |
| No versioning | Prompt changes = code changes = deploy cycle | Flow versioning: publish, rollback, A/B test without code changes |
| No structured output | Parsing LLM text output with regex or string splitting | Block output schemas enforce JSON structure |
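As a recognition aid for the "Ad-hoc retries" row, the shape to look for is usually a hand-rolled loop like the sketch below. This is an illustrative reconstruction of the anti-pattern, not code from any particular project; `call_with_retries` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Typical hand-written retry wrapper around an LLM call."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:  # broad catch is common in ad-hoc code
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Each wrapper of this kind found during the audit is a candidate for the "Structural Improvement" column, since the retry behavior lives in application code rather than in the execution layer.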

Mapping Cheat Sheet

| Raw SDK Pattern | Noukai Equivalent |
| --- | --- |
| Single completions.create | Single LLM block |
| Sequential calls (output → next input) | Sequential blocks, use {{previous_output}} |
| asyncio.gather / Promise.all on independent calls | Blocks in a v (parallel) container |
| Router + if/else dispatch | Router block → conditional downstream blocks |
| Tool-calling while loop | Decompose into classification + handler blocks, or keep as-is |
| Conversation history array | Calling code manages history, passes relevant context as flow input |
| JSON mode / response_format | Block output schema |
| System message | Block prompt text |
| Temperature / max_tokens | Block config via update_block_config |