Tag: LLM

  • The LLM Agent Wrote Its Own Agent Loop

    In the previous chapter, I built a partial LLM Agent, one that is missing the agent loop.

    In this chapter, I use ipython to manually tell the LLM Agent to write its own agent loop. It succeeded (with GPT-4.1).

    There are two sections:

    • Write Your Own Agent Loop – contains the ipython session, and
    • Example Use.

    Write Your Own Agent Loop

    In [1]: load "inferscript.py"

    This is where I define the task.

    In [3]: c.push_easy_input_message("The CWD is a Python project of an LLM Agent.
          ⋮  Ignore the md file. Look only at the PY files in the CWD. Determine
          ⋮  what is missing to make an LLM agent.")

    Note: each response is labelled in order.

    In [4]: r1 = infer_iter(c.to_list(), tk, e)
    In [5]: r1
    Out[5]: 
    [{'id': 'omitted',
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': "To determine what is missing from your Python project to make a functional LLM (
    Large Language Model) agent, I will need to analyze the actual content of your current workin
    g directory (CWD) and inspect the `.py` files.\n\n**Next Steps:**\n1. List the `.py` files in
     your directory.\n2. Review their contents to see what components exist and what's missing fo
    r a complete LLM agent.\n\nLet me start by listing the `.py` files in your CWD."}],
      'role': 'assistant'}]

    Note: each response is manually added to the context.

    In [6]: for r in r1:
       ...:     c.push_custom(r)
       ...: 
    In [7]: r2 = infer_iter(c.to_list(), tk, e)
    Permission needed to use tool call.
    {'arguments': '{"code":"import os\\n[filename for filename in os.listdir() if '
                  'filename.endswith(\'.py\')]"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    In [8]: for r in r2:
       ...:     c.push_custom(r)
       ...: 
    In [9]: r3 = infer_iter(c.to_list(), tk, e)

    Note: that failed. Why?

    In [10]: r3
    Out[10]: 
    [{'id': 'omitted',
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': "It looks like there was an issue retrieving the list of `.py` files in your proj
    ect directory (CWD). I need at least the filenames (and ideally the content or structure) of 
    your Python files to analyze what's present and what's missing for an LLM agent.\n\nCould you
     please:\n- Provide the list of `.py` files in your project directory  \nOR\n- Share the main
     code structure or key `.py` files' contents?\n\nOnce I have that information, I can identify
     what is missing for a working LLM agent."}],
      'role': 'assistant'}]

    Let me see why it failed.

    In [11]: c.to_list()
    Out[11]: 
    [{'content': 'The CWD is a Python project of an LLM Agent. Ignore the md file. Look only at t
    he PY files in the CWD. Determine what is missing to make an LLM agent.',
      'role': 'user',
      'type': 'message'},
     {'id': 'omitted',
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': "To determine what is missing from your Python project to make a functional LLM (
    Large Language Model) agent, I will need to analyze the actual content of your current workin
    g directory (CWD) and inspect the `.py` files.\n\n**Next Steps:**\n1. List the `.py` files in
     your directory.\n2. Review their contents to see what components exist and what's missing fo
    r a complete LLM agent.\n\nLet me start by listing the `.py` files in your CWD."}],
      'role': 'assistant'},
     {'id': 'omitted',
      'type': 'function_call',
      'status': 'completed',
      'arguments': '{"code":"import os\\n[filename for filename in os.listdir() if filename.endsw
    ith(\'.py\')]"}',
      'call_id': 'omitted',
      'name': 'py_runsource_exec'},
     {'call_id': 'omitted',
      'output': '[Function tool call returned empty result.]',
      'type': 'function_call_output'}]

    Of course: the evaluator only returns what the code prints to stdout/stderr, and the value of a bare expression is never echoed back. The LLM must print the result explicitly.
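    This behaviour can be reproduced with plain exec, which is (roughly) what the evaluator does with the submitted code: capturing stdout yields nothing for a bare expression, but wrapping the expression in print() makes its value visible.

```python
import io
import contextlib

code = "[f for f in ['a.py', 'b.md'] if f.endswith('.py')]"

# A bare expression under exec(): its value is computed, then discarded.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(code)
print(repr(buf.getvalue()))   # -> '' : nothing reached stdout

# The same expression wrapped in print(): now the value reaches stdout.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(f"print({code})")
print(repr(buf.getvalue()))   # -> "['a.py']\n"
```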

    In [12]: c.push_easy_input_message("The Function tool call returned an empty
           ⋮ result because you did not print the result of the python script you
           ⋮ provided.")

    Try again.

    In [13]: r3 = infer_iter(c.to_list(), tk, e)
    Permission needed to use tool call.
    {'arguments': '{"code":"import os\\nprint([filename for filename in '
                  'os.listdir() if filename.endswith(\'.py\')])"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...

    That looks good. I can see the print function now.

    In [14]: for r in r3:
        ...:     c.push_custom(r)
        ...: 
    In [15]: r4 = infer_iter(c.to_list(), tk, e)
    Permission needed to use tool call.
    {'arguments': '{"code":"with open(\'toolkit.py\') as f:\\n    '
                  'print(f.read())"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    Permission needed to use tool call.
    {'arguments': '{"code":"with open(\'evaluator.py\') as f:\\n    '
                  'print(f.read())"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    Permission needed to use tool call.
    {'arguments': '{"code":"with open(\'inferscript.py\') as f:\\n    '
                  'print(f.read())"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    Permission needed to use tool call.
    {'arguments': '{"code":"with open(\'replscript.py\') as f:\\n    '
                  'print(f.read())"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    Permission needed to use tool call.
    {'arguments': '{"code":"with open(\'circularcontext.py\') as f:\\n    '
                  'print(f.read())"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...

    Sanity-check the context size: how many LLM API JSON objects do we have in the context?

    In [16]: len(c.to_list())
    Out[16]: 8
    In [17]: for r in r4:
        ...:     c.push_custom(r)
        ...: 
    In [18]: len(r4)
    Out[18]: 11
    
    In [19]: r5 = infer_iter(c.to_list(), tk, e)
    In [22]: print(r5[0]['content'][0]['text'])

    Note that the following text is the LLM Response.


    Here is a summary of what’s present in your project and what’s missing to make a full LLM agent:


    What You Have

    toolkit.py

    • Defines a Toolkit class for registering and managing tools (e.g., for tool-augmented LLM agents).
    • Includes a function to add a Python REPL tool, allowing the agent to execute Python code.

    evaluator.py

    • Evaluator class launches a subprocess with a persistent REPL, sending code to replscript.py for execution; handles chunked code input/output via base64.
    • Used to execute dynamic Python code safely in isolation from the main process.

    inferscript.py

    • Sets up toolkit, evaluator, and context.
    • Implements permission system for tool calls.
    • Main entry point for “inference iterations” with a function infer_iter, using the toolkit and evaluator based on context and tool predictions.
    • Relies on predict from the context module for next actions.

    replscript.py

    • Implements a custom REPL that can safely evaluate Python code and return output in a chunked, encoded form.
    • Used by Evaluator for persistent code execution.

    circularcontext.py

    • Implements CircularBuffer and CircularContext for limited-size message/context history.
    • Has a function called predict that (as coded) simply wraps an openai_request (but this is not a locally defined model interface).
    • Defines OpenAI API request logic and a skeletal interface for LLM use, with tool support.

    What’s MISSING for a complete LLM Agent

    1. Missing: LLM Prompt Cycle and Agent Logic

    • No main agent loop that accepts user input, pushes messages, and coordinates iterative reasoning (i.e., input → LLM → tool use → next step → output).
    • No clear decision logic for running the agent interactively or as a service.

    2. Model Results Handling

    • The circularcontext.py file has a predict() function that expects the OpenAI API to return a final result directly. However, the actual OpenAI API endpoint used and its formatting may be incorrect/outdated (/v1/responses is not a standard endpoint—should be /v1/chat/completions for chat models).
    • Not enough error handling or model configuration.

    3. Function/Tool Call Parsing and Integration

    • No general handler for parsing LLM output (e.g., function/tool calls as OpenAI-format JSON or similar).
    • No routine for converting raw model outputs into tool calls and iteratively feeding results back.

    4. No User Interface or Entrypoint

    • No CLI, Web, or API interface to interact with the agent.
    • No main application (e.g., main() function) that wires up message input, tool results, and chat context.

    5. Missing Authentication/Secrets Management

    • The OpenAI key is fetched from the environment, but no handling is included for missing or invalid keys.

    6. LLM Output-Driven Memory/Context Feedback

    • The agent lacks a mechanism for using conversation history to drive LLM output over multiple rounds.
    • The CircularContext is present, but there’s no function stitching all pieces into a complete prompt-tool/result-feedback loop.

    Summary Table

    Component                  Present?   Missing?
    OpenAI LLM Integration     Partial    Endpoint outdated/incomplete
    Tooling (Python REPL)      Present
    Context Management         Present    Used, but not in full agent loop
    Agent Loop (main logic)    No         Core interactive/iterative loop
    Function Call Parsing      Partial    Needs full OpenAI format support
    User Interface             No         CLI/webapp or API
    Error Handling             Minimal    Needs improvement

    What to Add

    1. Main Agent Loop: A driver script that:

      • Accepts user input
      • Builds prompt with history/context
      • Sends to LLM
      • Parses model/tool call outputs
      • Executes tools when needed
      • Stores results in context, repeats as needed

    2. Fix/Improve OpenAI API Calls: Switch to /v1/chat/completions with correct format (see OpenAI docs).



    3. Function Call Handling: Add parsing for function/tool calls in the format returned by OpenAI/other LLMs.



    4. Better Error/Edge Case Handling: For missing keys, invalid tool calls, API failures, etc.



    5. Entry Point: A main() or CLI loop for end-to-end interaction.



    Would you like:

    • Example code wiring this together?
    • More details on a specific piece?
    • Recommendations for libraries/frameworks to simplify your agent logic?

    END OF LLM RESPONSE

    In [23]: for r in r5:
        ...:     c.push_custom(r)
        ...: 

    I understand what you have in mind. The first proposed task is just what I want.

    In [24]: c.push_easy_input_message("I want you to create the file agent.py and
           ⋮ add into it the main agent loop.")
    In [25]: r6 = infer_iter(c.to_list(), tk, e)
    In [26]: print(r6[0]['content'][0]['text'])

    The following text is the LLM Response.


    Here is a design for agent.py that wires up your tool/system components into a simple main agent loop. This implementation will:

    • Accept user input via CLI
    • Add it to the circular context buffer
    • Send the context/history to the LLM (using your API/client)
    • Check the response for function/tool calls
    • If present, execute the tool and append the result to the context
    • Otherwise, print/display the LLM output as the agent’s reply
    • Continue until stopped (Ctrl+C or empty input)

    I’ll add clear comments for the key parts.
    Note: This code assumes that the function call output from the LLM follows the OpenAI format.


    # agent.py
    
    import toolkit
    import evaluator
    import inferscript
    import circularcontext as cc
    
    def display(msg):
        print(f"\n=== Agent ===\n{msg}\n")
    
    def main():
        # Initialize toolkit, evaluator, and context
        tk = toolkit.Toolkit()
        evaluator_inst = evaluator.Evaluator()
        context = cc.CircularContext()
    
        # Register tools for the agent (e.g., Python REPL)
        toolkit.add_py_repl_tools(tk)
    
        print("Welcome to your LLM agent. Type 'exit' to quit.\n")
    
        while True:
            try:
                user_input = input("You: ").strip()
            except (EOFError, KeyboardInterrupt):
                print("\nExiting agent.")
                break
    
            if user_input.lower() in ("exit", "quit"):
                break
    
            if not user_input:
                continue
    
            # Add user input to context buffer
            context.push_easy_input_message(content=user_input, role="user")
    
            # Main reasoning and tool-use step
            outputs = inferscript.infer_iter(context.to_list(), tk, evaluator_inst)
    
            # Handle agent outputs (could be message(s) and/or function call output(s))
            agent_reply = ""
            for out in outputs:
                if out.get("type") == "message":
                    # Standard agent message
                    content = out.get("content", "")
                    agent_reply = content
                    context.push_custom(out)
                elif out.get("type") == "function_call_output":
                    # Result from running a tool call
                    output = out.get("output", "")
                    # Optionally, print tool result or just show final answer
                    agent_reply += f"\n[Tool Output]: {output}"
                    context.push_custom(out)
    
            # Display the agent response(s)
            if agent_reply.strip():
                display(agent_reply)
    
        print("Goodbye.")
    
    if __name__ == "__main__":
        main()

    How it works:

    • The loop accepts user input and processes it.
    • It uses inferscript.infer_iter() as the core cycle for reasoning and tool-use, handling both message and tool call outputs.
    • It prints out LLM messages and tool outputs, and adds all outputs to the shared context buffer (“memory”).
    • The conversation/context history is kept short (up to the CircularContext capacity).

    To use:

    1. Save the above to agent.py.
    2. In your terminal, run: python agent.py
    3. You can interact with your agent and use Python tool calls!
    4. To exit, type exit or press Ctrl+C.

    If you want improvements (e.g. more advanced output parsing, formatted output, streaming, etc.), let me know!


    END OF LLM RESPONSE

    In [27]: for r in r6:
        ...:     c.push_custom(r)
        ...: 
    In [28]: c.push_easy_input_message("I approve of your agent.py proposal. I want
           ⋮  you to create and write the agent.py directly to the CWD.")
    In [29]: r7 = infer_iter(c.to_list(), tk, e)
    Permission needed to use tool call.
    {'arguments': '{"code":"AGENT_PY = \'\'\'\\n# agent.py\\n\\nimport '
                  'toolkit\\nimport evaluator\\nimport inferscript\\nimport '
                  'circularcontext as cc\\n\\ndef display(msg):\\n    '
                  'print(f\\"\\\\n=== Agent ===\\\\n{msg}\\\\n\\")\\n\\ndef '
                  'main():\\n    # Initialize toolkit, evaluator, and '
                  'context\\n    tk = toolkit.Toolkit()\\n    evaluator_inst = '
                  'evaluator.Evaluator()\\n    context = '
                  'cc.CircularContext()\\n\\n    # Register tools for the agent '
                  '(e.g., Python REPL)\\n    '
                  'toolkit.add_py_repl_tools(tk)\\n\\n    print(\\"Welcome to your '
                  'LLM agent. Type \'exit\' to quit.\\\\n\\")\\n\\n    while '
                  'True:\\n        try:\\n            user_input = input(\\"You: '
                  '\\").strip()\\n        except (EOFError, '
                  'KeyboardInterrupt):\\n            print(\\"\\\\nExiting '
                  'agent.\\")\\n            break\\n\\n        if '
                  'user_input.lower() in (\\"exit\\", \\"quit\\"):\\n            '
                  'break\\n\\n        if not user_input:\\n            '
                  'continue\\n\\n        # Add user input to context '
                  'buffer\\n        '
                  'context.push_easy_input_message(content=user_input, '
                  'role=\\"user\\")\\n\\n        # Main reasoning and tool-use '
                  'step\\n        outputs = '
                  'inferscript.infer_iter(context.to_list(), tk, '
                  'evaluator_inst)\\n\\n        # Handle agent outputs (could be '
                  'message(s) and/or function call output(s))\\n        '
                  'agent_reply = \\"\\"\\n        for out in '
                  'outputs:\\n            if out.get(\\"type\\") == '
                  '\\"message\\":\\n                # Standard agent '
                  'message\\n                content = out.get(\\"content\\", '
                  '\\"\\")\\n                agent_reply = '
                  'content\\n                '
                  'context.push_custom(out)\\n            elif out.get(\\"type\\") '
                  '== \\"function_call_output\\":\\n                # Result from '
                  'running a tool call\\n                output = '
                  'out.get(\\"output\\", \\"\\")\\n                # Optionally, '
                  'print tool result or just show final answer\\n                '
                  'agent_reply += f\\"\\\\n[Tool Output]: '
                  '{output}\\"\\n                '
                  'context.push_custom(out)\\n\\n        # Display the agent '
                  'response(s)\\n        if agent_reply.strip():\\n            '
                  'display(agent_reply)\\n\\n    print(\\"Goodbye.\\")\\n\\nif '
                  '__name__ == \\"__main__\\":\\n    main()\\n\'\'\'\\n\\nwith '
                  'open(\'agent.py\', \'w\') as f:\\n    f.write(AGENT_PY)"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...

    Example Use

    Setup

    Note that in the final agent.py file, I had to manually correct the newlines: the LLM Agent used \n where it had to use \\n, because the generated code passes through a Python interpreter and one level of escaping is consumed.
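    A minimal demonstration of that double-escaping issue. The source the LLM emits is itself a string that exec() parses before running, so each backslash intended for the final file must be doubled in the generating code:

```python
# Code the LLM emits is a string that exec() parses before running it,
# so each backslash meant for the written file needs to be doubled.
wrong = 'body = "a\\nb"'    # exec sees: body = "a\nb"
right = 'body = "a\\\\nb"'  # exec sees: body = "a\\nb"

env = {}
exec(wrong, env)
print(repr(env["body"]))    # a real newline: the escape was consumed

env = {}
exec(right, env)
print(repr(env["body"]))    # the two-character backslash-n sequence survives
```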

    I also had to do something I forgot earlier: turn inferscript.py into an actual infer.py module.

    So, here is the infer.py file I forgot.

    import json
    
    import toolkit
    import evaluator 
    import circularcontext as cc
    
    from pprint import pprint
    
    def make_functioncalloutput(call_id, content):
        if content == "":
            content = "[Function tool call returned empty result.]"
        return {
            "call_id": call_id,
            "output": content,
            "type": "function_call_output"
        }
    
    def make_functioncalloutput_denied(call_id):
        return {
            "call_id": call_id,
            "output": "[Function tool call permission denied.]",
            "type": "function_call_output"
        }
    
    def ask_for_permission(toolcall):
        print("Permission needed to use tool call.")
        pprint(toolcall)
        while True:
            answer = input("Grant permission? (y/n): ").strip().lower()
            if answer in ('y', 'yes'):
                print("Proceeding...")
                return True
            elif answer in ('n', 'no'):
                print("Aborting function tool call.")
                return False
            else:
                print("Please enter 'y' or 'n'.")
    
    def infer_iter(context, toolkit, evaluator):
        outputs = cc.predict(context=context, tools=toolkit.tools())
        r = outputs.copy()
        for o in outputs:
            if o['type'] == "function_call":
                if toolkit.match(o['name']) == "py_runsource_exec":
                    if ask_for_permission(o):
                        args = json.loads(o['arguments'])
                        result = evaluator.runsource_exec(args['code'])
                        r.append(make_functioncalloutput(o['call_id'], result))
                    else:
                        r.append(make_functioncalloutput_denied(o['call_id']))
        return r

    Problems

    First, I had to manually change the inferscript import to infer in agent.py.

    Then, presumably because the message content is a list of output objects rather than a plain string (so .strip() fails), I had to change

            # Display the agent response(s)
            if agent_reply.strip():
                display(agent_reply)

    to

            # Display the agent response(s)
            if agent_reply:
                display(agent_reply)

    Trying agent.py

    python3 agent.py 
    Welcome to your LLM agent. Type 'exit' to quit.
    
    You: Use Python to calculate the gravitational acceleration at the ISS.
    Permission needed to use tool call.
    {'arguments': '{"code":"# Constants\\nG = 6.67430e-11  # Gravitational '
                  'constant, m^3 kg^-1 s^-2\\nM_earth = 5.972e24  # Mass of the '
                  'Earth, kg\\nR_earth = 6371e3     # Radius of the Earth, '
                  'm\\nh_iss = 420e3        # Average altitude of ISS, m\\n\\n# '
                  'Distance from the center of the Earth to the ISS\\nr_iss = '
                  'R_earth + h_iss\\n\\ng_iss = G * M_earth / '
                  'r_iss**2\\nprint(f\\"Gravitational acceleration at the ISS '
                  'altitude: {g_iss:.2f} m/s^2\\")"}',
     'call_id': 'omitted',
     'id': 'omitted',
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...
    
    === Agent ===
    
    [Tool Output]: Gravitational acceleration at the ISS altitude: 8.64 m/s^2
    
    
    You: exit
    Goodbye.
  • How an Agent Works: The Infer Function

    So far, I have shown the circular context, the toolkit, and the evaluator. Now it is time to use these three components.

    In this chapter, I show infer, a function that uses the circular context, toolkit, and evaluator. I do not automate everything. I still manually manage the messages, but I introduce infer to interact with the three components. I name the function infer for two reasons. One, it is similar to inference. Two, it is similar to inferrer, just as eval is similar to evaluator.

    This chapter has three sections:

    • What is Needed to Make an LLM Agent? – self-explanatory,
    • Basic Implementation – the infer script in Python,
    • Interactive Use – an ipython session to show usage.

    Here are my two goals for this chapter:

    • define and implement the infer function, and
    • illustrate its use for an LLM Agent.

    Note that I will not make an LLM Agent in this chapter.

    What is Needed to Make an LLM Agent?

    What is an LLM?

    Loosely speaking, an LLM takes a text sequence as input and returns another text sequence that completes it. LLMs are trained to model the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is defined by the API model.

    It is also worth noting that an LLM API is a non-deterministic process, as the same input may not return the same output, due to output sampling and parallelism.

    LLM context size M
    Index: 0 1 2 3 4 5 6 7 8 9 ... N ... N+k   (N + k < M)
    Value: ? ? ? ? ? ? ? ? ? ? ... ? ...  ?
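    This view can be written down as a toy signature (purely illustrative: real tokenization, probabilities, and sampling are elided).

```python
# Toy illustration of the N -> N + k completion view, with N + k <= M.
def complete(tokens: list, k: int, M: int = 128) -> list:
    assert len(tokens) + k <= M, "completion must fit in the context window"
    return tokens + ["?"] * k   # placeholder continuation of k predicted tokens

print(len(complete(["The", "CWD", "is"], k=4)))   # -> 7
```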

    What is an LLM Agent?

    “An LLM agent runs tools in a loop to achieve a goal”. Reference.

    Tools are external capabilities the LLM can use beyond just generating text. Recall that the LLM API accepts a tools field in the JSON input data. When you provide the API tools, the server constructs a special system prompt. The prompt is designed to instruct the model to use the specified tool(s).

    For example, here is the Claude API example prompt that is constructed for tool use.

    In this environment you have access to a set of tools you can use to answer the
    user's question.
    {{ FORMATTING INSTRUCTIONS }}
    String and scalar parameters should be specified as is, while lists and objects
    should use JSON format. Note that spaces for string values are not stripped.
    The output is not expected to be valid XML and is parsed with regular
    expressions.
    Here are the functions available in JSONSchema format:
    {{ TOOL DEFINITIONS IN JSON SCHEMA }}
    {{ USER SYSTEM PROMPT }}
    {{ TOOL CONFIGURATION }}
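    On the wire, the tools field itself is just JSON. Here is a sketch of a single function-tool definition in the flat shape used by the Responses-style endpoint this book targets; the name matches this book's py_runsource_exec tool, but the description and schema wording are my own illustration.

```python
import json

# Illustrative function-tool definition (flat Responses-API shape; the
# description text and parameter schema here are assumptions for the sketch).
tools = [{
    "type": "function",
    "name": "py_runsource_exec",
    "description": "Run Python source in a persistent REPL; returns stdout/stderr.",
    "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"],
    },
}]

# The definition must survive a JSON round trip to be sent to the API.
print(json.loads(json.dumps(tools))[0]["name"])   # -> py_runsource_exec
```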

    The loop usually looks like:

    • Observe – look at the current state, user requests, or previous results,
    • Think – reason about what to do next,
    • Act – call a tool,
    • Observe,
    • Repeat – until a goal is reached.
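    The steps above can be sketched as a small driver function. This is purely illustrative: llm and run_tool are injected stand-ins, not part of this chapter's code.

```python
def agent_loop(llm, run_tool, context, max_steps=10):
    """Observe-think-act sketch: llm() and run_tool() are injected callables."""
    for _ in range(max_steps):
        action = llm(context)                       # Think
        context.append(action)                      # Observe the model's output
        if action.get("type") == "function_call":   # Act: run the requested tool
            context.append(run_tool(action))        # Observe the tool result
        else:
            break                                   # A plain answer ends the loop
    return context

# Fake components showing one tool round followed by a final answer.
def fake_llm(ctx):
    if any(o.get("type") == "function_call_output" for o in ctx):
        return {"type": "message", "content": "done"}
    return {"type": "function_call", "name": "py_runsource_exec"}

def fake_tool(call):
    return {"type": "function_call_output", "output": "42"}

print(len(agent_loop(fake_llm, fake_tool, [])))   # -> 3
```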

    I will not implement an LLM Agent in this chapter, as I will not implement this loop. Instead, I will do things manually in an ipython session.

    Note that I define only one tool for the LLM API to use: py_runsource_exec.

    Building blocks

    The Infer Function

    The essential part of the LLM Agent will be a function that I call infer.

    Function infer takes a context, a toolkit, and an evaluator as inputs. It returns a new context as the output.

    infer: context toolkit evaluator -> context

    The infer function:

    • interacts with the LLM API,
    • performs tool calls,
    • interacts with the evaluator, and
    • creates a new output context.

    Context

    A context is a list (data structure) that contains input/output objects that the LLM API can understand. The objects are:

    • EasyInputMessage,
    • ResponseOutputMessage,
    • FunctionCallOutput, and
    • ResponseFunctionToolCall.
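    One example of each of the four object types, with shapes taken from the transcripts in these chapters (all field values below are placeholders):

```python
# One example of each context object type (values are placeholders).
context = [
    # EasyInputMessage: plain user input
    {"type": "message", "role": "user",
     "content": "Use Python to calculate 2 + 2."},
    # ResponseOutputMessage: assistant text output
    {"type": "message", "role": "assistant", "status": "completed", "id": "msg_0",
     "content": [{"type": "output_text", "text": "Let me compute that.",
                  "annotations": [], "logprobs": []}]},
    # ResponseFunctionToolCall: the model asks for a tool
    {"type": "function_call", "name": "py_runsource_exec", "id": "fc_0",
     "call_id": "call_0", "status": "completed",
     "arguments": '{"code": "print(2 + 2)"}'},
    # FunctionCallOutput: the tool result fed back
    {"type": "function_call_output", "call_id": "call_0", "output": "4\n"},
]

print([o["type"] for o in context])
```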

    Toolkit

    A toolkit is a data structure that associates tools with programs of the evaluator. It does not perform a tool call.

    The only tool defined is the tool that interacts with the evaluator.

    Evaluator

    An evaluator is a program that takes a program as input and returns a value as output. It has an implicit environment that associates variables with values. It does not perform a tool call; the inferrer does. But the evaluator provides the functionality for performing the tool call.

    LLM API <--> infer <--> evaluator <--> Computer System
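    As a toy, in-process stand-in for the evaluator described above (the real one runs a subprocess; this sketch only shows the code-in, stdout-out contract with a persistent environment):

```python
import io
import contextlib

class ToyEvaluator:
    """In-process sketch: exec code, capture stdout, keep variables around."""

    def __init__(self):
        self.env = {}                       # the implicit environment

    def runsource_exec(self, code: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.env)
        return buf.getvalue()

e = ToyEvaluator()
e.runsource_exec("x = 21")                        # defines x; prints nothing
print(e.runsource_exec("print(x * 2)"), end="")   # -> 42
```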

    Basic Implementation of the Infer Function

    Setup

    mkdir infer && cd infer
    python -m venv venv
    source venv/bin/activate
    pip install requests ipython 

    I will write all code into a single file named inferscript.py.

    The Infer Script

    import json
    
    import toolkit
    import evaluator 
    import circularcontext as cc
    
    from pprint import pprint
    
    tk = toolkit.Toolkit()
    e = evaluator.Evaluator()
    c = cc.CircularContext()
    
    toolkit.add_py_repl_tools(tk)
    
    def make_functioncalloutput(call_id, content):
        if content == "":
            content = "[Function tool call returned empty result.]"
        return {
            "call_id": call_id,
            "output": content,
            "type": "function_call_output"
        }
    
    def make_functioncalloutput_denied(call_id):
        return {
            "call_id": call_id,
            "output": "[Function tool call permission denied.]",
            "type": "function_call_output"
        }
    
    def ask_for_permission(toolcall):
        print("Permission needed to use tool call.")
        pprint(toolcall)
        while True:
            answer = input("Grant permission? (y/n): ").strip().lower()
            if answer in ('y', 'yes'):
                print("Proceeding...")
                return True
            elif answer in ('n', 'no'):
                print("Aborting function tool call.")
                return False
            else:
                print("Please enter 'y' or 'n'.")
    
    
    def infer_iter(context, toolkit, evaluator):
        outputs = cc.predict(context=context, tools=toolkit.tools())
        r = outputs.copy()
        for o in outputs:
            if o['type'] == "function_call":
                if toolkit.match(o['name']) == "py_runsource_exec":
                    if ask_for_permission(o):
                        args = json.loads(o['arguments'])
                        result = evaluator.runsource_exec(args['code'])
                        r.append(make_functioncalloutput(o['call_id'], result))
                    else:
                        r.append(make_functioncalloutput_denied(o['call_id']))
        return r

    Interactive Use

    Getting Started

    Make sure that:

    • the terminal is in the proper directory,
    • the Python virtual environment is activated,
    • the files evaluator.py, replscript.py, toolkit.py, and circularcontext.py are in the proper directory, and
    • the proper code is in the inferscript.py file.

    Start an ipython session.

    export OPENAI_API_KEY="your api key..."
    ipython

    Load the code.

    In [1]: load "inferscript.py"

    Push the task for the LLM.

    In [3]: c.push_easy_input_message("Use Python to calculate the gravitational
          ⋮ acceleration at the ISS")

    Create the first response.

    In [4]: r1 = infer_iter(c.to_list(), tk, e)
    Permission needed to use tool call.
    {'arguments': '{"code":"# Gravitational acceleration formula: g = G * M / '
                  'r^2\\n# G is the gravitational constant, M is mass of Earth, r '
                  "is distance from Earth's center to ISS\\nG = 6.67430e-11  # m^3 "
                  'kg^-1 s^-2\\nM = 5.972e24      # kg\\nR_earth = 6371e3  # '
                  "Earth's radius in meters\\nh_ISS = 408e3     # ISS altitude "
                  "above Earth's surface in meters\\nr = R_earth + "
                  'h_ISS\\n\\ng_iss = G * M / r**2\\nprint(f\\"Gravitational '
                  'acceleration at ISS altitude: {g_iss:.2f} m/s^2\\")"}',
     'call_id': (omitted),
     'id': (omitted),
     'name': 'py_runsource_exec',
     'status': 'completed',
     'type': 'function_call'}
    Grant permission? (y/n): y
    Proceeding...

    Push the returned context to the CircularContext instance in variable c.

    In [5]: for r in r1:
       ...:     c.push_custom(r)
       ...: 

    Let the LLM know about the function call output value.

    In [6]: r2 = infer_iter(c.to_list(), tk, e)

    No permission request was needed. This means that no function tool call was used. Let us see the response.

    In [7]: r2
    Out[7]: 
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': "The gravitational acceleration at the altitude of the
        International Space Station (ISS) is approximately 8.67 m/s². This is only
    slightly less than the acceleration due to gravity at Earth's surface
        (about 9.81 m/s²)."}],
      'role': 'assistant'}]

    That is the solution to the task. The LLM used the evaluator tool to solve the task. Note that this is not a proper LLM Agent, as the looping part is missing. The reason is that I have not come up with a user interface to control the LLM Agent.

  • Making a REPL with an Evaluator

    In the previous chapter, I have shown a Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    In this chapter, I show the evaluator component. The evaluator is a program to which you can send program code. A code interpreter. But the focus here is an evaluator that the LLM can interact with. The LLM sends a function tool call to interact with the evaluator.

    Here are my two goals for this chapter. The evaluator must:

    • manage a separate process for a Python interpreter,
    • provide a method to send code to the interpreter and return as a string the output of the stdout and stderr.

    The evaluator is a complex topic. Perhaps it is best to subdivide the problem. I thought about it for some time, and came up with the following subproblems:

    • echo script,
    • base64-encoded chunks echo script, and
    • interactive interpreter script.

    Each subproblem is dealt with separately. But the last section shows a final evaluator implementation.

    Subproblem: Echo Script

    An echo script reads input from stdin and prints the exact same text back to stdout.

    EchoScript

    Here is the code for the echoscript.py file.

    import sys
    
    while True:
        for line in sys.stdin:
            sys.stdout.write(line)
            sys.stdout.flush()

    (Note: SIGTERM will terminate a Python process running a forever loop. That is, unless the signal handler is overridden or interrupts are disabled.)

    EchoEvaluator

    The following code is written in an echoevaluator.py file.

    import subprocess
    import sys
    
    class EchoEvaluator:
        def __init__(self, python_executable=None, script_path="echoscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def echo(self, code):
            self._print(f"{code}")
            return self.p.stdout.readline()
    
        def __del__(self):
            self.p.terminate()

    Example Use

    In [1]: load "echoevaluator.py"
    
    In [2]: # %load "echoevaluator.py"
    In [3]: e = EchoEvaluator()
    In [4]: e.echo("print this")
    Out[4]: 'print this\n'

    Subproblem: Encoded Chunk Echo Script

    An encoded chunk echo script is like an echo script, but the text is divided into base64-encoded chunks.

    Base64 Encoded Chunks

    Here is how to encode a string into base64 and split it into chunks of three letters.

    In [1]: import base64
    In [2]: base64.b64encode("Test string".encode("utf-8")).decode("ascii")
    Out[2]: 'VGVzdCBzdHJpbmc='
    In [3]: encoded = base64.b64encode("print this".encode("utf-8")).decode("ascii")
    In [4]: chunks = [
       ...:     encoded[i: i + 3]
       ...:     for i in range(0, len(encoded), 3)
       ...: ]
    In [5]: chunks
    Out[5]: ['cHJ', 'pbn', 'Qgd', 'Ghp', 'cw=', '=']
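    The inverse operation, which the evaluator will need later to decode output, is a join followed by b64decode. A quick round-trip check:

```python
import base64

# Encode, split into chunks of three characters, then reverse the process.
encoded = base64.b64encode("print this".encode("utf-8")).decode("ascii")
chunks = [encoded[i: i + 3] for i in range(0, len(encoded), 3)]

# Joining the chunks restores the full base64 string,
# and decoding recovers the original text exactly.
assert "".join(chunks) == encoded
assert base64.b64decode("".join(chunks)).decode("utf-8") == "print this"
```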

    ChunkEvaluator

    The ChunkEvaluator class is similar to EchoEvaluator. Except it includes a method to encode the chunks.

    import subprocess
    import sys
    import base64
    
    class ChunkEvaluator:
        def __init__(self, python_executable=None, script_path="chunkscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def echo(self, code):
            chunks = self._chunk_encode(code, size=3)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            for c in chunks:
                o.append(self.p.stdout.readline())
            return o
    
        def __del__(self):
            self.p.terminate()

    ChunkEvaluator with EchoScript

    First, I copy echoscript.py to chunkscript.py.

    cp echoscript.py chunkscript.py
    ipython
    In [1]: load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['chunks 6\n', 'cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n']

    With the EchoScript, I confirm that the first item is the string “chunks 6”. All the other items are base64-encoded chunks, which is correct. Now it is time to write the proper ChunkScript.

    ChunkEvaluator with ChunkScript

    The following code is written to the chunkscript.py file (overwriting all content).

    import sys
    
    def read_chunks(num):
        # Echo each chunk line straight back to stdout.
        for i in range(num):
            line = sys.stdin.readline()
            sys.stdout.write(line)
            sys.stdout.flush()
    
    while True:
        line = sys.stdin.readline()
        keyword, value = line.split()
        read_chunks(int(value))

    In [1]: load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n', '=\n']

    Subproblem: Interactive Interpreter

    The next subproblem to tackle is the InteractiveInterpreter, a class defined by the Python code module.

    What is the Interactive Interpreter?

    The Python code module defines a class named InteractiveInterpreter. It is used to implement read-eval-print loops in Python. You can use it to build an interactive REPL, exactly what is needed for the evaluator.

    Here is what help(code.InteractiveInterpreter) says.

    class InteractiveInterpreter(builtins.object)
     |  InteractiveInterpreter(locals=None)
     |
     |  Base class for InteractiveConsole.
     |
     |  This class deals with parsing and interpreter state (the user's
     |  namespace); it doesn't deal with input buffering or prompting or
     |  input file naming (the filename is always passed in explicitly).

    Method runsource takes source code as input and evaluates / executes it.

     |  runsource(self, source, filename='<input>', symbol='single')
     |      Compile and run some source in the interpreter.
     |
     |      Arguments are as for compile_command().
     |
     |      One of several things can happen:
     |
     |      1) The input is incorrect; compile_command() raised an
     |      exception (SyntaxError or OverflowError).  A syntax traceback
     |      will be printed by calling the showsyntaxerror() method.
     |
     |      2) The input is incomplete, and more input is required;
     |      compile_command() returned None.  Nothing happens.
     |
     |      3) The input is complete; compile_command() returned a code
     |      object.  The code is executed by calling self.runcode() (which
     |      also handles run-time exceptions, except for SystemExit).
     |
     |      The return value is True in case 2, False in the other cases (unless
     |      an exception is raised).  The return value can be used to
     |      decide whether to use sys.ps1 or sys.ps2 to prompt the next
     |      line.

    Runsource Output Examples

    In [1]: import code
    In [2]: ii = code.InteractiveInterpreter()

    Case 1: incorrect input

    In [3]: ii.runsource("int(\"hello\")")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    File /usr/lib/python3.12/code.py:90, in InteractiveInterpreter.runcode(self, code)
         78 """Execute a code object.
         79 
         80 When an exception occurs, self.showtraceback() is called to
       (...)     87 
         88 """
         89 try:
    ---> 90     exec(code, self.locals)
         91 except SystemExit:
         92     raise
    
    File <input>:1
    
    ValueError: invalid literal for int() with base 10: 'hello'
    Out[3]: False

    Case 2: correct but incomplete input.

    In [4]: ii.runsource("print(")
    Out[4]: True

    Note the return value is True. Nothing happened. Sending more code does not complete the input.

    In [5]: ii.runsource("\"hello\")")
      File <input>:1
        "hello")
               ^
    SyntaxError: unmatched ')'
    
    Out[5]: False

    Case 3: correct and complete input.

    In [6]: ii.runsource("print(\"hello\")")
    hello
    Out[6]: False
    In [7]: ii.runsource("print")
    Out[7]: <function print(*args, sep=' ', end='\n', file=None, flush=False)>
    Out[7]: False

    Defining a variable:

    In [10]: ii.runsource("x = 12")
    Out[10]: False
    In [11]: ii.runsource("print(f\"X: {x}\")")
    X: 12
    Out[11]: False
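    The variable persists because the interpreter keeps one namespace across runsource calls, exposed as the locals attribute. A small check of that persistence:

```python
import code

ii = code.InteractiveInterpreter()
ii.runsource("x = 12")
# The namespace survives between calls; this is what makes
# the evaluator "persistent" from the LLM's point of view.
assert ii.locals["x"] == 12
ii.runsource("x = x + 1")
assert ii.locals["x"] == 13
```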

    The Runsource Symbol Argument

    Method runsource accepts one more argument which I did not mention so far. That argument is called symbol and it takes one of three values:

    • ‘single’,
    • ‘exec’, or
    • ‘eval’.

    Perhaps it is best to see some examples to show how to use the argument.

    Function Call Examples

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: multi = """
       ...: def hello():
       ...:     print("Hello World")
       ...:     return 10
       ...: hello()
       ...: """
    
    In [4]: single = "hello()"
    
    In [5]: ii.runsource(multi, symbol='exec')
    Hello World
    Out[5]: False
    
    In [6]: ii.runsource(single, symbol='exec')
    Hello World
    Out[6]: False
    
    In [7]: ii.runsource(single, symbol='eval')
    Hello World
    Out[7]: False
    
    In [8]: ii.runsource(single, symbol='single')
    Hello World
    Out[8]: 10
    Out[8]: False

    Only ‘single’ returned the result. Note that calling runsource with multi as the source argument is only error-free with the symbol argument set to exec.

    Symbol value ‘exec’ means the source code input is treated like a Python script. It can contain definitions and multi-line blocks. But it cannot return a result.

    Symbol value ‘eval’ means the source code input is treated as exactly one Python expression. It cannot contain multiple expressions.
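    The docstring for runsource says its arguments are as for compile_command(), so the three symbol values can be sketched with codeop.compile_command directly:

```python
from codeop import compile_command

# 'exec': statements and multi-line blocks compile fine.
assert compile_command("x = 1", symbol="exec") is not None

# Incomplete input returns None, signalling "send more lines".
assert compile_command("def f():", symbol="exec") is None

# 'eval': exactly one expression.
assert compile_command("1 + 1", symbol="eval") is not None
```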

    Unexpected Behavior: Backslashes

    Writing source strings in the IPython REPL can also produce unexpected errors.

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: source = """
       ...: print("Hello \n World!")
       ...: """
    
    In [4]: ii.runsource(source)
      File <input>:2
        print("Hello
              ^
    SyntaxError: unterminated string literal (detected at line 2)
    
    Out[4]: False

    Backslashes are the most common issue because Python treats them as escape characters. The correct version is here.

    In [5]: source = r"""print("Hello \n World!")"""
    
    In [6]: ii.runsource(source)
    Hello 
     World!
    Out[6]: False
    
    In [7]: source = r"""
       ...: print("Hello \n World!")
       ...: """
    
    In [8]: ii.runsource(source)
    Hello 
     World!
    Out[8]: False
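    To see why the raw string helps, compare what the two literals actually contain:

```python
# Regular triple-quoted string: \n is an escape, so the source handed
# to runsource contains a real newline in the middle of the string literal.
s1 = """print("Hello \n World!")"""

# Raw string: the backslash survives, so runsource sees \n as an escape
# inside the inner string literal, which is valid Python.
s2 = r"""print("Hello \n World!")"""

assert "\\n" not in s1  # no literal backslash-n, just a newline character
assert "\\n" in s2      # backslash + n preserved
```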

    Implementing the Evaluator and ReplScript

    I take the ideas shown in the subproblems, and merge them into the evaluator code and the replscript code.

    The Evaluator

    import base64
    import subprocess
    import sys
    
    class Evaluator():
        def __init__(self, python_executable=None, script_path="replscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _input(self):
            return self.p.stdout.readline().strip()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def _chunk_decode(self, chunks):
            b64_data = "".join(chunks)
            decoded = base64.b64decode(b64_data.encode("utf-8"))
            return decoded.decode("utf-8")
    
        def runsource_exec(self, code):
            self._print("symbol exec")
            return self._runsource(code)
    
        def runsource_single(self, code):
            self._print("symbol single")
            return self._runsource(code)
    
        def _runsource(self, code):
            chunks = self._chunk_encode(code)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            keyword, value = self._input().split()
            for i in range(int(value)):
                o.append(self.p.stdout.readline().strip())
            return self._chunk_decode(o)
    
        def __del__(self):
            self.p.terminate()

    The ReplScript

    import sys
    import base64
    import io
    import code
    import inspect
    import re
    
    from contextlib import redirect_stdout, redirect_stderr
    
    ANSI_RE = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')
    
    def run_sources_captured(ii, source, symbol):
        out = io.StringIO()
        err = io.StringIO()
        res = io.StringIO()
    
        # Custom displayhook to capture expression results
        def custom_displayhook(value):
            if value is not None:
                if callable(value):
                    try:
                        sig = inspect.signature(value)
                        print(f"<function {value.__name__}{sig}>", file=res)
                    except (ValueError, TypeError):
                        print(repr(value), file=res)
                else:
                    print(repr(value), file=res)
    
        old_displayhook = sys.displayhook
        sys.displayhook = custom_displayhook
    
        try:
            with redirect_stdout(out), redirect_stderr(err):
                more = ii.runsource(source, symbol=symbol)
                if more:
                    res.write("[incomplete input]\n")
        finally:
            sys.displayhook = old_displayhook
    
        output = out.getvalue() + err.getvalue() + res.getvalue()
        return ANSI_RE.sub('', output)
    
    def chunk_encode(code, size=128):
        e = base64.b64encode(code.encode("utf-8")).decode("ascii")
        chunks = [
            e[i : i + size]
            for i in range(0, len(e), size)
        ]
        return chunks
    
    def chunk_decode(chunks):
        b64_data = "".join(chunks)
        decoded = base64.b64decode(b64_data.encode("utf-8"))
        return decoded.decode("utf-8")
    
    def read_chunks(num):
        chunks = []
        for i in range(num):
            line = sys.stdin.readline()
            chunks.append(line.strip())
        return chunks
    
    def write_chunks(chunks):
        sys.stdout.write(f"chunks {len(chunks)}" + "\n")
        for c in chunks:
            sys.stdout.write(c + "\n")
        sys.stdout.flush()
    
    ii = code.InteractiveInterpreter()
    
    while True:
        line = sys.stdin.readline()
        keyword, value = line.split()
        symbol = value
    
        line = sys.stdin.readline()
        keyword, value = line.split()
        num_chunks = int(value)
    
        chunks = read_chunks(num_chunks)
        decoded = chunk_decode(chunks)
    
        if symbol == "single":
            output = run_sources_captured(ii, decoded, symbol)
        else:
            output = run_sources_captured(ii, decoded, 'exec')
    
        chunks = chunk_encode(output)
        write_chunks(chunks)
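    The capture logic in run_sources_captured can be exercised in isolation. A minimal check that redirect_stdout catches what runsource prints:

```python
import code
import io
from contextlib import redirect_stdout

ii = code.InteractiveInterpreter()
buf = io.StringIO()
with redirect_stdout(buf):
    ii.runsource("print(2 + 2)")

# The print output went to the buffer, not the terminal.
assert buf.getvalue() == "4\n"
```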

    Interactive Use

    In [1]: import evaluator
    
    In [2]: e = evaluator.Evaluator()
    
    In [3]: e.runsource_single("print(\"hello world\")")
    Out[3]: 'hello world\n'
  • Defining Tools with a Toolkit

    In the previous chapter, I have shown a Python program to manually exchange messages with an LLM API. I introduced a data structure that contains inputs and outputs to interact with the LLM.

    In this chapter, I introduce the Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    Here are my two goals for this chapter. The Toolkit, that I implement, must:

    • define a Python REPL tool call for an LLM API,
    • export all tools as a list that is ready to be sent to an LLM API.

    Implementing the Toolkit

    I will step back for a moment and consider how LLMs use tools. It is useful to keep that in mind while implementing the toolkit.

    How do LLMs Use Tools?

    Recall that the LLM API accepts a tools field in the JSON input data. When you provide tools to the API, the server constructs a special system prompt. The prompt is designed to instruct the model to use the specified tool(s).

    For example, here is the Claude API example prompt that is constructed for tool use.

    In this environment you have access to a set of tools you can use to answer the
    user's question.
    {{ FORMATTING INSTRUCTIONS }}
    String and scalar parameters should be specified as is, while lists and objects
    should use JSON format. Note that spaces for string values are not stripped.
    The output is not expected to be valid XML and is parsed with regular
    expressions.
    Here are the functions available in JSONSchema format:
    {{ TOOL DEFINITIONS IN JSON SCHEMA }}
    {{ USER SYSTEM PROMPT }}
    {{ TOOL CONFIGURATION }}

    Setup

    mkdir infer_tk && cd infer_tk
    python3 -m venv venv
    source venv/bin/activate
    pip3 install ipython
    ipython

    Starting with the Class

    The basic Toolkit I will implement will not be a function, but a class. This is because it has state. (Functions can carry state too, via closures, but that is not on the agenda here.)

    To store state, the Toolkit class keeps a variable named table.

    In [1]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...: 
    
    In [2]: tk = Toolkit()
    
    In [3]: tk.table
    Out[3]: {}
    

    Variable table is a Python dictionary. Inside it will be the tool definitions for the LLM API. But first, I have to recall the schemas for those definitions.

    API Tools Input Schema

    Recall the tools schema for the OpenAI API.

    {
        ...
        "tools": [ properties ...]
        ...
    }

    The properties schema contains a definition of one function tool.

    {
        "name": string,
        "type": "function",
        "description": string,
        "parameters": parameters
    }

    The parameters schema contains the definitions for all arguments.

    {
        "type": "object",
        "properties": {
            arg: {
                "type": string,
                "description": string
            }, ...
        },
        "required": [ strings ... ]
    }

    The objects placed in the toolkit table shall follow the schemas for:

    • properties, and
    • parameters.
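    Here is a hypothetical tool (name and fields invented for illustration) that follows both schemas:

```python
# Hypothetical example entry following the properties and parameters schemas.
tool = {
    "name": "get_time",                      # invented tool name
    "type": "function",
    "description": "Return the current UTC time.",
    "parameters": {
        "type": "object",
        "properties": {
            "tz": {
                "type": "string",
                "description": "IANA timezone name."
            }
        },
        "required": []
    }
}

assert tool["type"] == "function"
assert tool["parameters"]["type"] == "object"
```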

    Defining Tools

    I wish for a method to add new tools to the Toolkit.

    In [5]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...:     def deftool(self, name, description, parameters):
       ...:         if name in self.table:
       ...:             raise ValueError(f"Tool '{name}' already defined.")
       ...:         self.table[name] = {
       ...:             "name": name,
       ...:             "type": "function",
       ...:             "description": description,
       ...:             "parameters": parameters
       ...:         }
       ...: 
    In [6]: tk = Toolkit()
    In [7]: tk.deftool("name0", "desc0", {})
    In [8]: tk.table
    Out[8]: 
    {'name0': {'name': 'name0',
      'type': 'function',
      'description': 'desc0',
      'parameters': {}}}

    Note that In [7] printed no Out line: deftool returns None. It seems better to build the entry in a local variable, store it in the dictionary, and return it.

    In [11]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...: 
    In [12]: tk = Toolkit()
    In [13]: tk.deftool("name0", "desc0", {})
    Out[13]: 
    {'name': 'name0', 
     'type': 'function',  
     'description': 'desc0', 
     'parameters': {}}

    That worked.

    Defining an Evaluator Tool

    The requirement is an evaluator tool. A Python REPL for the LLM.

    The function tool that I define here (named py_repl_runsource) is not the function tool used in the final implementation of Toolkit. The reason is that there are two function tools defined and printing and typing all of that is bothersome. I only define py_repl_runsource to illustrate how it is done.

    In [21]: def add_py_repl_tools(toolkit):
        ...:     p = {
        ...:         "type": "object",
        ...:         "properties": {
        ...:             "code": {
        ...:                 "type": "string",
        ...:                 "description": "Python code to execute."
        ...:             }
        ...:         },
        ...:         "required": ["code"]
        ...:     }
        ...:     d = (
        ...:         "Execute Python code script in a persistent environment. "
        ...:         "You must explicitly print evaluation results. "
        ...:         "Returns the stdout and stderr output as one string. "
        ...:     )
        ...:     return toolkit.deftool("py_runsource_exec", d, p)
        ...: 
    In [22]: tk = Toolkit()
    In [23]: add_py_repl_tools(tk)
    Out[23]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}

    Converting to Tools

    The second requirement is that the tools are exported as a list.

    This means that the tool dictionary, inside the Toolkit, must be converted to a list.

    In [24]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...:     def tools(self):
        ...:         return [x for x in self.table.values()]
        ...:     def match(self, name):
        ...:         if name in self.table:
        ...:             return name
        ...:         else:
        ...:             return False
        ...: 
    In [25]: tk = Toolkit()
    In [26]: add_py_repl_tools(tk)
    Out[26]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}
    In [27]: tk.tools()
    Out[27]: 
    [{'name': 'py_runsource_exec',
      'type': 'function',
      'description': 'Execute Python code script in a persistent environment. 
     You must explicitly print evaluation results. Returns the stdout and stderr
     output as one string.',
      'parameters': {'type': 'object',
       'properties': {'code': {'type': 'string',
         'description': 'Python code to execute.'}},
       'required': ['code']}}]
    In [28]: tk.match("py_runsource_exec")
    Out[28]: 'py_runsource_exec'
    In [29]: tk.match("must be False")
    Out[29]: False

    Toolkit Code

    class Toolkit:
        def __init__(self):
            self.table = {}
    
        def deftool(self, name, description, parameters):
            if name in self.table:
                raise ValueError(f"Tool '{name}' already defined.")
            r = {
                "name": name,
                "type": "function",
                "description": description,
                "parameters": parameters
            }
            self.table[name] = r
            return r
    
        def tools(self):
            return [x for x in self.table.values()]
    
        def match(self, name):
            if name in self.table:
                return name
            else:
                return False
    
    def add_py_repl_tools(toolkit):
        p1 = {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute."
                }
            },
            "required": ["code"]
        }
        d1 = (
            "Execute Python code script in a persistent environment. "
            "You must explicitly print evaluation results. "
            "Returns the stdout and stderr output as one string. "
        )
        o = [
            toolkit.deftool("py_runsource_exec", d1, p1)
        ]
        return o
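    To connect this back to the previous chapter: the list returned by tools() is what goes into the "tools" field of the request body. A sketch with a hand-written stand-in for the exported list:

```python
import json

# Stand-in for tk.tools(): one function-tool definition, as exported above.
tools = [{
    "name": "py_runsource_exec",
    "type": "function",
    "description": "Execute Python code script in a persistent environment.",
    "parameters": {
        "type": "object",
        "properties": {
            "code": {"type": "string", "description": "Python code to execute."}
        },
        "required": ["code"],
    },
}]

payload = {"model": "gpt-4.1", "input": [], "tools": tools}
body = json.dumps(payload)
assert '"py_runsource_exec"' in body
```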
  • Manual Chat Program with a Circular Context

    In the previous chapter, I have shown how to exchange data with an LLM API. I used CURL as the HTTP client.

    In this chapter, I replace CURL with a Python program. That does not mean that everything is automated. No, I still manually manage the messages, but I introduce a data structure that contains inputs and outputs to interact with the LLM. I name the data structure a circular context, and base it on a circular buffer.

    This chapter has three sections:

    • Limiting the context – a circular buffer limits the number of messages,
    • Python implementation – the program code in Python,
    • Interactive use – an IPython session to show usage.

    Limiting the Context

    Problem Definition

    Loosely speaking, an LLM takes a text sequence as input and returns another text sequence that completes it. LLMs are trained to do so by modelling the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is defined by the API model.

    LLM Context size M
    Index 0 1 2 3 4 5 6 7 8 9 ... N ... N + k < M
    Value ? ? ? ? ? ? ? ? ? ? ... ? ... ?

    Circular Buffer Definition

    To limit the number of messages, and thus to never reach the LLM context size, I use a circular buffer data structure. For simplicity, I do not count the number of tokens.

    A circular buffer (CB), limited to k items, is either:

    • the empty CB (of size n = 0),
    • a CB of size n < k, formed by adding a new item to the front of a CB of size n - 1 < k, or
    • a CB of size n = k, formed by adding a new item to the front of a CB of size n - 1, which is formed by removing an item from the back of a CB of size n = k.

    How do you determine k? Randomly. I have not thought of a heuristic. So, I randomly picked 19, which is the 8th prime number, as the default value.
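    For comparison, the standard library's collections.deque with maxlen has the same overwrite-oldest behavior; a sketch with k = 3 for brevity:

```python
from collections import deque

cb = deque(maxlen=3)  # k = 3
for i in range(5):
    cb.append(i)      # once full, each append drops the oldest item

assert list(cb) == [2, 3, 4]  # items 0 and 1 were pushed out
```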

    Python Implementation

    Code Overview

    There are four concepts I use in the code implementation:

    • circular buffer,
    • circular context,
    • context, and
    • predict.

    Circular buffer is the data structure defined in the previous section. It stores LLM specific input and output objects. It is the essential part of the circular context.

    Circular context is a data structure that hides the circular buffer. It defines methods to push new LLM-specific objects, a clear() method to remove all objects, and a to_list() method.

    Context is a list data structure. The only difference is, it stores specific LLM API objects. These objects are the very same objects I have shown in the previous chapter to interact with the LLM. Namely: EasyInputMessage and ResponseOutputText.

    CircularBuffer <--> CircularContext <--> Context <--> LLM API
        Class                Class            List         JSON

    Lastly, predict is the main function that takes a context as input and returns the output of the LLM. It does not (to this end) return a new context.

    Setup

    mkdir llm_api_prog && cd llm_api_prog
    python3 -m venv venv
    source venv/bin/activate
    pip install requests ipython

    I will write all code into a single file circularcontext.py.

    Dependencies

    Because the LLM API uses JSON and HTTP, you need:

    • a JSON package,
    • an HTTP package to send requests and receive responses.

    import json
    import requests
    import os
    
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

    Empty Request

    Recall that the LLM API expects a JSON data object with fields: "model", "input", and "tools".

    def openai_prepare(model, context, tools):
        return {
            "model": model,
            "input": context,
            "tools": tools
        }

    Sending Requests

    def openai_request(model="gpt-4.1", context=[], tools=[]):
        url = "https://api.openai.com/v1/responses"
        headers = { 
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "Content-Type": "application/json"
        }
        data = openai_prepare(model, context, tools)
        return requests.post(url, headers=headers, json=data)

    Receiving Responses

    def openai_response(response):
        response.raise_for_status()
        data = response.json()
        return data['output']

    Note that better error handling is needed.

    Predict

    def predict(context=[], tools=[]):
        r = openai_response(openai_request(context=context, tools=tools))
        return r

    Circular Buffer

    The implementation of a circular buffer written by an LLM.

    class CircularBuffer:
        def __init__(self, capacity):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.buffer = [None] * capacity
            self.head = 0  # points to oldest element
            self.tail = 0  # points to next write position
            self.size = 0
    
        def enqueue(self, item):
            """Add an element to the buffer."""
            self.buffer[self.tail] = item
    
            if self.size == self.capacity:
                # Buffer full → overwrite oldest
                self.head = (self.head + 1) % self.capacity
            else:
                self.size += 1
    
            self.tail = (self.tail + 1) % self.capacity
    
        def dequeue(self):
            """Remove and return the oldest element."""
            if self.size == 0:
                raise IndexError("Dequeue from empty buffer")
    
            item = self.buffer[self.head]
            self.buffer[self.head] = None  # Optional cleanup
            self.head = (self.head + 1) % self.capacity
            self.size -= 1
    
            return item
    
        def peek(self):
            """Return the oldest element without removing it."""
            if self.size == 0:
                raise IndexError("Peek from empty buffer")
            return self.buffer[self.head]
    
        def to_list(self):
            """Return elements as a standard Python list (FIFO order)."""
            result = []
            index = self.head
            for _ in range(self.size):
                result.append(self.buffer[index])
                index = (index + 1) % self.capacity
            return result
    
        def is_empty(self):
            return self.size == 0
    
        def is_full(self):
            return self.size == self.capacity
    
        def __len__(self):
            return self.size
    
        def __repr__(self):
            return f"CircularBuffer({self.to_list()})"
    
        def shallow_clone(self):
            """Return a shallow copy of the circular buffer."""
            cb = CircularBuffer(self.capacity)
            cb.buffer = self.buffer.copy()
            cb.head = self.head
            cb.tail = self.tail
            cb.size = self.size
            return cb
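    To see the overwrite behaviour in action, here is a condensed copy of the class above (only enqueue and to_list) with a short demonstration: enqueuing a fourth item into a capacity-three buffer silently evicts the oldest.

```python
class CircularBuffer:
    """Condensed copy of the class above: enqueue and to_list only."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.head = 0   # oldest element
        self.tail = 0   # next write position
        self.size = 0

    def enqueue(self, item):
        self.buffer[self.tail] = item
        if self.size == self.capacity:
            self.head = (self.head + 1) % self.capacity  # overwrite oldest
        else:
            self.size += 1
        self.tail = (self.tail + 1) % self.capacity

    def to_list(self):
        return [self.buffer[(self.head + i) % self.capacity]
                for i in range(self.size)]

cb = CircularBuffer(3)
for item in ["a", "b", "c", "d"]:  # the fourth enqueue overwrites "a"
    cb.enqueue(item)
print(cb.to_list())  # → ['b', 'c', 'd']
```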

    Circular Context

    A context is a data structure that contains objects which are elements of the input array for the LLM API.

    class CircularContext:
        def __init__(self, capacity=19):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.cb = CircularBuffer(self.capacity)
    
        def push_easy_input_message(self, content="", role="user"):
            self.cb.enqueue({"content": content, "role": role, "type": "message"})
    
        def push_function_call_output(self, call_id="", output=""):
            self.cb.enqueue({
                "call_id": call_id,
                "output": output,
                "type": "function_call_output"
                })
    
        def push_custom(self, obj):
            self.cb.enqueue(obj)
    
        def clear(self):
            self.cb = CircularBuffer(self.capacity)
    
        def to_list(self):
            return self.cb.to_list()
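    As an aside, Python's standard library offers the same drop-the-oldest behaviour out of the box: collections.deque with a maxlen argument could serve as a simpler backing store, a possible simplification of the hand-written buffer above.

```python
from collections import deque

# deque(maxlen=n) silently discards the oldest entry once n entries
# are stored, matching CircularBuffer's overwrite-on-full behaviour.
ctx = deque(maxlen=3)
for text in ("Hi!", "Hello!", "Say hi again.", "Hi again!"):
    ctx.append({"content": text, "role": "user", "type": "message"})

print([m["content"] for m in ctx])  # → ['Hello!', 'Say hi again.', 'Hi again!']
```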

    Usage

    Getting Started

    Make sure that:

    • the terminal is in the proper directory,
    • the Python virtual environment is activated,
    • the proper code is in the circularcontext.py file.

    Start an IPython session.

    export OPENAI_API_KEY="your api key..."
    ipython

    Load the code.

    In [1]: load "circularcontext.py"

    Sanity check the OpenAI API key.

    In [3]: OPENAI_API_KEY
    Out[3]: 'your api key...'

    Sanity check an empty request.

    In [4]: openai_prepare("gpt-4.1", [], [])
    Out[4]: {'model': 'gpt-4.1', 'input': [], 'tools': []}

    Sending an Easy Input Message

    In [5]: cc = CircularContext()
    In [6]: cc.push_easy_input_message("Hi!")

    Sanity check a message.

    In [7]: cc.to_list()
    Out[7]: [{'content': 'Hi!', 'role': 'user', 'type': 'message'}]
    In [8]: r = predict(context=cc.to_list())
    In [9]: r
    Out[9]:
    [{'type': 'output_text',
      'annotations': [],
      'logprobs': [],
      'text': 'Hello! How can I help you today? 😊'}]

    Note that the output result is an array.

    Merging Context

    In [10]: for x in r:
                 cc.push_custom(x)
    
    In [11]: cc.push_easy_input_message("Say hi again.")
    

    Sanity check.

    In [12]: cc.to_list()
    Out[12]: 
    [{'content': 'Hi!', 'role': 'user', 'type': 'message'},
     {'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hello! How can I help you today? 😊'}],
      'role': 'assistant'},
     {'content': 'Say hi again.', 'role': 'user', 'type': 'message'}]
    In [13]: r = predict(context=cc.to_list())
    
    In [14]: r
    Out[14]: 
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hi again! 👋'}],
      'role': 'assistant'}]
    
  • The Bare Minimum to Chat with Function Calls

    This is a tutorial on using the OpenAI LLM API, focusing on two things: messages and function calls. It does so without Python, TypeScript, or any other programming language. The only requirements are CURL (an HTTP client) and an OpenAI API key.

    Why Bother?

    “Why waste my time, when I can just import an API package?”

    Sure, that works, until you go deeper. What if…

    • you do not have access to / permission for / trust in the API package?
    • you want to avoid software bloat?
    • you want to understand what is happening?
    • you want to make your own AI Agents?

    Before, there was only /chat/completions. Now, there are /responses, function calls, tool calls, computer calls, image calls, search calls, skills, etc.

    I will show the bare minimum to interact with an LLM API:

    • Prompt completions and context, and
    • Function calls.

    Prompt Completions and Context

    In this section, I show messaging an LLM. The provider is OpenAI at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    I wish to clarify two things. Suppose I want to exchange messages with an LLM…

    • What JSON data do I need?
    • May I see an example for exchanging messages?

    What JSON data do I need?

    Input

    Start with an empty JSON object.

    {
      request data go here ...
    }

    Select the model:

    {
      "model": "gpt-4.1"
    }

    Set the "input" field value to an array [ ... ]. (Do not forget the comma after the "model" field.)

    {
      "model": "gpt-4.1",
      "input": [
         array items go here ...
      ]
    }

    The array items will be explained now.

    Input Items

    The API defines many objects you can put in the “input array” [ ... ]. Far too many to list them all. Instead, I show only four. Two object types may be created by your client program:

    • EasyInputMessage, and
    • FunctionCallOutput.

    Two object types may be created by the server:

    • ResponseOutputMessage, and
    • ResponseFunctionToolCall.

    In this section, I show EasyInputMessage and ResponseOutputMessage types. These are enough for prompts with context. In the Function Call section, I will show FunctionCallOutput and ResponseFunctionToolCall types.

    Easy Input Message (Client)

    Your client program sends prompts to the LLM inside an EasyInputMessage. The prompt text goes in the "content" field.

    EasyInputMessage schema:
    { 
      "content": string (this is where your prompt goes),
      "role": "user" | "assistant" | "system" | "developer",
      "type": "message"
    }
    
    Example: 
    {
      "content": "This is a prompt sent to the LLM.",
      "role": "user",
      "type": "message"
    }

    ResponseOutputMessage (Server)

    The LLM answers with a ResponseOutputMessage object type. It is more complex than EasyInputMessage because its "content" field value is an array that may contain two possible object types. The array items are either a ResponseOutputRefusal type (the LLM refused to answer) or a ResponseOutputText type (the LLM answered). I will first show the schemas of these object types, and second, the schema for ResponseOutputMessage.

    ResponseOutputRefusal schema:
    { 
      "refusal": string, 
      "type": "refusal"
    }
    ResponseOutputText schema:
    { 
      "annotations": [ FileCitation | URLCitation | 
                       ContainerFileCitation | FilePath ],
      "logprobs": [ logprobs object ],
      "text": string,
      "type": "output_text"
    }

    The ResponseOutputText schema is non-trivial. The values of the “annotations” and “logprobs” fields are complex. It is best to simply ignore them unless needed.

    With that in mind, here is the schema for ResponseOutputMessage.

    ResponseOutputMessage schema:
    {
      "id": string,
      "content": [ ResponseOutputText | ResponseOutputRefusal ],
      "role": "assistant",
      "status": "in_progress" | "completed" | "incomplete",
      "type": "message" 
    }
    

    To show an example ResponseOutputMessage, I will make an API request and show the response.

    May I see an example for exchanging messages?

    Sending A Single Message

    The prompt is: “Hi!”.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              }
            ]
          }'

    This is the value of the “output” part of the response.

    ...
      "output": [
        {
      "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    In this simple case, the output is an array of one item that is an object of type ResponseOutputMessage. And that object itself has a “content” field for which the value is an array of one item that is an object of type ResponseOutputText.

    Creating Context

    To continue the LLM conversation, you need to merge the client prompt and the server output response. The merged list is known as a context.

    • Copy the ResponseOutputMessage object from the “output array”.
    • Append a new EasyInputMessage object as the next prompt.

    Make sure to add commas between the items of the “input array” when doing it manually.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              },
              {
            "id": (omitted),
                "type": "message",
                "status": "completed",
                "content": [
                {
                  "type": "output_text", "annotations": [], "logprobs": [],
                  "text": "Hello! How can I help you today?"
                }
                ],
                "role": "assistant"
              },
              {
                "content": "Say hi again.", "role": "user", "type": "message"
              }
            ]
          }'

    Here is the output.

    ...
      "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hi again!"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange messages, a model and an input array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage schema or ResponseOutputMessage schema.

    Function Calls

    Section Overview

    In this section, I show how to exchange messages that are function call ready with an LLM. I will use LLMs by OpenAI, which are available at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    In the previous section, I showed that the request has a required “input” field and the response has an “output” field. The values of these fields are arrays. So far, the only object types in these arrays were EasyInputMessage and ResponseOutputMessage. That will change now.

    Request
    { 
      "model": "gpt-4.1",
      "input": [ ... ]
    }
    
    Response
    { ...
      "output": [ ... ]
      ...
    }

    Two new objects I will show now are:

    • FunctionCallOutput, and
    • ResponseFunctionToolCall.

    I wish to clarify two things. Suppose I want to exchange messages with an LLM and allow it to use some function calls with my client program…

    • What JSON data do I need?
    • May I see an example for exchanging messages with function calls?

    What JSON data do I need?

    To exchange messages that are function call ready, set the “tools” field in the request.

    { 
      "model": "gpt-4.1",
      "input": [ ... ],
      "tools": [ ... ]
    }

    Tool Items

    Each item in the “tools array” is an object { ... }. The API supports several different object types. I will show only one:

    • FunctionTool.

    A FunctionTool object has a name, a description, and parameters, all set by the client program. The name identifies the function, the description explains what the function does, and the parameters describe the function arguments.

    FunctionTool schema:
    {
      "type": "function",
      "name": string,
      "description": string,
      "parameters": object
    }

    Parameters are described in the “properties” field. Each parameter is yet another object.

    FunctionTool Parameters schema:
    {
      "type": "object",
      "properties": object,
      "required": [ strings ]
    }

    The value of the “required” field is an array that contains strings naming parameters that are required.

    FunctionTool Parameters Properties schema:
    {
      argument_name: 
      { 
        "type": argument_type,
        "description": argument_desc
      },
      ...
    }

    The key argument_name is a string that names the function argument. The value argument_type is a string that names the function argument type. The value argument_desc is a string that describes the function argument.
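    Putting the three schemas together: here is what a complete "parameters" value could look like for a hypothetical function with one required and one optional argument. It is sketched as a Python dict for concreteness (the JSON and Python literals coincide here); the function and its arguments are invented for illustration.

```python
# Hypothetical "parameters" value: "base" is required, "verbose" is not.
parameters = {
    "type": "object",
    "properties": {
        "base": {"type": "number", "description": "The base value."},
        "verbose": {"type": "boolean", "description": "Enable extra output."},
    },
    "required": ["base"],
}

print(parameters["required"])  # → ['base']
```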

    Request With Tools

    Now that you have seen the structure of a FunctionTool, here is what an example request that is function call ready looks like:

    {
      "model": "gpt-4.1",
      "input": 
      [
        {
          "content": "Which natural number comes after 1678931?",
          "role": "user", "type": "message"
        }
      ],
      "tools":
      [
        {
          "name": "next_natural",
          "type": "function",
          "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
          "parameters": {
            "type": "object",
            "properties": {
              "number" : {
                "type": "number",
                "description": "The input natural number."
              }
            },
            "required": ["number"]
          }
        }
      ]
    }

    The request includes the “tools” field, for which the value is an array with exactly one FunctionTool object. When the request defines a FunctionTool, two things can happen:

    • the FunctionTool may be ignored, or
    • a response to use the FunctionTool may be created.

    Your client program must support both scenarios. It may check the type of the output. If the type is a ResponseOutputMessage, the FunctionTool was ignored. If the type is a ResponseFunctionToolCall, the client must perform the function call.

    In other words, the server returns a response whose “output” field value is an array whose elements are either a:

    • ResponseOutputMessage, or
    • ResponseFunctionToolCall.
    Scenario A:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                               ResponseOutputMessage <-- server
    
    Scenario B:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                            ResponseFunctionToolCall <-- server
    client --> request: FunctionCallOutput and tools               --> server
    client <--   ResponseOutputMessage or ResponseFunctionToolCall <-- server
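    The branching in the two scenarios above is easy to sketch in code. The following Python fragment (stepping outside this section's CURL-only constraint for a moment) shows the client-side dispatch; next_natural, the TOOLS table, and the fake output item are hypothetical, while the field names follow the schemas in this section.

```python
import json

def next_natural(number):
    # Hypothetical local implementation of the advertised FunctionTool.
    return number + 1

TOOLS = {"next_natural": next_natural}

def dispatch(output_items):
    """Build a FunctionCallOutput for every function call in the output."""
    results = []
    for item in output_items:
        if item["type"] == "function_call":        # Scenario B: run the tool
            args = json.loads(item["arguments"])
            result = TOOLS[item["name"]](**args)
            results.append({
                "call_id": item["call_id"],        # copied from the tool call
                "output": str(result),
                "type": "function_call_output",
            })
        # Scenario A: item["type"] == "message" is the final answer;
        # there is nothing to execute.
    return results

fake_output = [{
    "type": "function_call",
    "name": "next_natural",
    "arguments": "{\"number\": 1678931}",
    "call_id": "call_demo",
}]
print(dispatch(fake_output)[0]["output"])  # → 1678932
```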

    ResponseFunctionToolCall (Server)

    Note that the server creates this object.

    ResponseFunctionToolCall schema:
    { 
      "arguments": string,
      "call_id": string,
      "name": string,
      "type": "function_call",
      "id": string,
      "status": "in_progress" | "completed" | "incomplete"
    }

    FunctionCallOutput (Client)

    Note that the client creates this object. When creating this object, the value of the "call_id" field is copied from the matching ResponseFunctionToolCall object.

    Schema:
    { 
      "call_id": string,
      "output": string | (there is more but I ignore that),
      "type": "function_call_output",
      "id": string (mostly ignore this),
      "status": "in_progress" | "completed" | "incomplete"
    }
    
    Example:
    { 
      "call_id": "call_random123", (generated by server)
      "output": "fizzbuzz",
      "type": "function_call_output",
      "id": "123456",
      "status": "completed"
    }

    May I see an example for exchanging messages with function calls?

    Example FunctionToolCall Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              }
            ],
            "tools": 
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "number" : {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCall Response

    ...
      "output": [
        {
          "id": (omitted),
          "type": "function_call",
          "status": "completed",
          "arguments": "{\"number\":1678931}",
          "call_id": (omitted),
          "name": "next_natural"
        }
      ],
    ...

    Example FunctionToolCallOutput Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input":
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              },
              {
                "id": (omitted),
                "type": "function_call",
                "status": "completed",
                "arguments": "{\"number\":1678931}",
                "call_id": "call_(same call id)",
                "name": "next_natural"
              },
              {
                "call_id": "call_(same call id)",
                "output": "1678932",
                "type": "function_call_output"
              }
            ],
            "tools":
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number. 
    Returns the first natural number that is greater than the argument.",
                "parameters":
                {
                  "type": "object",
                  "properties":
                  {
                    "number":
                    {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCallOutput Response

    ...
    "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "The natural number that comes after 1,678,931 is 1,678,932."
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange function call ready messages, a model, an input array, and a tools array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage, ResponseOutputMessage, ResponseFunctionToolCall, or FunctionCallOutput schema.