Tag: API

  • Making a REPL with an Evaluator

    In the previous chapter, I showed the Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    In this chapter, I show the evaluator component. The evaluator is a program to which you can send program code. A code interpreter. But the focus here is an evaluator that the LLM can interact with. The LLM sends a function tool call to interact with the evaluator.

    Here are my two goals for this chapter. The evaluator must:

    • manage a separate process for a Python interpreter,
    • provide a method to send code to the interpreter and return the stdout and stderr output as a string.

    The evaluator is a complex topic. Perhaps it is best to subdivide the problem. I thought about it for some time, and came up with the following subproblems:

    • echo script,
    • base64-encoded chunks echo script, and
    • interactive interpreter script.

    Each subproblem is dealt with separately. The last section then shows the final evaluator implementation.

    Subproblem: Echo Script

    An echo script reads input from stdin and prints the exact same text back to stdout.

    EchoScript

    Here is the code for the echoscript.py file.

    import sys
    
    # Echo stdin to stdout line by line; the loop blocks on input and
    # ends at EOF, when the parent closes the pipe.
    for line in sys.stdin:
        sys.stdout.write(line)
        sys.stdout.flush()

    (Note: SIGTERM will terminate a Python process stuck in a read loop like this one. That is, unless the signal handler is overridden or interrupts are disabled.)

    EchoEvaluator

    The following code is written to an echoevaluator.py file.

    import subprocess
    import sys
    
    class EchoEvaluator:
        def __init__(self, python_executable=None, script_path="echoscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def echo(self, code):
            self._print(f"{code}")
            return self.p.stdout.readline()
    
        def __del__(self):
            self.p.terminate()

    Example Use

    In [1]: %load "echoevaluator.py"
    
    In [2]: # %load "echoevaluator.py"
    In [3]: e = EchoEvaluator()
    In [4]: e.echo("print this")
    Out[4]: 'print this\n'

    Subproblem: Encoded Chunk Echo Script

    An encoded chunk echo script is like an echo script, but the text is divided into base64-encoded chunks.

    Base64 Encoded Chunks

    Here is how to encode a string into base64 and split it into chunks of three letters.

    In [1]: import base64
    In [2]: base64.b64encode("Test string".encode("utf-8")).decode("ascii")
    Out[2]: 'VGVzdCBzdHJpbmc='
    In [3]: encoded = base64.b64encode("print this".encode("utf-8")).decode("ascii")
    In [4]: chunks = [
       ...:     encoded[i: i + 3]
       ...:     for i in range(0, len(encoded), 3)
       ...: ]
    In [5]: chunks
    Out[5]: ['cHJ', 'pbn', 'Qgd', 'Ghp', 'cw=', '=']

    ChunkEvaluator

    The ChunkEvaluator class is similar to EchoEvaluator, except that it includes a method to encode the code into base64 chunks.

    import subprocess
    import sys
    import base64
    
    class ChunkEvaluator:
        def __init__(self, python_executable=None, script_path="chunkscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def echo(self, code):
            chunks = self._chunk_encode(code, size=3)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            for c in chunks:
                o.append(self.p.stdout.readline())
            return o
    
        def __del__(self):
            self.p.terminate()

    ChunkEvaluator with EchoScript

    First, I copy echoscript.py to chunkscript.py.

    cp echoscript.py chunkscript.py
    ipython
    In [1]: %load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['chunks 6\n', 'cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n']

    With the EchoScript, I confirm the first item to be the string “chunks 6”. All the other items are base64-encoded chunks. Note that the final chunk, “=”, is not in the list: echo reads only len(chunks) lines, and the first line read back is the echoed header. Now it is time to write the proper ChunkScript.

    ChunkEvaluator with ChunkScript

    The following code is written to the chunkscript.py file (overwriting all content).

    import sys
    
    def read_chunks(num):
        for i in range(num):
            line = sys.stdin.readline()
            sys.stdout.write(line)
            sys.stdout.flush()
    
    while True:
        line = sys.stdin.readline()
        if not line:
            break
        keyword, value = line.split()
        read_chunks(int(value))
    ipython
    In [1]: %load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n', '=\n']

    Subproblem: Interactive Interpreter

    The next subproblem to tackle is the InteractiveInterpreter, a class defined by the Python code module.

    What is the Interactive Interpreter?

    The Python code module defines a class named InteractiveInterpreter. It is used to implement read-eval-print loops in Python. You can use it to build an interactive REPL, exactly what is needed for the evaluator.

    Here is what help(code.InteractiveInterpreter) says.

    class InteractiveInterpreter(builtins.object)
     |  InteractiveInterpreter(locals=None)
     |
     |  Base class for InteractiveConsole.
     |
     |  This class deals with parsing and interpreter state (the user's
     |  namespace); it doesn't deal with input buffering or prompting or
     |  input file naming (the filename is always passed in explicitly).

    Method runsource takes source code as input and evaluates / executes it.

     |  runsource(self, source, filename='<input>', symbol='single')
     |      Compile and run some source in the interpreter.
     |
     |      Arguments are as for compile_command().
     |
     |      One of several things can happen:
     |
     |      1) The input is incorrect; compile_command() raised an
     |      exception (SyntaxError or OverflowError).  A syntax traceback
     |      will be printed by calling the showsyntaxerror() method.
     |
     |      2) The input is incomplete, and more input is required;
     |      compile_command() returned None.  Nothing happens.
     |
     |      3) The input is complete; compile_command() returned a code
     |      object.  The code is executed by calling self.runcode() (which
     |      also handles run-time exceptions, except for SystemExit).
     |
     |      The return value is True in case 2, False in the other cases (unless
     |      an exception is raised).  The return value can be used to
     |      decide whether to use sys.ps1 or sys.ps2 to prompt the next
     |      line.

    Runsource Output Examples

    In [1]: import code
    In [2]: ii = code.InteractiveInterpreter()

    Case 1: incorrect input

    In [3]: ii.runsource("int(\"hello\")")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    File /usr/lib/python3.12/code.py:90, in InteractiveInterpreter.runcode(self, code)
         78 """Execute a code object.
         79 
         80 When an exception occurs, self.showtraceback() is called to
       (...)     87 
         88 """
         89 try:
    ---> 90     exec(code, self.locals)
         91 except SystemExit:
         92     raise
    
    File <input>:1
    
    ValueError: invalid literal for int() with base 10: 'hello'
    Out[3]: False

    Case 2: correct but incomplete input.

    In [4]: ii.runsource("print(")
    Out[4]: True

    Note that the return value is True and nothing happened. InteractiveInterpreter does not buffer incomplete input (input buffering is the job of its subclass InteractiveConsole), so sending more code does not complete the input.

    In [5]: ii.runsource("\"hello\")")
      File <input>:1
        "hello")
               ^
    SyntaxError: unmatched ')'
    
    Out[5]: False

    Case 3: correct and complete input.

    In [6]: ii.runsource("print(\"hello\")")
    hello
    Out[6]: False
    In [7]: ii.runsource("print")
    Out[7]: <function print(*args, sep=' ', end='\n', file=None, flush=False)>
    Out[7]: False

    Defining a variable:

    In [10]: ii.runsource("x = 12")
    Out[10]: False
    In [11]: ii.runsource("print(f\"X: {x}\")")
    X: 12
    Out[11]: False

    The Runsource Symbol Argument

    Method runsource accepts one more argument that I have not mentioned so far. The argument is called symbol, and it takes one of three values:

    • ‘single’,
    • ‘exec’, or
    • ‘eval’.

    Perhaps it is best to see some examples to show how to use the argument.

    Function Call Examples

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: multi = """
       ...: def hello():
       ...:     print("Hello World")
       ...:     return 10
       ...: hello()
       ...: """
    
    In [4]: single = "hello()"
    
    In [5]: ii.runsource(multi, symbol='exec')
    Hello World
    Out[5]: False
    
    In [6]: ii.runsource(single, symbol='exec')
    Hello World
    Out[6]: False
    
    In [7]: ii.runsource(single, symbol='eval')
    Hello World
    Out[7]: False
    
    In [8]: ii.runsource(single, symbol='single')
    Hello World
    Out[8]: 10
    Out[8]: False

    Only ‘single’ returned the result. Note that calling runsource with multi as the source argument is only error-free with the symbol argument set to exec.

    Symbol value ‘exec’ means the source code input is treated like a Python script. It can contain definitions and multiple statements or blocks. But it does not display expression results.

    Symbol value ‘eval’ means the source code input is treated as exactly one Python expression. It cannot contain statements or multiple expressions.

    Unexpected Behavior: Backslashes

    Writing source strings in the IPython REPL also creates unexpected errors.

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: source = """
       ...: print("Hello \n World!")
       ...: """
    
    In [4]: ii.runsource(source)
      File <input>:2
        print("Hello
              ^
    SyntaxError: unterminated string literal (detected at line 2)
    
    Out[4]: False

    Backslashes are the most common issue because Python treats them as escape characters: the \n is expanded into a real newline by the outer string literal before runsource ever sees the source. A raw string prevents the expansion. The correct version is here.

    In [5]: source = r"""print("Hello \n World!")"""
    
    In [6]: ii.runsource(source)
    Hello 
     World!
    Out[6]: False
    
    In [7]: source = r"""
       ...: print("Hello \n World!")
       ...: """
    
    In [8]: ii.runsource(source)
    Hello 
     World!
    Out[8]: False

    Implementing the Evaluator and ReplScript

    I take the ideas shown in the subproblems, and merge them into the evaluator code and the replscript code.

    The Evaluator

    import base64
    import subprocess
    import sys
    
    class Evaluator:
        def __init__(self, python_executable=None, script_path="replscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _input(self):
            return self.p.stdout.readline().strip()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def _chunk_decode(self, chunks):
            b64_data = "".join(chunks)
            decoded = base64.b64decode(b64_data.encode("utf-8"))
            return decoded.decode("utf-8")
    
        def runsource_exec(self, code):
            self._print("symbol exec")
            return self._runsource(code)
    
        def runsource_single(self, code):
            self._print("symbol single")
            return self._runsource(code)
    
        def _runsource(self, code):
            chunks = self._chunk_encode(code)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            keyword, value = self._input().split()
            for i in range(int(value)):
                o.append(self.p.stdout.readline().strip())
            return self._chunk_decode(o)
    
        def __del__(self):
            self.p.terminate()

    The ReplScript

    import sys
    import base64
    import io
    import code
    import inspect
    import re
    
    from contextlib import redirect_stdout, redirect_stderr
    
    ANSI_RE = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')
    
    def run_sources_captured(ii, source, symbol):
        out = io.StringIO()
        err = io.StringIO()
        res = io.StringIO()
    
        # Custom displayhook to capture expression results
        def custom_displayhook(value):
            if value is not None:
                if callable(value):
                    try:
                        sig = inspect.signature(value)
                        print(f"<function {value.__name__}{sig}>", file=res)
                    except (ValueError, TypeError):
                        print(repr(value), file=res)
                else:
                    print(repr(value), file=res)
    
        old_displayhook = sys.displayhook
        sys.displayhook = custom_displayhook
    
        try:
            with redirect_stdout(out), redirect_stderr(err):
                more = ii.runsource(source, symbol=symbol)
                if more:
                    res.write("[incomplete input]\n")
        finally:
            sys.displayhook = old_displayhook
    
        output = out.getvalue() + err.getvalue() + res.getvalue()
        return ANSI_RE.sub('', output)
    
    def chunk_encode(code, size=128):
        e = base64.b64encode(code.encode("utf-8")).decode("ascii")
        chunks = [
            e[i : i + size]
            for i in range(0, len(e), size)
        ]
        return chunks
    
    def chunk_decode(chunks):
        b64_data = "".join(chunks)
        decoded = base64.b64decode(b64_data.encode("utf-8"))
        return decoded.decode("utf-8")
    
    def read_chunks(num):
        chunks = []
        for i in range(num):
            # Strip the trailing newline from each chunk line.
            line = sys.stdin.readline()
            chunks.append(line.strip())
        return chunks
    
    def write_chunks(chunks):
        sys.stdout.write(f"chunks {len(chunks)}" + "\n")
        for c in chunks:
            sys.stdout.write(c + "\n")
        sys.stdout.flush()
    
    ii = code.InteractiveInterpreter()
    
    while True:
        line = sys.stdin.readline()
        if not line:
            break
        keyword, value = line.split()
        symbol = value
    
        line = sys.stdin.readline()
        keyword, value = line.split()
        num_chunks = int(value)
    
        chunks = read_chunks(num_chunks)
        decoded = chunk_decode(chunks)
    
        if symbol == "single":
            output = run_sources_captured(ii, decoded, symbol)
        else:
            output = run_sources_captured(ii, decoded, "exec")
    
        chunks = chunk_encode(output)
        write_chunks(chunks)

    Interactive Use

    In [1]: import evaluator
    
    In [2]: e = evaluator.Evaluator()
    
    In [3]: e.runsource_single("print(\"hello world\")")
    Out[3]: 'hello world\n'
  • Defining Tools with a Toolkit

    In the previous chapter, I showed a Python program to manually exchange messages with an LLM API. I introduced a data structure that contains inputs and outputs to interact with the LLM.

    In this chapter, I introduce the Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    Here are my two goals for this chapter. The Toolkit that I implement must:

    • define a Python REPL tool call for an LLM API,
    • export all tools as a list that is ready to be sent to an LLM API.

    Implementing the Toolkit

    I will step back for a moment and consider how LLMs use tools. It is useful to keep that in mind while implementing the toolkit.

    How do LLMs Use Tools?

    Recall that the LLM API accepts a tools field in the JSON input data. When you provide tools to the API, the server constructs a special system prompt. The prompt is designed to instruct the model to use the specified tool(s).

    For example, here is the Claude API example prompt that is constructed for tool use.

    In this environment you have access to a set of tools you can use to answer the
    user's question.
    {{ FORMATTING INSTRUCTIONS }}
    String and scalar parameters should be specified as is, while lists and objects
    should use JSON format. Note that spaces for string values are not stripped.
    The output is not expected to be valid XML and is parsed with regular
    expressions.
    Here are the functions available in JSONSchema format:
    {{ TOOL DEFINITIONS IN JSON SCHEMA }}
    {{ USER SYSTEM PROMPT }}
    {{ TOOL CONFIGURATION }}

    Setup

    mkdir infer_tk && cd infer_tk
    python3 -m venv venv
    source venv/bin/activate
    pip3 install ipython
    ipython

    Starting with the Class

    The basic Toolkit I will implement will not be a function, but a class. This is because it has state. (Functions can have state too, but that is not on the agenda here.)

    To store state, the Toolkit class keeps a variable named table.

    In [1]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...: 
    
    In [2]: tk = Toolkit()
    
    In [3]: tk.table
    Out[3]: {}
    

    Variable table is a Python dictionary. Inside it will be the tool definitions for the LLM API. But first, I have to recall the schemas for those definitions.

    API Tools Input Schema

    Recall the tools schema for the OpenAI API.

    {
        ...
        "tools": [ properties ...]
        ...
    }

    The properties schema contains a definition of one function tool.

    {
        "name": string,
        "type": "function",
        "description": string,
        "parameters": parameters
    }

    The parameters schema contains the definitions for all arguments.

    {
        "type": "object",
        "properties": {
            arg: {
                "type": string,
                "description": string
            }, ...
        },
        "required": [ strings ... ]
    }

    The objects placed in the toolkit table shall follow the schemas for:

    • properties, and
    • parameters.

    Defining Tools

    I wish for a method to add new tools to the Toolkit.

    In [5]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...:     def deftool(self, name, description, parameters):
       ...:         if name in self.table:
       ...:             raise ValueError(f"Tool '{name}' already defined.")
       ...:         self.table[name] = {
       ...:             "name": name,
       ...:             "type": "function",
       ...:             "description": description,
       ...:             "parameters": parameters
       ...:         }
       ...: 
    In [6]: tk = Toolkit()
    In [7]: tk.deftool("name0", "desc0", {})
    In [8]: tk.table
    Out[8]: 
    {'name0': {'name': 'name0',
      'type': 'function',
      'description': 'desc0',
      'parameters': {}}}

    That is odd: In [7] printed no Out line, because deftool returns nothing. Maybe I need to define a variable and set its value into the dictionary? Also, it seems better to return the result.

    In [11]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...: 
    In [12]: tk = Toolkit()
    In [13]: tk.deftool("name0", "desc0", {})
    Out[13]: 
    {'name': 'name0', 
     'type': 'function',  
     'description': 'desc0', 
     'parameters': {}}

    That worked.

    Defining an Evaluator Tool

    The requirement is an evaluator tool. A Python REPL for the LLM.

    The function tool that I define here (named py_runsource_exec) is not the function tool used in the final implementation of Toolkit. The reason is that there are two function tools defined, and printing and typing all of that is bothersome. I only define py_runsource_exec to illustrate how it is done.

    In [21]: def add_py_repl_tools(toolkit):
        ...:     p = {
        ...:         "type": "object",
        ...:         "properties": {
        ...:             "code": {
        ...:                 "type": "string",
        ...:                 "description": "Python code to execute."
        ...:             }
        ...:         },
        ...:         "required": ["code"]
        ...:     }
        ...:     d = (
        ...:         "Execute Python code script in a persistent environment. "
        ...:         "You must explicitly print evaluation results. "
        ...:         "Returns the stdout and stderr output as one string. "
        ...:     )
        ...:     return toolkit.deftool("py_runsource_exec", d, p)
        ...: 
    In [22]: tk = Toolkit()
    In [23]: add_py_repl_tools(tk)
    Out[23]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}

    Converting to Tools

    The second requirement is that the tools are exported as a list.

    This means that the tools dictionary, inside the Toolkit, must be converted to a list.

    In [24]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...:     def tools(self):
        ...:         return [x for x in self.table.values()]
        ...:     def match(self, name):
        ...:         if name in self.table:
        ...:             return name
        ...:         else:
        ...:             return False
        ...: 
    In [25]: tk = Toolkit()
    In [26]: add_py_repl_tools(tk)
    Out[26]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}
    In [27]: tk.tools()
    Out[27]: 
    [{'name': 'py_runsource_exec',
      'type': 'function',
      'description': 'Execute Python code script in a persistent environment. 
     You must explicitly print evaluation results. Returns the stdout and stderr
     output as one string.',
      'parameters': {'type': 'object',
       'properties': {'code': {'type': 'string',
         'description': 'Python code to execute.'}},
       'required': ['code']}}]
    In [28]: tk.match("py_runsource_exec")
    Out[28]: 'py_runsource_exec'
    In [29]: tk.match("must be False")
    Out[29]: False

    Toolkit Code

    class Toolkit:
        def __init__(self):
            self.table = {}
    
        def deftool(self, name, description, parameters):
            if name in self.table:
                raise ValueError(f"Tool '{name}' already defined.")
            r = {
                "name": name,
                "type": "function",
                "description": description,
                "parameters": parameters
            }
            self.table[name] = r
            return r
    
        def tools(self):
            return [x for x in self.table.values()]
    
        def match(self, name):
            if name in self.table:
                return name
            else:
                return False
    
    def add_py_repl_tools(toolkit):
        p1 = {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute."
                }
            },
            "required": ["code"]
        }
        d1 = (
            "Execute Python code script in a persistent environment. "
            "You must explicitly print evaluation results. "
            "Returns the stdout and stderr output as one string. "
        )
        o = [
            toolkit.deftool("py_runsource_exec", d1, p1)
        ]
        return o
  • Manual Chat Program with a Circular Context

    In the previous chapter, I showed how to exchange data with an LLM API. I used curl as the HTTP client.

    In this chapter, I replace curl with a Python program. That does not mean that everything is automated. No, I still manually manage the messages, but I introduce a data structure that contains inputs and outputs to interact with the LLM. I name the data structure a circular context, and base it on a circular buffer.

    This chapter has three sections:

    • Limiting the context – a circular buffer limits the number of messages,
    • Python implementation – the program code in Python,
    • Interactive use – an IPython session to show usage.

    Limiting the Context

    Problem Definition

    Loosely speaking, an LLM takes as input a text sequence and returns as output another text sequence that continues the prior. LLMs are trained to do so by modelling the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is defined by the API model.

    LLM context size M
    Index 0 1 2 3 4 5 6 7 8 9 ... N ... N + k < M
    Value ? ? ? ? ? ? ? ? ? ? ... ? ... ?

    Circular Buffer Definition

    To limit the number of messages, and thus to never reach the LLM context size, I use a circular buffer data structure. For simplicity, I do not count the number of tokens.

    A circular buffer (CB), limited to k items, is either:

    • the empty CB (of size n = 0), or
    • a CB of size n < k, formed by adding a new item to the front of a CB of size n - 1 < k, or
    • a CB of size n = k, formed by adding a new item to the front of a CB of size n - 1, which is formed by removing an item from the back of a CB of size n = k.

    How do you determine k? Randomly. I have not thought of a heuristic. So, I randomly picked 19, which is the 8th prime number, as the default value.

    Python Implementation

    Code Overview

    There are four concepts I use in the code implementation:

    • circular buffer,
    • circular context,
    • context, and
    • predict.

    Circular buffer is the data structure defined in the previous section. It stores LLM specific input and output objects. It is the essential part of the circular context.

    Circular context is a data structure that hides the circular buffer. It defines methods to push new LLM-specific objects, a clear() method to remove all objects, and a to_list() method.

    Context is a list data structure. The only difference is that it stores specific LLM API objects. These objects are the very same objects I have shown in the previous chapter to interact with the LLM. Namely: EasyInputMessage and ResponseOutputText.

    CircularBuffer <--> CircularContext <--> Context <--> LLM API
        Class                Class            List         JSON

    Lastly, predict is the main function that takes a context as input and returns the output of the LLM. It does not (yet) return a new context.

    Setup

    mkdir llm_api_prog && cd llm_api_prog
    python3 -m venv venv
    source venv/bin/activate
    pip install requests ipython

    I will write all code into a single file circularcontext.py.

    Dependencies

    Because the LLM API uses JSON and HTTP, you need:

    • a JSON package,
    • an HTTP package to send requests and receive responses.
    
    import json
    import requests
    import os
    
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

    Empty Request

    Recall that the LLM API expects a JSON data object with fields: "model", "input", and "tools".

    def openai_prepare(model, context, tools):
        return {
            "model": model,
            "input": context,
            "tools": tools
        }

    Sending Requests

    def openai_request(model="gpt-4.1", context=[], tools=[]):
        url = "https://api.openai.com/v1/responses"
        headers = { 
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "Content-Type": "application/json"
        }
        data = openai_prepare(model, context, tools)
        return requests.post(url, headers=headers, json=data)

    Receiving Responses

    def openai_response(response):
        response.raise_for_status()
        data = response.json()
        return data['output']

    Note that better error handling is needed.

    Predict

    def predict(context=[], tools=[]):
        r = openai_response(openai_request(context=context, tools=tools))
        return r

    Circular Buffer

    The implementation of a circular buffer written by an LLM.

    class CircularBuffer:
        def __init__(self, capacity):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.buffer = [None] * capacity
            self.head = 0  # points to oldest element
            self.tail = 0  # points to next write position
            self.size = 0
    
        def enqueue(self, item):
            """Add an element to the buffer."""
            self.buffer[self.tail] = item
    
            if self.size == self.capacity:
                # Buffer full → overwrite oldest
                self.head = (self.head + 1) % self.capacity
            else:
                self.size += 1
    
            self.tail = (self.tail + 1) % self.capacity
    
        def dequeue(self):
            """Remove and return the oldest element."""
            if self.size == 0:
                raise IndexError("Dequeue from empty buffer")
    
            item = self.buffer[self.head]
            self.buffer[self.head] = None  # Optional cleanup
            self.head = (self.head + 1) % self.capacity
            self.size -= 1
    
            return item
    
        def peek(self):
            """Return the oldest element without removing it."""
            if self.size == 0:
                raise IndexError("Peek from empty buffer")
            return self.buffer[self.head]
    
        def to_list(self):
            """Return elements as a standard Python list (FIFO order)."""
            result = []
            index = self.head
            for _ in range(self.size):
                result.append(self.buffer[index])
                index = (index + 1) % self.capacity
            return result
    
        def is_empty(self):
            return self.size == 0
    
        def is_full(self):
            return self.size == self.capacity
    
        def __len__(self):
            return self.size
    
        def __repr__(self):
            return f"CircularBuffer({self.to_list()})"
    
        def shallow_clone(self):
            """Return a shallow copy of the circular buffer."""
            cb = CircularBuffer(self.capacity)
            cb.buffer = self.buffer.copy()
            cb.head = self.head
            cb.tail = self.tail
            cb.size = self.size
            return cb
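
The overwrite-oldest behavior that enqueue implements has the same semantics as the standard library's collections.deque with a maxlen set. A quick sanity check of the intended behavior, using that stdlib analogue:

```python
from collections import deque

# A deque with maxlen has the same overwrite-oldest behavior
# that CircularBuffer.enqueue implements.
d = deque(maxlen=3)
for item in [1, 2, 3, 4, 5]:
    d.append(item)   # once full, appending drops the oldest element

print(list(d))       # [3, 4, 5]
```

The custom class is kept because the chapter builds on its explicit head/tail bookkeeping.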

    Circular Context

    A context is a data structure that contains objects which are elements of the input array for the LLM API.

    class CircularContext:
        def __init__(self, capacity=19):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.cb = CircularBuffer(self.capacity)
    
        def push_easy_input_message(self, content="", role="user"):
            self.cb.enqueue({"content": content, "role": role, "type": "message"})
    
        def push_function_call_output(self, call_id="", output=""):
            self.cb.enqueue({
                "call_id": call_id,
                "output": output,
                "type": "function_call_output"
                })
    
        def push_custom(self, object):
            self.cb.enqueue(object)
    
        def clear(self):
            self.cb = CircularBuffer(self.capacity)
    
        def to_list(self):
            return self.cb.to_list()

    Usage

    Getting Started

    Make sure that:

    • the terminal is in the proper directory,
    • the Python virtual environment is activated,
    • the proper code is in the circularcontext.py file.

    Start an IPython session.

    export OPENAI_API_KEY="your api key..."
    ipython

    Load the code.

    In [1]: %run circularcontext.py

    Sanity check the OpenAI API key.

    In [3]: OPENAI_API_KEY
    Out[3]: 'your api key...'

    Sanity check an empty request.

    In [4]: openai_prepare("gpt-4.1", [], [])
    Out[4]: {'model': 'gpt-4.1', 'input': [], 'tools': []}

    Sending an Easy Input Message

    In [5]: cc = CircularContext()
    In [6]: cc.push_easy_input_message("Hi!")

    Sanity check a message.

    In [7]: cc.to_list()
    Out[7]: [{'content': 'Hi!', 'role': 'user', 'type': 'message'}]
    In [8]: r = predict(context=cc.to_list())
    In [9]: r
    Out[9]:
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hello! How can I help you today? 😊'}],
      'role': 'assistant'}]

    Note that the output result is an array.

    Merging Context

    In [10]: for x in r:
                 cc.push_custom(x)
    
    In [11]: cc.push_easy_input_message("Say hi again.")
    

    Sanity check.

    In [12]: cc.to_list()
    Out[12]: 
    [{'content': 'Hi!', 'role': 'user', 'type': 'message'},
     {'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hello! How can I help you today? 😊'}],
      'role': 'assistant'},
     {'content': 'Say hi again.', 'role': 'user', 'type': 'message'}]
    In [13]: r = predict(context=cc.to_list())
    
    In [14]: r
    Out[14]: 
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hi again! 👋'}],
      'role': 'assistant'}]
    
  • The Bare Minimum to Chat with Function Calls

    This is a tutorial on using the OpenAI LLM API, focusing on: messages and function calls. But, without Python, TypeScript, or some other programming language. The only requirements are CURL (an HTTP client) and an OpenAI API key.

    Why Bother?

    “Why waste my time, when I can just import an API package?”

    Sure, that works, until you go deeper. What if…

    • you do not have access to / permission for / trust in the API package?
    • you want to avoid software bloat?
    • you want to understand what is happening?
    • you want to make your own AI Agents?

    Before, there was only /chat/completions. Now, there are /responses, function calls, tool calls, computer calls, image calls, search calls, skills, etc.

    I will show the bare minimum to interact with an LLM API:

    • Prompt completions and context, and
    • Function calls.

    Prompt Completions and Context

    In this section, I show messaging an LLM. The provider is OpenAI at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    I wish to clarify two things. Suppose I want to exchange messages with an LLM…

    • What JSON data do I need?
    • May I see an example for exchanging messages?

    What JSON data do I need?

    Input

    Start with an empty JSON object.

    {
      request data go here ...
    }

    Select the model:

    {
      "model": "gpt-4.1"
    }

    Set "input" field value to an array [ ... ]. (Do not forget the comma.)

    {
      "model": "gpt-4.1",
      "input": [
         array items go here ...
      ]
    }

    The array items will be explained now.

    Input Items

    The API defines many objects you can put in the “input array” [ ... ]. Far too many to list them all. Instead, I show only four. Two object types may be created by your client program:

    • EasyInputMessage, and
    • FunctionCallOutput.

    Two object types may be created by the server:

    • ResponseOutputMessage, and
    • ResponseFunctionToolCall.

    In this section, I show EasyInputMessage and ResponseOutputMessage types. These are enough for prompts with context. In the Function Call section, I will show FunctionCallOutput and ResponseFunctionToolCall types.

    Easy Input Message (Client)

    Your client program sends prompts to the LLM inside an EasyInputMessage. The prompt text goes in the "content" field.

    EasyInputMessage schema:
    { 
      "content": string (this is where your prompt goes),
      "role": "user" | "assistant" | "system" | "developer",
      "type": "message"
    }
    
    Example: 
    {
      "content": "This is a prompt sent to the LLM.",
      "role": "user",
      "type": "message"
    }

    ResponseOutputMessage (Server)

    The LLM answers with a ResponseOutputMessage object type. It is more complex than EasyInputMessage because its "content" field value is more complex: an array that may contain two possible object types. The array items are either a ResponseOutputRefusal type (the LLM refused to answer) or a ResponseOutputText type (the LLM answered). I will first show the schemas of these object types, and second, the schema for ResponseOutputMessage.

    ResponseOutputRefusal schema:
    { 
      "refusal": string, 
      "type": "refusal"
    }
    ResponseOutputText schema:
    { 
      "annotations": [ FileCitation | URLCitation | 
                       ContainerFileCitation | FilePath ],
      "logprobs": [ logprobs object ],
      "text": string,
      "type": "output_text"
    }

    The ResponseOutputText schema is non-trivial. The values of the “annotations” and “logprobs” fields are complex. It is best to simply ignore them unless needed.

    With that in mind, here is the schema for ResponseOutputMessage.

    ResponseOutputMessage schema:
    {
      "id": string,
      "content": [ ResponseOutputText | ResponseOutputRefusal ],
      "role": "assistant",
      "status": "in_progress" | "completed" | "incomplete",
      "type": "message" 
    }
    

    To show an example ResponseOutputMessage, I will make an API request and show the response.

    May I see an example for exchanging messages?

    Sending A Single Message

    The prompt is: “Hi!”.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              }
            ]
          }'

    This is the value of the “output” part of the response.

    ...
      "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    In this simple case, the output is an array of one item that is an object of type ResponseOutputMessage. And that object itself has a “content” field for which the value is an array of one item that is an object of type ResponseOutputText.
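This chapter sticks to CURL, but if you later script the exchange, extracting the text from such an output array is just nested indexing. A minimal Python sketch, assuming the output shape shown above (the helper name extract_texts is mine):

```python
# The "output" value shown above, as a Python structure.
output = [
    {
        "id": "msg_x",          # placeholder id
        "type": "message",
        "status": "completed",
        "content": [
            {
                "type": "output_text",
                "annotations": [],
                "logprobs": [],
                "text": "Hello! How can I help you today?",
            }
        ],
        "role": "assistant",
    }
]

def extract_texts(output):
    """Collect the text of every output_text item, skipping refusals."""
    return [
        part["text"]
        for item in output
        if item.get("type") == "message"
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    ]

print(extract_texts(output))  # ['Hello! How can I help you today?']
```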

    Creating Context

    To continue the LLM conversation, you need to merge the (client) prompt and the output response (server). This is known as a context.

    • Copy the ResponseOutputMessage object from the “output array”.
    • Append a new EasyInputMessage object as the next prompt.

    Make sure to add commas between the items of the “input array” when doing it manually.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              },
              {
                "id": (omitted),
                "type": "message",
                "status": "completed",
                "content": [
                {
                  "type": "output_text", "annotations": [], "logprobs": [],
                  "text": "Hello! How can I help you today?"
                }
                ],
                "role": "assistant"
              },
              {
                "content": "Say hi again.", "role": "user", "type": "message"
              }
            ]
          }'

    Here is the output.

    ...
      "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hi again!"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange messages, a model and an input array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage schema or ResponseOutputMessage schema.

    Function Calls

    Section Overview

    In this section, I show how to exchange messages that are function call ready with an LLM. I will use LLMs by OpenAI, which are available at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    In the previous section, I have shown the two central fields: “input” in the request and “output” in the response. The values of these fields are arrays. So far, the only object types in these arrays were EasyInputMessage and ResponseOutputMessage. That will change now.

    Request
    { 
      "model": "gpt-4.1",
      "input": [ ... ]
    }
    
    Response
    { ...
      "output": [ ... ]
      ...
    }

    Two new objects I will show now are:

    • FunctionCallOutput, and
    • ResponseFunctionToolCall.

    I wish to clarify two things. Suppose I want to exchange messages with an LLM and allow it to use some function calls with my client program…

    • What JSON data do I need?
    • May I see an example for exchanging messages with function calls?

    What JSON data do I need?

    To exchange messages that are function call ready, set the “tools” field in the request.

    { 
      "model": "gpt-4.1",
      "input": [ ... ],
      "tools": [ ... ]
    }

    Tool Items

    Each item in the “tools array” is an object { ... }. The API supports several different object types. I will only show:

    • FunctionTool.

    A FunctionTool object has a name, a description, and parameters, all of which the client program must set. The name identifies the function. The description describes what it does. The parameters describe the function arguments.

    FunctionTool schema:
    {
      "type": "function",
      "name": string,
      "description": string,
      "parameters": object
    }

    Parameters are described in the “properties” field. Each parameter is yet another object.

    FunctionTool Parameters schema:
    {
      "type": "object",
      "properties": object,
      "required": [ strings ]
    }

    The value of the “required” field is an array that contains strings naming parameters that are required.

    FunctionTool Parameters Properties schema:
    {
      argument_name: 
      { 
        "type": argument_type,
        "description": argument_desc
      },
      ...
    }

    The key argument_name is a string that names the function argument. The value argument_type is a string that names the function argument type. The value argument_desc is a string that describes the function argument.
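To make the nesting of the three schemas concrete, here is a small hypothetical helper (make_function_tool is my name, not the API's) that composes them into one FunctionTool object:

```python
def make_function_tool(name, description, args, required):
    """Compose a FunctionTool object.

    args maps argument_name -> (argument_type, argument_desc).
    """
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                arg: {"type": t, "description": d}
                for arg, (t, d) in args.items()
            },
            "required": list(required),
        },
    }

tool = make_function_tool(
    "next_natural",
    "Returns the first natural number greater than the argument.",
    {"number": ("number", "The input natural number.")},
    ["number"],
)
```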

    Request With Tools

    Now that you have seen the structure of a FunctionTool, here is what an example request that is function call ready looks like:

    {
      "model": "gpt-4.1",
      "input": 
      [
        {
          "content": "Which natural number comes after 1678931?",
          "role": "user", "type": "message"
        }
      ],
      "tools":
      [
        {
          "name": "next_natural",
          "type": "function",
          "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
          "parameters": {
            "type": "object",
            "properties": {
              "number" : {
                "type": "number",
                "description": "The input natural number."
              }
            },
            "required": ["number"]
          }
        }
      ]
    }

    The request includes the “tools” field, for which the value is an array with exactly one FunctionTool object. When the request defines a FunctionTool, two things can happen:

    • the FunctionTool may be ignored, or
    • a response to use the FunctionTool may be created.

    Your client program must support both scenarios. It can check the type of each output item. If the type is ResponseOutputMessage, the FunctionTool was ignored. If the type is ResponseFunctionToolCall, the client must perform the function call.

    In other words, the server returns a response whose “output” field value is an array whose elements are either a:

    • ResponseOutputMessage, or
    • ResponseFunctionToolCall.
    Scenario A:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                               ResponseOutputMessage <-- server
    
    Scenario B:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                            ResponseFunctionToolCall <-- server
    client --> request: FunctionCallOutput and tools               --> server
    client <--   ResponseOutputMessage or ResponseFunctionToolCall <-- server
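
Scenario B is the part your client program must eventually automate. Here is a minimal Python sketch of that dispatch step, assuming a local next_natural implementation (both function names are illustrative, not part of the API):

```python
import json

def next_natural(number):
    return number + 1

def handle_output_item(item):
    """Return a FunctionCallOutput for a tool call, or None for a plain message."""
    if item.get("type") != "function_call":
        return None  # Scenario A: a ResponseOutputMessage, nothing to execute
    args = json.loads(item["arguments"])      # arguments arrive as a JSON string
    result = next_natural(**args)             # dispatch on item["name"] in real code
    return {
        "call_id": item["call_id"],           # must echo the server's call_id
        "output": str(result),                # output is sent back as a string
        "type": "function_call_output",
    }

call = {
    "type": "function_call",
    "name": "next_natural",
    "arguments": "{\"number\":1678931}",
    "call_id": "call_random123",
}
print(handle_output_item(call)["output"])  # 1678932
```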

    ResponseFunctionToolCall (Server)

    Note that the server creates this object.

    ResponseFunctionToolCall schema:
    { 
      "arguments": string,
      "call_id": string,
      "name": string,
      "type": "function_call",
      "id": string,
      "status": "in_progress" | "completed" | "incomplete"
    }

    FunctionCallOutput (Client)

    Note that the client creates this object. When creating this object, the value of the "call_id" field is copied from the matching ResponseFunctionToolCall object.

    Schema:
    { 
      "call_id": string,
      "output": string | (there is more but I ignore that),
      "type": "function_call_output",
      "id": string (mostly ignore this),
      "status": "in_progress" | "completed" | "incomplete"
    }
    
    Example:
    { 
      "call_id": "call_random123", (generated by server)
      "output": "fizzbuzz",
      "type": "function_call_output",
      "id": "123456",
      "status": "completed"
    }

    May I see an example for exchanging messages with function calls?

    Example FunctionToolCall Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              }
            ],
            "tools": 
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "number" : {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCall Response

    ...
      "output": [
        {
          "id": (omitted),
          "type": "function_call",
          "status": "completed",
          "arguments": "{\"number\":1678931}",
          "call_id": (omitted),
          "name": "next_natural"
        }
      ],
    ...

    Example FunctionToolCallOutput Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input":
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              },
              {
                "id": (omitted),
                "type": "function_call",
                "status": "completed",
                "arguments": "{\"number\":1678931}",
                "call_id": "call_(same call id)",
                "name": "next_natural"
              },
              {
                "call_id": "call_(same call id)",
                "output": "1678932",
                "type": "function_call_output"
              }
            ],
            "tools":
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number. 
    Returns the first natural number that is greater than the argument.",
                "parameters":
                {
                  "type": "object",
                  "properties":
                  {
                    "number":
                    {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCallOutput Response

    ...
    "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "The natural number that comes after 1,678,931 is 1,678,932."
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange function call ready messages, a model, an input array, and a tools array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage, ResponseOutputMessage, ResponseFunctionToolCall, or FunctionCallOutput schema.