Tag: API

  • Making a REPL with an Evaluator

    In the previous chapter, I showed the Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    In this chapter, I show the evaluator component. The evaluator is a program to which you can send program code. A code interpreter. But the focus here is an evaluator that the LLM can interact with. The LLM sends a function tool call to interact with the evaluator.

    Here are my two goals for this chapter. The evaluator must:

    • manage a separate process for a Python interpreter,
    • provide a method to send code to the interpreter and return the stdout and stderr output as a string.

    The evaluator is a complex topic. Perhaps it is best to subdivide the problem. I thought about it for some time, and came up with the following subproblems:

    • echo script,
    • base64-encoded chunks echo script, and
    • interactive interpreter script.

    Each subproblem is dealt with separately. The last section then shows the final evaluator implementation.

    Subproblem: Echo Script

    An echo script reads input from stdin and prints the exact same text back to stdout.

    EchoScript

    Here is the code for the echoscript.py file.

    import sys
    
    # Echo stdin to stdout line by line; the loop blocks on input and
    # ends at EOF, when the parent closes the pipe.
    for line in sys.stdin:
        sys.stdout.write(line)
        sys.stdout.flush()

    (Note: SIGTERM will terminate a Python process stuck in a read loop like this one. That is, unless the signal handler is overridden or interrupts are disabled.)

    EchoEvaluator

    The following code is written to an echoevaluator.py file.

    import subprocess
    import sys
    
    class EchoEvaluator:
        def __init__(self, python_executable=None, script_path="echoscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def echo(self, code):
            self._print(f"{code}")
            return self.p.stdout.readline()
    
        def __del__(self):
            self.p.terminate()

    Example Use

    In [1]: %load "echoevaluator.py"
    
    In [2]: # %load "echoevaluator.py"
    In [3]: e = EchoEvaluator()
    In [4]: e.echo("print this")
    Out[4]: 'print this\n'

    Subproblem: Encoded Chunk Echo Script

    An encoded chunk echo script is like an echo script, but the text is divided into base64-encoded chunks.

    Base64 Encoded Chunks

    Here is how to encode a string into base64 and split it into chunks of three letters.

    In [1]: import base64
    In [2]: base64.b64encode("Test string".encode("utf-8")).decode("ascii")
    Out[2]: 'VGVzdCBzdHJpbmc='
    In [3]: encoded = base64.b64encode("print this".encode("utf-8")).decode("ascii")
    In [4]: chunks = [
       ...:     encoded[i: i + 3]
       ...:     for i in range(0, len(encoded), 3)
       ...: ]
    In [5]: chunks
    Out[5]: ['cHJ', 'pbn', 'Qgd', 'Ghp', 'cw=', '=']

    ChunkEvaluator

    The ChunkEvaluator class is similar to EchoEvaluator, except that it includes a method to encode the code into base64 chunks.

    import subprocess
    import sys
    import base64
    
    class ChunkEvaluator:
        def __init__(self, python_executable=None, script_path="chunkscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def echo(self, code):
            chunks = self._chunk_encode(code, size=3)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            for c in chunks:
                o.append(self.p.stdout.readline())
            return o
    
        def __del__(self):
            self.p.terminate()

    ChunkEvaluator with EchoScript

    First, I copy echoscript.py to chunkscript.py.

    cp echoscript.py chunkscript.py
    ipython
    In [1]: %load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['chunks 6\n', 'cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n']

    With the EchoScript, I confirm the first item to be the string “chunks 6”. All the other items are base64-encoded chunks. Note that the final chunk, “=”, is not in the list: echo reads only len(chunks) lines, and the first line read back is the echoed header. Now it is time to write the proper ChunkScript.

    ChunkEvaluator with ChunkScript

    The following code is written to the chunkscript.py file (overwriting all content).

    import sys
    
    def read_chunks(num):
        for i in range(num):
            line = sys.stdin.readline()
            sys.stdout.write(line)
            sys.stdout.flush()
    
    while True:
        line = sys.stdin.readline()
        if not line:
            break
        keyword, value = line.split()
        read_chunks(int(value))
    ipython
    In [1]: %load "chunkevaluator.py"
    
    In [2]: # %load "chunkevaluator.py"
    In [3]: e = ChunkEvaluator()
    In [4]: e.echo("print this")
    Out[4]: ['cHJ\n', 'pbn\n', 'Qgd\n', 'Ghp\n', 'cw=\n', '=\n']

    Subproblem: Interactive Interpreter

    The next subproblem to tackle is the InteractiveInterpreter, a class defined by the Python code module.

    What is the Interactive Interpreter?

    The Python code module defines a class named InteractiveInterpreter. It is used to implement read-eval-print loops in Python. You can use it to build an interactive REPL, exactly what is needed for the evaluator.

    Here is what help(code.InteractiveInterpreter) says.

    class InteractiveInterpreter(builtins.object)
     |  InteractiveInterpreter(locals=None)
     |
     |  Base class for InteractiveConsole.
     |
     |  This class deals with parsing and interpreter state (the user's
     |  namespace); it doesn't deal with input buffering or prompting or
     |  input file naming (the filename is always passed in explicitly).

    Method runsource takes source code as input and evaluates / executes it.

     |  runsource(self, source, filename='<input>', symbol='single')
     |      Compile and run some source in the interpreter.
     |
     |      Arguments are as for compile_command().
     |
     |      One of several things can happen:
     |
     |      1) The input is incorrect; compile_command() raised an
     |      exception (SyntaxError or OverflowError).  A syntax traceback
     |      will be printed by calling the showsyntaxerror() method.
     |
     |      2) The input is incomplete, and more input is required;
     |      compile_command() returned None.  Nothing happens.
     |
     |      3) The input is complete; compile_command() returned a code
     |      object.  The code is executed by calling self.runcode() (which
     |      also handles run-time exceptions, except for SystemExit).
     |
     |      The return value is True in case 2, False in the other cases (unless
     |      an exception is raised).  The return value can be used to
     |      decide whether to use sys.ps1 or sys.ps2 to prompt the next
     |      line.

    Runsource Output Examples

    In [1]: import code
    In [2]: ii = code.InteractiveInterpreter()

    Case 1: incorrect input

    In [3]: ii.runsource("int(\"hello\")")
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    File /usr/lib/python3.12/code.py:90, in InteractiveInterpreter.runcode(self, code)
         78 """Execute a code object.
         79 
         80 When an exception occurs, self.showtraceback() is called to
       (...)     87 
         88 """
         89 try:
    ---> 90     exec(code, self.locals)
         91 except SystemExit:
         92     raise
    
    File <input>:1
    
    ValueError: invalid literal for int() with base 10: 'hello'
    Out[3]: False

    Case 2: correct but incomplete input.

    In [4]: ii.runsource("print(")
    Out[4]: True

    Note that the return value is True and nothing happened. InteractiveInterpreter does not buffer incomplete input (input buffering is the job of its subclass InteractiveConsole), so sending more code does not complete the input.

    In [5]: ii.runsource("\"hello\")")
      File <input>:1
        "hello")
               ^
    SyntaxError: unmatched ')'
    
    Out[5]: False

    Case 3: correct and complete input.

    In [6]: ii.runsource("print(\"hello\")")
    hello
    Out[6]: False
    In [7]: ii.runsource("print")
    Out[7]: <function print(*args, sep=' ', end='\n', file=None, flush=False)>
    Out[7]: False

    Defining a variable:

    In [10]: ii.runsource("x = 12")
    Out[10]: False
    In [11]: ii.runsource("print(f\"X: {x}\")")
    X: 12
    Out[11]: False

    The Runsource Symbol Argument

    Method runsource accepts one more argument that I have not mentioned so far. The argument is called symbol, and it takes one of three values:

    • ‘single’,
    • ‘exec’, or
    • ‘eval’.

    Perhaps it is best to see some examples to show how to use the argument.

    Function Call Examples

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: multi = """
       ...: def hello():
       ...:     print("Hello World")
       ...:     return 10
       ...: hello()
       ...: """
    
    In [4]: single = "hello()"
    
    In [5]: ii.runsource(multi, symbol='exec')
    Hello World
    Out[5]: False
    
    In [6]: ii.runsource(single, symbol='exec')
    Hello World
    Out[6]: False
    
    In [7]: ii.runsource(single, symbol='eval')
    Hello World
    Out[7]: False
    
    In [8]: ii.runsource(single, symbol='single')
    Hello World
    Out[8]: 10
    Out[8]: False

    Only ‘single’ returned the result. Note that calling runsource with multi as the source argument is only error-free with the symbol argument set to exec.

    Symbol value ‘exec’ means the source code input is treated like a Python script. It can contain definitions and multiple statements or blocks. But it does not display expression results.

    Symbol value ‘eval’ means the source code input is treated as exactly one Python expression. It cannot contain statements or multiple expressions.

    Unexpected Behavior: Backslashes

    Writing source strings in the IPython REPL also creates unexpected errors.

    In [1]: import code
    
    In [2]: ii = code.InteractiveInterpreter()
    
    In [3]: source = """
       ...: print("Hello \n World!")
       ...: """
    
    In [4]: ii.runsource(source)
      File <input>:2
        print("Hello
              ^
    SyntaxError: unterminated string literal (detected at line 2)
    
    Out[4]: False

    Backslashes are the most common issue because Python treats them as escape characters: the \n is expanded into a real newline by the outer string literal before runsource ever sees the source. A raw string prevents the expansion. The correct version is here.

    In [5]: source = r"""print("Hello \n World!")"""
    
    In [6]: ii.runsource(source)
    Hello 
     World!
    Out[6]: False
    
    In [7]: source = r"""
       ...: print("Hello \n World!")
       ...: """
    
    In [8]: ii.runsource(source)
    Hello 
     World!
    Out[8]: False

    Implementing the Evaluator and ReplScript

    I take the ideas shown in the subproblems, and merge them into the evaluator code and the replscript code.

    The Evaluator

    import base64
    import subprocess
    import sys
    
    class Evaluator:
        def __init__(self, python_executable=None, script_path="replscript.py"):
            if python_executable is None:
                python_executable = sys.executable
            self.p = subprocess.Popen(
                [python_executable, script_path],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                text=True,
                encoding="utf-8"
            )
    
        def _print(self, chunk: str):
            self.p.stdin.write(chunk + "\n")
            self.p.stdin.flush()
    
        def _input(self):
            return self.p.stdout.readline().strip()
    
        def _chunk_encode(self, code, size=128):
            e = base64.b64encode(code.encode("utf-8")).decode("ascii")
            chunks = [
                e[i : i + size]
                for i in range(0, len(e), size)
            ]
            return chunks
    
        def _chunk_decode(self, chunks):
            b64_data = "".join(chunks)
            decoded = base64.b64decode(b64_data.encode("utf-8"))
            return decoded.decode("utf-8")
    
        def runsource_exec(self, code):
            self._print("symbol exec")
            return self._runsource(code)
    
        def runsource_single(self, code):
            self._print("symbol single")
            return self._runsource(code)
    
        def _runsource(self, code):
            chunks = self._chunk_encode(code)
            self._print(f"chunks {len(chunks)}")
            for c in chunks:
                self._print(f"{c}")
            o = []
            keyword, value = self._input().split()
            for i in range(int(value)):
                o.append(self.p.stdout.readline().strip())
            return self._chunk_decode(o)
    
        def __del__(self):
            self.p.terminate()

    The ReplScript

    import sys
    import base64
    import io
    import code
    import inspect
    import re
    
    from contextlib import redirect_stdout, redirect_stderr
    
    ANSI_RE = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')
    
    def run_sources_captured(ii, source, symbol):
        out = io.StringIO()
        err = io.StringIO()
        res = io.StringIO()
    
        # Custom displayhook to capture expression results
        def custom_displayhook(value):
            if value is not None:
                if callable(value):
                    try:
                        sig = inspect.signature(value)
                        print(f"<function {value.__name__}{sig}>", file=res)
                    except (ValueError, TypeError):
                        print(repr(value), file=res)
                else:
                    print(repr(value), file=res)
    
        old_displayhook = sys.displayhook
        sys.displayhook = custom_displayhook
    
        try:
            with redirect_stdout(out), redirect_stderr(err):
                more = ii.runsource(source, symbol=symbol)
                if more:
                    res.write("[incomplete input]\n")
        finally:
            sys.displayhook = old_displayhook
    
        output = out.getvalue() + err.getvalue() + res.getvalue()
        return ANSI_RE.sub('', output)
    
    def chunk_encode(code, size=128):
        e = base64.b64encode(code.encode("utf-8")).decode("ascii")
        chunks = [
            e[i : i + size]
            for i in range(0, len(e), size)
        ]
        return chunks
    
    def chunk_decode(chunks):
        b64_data = "".join(chunks)
        decoded = base64.b64decode(b64_data.encode("utf-8"))
        return decoded.decode("utf-8")
    
    def read_chunks(num):
        chunks = []
        for i in range(num):
            # Strip the trailing newline from each chunk line.
            line = sys.stdin.readline()
            chunks.append(line.strip())
        return chunks
    
    def write_chunks(chunks):
        sys.stdout.write(f"chunks {len(chunks)}" + "\n")
        for c in chunks:
            sys.stdout.write(c + "\n")
        sys.stdout.flush()
    
    ii = code.InteractiveInterpreter()
    
    while True:
        line = sys.stdin.readline()
        if not line:
            break
        keyword, value = line.split()
        symbol = value
    
        line = sys.stdin.readline()
        keyword, value = line.split()
        num_chunks = int(value)
    
        chunks = read_chunks(num_chunks)
        decoded = chunk_decode(chunks)
    
        if symbol == "single":
            output = run_sources_captured(ii, decoded, symbol)
        else:
            output = run_sources_captured(ii, decoded, "exec")
    
        chunks = chunk_encode(output)
        write_chunks(chunks)

    Interactive Use

    In [1]: import evaluator
    
    In [2]: e = evaluator.Evaluator()
    
    In [3]: e.runsource_single("print(\"hello world\")")
    Out[3]: 'hello world\n'
  • Defining Tools with a Toolkit

    In the previous chapter, I showed a Python program to manually exchange messages with an LLM API. I introduced a data structure that contains inputs and outputs to interact with the LLM.

    In this chapter, I introduce the Toolkit component. The Toolkit contains definitions for function tools for the LLM API. But it does not explicitly perform a tool call (that will be done in later chapters).

    Here are my two goals for this chapter. The Toolkit that I implement must:

    • define a Python REPL tool call for an LLM API,
    • export all tools as a list that is ready to be sent to an LLM API.

    Implementing the Toolkit

    I will step back for a moment and consider how LLMs use tools. It is useful to keep that in mind while implementing the toolkit.

    How do LLMs Use Tools?

    Recall that the LLM API accepts a tools field in the JSON input data. When you provide tools to the API, the server constructs a special system prompt. The prompt is designed to instruct the model to use the specified tool(s).

    For example, here is the Claude API example prompt that is constructed for tool use.

    In this environment you have access to a set of tools you can use to answer the
    user's question.
    {{ FORMATTING INSTRUCTIONS }}
    String and scalar parameters should be specified as is, while lists and objects
    should use JSON format. Note that spaces for string values are not stripped.
    The output is not expected to be valid XML and is parsed with regular
    expressions.
    Here are the functions available in JSONSchema format:
    {{ TOOL DEFINITIONS IN JSON SCHEMA }}
    {{ USER SYSTEM PROMPT }}
    {{ TOOL CONFIGURATION }}

    Setup

    mkdir infer_tk && cd infer_tk
    python3 -m venv venv
    source venv/bin/activate
    pip3 install ipython
    ipython

    Starting with the Class

    The basic Toolkit I will implement will not be a function, but a class. This is because it has state. (Functions can have state too, but that is not on the agenda here.)

    To store state, the Toolkit class keeps a variable named table.

    In [1]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...: 
    
    In [2]: tk = Toolkit()
    
    In [3]: tk.table
    Out[3]: {}
    

    Variable table is a Python dictionary. Inside it will be the tool definitions for the LLM API. But first, I have to recall the schemas for those definitions.

    API Tools Input Schema

    Recall the tools schema for the OpenAI API.

    {
        ...
        "tools": [ properties ...]
        ...
    }

    The properties schema contains a definition of one function tool.

    {
        "name": string,
        "type": "function",
        "description": string,
        "parameters": parameters
    }

    The parameters schema contains the definitions for all arguments.

    {
        "type": "object",
        "properties": {
            arg: {
                "type": string,
                "description": string
            }, ...
        },
        "required": [ strings ... ]
    }

    The objects placed in the toolkit table shall follow the schemas for:

    • properties, and
    • parameters.

    Defining Tools

    I wish for a method to add new tools to the Toolkit.

    In [5]: class Toolkit:
       ...:     def __init__(self):
       ...:         self.table = {}
       ...:     def deftool(self, name, description, parameters):
       ...:         if name in self.table:
       ...:             raise ValueError(f"Tool '{name}' already defined.")
       ...:         self.table[name] = {
       ...:             "name": name,
       ...:             "type": "function",
       ...:             "description": description,
       ...:             "parameters": parameters
       ...:         }
       ...: 
    In [6]: tk = Toolkit()
    In [7]: tk.deftool("name0", "desc0", {})
    In [8]: tk.table
    Out[8]: 
    {'name0': {'name': 'name0',
      'type': 'function',
      'description': 'desc0',
      'parameters': {}}}

    That is odd: In [7] printed no Out line, because deftool returns nothing. Maybe I need to define a variable and set its value into the dictionary? Also, it seems better to return the result.

    In [11]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...: 
    In [12]: tk = Toolkit()
    In [13]: tk.deftool("name0", "desc0", {})
    Out[13]: 
    {'name': 'name0', 
     'type': 'function',  
     'description': 'desc0', 
     'parameters': {}}

    That worked.

    Defining an Evaluator Tool

    The requirement is an evaluator tool. A Python REPL for the LLM.

    The function tool that I define here (named py_runsource_exec) is not the function tool used in the final implementation of Toolkit. The reason is that there are two function tools defined, and printing and typing all of that is bothersome. I only define py_runsource_exec to illustrate how it is done.

    In [21]: def add_py_repl_tools(toolkit):
        ...:     p = {
        ...:         "type": "object",
        ...:         "properties": {
        ...:             "code": {
        ...:                 "type": "string",
        ...:                 "description": "Python code to execute."
        ...:             }
        ...:         },
        ...:         "required": ["code"]
        ...:     }
        ...:     d = (
        ...:         "Execute Python code script in a persistent environment. "
        ...:         "You must explicitly print evaluation results. "
        ...:         "Returns the stdout and stderr output as one string. "
        ...:     )
        ...:     return toolkit.deftool("py_runsource_exec", d, p)
        ...: 
    In [22]: tk = Toolkit()
    In [23]: add_py_repl_tools(tk)
    Out[23]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}

    Converting to Tools

    The second requirement is that the tools are exported as a list.

    This means that the tools dictionary, inside the Toolkit, must be converted to a list.

    In [24]: class Toolkit:
        ...:     def __init__(self):
        ...:         self.table = {}
        ...:     def deftool(self, name, description, parameters):
        ...:         if name in self.table:
        ...:             raise ValueError(f"Tool '{name}' already defined.")
        ...:         r = {
        ...:             "name": name,
        ...:             "type": "function",
        ...:             "description": description,
        ...:             "parameters": parameters
        ...:         }
        ...:         self.table[name] = r
        ...:         return r
        ...:     def tools(self):
        ...:         return [x for x in self.table.values()]
        ...:     def match(self, name):
        ...:         if name in self.table:
        ...:             return name
        ...:         else:
        ...:             return False
        ...: 
    In [25]: tk = Toolkit()
    In [26]: add_py_repl_tools(tk)
    Out[26]: 
    {'name': 'py_runsource_exec',
     'type': 'function',
     'description': 'Execute Python code script in a persistent environment. 
    You must explicitly print evaluation results. Returns the stdout and stderr
    output as one string.',
     'parameters': {'type': 'object',
      'properties': {'code': {'type': 'string',
        'description': 'Python code to execute.'}},
      'required': ['code']}}
    In [27]: tk.tools()
    Out[27]: 
    [{'name': 'py_runsource_exec',
      'type': 'function',
      'description': 'Execute Python code script in a persistent environment. 
     You must explicitly print evaluation results. Returns the stdout and stderr
     output as one string.',
      'parameters': {'type': 'object',
       'properties': {'code': {'type': 'string',
         'description': 'Python code to execute.'}},
       'required': ['code']}}]
    In [28]: tk.match("py_runsource_exec")
    Out[28]: 'py_runsource_exec'
    In [29]: tk.match("must be False")
    Out[29]: False

    Toolkit Code

    class Toolkit:
        def __init__(self):
            self.table = {}
    
        def deftool(self, name, description, parameters):
            if name in self.table:
                raise ValueError(f"Tool '{name}' already defined.")
            r = {
                "name": name,
                "type": "function",
                "description": description,
                "parameters": parameters
            }
            self.table[name] = r
            return r
    
        def tools(self):
            return [x for x in self.table.values()]
    
        def match(self, name):
            if name in self.table:
                return name
            else:
                return False
    
    def add_py_repl_tools(toolkit):
        p1 = {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": "Python code to execute."
                }
            },
            "required": ["code"]
        }
        d1 = (
            "Execute Python code script in a persistent environment. "
            "You must explicitly print evaluation results. "
            "Returns the stdout and stderr output as one string. "
        )
        o = [
            toolkit.deftool("py_runsource_exec", d1, p1)
        ]
        return o
  • Manual Chat Program with a Circular Context

    In the previous chapter, I showed how to exchange data with an LLM API. I used curl as the HTTP client.

    In this chapter, I replace curl with a Python program. That does not mean that everything is automated. No, I still manually manage the messages, but I introduce a data structure that contains inputs and outputs to interact with the LLM. I name the data structure a circular context, and base it on a circular buffer.

    This chapter has three sections:

    • Limiting the context – a circular buffer limits the number of messages,
    • Python implementation – the program code in Python,
    • Interactive use – an IPython session to show usage.

    Limiting the Context

    Problem Definition

    Loosely speaking, an LLM takes as input a text sequence and returns as output another text sequence that continues the prior. LLMs are trained to do so by modelling the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is defined by the API model.

    LLM context size M
    Index 0 1 2 3 4 5 6 7 8 9 ... N ... N + k < M
    Value ? ? ? ? ? ? ? ? ? ? ... ? ... ?

    Circular Buffer Definition

    To limit the number of messages, and thus to never reach the LLM context size, I use a circular buffer data structure. For simplicity, I do not count the number of tokens.

    A circular buffer (CB), limited to k items, is either:

    • the empty CB (of size n = 0), or
    • a CB of size n < k, formed by adding a new item to the front of a CB of size n - 1 < k, or
    • a CB of size n = k, formed by adding a new item to the front of a CB of size n - 1, which is formed by removing an item from the back of a CB of size n = k.

    How do you determine k? Randomly. I have not thought of a heuristic. So, I randomly picked 19, which is the 8th prime number, as the default value.

    Python Implementation

    Code Overview

    There are four concepts I use in the code implementation:

    • circular buffer,
    • circular context,
    • context, and
    • predict.

    Circular buffer is the data structure defined in the previous section. It stores LLM specific input and output objects. It is the essential part of the circular context.

    Circular context is a data structure that hides the circular buffer. It defines methods to push new LLM-specific objects, a clear() method to remove all objects, and a to_list() method.

    Context is a list data structure. The only difference is that it stores specific LLM API objects. These objects are the very same objects I have shown in the previous chapter to interact with the LLM. Namely: EasyInputMessage and ResponseOutputText.

    CircularBuffer <--> CircularContext <--> Context <--> LLM API
        Class                Class            List         JSON

    Lastly, predict is the main function that takes a context as input and returns the output of the LLM. It does not (yet) return a new context.

    Setup

    mkdir llm_api_prog && cd llm_api_prog
    python3 -m venv venv
    source venv/bin/activate
    pip install requests ipython

    I will write all code into a single file circularcontext.py.

    Dependencies

    Because the LLM API uses JSON and HTTP, you need:

    • a JSON package,
    • an HTTP package to send requests and receive responses.
    
    import json
    import requests
    import os
    
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

    Empty Request

    Recall that the LLM API expects a JSON data object with fields: "model", "input", and "tools".

    def openai_prepare(model, context, tools):
        return {
            "model": model,
            "input": context,
            "tools": tools
        }

    Sending Requests

    def openai_request(model="gpt-4.1", context=[], tools=[]):
        url = "https://api.openai.com/v1/responses"
        headers = { 
            "Authorization": f"Bearer {OPENAI_API_KEY}",
            "Content-Type": "application/json"
        }
        data = openai_prepare(model, context, tools)
        return requests.post(url, headers=headers, json=data)

    Receiving Responses

    def openai_response(response):
        response.raise_for_status()
        data = response.json()
        return data['output']

    Note that better error handling is needed.

    Predict

    def predict(context=[], tools=[]):
        r = openai_response(openai_request(context=context, tools=tools))
        return r

    Circular Buffer

    The implementation of a circular buffer written by an LLM.

    class CircularBuffer:
        def __init__(self, capacity):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.buffer = [None] * capacity
            self.head = 0  # points to oldest element
            self.tail = 0  # points to next write position
            self.size = 0
    
        def enqueue(self, item):
            """Add an element to the buffer."""
            self.buffer[self.tail] = item
    
            if self.size == self.capacity:
                # Buffer full → overwrite oldest
                self.head = (self.head + 1) % self.capacity
            else:
                self.size += 1
    
            self.tail = (self.tail + 1) % self.capacity
    
        def dequeue(self):
            """Remove and return the oldest element."""
            if self.size == 0:
                raise IndexError("Dequeue from empty buffer")
    
            item = self.buffer[self.head]
            self.buffer[self.head] = None  # Optional cleanup
            self.head = (self.head + 1) % self.capacity
            self.size -= 1
    
            return item
    
        def peek(self):
            """Return the oldest element without removing it."""
            if self.size == 0:
                raise IndexError("Peek from empty buffer")
            return self.buffer[self.head]
    
        def to_list(self):
            """Return elements as a standard Python list (FIFO order)."""
            result = []
            index = self.head
            for _ in range(self.size):
                result.append(self.buffer[index])
                index = (index + 1) % self.capacity
            return result
    
        def is_empty(self):
            return self.size == 0
    
        def is_full(self):
            return self.size == self.capacity
    
        def __len__(self):
            return self.size
    
        def __repr__(self):
            return f"CircularBuffer({self.to_list()})"
    
        def shallow_clone(self):
            """Return a shallow copy of the circular buffer."""
            cb = CircularBuffer(self.capacity)
            cb.buffer = self.buffer.copy()
            cb.head = self.head
            cb.tail = self.tail
            cb.size = self.size
            return cb
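
The overwrite-oldest behavior that enqueue implements has the same semantics as the standard library's collections.deque with a maxlen set. A quick sanity check of the intended behavior, using that stdlib analogue:

```python
from collections import deque

# A deque with maxlen has the same overwrite-oldest behavior
# that CircularBuffer.enqueue implements.
d = deque(maxlen=3)
for item in [1, 2, 3, 4, 5]:
    d.append(item)   # once full, appending drops the oldest element

print(list(d))       # [3, 4, 5]
```

The custom class is kept because the chapter builds on its explicit head/tail bookkeeping.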

    Circular Context

    A context is a data structure that contains objects which are elements of the input array for the LLM API.

    class CircularContext:
        def __init__(self, capacity=19):
            if capacity <= 0:
                raise ValueError("Capacity must be positive")
    
            self.capacity = capacity
            self.cb = CircularBuffer(self.capacity)
    
        def push_easy_input_message(self, content="", role="user"):
            self.cb.enqueue({"content": content, "role": role, "type": "message"})
    
        def push_function_call_output(self, call_id="", output=""):
            self.cb.enqueue({
                "call_id": call_id,
                "output": output,
                "type": "function_call_output"
                })
    
        def push_custom(self, object):
            self.cb.enqueue(object)
    
        def clear(self):
            self.cb = CircularBuffer(self.capacity)
    
        def to_list(self):
            return self.cb.to_list()

    Usage

    Getting Started

    Make sure that:

    • the terminal is in the proper directory,
    • the Python virtual environment is activated,
    • the proper code is in the circularcontext.py file.

    Start an IPython session.

    export OPENAI_API_KEY="your api key..."
    ipython

    Load the code.

    In [1]: %run circularcontext.py

    Sanity check the OpenAI API key.

    In [3]: OPENAI_API_KEY
    Out[3]: 'your api key...'

    Sanity check an empty request.

    In [4]: openai_prepare("gpt-4.1", [], [])
    Out[4]: {'model': 'gpt-4.1', 'input': [], 'tools': []}

    Sending an Easy Input Message

    In [5]: cc = CircularContext()
    In [6]: cc.push_easy_input_message("Hi!")

    Sanity check a message.

    In [7]: cc.to_list()
    Out[7]: [{'content': 'Hi!', 'role': 'user', 'type': 'message'}]
    In [8]: r = predict(context=cc.to_list())
    In [9]: r
    Out[9]:
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hello! How can I help you today? 😊'}],
      'role': 'assistant'}]

    Note that the output result is an array.

    Merging Context

    In [10]: for x in r:
                 cc.push_custom(x)
    
    In [11]: cc.push_easy_input_message("Say hi again.")
    

    Sanity check.

    In [12]: cc.to_list()
    Out[12]: 
    [{'content': 'Hi!', 'role': 'user', 'type': 'message'},
     {'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hello! How can I help you today? 😊'}],
      'role': 'assistant'},
     {'content': 'Say hi again.', 'role': 'user', 'type': 'message'}]
    In [13]: r = predict(context=cc.to_list())
    
    In [14]: r
    Out[14]: 
    [{'id': (omitted),
      'type': 'message',
      'status': 'completed',
      'content': [{'type': 'output_text',
        'annotations': [],
        'logprobs': [],
        'text': 'Hi again! 👋'}],
      'role': 'assistant'}]
    
  • The Bare Minimum to Chat with Function Calls

    This is a tutorial on using the OpenAI LLM API, focusing on: messages and function calls. But, without Python, TypeScript, or some other programming language. The only requirements are CURL (an HTTP client) and an OpenAI API key.

    Why Bother?

    “Why waste my time, when I can just import an API package?”

    Sure, that works, until you go deeper. What if…

    • you do not have access to / permission for / trust in the API package?
    • you want to avoid software bloat?
    • you want to understand what is happening?
    • you want to make your own AI Agents?

    Before, there was only /chat/completions. Now, there are /responses, function calls, tool calls, computer calls, image calls, search calls, skills, etc.

    I will show the bare minimum to interact with an LLM API:

    • Prompt completions and context, and
    • Function calls.

    Prompt Completions and Context

    In this section, I show messaging an LLM. The provider is OpenAI at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    I wish to clarify two things. Suppose I want to exchange messages with an LLM…

    • What JSON data do I need?
    • May I see an example for exchanging messages?

    What JSON data do I need?

    Input

    Start with an empty JSON object.

    {
      request data go here ...
    }

    Select the model:

    {
      "model": "gpt-4.1"
    }

    Set "input" field value to an array [ ... ]. (Do not forget the comma.)

    {
      "model": "gpt-4.1",
      "input": [
         array items go here ...
      ]
    }

    The array items will be explained now.

    Input Items

    The API defines many objects you can put in the “input array” [ ... ]. Far too many to list them all. Instead, I show only four. Two object types may be created by your client program:

    • EasyInputMessage, and
    • FunctionCallOutput.

    Two object types may be created by the server:

    • ResponseOutputMessage, and
    • ResponseFunctionToolCall.

    In this section, I show EasyInputMessage and ResponseOutputMessage types. These are enough for prompts with context. In the Function Call section, I will show FunctionCallOutput and ResponseFunctionToolCall types.

    Easy Input Message (Client)

    Your client program sends prompts to the LLM inside an EasyInputMessage. The prompt text goes in the "content" field.

    EasyInputMessage schema:
    { 
      "content": string (this is where your prompt goes),
      "role": "user" | "assistant" | "system" | "developer",
      "type": "message"
    }
    
    Example: 
    {
      "content": "This is a prompt sent to the LLM.",
      "role": "user",
      "type": "message"
    }

    ResponseOutputMessage (Server)

    The LLM answers with a ResponseOutputMessage object type. It is more complex than EasyInputMessage because its "content" field value is more complex: an array that may contain two possible object types. The array items are either a ResponseOutputRefusal type (the LLM refused to answer) or a ResponseOutputText type (the LLM answered). I will first show the schemas of these object types, and second, the schema for ResponseOutputMessage.

    ResponseOutputRefusal schema:
    { 
      "refusal": string, 
      "type": "refusal"
    }
    ResponseOutputText schema:
    { 
      "annotations": [ FileCitation | URLCitation | 
                       ContainerFileCitation | FilePath ],
      "logprobs": [ logprobs object ],
      "text": string,
      "type": "output_text"
    }

    The ResponseOutputText schema is non-trivial. The values of the “annotations” and “logprobs” fields are complex. It is best to simply ignore them unless needed.

    With that in mind, here is the schema for ResponseOutputMessage.

    ResponseOutputMessage schema:
    {
      "id": string,
      "content": [ ResponseOutputText | ResponseOutputRefusal ],
      "role": "assistant",
      "status": "in_progress" | "completed" | "incomplete",
      "type": "message" 
    }
    

    To show an example ResponseOutputMessage, I will make an API request and show the response.

    May I see an example for exchanging messages?

    Sending A Single Message

    The prompt is: “Hi!”.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              }
            ]
          }'

    This is the value of the “output” part of the response.

    ...
      "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hello! How can I help you today?"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    In this simple case, the output is an array of one item that is an object of type ResponseOutputMessage. And that object itself has a “content” field for which the value is an array of one item that is an object of type ResponseOutputText.
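This chapter sticks to CURL, but if you later script the exchange, extracting the text from such an output array is just nested indexing. A minimal Python sketch, assuming the output shape shown above (the helper name extract_texts is mine):

```python
# The "output" value shown above, as a Python structure.
output = [
    {
        "id": "msg_x",          # placeholder id
        "type": "message",
        "status": "completed",
        "content": [
            {
                "type": "output_text",
                "annotations": [],
                "logprobs": [],
                "text": "Hello! How can I help you today?",
            }
        ],
        "role": "assistant",
    }
]

def extract_texts(output):
    """Collect the text of every output_text item, skipping refusals."""
    return [
        part["text"]
        for item in output
        if item.get("type") == "message"
        for part in item.get("content", [])
        if part.get("type") == "output_text"
    ]

print(extract_texts(output))  # ['Hello! How can I help you today?']
```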

    Creating Context

    To continue the LLM conversation, you need to merge the (client) prompt and the output response (server). This is known as a context.

    • Copy the ResponseOutputMessage object from the “output array”.
    • Append a new EasyInputMessage object as the next prompt.

    Make sure to add commas between the items of the “input array” when doing it manually.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Hi!", "role": "user", "type": "message"
              },
              {
                "id": (omitted),
                "type": "message",
                "status": "completed",
                "content": [
                {
                  "type": "output_text", "annotations": [], "logprobs": [],
                  "text": "Hello! How can I help you today?"
                }
                ],
                "role": "assistant"
              },
              {
                "content": "Say hi again.", "role": "user", "type": "message"
              }
            ]
          }'

    Here is the output.

    ...
      "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "Hi again!"
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange messages, a model and an input array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage schema or ResponseOutputMessage schema.

    Function Calls

    Section Overview

    In this section, I show how to exchange messages that are function call ready with an LLM. I will use LLMs by OpenAI, which are available at:

    https://api.openai.com/v1/responses
    HTTP Method: POST

    Endpoint /responses accepts application/json data. Using CURL, create a POST request with JSON data.

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{ ... json data goes here ... }'

    In the previous section, I have shown the two central fields: “input” in the request and “output” in the response. The values of these fields are arrays. So far, the only object types in these arrays were EasyInputMessage and ResponseOutputMessage. That will change now.

    Request
    { 
      "model": "gpt-4.1",
      "input": [ ... ]
    }
    
    Response
    { ...
      "output": [ ... ]
      ...
    }

    Two new objects I will show now are:

    • FunctionCallOutput, and
    • ResponseFunctionToolCall.

    I wish to clarify two things. Suppose I want to exchange messages with an LLM and allow it to use some function calls with my client program…

    • What JSON data do I need?
    • May I see an example for exchanging messages with function calls?

    What JSON data do I need?

    To exchange messages that are function call ready, set the “tools” field in the request.

    { 
      "model": "gpt-4.1",
      "input": [ ... ],
      "tools": [ ... ]
    }

    Tool Items

    Each item in the “tools array” is an object { ... }. The API supports several different object types. I will only show:

    • FunctionTool.

    A FunctionTool object has a name, a description, and parameters, all of which the client program must set. The name identifies the function. The description describes what it does. The parameters describe the function arguments.

    FunctionTool schema:
    {
      "type": "function",
      "name": string,
      "description": string,
      "parameters": object
    }

    Parameters are described in the “properties” field. Each parameter is yet another object.

    FunctionTool Parameters schema:
    {
      "type": "object",
      "properties": object,
      "required": [ strings ]
    }

    The value of the “required” field is an array that contains strings naming parameters that are required.

    FunctionTool Parameters Properties schema:
    {
      argument_name: 
      { 
        "type": argument_type,
        "description": argument_desc
      },
      ...
    }

    The key argument_name is a string that names the function argument. The value argument_type is a string that names the function argument type. The value argument_desc is a string that describes the function argument.
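To make the nesting of the three schemas concrete, here is a small hypothetical helper (make_function_tool is my name, not the API's) that composes them into one FunctionTool object:

```python
def make_function_tool(name, description, args, required):
    """Compose a FunctionTool object.

    args maps argument_name -> (argument_type, argument_desc).
    """
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                arg: {"type": t, "description": d}
                for arg, (t, d) in args.items()
            },
            "required": list(required),
        },
    }

tool = make_function_tool(
    "next_natural",
    "Returns the first natural number greater than the argument.",
    {"number": ("number", "The input natural number.")},
    ["number"],
)
```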

    Request With Tools

    Now that you have seen the structure of a FunctionTool, here is what an example request that is function call ready looks like:

    {
      "model": "gpt-4.1",
      "input": 
      [
        {
          "content": "Which natural number comes after 1678931?",
          "role": "user", "type": "message"
        }
      ],
      "tools":
      [
        {
          "name": "next_natural",
          "type": "function",
          "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
          "parameters": {
            "type": "object",
            "properties": {
              "number" : {
                "type": "number",
                "description": "The input natural number."
              }
            },
            "required": ["number"]
          }
        }
      ]
    }

    The request includes the “tools” field, for which the value is an array with exactly one FunctionTool object. When the request defines a FunctionTool, two things can happen:

    • the FunctionTool may be ignored, or
    • a response to use the FunctionTool may be created.

    Your client program must support both scenarios. It can check the type of each output item. If the type is ResponseOutputMessage, the FunctionTool was ignored. If the type is ResponseFunctionToolCall, the client must perform the function call.

    In other words, the server returns a response whose “output” field value is an array whose elements are either a:

    • ResponseOutputMessage, or
    • ResponseFunctionToolCall.
    Scenario A:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                               ResponseOutputMessage <-- server
    
    Scenario B:
    
    client --> request: EasyInputMessage and tools                 --> server
    client <--                            ResponseFunctionToolCall <-- server
    client --> request: FunctionCallOutput and tools               --> server
    client <--   ResponseOutputMessage or ResponseFunctionToolCall <-- server
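
Scenario B is the part your client program must eventually automate. Here is a minimal Python sketch of that dispatch step, assuming a local next_natural implementation (both function names are illustrative, not part of the API):

```python
import json

def next_natural(number):
    return number + 1

def handle_output_item(item):
    """Return a FunctionCallOutput for a tool call, or None for a plain message."""
    if item.get("type") != "function_call":
        return None  # Scenario A: a ResponseOutputMessage, nothing to execute
    args = json.loads(item["arguments"])      # arguments arrive as a JSON string
    result = next_natural(**args)             # dispatch on item["name"] in real code
    return {
        "call_id": item["call_id"],           # must echo the server's call_id
        "output": str(result),                # output is sent back as a string
        "type": "function_call_output",
    }

call = {
    "type": "function_call",
    "name": "next_natural",
    "arguments": "{\"number\":1678931}",
    "call_id": "call_random123",
}
print(handle_output_item(call)["output"])  # 1678932
```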

    ResponseFunctionToolCall (Server)

    Note that the server creates this object.

    ResponseFunctionToolCall schema:
    { 
      "arguments": string,
      "call_id": string,
      "name": string,
      "type": "function_call",
      "id": string,
      "status": "in_progress" | "completed" | "incomplete"
    }

    FunctionCallOutput (Client)

    Note that the client creates this object. When creating this object, the value of the "call_id" field is copied from the matching ResponseFunctionToolCall object.

    Schema:
    { 
      "call_id": string,
      "output": string | (there is more but I ignore that),
      "type": "function_call_output",
      "id": string (mostly ignore this),
      "status": "in_progress" | "completed" | "incomplete"
    }
    
    Example:
    { 
      "call_id": "call_random123", (generated by server)
      "output": "fizzbuzz",
      "type": "function_call_output",
      "id": "123456",
      "status": "completed"
    }

    May I see an example for exchanging messages with function calls?

    Example FunctionToolCall Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input": 
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              }
            ],
            "tools": 
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number.
    Returns the first natural number that is greater than the argument.",
                "parameters": {
                  "type": "object",
                  "properties": {
                    "number" : {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCall Response

    ...
      "output": [
        {
          "id": (omitted),
          "type": "function_call",
          "status": "completed",
          "arguments": "{\"number\":1678931}",
          "call_id": (omitted),
          "name": "next_natural"
        }
      ],
    ...

    Example FunctionToolCallOutput Request

    curl https://api.openai.com/v1/responses \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{
            "model": "gpt-4.1",
            "input":
            [
              {
                "content": "Which natural number comes after 1678931?",
                "role": "user", "type": "message"
              },
              {
                "id": (omitted),
                "type": "function_call",
                "status": "completed",
                "arguments": "{\"number\":1678931}",
                "call_id": "call_(same call id)",
                "name": "next_natural"
              },
              {
                "call_id": "call_(same call id)",
                "output": "1678932",
                "type": "function_call_output"
              }
            ],
            "tools":
            [
              {
                "name": "next_natural",
                "type": "function",
                "description": "next_natural takes as input a natural number. 
    Returns the first natural number that is greater than the argument.",
                "parameters":
                {
                  "type": "object",
                  "properties":
                  {
                    "number":
                    {
                      "type": "number",
                      "description": "The input natural number."
                    }
                  },
                  "required": ["number"]
                }
              }
            ]
          }'

    Example FunctionToolCallOutput Response

    ...
    "output": [
        {
          "id": (omitted),
          "type": "message",
          "status": "completed",
          "content": [
            {
              "type": "output_text",
              "annotations": [],
              "logprobs": [],
              "text": "The natural number that comes after 1,678,931 is 1,678,932."
            }
          ],
          "role": "assistant"
        }
      ],
    ...

    Section Summary

    • The LLM API accepts JSON data as an input and writes JSON data as an output.
    • To exchange function call ready messages, a model, an input array, and a tools array must be set.
    • The elements of the input array are JSON objects that follow the EasyInputMessage, ResponseOutputMessage, ResponseFunctionToolCall, or FunctionCallOutput schema.