Minimal Chat Program with a Circular Context

In the previous chapter, I showed how to exchange data with an LLM API. I used curl as the HTTP client.

In this chapter, I replace curl with a Python program. That does not mean that everything is automated. No, I still manage the messages manually, but I introduce a data structure that holds the inputs and outputs used to interact with the LLM. I name this data structure a circular context, and base it on a circular buffer.

This chapter has three sections:

  • Limiting the context – a circular buffer limits the number of messages,
  • Python implementation – the program code in Python,
  • Interactive use – an IPython session to show usage.

Limiting the Context

Problem Definition

Loosely speaking, an LLM takes a text sequence as input and returns another text sequence that completes it. LLMs are trained to do so by modelling the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is fixed by the model behind the API.

LLM context size M
Index   0  1  2  3  4  5  6  7  8  9  ...  N  ...  N + k < M
Value   ?  ?  ?  ?  ?  ?  ?  ?  ?  ?  ...  ?  ...  ?

Circular Buffer Definition

To limit the number of messages, and thus never reach the LLM context size, I use a circular buffer data structure. For simplicity, I count messages rather than tokens.

A circular buffer (CB), limited to k items, is either:

  • the empty CB (of size n = 0),
  • a CB of size n ≤ k, formed by adding a new item to the front of a CB of size n - 1 < k, or
  • a CB of size n = k, formed by adding a new item to the front of a CB of size k - 1, which is itself formed by removing the oldest item from the back of a CB of size n = k.

How do you determine k? Arbitrarily. I have not thought of a heuristic, so I picked 19, which is the 8th prime number, as the default value.
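
As a side note, the standard library's collections.deque with a maxlen argument already behaves like this definition, so a few lines suffice to illustrate it (the buffer used later in this chapter is a from-scratch implementation):

from collections import deque

k = 3                 # a small capacity, for illustration only
cb = deque(maxlen=k)  # the empty CB (n = 0)

for item in ["a", "b", "c", "d"]:
    cb.append(item)   # appending to a full deque drops the oldest item

print(cb)             # deque(['b', 'c', 'd'], maxlen=3)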

Python Implementation

Code Overview

There are four concepts I use in the code implementation:

  • circular buffer,
  • circular context,
  • context, and
  • predict.

Circular buffer is the data structure defined in the previous section. It stores LLM-specific input and output objects, and it is the essential part of the circular context.

Circular context is a data structure that hides the circular buffer. It defines methods to push new LLM-specific objects, a clear() method to remove all objects, and a to_list() method to export them as a plain list.

Context is a list data structure. The only difference is that it stores LLM API-specific objects. These are the very same objects I showed in the previous chapter when interacting with the LLM, namely EasyInputMessage and ResponseOutputText.

CircularBuffer <--> CircularContext <--> Context <--> LLM API
    Class                Class            List         JSON

Lastly, predict is the main function that takes a context as input and returns the output of the LLM. It does not (for now) return a new context.

Setup

mkdir llm_api_prog && cd llm_api_prog
python3 -m venv venv
source venv/bin/activate
pip install requests ipython

I will write all code into a single file, circularcontext.py.

Dependencies

Because the LLM API uses JSON and HTTP, you need:

  • a JSON package,
  • an HTTP package to send requests and receive responses.

import json
import requests
import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

Empty Request

Recall that the LLM API expects a JSON data object with fields: "model", "input", and "tools".

def openai_prepare(model, context, tools):
    return {
        "model": model,
        "input": context,
        "tools": tools
    }

Sending Requests

def openai_request(model="gpt-4.1", context=[], tools=[]):
    url = "https://api.openai.com/v1/responses"
    headers = { 
        "Authorization": f"Bearer {OPENAI_API_KEY}",
        "Content-Type": "application/json"
    }
    data = openai_prepare(model, context, tools)
    return requests.post(url, headers=headers, json=data)

Receiving Responses

def openai_response(response):
    response.raise_for_status()
    data = response.json()
    return data['output']

Note that better error handling is needed.
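
As a minimal sketch of what such handling could look like (the function name and the shape of the error object are my own choices, not part of the API):

def openai_response_safe(response):
    """Like openai_response, but returns an error object instead of raising."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # Return a list, so callers can treat it like a normal output array.
        # The response body usually carries a JSON error description.
        return [{"error": str(e), "body": response.text}]
    return response.json()['output']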

Predict

def predict(context=[], tools=[]):
    r = openai_response(openai_request(context=context, tools=tools))
    return r

Circular Buffer

The following implementation of a circular buffer was written by an LLM.

class CircularBuffer:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("Capacity must be positive")

        self.capacity = capacity
        self.buffer = [None] * capacity
        self.head = 0  # points to oldest element
        self.tail = 0  # points to next write position
        self.size = 0

    def enqueue(self, item):
        """Add an element to the buffer."""
        self.buffer[self.tail] = item

        if self.size == self.capacity:
            # Buffer full → overwrite oldest
            self.head = (self.head + 1) % self.capacity
        else:
            self.size += 1

        self.tail = (self.tail + 1) % self.capacity

    def dequeue(self):
        """Remove and return the oldest element."""
        if self.size == 0:
            raise IndexError("Dequeue from empty buffer")

        item = self.buffer[self.head]
        self.buffer[self.head] = None  # Optional cleanup
        self.head = (self.head + 1) % self.capacity
        self.size -= 1

        return item

    def peek(self):
        """Return the oldest element without removing it."""
        if self.size == 0:
            raise IndexError("Peek from empty buffer")
        return self.buffer[self.head]

    def to_list(self):
        """Return elements as a standard Python list (FIFO order)."""
        result = []
        index = self.head
        for _ in range(self.size):
            result.append(self.buffer[index])
            index = (index + 1) % self.capacity
        return result

    def is_empty(self):
        return self.size == 0

    def is_full(self):
        return self.size == self.capacity

    def __len__(self):
        return self.size

    def __repr__(self):
        return f"CircularBuffer({self.to_list()})"

    def shallow_clone(self):
        """Return a shallow copy of the circular buffer."""
        cb = CircularBuffer(self.capacity)
        cb.buffer = self.buffer.copy()
        cb.head = self.head
        cb.tail = self.tail
        cb.size = self.size
        return cb
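
A quick check of the overwrite behaviour, with a small capacity:

cb = CircularBuffer(3)
for i in range(5):
    cb.enqueue(i)

print(cb)         # CircularBuffer([2, 3, 4]); items 0 and 1 were overwritten
print(len(cb))    # 3
print(cb.peek())  # 2, the oldest remaining element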

Circular Context

A circular context is a data structure that holds the objects which become the elements of the "input" array sent to the LLM API.

class CircularContext:
    def __init__(self, capacity=19):
        if capacity <= 0:
            raise ValueError("Capacity must be positive")

        self.capacity = capacity
        self.cb = CircularBuffer(self.capacity)

    def push_easy_input_message(self, content="", role="user"):
        self.cb.enqueue({"content": content, "role": role, "type": "message"})

    def push_function_call_output(self, call_id="", output=""):
        self.cb.enqueue({
            "call_id": call_id,
            "output": output,
            "type": "function_call_output"
            })

    def push_custom(self, obj):
        self.cb.enqueue(obj)

    def clear(self):
        self.cb = CircularBuffer(self.capacity)

    def to_list(self):
        return self.cb.to_list()
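
With a deliberately small capacity, the limiting behaviour is easy to see: only the most recent messages survive.

cc = CircularContext(capacity=2)
cc.push_easy_input_message("first")
cc.push_easy_input_message("second")
cc.push_easy_input_message("third")  # evicts "first"

print(cc.to_list())
# [{'content': 'second', 'role': 'user', 'type': 'message'},
#  {'content': 'third', 'role': 'user', 'type': 'message'}]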

Usage

Getting Started

Make sure that:

  • the terminal is in the proper directory,
  • the Python virtual environment is activated,
  • the proper code is in the circularcontext.py file.

Start an IPython session.

export OPENAI_API_KEY="your api key..."
ipython

Load the code.

In [1]: %load circularcontext.py

Sanity check the OpenAI API key.

In [3]: OPENAI_API_KEY
Out[3]: 'your api key...'

Sanity check an empty request.

In [4]: openai_prepare("gpt-4.1", [], [])
Out[4]: {'model': 'gpt-4.1', 'input': [], 'tools': []}

Sending an Easy Input Message

In [5]: cc = CircularContext()
In [6]: cc.push_easy_input_message("Hi!")

Sanity check a message.

In [7]: cc.to_list()
Out[7]: [{'content': 'Hi!', 'role': 'user', 'type': 'message'}]
In [8]: r = predict(context=cc.to_list())
In [9]: r
Out[9]:
[{'type': 'output_text',
  'annotations': [],
  'logprobs': [],
  'text': 'Hello! How can I help you today? 😊'}]

Note that the output result is an array.
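
If you only want the generated text, a small helper can walk the array. The helper below is an illustration written against the two output shapes shown in this chapter; it is not part of the API:

def output_text(output):
    """Collect the 'text' fields from an LLM output array (illustrative)."""
    texts = []
    for item in output:
        if item.get("type") == "output_text":
            texts.append(item["text"])
        elif item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    texts.append(part["text"])
    return "\n".join(texts)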

Merging Context

In [10]: for x in r:
             cc.push_custom(x)

In [11]: cc.push_easy_input_message("Say hi again.")

Sanity check.

In [12]: cc.to_list()
Out[12]: 
[{'content': 'Hi!', 'role': 'user', 'type': 'message'},
 {'id': (omitted),
  'type': 'message',
  'status': 'completed',
  'content': [{'type': 'output_text',
    'annotations': [],
    'logprobs': [],
    'text': 'Hello! How can I help you today? 😊'}],
  'role': 'assistant'},
 {'content': 'Say hi again.', 'role': 'user', 'type': 'message'}]
In [13]: r = predict(context=cc.to_list())

In [14]: r
Out[14]: 
[{'id': (omitted),
  'type': 'message',
  'status': 'completed',
  'content': [{'type': 'output_text',
    'annotations': [],
    'logprobs': [],
    'text': 'Hi again! 👋'}],
  'role': 'assistant'}]
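
The steps above repeat on every turn: push the user message, predict, and merge the outputs back into the circular context. As a closing sketch, they can be bundled into one helper (chat_once is my own name for it, not part of the chapter's code so far):

def chat_once(cc, text):
    """One chat turn: push the user message, predict, merge the outputs."""
    cc.push_easy_input_message(text)
    r = predict(context=cc.to_list())
    for x in r:
        cc.push_custom(x)
    return r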