In the previous chapter, I showed how to exchange data with an LLM API, using curl as the HTTP client.
In this chapter, I replace curl with a Python program. That does not mean everything is automated: I still manage the messages manually, but I introduce a data structure that holds the inputs and outputs of the interaction with the LLM. I call this data structure a circular context, and base it on a circular buffer.
This chapter has three sections:
- Limiting the context – a circular buffer limits the number of messages,
- Python implementation – the program code in Python,
- Interactive use – an IPython session to show usage.
Limiting the Context
Problem Definition
Loosely speaking, an LLM takes a text sequence as input and returns another text sequence that continues it. LLMs are trained to do so by modelling the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, such that N + k is less than the LLM context size M, which is fixed by the model behind the API.
LLM context size M
Index  0 1 2 3 4 5 6 7 8 9 ... N ... N + k   (N + k < M)
Value  ? ? ? ? ? ? ? ? ? ? ... ? ...   ?
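Stated as code, the constraint is a simple inequality. A toy check (the function name and the numbers are illustrative only, not part of any API):

def fits_context(n: int, k: int, m: int) -> bool:
    # A prompt of n tokens plus a completion of k tokens
    # must stay below the context size m.
    return n + k < m

fits_context(n=1_000, k=500, m=128_000)   # True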
Circular Buffer Definition
To limit the number of messages, and thus to never reach the LLM context size, I use a circular buffer data structure. For simplicity, I do not count the number of tokens.
A circular buffer (CB), limited to k items, is either:
- the empty CB (of size n = 0), or
- a CB of size n < k, formed by adding a new item to the front of a CB of size n - 1 < k, or
- a CB of size n = k, formed by adding a new item to the front of a CB of size n - 1, which is formed by removing an item from the back of a CB of size n = k.
How do you determine k? I have not thought of a heuristic, so I arbitrarily picked 19, which is the 8th prime number, as the default value.
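Incidentally, Python's standard library already ships this drop-oldest behavior as collections.deque with a maxlen, which makes the definition easy to see in action (a quick illustration, not the implementation used below):

from collections import deque

cb = deque(maxlen=3)              # capacity k = 3
for item in ["a", "b", "c", "d"]:
    cb.append(item)               # appending to a full deque evicts the oldest item
print(list(cb))                   # ['b', 'c', 'd'] -- "a" was dropped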
Python Implementation
Code Overview
There are four concepts I use in the code implementation:
- circular buffer,
- circular context,
- context, and
- predict.
Circular buffer is the data structure defined in the previous section. It stores LLM-specific input and output objects, and it is the essential part of the circular context.
Circular context is a data structure that hides the circular buffer. It defines methods to push new LLM-specific objects, a clear() method to remove all objects, and a to_list() method.
Context is a plain list. The only difference from an ordinary list is that it stores LLM API objects, the very same objects I showed in the previous chapter to interact with the LLM, namely EasyInputMessage and ResponseOutputText.
CircularBuffer <--> CircularContext <--> Context <--> LLM API
    Class               Class             List         JSON
Lastly, predict is the main function that takes a context as input and returns the output of the LLM. It does not (for now) return a new context.
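Putting the four concepts together, the intended call pattern looks like this (a preview; every name here is defined in the remainder of this section):

cc = CircularContext()                     # circular context, wrapping a circular buffer
cc.push_easy_input_message("Hello")        # push an LLM API input object
context = cc.to_list()                     # context: a plain list of those objects
outputs = predict(context=context)         # predict: context in, LLM output objects out
for x in outputs:
    cc.push_custom(x)                      # merging outputs back is still manual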
Setup
mkdir llm_api_prog && cd llm_api_prog
python3 -m venv venv
source venv/bin/activate
pip install requests ipython
I will write all code into a single file circularcontext.py.
Dependencies
Because the LLM API uses JSON and HTTP, you need:
- a JSON package,
- an HTTP package to send requests and receive responses.
import json
import requests
import os
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
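Every request depends on this key, so a fail-fast check right after reading it saves a confusing HTTP 401 later. This check is my addition, not part of the chapter's code:

if not OPENAI_API_KEY:
    raise RuntimeError("OPENAI_API_KEY environment variable is not set")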
Empty Request
Recall that the LLM API expects a JSON data object with fields: "model", "input", and "tools".
def openai_prepare(model, context, tools):
return {
"model": model,
"input": context,
"tools": tools
}
Sending Requests
def openai_request(model="gpt-4.1", context=[], tools=[]):
url = "https://api.openai.com/v1/responses"
headers = {
"Authorization": f"Bearer {OPENAI_API_KEY}",
"Content-Type": "application/json"
}
data = openai_prepare(model, context, tools)
return requests.post(url, headers=headers, json=data)
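Two asides on requests.post: the json= argument already serializes the dictionary and sets the Content-Type header, so the explicit header is redundant but harmless; and by default the call can block indefinitely. A variant with a timeout (the name openai_request_safe is my own, a sketch rather than the chapter's code):

def openai_request_safe(model="gpt-4.1", context=[], tools=[], timeout=60):
    # Same request as openai_request, but a stalled connection raises
    # requests.exceptions.Timeout instead of hanging forever.
    url = "https://api.openai.com/v1/responses"
    headers = {"Authorization": f"Bearer {OPENAI_API_KEY}"}
    data = openai_prepare(model, context, tools)
    return requests.post(url, headers=headers, json=data, timeout=timeout)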
Receiving Responses
def openai_response(response):
response.raise_for_status()
data = response.json()
return data['output']
Note that better error handling is needed.
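One possible shape for that error handling, sketched under my own assumptions rather than taken from the chapter: print the API's JSON error body before re-raising, since it usually explains a failure better than the bare status line.

def openai_response_checked(response):
    # Like openai_response, but surfaces the error body on failure.
    if not response.ok:
        print(response.status_code, response.text)
    response.raise_for_status()
    return response.json()["output"]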
Predict
def predict(context=[], tools=[]):
r = openai_response(openai_request(context=context, tools=tools))
return r
Circular Buffer
The following implementation of a circular buffer was written by an LLM.
class CircularBuffer:
def __init__(self, capacity):
if capacity <= 0:
raise ValueError("Capacity must be positive")
self.capacity = capacity
self.buffer = [None] * capacity
self.head = 0 # points to oldest element
self.tail = 0 # points to next write position
self.size = 0
def enqueue(self, item):
"""Add an element to the buffer."""
self.buffer[self.tail] = item
if self.size == self.capacity:
# Buffer full: overwrite oldest
self.head = (self.head + 1) % self.capacity
else:
self.size += 1
self.tail = (self.tail + 1) % self.capacity
def dequeue(self):
"""Remove and return the oldest element."""
if self.size == 0:
raise IndexError("Dequeue from empty buffer")
item = self.buffer[self.head]
self.buffer[self.head] = None # Optional cleanup
self.head = (self.head + 1) % self.capacity
self.size -= 1
return item
def peek(self):
"""Return the oldest element without removing it."""
if self.size == 0:
raise IndexError("Peek from empty buffer")
return self.buffer[self.head]
def to_list(self):
"""Return elements as a standard Python list (FIFO order)."""
result = []
index = self.head
for _ in range(self.size):
result.append(self.buffer[index])
index = (index + 1) % self.capacity
return result
def is_empty(self):
return self.size == 0
def is_full(self):
return self.size == self.capacity
def __len__(self):
return self.size
def __repr__(self):
return f"CircularBuffer({self.to_list()})"
def shallow_clone(self):
"""Return a shallow copy of the circular buffer."""
cb = CircularBuffer(self.capacity)
cb.buffer = self.buffer.copy()
cb.head = self.head
cb.tail = self.tail
cb.size = self.size
return cb
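A quick sanity check of the overwrite behavior, with a capacity of 3 for brevity:

cb = CircularBuffer(3)
for i in range(5):
    cb.enqueue(i)
print(cb.to_list())            # [2, 3, 4] -- 0 and 1 were overwritten
print(len(cb), cb.is_full())   # 3 True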
Circular Context
A context is a data structure that contains the objects that make up the input array for the LLM API; the circular context wraps them in a circular buffer.
class CircularContext:
def __init__(self, capacity=19):
if capacity <= 0:
raise ValueError("Capacity must be positive")
self.capacity = capacity
self.cb = CircularBuffer(self.capacity)
def push_easy_input_message(self, content="", role="user"):
self.cb.enqueue({"content": content, "role": role, "type": "message"})
def push_function_call_output(self, call_id="", output=""):
self.cb.enqueue({
"call_id": call_id,
"output": output,
"type": "function_call_output"
})
def push_custom(self, object):
self.cb.enqueue(object)
def clear(self):
self.cb = CircularBuffer(self.capacity)
def to_list(self):
return self.cb.to_list()
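With the default capacity of 19, eviction rarely shows up in a short demo, so here is the same effect with a capacity of 2 (chosen only for illustration):

cc = CircularContext(capacity=2)
cc.push_easy_input_message("first")
cc.push_easy_input_message("second")
cc.push_easy_input_message("third")            # evicts "first"
print([m["content"] for m in cc.to_list()])    # ['second', 'third']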
Usage
Getting Started
Make sure that:
- the terminal is in the proper directory,
- the Python virtual environment is activated,
- the proper code is in the circularcontext.py file.
Start an IPython session.
export OPENAI_API_KEY="your api key..."
ipython
Load the code.
In [1]: %load circularcontext.py
Sanity check the OpenAI API key.
In [3]: OPENAI_API_KEY
Out[3]: 'your api key...'
Sanity check an empty request.
In [4]: openai_prepare("gpt-4.1", [], [])
Out[4]: {'model': 'gpt-4.1', 'input': [], 'tools': []}
Sending an Easy Input Message
In [5]: cc = CircularContext()
In [6]: cc.push_easy_input_message("Hi!")
Sanity check a message.
In [7]: cc.to_list()
Out[7]: [{'content': 'Hi!', 'role': 'user', 'type': 'message'}]
In [8]: r = predict(context=cc.to_list())
In [9]: r
Out[9]:
[{'id': (omitted),
  'type': 'message',
  'status': 'completed',
  'content': [{'type': 'output_text',
    'annotations': [],
    'logprobs': [],
    'text': 'Hello! How can I help you today?'}],
  'role': 'assistant'}]
Note that the result is a list of output message objects.
Merging Context
In [10]: for x in r:
    ...:     cc.push_custom(x)
In [11]: cc.push_easy_input_message("Say hi again.")
Sanity check.
In [12]: cc.to_list()
Out[12]:
[{'content': 'Hi!', 'role': 'user', 'type': 'message'},
{'id': (omitted),
'type': 'message',
'status': 'completed',
'content': [{'type': 'output_text',
'annotations': [],
'logprobs': [],
'text': 'Hello! How can I help you today?'}],
'role': 'assistant'},
{'content': 'Say hi again.', 'role': 'user', 'type': 'message'}]
In [13]: r = predict(context=cc.to_list())
In [14]: r
Out[14]:
[{'id': (omitted),
'type': 'message',
'status': 'completed',
'content': [{'type': 'output_text',
'annotations': [],
'logprobs': [],
'text': 'Hi again!'}],
'role': 'assistant'}]