To this end, I have shown the circular context, the toolkit, and the evaluator. Now is the time to use these three components.
In this chapter, I present infer, a function that ties together the circular context, the toolkit, and the evaluator. I do not automate everything: I still manage the messages manually, but I introduce infer to interact with the three components. I chose the name infer for two reasons. One, it evokes inference. Two, it abbreviates inferrer, just as eval abbreviates evaluator.
This chapter has three sections:
- What is Needed to Make an LLM Agent? – self-explanatory,
- Basic Implementation – the infer script in Python,
- Interactive Use – an ipython session to show usage.
Here are my two goals for this chapter:
- define and implement the infer function, and
- illustrate its use for an LLM Agent.
Note that I will not make an LLM Agent in this chapter.
What is Needed to Make an LLM Agent?
What is an LLM?
Loosely speaking, an LLM takes a text sequence as input and returns another text sequence that completes it. LLMs are trained to model the probability distribution of a text given some prior text. In other words, an LLM can be thought of as a predict function that takes text of size N and returns text of size N + k, where N + k is at most the LLM context size M, which is fixed by the API model.
It is also worth noting that an LLM API call is non-deterministic: the same input may not return the same output, due to output sampling and parallelism.
LLM context size M
Index 0 1 2 3 4 5 6 7 8 9 ... N ... N+k   (N + k < M)
Value ? ? ? ? ? ? ? ? ? ? ... ? ... ?
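The predict view above can be put in code. This is an illustrative sketch only: the names M and predict, and the stubbed completion, are my assumptions, not part of any real API.

```python
# Illustrative sketch: an LLM viewed as a predict function.
# M and predict are made-up names for illustration, not real API names.

M = 128_000  # example context size (value is made up)

def predict(text: str) -> str:
    """Take text of size N, return text of size N + k, with N + k <= M."""
    assert len(text) <= M, "input exceeds the context size"
    return text + " ...completion"  # stubbed completion

out = predict("Hello")
# out extends the input: "Hello ...completion"
```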
What is an LLM Agent?
“An LLM agent runs tools in a loop to achieve a goal”. Reference.
Tools are external capabilities the LLM can use beyond generating text. Recall that the LLM API accepts a tools field in the JSON input data. When you provide tools to the API, the server constructs a special system prompt designed to instruct the model to use the specified tool(s).
For example, here is the Claude API example prompt that is constructed for tool use.
In this environment you have access to a set of tools you can use to answer the
user's question.
{{ FORMATTING INSTRUCTIONS }}
String and scalar parameters should be specified as is, while lists and objects
should use JSON format. Note that spaces for string values are not stripped.
The output is not expected to be valid XML and is parsed with regular
expressions.
Here are the functions available in JSONSchema format:
{{ TOOL DEFINITIONS IN JSON SCHEMA }}
{{ USER SYSTEM PROMPT }}
{{ TOOL CONFIGURATION }}
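To make the tools field concrete, here is a sketch of what it might contain, in the OpenAI-style function-tool JSON Schema shape. The schema for py_runsource_exec is my assumption, based on the tool described in this chapter.

```python
import json

# A sketch of the "tools" field sent to the LLM API.
# The py_runsource_exec schema is assumed, not taken from the source.
tools = [{
    "type": "function",
    "name": "py_runsource_exec",
    "description": "Run Python source code in the evaluator.",
    "parameters": {
        "type": "object",
        "properties": {
            "code": {"type": "string",
                     "description": "Python source to execute"},
        },
        "required": ["code"],
    },
}]

print(json.dumps(tools, indent=2))
```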
The loop usually looks like:
- Observe – look at the current state, user requests, or previous results,
- Think – reason about what to do next,
- Act – call a tool,
- Observe,
- Repeat – until a goal is reached.
I will not implement an LLM Agent in this chapter, as I will not implement this loop. Instead, I will do things manually in an ipython session.
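Although the loop is not implemented in this chapter, the steps above can be sketched as a runnable stub. The step function below is a stand-in for a real inference step; it is not the infer function defined later.

```python
# A runnable sketch of the observe-think-act loop.
# step() is a stand-in: a real agent would call the LLM API here.

def step(context):
    """Pretend one inference step that produces an answer."""
    return context + ["answer"]

def agent_loop(context, max_steps=5):
    for _ in range(max_steps):        # Repeat ...
        context = step(context)       # Think + Act
        if context[-1] == "answer":   # Observe: is the goal reached?
            break                     # ... until the goal is reached
    return context

print(agent_loop(["task"]))  # ['task', 'answer']
```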
Note that I define only one tool for the LLM API to use: py_runsource_exec.
Building blocks
The Infer Function
The essential part of the LLM Agent will be a function that I call infer.
Function infer takes a context, a toolkit, and an evaluator as inputs. It returns a new context as the output.
infer: context toolkit evaluator -> context
The infer function:
- interacts with the LLM API,
- performs tool calls,
- interacts with the evaluator, and
- creates a new output context.
Context
A context is a list (data structure) that contains input / output objects the LLM API can understand. The objects are:
- EasyInputMessage,
- ResponseOutputMessage,
- FunctionCallOutput, and
- ResponseFunctionToolCall.
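As an illustration, a context holding three of these object shapes might look like the following plain list of dicts. The field values are made up; the exact fields come from the API's message types.

```python
# An illustrative context: a list of dicts in the shapes listed above.
# All field values here are invented for illustration.
context = [
    # EasyInputMessage: the user's request
    {"type": "message", "role": "user",
     "content": "Use Python to compute 2 + 2."},
    # ResponseFunctionToolCall: the model asks to run a tool
    {"type": "function_call", "name": "py_runsource_exec",
     "arguments": '{"code": "print(2 + 2)"}', "call_id": "call_0"},
    # FunctionCallOutput: the tool result, fed back to the model
    {"type": "function_call_output", "call_id": "call_0", "output": "4\n"},
]

assert all("type" in item for item in context)
```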
Toolkit
A toolkit is a data structure that associates tools with programs of the evaluator. It does not perform a tool call.
The only tool defined is the tool that interacts with the evaluator.
Evaluator
An evaluator is a program that takes a program as input and returns a value as output. It has an implicit environment that associates variables with values. It does not perform the tool call itself; the inferrer does. The evaluator, however, provides the functionality behind the tool call.
LLM API <--> infer <--> evaluator <--> Computer System
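The evaluator idea can be sketched in a few lines. This MiniEvaluator is my simplification for illustration, not the Evaluator class used in the script below.

```python
import io
import contextlib

# A minimal evaluator sketch (my simplification, not the real Evaluator).
# It runs source text and returns the printed output; the env dict is
# the implicit environment that persists between calls.
class MiniEvaluator:
    def __init__(self):
        self.env = {}  # implicit environment: variable -> value

    def runsource_exec(self, source: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(source, self.env)  # run in the persistent environment
        return buf.getvalue()

ev = MiniEvaluator()
ev.runsource_exec("x = 21")             # defines x in the environment
out = ev.runsource_exec("print(x * 2)")
# out == "42\n": the second call sees x from the first
```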
Basic Implementation of the Infer Function
Setup
mkdir infer && cd infer
python -m venv venv
source venv/bin/activate
pip install requests ipython
I will write all code into a single file named inferscript.py.
The Infer Script
import json
import toolkit
import evaluator
import circularcontext as cc
from pprint import pprint

# Instantiate the three components.
tk = toolkit.Toolkit()
e = evaluator.Evaluator()
c = cc.CircularContext()

# Register the Python REPL tool(s) in the toolkit.
toolkit.add_py_repl_tools(tk)
def make_functioncalloutput(call_id, content):
    if content == "":
        content = "[Function tool call returned empty result.]"
    return {
        "call_id": call_id,
        "output": content,
        "type": "function_call_output"
    }

def make_functioncalloutput_denied(call_id):
    return {
        "call_id": call_id,
        "output": "[Function tool call permission denied.]",
        "type": "function_call_output"
    }
def ask_for_permission(toolcall):
    print("Permission needed to use tool call.")
    pprint(toolcall)
    while True:
        answer = input("Grant permission? (y/n): ").strip().lower()
        if answer in ('y', 'yes'):
            print("Proceeding...")
            return True
        elif answer in ('n', 'no'):
            print("Aborting function tool call.")
            return False
        else:
            print("Please enter 'y' or 'n'.")
def infer_iter(context, toolkit, evaluator):
    # Ask the LLM API for new outputs, given the context and the tools.
    outputs = cc.predict(context=context, tools=toolkit.tools())
    r = outputs.copy()
    for o in outputs:
        if o['type'] == "function_call":
            if toolkit.match(o['name']) == "py_runsource_exec":
                if ask_for_permission(o):
                    args = json.loads(o['arguments'])
                    # Use the evaluator argument, not the global instance.
                    result = evaluator.runsource_exec(args['code'])
                    r.append(make_functioncalloutput(o['call_id'], result))
                else:
                    r.append(make_functioncalloutput_denied(o['call_id']))
    return r
Interactive Use
Getting Started
Make sure that:
- the terminal is in the proper directory,
- the Python virtual environment is activated,
- files evaluator.py, replscript.py, toolkit.py, and circularcontext.py are in the proper directory, and
- the proper code is in the inferscript.py file.
Start an IPython session.
export OPENAI_API_KEY="your api key..."
ipython
Load the code.
In [1]: %load inferscript.py
Push the task for the LLM.
In [3]: c.push_easy_input_message("Use Python to calculate the gravitational
⋮ acceleration at the ISS")
Create the first response.
In [4]: r1 = infer_iter(c.to_list(), tk, e)
Permission needed to use tool call.
{'arguments': '{"code":"# Gravitational acceleration formula: g = G * M / '
'r^2\\n# G is the gravitational constant, M is mass of Earth, r '
"is distance from Earth's center to ISS\\nG = 6.67430e-11 # m^3 "
'kg^-1 s^-2\\nM = 5.972e24 # kg\\nR_earth = 6371e3 # '
"Earth's radius in meters\\nh_ISS = 408e3 # ISS altitude "
"above Earth's surface in meters\\nr = R_earth + "
'h_ISS\\n\\ng_iss = G * M / r**2\\nprint(f\\"Gravitational '
'acceleration at ISS altitude: {g_iss:.2f} m/s^2\\")"}',
'call_id': (omitted),
'id': (omitted),
'name': 'py_runsource_exec',
'status': 'completed',
'type': 'function_call'}
Grant permission? (y/n): y
Proceeding...
Push the returned context to the CircularContext instance in variable c.
In [5]: for r in r1:
...: c.push_custom(r)
...:
Let the LLM know about the function call output value.
In [6]: r2 = infer_iter(c.to_list(), tk, e)
No permission request was needed. This means that no function tool call was used. Let us see the response.
In [7]: r2
Out[7]:
[{'id': (omitted),
'type': 'message',
'status': 'completed',
'content': [{'type': 'output_text',
'annotations': [],
'logprobs': [],
'text': "The gravitational acceleration at the altitude of the
International Space Station (ISS) is approximately 8.67 m/s². This is only
slightly less than the acceleration due to gravity at Earth's surface
(about 9.81 m/s²)."}],
'role': 'assistant'}]
That is the solution to the task. The LLM used the evaluator tool to solve the task. Note that this is not a proper LLM Agent, as the looping part is missing. The reason is that I have not come up with a user interface to control the LLM Agent.