OpenAI Assistants API & Python SDK

The OpenAI Assistants API is a progression from the Chat Completions API, focusing on creating a conversational assistant-like interface & User Experience.

8 min readDec 8, 2023

--

Introduction

In this article I do a walk through of the most basic Assistant you can build using the Python SDK in a notebook.

All you will need to add to the notebook is your own OpenAI API key.

The aim of OpenAI with their assistant functionality is to create a SDK for makers to develop an assistant which is stateful and prompt-less.

The intention of OpenAI is to simplify the creation of a Virtual Assistant. Currently the Assistant has access to three types of tools; Functions, RAG & Code Interpreter.

I foresee a situation where OpenAI will add more tools to be available to the assistant.

Some Considerations

The OpenAI Assistant functionality is good for experimentation, exploration and serving as a short-term solution. The assistant can also be implemented as a tool, acting as an extension for a larger autonomous agent instance.

For a highly scaleable, inspectable and observable implementations a more decomposed approach is required.

I can imagine an organisation will not want to store and manage their conversation transcripts within the OpenAI environment.

An organisation also would want to move to a more LLM agnostic approach where the LLM becomes a utility.

The OpenAI Assistant is inherently stateless, which means you have to manage conversation state, tool definitions, retrieval documents, and code execution manually.

If I need to create these frameworks in any case, I would aspire to achieve LLM and framework independence.

Conversation history is managed by OpenAI, which is convenient as the maker no longer needs to manage the history size, summarising it, etc. Or, send the entire history each time. But, you will still be charged for the tokens of the entire conversation history with each Run.

This introduces a level of opaqueness to the token usage within the assistant framework which is misleading.

Unlike creating a completion in the Chat Completions API, creating a Run is an asynchronous operation.

To know when the Assistant has completed processing, you will need to poll the Run in a loop. This approach adds complexity and overhead. The one advantage is that the run has a number of status values which can be useful to manage the conversation and inform the user.

Source

Complete Working OpenAI Assistant Notebook

!pip install — upgrade openai
!pip show openai | grep Version

The Python SDK that support the Assistants API, requires OpenAI version > 1.2.3.

Version: 1.3.7

This is very you define the API key.

import json
import os

def show_json(obj):
display(json.loads(obj.model_dump_json()))

os.environ['OPENAI_API_KEY'] = str("Your OpenAI API Key goes here.")

Below, the agent is created…

# You can also create Assistants directly through the Assistants API

from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
name="History Tutor",
instructions="You are a personal history tutor. Answer questions briefly, in three sentence or less.",
model="gpt-4-1106-preview",
)
show_json(assistant)

With the JSON output. Once the agent is created, you will see the ID, model, assistant name and other details.

{'id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'created_at': 1702009585,
'description': None,
'file_ids': [],
'instructions': 'You are a personal history tutor. Answer questions briefly, in three sentence or less.',
'metadata': {},
'model': 'gpt-4-1106-preview',
'name': 'History Tutor',
'object': 'assistant',
'tools': []}

Once the Assistant is created crated, it is visible via the OpenAI Dashboard with the name, description and ID shown.

Regardless of whether you create your Assistant through the Dashboard or with the API, you’ll want to keep track of the Assistant ID.

First the thread is created.

# Creating a new thread:

thread = client.beta.threads.create()
show_json(thread)

Below is output, with the thread ID, etc.

{'id': 'thread_1flknQB4C8KH4BDYPWsyl0no',
'created_at': 1702009588,
'metadata': {},
'object': 'thread'}

Here a message is added to the thread.

# Now we add a message to the thread:

message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="What year was the USA founded?",
)
show_json(message)

With the result below. It is here where you need to note, that even if the conversation history is not sent each time, you are still charged for the tokens for the entire conversation history with each Run.

{'id': 'msg_5xOq4FV38cS98ohBpQPbpUiE',
'assistant_id': None,
'content': [{'text': {'annotations': [],
'value': 'What year was the USA founded?'},
'type': 'text'}],
'created_at': 1702009591,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'user',
'run_id': None,
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'}

When run is defined mentioned earlier, you must specify both the Assistant and the Thread.

run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
show_json(run)

And again the output:

{'id': 'run_PnwSECkqDDdjWkQ5P7Hcyfor',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'cancelled_at': None,
'completed_at': None,
'created_at': 1702009598,
'expires_at': 1702010198,
'failed_at': None,
'file_ids': [],
'instructions': 'You are a personal history tutor. Answer questions briefly, in three sentence or less.',
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': None,
'started_at': None,
'status': 'queued',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no',
'tools': []}

Unlike the completion in the Chat Completions API, creating a Run is an asynchronous operation. It will return immediately with the run metadata, which includes a status that will initially be set to queued. The status will be updated as the Assistant performs operations.

The loop below checks the run status in a while loop until the run status reaches a complete status.

import time

def wait_on_run(run, thread):
while run.status == "queued" or run.status == "in_progress":
run = client.beta.threads.runs.retrieve(
thread_id=thread.id,
run_id=run.id,
)
time.sleep(0.5)
return run

run = wait_on_run(run, thread)
show_json(run)

Below the run result.

{'id': 'run_PnwSECkqDDdjWkQ5P7Hcyfor',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'cancelled_at': None,
'completed_at': 1702009605,
'created_at': 1702009598,
'expires_at': None,
'failed_at': None,
'file_ids': [],
'instructions': 'You are a personal history tutor. Answer questions briefly, in three sentence or less.',
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': None,
'started_at': 1702009598,
'status': 'completed',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no',
'tools': []}

Once the run is completed, we can list all the messages in the thread.

# Now that the Run has completed, list the Messages in the Thread to 
# see what got added by the Assistant.

messages = client.beta.threads.messages.list(thread_id=thread.id)
show_json(messages)

Again the output below…

{'data': [{'id': 'msg_WhzkHcPnszsmbdrn0H5Ugl7I',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'content': [{'text': {'annotations': [],
'value': 'The United States of America was founded in 1776, with the adoption of the Declaration of Independence on July 4th of that year.'},
'type': 'text'}],
'created_at': 1702009604,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'assistant',
'run_id': 'run_PnwSECkqDDdjWkQ5P7Hcyfor',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'},
{'id': 'msg_5xOq4FV38cS98ohBpQPbpUiE',
'assistant_id': None,
'content': [{'text': {'annotations': [],
'value': 'What year was the USA founded?'},
'type': 'text'}],
'created_at': 1702009591,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'user',
'run_id': None,
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'}],
'object': 'list',
'first_id': 'msg_WhzkHcPnszsmbdrn0H5Ugl7I',
'last_id': 'msg_5xOq4FV38cS98ohBpQPbpUiE',
'has_more': False}

A message is appended to the thread…

# Create a message to append to our thread
message = client.beta.threads.messages.create(
thread_id=thread.id, role="user", content="Could you give me a little more detail on this?"
)

# Execute our run
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)

# Wait for completion
wait_on_run(run, thread)

# Retrieve all the messages added after our last user message
messages = client.beta.threads.messages.list(
thread_id=thread.id, order="asc", after=message.id
)
show_json(messages)

With the result, consider the content value…

{'data': [{'id': 'msg_oIOfuARjk20zZRn6lAytf0Hz',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'content': [{'text': {'annotations': [],
'value': 'Certainly! The founding of the USA is marked by the Declaration of Independence, which was ratified by the Continental Congress on July 4, 1776. This act declared the thirteen American colonies free and independent states, breaking away from British rule.'},
'type': 'text'}],
'created_at': 1702009645,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'assistant',
'run_id': 'run_9dWR1QFrN983q1AG1cjcQ9Le',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'}],
'object': 'list',
'first_id': 'msg_oIOfuARjk20zZRn6lAytf0Hz',
'last_id': 'msg_oIOfuARjk20zZRn6lAytf0Hz',
'has_more': False}

When the run has completed, the messages can be listed in the thread.

# Now that the Run has completed, list the Messages in the Thread to see 
# what got added by the Assistant.

messages = client.beta.threads.messages.list(thread_id=thread.id)
show_json(messages)

And again the result.

{'data': [{'id': 'msg_oIOfuARjk20zZRn6lAytf0Hz',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'content': [{'text': {'annotations': [],
'value': 'Certainly! The founding of the USA is marked by the Declaration of Independence, which was ratified by the Continental Congress on July 4, 1776. This act declared the thirteen American colonies free and independent states, breaking away from British rule.'},
'type': 'text'}],
'created_at': 1702009645,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'assistant',
'run_id': 'run_9dWR1QFrN983q1AG1cjcQ9Le',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'},
{'id': 'msg_dDeGGSj4w3CIVRd5hsQpGHmF',
'assistant_id': None,
'content': [{'text': {'annotations': [],
'value': 'Could you give me a little more detail on this?'},
'type': 'text'}],
'created_at': 1702009643,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'user',
'run_id': None,
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'},
{'id': 'msg_WhzkHcPnszsmbdrn0H5Ugl7I',
'assistant_id': 'asst_qlaTYRSyl9EWeftjKSskdaco',
'content': [{'text': {'annotations': [],
'value': 'The United States of America was founded in 1776, with the adoption of the Declaration of Independence on July 4th of that year.'},
'type': 'text'}],
'created_at': 1702009604,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'assistant',
'run_id': 'run_PnwSECkqDDdjWkQ5P7Hcyfor',
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'},
{'id': 'msg_5xOq4FV38cS98ohBpQPbpUiE',
'assistant_id': None,
'content': [{'text': {'annotations': [],
'value': 'What year was the USA founded?'},
'type': 'text'}],
'created_at': 1702009591,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'user',
'run_id': None,
'thread_id': 'thread_1flknQB4C8KH4BDYPWsyl0no'}],
'object': 'list',
'first_id': 'msg_oIOfuARjk20zZRn6lAytf0Hz',
'last_id': 'msg_5xOq4FV38cS98ohBpQPbpUiE',
'has_more': False}

⭐️ Follow me on LinkedIn for updates on Large Language Models ⭐️

I’m currently the Chief Evangelist @ Kore AI. I explore & write about all things at the intersection of AI & language; ranging from LLMs, Chatbots, Voicebots, Development Frameworks, Data-Centric latent spaces & more.

LinkedIn

--

--

Cobus Greyling
Cobus Greyling

Written by Cobus Greyling

I’m passionate about exploring the intersection of AI & language. www.cobusgreyling.com

Responses (1)