Interactive Cooperative Planning & Execution Between Humans & AI Agents

6 min read · Jan 10, 2025

When using large language models (LLMs) through chat interfaces, users can dynamically plan and execute tasks by sending prompts, receiving responses, and evaluating the model’s generated content. In contrast, planning-enriched workflows, where the LLM first breaks down a high-level task into smaller subtasks and presents them to the user before execution, have been shown to enhance usability and transparency.

Unlike chat interfaces that rely on spontaneous interactions, some systems allow users to manually create detailed plans, which are then used to guide the LLM’s actions.

Chat interfaces, however, may not be ideally suited for such structured workflows. As a result, alternative interaction paradigms are being developed to better handle complex, multi-step, and long-running tasks, moving beyond the traditional chat-based approach.

Structure

Chat and conversational UIs are unstructured, allowing humans to interact in a free-form conversational fashion, with context embedded in the dialogue.

However, there are environments where users want to create structure with detailed plans; here an agentic layer is required for defining detailed, multi-step, long-running tasks.

Agentic Workflows

Agentic workflows, which emphasise the level of agency shared between human users and AI Agents, play a crucial role in interactive systems, particularly in document environments.

These environments support co-planning and co-execution, allowing users and AI Agents to collaboratively plan and execute tasks within a shared framework.

By directly representing tasks, roles and progress in the document, these systems enable seamless collaboration on complex, multi-step tasks.

This approach harmonises human and AI efforts, facilitating a partnership where users and AI Agents jointly compose plans and execute steps, enhancing both productivity and the overall interaction experience.

Considering the image above, once the execution of the entire plan is complete, the agent inserts a plan output panel in the document with a modified version of the last step’s output.

This allows the user to view the plan’s results within the context of their document. The panel can be expanded or collapsed, and remains in the document for easy access even if the plan itself is collapsed.
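The shared representation of tasks, roles and progress described above can be sketched as a simple data structure. This is a hypothetical illustration of the idea, not Cocoa’s actual implementation; the names `Plan`, `PlanStep` and the status values are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class PlanStep:
    description: str
    assignee: str            # "agent" or "user" — roles live in the plan itself
    status: str = "pending"  # "pending" | "running" | "done"
    output: str = ""         # step output shown inline in the document

@dataclass
class Plan:
    goal: str
    steps: list = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of completed steps, displayable alongside the plan."""
        if not self.steps:
            return 0.0
        done = sum(1 for s in self.steps if s.status == "done")
        return done / len(self.steps)

plan = Plan(goal="Summarise methods to elicit human feedback")
plan.steps = [
    PlanStep("Search for relevant papers", assignee="agent"),
    PlanStep("Extract elicitation methods", assignee="agent"),
    PlanStep("Review and annotate the summary", assignee="user"),
]
plan.steps[0].status = "done"
print(round(plan.progress(), 2))  # → 0.33
```

Because the plan is plain data living in the document, both the user and the agent can read and edit the same tasks, roles and progress, which is the crux of the shared-representation idea.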

Our Planning / Construction Site

Documents are the construction sites of modern work: first we spend much of our time searching for documents, and then we spend much of our time planning and sequencing information.

The study states that much of our planning takes place in a document, and hence it makes sense to incorporate some level of agency into the site where we spend most of our time plotting, planning and defining our sequence of actions.

It was discovered that real-world research project documents, where researchers informally record ideas, open questions and meeting notes, contain valuable insights into their preliminary reasoning. These documents serve as ideal environments for interacting with an AI Agent to further develop and explore research ideas.

AI Agents

Core components of LLM-based AI Agent architectures typically include memory, reasoning, planning, and tool use.

Central to their operation is multi-step reasoning, often implemented through chain-of-thought (CoT) processes, which provide a series of intermediate reasoning steps to enhance performance on complex tasks.

CoT is essential for task decomposition, supporting the planning capabilities of LLM agents.
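As a rough sketch of how CoT-style task decomposition might look in code: the prompt wording and the `call_llm` stand-in below are assumptions for illustration, not any specific model API.

```python
def decompose(task: str, call_llm) -> list[str]:
    """Ask the model to think step by step and return a list of subtasks.

    `call_llm` is a stand-in for whatever model client is in use.
    """
    prompt = (
        "Think step by step and break the following task into "
        "numbered subtasks, one per line:\n"
        f"Task: {task}"
    )
    reply = call_llm(prompt)
    steps = []
    for line in reply.splitlines():
        line = line.strip()
        if line and line[0].isdigit():
            # Drop the leading "1." / "2)" numbering.
            steps.append(line.lstrip("0123456789.) ").strip())
    return steps

# Example with a stubbed model response:
def fake_llm(prompt: str) -> str:
    return (
        "1. Search the literature\n"
        "2. Cluster the findings\n"
        "3. Draft a summary"
    )

print(decompose("Summarise methods to elicit human feedback", fake_llm))
```

The list of subtasks returned here is exactly what a planning-enriched workflow would present to the user for review before any step is executed.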

Recent developments have emphasised the importance of incorporating user feedback interactively to further improve the planning capabilities of these agents and extend their overall effectiveness.

Too Much Autonomy?

While creating fully autonomous AI Agents may be compelling in theory, it has notable limitations in practice.

First, we cannot steer the agent with our expertise and worldly understanding.

Two salient opportunities for agent steering in task-completion workflows, which the paper focuses on, are planning and execution.

AI Agents’ abilities to generate executable plans of action have been shown to be unreliable across many domains.

Second, and perhaps more significantly, prioritising AI Agency over human agency can increase safety risks, diminish our ability to think critically and creatively, and negatively impact overall well-being.

For many nuanced and subjective tasks, human input must be considered for AI-powered systems to be successful and aligned with users’ personal needs and goals.

Cocoa is described as an interactive system that facilitates co-planning and co-execution with AI Agents in a document environment for scientific researchers.

Cocoa integrates AI Agents into documents using a novel interaction design pattern through which a human user and an AI agent can jointly plan and execute plan steps using a shared representation of tasks, roles, and progress directly in the document.

The image above gives an overview of the Cocoa UI, highlighting its interactive planning feature, which enables collaborative co-planning and co-execution between a researcher and an AI Agent.

Both can edit the plan directly within the document and execute the steps, similar to running code cells in a computational notebook. Tasks can be assigned either to the AI Agent (B) or the researcher (C), allowing the researcher to modify the AI’s outputs (E) to provide direction through feedback and expertise.

In this example, the initial three steps of a plan for summarising methods to elicit human feedback have been completed, and the AI is now seeking user input for the next step.
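The cell-by-cell execution pattern described above can be sketched as a loop in which agent-assigned steps run automatically while user-assigned steps pause for input. The function names and step format below are illustrative assumptions, not Cocoa’s actual code.

```python
def execute(steps, run_agent_step, ask_user):
    """Run plan steps in order, handing control to the right party.

    `run_agent_step` and `ask_user` are stubs for the agent's tool use
    and the document UI's input prompt, respectively.
    """
    outputs = []
    for step in steps:
        if step["assignee"] == "agent":
            outputs.append(run_agent_step(step["description"]))
        else:
            # Hand control back to the researcher before continuing.
            outputs.append(ask_user(step["description"]))
    return outputs

steps = [
    {"description": "Find papers on human feedback", "assignee": "agent"},
    {"description": "Pick the three most relevant", "assignee": "user"},
]
result = execute(
    steps,
    run_agent_step=lambda d: f"[agent] {d}: done",
    ask_user=lambda d: f"[user] {d}: awaiting input",
)
print(result)
```

Pausing on user-assigned steps is what lets the agent “seek user input for the next step” rather than running the whole plan autonomously.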

In Conclusion

Cocoa is an interactive agentic editing environment designed to assist users in addressing open questions and tasks within their projects in collaboration with an AI Agent.

Cocoa offers a novel interaction model that facilitates co-planning and co-execution between users and AI.

The findings demonstrated that Cocoa’s unique features allowed researchers to more effectively guide the AI without increasing effort, enhancing ease of use.

The study also identified specific scenarios where researchers preferred interactive plans over chat.

Based on these results, the paper discusses practical implications for designing AI systems that blend interactive plans and chat functionality.

The intent of this study is to pave the way for new interaction paradigms that promote effective collaboration between humans and AI Agents.

There might be significant potential to enhance digital experiences and advance long-standing objectives within the HCI and AI communities, while also highlighting the need for refined strategies to ensure effective human-AI collaboration in practical settings.

Written by Cobus Greyling

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future. www.cobusgreyling.com