Conversational Planning For Conversational UIs — Research from Google DeepMind

The focus is shifting over time to different elements of the AI Agent architecture.

Cobus Greyling
6 min read · Mar 18, 2025


The focus of this study is on memory and conversation state management…

Conversation state has traditionally been managed via a rigid graph approach. Alternatively, conversation state management can be handed to the AI Agent itself, which iterates until, in the Observation step, it determines that it has completed its tasks.

The proposed hierarchy from Google DeepMind sits somewhere in-between.

They propose a language-based hierarchical AI Agent that is able to assist users across their real-life journeys towards goals and tasks that can last across sessions and evolve over long periods of time.

The framework is general and aims to behave in a similar fashion as a human when asked to assist with a plan for a real-life goal.

The researchers demonstrate qualitatively the effectiveness of the framework for conversational coaching and conversational tutoring, noting success across other domains as well.

This approach makes a step toward conversational AI Agents that are not anchored in short-term interactions but can be true companions in supporting user goals over time.


Summary of the Process

The process is a cyclical, interactive loop involving:

  1. User input to define a goal.
  2. The meta-controller deciding high-level actions (for example, add or alter steps).
  3. A sub-policy breaking those actions into specific steps.
  4. A low-level policy providing relevant resources.
  5. User feedback prompting further adjustments, with the cycle repeating as needed.

Practical Example

In this example, the system starts with a broad CrossFit goal, creates an initial plan, and adapts it based on the user’s desire to focus on cardiovascular health, demonstrating its flexibility and responsiveness in conversational coaching.

Step 1: User Initiates the Interaction

What Happens

The process begins when the user expresses a personal health goal to the AI Agent. In this example, the user says, "I want to do CrossFit," indicating their interest in pursuing a fitness-related journey.

Key Component: The user’s input is in natural language, setting the stage for a conversational interaction with the AI Agent.

Step 2: Meta-Controller Decides on a Macro-Action

What Happens

The hierarchical AI Agent, specifically its meta-controller (powered by a chain-of-thought (CoT)-prompted LLM), analyses the user’s input and decides on an initial macro-action.

Here, it chooses add-steps to begin creating a plan for the user’s CrossFit goal.

Key Component: The meta-controller acts as the high-level decision-maker, determining the overarching strategy for the plan.
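To make this concrete, here is one way the meta-controller's CoT prompt could be assembled. The prompt wording and the macro-action vocabulary are assumptions for illustration; the paper does not publish this exact template.

```python
def build_meta_prompt(user_input: str, current_steps: list[str]) -> str:
    """Build a chain-of-thought prompt asking the LLM to pick one macro-action."""
    steps = "\n".join(f"- {s}" for s in current_steps) or "(none)"
    return (
        "You manage a long-term plan for the user.\n"
        f"User said: {user_input}\n"
        f"Current plan steps:\n{steps}\n"
        "Think step by step, then choose ONE macro-action: "
        "add-steps, alter-step, remove-step, or ask-question."
    )

prompt = build_meta_prompt("I want to do CrossFit", [])
```

With an empty plan, a well-prompted model would be expected to select add-steps, as in the example.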

Step 3: Sub-Policy Generates Specific Steps

What Happens

A sub-policy, also powered by a CoT-prompted LLM, takes the meta-controller’s add-steps macro-action and translates it into a detailed plan. In this case, it generates three specific steps:

  1. Learn the basics of CrossFit
  2. Assess current fitness level
  3. Set realistic goals

Key Component: The sub-policy refines the macro-action into actionable, concrete steps tailored to the user’s goal.
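A sub-policy must turn free-text LLM output into structured plan steps. A minimal sketch, assuming the model emits a numbered list (the output format is an assumption, not specified by the paper):

```python
import re

def parse_steps(llm_output: str) -> list[str]:
    """Extract plan steps from a numbered list in the LLM's response."""
    return [m.group(1).strip()
            for m in re.finditer(r"^\s*\d+\.\s*(.+)$", llm_output, re.MULTILINE)]

output = """1. Learn the basics of CrossFit
2. Assess current fitness level
3. Set realistic goals"""
steps = parse_steps(output)
```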

Step 4: Low-Level Policy Fetches and Ranks Content

What Happens

A tool-enhanced low-level policy executes the plan by retrieving relevant content (e.g., articles, videos, or guides) for each of the three steps.

It ranks this content based on relevance or usefulness, providing the user with resources to explore for each step.

Key Component: This step leverages external tools and the LLM’s capabilities to enrich the plan with practical information.
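As a stand-in for the tool-enhanced retrieval and ranking, the sketch below ranks candidate documents by keyword overlap with a step. A real system would call a search API and an LLM ranker; this scoring function is purely illustrative.

```python
def rank_content(step: str, candidates: list[str]) -> list[str]:
    """Rank candidate documents by word overlap with the plan step."""
    step_words = set(step.lower().split())
    def score(doc: str) -> int:
        return len(step_words & set(doc.lower().split()))
    return sorted(candidates, key=score, reverse=True)

docs = ["A beginner guide to CrossFit basics",
        "Marathon nutrition tips",
        "Learn the basics of CrossFit training"]
ranked = rank_content("Learn the basics of CrossFit", docs)
```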

Step 5: User Provides Feedback

What Happens

The AI Agent engages the user by asking follow-up questions generated by the sub-policy (for example, "What are your fitness goals?") or by allowing free-form feedback.

In this example, the user responds to the question with, "I would like to improve my cardiovascular health."

Key Component: User interaction enables the system to adapt the plan dynamically based on new information.

Step 6: Meta-Controller Adjusts the Plan

What Happens

Based on the user's feedback ("improve my cardiovascular health"), the meta-controller evaluates the current plan and decides on a new macro-action: alter-step, targeting the step "Set realistic goals".

Key Component: The meta-controller uses the feedback to refine the plan, ensuring it aligns with the user’s updated preferences or goals.

Step 7: Sub-Policy Executes the Alteration

What Happens

The sub-policy responsible for the alter-step macro-action modifies the "Set realistic goals" step to reflect the user's focus on cardiovascular health. For example, it might adjust the step to "Set realistic goals for improving cardiovascular endurance".

Key Component: The sub-policy ensures the macro-action is implemented accurately at the plan level.
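An alter-step sub-policy can be sketched as a targeted rewrite of one plan step; here a simple string rewrite stands in for the LLM call that would normally rephrase the step:

```python
def alter_step(steps: list[str], target: str, feedback: str) -> list[str]:
    """Rewrite only the targeted step in light of user feedback."""
    return [f"{s}, focused on {feedback}" if s == target else s for s in steps]

plan = ["Learn the basics of CrossFit",
        "Assess current fitness level",
        "Set realistic goals"]
updated = alter_step(plan, "Set realistic goals", "cardiovascular health")
```

Note that the other steps pass through unchanged, which keeps the plan stable across edits.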

Step 8: Low-Level Policy Updates Content

What Happens

The low-level policy updates its search keywords based on the altered step (for example, incorporating "cardiovascular health" or "endurance").

It then fetches and ranks new or revised content to support the updated step, ensuring the resources remain relevant to the user’s refined goal.

Key Component: This step keeps the supporting materials aligned with the evolving plan.
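The keyword refresh can be sketched as deriving search terms from the altered step text. The stopword list and tokenisation are assumptions; a production system would likely let the LLM propose queries directly.

```python
def search_keywords(step: str,
                    stopwords: tuple[str, ...] = ("set", "the", "of", "for", "on")) -> list[str]:
    """Derive search terms from a plan step by dropping common stopwords."""
    return [w for w in step.lower().replace(",", "").split() if w not in stopwords]

kws = search_keywords("Set realistic goals for improving cardiovascular endurance")
```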

An LLM acts as the meta-controller, deciding the agent's next macro-action, while tool-use-augmented, LLM-based option policies carry those actions out.

In Conclusion

Dialog State Management (DSM) is a critical component of AI agents, requiring a careful balance between structured representation and conversational flexibility.

Effective AI Agents must track conversation history, current context, and user intents while adapting to unexpected inputs or shifts in direction (digression).

Research, such as that from Google DeepMind, underscores how structured state tracking enables coherent, goal-oriented interactions by supporting intent recognition and recovery from misunderstandings.

Without sufficient structure, AI Agents risk losing focus or forgetting key details.

Overly rigid frameworks can render conversations stiff and unnatural.

The optimal approach integrates structured DSM — capturing essential context and goals — with flexible mechanisms to handle ambiguity and conversational digressions, ensuring the agent maintains the dialogue’s thread while aligning with the natural flow of human communication.

As conversational UIs grow in prominence, striking this balance will be vital for developing agents that feel both competent and intuitive to engage with.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.
