All Software As AI Agents

Bridging the Gap of Fitting AI Agents into Our Human-Centric Digital World

In Short

AI Agents face a significant challenge: integrating seamlessly into our existing digital ecosystem. Their effectiveness depends on connecting to real-world applications, such as weather forecasts and traffic updates, through tools that act as a bridge to these systems.

One common approach to this integration is through APIs, which allow AI Agents to tap into external data sources like a weather API or a traffic API. This method, however, requires standardised interfaces and is limited by the availability and specificity of those APIs.
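As a sketch of this API route, here is a minimal tool registry in Python. The registry, the `get_weather` tool, and its stubbed response are all hypothetical names invented for illustration; a real agent would call an actual weather API over HTTP.

```python
from typing import Callable

# Hypothetical tool registry: each tool is a named function the agent may call.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Register a function as a callable tool for the agent."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("get_weather")
def get_weather(city: str) -> str:
    # In practice this would call a real weather API over HTTP;
    # stubbed here so the sketch stays self-contained.
    return f"Sunny, 22°C in {city}"

# The agent's planner resolves a user request to a tool call:
result = TOOLS["get_weather"](city="Cape Town")
print(result)  # Sunny, 22°C in Cape Town
```

The limitation described above shows up directly in this sketch: every new capability means writing and registering another integration function by hand.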

Another way to fit AI Agents into the digital world is by interacting with applications via the Graphical User Interface (GUI), mimicking human actions like clicking or typing, but this can be brittle and inefficient due to frequent UI changes and lack of direct data access.
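A minimal sketch of the GUI route, with simulated click and type actions standing in for a real screen-automation library such as pyautogui (which exposes similar `click()` and typing calls against the live screen). The coordinates and helper names here are invented, which also illustrates why this approach is brittle.

```python
# Simulated GUI actions recorded into a log; illustrative only.
actions: list[str] = []

def click(x: int, y: int) -> None:
    actions.append(f"click({x},{y})")

def type_text(text: str) -> None:
    actions.append(f"type({text!r})")

# Brittle by design: these coordinates break the moment the UI layout changes.
click(412, 230)            # focus the search box
type_text("weather today") # type the query
click(412, 270)            # press the "Search" button

print(actions)
```

Note that the agent never sees the application's data here, only pixels and coordinates, which is exactly the "lack of direct data access" problem.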

A third, more robust solution involves deep integration at the code level, where AI Agents interact directly with an application’s underlying logic, offering greater control and precision but demanding significant development effort and access to the application’s codebase.

Balancing these three methods — (1) APIs, (2) GUI interaction, and (3) code-level integration — presents a complex challenge, as each has trade-offs in scalability, reliability, and adaptability.

Status Quo

This is the problem a number of studies are currently trying to address: integrating AI Agents into our human-centric digital world. It is a formidable challenge due to the mismatch between AI Agent capabilities and our existing systems.

Humans rely on intuitive graphical interfaces and unpredictable workflows, while AI Agents thrive on structured data and direct access, creating a disconnect in how they interact with our tools.

Bridging this gap demands not just technical innovation — like finding ways to operate without full code access — but also a cultural shift in how we design software to accommodate intelligent, autonomous collaborators.

Ultimately, the success of this integration hinges on making AI Agents flexible enough to navigate our messy, human-shaped digital landscape without upending the way we already work.


Two Most Common Approaches

The emergence of Multimodal Large Language Models (LLMs) has sparked a new wave of interest in computer-interfacing AI Agents: systems capable of interpreting and executing user instructions expressed in natural language by interacting with an operating system.

Traditionally, AI agents have interacted with computers through two primary methods:

API-driven approaches, where agents communicate with software via predefined programming interfaces. Tools are defined, and code is developed to integrate with those APIs.

The impediment identified with this approach is that it is time-consuming: a separate integration is required for each API.

Secondly, not all commercial applications have APIs available, and the exposed API functionality often does not cover everything the GUI can do.

Subsequently we saw GUI-based approaches, as seen in tools like Anthropic's computer-use agents or OpenAI's Operator, where AI Agents navigate Graphical User Interfaces much like a human would.

However, this study seeks to illuminate a third, innovative pathway.

Both API and GUI methods have notable limitations in terms of accuracy and efficiency, prompting the researchers to propose a novel approach: equipping LLMs with direct access to a software's internal workings (its source code and runtime environment) along with the ability to dynamically generate and inject code for execution.

It is argued that this approach could revolutionise software agent design, paving the way for a digital ecosystem where software not only understands and executes tasks but also collaborates and reasons to address intricate user demands.

It is my sense that incumbents will be best placed to leverage this approach: Apple with iOS and macOS, Microsoft with Windows, and so on.

Just In Time Compilation

A new software-as-agent approach called Just-in-Time Code Generation (JiT-Codegen) is introduced in this paper.

Inspired by just-in-time compilation, JiT-Codegen allows an agent to create executable code that interacts directly with a software’s runtime context — such as functions, data structures, and UI elements — by accessing its source code.

Unlike prior studies on offline software exploration or runtime interaction, JiT-Codegen pioneers having a large language model (LLM) generate and execute code within the software itself. For instance, it can handle a complex task with five GUI interactions using just two lines of code.
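The paper's concrete five-interaction example is not reproduced here, but the contrast can be sketched with a hypothetical spreadsheet task: what would take roughly five GUI interactions collapses into two lines of injected code acting on the application's runtime state. The `sheet` data structure below is an invented stand-in for that state.

```python
# Hypothetical in-memory model of a spreadsheet application's runtime state.
sheet = {"A": [10, 20, 30], "B": [1, 2, 3]}

# GUI route (illustrative): select column A, open the menu, choose Sum,
# pick the target cell, confirm -- roughly five separate interactions.

# JiT-style route: the agent injects two lines that act on the runtime state.
total = sum(sheet["A"])
sheet["B"].append(total)

print(sheet["B"])  # [1, 2, 3, 60]
```

Because the generated code touches the application's own data model, there is no screen-scraping and no dependency on UI layout.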

JiT-Codegen complements rather than replaces API- and GUI-based agents, offering flexibility by dynamically crafting functions as needed, which can later be saved as APIs.

It also enhances GUI manipulation by working at the code level.

Access To Application Code

This is a significant hurdle…

The JiT-Codegen approach assumes a “whitebox” setting where the AI Agent has full visibility into the software’s internals — source code, functions, and runtime state.

However, in real-world scenarios, many applications, especially proprietary or closed-source software, do not expose their code.

Without this access, the AI Agent cannot analyse or manipulate the software’s internals, rendering the method impractical for such cases.

Even in open-source environments, the AI Agent would need permissions and integration mechanisms that might not be readily available or standardised across systems.

The study acknowledges this implicitly by focusing on controlled case studies with web-based desktop applications, where code access is feasible.

Yet, it doesn’t fully address how the approach scales to diverse, real-world software ecosystems.

Overcoming this could require rethinking deployment strategies — perhaps through sandboxed environments or developer cooperation — but as it stands, restricted code access remains a critical stumbling block.
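As one illustration of the sandboxing idea, here is a minimal Python sketch that executes agent-generated code in a restricted namespace. All names here are invented, and restricting `__builtins__` like this is illustrative only; it is not a real security boundary, which in practice would require OS-level isolation.

```python
# Only a small allow-list of builtins is exposed to the generated code.
ALLOWED_BUILTINS = {"len": len, "sum": sum, "min": min, "max": max}

def run_sandboxed(code: str, context: dict) -> dict:
    """Execute agent-generated code with a restricted global namespace."""
    namespace = {"__builtins__": ALLOWED_BUILTINS, **context}
    exec(code, namespace)  # no file, network, or import machinery exposed
    return namespace

# The "agent-generated" snippet operates only on the data it was handed.
ns = run_sandboxed("total = sum(values)", {"values": [1, 2, 3]})
print(ns["total"])  # 6
```

The developer-cooperation route would instead mean applications shipping an explicit, permissioned surface for agents, rather than agents being trusted with arbitrary execution.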

Agency on a Spectrum

I have written considerably in the recent past on the idea of an Agentic Spectrum. Below are some considerations on why software might not become AI Agents in the classical sense just yet, but instead introduce varying degrees of agency into user applications:

  1. Dependency on Human Input: AI Agents most often rely on human prompts, feedback, and validation; this suggests that agency is constrained. It is not fully autonomous but activated within the boundaries of user interaction. Software could evolve similarly, with agency emerging as a function of how much initiative it is allowed to take within an application.
  2. Task-Specific Empowerment: The study’s AI Agents are tailored to evidence synthesis, not general-purpose decision-making. This mirrors how agency might manifest in user applications — contextual and task-specific rather than universal. For example, a writing app might gain “agency” to restructure paragraphs, but only within the user’s defined goals, not as a free-roaming entity.
  3. Collaborative Dynamics: The bidirectional alignment model (human and AI co-evolving their understanding) implies that agency is shared. Software could introduce agency as a collaborative feature — think of it as a sliding scale where users dial up or down how much autonomy they delegate (e.g., auto-filling forms vs. just suggesting options).

Synthesis: Agency in Applications

The strongest case, supported indirectly by the study, is that agency will likely be introduced into user applications as a gradient rather than software morphing into standalone AI Agents.

The study’s AI doesn’t “become” an agent; it empowers users by embedding intelligent capabilities into a workflow. Similarly, future software might offer:

  • Low Agency: Passive tools (e.g., spellcheck).
  • Medium Agency: Proactive suggestions (e.g., drafting emails).
  • High Agency: Context-aware automation (e.g., managing tasks based on patterns) — still within user-defined guardrails.
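The low/medium/high gradient above can be sketched as a simple dial the user sets. The `Agency` enum and `act` function are hypothetical names for illustration, not from the study.

```python
from enum import IntEnum

class Agency(IntEnum):
    LOW = 1     # passive tools, e.g. spellcheck
    MEDIUM = 2  # proactive suggestions, e.g. drafting emails
    HIGH = 3    # context-aware automation, still within guardrails

def act(level: Agency, draft: str) -> str:
    """Handle a drafted message according to the delegated agency level."""
    if level == Agency.LOW:
        return draft                      # flag issues only, change nothing
    if level == Agency.MEDIUM:
        return f"SUGGESTION: {draft}"     # propose, let the user decide
    return f"SENT: {draft}"               # act autonomously inside guardrails

print(act(Agency.MEDIUM, "Meeting moved to 3pm"))  # SUGGESTION: Meeting moved to 3pm
```

The point of the dial is that agency becomes a user-adjustable property of the interaction, not a fixed trait of the software.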

This spectrum avoids the leap to fully independent agents, keeping agency as a feature of interaction rather than an intrinsic software trait.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.

Written by Cobus Greyling