
Building Local AI Agents: Goose, Observer AI, AnythingLLM

Cem Dilmegani
updated on Nov 17, 2025

We spent three days mapping the ecosystem of local AI agents that run autonomously on personal hardware without depending on external APIs or cloud services. We organized the tools we evaluated into five categories:

  1. Developer & system agents: Showcased Goose running fully locally in Docker.
  2. Local automation & control agents: Used Observer AI’s hosted demo to test how its built-in local agents perform tasks.
  3. Knowledge & productivity agents: Used AnythingLLM Desktop with a local model and tested on-device file saving.
  4. Frameworks: Demonstrated offline reasoning with LangGraph + Ollama.
  5. Local runtimes & infrastructure: Listed other runtimes as part of the broader ecosystem analysis.

Local AI agent stack

See category descriptions.

How to approach the local AI agent stack:

Start with the smallest set of layers your use case requires. If your agent needs offline reasoning, begin with a local runtime like Ollama or LM Studio. If it needs to understand your files, add a knowledge layer such as AnythingLLM or LocalGPT. For agents that must take actions (opening apps, controlling the browser, managing files) add a local automation layer. Only use frameworks like LangGraph or LlamaIndex when you need multi-step workflows, planning loops, or complex toolchains.

1. Developer & system agents

*Execution types:

  • Fully local: The tool runs natively on personal hardware using local runtimes such as Ollama, LM Studio, or LocalAI, and can operate entirely offline.
  • Hybrid local: The core model or task execution happens locally, but some features, such as IDE integration, context indexing, synchronization, or reasoning, still rely on cloud services or APIs. 

** Explanation for on-machine column:

  • Fully on-device: Complete offline operation; inference, reasoning, and execution all run locally.
  • Local inference, cloud-assisted: Core model runs locally, but IDE or management features use online services.
  • Local execution, remote reasoning: Code runs locally, but external APIs power reasoning or planning steps.

Fully local on-device AI agent example

Goose 

An open-source, on-machine development agent that plans, writes, and tests code autonomously using local runtimes. Goose works with any compatible LLM and integrates seamlessly with MCP servers. It’s available as both a desktop app and a CLI tool. You can check it out on GitHub: Goose

Core capabilities:

  • Generates, edits, and tests source code autonomously within a local repository.
  • Integrates with local LLM runtimes to perform reasoning and code generation.
  • Supports multi-step task execution, including debugging and file management.
  • Works with standard developer tools and file systems without internet dependency.

Typical use cases:

  • Automating multi-file programming tasks or prototype development.
  • Performing local refactoring or feature implementation guided by natural language.
  • Running closed-loop engineering tasks with full offline autonomy.

Real-world example: Building a local AI agent with Goose and Docker

To see how a local AI agent actually runs in the real world, let’s look at a hands-on setup using Goose and Docker. This example, shared by Oleg Selajev, shows how to build and run a private agent, executed locally inside Docker containers, that summarizes YouTube videos.

This setup involves multiple configuration files and Docker commands. To keep things simple, we will focus directly on the local execution part of the example.

In this example, Goose connects to the local LLM endpoint, uses MCP tools from the gateway, and runs commands within its own container, with no external APIs or cloud inference involved.

For example, when Goose receives a command such as “What is this video about: https://youtu.be/X0PaVrpFD14? Answer in 5 sentences”, it uses the local model and the YouTube transcript MCP tool to generate a concise summary on the device.

Below, you can see Goose running locally through the terminal. The model shown, “hf.co/unsloth/qwen3-30b-a3b-instruct-2507-gguf:q5_k_m”, is a GGUF-format model, which is designed for local inference using runtimes like Ollama, LM Studio, or LocalAI.

Goose is logging its session to a local directory (/root/.local/share/goose/sessions/…) and running inside the local working directory /app.

This screenshot clearly shows Goose launching locally, connecting to a local LLM, and performing reasoning and tool use without any cloud dependency: an example of a fully on-machine AI agent.1
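As a side note, GGUF builds like the one shown can usually be pulled straight from Hugging Face into Ollama. A hedged example follows; the hf.co tag must match the quantization files actually published in the repository:

```bash
# Pull the GGUF build referenced above directly from Hugging Face into Ollama
# (Ollama resolves hf.co/<repo>:<quant> tags to the matching GGUF file).
ollama pull hf.co/unsloth/qwen3-30b-a3b-instruct-2507-gguf:q5_k_m

# Verify it is now available for local inference.
ollama list
```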

Hybrid local agent examples

Roo Code (Boomerang Mode)

Roo Code is a local desktop AI coding assistant that focuses on self-correction and continuous refinement of its outputs. Boomerang Mode enables local execution, allowing Roo Code to run fully on your machine without relying on cloud services.

Roo Code supports AI providers such as Anthropic, OpenAI, Google Gemini, AWS Bedrock, and local models via Ollama. It can be installed through the VS Code extensions marketplace.

After installing Roo Code, restart your editor, whether it’s VS Code, Cursor, or Windsurf.

Once reopened, the Roo Code icon will appear in the left sidebar. Click on it to begin the guided setup process, which walks you through account configuration and initial model setup.

Roo Code interface overview

Local AI agent configuration in Roo Code:

Roo Code allows developers to create custom configuration profiles that define how it connects to different AI models, including locally hosted LLMs.

From Settings → Providers, you can add profiles through OpenRouter or other supported providers, then choose a local model running via Ollama or LM Studio.

Each configuration profile can store its own parameters, including temperature, reasoning depth, and token limits. This lets you switch between lightweight cloud models and fully local runtimes for on-device inference.
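For a fully local profile, the main prerequisite is a model served by Ollama on its default endpoint. A hedged sketch of the terminal side is below; the model name is just an example, and the in-app steps may differ slightly between Roo Code versions:

```bash
# Serve a local model that Roo Code can point to.
ollama pull qwen2.5-coder   # example coding model; any locally pulled model works
ollama serve                # exposes the local API on http://localhost:11434

# In Roo Code: Settings → Providers → add a profile using the Ollama provider,
# keep the default http://localhost:11434 base URL, and select the pulled model.
# No API key is needed for the local endpoint.
```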

Cursor

Offline functionality is not yet supported in Cursor, but it is possible to run a local LLM while keeping the IDE connected online. This configuration enables local inference, but the overall agent workflow is not fully local, since some data is still sent to Cursor’s servers for supporting functionality.

In this setup, Cursor continues using its API for features such as indexing and applying edits, while the local LLM handles the main inference tasks.

Developer agents integrated into IDEs, such as Cursor, can be configured to use a local model by installing Ollama, setting up ngrok, and linking Cursor to the ngrok URL and API key.

How to use a local LLM within Cursor:

Source: Logan Hallucinates2
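A rough command-line sketch of that setup is shown below. It assumes Cursor’s OpenAI base URL override is pointed at the tunneled Ollama endpoint; exact flags and settings may differ from the video above:

```bash
# 1. Serve the local model with Ollama (OpenAI-compatible API on port 11434).
ollama pull llama3
ollama serve

# 2. Expose the local endpoint through an ngrok tunnel so Cursor's backend can reach it.
ngrok http 11434 --host-header="localhost:11434"

# 3. In Cursor: Settings → Models → override the OpenAI Base URL with the ngrok
#    URL (plus /v1) and add the local model name (e.g., llama3) as a custom model.
```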

2. Local automation & control agents

Observer AI

Observer AI is an open-source framework for automating screen-based and system-level tasks through autonomous agents that run directly on a user’s local machine. It processes all data locally, without relying on external dependencies.

The framework supports three deployment configurations:

Webapp with Cloud Service: Accessible through a hosted interface and requires no setup. To explore how local AI agents operate in practice, we tried the hosted demo available at app.observer-ai.com.

The setup process began with the “create a new agent” interface. We configured a simple agent using the built-in simple creator, defining its behavior, triggers, and response patterns.

Webapp with Observer-Ollama: Uses Ollama for local model inference and Whisper for transcription. All language and audio processing occurs on the user’s hardware.

Self-Hosted Webapp + Observer-Ollama: Requires extensive setup but can operate entirely offline. In this mode, external messaging features such as SMS, WhatsApp, and Email are disabled to prevent misuse, while all functionality remains local.

According to the GitHub repository, Observer AI can be deployed in a fully private, local-first environment using Docker Compose. This setup runs all required components (the Observer WebApp, the observer-ollama translator, and a local Ollama instance) in containers on your machine.
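The exact commands live in the repository’s README; the sketch below is illustrative, assuming the repository ships a docker-compose.yml that defines all three services (verify the clone URL and compose file against the project page):

```bash
# Clone the Observer AI repository (URL per the project's GitHub page).
git clone https://github.com/Roy3838/Observer.git
cd Observer

# Build and start the WebApp, observer-ollama translator, and Ollama containers.
docker compose up --build -d
```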

These commands initialize the full environment on localhost, ensuring that both the application and model backend execute on-device.

Core functions:

  • Runs agents powered by local LLMs through Ollama or any v1 chat-completions API.
  • Observes the user’s screen via OCR or screenshots.
  • Executes Python code through an integrated Jupyter server.
  • Operates with zero cloud connectivity, keeping computation and data confined to the user’s environment.

Example use cases:

  • Sending a local notification or message when a process completes.
  • Monitoring a video conference and logging discussion topics.
  • Detecting a person or object on screen and triggering a recording or automated action.

Available agents:

Availability: The project is open source on GitHub with a hosted demo at app.observer-ai.com.

Hybrid local agent example

Browser-Use

Browser-Use is a Python-based framework that lets AI agents interact with a browser through Playwright.

One way to install it is with the pip install browser-use command, which sets up both the Python interface and local browser control on the same machine.

When later run (for example, with python -m browser_use), it will open and control a browser instance locally, executing actions and reasoning either through a local LLM (e.g., via Ollama) or through connected APIs:

Setting Browser-Use up locally3

For those who want to see the complete setup in action, here’s a step-by-step video guide showing how to install and run Browser-Use on a local machine:

The walkthrough covers everything from installing dependencies like Playwright and LangChain to connecting Browser-Use with a local model via Ollama.4
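To make the hybrid setup concrete, here is a minimal sketch of such a script. It assumes a browser-use version that accepts LangChain chat models and a model already pulled in Ollama; the model name and task are only examples:

```python
# Prereqs (assumed): pip install browser-use langchain-ollama
#                    playwright install chromium
import asyncio

from browser_use import Agent            # drives a Playwright-controlled browser locally
from langchain_ollama import ChatOllama  # chat-model client for the local Ollama endpoint


async def main():
    # Reasoning runs on-device through Ollama (default endpoint http://localhost:11434).
    llm = ChatOllama(model="qwen2.5:7b", temperature=0)

    # The agent plans browser actions and executes them on this machine.
    agent = Agent(
        task="Open example.com and summarize the page in two sentences.",
        llm=llm,
    )
    result = await agent.run()
    print(result)


asyncio.run(main())
```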


How it behaves as a hybrid local agent

  • Inference on-device: You can plug in a locally hosted LLM (via Ollama or similar) so the agent’s reasoning happens on your hardware.
  • Browser automation executed locally: The browser actions (via Playwright) run on your machine.
  • Some cloud dependency remains: Since Browser-Use may integrate with external APIs (for session management, model routing, or multi-LLM support), it isn’t 100% local in all setups.

For more, check our benchmark on Browser-use tool use capabilities.

3. Knowledge & productivity agents

Fully local on-device AI agent example

AnythingLLM

Our experience: running a fully local AI agent with AnythingLLM

We tested AnythingLLM (Desktop) to see how a fully on-device AI agent performs from setup to output.

Below, you can see the process, configuring the LLM provider, enabling the “Save Files” skill, and running a simple test where the agent summarizes a topic and saves it locally as a text file:

Configuring the workspace and LLM provider:

We began by opening the workspace settings and navigating to Agent Configuration. Here, we selected the desired LLM provider; in this case, the mistral-medium-2505 model.

After clicking Update Workspace Agent, the workspace confirmed that the new configuration was active, completing the setup for local execution.

Dashboard overview:

Setting up agent skills:

Next, we opened Configure Agent Skills to activate built-in agent functions.

This section highlights how straightforward it is to begin experimenting with agentic capabilities: no coding or complex setup required.

Testing the “save files” agent:

To test local task execution, we enabled the “Save Files” agent, which allows the model to generate and save outputs directly to the local machine.

After toggling the option and clicking Save, the agent was ready to use.

We then returned to the chat window and used one of the provided sample prompts from the documentation to test the feature, confirming that it could create and store files locally with minimal setup.

Generating and saving files to the local machine:

Running the agent in chat:

We asked the agent to summarize a historical event and then invoked it using @agent to enable Agent Chat Mode.

Instead of saving the output as a PDF, we modified the command to create a text file.

Once invoked, the system displayed a confirmation that the agent was active and provided instructions for exiting the execution loop.

The agent then generated the summary and prepared it for local saving, demonstrating how reasoning, execution, and file handling can all occur directly within the chat interface.

Saving the file locally:

To test the Save Files feature, we followed the default usage guide from the AnythingLLM Docs.

We copied the example command, “@agent can save this information as a PDF on my desktop folder?”, and ran it inside the chat.

After executing the task, a file browser window appeared, allowing us to save the output locally.

(See the screenshot below for the agent’s save example.)


The output was stored in the Downloads folder, confirming that the file was generated and saved entirely on-device, completing the local execution workflow.

Saved file:

Note that the file doesn’t save to the desktop; it goes to the Downloads folder as a text file on the local machine.

Exploring other local agents you can use with AnythingLLM

That’s one of the simpler demonstration agents, but there are several others you can explore. These allow you to see how local and connected capabilities can work together inside your AI workspace.

For example:

  • RAG Search: lets you search through local documents or data that you provide to the LLM.
  • Web Browsing & Web Scraping: enables gathering information directly from the internet.
  • Save Files: as we used, for exporting results locally.
  • List or Summarize Documents: helps organize and condense your existing files.
  • Chart Generation: visualizes data or text-based results.
  • SQL Agent: allows you to query and analyze databases directly.

4. Frameworks

*Role in local AI system:

  • Local reasoning & collaboration frameworks: Form the cognitive core where agents reason, plan, and collaborate locally.
  • Workflow orchestration platforms: Manage and automate how those agents interact and execute tasks on-device.

Fully local on-device AI agent example 

By combining LangGraph for agent logic with Ollama for local model hosting, developers can build fully offline AI agents that reason, search, and respond autonomously without any cloud dependency.

Building a local AI agent with LangGraph and Ollama

1. Check your GPU (Optional)

Before installing, it’s a good idea to check whether your computer has a dedicated GPU, as this can significantly speed up local model inference.

On Windows, press Ctrl + Alt + Del and open Task Manager → go to the Performance tab → click GPU in the sidebar. You’ll see your graphics card name, utilization, and dedicated GPU memory. If you don’t have a dedicated GPU, Ollama also supports CPU-only execution (it’ll run slower).

For smooth on-device inference, look for a GPU with at least 6–8 GB of VRAM.
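If you prefer the command line and have an NVIDIA GPU with drivers installed, the same details are available via nvidia-smi:

```bash
# Shows GPU name, driver version, VRAM usage, and utilization (NVIDIA GPUs only).
nvidia-smi
```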

2. Install LangGraph and dependencies

LangGraph is a framework to build multi-agent reasoning workflows. It can be installed directly from PyPI and works independently of LangChain. You can also visit the LangGraph documentation for additional reference.

Run the following commands in your terminal or PowerShell:
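For example (the langchain-ollama package is an assumption here; it provides the Ollama chat-model client used in the script later in this section):

```bash
# Install LangGraph plus the Ollama chat-model client used in agent.py below.
pip install -U langgraph langchain-ollama
```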

3. Install Ollama

Ollama is the runtime that hosts and serves your local models. After installation, confirm it’s running:
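For example:

```bash
# Confirm the Ollama runtime is installed and its local API is reachable.
ollama --version
curl http://localhost:11434    # should respond with "Ollama is running"
```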

To load models for local use, you don’t need to download files manually; you pull them using the command line. Ollama will automatically fetch and store the models locally:
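For example (the model name is just a common choice):

```bash
# Pull a model once; Ollama stores it locally for offline reuse.
ollama pull llama3

# List everything already downloaded to this machine.
ollama list
```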

Once a model is pulled, it’s available offline for future sessions. You can switch between models anytime by changing the model name in your code. 

Explore example models for offline use:
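A few commonly used options from the Ollama library are listed below; pick sizes that fit your VRAM (names current as of writing, so check the library before pulling):

```bash
ollama pull llama3          # general-purpose chat model
ollama pull mistral         # lightweight general model
ollama pull qwen2.5-coder   # code-focused model
ollama pull phi3            # small model for low-VRAM or CPU-only machines
```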

4. Write and run your local agent

This Python script (below) creates a fully local AI agent using LangGraph for reasoning and Ollama to run a chosen model (like Llama 3 in this example) on your machine.

The code initializes the model, builds a simple reasoning agent, sends it a query (“Explain how local AI agents work”), and prints the model’s locally generated response.

Create a new file called agent.py and add the following code:
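The version below is a minimal sketch: it assumes the langchain-ollama package for the local chat-model client and uses LangGraph’s MessagesState for a single-node workflow, so details may differ from the cited walkthrough.

```python
from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama
from langgraph.graph import END, START, MessagesState, StateGraph

# Connect to the local Ollama runtime (assumes `ollama pull llama3` was run).
llm = ChatOllama(model="llama3", temperature=0)


def call_model(state: MessagesState) -> dict:
    """Single reasoning node: send the conversation to the local model."""
    response = llm.invoke(state["messages"])
    return {"messages": [response]}


# Build a one-node LangGraph workflow: START -> agent -> END.
graph = StateGraph(MessagesState)
graph.add_node("agent", call_model)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)
app = graph.compile()

# Run the agent with the sample query and print the locally generated answer.
result = app.invoke({"messages": [HumanMessage("Explain how local AI agents work")]})
print(result["messages"][-1].content)
```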

Adapted from5

Run it in your terminal:
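```bash
python agent.py
```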

You should see a clear, structured response from your local model generated entirely on your machine.

Local AI agent powered by LangGraph and Ollama responds to the query “explain how local AI agents work.”

5. Local runtimes & infrastructure

Role in local AI system:

  • On-machine inference engines: Run models directly on desktops or edge devices, enabling completely offline AI use.
  • Self-hosted runtimes: Provide scalable, high-performance inference for private or team deployments within secure local infrastructure.

Local AI agent category descriptions

  • Developer & system agents (action layer): Agents that run directly on your device to perform coding, system, and workflow automation tasks locally.
  • Local automation & control agents: Agents that automate real-world actions on your machine by controlling the browser, UI, or OS.
  • Knowledge & productivity agents: Local assistants for chat, summarization, and document handling without sending data to the cloud.
  • Frameworks (agent reasoning/control layer): Libraries that provide reasoning, planning, and coordination for building and running local AI agents.
  • Local runtimes & infrastructure (model execution layer): Engines that execute LLMs on local hardware, enabling fully offline inference.


Cem Dilmegani
Principal Analyst
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
Researched by
Mert Palazoğlu
Industry Analyst
Mert Palazoglu is an industry analyst at AIMultiple focused on customer service and network security with a few years of experience. He holds a bachelor's degree in management.
