
Local llama.cpp, Ollama, and vLLM tool-calling compatibility

Understand why local-model servers can chat normally but still fail on agent tool calling, tool-result continuation, or OpenAI-compatible multi-turn behavior in OpenClaw.

By CoClaw Team

Symptoms

  • Local chat works, but agent tool calling is flaky or broken.
  • The first message may work, but later turns fail after a tool runs.
  • You may see template errors such as “Unexpected message role”.
  • The model may emit raw tool JSON into content instead of participating in a clean tool-calling loop.
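
The last symptom is easiest to recognize side by side. A minimal sketch, assuming the OpenAI chat-completions message shape; the tool name and arguments are illustrative:

```shell
# Healthy: the server returns the call in a structured tool_calls array,
# and content stays null/empty.
healthy='{"role":"assistant","content":null,"tool_calls":[{"id":"call_1","type":"function","function":{"name":"read_file","arguments":"{\"path\":\"notes.txt\"}"}}]}'

# Broken: the model dumps similar JSON as plain text into content,
# so no tool ever actually runs.
broken='{"role":"assistant","content":"{\"name\": \"read_file\", \"arguments\": {\"path\": \"notes.txt\"}}"}'

# Quick check: only the healthy message carries a tool_calls field.
printf '%s\n' "$healthy" | grep -q '"tool_calls"' && echo "healthy: structured tool call"
printf '%s\n' "$broken"  | grep -q '"tool_calls"' || echo "broken: tool JSON leaked into content"
```

If your transcripts look like the second message, the model (or its template) is not participating in the tool-calling protocol at all.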

Cause

With local-model servers, there are usually three separate compatibility layers:

  1. the model itself,
  2. the server API layer,
  3. the OpenAI-compatible wrapper or chat template.

Users often treat those layers as one thing, but they fail differently.

Examples:

  • llama.cpp may reject tool-related message roles because the chat template does not understand them.
  • The Ollama native API and its OpenAI-compatible /v1 mode do not share the same expectations or stability for tools.
  • vLLM may support basic OpenAI-compatible chat but still differ on tool-calling details and later-turn behavior.
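
You can probe the /v1 layer directly by replaying a conversation that already contains an assistant tool call and its tool result. A sketch, assuming a local OpenAI-compatible server; the base URL, model name, and tool are placeholders for your own setup:

```shell
# Replay one assistant tool-call turn plus its tool result against the
# /v1 endpoint to see whether the chat template accepts the "tool" role.
BASE_URL="${BASE_URL:-http://localhost:11434/v1}"

payload='{
  "model": "llama3.1",
  "messages": [
    {"role": "user", "content": "What is in notes.txt?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_1", "type": "function",
       "function": {"name": "read_file", "arguments": "{\"path\": \"notes.txt\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "hello world"}
  ]
}'

# Validate the payload locally before blaming the server.
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload: valid JSON"

if curl -sf "$BASE_URL/models" > /dev/null 2>&1; then
  # A 4xx/5xx or template error here -- when plain chat works -- points at
  # the wrapper/template layer, not at auth or networking.
  curl -sS "$BASE_URL/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$payload"
else
  echo "server not reachable at $BASE_URL (set BASE_URL)"
fi
```

A server that answers plain chat but rejects this payload is failing exactly at the later-turn behavior described above.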

Fix

1) Separate selection problems from runtime problems

First prove OpenClaw is actually using the local model you expect:

openclaw models status --probe

If this resolves to the wrong provider, fix selection/config first.

If it resolves to the correct local provider but real runs still fail, you are now debugging tool-calling compatibility.

2) Ask whether the failure is at the model, server, or wrapper layer

Use this mental split:

  • model problem: the model never produces reliable tool calls,
  • server problem: the backend cannot represent tool turns or later tool results correctly,
  • wrapper/template problem: the server tries to map messages into a chat template that rejects tool roles.

That distinction prevents a lot of blind retrying.
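
The split can be sketched as a rough triage over the error text. The matched strings below are illustrative, not exhaustive; substitute whatever your server actually returns:

```shell
# Rough triage: map an observed failure message onto one of the three layers.
classify_failure() {
  case "$1" in
    *"Unexpected message role"* | *template*)
      echo "wrapper/template layer: chat template rejects tool roles" ;;
    *"400"* | *"invalid"*)
      echo "server layer: backend cannot represent this tool turn" ;;
    *)
      echo "model layer: check raw output for leaked tool JSON" ;;
  esac
}

classify_failure "Unexpected message role: tool"
classify_failure "HTTP 400: invalid request"
classify_failure "assistant replied with plain text"
```

The point is not the exact strings but the habit: name the layer first, then debug it, instead of retrying the same request.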

3) Prefer the most native path when you need tool reliability

If a local stack offers both:

  • a native server API, and
  • an OpenAI-compatible /v1 path,

assume the native path is the stronger default for advanced agent behavior unless you have already proven the /v1 path works for multi-turn tool use.
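
For Ollama, for example, the native path means POST /api/chat rather than /v1/chat/completions. A sketch of declaring a tool on the native API; the model name and tool schema are assumptions for illustration:

```shell
# Same tool-declaring request, but against Ollama's native API (/api/chat)
# instead of the OpenAI-compatible /v1 path.
NATIVE_URL="${NATIVE_URL:-http://localhost:11434/api/chat}"

payload='{
  "model": "llama3.1",
  "stream": false,
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'

printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload: valid JSON"

if curl -sf "${NATIVE_URL%/api/chat}/api/tags" > /dev/null 2>&1; then
  curl -sS "$NATIVE_URL" -H "Content-Type: application/json" -d "$payload"
else
  echo "Ollama not reachable at $NATIVE_URL (set NATIVE_URL)"
fi
```

If the native path handles tool turns cleanly while /v1 does not, that is direct evidence the compatibility wrapper, not the model, is the weak layer.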

Template or tool-role failures are usually not caused by API keys, provider selection, or baseUrl typos.

They are evidence that the compatibility layer cannot represent OpenClaw’s full conversation state correctly.

Verify

You have reached the right diagnosis if:

  • plain chat works,
  • failures appear specifically when tool turns enter the session,
  • and the problem correlates with server/template behavior rather than with auth or networking.

Verification & references

  • Reviewed by: CoClaw Code Team
  • Last reviewed: March 14, 2026
  • Verified on: macOS · Linux · Windows

Related Resources

Custom OpenAI-compatible endpoint rejects tools or tool_choice
Fix
Fix custom or proxy AI endpoints that can chat normally but fail once OpenClaw sends tools, tool_choice, parallel_tool_calls, or later tool-result turns.
Model outputs '[Historical context]' / tool-call JSON instead of a normal reply
Fix
Fix chat replies that leak internal tool metadata (e.g. '[Historical context: ... Do not mimic ...]') by switching to a tool-capable model/provider and ensuring function calling is enabled.
Ollama configured, but OpenClaw still uses Anthropic (or model discovery keeps failing)
Fix
Fix local Ollama setups where gateway logs show Anthropic fallback or repeated Ollama model-discovery failures by pinning provider config, verifying connectivity from the gateway runtime, and separating model selection problems from OpenAI-compatible payload problems.
Browser tool: URLs with Chinese characters are mis-encoded
Fix
Work around a browser tool encoding bug by pre-encoding non-ASCII query parameters (UTF-8) before calling the browser tool.
How to Choose Between Native Ollama, OpenAI-Compatible /v1, vLLM, and LiteLLM for OpenClaw
Guide
Choose the right OpenClaw model-serving path, validate the first backend cleanly, and know what tradeoffs you are accepting before you add tools, routing, or proxy layers.
Self-Hosted AI API Compatibility Matrix for OpenClaw
Guide
Choose a self-hosted or proxy AI backend for OpenClaw without guessing: classify the compatibility layer, prove the runtime features you actually need, and avoid mistaking basic chat success for full agent compatibility.