When OAuth renewal “works” but the host still keeps falling back into recovery or safe mode, the problem is often not that the new token is bad.
The more common failure is: the gateway updated the live auth file, but recovery still reads an older provisioned copy.
This guide explains the two-file model and the minimal recovery procedure that usually gets the system healthy again.
What this guide helps you finish
By the end of this guide, you should be able to prove:
- which auth file the live gateway trusts,
- which auth file recovery trusts,
- and whether the next restart will stay healthy instead of falling back into safe mode again.
That is the real job here: not just copying one file, but making recovery and runtime agree on the same auth state.
Who this is for (and not for)
Use this guide if:
- OAuth renewal succeeds but the machine re-enters recovery later,
- safe mode keeps returning after what looked like a clean fix,
- or your deployment uses a provisioned/golden auth copy alongside the live auth store.
This is not the main page for:
- a provider token that is truly expired or invalid,
- first-time auth setup that never succeeded,
- or config pointing at the wrong profile or provider from the start.
Before you touch files: collect these four facts
Before copying anything, confirm:
- Which provider/profile is failing.
- Whether the live `auth-profiles.json` already contains the refreshed auth.
- Whether `auth-profiles.provisioned.json` is older than the live file.
- Whether safe-mode markers or recovery-attempt counters are still present on the host.
Those four facts tell you whether the loop is stale auth, stale recovery state, or both.
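The file-age part of that check can be scripted. A minimal sketch; the helper name is invented for illustration, and the paths are the ones used throughout this guide:

```shell
# Hypothetical helper: succeeds (exit 0) when the live auth file is
# newer than the provisioned copy, i.e. a resync is probably needed.
auth_newer_than_provisioned() {
  live="$1"
  prov="$2"
  [ "$live" -nt "$prov" ]
}

# Typical invocation against the paths used in this guide:
# auth_newer_than_provisioned \
#   ~/.openclaw/agents/main/agent/auth-profiles.json \
#   ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
```

If the helper succeeds, the live file has auth state that recovery has never seen, which is exactly the trap this guide is about.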
1) The symptom pattern
A common sequence looks like this:
- You run a successful OAuth renewal or `openclaw onboard` flow.
- The gateway starts using the refreshed credential in the live runtime path.
- Later, safe-mode recovery or a repair script runs.
- Recovery reads an older auth snapshot and decides auth is still broken.
- The machine falls back into the same recovery loop again.
From the operator point of view, this feels like:
- “OAuth renewal succeeded, but the machine still thinks auth is broken.”
- “I fixed it once, but the next recovery cycle undid it.”
- “The gateway and the recovery script seem to disagree about which auth state is real.”
2) The two-file model
In affected setups, there are effectively two sources of truth for auth state:
- `~/.openclaw/agents/main/agent/auth-profiles.json` (the live copy)
  - the gateway reads/writes this during normal operation
- `~/.openclaw/agents/main/agent/auth-profiles.provisioned.json` (the provisioned / golden copy)
  - recovery may prefer this file because it is intended to survive bad runtime persistence or partial shutdowns
That design can be reasonable, but it creates an operational trap:
- successful re-auth may update the live file,
- while recovery still trusts the provisioned file,
- so the next recovery cycle reintroduces stale auth state.
If you do not keep the two in sync after renewal, recovery can keep restoring old credentials even though the gateway has already moved on.
3) Minimal recovery procedure after OAuth renewal
3.1 Copy the fresh live auth file into the provisioned copy
After a successful renewal, sync the two files:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
Why this helps: it makes the next recovery cycle read the same fresh auth state the gateway is already using.
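If you want a safety net, the copy can be wrapped so the previous provisioned state is preserved first. A sketch; the function name is invented for illustration:

```shell
# Sketch: resync the provisioned copy from the live file, keeping a
# timestamped backup of the old provisioned state in case the live
# file later turns out to be bad too.
resync_auth() {
  live="$1"
  prov="$2"
  if [ -f "$prov" ]; then
    cp "$prov" "$prov.bak.$(date +%Y%m%d-%H%M%S)"
  fi
  cp "$live" "$prov"
}
```

Invoked with the two paths from this guide, this leaves the provisioned copy identical to the live one while keeping the previous provisioned state recoverable.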
3.2 Clear stale safe-mode markers if your environment uses them
If the host is already stuck in a recovery loop, clear the markers that keep forcing re-entry:
sudo rm -f /var/lib/init-status/safe-mode
echo 0 | sudo tee /var/lib/init-status/recovery-attempts
Why this helps: even after auth is fixed, some environments will keep re-running the recovery path until the safe-mode marker and attempt counter are cleared.
3.3 Re-run your normal recovery/restart path
Depending on your host, that might be:
- a supervisor restart,
- a recovery script,
- or a service restart that performs a health check and exits safe mode.
The exact command varies by deployment, but the key idea is: resync files first, clear stale state second, restart third.
4) How to verify the fix actually stuck
Do not stop at “the copy command succeeded.” Verify all three layers:
4.1 Verify the auth files now match
Check timestamps and, if needed, compare contents:
ls -l ~/.openclaw/agents/main/agent/auth-profiles*.json
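Matching timestamps are not proof of matching content; a byte comparison settles it. A small sketch using `cmp` (the helper name is invented):

```shell
# Succeeds only when the two auth files are byte-identical.
auth_files_match() {
  cmp -s "$1" "$2"
}

# Example against the paths from this guide:
# auth_files_match \
#   ~/.openclaw/agents/main/agent/auth-profiles.json \
#   ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json \
#   && echo "auth files match"
```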
4.2 Verify model auth from OpenClaw itself
openclaw models status --probe
You want to confirm the provider is usable, not just that files exist.
4.3 Verify recovery stops re-entering safe mode
After the next restart or repair cycle:
- the host should stay out of safe mode,
- the same broken auth diagnosis should not immediately return,
- and model calls should keep succeeding.
5) When this guide applies vs when it does not
This guide is a good fit when:
- renewal/auth setup appears successful,
- but later recovery still behaves like auth is stale,
- especially on hosts with custom recovery scripts or hardened safe-mode flows.
This guide is not the main fix when:
- the provider token itself is actually expired or invalid,
- you never successfully renewed auth in the first place,
- or your config points at the wrong provider/profile entirely.
In those cases, start with the shorter auth troubleshooting pages first.
5.5) If auth-profiles.json is corrupted (concurrent write / “Extra data”)
In some high-concurrency setups (busy channels + cron jobs + periodic auth refresh), operators have reported auth-profiles.json becoming unparseable due to what looks like concurrent writes (two JSON objects concatenated, or a truncated second object).
Symptom pattern:
- OpenClaw suddenly reports “No API key found” for a provider you definitely configured.
- Errors mention the auth store path, for example:
Auth store: ~/.openclaw/agents/main/agent/auth-profiles.json
- The file fails to parse with errors like:
JSONDecodeError: Extra data
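You can confirm this kind of corruption before touching anything: a strict JSON parse fails on concatenated objects. A quick check using Python's stdlib `json.tool` module, assuming `python3` is available on the host:

```shell
# Exits non-zero if the file is not a single valid JSON document,
# which catches the "two objects concatenated" corruption described
# above (json.tool reports it as "Extra data").
check_auth_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}
```

If the check fails on `auth-profiles.json`, proceed with the emergency recovery below rather than re-running OAuth renewal.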
Emergency recovery (safe operator path):
- Stop the gateway (so nothing writes while you repair):
openclaw gateway stop
- Back up the corrupted file:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.json.corrupt.$(date +%Y%m%d-%H%M%S)
- Restore from a known-good source:
- If you have `auth-profiles.provisioned.json` in your environment, copy it back:
cp ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json \
~/.openclaw/agents/main/agent/auth-profiles.json
- Otherwise, restore from your state backups (recommended: run `openclaw backup create --verify` before upgrades/changes).
- Restart and probe:
openclaw gateway restart
openclaw models status --probe
Prevention tips (until upstream atomic writes land):
- Avoid running multiple gateways against the same state directory.
- Keep the state directory off cloud-sync folders (see: /troubleshooting/solutions/state-dir-cloud-sync-ebusy-crash).
- Keep at least one periodic backup of the auth file (or the whole state dir) so restoration is boring.
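For context, "atomic writes" here means the write-then-rename pattern. A sketch of that pattern, not OpenClaw's actual implementation:

```shell
# Write stdin to a temp file in the same directory as the target,
# then rename over it. rename() within one filesystem is atomic on
# POSIX, so a reader never observes a half-written or concatenated file.
atomic_write() {
  target="$1"
  tmpfile=$(mktemp "$(dirname "$target")/.tmp.XXXXXX")
  cat > "$tmpfile" && mv "$tmpfile" "$target"
}
```

The temp file must live in the same directory as the target; a rename across filesystems degrades to copy-plus-delete and loses the atomicity guarantee.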
6) Operational habit to adopt
If your deployment uses a provisioned/golden auth copy, treat OAuth renewal as a two-step operation:
- renew the live auth,
- resync the provisioned copy.
That small habit prevents a lot of “it was fixed until the next recovery cycle” incidents.
Verification checklist after the recovery
Treat the loop as fixed only when:
- the live and provisioned auth files now match,
- `openclaw models status --probe` succeeds after restart,
- safe-mode markers or recovery counters no longer force immediate re-entry,
- the next recovery or supervisor cycle does not restore stale auth,
- and the host remains healthy for one clean restart beyond the first repair.
The last item matters most. This issue often hides until the next restart proves whether the resync really stuck.
What to do if safe mode returns after one clean cycle
Use this order:
- Re-check whether recovery is reading the provisioned file you think it is.
- Compare timestamps and contents of both auth files again.
- Look for a custom recovery script that copies the wrong file back into place.
- If parsing errors return, treat the auth store as a corruption/recovery problem, not just an OAuth problem.