When OAuth renewal “works” but the host still keeps falling back into recovery or safe mode, the problem is often not that the new token is bad.
The more common failure is: the gateway updated the live auth file, but recovery still reads an older provisioned copy.
This guide explains the two-file model and the minimal recovery procedure that usually gets the system healthy again.
What this guide helps you finish
By the end of this guide, you should be able to prove:
- which auth file the live gateway trusts,
- which auth file recovery trusts,
- and whether the next restart will stay healthy instead of falling back into safe mode again.
That is the real job here: not just copying one file, but making recovery and runtime agree on the same auth state.
Who this is for (and not for)
Use this guide if:
- OAuth renewal succeeds but the machine re-enters recovery later,
- safe mode keeps returning after what looked like a clean fix,
- or your deployment uses a provisioned/golden auth copy alongside the live auth store.
This is not the main page for:
- a provider token that is truly expired or invalid,
- first-time auth setup that never succeeded,
- or config pointing at the wrong profile or provider from the start.
Before you touch files: collect these four facts
Before copying anything, confirm:
- Which provider/profile is failing.
- Whether the live `auth-profiles.json` already contains the refreshed auth.
- Whether `auth-profiles.provisioned.json` is older than the live file.
- Whether safe-mode markers or recovery-attempt counters are still present on the host.
Those four facts tell you whether the loop is stale auth, stale recovery state, or both.
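The file-age part of that check can be scripted. A minimal sketch; the helper name is invented for illustration, and the paths are the ones used throughout this guide:

```shell
# Hypothetical helper: succeeds (exit 0) when the live auth file is
# newer than the provisioned copy, i.e. a resync is probably needed.
auth_newer_than_provisioned() {
  live="$1"
  prov="$2"
  [ "$live" -nt "$prov" ]
}

# Typical invocation against the paths used in this guide:
# auth_newer_than_provisioned \
#   ~/.openclaw/agents/main/agent/auth-profiles.json \
#   ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
```

If the helper succeeds, the live file has auth state that recovery has never seen, which is exactly the trap this guide is about.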
1) The symptom pattern
A common sequence looks like this:
- You run a successful OAuth renewal or `openclaw onboard` flow.
- The gateway starts using the refreshed credential in the live runtime path.
- Later, safe-mode recovery or a repair script runs.
- Recovery reads an older auth snapshot and decides auth is still broken.
- The machine falls back into the same recovery loop again.
From the operator point of view, this feels like:
- “OAuth renewal succeeded, but the machine still thinks auth is broken.”
- “I fixed it once, but the next recovery cycle undid it.”
- “The gateway and the recovery script seem to disagree about which auth state is real.”
2) The two-file model
In affected setups, there are effectively two sources of truth for auth state:
- `~/.openclaw/agents/main/agent/auth-profiles.json` (the live copy)
  - the gateway reads/writes this during normal operation
- `~/.openclaw/agents/main/agent/auth-profiles.provisioned.json` (the provisioned / golden copy)
  - recovery may prefer this file because it is intended to survive bad runtime persistence or partial shutdowns
That design can be reasonable, but it creates an operational trap:
- successful re-auth may update the live file,
- while recovery still trusts the provisioned file,
- so the next recovery cycle reintroduces stale auth state.
If you do not keep the two in sync after renewal, recovery can keep restoring old credentials even though the gateway has already moved on.
3) Minimal recovery procedure after OAuth renewal
3.1 Copy the fresh live auth file into the provisioned copy
After a successful renewal, sync the two files:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.provisioned.json
Why this helps: it makes the next recovery cycle read the same fresh auth state the gateway is already using.
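If you want a safety net, the copy can be wrapped so the previous provisioned state is preserved first. A sketch; the function name is invented for illustration:

```shell
# Sketch: resync the provisioned copy from the live file, keeping a
# timestamped backup of the old provisioned state in case the live
# file later turns out to be bad too.
resync_auth() {
  live="$1"
  prov="$2"
  if [ -f "$prov" ]; then
    cp "$prov" "$prov.bak.$(date +%Y%m%d-%H%M%S)"
  fi
  cp "$live" "$prov"
}
```

Invoked with the two paths from this guide, this leaves the provisioned copy identical to the live one while keeping the previous provisioned state recoverable.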
3.2 Clear stale safe-mode markers if your environment uses them
If the host is already stuck in a recovery loop, clear the markers that keep forcing re-entry:
sudo rm -f /var/lib/init-status/safe-mode
echo 0 | sudo tee /var/lib/init-status/recovery-attempts
Why this helps: even after auth is fixed, some environments will keep re-running the recovery path until the safe-mode marker and attempt counter are cleared.
3.3 Re-run your normal recovery/restart path
Depending on your host, that might be:
- a supervisor restart,
- a recovery script,
- or a service restart that performs a health check and exits safe mode.
The exact command varies by deployment, but the key idea is: resync files first, clear stale state second, restart third.
4) How to verify the fix actually stuck
Do not stop at “the copy command succeeded.” Verify all three layers:
4.1 Verify the auth files now match
Check timestamps and, if needed, compare contents:
ls -l ~/.openclaw/agents/main/agent/auth-profiles*.json
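Matching timestamps are not proof of matching content; a byte comparison settles it. A small sketch using `cmp` (the helper name is invented):

```shell
# Succeeds only when the two auth files are byte-identical.
auth_files_match() {
  cmp -s "$1" "$2"
}

# Example against the paths from this guide:
# auth_files_match \
#   ~/.openclaw/agents/main/agent/auth-profiles.json \
#   ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json \
#   && echo "auth files match"
```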
4.2 Verify model auth from OpenClaw itself
openclaw models status --probe
You want to confirm the provider is usable, not just that files exist.
4.3 Verify recovery stops re-entering safe mode
After the next restart or repair cycle:
- the host should stay out of safe mode,
- the same broken auth diagnosis should not immediately return,
- and model calls should keep succeeding.
5) When this guide applies vs when it does not
This guide is a good fit when:
- renewal/auth setup appears successful,
- but later recovery still behaves like auth is stale,
- especially on hosts with custom recovery scripts or hardened safe-mode flows.
This guide is not the main fix when:
- the provider token itself is actually expired or invalid,
- you never successfully renewed auth in the first place,
- or your config points at the wrong provider/profile entirely.
In those cases, start with the shorter auth troubleshooting pages first.
5.5) If auth-profiles.json is corrupted (concurrent write / “Extra data”)
In some high-concurrency setups (busy channels + cron jobs + periodic auth refresh), operators have reported auth-profiles.json becoming unparseable due to what looks like concurrent writes (two JSON objects concatenated, or a truncated second object).
Symptom pattern:
- OpenClaw suddenly reports “No API key found” for a provider you definitely configured.
- Errors mention the auth store path, for example:
Auth store: ~/.openclaw/agents/main/agent/auth-profiles.json
- The file fails to parse with errors like:
JSONDecodeError: Extra data
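You can confirm this kind of corruption before touching anything: a strict JSON parse fails on concatenated objects. A quick check using Python's stdlib `json.tool` module, assuming `python3` is available on the host:

```shell
# Exits non-zero if the file is not a single valid JSON document,
# which catches the "two objects concatenated" corruption described
# above (json.tool reports it as "Extra data").
check_auth_json() {
  python3 -m json.tool "$1" >/dev/null 2>&1
}
```

If the check fails on `auth-profiles.json`, proceed with the emergency recovery below rather than re-running OAuth renewal.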
Emergency recovery (safe operator path):
- Stop the gateway (so nothing writes while you repair):
openclaw gateway stop
- Back up the corrupted file:
cp ~/.openclaw/agents/main/agent/auth-profiles.json \
~/.openclaw/agents/main/agent/auth-profiles.json.corrupt.$(date +%Y%m%d-%H%M%S)
- Restore from a known-good source:
- If you have `auth-profiles.provisioned.json` in your environment, copy it back:
cp ~/.openclaw/agents/main/agent/auth-profiles.provisioned.json \
~/.openclaw/agents/main/agent/auth-profiles.json
- Otherwise, restore from your state backups (recommended: run `openclaw backup create --verify` before upgrades/changes).
- Restart and probe:
openclaw gateway restart
openclaw models status --probe
Prevention tips (until upstream atomic writes land):
- Avoid running multiple gateways against the same state directory.
- Keep the state directory off cloud-sync folders (see: /troubleshooting/solutions/state-dir-cloud-sync-ebusy-crash).
- Keep at least one periodic backup of the auth file (or the whole state dir) so restoration is boring.
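For context, "atomic writes" here means the write-then-rename pattern. A sketch of that pattern, not OpenClaw's actual implementation:

```shell
# Write stdin to a temp file in the same directory as the target,
# then rename over it. rename() within one filesystem is atomic on
# POSIX, so a reader never observes a half-written or concatenated file.
atomic_write() {
  target="$1"
  tmpfile=$(mktemp "$(dirname "$target")/.tmp.XXXXXX")
  cat > "$tmpfile" && mv "$tmpfile" "$target"
}
```

The temp file must live in the same directory as the target; a rename across filesystems degrades to copy-plus-delete and loses the atomicity guarantee.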
6) Operational habit to adopt
If your deployment uses a provisioned/golden auth copy, treat OAuth renewal as a two-step operation:
- renew the live auth,
- resync the provisioned copy.
That small habit prevents a lot of “it was fixed until the next recovery cycle” incidents.
Verification checklist after the recovery
Treat the loop as fixed only when:
- the live and provisioned auth files now match,
- `openclaw models status --probe` succeeds after restart,
- safe-mode markers or recovery counters no longer force immediate re-entry,
- the next recovery or supervisor cycle does not restore stale auth,
- and the host remains healthy for one clean restart beyond the first repair.
The last item matters most. This issue often hides until the next restart proves whether the resync really stuck.
What to do if safe mode returns after one clean cycle
Use this order:
- Re-check whether recovery is reading the provisioned file you think it is.
- Compare timestamps and contents of both auth files again.
- Look for a custom recovery script that copies the wrong file back into place.
- If parsing errors return, treat the auth store as a corruption/recovery problem, not just an OAuth problem.