Troubleshooting

How to use this runbook

Start with the symptom the user can see, not the subsystem you suspect. Ask Buffaly to identify the failing surface, gather direct evidence, preserve any useful artifact, make one bounded change, and rerun the validation that proves the surface works.

Good first request

"Diagnose this Buffaly failure. Name the route, session, file, provider, or tool that is failing; inspect logs, session timeline, task files, artifacts, git diff, settings, and service diagnostics as needed; then propose the smallest safe recovery step and the verification check."

Fast triage map

If the user sees	First evidence to gather	Safe first recovery
A route returns 404 or 500	Route, host/port, build output, web logs, page/partial source	Confirm the correct app is running, then fix the narrow route or render error
A model/provider fails	Provider, transport, model, credential path, provider response, settings status	Validate the exact credential route without exposing secrets
Search or action discovery is poor	Embedding status, reindex status, candidate action output, entity binding	Refine the query or repair the missing embedding/index/config layer
Long-running work did not appear	Session timeline, child session, watcher callback, task file, artifact path	Resume from the durable artifact or ask the child session to publish the missing output
A committed edit is not visible	Commit hash, deployed branch, running site, route render, deployment logs	Deploy/restart the correct site or validate that the public host points at the intended build

Docs route returns 404 or 500

What the user sees: A docs URL such as /docs/troubleshooting returns not found, an application error, a blank page, or a stale browser result.

What to ask Buffaly: "Verify the exact docs route on the running site, confirm the Razor @page route, inspect shared navigation, build the web project, and check the web logs for the failing request."

Evidence to gather: The failing route and host/port, HTTP status, Razor page path, shared partials such as _DocsNav, dotnet build output, web logs, and any exception stack. For this public site, route evidence usually starts in C:\dev\Buffaly.Web\Buffaly.Web\Pages\Site\Docs.

Smallest safe recovery: Do not make sweeping changes across all docs. Fix the single route, page model, partial, or link that is failing. If the build passes but rendering fails, inspect the specific Razor expression or shared partial used by that page.

Verify: Rebuild, open the exact route, confirm the page returns HTTP 200, and rerun the docs route/link check for the touched page or full docs set.
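A route check can be scripted so that the same command covers one touched page or the full docs set. The sketch below is Python using only the standard library. The route list and base URL you pass in are assumptions you supply, not something Buffaly generates. A status of 0 means nothing answered at all, which usually points back at the wrong host or port rather than at the page.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a GET; 4xx/5xx come back as codes, not exceptions."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # server answered with an error status
        return err.code
    except urllib.error.URLError:  # nothing listening, DNS failure, etc.
        return 0


def audit_routes(base_url: str, routes: Iterable[str],
                 fetch: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Request each route under base_url and return only the routes that were not 200."""
    failures: Dict[str, int] = {}
    for route in routes:
        status = fetch(base_url.rstrip("/") + route)
        if status != 200:
            failures[route] = status
    return failures
```

For example, audit_routes("http://localhost:5299", ["/docs", "/docs/troubleshooting"]) returns an empty dict when every route answered 200. urlopen follows redirects, so a route that redirects to a working page is reported by its final status.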

Running site is stale or pointed at the wrong project or port

What the user sees: Source has changed, but the browser still shows old text; one localhost port works while another returns stale 404s; a public page does not match the latest commit.

What to ask Buffaly: "Confirm which process serves this URL, which project it was launched from, which port is reliable, and whether the rendered page matches the source file and commit."

Evidence to gather: Running process command line, launch profile, listening port, browser URL, route response, git commit, deployment artifact, and logs. The public docs rewrite handoff recorded http://localhost:5299/docs as the reliable local Buffaly.Web site and warned that localhost:5001 was stale or unreliable for that repo.

Smallest safe recovery: Stop or ignore the stale host, start the intended C:\dev\Buffaly.Web\Buffaly.Web\Buffaly.Web.csproj app on the correct port, and refresh the exact route. Avoid editing content just to compensate for a stale process.

Verify: The rendered route contains a phrase from the current source, the process path points at the expected project, and the route status is 200 on the intended port.
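When two ports disagree, it helps to ask both of them the same question. This Python sketch fetches one route from each candidate host and reports the status plus whether a phrase copied from the current source appears on the page. The tag stripping is deliberately crude (it is a phrase check, not an HTML parser), and the hosts and phrase are whatever you supply.

```python
import html
import re
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def fetch_page(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """GET a URL and return (status, body); HTTP errors return their own status and body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:
        return 0, ""  # nothing listening on that host/port


def visible_text(page: str) -> str:
    """Crudely strip tags, decode entities, and collapse whitespace for phrase matching."""
    no_tags = re.sub(r"<[^>]+>", " ", page)
    return re.sub(r"\s+", " ", html.unescape(no_tags)).strip()


def compare_hosts(hosts: Iterable[str], route: str, phrase: str,
                  fetch: Callable[[str], Tuple[int, str]] = fetch_page) -> Dict[str, Tuple[int, bool]]:
    """For each host, report (status, whether the page shows the phrase from current source)."""
    wanted = re.sub(r"\s+", " ", phrase).strip()
    results: Dict[str, Tuple[int, bool]] = {}
    for host in hosts:
        status, body = fetch(host.rstrip("/") + route)
        results[host] = (status, wanted in visible_text(body))
    return results
```

compare_hosts(["http://localhost:5299", "http://localhost:5001"], "/docs", "a sentence you just edited") should show (200, True) for the intended host; a stale host typically shows a 404 or (200, False).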

Provider authentication fails

What the user sees: Model calls fail with unauthorized, missing key, token expired, model unavailable, or provider not configured errors.

What to ask Buffaly: "Identify the selected provider, transport, model, reasoning level, and authentication route. Check settings status and provider diagnostics without printing secrets."

Evidence to gather: Provider catalog selection, current session model selection, settings page status, feature rows, provider response body, logs, and service diagnostics. Keep OpenAI API-key settings, Codex auth/profile settings, and provider-native module credentials separate.

Smallest safe recovery: Fix the credential path actually used by the selected transport. Do not paste keys into prompts, source, task files, screenshots, or docs. Rotate or re-enter the secret through the configured settings or secret store when required.

Verify: Run one cheap provider request and have Buffaly report the provider, transport, model, credential route used, and success/failure evidence without revealing the credential value.
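When Buffaly reports which credential route was used, the report can prove that a credential was present without ever containing it. One way is sketched below in Python: report presence, length, and a short SHA-256 fingerprint. The fingerprint lets you compare two reports (same key or a different one) but cannot be turned back into the key. The route labels are hypothetical; this runbook does not define how Buffaly names its credential sources.

```python
import hashlib
from typing import Optional


def describe_credential(route: str, value: Optional[str]) -> str:
    """Summarize a credential for a diagnostic report without printing the secret itself.

    The 8-character SHA-256 prefix is enough to tell whether two reports saw the
    same key, while revealing nothing usable about the key's value.
    """
    if not value:
        return f"{route}: missing"
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"{route}: present (length {len(value)}, sha256 prefix {digest})"
```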

OpenAI API works but Codex backend does not, or the reverse

What the user sees: Embeddings or OpenAI API-backed features work, but the main Codex-backed agent fails; or Codex sessions work while embeddings, semantic search, or OpenAI API routes fail.

What to ask Buffaly: "Separate the OpenAI API-key route from Codex backend/profile authentication. Test the failing route only, then compare the settings and logs for both routes."

Evidence to gather: OpenAI API key status, Codex Auth Feature status, Codex Backend Feature settings, selected provider transport such as responses_api or codex_backend, provider response, and backend logs. A working ChatGPT/Codex login does not prove embeddings are configured, and a valid OpenAI API key does not prove Codex backend profile auth is valid.

Smallest safe recovery: Repair only the failing route: OpenAI API key for OpenAI API and embedding-dependent features; Codex auth/profile/backend settings for Codex-backed model execution.

Verify: Run a small completion smoke test for the Codex-backed route and a separate embedding or OpenAI API smoke test for the API-key route, then compare results.
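For the API-key side of that comparison, a single embeddings call is a cheap, isolated test. The Python sketch below builds that request against the public OpenAI embeddings endpoint with the text-embedding-3-small model that this runbook cites for semantic indexing; it assumes you supply the key from your own secret store at run time. It deliberately does not cover the Codex backend route, whose request format is not described here.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_smoke_request(api_key: str,
                                  model: str = "text-embedding-3-small") -> urllib.request.Request:
    """Build one tiny embeddings request that exercises only the OpenAI API-key route."""
    body = json.dumps({"model": model, "input": "buffaly smoke test"}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


def embedding_dimensions(response_json: dict) -> int:
    """Return the length of the first vector; raises KeyError/IndexError if none came back."""
    return len(response_json["data"][0]["embedding"])
```

Send the request with urllib.request.urlopen(req, timeout=30) and pass json.load(response) to embedding_dimensions. An HTTP 401 here implicates the OpenAI API key only; it says nothing about Codex profile or backend authentication.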

Embeddings or semantic search give poor results

What the user sees: Semantic search returns no results, irrelevant old sessions, or overly broad matches; or exact search works while meaning-based search does not.

What to ask Buffaly: "Check the OpenAI API-key status for embeddings, the semantic reindex status, fragment/vector counts, and the query wording, then verify the top result by reading the exact session or artifact."

Evidence to gather: Settings page OpenAI key status, embedding provider logs, reindex status route such as /api/semantic-conversation-search/reindex-status, semantic database fragments, vector rows, query text, and session timeline evidence. Source-backed docs currently describe text-embedding-3-small as the checked-in embedding model for session semantic indexing.

Smallest safe recovery: If credentials are missing, configure the OpenAI API key through settings. If indexing is stale, inspect or rerun the narrow reindex path. If results are too broad, refine the query with the project, route, provider, artifact, or decision you remember.

Verify: Run one controlled semantic search for a known recent topic, open the returned session or artifact, and confirm the cited evidence actually supports the result.
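The reindex status route can also be read directly when you want raw evidence rather than a summary. This Python sketch assumes the route answers a plain GET with JSON on the same host as the Buffaly app you are diagnosing. The response fields are not documented in this runbook, so the function returns the parsed payload unchanged instead of guessing at field names.

```python
import json
import urllib.request
from typing import Any, Callable

REINDEX_STATUS_ROUTE = "/api/semantic-conversation-search/reindex-status"


def _get_bytes(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def reindex_status(base_url: str, get: Callable[[str], bytes] = _get_bytes) -> Any:
    """Fetch the semantic reindex status route and return the parsed JSON as-is.

    Assumption: the route accepts GET and returns JSON. Print the whole payload
    and read it rather than relying on specific field names.
    """
    return json.loads(get(base_url.rstrip("/") + REINDEX_STATUS_ROUTE))
```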

Buffaly cannot discover or load a skill, action, or tool

What the user sees: Buffaly says no matching tool exists, a tool is not callable, a prototype is not found, or a loaded action does not appear in the current session.

What to ask Buffaly: "Search candidate actions twice with different wording, search candidate entities, list registered skills, list actions for the likely skill, load the selected action if needed, and report what was callable."

Evidence to gather: Candidate action results, candidate entity results, skill listing, action listing, loaded tool inventory, prototype details, compile diagnostics for ProtoScript changes, and service diagnostics when the action is backed by a service.

Smallest safe recovery: Treat a missing loaded tool as a routing problem first. Refine the action phrase, load the discovered action, or refresh/rebuild the runtime environment after confirmed project changes. Do not hand-edit ProtoScript when a typed authoring or registration tool exists.

Verify: List loaded action tools or rerun the candidate search, then execute one harmless call or inspect the action details to prove the capability is available.
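The "search twice with different wording" step is easy to make systematic: run both phrasings and put the candidates that both searches agree on first. In the Python sketch below, search is a stand-in for whatever candidate-action lookup is available in your session; it is a hypothetical callable, not a Buffaly API.

```python
from typing import Callable, Dict, List, Sequence


def search_twice(search: Callable[[str], Sequence[str]],
                 phrasing_a: str, phrasing_b: str) -> List[str]:
    """Run a candidate search with two phrasings and rank the merged results.

    Candidates returned by both phrasings come first; ties keep first-seen order
    because sorted() is stable.
    """
    hits: Dict[str, int] = {}
    order: List[str] = []
    for phrasing in (phrasing_a, phrasing_b):
        for name in dict.fromkeys(search(phrasing)):  # de-duplicate within one search
            if name not in hits:
                hits[name] = 0
                order.append(name)
            hits[name] += 1
    return sorted(order, key=lambda name: -hits[name])
```

A candidate that appears under both phrasings is the better one to load first. One that appears only once may still be right, so inspect its details before calling it.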

Child session, watcher, or automaton did not publish expected work

What the user sees: A long-running task says it started, but no artifact, callback, commit, or completion report appears where expected.

What to ask Buffaly: "Inspect the parent session, child session, watcher callback/digest, session timeline, durable task file, artifact path, and git diff. Report the last verified step and the missing publication boundary."

Evidence to gather: Parent and child session keys, watcher subscription or callback digest, timeline rows, Plan/Scratch, durable task file, artifact file path, commit status, and any final-answer or handoff message. Automaton-managed work should leave append-only state in a task or handoff artifact.

Smallest safe recovery: Resume from the most recent durable artifact instead of restarting. If the child completed but did not publish, ask Buffaly to read the child artifacts and write the missing parent-facing summary. If the child is still running, wait or inspect status rather than duplicating work.

Verify: The expected artifact exists, the parent timeline references it, the task file records status and evidence, and any commit or callback can be opened by path or hash.
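If you maintain your own task file, an append-only format makes "resume from the most recent durable artifact" mechanical: new state is only ever added at the end, and the last recorded status wins. The one-line-per-update layout below (timestamp, status=, and evidence= separated by tabs) is a hypothetical format for illustration, not the format Buffaly automatons actually write.

```python
import datetime
from pathlib import Path
from typing import Optional


def append_status(task_file: Path, status: str, evidence: str) -> None:
    """Append one timestamped status line; earlier lines are never rewritten."""
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    # Collapse tabs/newlines so one update always stays on one tab-separated line.
    status = " ".join(status.split())
    evidence = " ".join(evidence.split())
    with task_file.open("a", encoding="utf-8") as f:
        f.write(f"{stamp}\tstatus={status}\tevidence={evidence}\n")


def last_status(task_file: Path) -> Optional[str]:
    """Return the most recent status recorded in the task file, or None if there is none."""
    if not task_file.exists():
        return None
    latest = None
    for line in task_file.read_text(encoding="utf-8").splitlines():
        for field in line.split("\t"):
            if field.startswith("status="):
                latest = field[len("status="):]
    return latest
```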

Session compaction or resume lost context

What the user sees: Buffaly seems to forget prior constraints, misses an artifact, repeats completed discovery, or resumes from an incomplete summary.

What to ask Buffaly: "Compare the current session summary with Plan, Scratch, durable task files, session timeline, compaction archives, and recent tool output. Reconstruct the current route before acting."

Evidence to gather: Current session key, timeline around the compaction boundary, current and previous compaction epoch metadata when available, Plan/Scratch files, task artifact, handoff file, and final reports from child sessions.

Smallest safe recovery: Rehydrate from durable artifacts and direct evidence, then update the plan with the current route. Avoid relying on memory of the transcript when a task file or artifact has authoritative state.

Verify: Buffaly can state the active target, constraints, completed evidence, next step, and acceptance criteria using current artifacts rather than guesses.

Build passes but the rendered route fails

What the user sees: dotnet build succeeds, but the route returns HTTP 500 in the browser or renders with wrong styling, missing navigation, malformed HTML, or broken links.

What to ask Buffaly: "Render the exact route locally, inspect the response and logs, validate shared navigation and internal docs links, and compare rendered output with the Razor source."

Evidence to gather: Build output, route HTTP response, server logs, browser console if relevant, generated HTML snippet, shared layout/partial paths, route/link audit output, and git diff.

Smallest safe recovery: Fix the runtime render cause: malformed Razor, an invalid partial, a route mismatch, a bad link, a missing static asset, or a stale host. Keep the fix scoped to the failing page, and change a shared layout or partial only when it is actually involved.

Verify: Build again, request the rendered route, run the docs link/nav audit, and confirm no literal markers or internal placeholder text are visible.
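Part of that audit can be automated on the rendered HTML itself. The Python sketch below flags placeholder text and collects internal /docs links to feed back into a route check. The placeholder patterns are hypothetical examples; replace them with the markers your own drafts and templates actually use.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple

# Hypothetical examples of text that should never reach a rendered page.
PLACEHOLDER_PATTERNS = [r"\bTODO\b", r"\bTBD\b", r"lorem ipsum", r"\{\{.*?\}\}", r"@Model\."]


class _DocsLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point at internal /docs routes."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith("/docs"):
                self.links.append(href)


def audit_rendered_html(page: str) -> Tuple[List[str], List[str]]:
    """Return (placeholder patterns found, internal /docs links to re-check)."""
    found = [p for p in PLACEHOLDER_PATTERNS if re.search(p, page, re.IGNORECASE)]
    collector = _DocsLinkCollector()
    collector.feed(page)
    return found, collector.links
```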

Edit was committed but is not visible on the public site

What the user sees: Git shows a commit, but the public website still shows old content or a different route state.

What to ask Buffaly: "Verify the commit contains the intended file, identify the deployed branch/build artifact, inspect deployment logs or service status, and fetch the public route to compare content."

Evidence to gather: Commit hash, git show for the changed file, branch, deployment artifact, service restart or release log, public route response, CDN/cache layer if one exists, and the current process path for self-hosted deployments.

Smallest safe recovery: Deploy or restart the correct artifact, not another edit. If the wrong branch or project is live, repoint the deployment or publish the intended build. If a cache is involved, invalidate only the affected route.

Verify: The public route contains a phrase from the committed source, the deployment log references the commit or artifact, and a fresh request returns the expected content.
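Two small helpers cover that comparison: read the file as committed with git show <commit>:<path>, and request the public route with headers that ask caches for a fresh copy. This is a Python sketch; the commit, path, and URL are yours to supply. A CDN may ignore client cache headers, so a stale result can still require invalidating the affected route.

```python
import subprocess
import urllib.request
from typing import Callable


def committed_text(commit: str, path: str, repo: str = ".",
                   run: Callable = subprocess.run) -> str:
    """Return a file's content as of `commit` via `git show <commit>:<path>`.

    `path` is relative to the repository root and uses forward slashes, even on Windows.
    """
    result = run(["git", "-C", repo, "show", f"{commit}:{path}"],
                 capture_output=True, text=True, check=True)
    return result.stdout


def fresh_request(url: str) -> urllib.request.Request:
    """Build a GET that asks intermediate caches to revalidate instead of serving a stored copy."""
    return urllib.request.Request(url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})
```

Pick a sentence from committed_text(...) that changed in the commit, fetch the route with urllib.request.urlopen(fresh_request(url)), and check that the sentence appears in the response body.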

What a good troubleshooting report includes

The symptom and exact target: route, session key, file, provider, tool, artifact, or commit.
Evidence inspected: logs, settings status, provider response, session timeline, task file, artifact, service diagnostics, build output, route response, or git diff.
The smallest recovery step taken and why broader changes were avoided.
The verification result: command, route, status code, artifact path, or commit hash that proves the fix.
Any remaining open question or risk, especially if credentials, deployment, or external provider status could not be directly verified.
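If reports are collected programmatically, the checklist above maps onto a small record type whose missing() method names any required part left empty. This Python dataclass is a hypothetical shape for such a report, not a structure Buffaly defines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TroubleshootingReport:
    symptom: str
    target: str  # route, session key, file, provider, tool, artifact, or commit
    evidence: List[str] = field(default_factory=list)
    recovery_step: str = ""
    why_not_broader: str = ""
    verification: str = ""  # command, route + status code, artifact path, or commit hash
    open_risks: List[str] = field(default_factory=list)  # optional

    def missing(self) -> List[str]:
        """Name the required parts of the report that are still empty."""
        required = {
            "symptom": self.symptom, "target": self.target, "evidence": self.evidence,
            "recovery_step": self.recovery_step, "why_not_broader": self.why_not_broader,
            "verification": self.verification,
        }
        return [name for name, value in required.items() if not value]

    def render(self) -> str:
        """Format the report as plain text in checklist order."""
        lines = [f"Symptom: {self.symptom}", f"Target: {self.target}", "Evidence inspected:"]
        lines += [f"  - {item}" for item in self.evidence]
        lines += [
            f"Recovery step: {self.recovery_step}",
            f"Why broader changes were avoided: {self.why_not_broader}",
            f"Verification: {self.verification}",
            "Open questions/risks: " + ("; ".join(self.open_risks) or "none recorded"),
        ]
        return "\n".join(lines)
```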