Recovery states, logs, and troubleshooting

Live. This area is documented as current, user-reliable behavior.

Goal

Use the resource operation-state model and logs together to diagnose operational problems accurately.

Prerequisites

A failing or recovering resource helps make the guide concrete.

Workflow

Read the current operation state before retrying anything.

Pair the state with recent logs and the last surfaced error.

Use resource-level guides when the problem is obviously project-, stack-, node-, or database-specific.

The operation state on a resource

Resources carry an operation state describing the in-flight or last operation, so a stuck or failed resource is more than a red badge: the state tells you what the platform was doing and how far it got.

status: pending, in_progress, waiting_reconcile, completed, failed, or aborted.
step: which part of the operation it is on.
attempt_count and retryable: how many times it has tried and whether it will retry on its own.
last_error and error category: the most recent failure reason.
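
As a rough illustration, the fields above could be modeled as follows. This is a minimal sketch, not the platform's actual schema: the class names, the `error_category` key, the payload shape, and the `provision` step name are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperationStatus(str, Enum):
    """The six status values listed in this guide."""
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    WAITING_RECONCILE = "waiting_reconcile"
    COMPLETED = "completed"
    FAILED = "failed"
    ABORTED = "aborted"


@dataclass
class OperationState:
    status: OperationStatus
    step: str                             # which part of the operation it is on
    attempt_count: int                    # how many times it has tried
    retryable: bool                       # whether it will retry on its own
    last_error: Optional[str] = None      # most recent failure reason
    error_category: Optional[str] = None  # hypothetical key name


def parse_operation_state(payload: dict) -> OperationState:
    """Build an OperationState from a dict; the payload shape is assumed."""
    return OperationState(
        status=OperationStatus(payload["status"]),  # raises ValueError if unknown
        step=payload.get("step", ""),
        attempt_count=int(payload.get("attempt_count", 0)),
        retryable=bool(payload.get("retryable", False)),
        last_error=payload.get("last_error"),
        error_category=payload.get("error_category"),
    )


def summarize(state: OperationState) -> str:
    """One-line summary suitable for a ticket or chat message."""
    retry = "retryable" if state.retryable else "not retryable"
    line = f"{state.status.value} at step '{state.step}' (attempt {state.attempt_count}, {retry})"
    if state.last_error:
        line += f": {state.last_error}"
    return line


# "provision" is a made-up step name used only for illustration.
example = parse_operation_state({
    "status": "failed",
    "step": "provision",
    "attempt_count": 3,
    "retryable": False,
    "last_error": "timeout",
})
print(summarize(example))
```

Parsing the status into an enum means an unexpected value fails loudly rather than being silently treated as healthy.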

Pair the state with logs

The operation state tells you where and why something stalled; the logs tell you what actually happened. A retryable failure usually means a missing prerequisite the platform expects to clear; a non-retryable failure or a waiting_reconcile state often needs you to look before acting. If the logs point at an application or runtime problem rather than a platform step, fix it at the resource, not by re-running the operation.
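
The guidance above can be sketched as a small decision helper. This is an illustrative sketch only: the function name, the returned action labels, and the `logs_indicate_app_issue` flag (something you decide after reading the logs) are assumptions, not part of the platform.

```python
def next_action(status: str, retryable: bool, logs_indicate_app_issue: bool) -> str:
    """Suggest what to do next from the operation state plus a reading of the logs.

    Returns one of these hypothetical labels:
    "fix_at_resource", "wait", "none", "wait_for_retry", "investigate".
    """
    # An application or runtime problem is fixed at the resource,
    # not by re-running the platform operation.
    if logs_indicate_app_issue:
        return "fix_at_resource"
    if status in ("pending", "in_progress"):
        return "wait"
    if status == "completed":
        return "none"
    # waiting_reconcile and aborted should not be left unattended.
    if status in ("waiting_reconcile", "aborted"):
        return "investigate"
    if status == "failed":
        # A retryable failure usually means a prerequisite the platform
        # expects to clear; a non-retryable one needs a look before acting.
        return "wait_for_retry" if retryable else "investigate"
    raise ValueError(f"unknown status: {status!r}")


print(next_action("failed", retryable=False, logs_indicate_app_issue=False))
```

Checking the log-derived flag first reflects the point that re-running an operation does not fix an application problem, whatever the operation state says.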

Incidents

Operational problems can also be tracked as incidents, which move through open, acknowledged, investigating, resolved, and dismissed. The ops AI agent can pull your open incidents to help triage them — see the AI agents guide.
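
For illustration, the incident statuses above can be modeled as an enum with a filter for the open ones. The `Incident` shape and the `open_incidents` helper are hypothetical; only the five status names come from this guide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class IncidentStatus(str, Enum):
    """The five incident statuses named in this guide."""
    OPEN = "open"
    ACKNOWLEDGED = "acknowledged"
    INVESTIGATING = "investigating"
    RESOLVED = "resolved"
    DISMISSED = "dismissed"


@dataclass
class Incident:
    # Hypothetical minimal shape; real incidents likely carry more fields.
    id: str
    title: str
    status: IncidentStatus


def open_incidents(incidents: List[Incident]) -> List[Incident]:
    """Return only incidents still in the open status, preserving order."""
    return [i for i in incidents if i.status is IncidentStatus.OPEN]


sample = [
    Incident("inc-1", "Database unreachable", IncidentStatus.OPEN),
    Incident("inc-2", "Build failed", IncidentStatus.RESOLVED),
]
print([i.id for i in open_incidents(sample)])
```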

Expected result

From a resource's operation state and logs, you can tell what failed, how far the operation got, and whether to wait for a retry or fix the underlying cause.

Common failures

Retrying a non-retryable failure instead of resolving the underlying cause.
A waiting_reconcile or aborted operation left unattended.
Logs show an app/runtime issue but the operation gets re-run as if it were a platform failure.

Related guides

Operations overview

The operations area is the cross-resource health and runtime visibility surface for StackShift.

StackShift AI agents

StackShift runs specialized agents that can create projects, fix failed builds, manage databases, triage incidents, and operate WordPress — each one proposing a confirmable action before anything changes.

Project troubleshooting

Common project-side failure modes, especially when the app built successfully but does not come up healthy.
