Goal
Use the resource operation-state model and logs together to diagnose operational problems accurately.
Prerequisites
- A failing or recovering resource to inspect. This is optional, but it makes the guide concrete.
Workflow
The operation state on a resource
Each resource carries an operation state describing its in-flight or most recent operation, so a stuck or failed resource is more than a red badge. The state tells you what the platform was doing and how far it got.
- status: pending, in_progress, waiting_reconcile, completed, failed, or aborted.
- step: which part of the operation it is on.
- attempt_count and retryable: how many times it has tried and whether it will retry on its own.
- last_error and error category: the most recent failure reason.
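To make the fields above concrete, here is a minimal sketch of the operation state as a Python record. It is an illustration under assumptions, not StackShift's actual schema or API: the class names, the `error_category` field name, the step name in the usage example, and the `summarize` and `needs_manual_review` helpers are all hypothetical. Only the status values and the field meanings come from this guide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    # Status values as listed in this guide.
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    WAITING_RECONCILE = "waiting_reconcile"
    COMPLETED = "completed"
    FAILED = "failed"
    ABORTED = "aborted"


@dataclass
class OperationState:
    # Hypothetical record shape; real field names may differ.
    status: Status
    step: str                 # which part of the operation it is on
    attempt_count: int        # how many times it has tried
    retryable: bool           # whether it will retry on its own
    last_error: Optional[str] = None
    error_category: Optional[str] = None  # assumed name for "error category"


def summarize(state: OperationState) -> str:
    """Return a one-line summary of where the operation stands."""
    line = f"{state.status.value} at step '{state.step}' (attempt {state.attempt_count})"
    if state.status is Status.FAILED:
        retry = "will retry" if state.retryable else "will not retry on its own"
        line += f", {retry}: {state.last_error or 'no error recorded'}"
    return line


def needs_manual_review(state: OperationState) -> bool:
    """Heuristic: non-retryable failures and waiting_reconcile deserve a look."""
    if state.status is Status.WAITING_RECONCILE:
        return True
    return state.status is Status.FAILED and not state.retryable
```

For example, with a made-up step name:

```python
state = OperationState(Status.FAILED, "provision_database", 3, False,
                       last_error="connection refused")
print(summarize(state))
print(needs_manual_review(state))
```

Keeping the review heuristic separate from the summary reflects the idea that the state alone only says where to look. The decision to act still depends on what the logs show.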
Pair the state with logs
The operation state tells you where and why something stalled; the logs tell you what actually happened. A retryable failure usually means a missing prerequisite the platform expects to clear; a non-retryable failure or a waiting_reconcile state often needs you to look before acting. If the logs point at an application or runtime problem rather than a platform step, fix it at the resource, not by re-running the operation.
Incidents
Operational problems can also be tracked as incidents, which move through open, acknowledged, investigating, resolved, and dismissed. The ops AI agent can pull your open incidents to help triage them; see the AI agents guide.
Expected result
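Recovery and failure states become actionable instead of vague: for a stuck or failed resource, you can say which step it stopped at, why, and whether it will retry on its own.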
Common failures
Related guides
Operations overview
The operations area is the cross-resource health and runtime visibility surface for StackShift.
StackShift AI agents
StackShift runs specialized agents that can create projects, fix failed builds, manage databases, triage incidents, and operate WordPress — each one proposing a confirmable action before anything changes.
Project troubleshooting
Common project-side failure modes, especially when the app built successfully but does not come up healthy.