# Recovery states, logs, and troubleshooting

> Read the operation state on a resource — its status, current step, attempt count, retryable flag, and last error — together with logs, instead of treating a single “error” badge as the whole story.

<Tip>
  **Live.** This page documents current behavior that you can rely on.
</Tip>

## Goal

Use the resource operation-state model and logs together to diagnose operational problems accurately.

## Prerequisites

* A resource that is currently failing or recovering. This is optional, but it makes the guide concrete.

## Workflow

<Steps>
  <Step>
    Read the current operation state before retrying anything.
  </Step>

  <Step>
    Pair the state with recent logs and the last surfaced error.
  </Step>

  <Step>
    Use resource-level guides when the problem is obviously project-, stack-, node-, or database-specific.
  </Step>
</Steps>

## The operation state on a resource

Resources carry an operation state describing the in-flight or last operation, so a stuck or failed resource is not just a red badge. It tells you what the platform was doing and how far it got.

* status: pending, in\_progress, waiting\_reconcile, completed, failed, or aborted.
* step: which part of the operation it is on.
* attempt\_count and retryable: how many times it has tried and whether it will retry on its own.
* last\_error and error category: the most recent failure reason.
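As a rough illustration, the fields above can be modeled as a small record. This is a minimal sketch, not the platform's schema: the class names, the `error_category` field name, the `summarize` helper, and the example step name are all assumptions made for this example. Only the status values and the other field names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperationStatus(str, Enum):
    # Status values as listed in this guide.
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    WAITING_RECONCILE = "waiting_reconcile"
    COMPLETED = "completed"
    FAILED = "failed"
    ABORTED = "aborted"


@dataclass
class OperationState:
    status: OperationStatus
    step: str                              # which part of the operation it is on
    attempt_count: int                     # how many times it has tried
    retryable: bool                        # whether it will retry on its own
    last_error: Optional[str] = None       # most recent failure reason, if any
    error_category: Optional[str] = None   # hypothetical field name


def summarize(state: OperationState) -> str:
    """Render the whole state as one line, not just a pass/fail badge."""
    parts = [
        f"{state.status.value} at step '{state.step}'",
        f"attempt {state.attempt_count}",
        "retryable" if state.retryable else "not retryable",
    ]
    if state.last_error:
        parts.append(f"last error: {state.last_error}")
    return ", ".join(parts)


# Hypothetical example values, for illustration only.
example = OperationState(
    status=OperationStatus.FAILED,
    step="provision_volume",
    attempt_count=3,
    retryable=False,
    last_error="quota exceeded",
    error_category="quota",
)
print(summarize(example))
```

Reading all of the fields in one line makes it harder to miss that a failure is non-retryable or that it has already been attempted several times.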

## Pair the state with logs

The operation state tells you where and why something stalled; the logs tell you what actually happened. A retryable failure usually means a missing prerequisite the platform expects to clear; a non-retryable failure or a waiting\_reconcile state often needs you to look before acting. If the logs point at an application or runtime problem rather than a platform step, fix it at the resource, not by re-running the operation.
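The guidance above can be sketched as a small decision helper. This is only an illustration under stated assumptions: `suggest_next_action` and its return strings are hypothetical, the `logs_point_at_app` flag stands in for your own reading of the logs, and the handling of pending, in\_progress, and completed is an assumption that goes beyond what this guide states.

```python
def suggest_next_action(status: str, retryable: bool, logs_point_at_app: bool) -> str:
    """Map an operation state plus a log reading to a suggested next step.

    Hypothetical helper; the rules mirror the prose in this guide.
    """
    # An application or runtime problem is fixed at the resource,
    # not by re-running the operation.
    if logs_point_at_app:
        return "fix the application or runtime issue at the resource"

    # A retryable failure usually means a prerequisite the platform
    # expects to clear on its own.
    if status == "failed" and retryable:
        return "wait for the platform to retry, then re-check"

    # Non-retryable failures and waiting_reconcile need a look before acting.
    # Treating aborted the same way is an assumption based on the
    # "Common failures" list.
    if status in ("waiting_reconcile", "aborted") or status == "failed":
        return "investigate the last error and logs before acting"

    # Assumption: in-flight operations are left to run.
    if status in ("pending", "in_progress"):
        return "let the operation continue"

    return "no action needed"


print(suggest_next_action("failed", retryable=False, logs_point_at_app=False))
```

The log check comes first on purpose: if the logs show an application problem, re-running the operation does not help regardless of what the status says.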

## Incidents

Operational problems can also be tracked as incidents, which move through open, acknowledged, investigating, resolved, and dismissed. The ops AI agent can pull your open incidents to help triage them — see the AI agents guide.
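For a quick overview during triage, incidents can be tallied by state. The sketch below assumes incidents are available as a list of dictionaries with a `status` key; that shape and the `count_by_state` helper are hypothetical. Only the five state names come from this guide.

```python
from collections import Counter

# Incident states as listed in this guide, in the order given.
INCIDENT_STATES = ("open", "acknowledged", "investigating", "resolved", "dismissed")


def count_by_state(incidents: list[dict]) -> dict[str, int]:
    """Count incidents per state, keeping the lifecycle order.

    Assumes each incident is a dict with a "status" key (hypothetical shape).
    """
    counts = Counter(incident["status"] for incident in incidents)
    unknown = set(counts) - set(INCIDENT_STATES)
    if unknown:
        raise ValueError(f"unexpected incident states: {sorted(unknown)}")
    return {state: counts.get(state, 0) for state in INCIDENT_STATES}


# Hypothetical sample data.
sample = [
    {"id": "inc-1", "status": "open"},
    {"id": "inc-2", "status": "investigating"},
    {"id": "inc-3", "status": "open"},
    {"id": "inc-4", "status": "resolved"},
]
print(count_by_state(sample))
```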

## Expected result

<Check>
  From the operation state and logs together, you can tell what the platform was doing, why it stopped, and whether to wait, investigate, or fix the resource itself.
</Check>

## Common failures

<Warning>
  * Retrying a non-retryable failure instead of resolving the underlying cause.
  * Leaving a waiting\_reconcile or aborted operation unattended.
  * Re-running the operation as if it were a platform failure when the logs show an application or runtime issue.
</Warning>

## Related guides

<CardGroup cols={2}>
  <Card title="Operations overview" href="/operations/operations-overview">
    The operations area is the cross-resource health and runtime visibility surface for StackShift.
  </Card>

  <Card title="StackShift AI agents" href="/ai-features/ai-agents">
    StackShift runs specialized agents that can create projects, fix failed builds, manage databases, triage incidents, and operate WordPress — each one proposing a confirmable action before anything changes.
  </Card>

  <Card title="Project troubleshooting" href="/projects/project-troubleshooting">
    Common project-side failure modes, especially when the app built successfully but does not come up healthy.
  </Card>
</CardGroup>
