# Stack troubleshooting

> Common stack-side failures around placement, logs, health, template drift, and restore behavior.

<Tip>
  **Live.** This page documents current behavior that users can rely on.
</Tip>

## Goal

Troubleshoot a stack as a system of services, not as a single container.

## Prerequisites

* An existing stack

## Workflow

<Steps>
  <Step>
    Start with the stack detail page, health state, and recent logs.
  </Step>

  <Step>
    Check placement and node health when the stack never stabilizes.
  </Step>

  <Step>
    Use recovery-state messaging for backup, restore, or template-upgrade issues.
  </Step>
</Steps>

## Runtime vs placement

* A stack that crashes or restarts repeatedly usually has a runtime problem. Read the logs and per-service health first.
* A stack that never stabilizes often has a placement problem: it is pinned to an unhealthy node, or it is set to `least_loaded` and no healthy node matches its selector tags.
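
The two rules above can be sketched as a small triage function. This is an illustration only: the `Node` and `Stack` classes, the `restart_count` field, the restart threshold, the `"pinned"` label, and the "every selector tag must be present" matching rule are all assumptions made for the example, not part of the product API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    healthy: bool
    tags: Set[str] = field(default_factory=set)


@dataclass
class Stack:
    placement: str                          # "pinned" or "least_loaded" (assumed labels)
    pinned_node: Optional[str] = None       # only meaningful when placement == "pinned"
    selector_tags: Set[str] = field(default_factory=set)
    restart_count: int = 0                  # hypothetical count of recent restarts


def classify(stack: Stack, nodes: List[Node]) -> str:
    """Return a rough first guess: 'runtime', 'placement', or 'unclear'."""
    # Repeated crashes or restarts point at the services themselves.
    if stack.restart_count >= 3:            # threshold chosen arbitrarily for the sketch
        return "runtime"

    if stack.placement == "pinned":
        target = next((n for n in nodes if n.name == stack.pinned_node), None)
        if target is None or not target.healthy:
            return "placement"

    if stack.placement == "least_loaded":
        # Assumption: a node matches when it carries every selector tag.
        candidates = [n for n in nodes if n.healthy and stack.selector_tags <= n.tags]
        if not candidates:
            return "placement"

    return "unclear"
```

A result of `"unclear"` means neither rule fired, so the logs and per-service health are still the next place to look.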

## Template drift and upgrades

Template-backed stacks report one of four upgrade statuses: `up_to_date`, `update_available`, `upgrade_blocked`, or `unknown`. An `upgrade_blocked` status means the upgrade cannot be applied safely as-is. Preview the upgrade before applying it, and remember that a template upgrade can be rolled back.
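
One way to keep the four statuses straight is a lookup from status to next step. The sketch below uses the status strings listed above; the `UpgradeStatus` enum, the `next_step` helper, and the wording of each suggestion are hypothetical. The guidance for `unknown` in particular is an assumption, since this page does not prescribe an action for it.

```python
from enum import Enum


class UpgradeStatus(Enum):
    UP_TO_DATE = "up_to_date"
    UPDATE_AVAILABLE = "update_available"
    UPGRADE_BLOCKED = "upgrade_blocked"
    UNKNOWN = "unknown"


# Suggested next step per status (wording is illustrative).
NEXT_STEP = {
    UpgradeStatus.UP_TO_DATE: "Nothing to apply.",
    UpgradeStatus.UPDATE_AVAILABLE: "Preview the upgrade, then apply it.",
    UpgradeStatus.UPGRADE_BLOCKED: "Preview the upgrade to see what blocks it; do not apply as-is.",
    UpgradeStatus.UNKNOWN: "Status could not be determined; investigate before upgrading.",
}


def next_step(raw_status: str) -> str:
    """Map a raw status string to a suggested action, treating unrecognized values as unknown."""
    try:
        status = UpgradeStatus(raw_status)
    except ValueError:
        status = UpgradeStatus.UNKNOWN
    return NEXT_STEP[status]
```

Falling back to `UNKNOWN` for unrecognized strings keeps the helper from raising on a status value it has not seen before.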

## Backup and restore failures

* A failed volume archive leaves the backup incomplete. Restore only from a completed backup.
* A node agent that is too old to support the volume endpoints fails backup and restore. Upgrade the node agent.
* Follow the recovery-state messaging throughout a restore; the status badge alone does not confirm success.
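
The first two checks above can be expressed as a preflight before a restore. This is a minimal sketch under stated assumptions: the `Backup` class, the `"completed"` status string, the dotted numeric version format, and the `min_agent_version` parameter are hypothetical. The page does not state the actual minimum agent version, so the caller has to supply it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Backup:
    status: str                                   # e.g. "completed" (assumed label)
    failed_archives: List[str] = field(default_factory=list)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def restore_blockers(backup: Backup, agent_version: str, min_agent_version: str) -> List[str]:
    """Return the reasons a restore should not start; an empty list means no blocker was found."""
    blockers = []
    # A failed volume archive leaves the backup incomplete.
    if backup.status != "completed" or backup.failed_archives:
        blockers.append("backup is incomplete")
    # An agent older than the (caller-supplied) minimum cannot use the volume endpoints.
    if parse_version(agent_version) < parse_version(min_agent_version):
        blockers.append("node agent is too old")
    return blockers
```

An empty result only clears these two checks. The recovery-state messaging still needs to be followed while the restore runs.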

## Expected result

<Check>
  You can tell whether the problem is runtime, placement, recovery, or template-related.
</Check>

## Common failures

<Warning>
  * No healthy node matches the stack selector tags, so placement never lands.
  * A template upgrade reports `upgrade_blocked` and cannot apply without changes.
  * A volume archive failed, so the backup is not safe to restore from.
</Warning>

## Related guides

<CardGroup cols={2}>
  <Card title="Stack logs, health, and placement" href="/stacks/logs-health-and-placement">
    Use the stack detail, logs, and placement information to understand how the stack is actually running.
  </Card>

  <Card title="Back up and restore a stack" href="/stacks/back-up-and-restore-a-stack">
    Use S3-backed named-volume archives to protect and recover stateful stack data.
  </Card>

  <Card title="Recovery states, logs, and troubleshooting" href="/operations/recovery-logs-and-troubleshooting">
    Read the operation state on a resource (its status, current step, attempt count, retryable flag, and last error) together with logs, instead of treating a single “error” badge as the whole story.
  </Card>
</CardGroup>
