Stack troubleshooting

Live. This area documents current behavior that users can rely on.

Goal

Troubleshoot a stack as a system of services, not as a single container.

Prerequisites

An existing stack

Workflow

Start with the stack detail page, health state, and recent logs.

Check placement and node health when the stack never stabilizes.

Use recovery-state messaging for backup, restore, or template-upgrade issues.
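Recovery-state messaging is most useful when you read every field of the operation state rather than only its status. The sketch below models those fields (status, current step, attempt count, retryable flag, and last error) as a small Python dataclass. The class name, field names, status strings, and step names are hypothetical illustrations, not the product's actual API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationState:
    """Hypothetical model of a resource's operation state (names are illustrative)."""
    status: str                       # assumed values, e.g. "running", "failed"
    current_step: str                 # the step the operation is on, or stopped at
    attempt_count: int                # how many times the operation has been tried
    retryable: bool                   # whether a retry is expected to help
    last_error: Optional[str] = None  # most recent error message, if any


def describe(state: OperationState) -> str:
    """Summarize the whole operation state instead of a single status badge."""
    summary = (f"{state.status} at step '{state.current_step}' "
               f"(attempt {state.attempt_count})")
    if state.last_error:
        summary += f": {state.last_error}"
    if state.status == "failed":
        # A retryable failure calls for a different response than a terminal one.
        if state.retryable:
            summary += " [retryable]"
        else:
            summary += " [not retryable: investigate before retrying]"
    return summary
```

For example, `describe(OperationState("failed", "archive_volumes", 3, False, "volume archive failed"))` returns `"failed at step 'archive_volumes' (attempt 3): volume archive failed [not retryable: investigate before retrying]"`. A bare "failed" badge would hide that detail.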

Runtime vs placement

A stack that crashes or restarts repeatedly usually has a runtime problem: read the logs and per-service health first.
A stack that never stabilizes often has a placement problem: it is pinned to an unhealthy node, or it uses least_loaded placement and no node matches its selector tags.
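The runtime-vs-placement rule of thumb can be sketched as a rough triage function. This is a minimal sketch under stated assumptions. The `Node` record, its fields, and the `classify` helper are hypothetical. A node is assumed to match a selector only when it carries every selector tag. "Least loaded" is assumed to mean the lowest numeric load among healthy matching nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical node record; field names are illustrative assumptions."""
    name: str
    healthy: bool
    load: float
    tags: Set[str] = field(default_factory=set)


def least_loaded_node(nodes: List[Node], selector: Set[str]) -> Optional[Node]:
    """Pick the least-loaded healthy node that carries every selector tag.

    Returns None when no healthy node matches: the case where a
    least_loaded stack never lands anywhere.
    """
    candidates = [n for n in nodes if n.healthy and selector <= n.tags]
    return min(candidates, key=lambda n: n.load, default=None)


def classify(restarting: bool, pinned_node: Optional[Node],
             nodes: List[Node], selector: Set[str]) -> str:
    """Rough triage: 'runtime', 'placement', or 'unknown'."""
    if restarting:
        # Crashes and restarts: read logs and per-service health first.
        return "runtime"
    if pinned_node is not None:
        # A pinned stack can only land on its pinned node.
        return "placement" if not pinned_node.healthy else "unknown"
    if least_loaded_node(nodes, selector) is None:
        return "placement"
    return "unknown"
```

With nodes `a` (healthy, tags `{"gpu"}`) and `b` (unhealthy, tags `{"gpu", "ssd"}`), a stack selecting `{"gpu", "ssd"}` has no candidate node. `classify` therefore reports `"placement"` rather than pointing you at the logs.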

Template drift and upgrades

Template-backed stacks report one of four upgrade statuses: up_to_date, update_available, upgrade_blocked, or unknown. An upgrade_blocked status means the upgrade cannot be applied safely as-is. Preview an upgrade before applying it, and remember that a template upgrade can be rolled back.
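The four status values above can be mapped to a suggested next step. In the sketch below, only the status strings come from this guide. The enum, the `next_step` helper, and the wording of each action are illustrative assumptions, not product behavior.

```python
from enum import Enum


class UpgradeStatus(Enum):
    """The four upgrade statuses a template-backed stack can report."""
    UP_TO_DATE = "up_to_date"
    UPDATE_AVAILABLE = "update_available"
    UPGRADE_BLOCKED = "upgrade_blocked"
    UNKNOWN = "unknown"


def next_step(status: str) -> str:
    """Map a reported status string to a suggested next step (illustrative only)."""
    s = UpgradeStatus(status)  # raises ValueError for an unrecognized status
    if s is UpgradeStatus.UP_TO_DATE:
        return "no action"
    if s is UpgradeStatus.UPDATE_AVAILABLE:
        return "preview the upgrade, then apply it; roll back if needed"
    if s is UpgradeStatus.UPGRADE_BLOCKED:
        return "preview the upgrade and resolve what blocks it before applying"
    return "investigate the stack before upgrading"
```

Parsing through the enum means a typo or an unexpected status fails loudly instead of silently falling through to a default action.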

Backup and restore failures

A failed volume archive leaves the backup incomplete, so restore only from a completed backup.
A node agent that is too old to support the volume endpoints makes backup or restore fail; upgrade the agent on that node.
Follow the recovery-state messaging for the whole restore instead of assuming the status badge alone means success.
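The rule "only restore from a completed backup" amounts to checking that every named-volume archive in the backup completed. In the sketch below, the `VolumeArchive` record and its state strings are hypothetical. Treating a backup with no archives as not restorable is a conservative assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VolumeArchive:
    """Hypothetical record for one named-volume archive in a backup."""
    volume: str
    state: str  # assumed values: "completed", "failed", "in_progress"


def restorable(archives: List[VolumeArchive]) -> bool:
    """A backup is safe to restore only if every volume archive completed."""
    return bool(archives) and all(a.state == "completed" for a in archives)


def failed_volumes(archives: List[VolumeArchive]) -> List[str]:
    """Names of volumes whose archive failed, to investigate before retrying."""
    return [a.volume for a in archives if a.state == "failed"]
```

A single failed archive is enough to make the whole backup unsafe. `failed_volumes` tells you which volume to look at, for example on a node whose agent may need upgrading.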

Expected result

You can tell whether the problem is runtime, placement, recovery, or template-related.

Common failures

No healthy node matches the stack selector tags, so placement never lands.
A template upgrade reports upgrade_blocked and cannot apply without changes.
A volume archive failed, so the backup is not safe to restore from.

Related guides

Stack logs, health, and placement

Use the stack detail, logs, and placement information to understand how the stack is actually running.

Back up and restore a stack

Use S3-backed named-volume archives to protect and recover stateful stack data.

Recovery states, logs, and troubleshooting

Read a resource's operation state (its status, current step, attempt count, retryable flag, and last error) together with its logs, instead of treating a single “error” badge as the whole story.
