Goal
Troubleshoot stacks as service systems, not just as single containers.
Prerequisites
- An existing stack
Workflow
Runtime vs placement
- A stack that crashes or restarts repeatedly is usually a runtime problem — read the logs and per-service health first.
- A stack that never stabilizes is often placement: pinned to an unhealthy node, or least_loaded with no node matching its selector tags.
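The runtime-versus-placement split above can be sketched as a small triage function. This is only an illustration: the StackSnapshot fields, the restart threshold of 3, and the returned strings are hypothetical and do not come from the product's API. Only the two placement causes (pinned to an unhealthy node, least_loaded with no matching selector tags) are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class StackSnapshot:
    # All field names are hypothetical, chosen only for this sketch.
    restart_count: int
    placement_mode: str                 # "pinned" or "least_loaded"
    pinned_node_healthy: bool = True
    selector_tags: set = field(default_factory=set)
    node_tags: list = field(default_factory=list)  # one tag set per candidate node


RESTART_THRESHOLD = 3  # assumed cutoff for "restarts repeatedly"


def triage(s: StackSnapshot) -> str:
    # Repeated crashes or restarts usually point at a runtime problem.
    if s.restart_count >= RESTART_THRESHOLD:
        return "runtime: read the logs and per-service health first"
    # Otherwise look for the two placement causes named above.
    if s.placement_mode == "pinned" and not s.pinned_node_healthy:
        return "placement: pinned to an unhealthy node"
    if s.placement_mode == "least_loaded":
        # A node qualifies only if it carries every selector tag.
        if not any(s.selector_tags <= tags for tags in s.node_tags):
            return "placement: no node matches the selector tags"
    return "unclear: keep checking logs and placement details"
```

For example, a least_loaded stack with selector tag "gpu" and only a node tagged "cpu" is classified as a placement problem, while the same stack with five restarts is sent to the logs first.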
Template drift and upgrades
Template-backed stacks report an upgrade status: up_to_date, update_available, upgrade_blocked, or unknown. An upgrade_blocked status means the upgrade cannot apply safely as-is. Preview the upgrade before applying, and remember that a template upgrade can be rolled back.
Backup and restore failures
- A failed volume archive leaves the backup incomplete — only restore from a completed backup.
- An agent that is too old for the volume endpoints will fail backup or restore; upgrade the node agent.
- Watch recovery-state messaging through a restore instead of assuming the status badge alone means success.
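The first two checks above can be expressed as a pre-restore guard. This is a minimal sketch under assumptions: the Backup class, its status values, and the agent_supports_volumes flag are hypothetical names invented for illustration, not fields the product is known to expose.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    # Hypothetical shape; status values are assumed for this sketch.
    status: str                  # e.g. "completed", "failed", "running"
    failed_volumes: tuple = ()   # names of volumes whose archive failed


def can_restore_from(backup: Backup, agent_supports_volumes: bool):
    """Return (ok, reasons); ok is True only when no blocker was found."""
    reasons = []
    if backup.status != "completed":
        reasons.append("backup is not completed")
    if backup.failed_volumes:
        # A failed volume archive leaves the backup incomplete.
        reasons.append("volume archives failed: " + ", ".join(backup.failed_volumes))
    if not agent_supports_volumes:
        reasons.append("node agent is too old for the volume endpoints; upgrade it")
    return (not reasons, reasons)
```

A guard like this only decides whether a restore should start. It does not replace watching the recovery-state messages while the restore runs.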
Expected result
You can tell whether the problem is runtime, placement, recovery, or template-related.
Common failures
Related guides
Stack logs, health, and placement
Use the stack detail, logs, and placement information to understand how the stack is actually running.
Back up and restore a stack
Use S3-backed named-volume archives to protect and recover stateful stack data.
Recovery states, logs, and troubleshooting
Read the operation state on a resource — its status, current step, attempt count, retryable flag, and last error — together with logs, instead of treating a single “error” badge as the whole story.