Release & Deploy
At a glance
Release and deploy oversight: per-app status dashboard with revision/build-id/sha, SSE-streamed canary metrics comparing canary vs stable, audited Container App rollback with persistent record on every attempt, DR drill history with two-step remediation attach, and an automatic release-notes drafter that walks GitHub PRs between two tags.
How it works
Release & Deploy is the surface that answers "did the deploy land safely?" on a single screen. The deploy status dashboard at `GET /sys/releases/status` shows the current revision, build-id, git sha, commit message, and health for each of the six apps (api, admin, app, sys, web, www). It reads from a pluggable `DeployStatusProvider` with a stub fallback, so the page renders even in dev without Azure credentials. Canary metrics at `GET /sys/releases/canary/{deployment_id}` compare error rate and P95/P99 latency between the canary and stable revisions. Samples stream over SSE at the interval set by `SYS_RELEASES_CANARY_SAMPLE_SECONDS`, and the threshold deltas are configurable so each team can pick its own promote/abort criteria.
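The canary-versus-stable comparison can be illustrated with a minimal sketch. The `RevisionSample` and `Thresholds` shapes, the field names, and the rule that a metric trips when its delta reaches the limit are assumptions made for illustration; the source only states that error rate and P95/P99 latency are compared against configurable threshold deltas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionSample:
    """One metrics sample for a revision (hypothetical shape)."""
    error_rate_pct: float  # failed requests as a percentage of all requests
    p95_ms: float
    p99_ms: float


@dataclass(frozen=True)
class Thresholds:
    """Largest canary-minus-stable delta tolerated per metric (assumed names)."""
    error_rate_pct: float
    p95_ms: float
    p99_ms: float


def evaluate_canary(canary: RevisionSample, stable: RevisionSample,
                    limits: Thresholds) -> tuple[str, list[str]]:
    """Return ("promote" or "abort", names of the metrics whose delta tripped)."""
    deltas = {
        "error_rate_pct": canary.error_rate_pct - stable.error_rate_pct,
        "p95_ms": canary.p95_ms - stable.p95_ms,
        "p99_ms": canary.p99_ms - stable.p99_ms,
    }
    # Assumption: a metric trips when its delta reaches or exceeds the limit.
    tripped = [name for name, delta in deltas.items()
               if delta >= getattr(limits, name)]
    return ("abort" if tripped else "promote"), tripped


stable = RevisionSample(error_rate_pct=1.0, p95_ms=200.0, p99_ms=450.0)
canary = RevisionSample(error_rate_pct=1.4, p95_ms=210.0, p99_ms=480.0)
limits = Thresholds(error_rate_pct=0.5, p95_ms=50.0, p99_ms=100.0)
verdict, tripped = evaluate_canary(canary, stable, limits)
```

Comparing deltas rather than absolute values keeps the verdict independent of whatever baseline load both revisions share, which is presumably why the thresholds are expressed as deltas.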
Rollback at `POST /sys/releases/{app}/rollback` is intentionally heavyweight: it requires `sys_engineer` plus fresh-auth, every attempt persists a `SysRollbackRecord` regardless of outcome, and a failed executor call returns 502 with the record id so the operator can retry from a known state. DR drill history exposes list and create endpoints (`GET` and `POST /sys/releases/dr-drills`) plus a two-step remediation attach (`POST /sys/releases/dr-drills/{id}/remediation`); it extends the disaster-recovery hardening track from PL-T081 with proper auditing. The release-notes drafter at `POST /sys/releases/notes/draft?from_tag=<>&to_tag=<>` walks the GitHub GraphQL API, groups the PRs between the two tags by label (Features, Improvements, Bug fixes, Security, Other), and emits markdown sections ready for the public release note in F21.12.
The drafter's stub fallback returns shape-correct content when `SYS_RELEASES_GITHUB_TOKEN` is absent, so the UI is testable without a token. Together with the impact-preview from F21.15 and the canary metrics from this surface, an operator can decide promote-or-rollback in seconds rather than minutes.
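The label-grouping step of the release-notes drafter might look roughly like the sketch below. The PR dictionary shape, the `LABEL_TO_SECTION` mapping, the label names, and the `##` heading level are all hypothetical; the source names only the five output sections. Fetching PRs from the GitHub GraphQL API is left out, so the function takes an already-fetched list.

```python
from collections import defaultdict

# Section order as listed in the source.
SECTIONS = ["Features", "Improvements", "Bug fixes", "Security", "Other"]

# Hypothetical label names; the real repository labels are not given.
LABEL_TO_SECTION = {
    "feature": "Features",
    "enhancement": "Improvements",
    "bug": "Bug fixes",
    "security": "Security",
}


def draft_release_notes(prs: list[dict]) -> str:
    """Group PRs by their first recognised label and render markdown sections."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for pr in prs:
        # PRs with no recognised label fall through to "Other".
        section = next(
            (LABEL_TO_SECTION[label] for label in pr["labels"]
             if label in LABEL_TO_SECTION),
            "Other",
        )
        grouped[section].append(pr)

    lines: list[str] = []
    for section in SECTIONS:
        items = grouped.get(section)
        if not items:
            continue  # skip empty sections entirely
        lines.append(f"## {section}")
        lines.extend(f"- {pr['title']} (#{pr['number']})" for pr in items)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example_prs = [
    {"number": 101, "title": "Add canary page", "labels": ["feature"]},
    {"number": 102, "title": "Fix SSE reconnect", "labels": ["bug"]},
    {"number": 103, "title": "Bump dependencies", "labels": []},
]
notes = draft_release_notes(example_prs)
```

A stub fallback in this shape could simply call `draft_release_notes` on a fixed list such as `example_prs`, which would keep the response shape identical to the token-backed path.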
Key capabilities
- Deploy status: revision, build-id, git sha, commit, health per app
- SSE-streamed canary metrics with canary-vs-stable error and latency deltas
- Container App rollback with `SysRollbackRecord` on every attempt and 502+id on failure
- DR drill history with two-step remediation attach
- Release-notes drafter walking GitHub PRs between tags, label-grouped markdown
- Stub fallbacks for status, canary, and notes when cloud credentials are absent
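The stub-fallback pattern behind the status dashboard can be sketched as a small provider interface. Only the `DeployStatusProvider` name, the six app names, and the reported fields come from the source; the `DeployStatus` dataclass, the `get_status` method, `select_provider`, and the use of `AZURE_CLIENT_ID` as the credential check are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Protocol

APPS = ("api", "admin", "app", "sys", "web", "www")


@dataclass(frozen=True)
class DeployStatus:
    """Fields the dashboard shows per app (hypothetical field names)."""
    app: str
    revision: str
    build_id: str
    git_sha: str
    commit_message: str
    health: str


class DeployStatusProvider(Protocol):
    def get_status(self, app: str) -> DeployStatus: ...


class StubDeployStatusProvider:
    """Returns shape-correct placeholder data without any cloud call."""

    def get_status(self, app: str) -> DeployStatus:
        return DeployStatus(
            app=app,
            revision=f"{app}--stub",
            build_id="stub-build",
            git_sha="0" * 40,
            commit_message="stub commit",
            health="unknown",
        )


def select_provider(
    env: Mapping[str, str],
    real_factory: Callable[[], DeployStatusProvider],
) -> DeployStatusProvider:
    """Use the real provider when credentials are present, else the stub."""
    # Assumption: a non-empty AZURE_CLIENT_ID stands in for "has credentials".
    if env.get("AZURE_CLIENT_ID"):
        return real_factory()
    return StubDeployStatusProvider()


def status_for_all_apps(provider: DeployStatusProvider) -> list[DeployStatus]:
    return [provider.get_status(app) for app in APPS]
```

Because the stub satisfies the same protocol, the handler can call `status_for_all_apps(select_provider(os.environ, make_real_provider))` in every environment, where `make_real_provider` is a hypothetical factory for the cloud-backed implementation.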
In practice
An engineer ships a new API canary at 10 percent traffic. She opens the canary metrics page, watches the SSE stream for ten minutes, and sees the canary error rate sit 0.4 percentage points above stable. The configured threshold trips at 0.5, so the canary is within bounds and she promotes.
The deploy status flips to the new revision, the dashboard system map shows green across the board, and the release-notes drafter emits the markdown for the next public release. The following week a deploy goes wrong and she clicks `Rollback`; fresh-auth gates her, the rollback record persists, and the executor call comes back 502. She reads the record id, retries, and the second attempt completes. The audit trail tells the whole story.
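The record-on-every-attempt behaviour in the rollback scenario above can be sketched as follows. The `RollbackRecord` fields, the in-memory `store`, the `executor` callable, and the 200 success status are assumptions; the source confirms only that a record persists regardless of outcome and that a failed executor call returns 502 with the record id. Role and fresh-auth checks are omitted.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RollbackRecord:
    """Illustrative stand-in for SysRollbackRecord (field names assumed)."""
    app: str
    actor: str
    outcome: str = "pending"
    error: Optional[str] = None
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


def rollback(app: str, actor: str,
             executor: Callable[[str], None],
             store: dict) -> tuple[int, dict]:
    """Attempt a rollback and return (HTTP status, response body)."""
    record = RollbackRecord(app=app, actor=actor)
    # Persist before calling the executor so even a crash leaves a trace.
    store[record.id] = record
    try:
        executor(app)
    except Exception as exc:
        record.outcome = "failed"
        record.error = str(exc)
        # Failure still returns the record id so the operator can retry.
        return 502, {"record_id": record.id, "error": record.error}
    record.outcome = "succeeded"
    return 200, {"record_id": record.id}


def failing_executor(app: str) -> None:
    raise RuntimeError("revision switch timed out")


def working_executor(app: str) -> None:
    return None


store: dict = {}
first_status, first_body = rollback("api", "engineer", failing_executor, store)
second_status, second_body = rollback("api", "engineer", working_executor, store)
```

After both calls, `store` holds two records, one failed and one succeeded, which mirrors the audit trail described in the scenario.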
Features in this subsystem
| ID | Status | Features |
|---|---|---|
| F21.18.01 | Shipped | Deploy status dashboard: current revision, build-id, git-sha, commit message, and health per app (api, admin, app, sys, web, www). GET /sys/releases/status; pluggable DeployStatusProvider with stub fallback. Implemented (PL-T138) |
| F21.18.02 | Shipped | Canary metrics: per-rollout error-rate + P95/P99 latency comparing canary vs stable revisions. GET /sys/releases/canary/{deployment_id}; SSE-streamed samples at SYS_RELEASES_CANARY_SAMPLE_SECONDS; threshold deltas configurable. Implemented (PL-T138) |
| F21.18.03 | Shipped | Rollback trigger: Container App revision switch, POST /sys/releases/{app}/rollback. sys_engineer + fresh-auth; every attempt persists a SysRollbackRecord regardless of outcome; failed executor calls return 502 with the record id. Implemented (PL-T138) |
| F21.18.04 | Shipped | DR drill history: GET /sys/releases/dr-drills list + POST /sys/releases/dr-drills create + POST /sys/releases/dr-drills/{id}/remediation two-step remediation attach; sys_engineer + fresh-auth for writes. Extends PL-T081. Implemented (PL-T138) |
| F21.18.05 | Shipped | Release-notes drafter: POST /sys/releases/notes/draft?from_tag=<>&to_tag=<>; walks PRs between tags via GitHub GraphQL, groups by label into Features/Improvements/Bug fixes/Security/Other markdown sections; stub fallback when SYS_RELEASES_GITHUB_TOKEN is absent. Implemented (PL-T138) |