Admin Tools — Health, Failure Trend, Job Detail
Three read-only observability endpoints power the admin notification operations view. Health reports subsystem checks (stuck jobs, queue depth, rollup freshness, dead-job watchdog). Failure trend returns hourly failure counts for the last N days. Job detail returns the full row for one NotificationJob. They shipped in Wave C4 (job detail, failure-trend page wiring) on top of the pre-existing health and rollup infrastructure.
All three endpoints require the notification:read permission, applied at the controller class level; there is no per-endpoint permission override. They share the throttle used by the rest of the admin notification surface (200 requests per minute).
GET /admin/notifications/health — subsystem health
Returns aggregated notification health. Four checks, each { ok: boolean; detail: string }. The top-level healthy is true only when all checks pass.
Response — ApiResponseOf<NotificationHealthResponseDto>
{
  "success": true,
  "data": {
    "healthy": true,
    "checks": {
      "stuckJobs": { "ok": true, "detail": "0 stuck jobs" },
      "queueDepth": { "ok": true, "detail": "42 pending" },
      "rollupFreshness": { "ok": true, "detail": "3min ago" },
      "recentDeadJobs": { "ok": true, "detail": "0 dead in last hour" }
    }
  }
}

Check semantics
| Check | Source | Pass condition |
|---|---|---|
| stuckJobs | NotificationJob where status='PROCESSING' and startedAt < now() - 10min | count === 0; a stuck job indicates that a worker died mid-claim |
| queueDepth | NotificationJob where status='PENDING' | count < 10000; alerting threshold for backlog |
| rollupFreshness | NotificationEventRollup ORDER BY updatedAt DESC LIMIT 1 | latest row's updatedAt > now() - 15min; indicates the rollup cron is healthy |
| recentDeadJobs | NotificationJob where status='DEAD' and updatedAt > now() - 1h | count === 0; any DEAD job in the last hour is a regression signal |
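
The pass conditions in the table can be read as a pure function over four measurements. The sketch below is illustrative only: `HealthSnapshot`, `evaluateHealth`, the detail-string wording, and the choice to fail `rollupFreshness` when no rollup row exists are all assumptions, not the service's actual code. Only the thresholds come from the table.

```typescript
// Hypothetical input: the four measurements the checks are based on.
interface HealthSnapshot {
  stuckJobCount: number;     // PROCESSING jobs with startedAt older than 10 minutes
  pendingJobCount: number;   // PENDING jobs
  lastRollupAt: Date | null; // newest NotificationEventRollup.updatedAt, if any
  deadJobsLastHour: number;  // DEAD jobs updated within the last hour
}

interface Check {
  ok: boolean;
  detail: string;
}

type CheckName = 'stuckJobs' | 'queueDepth' | 'rollupFreshness' | 'recentDeadJobs';

interface HealthResult {
  healthy: boolean;
  checks: Record<CheckName, Check>;
}

const QUEUE_DEPTH_LIMIT = 10000;          // from the table: count < 10000
const ROLLUP_MAX_AGE_MS = 15 * 60 * 1000; // from the table: 15 minutes

function evaluateHealth(s: HealthSnapshot, now: Date = new Date()): HealthResult {
  // ASSUMPTION: a missing rollup row is treated as infinitely stale.
  const rollupAgeMs =
    s.lastRollupAt === null ? Infinity : now.getTime() - s.lastRollupAt.getTime();

  const checks: Record<CheckName, Check> = {
    stuckJobs: { ok: s.stuckJobCount === 0, detail: `${s.stuckJobCount} stuck jobs` },
    queueDepth: {
      ok: s.pendingJobCount < QUEUE_DEPTH_LIMIT,
      detail: `${s.pendingJobCount} pending`,
    },
    rollupFreshness: {
      ok: rollupAgeMs < ROLLUP_MAX_AGE_MS,
      detail:
        s.lastRollupAt === null
          ? 'no rollup rows'
          : `${Math.floor(rollupAgeMs / 60000)}min ago`,
    },
    recentDeadJobs: {
      ok: s.deadJobsLastHour === 0,
      detail: `${s.deadJobsLastHour} dead in last hour`,
    },
  };

  // Top-level flag is true only when every individual check passes.
  const healthy = Object.values(checks).every((c) => c.ok);
  return { healthy, checks };
}
```

Passing `now` explicitly keeps the function deterministic, which makes the thresholds easy to unit-test without touching the database.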
Errors
| HTTP | Reason |
|---|---|
| 401 | Missing / invalid admin session |
| 403 | Permission denied (notification:read required) |
Side effects
None. 4 read-only count / findFirst queries.
GET /admin/notifications/failure-trend — hourly fails
Returns failure counts grouped by hour for the last N days (1 ≤ N ≤ 30, default 7). Data is sourced from NotificationEventRollup, which an hourly cron pre-aggregates, so query cost scales with the number of rollup rows in the requested window rather than with raw event volume.
Query
| Param | Type | Validation | Notes |
|---|---|---|---|
| days | string (parsed int) | Parsed with `parseInt(days)`; 1 ≤ N ≤ 30 | Optional; defaults to 7 |
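
As a rough illustration of the `days` handling, the sketch below parses the raw query string, falls back to the documented default, and keeps the result inside the documented range. `parseDays` is a hypothetical helper, and clamping out-of-range values (rather than rejecting them) is an assumption; the source only states the range and the default.

```typescript
const DEFAULT_DAYS = 7; // documented default
const MIN_DAYS = 1;     // documented lower bound
const MAX_DAYS = 30;    // documented upper bound

// Hypothetical helper, not the controller's actual code.
// ASSUMPTION: out-of-range values are clamped into [1, 30] instead of rejected.
function parseDays(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  if (Number.isNaN(parsed)) {
    return DEFAULT_DAYS; // missing or non-numeric input falls back to the default
  }
  return Math.min(MAX_DAYS, Math.max(MIN_DAYS, parsed));
}
```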
Response — ArrayApiResponseOf<FailureTrendItemDto>
{
  "success": true,
  "data": [
    { "hour": "2026-06-22T14:00:00.000Z", "count": 3 },
    { "hour": "2026-06-22T15:00:00.000Z", "count": 0 },
    { "hour": "2026-06-22T16:00:00.000Z", "count": 1 }
  ]
}

| Field | Type | Notes |
|---|---|---|
| hour | string (ISO 8601) | Hour bucket: NotificationEventRollup.hourBucket (UTC, truncated to the hour) |
| count | number | Sum across failed, bounced, dead statuses in that hour |
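
The two fields can be illustrated with a small sketch: truncate a timestamp to its UTC hour, then sum the three failure statuses per bucket. `RollupRow` and its field names (`failed`, `bounced`, `dead`) are assumptions made for illustration, as are `toHourBucket` and `toFailureTrend`; the real rollup schema may differ.

```typescript
// Hypothetical shape of one pre-aggregated row; field names are assumptions.
interface RollupRow {
  hourBucket: Date;
  failed: number;
  bounced: number;
  dead: number;
}

interface FailureTrendItem {
  hour: string;
  count: number;
}

// Truncate a timestamp to the start of its UTC hour without mutating the input.
function toHourBucket(d: Date): Date {
  const truncated = new Date(d.getTime());
  truncated.setUTCMinutes(0, 0, 0);
  return truncated;
}

// Sum failed + bounced + dead per hour bucket and return the buckets in
// chronological order (ISO 8601 UTC strings sort lexicographically).
function toFailureTrend(rows: RollupRow[]): FailureTrendItem[] {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const hour = toHourBucket(row.hourBucket).toISOString();
    totals.set(hour, (totals.get(hour) ?? 0) + row.failed + row.bounced + row.dead);
  }
  return Array.from(totals.entries())
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([hour, count]) => ({ hour, count }));
}
```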
Errors
| HTTP | Reason |
|---|---|
| 401 | Missing / invalid admin session |
| 403 | Permission denied (notification:read required) |
Side effects
None. Single NotificationEventRollup.findMany over the rolling window.
GET /admin/notifications/jobs/:id — single job detail
Returns the full NotificationJob row for one job ID. Used by the admin queue table's row-detail drill-down.
Path parameters
| Param | Type | Notes |
|---|---|---|
| id | string (UUID) | NotificationJob.id. Validated via ParseUUIDPipe. |
Response — ApiResponseOf<NotificationJobItemDto>
{
  "success": true,
  "data": {
    "id": "nj-uuid",
    "queue": "notification:email",
    "jobName": "send-email",
    "channel": "EMAIL",
    "status": "COMPLETED",
    "attempts": 1,
    "maxAttempts": 3,
    "lastError": null,
    "scheduledAt": "2026-06-22T14:00:00.000Z",
    "startedAt": "2026-06-22T14:00:00.500Z",
    "completedAt": "2026-06-22T14:00:01.200Z",
    "createdAt": "2026-06-22T13:59:59.000Z"
  }
}

The full Prisma row is returned, including payload (job parameters), result (response from provider), latencyMs (Wave E Batch 2 observability), costUsd (reserved, always null), and dedupeKey. The example above shows only the typed surface declared by NotificationJobItemDto; the additional Prisma fields are present in the raw response. See Observability for latencyMs / costUsd semantics.
Errors
| HTTP | Reason / i18nKey |
|---|---|
| 401 | Missing / invalid admin session |
| 403 | Permission denied (notification:read required) |
| 404 | error.notification.job_not_found: no NotificationJob row with the supplied ID |
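
A client can branch on the status codes in the table before it tries to read the response body. The sketch below is one possible mapping; `JobLookupOutcome` and `classifyJobLookup` are hypothetical names, and the shape of the error response body is deliberately not assumed, since the source documents only the status codes and the i18n key.

```typescript
// Hypothetical result type for a job-detail lookup.
type JobLookupOutcome =
  | { kind: 'ok' }
  | { kind: 'unauthenticated' }
  | { kind: 'forbidden' }
  | { kind: 'not_found'; i18nKey: 'error.notification.job_not_found' }
  | { kind: 'unexpected'; status: number };

// Map an HTTP status code to an outcome using only the documented codes.
// Anything undocumented is surfaced as 'unexpected' rather than guessed at.
function classifyJobLookup(status: number): JobLookupOutcome {
  if (status >= 200 && status < 300) {
    return { kind: 'ok' };
  }
  switch (status) {
    case 401:
      return { kind: 'unauthenticated' }; // missing / invalid admin session
    case 403:
      return { kind: 'forbidden' }; // notification:read required
    case 404:
      return { kind: 'not_found', i18nKey: 'error.notification.job_not_found' };
    default:
      return { kind: 'unexpected', status };
  }
}
```

A caller would typically pass `res.status` from `fetch` and only parse the JSON envelope when the outcome is `ok`.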
Side effects
None. Single notificationJob.findUnique.
Code samples
# Health snapshot — auto-refresh every 30s in the admin UI
curl 'https://api.bio.re/api/v1/admin/notifications/health' \
-H 'Cookie: admin_session=...'
# Failure trend for the last 30 days
curl 'https://api.bio.re/api/v1/admin/notifications/failure-trend?days=30' \
-H 'Cookie: admin_session=...'
# Single job detail (deep-link from queue table)
curl 'https://api.bio.re/api/v1/admin/notifications/jobs/nj-uuid' \
-H 'Cookie: admin_session=...'

type HealthResponse = {
healthy: boolean;
checks: Record<string, { ok: boolean; detail: string }>;
};
type FailureTrendItem = { hour: string; count: number };
type JobDetail = {
id: string;
queue: string;
jobName: string;
channel: string;
status: string;
attempts: number;
maxAttempts: number;
lastError: string | null;
scheduledAt: string | null;
startedAt: string | null;
completedAt: string | null;
createdAt: string;
// Wave E Batch 2 observability columns
latencyMs: number | null;
costUsd: string | null;
};
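// Shared helper: sends the admin session cookie, throws on a non-2xx status,
// and unwraps the `data` field of the response envelope.
async function getData<T>(url: string): Promise<T> {
  const res = await fetch(url, { credentials: 'include' });
  if (!res.ok) {
    throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  }
  const body = (await res.json()) as { data: T };
  return body.data;
}
async function getHealth(): Promise<HealthResponse> {
  return getData<HealthResponse>('/api/v1/admin/notifications/health');
}
async function getFailureTrend(days = 7): Promise<FailureTrendItem[]> {
  return getData<FailureTrendItem[]>(`/api/v1/admin/notifications/failure-trend?days=${days}`);
}
async function getJob(id: string): Promise<JobDetail> {
  // Encode the ID so it is always a single, safe path segment.
  return getData<JobDetail>(`/api/v1/admin/notifications/jobs/${encodeURIComponent(id)}`);
}

Source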
| Source | Path | Lines |
|---|---|---|
| Controller (GET /health) | apps/api-core/src/modules/notification/admin-notification.controller.ts | 185–192 |
| Controller (GET /failure-trend) | apps/api-core/src/modules/notification/admin-notification.controller.ts | 225–232 |
| Controller (GET /jobs/:id) | apps/api-core/src/modules/notification/admin-notification.controller.ts | 367–375 |
| Response DTO (health) | apps/api-core/src/modules/notification/dto/admin-notification-response.dto.ts | 9–15 (NotificationHealthResponseDto) |
| Response DTO (failure-trend item) | apps/api-core/src/modules/notification/dto/admin-notification-response.dto.ts | 87–93 (FailureTrendItemDto) |
| Response DTO (job item) | apps/api-core/src/modules/notification/dto/admin-notification-response.dto.ts | 125–161 (NotificationJobItemDto) |
| Service (getHealth) | apps/api-core/src/modules/notification/admin-notification.service.ts | 181–224 |
| Service (getFailureTrend) | apps/api-core/src/modules/notification/admin-notification.service.ts | 290–310 |
| Service (getJobDetail) | apps/api-core/src/modules/notification/admin-notification.service.ts | 641–645 |
| Prisma model | packages/prisma/prisma/schema.prisma | NotificationJob (2203), NotificationEventRollup (1805) |
| Live response | NOT verified; sourced from the DTOs and service shapes cited above | |