Observability — NotificationJob columns
How latencyMs and costUsd are populated on NotificationJob rows (Wave E Batch 2). Covers when workers write them, which queries consume them, and why costUsd remains null today.
NotificationJob.latencyMs and NotificationJob.costUsd are the two observability columns added in Wave E Batch 2 (migration 20260622000000_notification_job_latency_cost, 2026-06-22). They power the admin notification analytics dashboard's per-channel p50/p95 latency view and reserve a write slot for future synchronous-cost providers without another schema migration.
Per-attempt, not per-job. latencyMs measures the gap between startedAt (set at the PROCESSING claim) and the terminal write (COMPLETED / DEAD). Each PENDING → PROCESSING claim resets startedAt, so latencyMs reflects the final attempt only, not the cumulative time since the job first entered PENDING. Dashboard percentiles (p50 / p95) are therefore per-attempt, not per-job end-to-end.
costUsd is currently always null. Reserved column. SMS cost arrives async via Twilio / Vonage DLR webhooks and lands in SmsCostLog (see SMS Cost Log). Email cost is computed analytically from NotificationEvent counts × per-1k rate. Push (FCM) does not bill per message. If a future synchronous-cost provider is added, the column is ready to receive writes without a migration.
Column shape
NotificationJob (table) — both columns nullable, no backfill.
| Column | SQL type | Nullability | Written by |
|---|---|---|---|
| latencyMs | INTEGER | nullable | Worker on every terminal transition (COMPLETED / DEAD). Null for rows still in PENDING / PROCESSING / FAILED-retrying. |
| costUsd | DECIMAL(10, 6) | nullable | NOT written by current workers. Reserved. |
Precision matches SmsCostLog.cost so a future writer can mirror values without floating-point conversion loss. Twilio reports per-segment cost with up to 6-decimal precision (sub-cent micros).
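One way a future writer could avoid floating-point loss is to carry amounts as integer micro-dollars, which map one-to-one onto the six decimal places of DECIMAL(10, 6). The sketch below is illustrative only: `usdStringToMicros` and `microsToUsdString` are hypothetical helpers (not part of the codebase), and it assumes amounts arrive as non-negative decimal strings.

```typescript
// Hypothetical helpers: represent DECIMAL(10, 6) USD amounts as integer micro-dollars.
const MICROS_PER_USD = 1000000;

// Parse a non-negative decimal string with at most 4 integer and 6 fractional digits.
function usdStringToMicros(value: string): number {
  const match = /^(\d{1,4})(?:\.(\d{1,6}))?$/.exec(value);
  if (!match) {
    throw new Error("not a DECIMAL(10, 6) value: " + value);
  }
  const whole = Number(match[1]);
  const fractionDigits = match[2] ?? "";
  // Scale the fraction up to exactly six places, e.g. "0075" -> 75 * 100 = 7500.
  const fraction = Number(fractionDigits) * Math.pow(10, 6 - fractionDigits.length);
  return whole * MICROS_PER_USD + fraction;
}

// Format integer micro-dollars back into a six-decimal string.
function microsToUsdString(micros: number): string {
  const whole = Math.floor(micros / MICROS_PER_USD);
  const fraction = String(micros % MICROS_PER_USD).padStart(6, "0");
  return whole + "." + fraction;
}

// Three SMS segments at an assumed 0.0075 USD each, summed without any float arithmetic.
const segmentMicros = usdStringToMicros("0.0075");
const totalUsd = microsToUsdString(segmentMicros * 3);
```

Integer micros stay exact well past the column's maximum value, so sums and comparisons need no rounding before the value is written.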
When workers write latencyMs
Every channel worker computes latencyMs = Date.now() - job.startedAt and persists it as part of the terminal notificationJob.update call:
| Channel | Terminal status | Write site |
|---|---|---|
| Push | COMPLETED (FCM dispatch resolved, regardless of success/failure breakdown) | apps/worker-service/src/jobs/send-push.ts:574–584 |
| Push | DEAD (exhausted retry budget) | apps/worker-service/src/jobs/send-push.ts:609–619 |
| Push | COMPLETED (no tokens — skipped shortcut) | apps/worker-service/src/jobs/send-push.ts:257–267 |
| Push | COMPLETED (provider unconfigured — skipped shortcut) | apps/worker-service/src/jobs/send-push.ts:318–328 |
| Email | COMPLETED (delivered) | apps/worker-service/src/jobs/send-email.ts:527–528 |
| Email | COMPLETED (suppressed — recipient on EmailSuppressionList) | apps/worker-service/src/jobs/send-email.ts:337–340 |
| Email | COMPLETED (probe path — CB half-open SET NX winner) | apps/worker-service/src/jobs/send-email.ts:394–401 |
| Email | DEAD (exhausted retry budget) | apps/worker-service/src/jobs/send-email.ts:555–565 |
| SMS | COMPLETED (provider acknowledged accept) | apps/worker-service/src/jobs/send-sms.ts:106 (see SMS cost caveat below) |
Retry semantics. When a job is bouncing through FAILED → PENDING retries, latencyMs stays null on every intermediate write; only the eventual COMPLETED or DEAD write persists a value. Each PENDING → PROCESSING claim resets startedAt, so the persisted latencyMs is the duration of the attempt that landed, not the sum of attempts.
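The per-attempt behaviour can be sketched as a small in-memory state machine. This is a simplified model and not the worker code: `JobRow`, `claim`, `failForRetry`, and `finish` are hypothetical names, timestamps are plain epoch milliseconds, and the FAILED → PENDING hop is collapsed into a single step.

```typescript
type JobStatus = "PENDING" | "PROCESSING" | "FAILED" | "COMPLETED" | "DEAD";

// Hypothetical, trimmed-down view of a NotificationJob row.
interface JobRow {
  status: JobStatus;
  startedAt: number | null; // epoch ms (assumption; the real column type is not shown here)
  latencyMs: number | null;
}

// PENDING -> PROCESSING claim: startedAt is reset for the new attempt.
function claim(job: JobRow, nowMs: number): JobRow {
  return { ...job, status: "PROCESSING", startedAt: nowMs };
}

// Retryable failure: the row goes back to PENDING and latencyMs is left as-is (null).
function failForRetry(job: JobRow): JobRow {
  return { ...job, status: "PENDING" };
}

// Terminal write: the only transition that sets latencyMs.
function finish(job: JobRow, status: "COMPLETED" | "DEAD", nowMs: number): JobRow {
  if (job.startedAt === null) {
    throw new Error("job was never claimed");
  }
  return { ...job, status, latencyMs: nowMs - job.startedAt };
}

// Walk-through: the first attempt fails, the second succeeds 250 ms after its claim.
let job: JobRow = { status: "PENDING", startedAt: null, latencyMs: null };
job = claim(job, 1000);
job = failForRetry(job);
const latencyAfterFailedAttempt = job.latencyMs; // still null
job = claim(job, 61000);
job = finish(job, "COMPLETED", 61250);
const finalLatencyMs = job.latencyMs; // 250, not the 60250 ms since the first claim
```

In this model the 60 seconds spent waiting for the retry never shows up in latencyMs, which is why the dashboard figures should be read as per-attempt.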
Consumers
| Consumer | Purpose | Path |
|---|---|---|
| Admin notification dashboard | Per-channel p50 / p95 latency tiles | Reads NotificationJob rows via apps/api-core/src/modules/notification/admin-notification.service.ts |
| Health check | stuckJobs detection (PROCESSING jobs whose startedAt is more than 10 minutes old) | getHealth() — see Admin Tools |
| Failure-trend chart | Counts failed / bounced / dead per hour bucket from NotificationEventRollup (NOT NotificationJob.latencyMs) | getFailureTrend() — see Admin Tools |
Latency aggregations exclude null rows — historical jobs (pre-2026-06-22 migration), in-flight jobs, and PENDING-retry jobs do not bias percentile calculations.
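As an illustration of the null handling, the sketch below drops null latencies before computing a percentile. It is not the dashboard query: `latencyPercentile` is a hypothetical helper, and the nearest-rank method is an assumption, since the method the service actually uses is not documented here.

```typescript
// Nearest-rank percentile over non-null latencies.
// Returns null when no row carries a latency value.
function latencyPercentile(latencies: Array<number | null>, p: number): number | null {
  const values = latencies
    .filter((v): v is number => v !== null) // skip historical, in-flight, and retrying rows
    .sort((a, b) => a - b);
  if (values.length === 0) {
    return null;
  }
  // Nearest rank: the smallest value with at least p% of the samples at or below it.
  const rank = Math.max(Math.ceil((p / 100) * values.length), 1);
  return values[rank - 1] ?? null;
}

// Two of the six rows have no latency yet and are ignored.
const sampleLatencies: Array<number | null> = [120, null, 80, 200, null, 100];
const p50 = latencyPercentile(sampleLatencies, 50); // 100
const p95 = latencyPercentile(sampleLatencies, 95); // 200
```

Treating null as zero instead would pull both percentiles down, which is the bias the exclusion avoids.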
Why costUsd is reserved
Three channels cover the active notification surface (email, push, SMS). None of their providers returns synchronous cost data at worker send time:
| Provider | Cost reporting |
|---|---|
| SendGrid / Resend (email) | No per-message cost. Spend = (count / 1,000) × contract rate per 1k (see SMS / Email cost endpoints). |
| Firebase FCM (push) | $0 per message — Google does not bill per push. |
| Twilio / Vonage (SMS) | Cost is per-segment, delivered asynchronously via the StatusCallback / DLR webhook into SmsCostLog, joined to NotificationJob by providerMessageId. |
costUsd therefore stays null on every current write. The column exists so a future provider that returns cost in the send response (Postmark, custom MTA, etc.) can populate it without a schema change.
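To make the two out-of-band cost paths concrete, here is a minimal sketch of how email spend and per-job SMS cost could be derived without costUsd. All names (`JobRef`, `SmsCostEntry`, `emailSpendMicros`, `smsCostByJob`) are hypothetical, amounts are integer micro-dollars for exactness, and the sample rate is invented for the example. Only the join key providerMessageId comes from the description above.

```typescript
// Hypothetical, trimmed-down row shapes.
interface JobRef {
  id: string;
  providerMessageId: string | null;
}

interface SmsCostEntry {
  providerMessageId: string;
  costMicros: number; // SmsCostLog.cost expressed as integer micro-dollars (assumption)
}

// Email: analytic spend = (event count / 1000) * contract rate per 1k, rounded to the nearest micro.
function emailSpendMicros(eventCount: number, ratePer1kMicros: number): number {
  return Math.round((eventCount * ratePer1kMicros) / 1000);
}

// SMS: sum the asynchronous cost entries per message, then attach them to jobs by providerMessageId.
function smsCostByJob(jobs: JobRef[], logs: SmsCostEntry[]): Map<string, number> {
  const costByMessage = new Map<string, number>();
  for (const log of logs) {
    const previous = costByMessage.get(log.providerMessageId) ?? 0;
    costByMessage.set(log.providerMessageId, previous + log.costMicros);
  }
  const costByJobId = new Map<string, number>();
  for (const job of jobs) {
    if (job.providerMessageId === null) continue; // no provider id yet, nothing to join
    const cost = costByMessage.get(job.providerMessageId);
    if (cost !== undefined) costByJobId.set(job.id, cost);
  }
  return costByJobId;
}

// 2,500 email events at an invented rate of 0.40 USD per 1k.
const emailSpend = emailSpendMicros(2500, 400000); // 1000000 micros = 1 USD

// One job with two billed segments, one job with no provider id.
const smsCosts = smsCostByJob(
  [{ id: "job-1", providerMessageId: "SM1" }, { id: "job-2", providerMessageId: null }],
  [{ providerMessageId: "SM1", costMicros: 7500 }, { providerMessageId: "SM1", costMicros: 7500 }]
);
```

Neither path needs a value at send time, which is consistent with leaving costUsd unwritten until a provider returns cost in the send response.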
Source
| Source | Path | Lines |
|---|---|---|
| Migration | packages/prisma/prisma/migrations/20260622000000_notification_job_latency_cost/migration.sql | 22–24 (additive ADD COLUMN) |
| Prisma model | packages/prisma/prisma/schema.prisma | NotificationJob.latencyMs (2241), NotificationJob.costUsd (2242) |
| Push worker writes | apps/worker-service/src/jobs/send-push.ts | 257, 318, 574, 609 |
| Email worker writes | apps/worker-service/src/jobs/send-email.ts | 337, 394, 527, 555 |
| SMS worker writes | apps/worker-service/src/jobs/send-sms.ts | 106 |
| Live query examples | NOT verified (schema sourced from the migration and Prisma model cited above) | |
Notification Events Registry
Catalog of every notification event the platform dispatches — event key, channel defaults, trigger source, and whether each event also fires an outbound webhook. Covers the 12 Wave D events shipped 2026-06-22 (security, money, lifecycle, compliance).
Broadcast (admin) — Submit, queue, approve, reject
Four admin endpoints for sending platform-wide or segmented broadcasts (e.g. system_maintenance, marketing_campaign). Behind a two-person approval gate in production. Submit → pending queue → second admin approves or rejects.