Critical Path Observability

A standards-first observability contract for checkout flows — stitching browser actions through GraphQL, Rails, and Sidekiq into one traceable causal chain.

v1 · Checkout Golden Path · W3C Trace Context · Causality-First · 5-PR Rollout
Questions this system answers
When a checkout breaks or is slow, you need to trace the full causal chain — not just a single layer.

What action caused this work?

Link every backend request and worker job back to the specific user click or page load that triggered it.

What GraphQL request did it trigger?

See the exact operation name, route, and whether it was a query or mutation across the private/public boundary.

Where did the latency happen?

Distinguish request execution time, Sidekiq queue wait, and worker execution time to pinpoint the bottleneck.

Which checkout was affected?

Join across checkout_request_id, receipt_id, and payment_intent_id — all correlated.

Two-Lane Contract
Transport and causality travel together but serve different purposes. Neither replaces the other.
Lane 1 — Transport

Distributed Trace Continuity

  • traceparent — W3C trace + span ID
  • tracestate — vendor state
  • baggage — strict allowlisted keys only
Lane 2 — Causality & Business

Product Meaning & Joinability

  • whop.origin_action_id — causal root
  • Controlled action semantics
  • Business IDs attached at trusted hops
Primary Causality Key
whop.origin_action_id
Primary Business Join Key
checkout_request_id
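For reference, the traceparent value carried in Lane 1 follows the W3C Trace Context layout: version, trace ID, parent (span) ID, and flags, all lowercase hex. The following is a minimal parsing sketch, not the project's helper; the function name parse_traceparent is hypothetical, and it only accepts the four-field shape used by version 00.

```ruby
# Sketch: parse a W3C traceparent header of the form
# version-trace_id-parent_id-flags (lowercase hex fields).
TRACEPARENT_PATTERN =
  /\A([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/

# Returns a Hash of fields, or nil when the header is missing or invalid.
# parse_traceparent is a hypothetical name used for illustration.
def parse_traceparent(header)
  match = TRACEPARENT_PATTERN.match(header.to_s.strip)
  return nil unless match

  version, trace_id, parent_id, flags = match.captures
  return nil if version == "ff"        # version ff is invalid per the spec
  return nil if trace_id == "0" * 32   # all-zero trace ID is invalid
  return nil if parent_id == "0" * 16  # all-zero parent ID is invalid

  {
    version: version,
    trace_id: trace_id,
    parent_id: parent_id,
    sampled: (flags.to_i(16) & 0x01) == 1 # lowest flag bit marks "sampled"
  }
end
```

A header that fails to parse would be treated as absent, so the server starts a fresh trace rather than continuing a malformed one.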
The Golden Path
A checkout action flows through 7 hops. Each hop enriches the trace with new context.
  1. Browser Action: user clicks "Pay", minting origin_action_id
  2. Frontend Producer: attaches x-whop-* headers + baggage
  3. Private GraphQL: Rails normalizes trace & trust policy
  4. Request Execution: mutation runs, attaches business IDs
  5. Sidekiq Enqueue: serializes job["whop_trace"]
  6. Sidekiq Dequeue: restores trace context for immediate jobs
  7. Worker Execution: checkout processed; full lineage visible

flowchart LR
    A["🖱 Browser action"] --> B["⚡ Frontend producer"]
    B --> C["🛡 Private GraphQL ingress"]
    C --> D["⚙ Rails execution"]
    D --> E["📤 Sidekiq enqueue"]
    E --> F["📥 Sidekiq dequeue"]
    F --> G["💳 Checkout worker"]

    A -. "whop.origin_action_id" .-> B
    B -. "traceparent / tracestate / baggage" .-> C
    E -. "job[whop_trace] v1" .-> F

    C --> H["📊 New Relic traces"]
    C --> I["📈 api.request_served"]
    C --> J["📋 MutationTracking"]
    G --> H
      
Three Systems, Shared Vocabulary
This spec doesn't collapse existing systems — it gives them a common join key so a checkout issue can be followed across all three.
Execution Truth

New Relic

Traces, logs, span timing, request/worker lineage, async boundary latency

Product Truth

ClickHouse

User/business reporting, funnel analysis, cohort analysis, step counts

Failure Truth

Sentry

Exceptions, stack traces, breadcrumbs, error-centric debugging

💡 All three systems get whop.origin_action_id as a shared join key. One checkout issue = one ID across New Relic, ClickHouse, and Sentry.
Trust Policy by Route Class
Not all ingress is equal. Private first-party paths get full continuity; public paths restart the trace.
Boundary | Trust | Trace Policy | Causality
Browser → Frontend Producer | First-party | Agent-first; app augments | Mint or reuse action context
Browser → Hosted Checkout Handler | Mixed | Proxy/producer specific | Preserve same action unless new
Frontend/Next → Private GraphQL | Trusted | Continue normalized trace | Preserve action context
Public GraphQL / REST | Broader | Restart server trace | Preserve allowed causality only
Rails → Sidekiq Enqueue | Trusted | Immediate-only continuation | Serialize small v1 payload
Sidekiq Dequeue → Worker | Trusted | Continue immediate; else restart | Always preserve action ID
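The ingress rows of the route-class policy above reduce to a small lookup. The sketch below is illustrative only: the route-class symbols, the constant name, and the choice to restart the trace for unknown route classes are assumptions, not part of the spec.

```ruby
# Hypothetical route classes and policy values, named for illustration only.
TRACE_POLICY_BY_ROUTE_CLASS = {
  private_first_party: :continue, # Frontend/Next -> private GraphQL
  public_graphql:      :restart,
  public_rest:         :restart
}.freeze

# Assumption: unknown route classes fall back to :restart,
# the more conservative of the two policies.
def trace_policy_for(route_class)
  TRACE_POLICY_BY_ROUTE_CLASS.fetch(route_class, :restart)
end

# Returns the inbound traceparent to continue from,
# or nil when the server should start a fresh trace.
def inbound_traceparent(route_class, headers)
  return nil unless trace_policy_for(route_class) == :continue

  headers["traceparent"]
end
```

Causality fields are handled separately from this decision: a restarted trace on a public path can still carry whatever causality the policy allows.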
Telemetry Attributes
Semconv-style naming with whop.* namespace. Headers stay x-whop-*.
W3C Transport
traceparent, tracestate, baggage
HTTP & GraphQL
http.route, graphql.operation.name, graphql.operation.type
Whop Causality
whop.origin_action_id, whop.action_kind, whop.action_name, whop.request_origin, whop.surface_area, whop.product_type, whop.platform
Business Identifiers
checkout_request_id, checkout_session_id, checkout_id, receipt_id, payment_intent_id, request_id
Forbidden in Baggage
whop.origin_action_id, user_id, company_id, full URLs, search params, marketing click IDs
Action Context Model
Every emitted action context gets an ID — explicit actions, mounts, invalidations, and refreshes alike.

explicit_user_action

User clicks, submits, or explicitly triggers

mount

Component mounts and fetches initial data

query_invalidation

Cache invalidation triggers a refetch

background_refresh

Periodic or stale-while-revalidate refresh

// ActionContext reference type
{
  "origin_action_id": "act_01JPD3V0W9S0M8Q8F7M5JX4A2Z",
  "action_kind": "explicit_user_action",
  "action_name": "checkout_submit",
  "request_origin": "browser",
  "surface_area": "checkout",
  "product_type": "payin",
  "platform": "web"
}
Example Action Names (v1)
  • checkout_initial_load — page mount
  • checkout_submit — user clicks Pay
  • payment_method_update — user changes payment method
  • checkout_post_submit_refresh — post-submit polling
Action Lifecycle Rules
  • Explicit retries reuse the same action ID
  • Background refresh must not silently inherit the prior explicit action
  • action_name is categorical and controlled, not free-form
  • One user click = one action ID (not multiple unrelated IDs)
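The controlled vocabulary and the lifecycle rules above can be sketched as two small checks. This is a hedged illustration: the browser producer is the component that actually mints IDs, the function names are hypothetical, and the generated ID format (act_ plus random hex) is a placeholder because the real ID scheme is not specified here.

```ruby
require "securerandom"

# Values taken from the action kinds and v1 example action names above.
ACTION_KINDS = %w[
  explicit_user_action mount query_invalidation background_refresh
].freeze
ACTION_NAMES = %w[
  checkout_initial_load checkout_submit
  payment_method_update checkout_post_submit_refresh
].freeze

# Returns a list of problems with an action context Hash (empty when valid).
def action_context_errors(ctx)
  errors = []
  errors << "missing origin_action_id" if ctx["origin_action_id"].to_s.empty?
  errors << "unknown action_kind" unless ACTION_KINDS.include?(ctx["action_kind"])
  errors << "unknown action_name" unless ACTION_NAMES.include?(ctx["action_name"])
  errors
end

# Explicit retries reuse the prior ID; every other case mints a new one,
# so a background refresh never inherits the prior explicit action.
# The ID format below is a placeholder assumption.
def next_action_id(kind:, retry_of: nil)
  return retry_of if kind == "explicit_user_action" && retry_of

  "act_#{SecureRandom.hex(13)}"
end
```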
Progressive Attribute Attachment
Don't pretend ingress knows all business IDs. Fields attach when they become authoritative.

Frontend Action

origin_action_id action_kind action_name

GraphQL Ingress

http.route graphql.operation.name graphql.operation.type

Mutation / Service

checkout_request_id checkout_session_id

Worker / Deep Service

checkout_id receipt_id payment_intent_id
Sidekiq Transport Container
One reserved field — job["whop_trace"] — carries versioned trace + causality metadata across the async boundary.
// job["whop_trace"] — v1 schema
{
  "v": 1,
  "traceparent": "00-<trace-id>-<span-id>-01",
  "tracestate": "vendor-state-if-present",
  "baggage": "whop.request_origin=browser,whop.action_kind=explicit_user_action,...",
  "origin_action_id": "act_01JPD3V0W9S0M8Q8F7M5JX4A2Z",
  "request_id": "req_123",
  "graphql_operation_name": "ProcessCheckout"
}
Client Middleware (enqueue)
  • Read normalized request/job context
  • Build small v1 payload
  • Keep W3C transport for immediate continuation
  • Add origin_action_id + request metadata
  • Write job["whop_trace"]
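The enqueue steps above can be sketched as a Sidekiq client middleware. This is a minimal illustration under stated assumptions: the class name and the context_provider callable are hypothetical, and the normalized context is assumed to be a Hash keyed by the v1 schema field names.

```ruby
# Sketch of a Sidekiq client middleware. Sidekiq invokes
# call(job_class, job, queue, redis_pool); the middleware must yield
# and return the block's result so the push continues.
class WhopTraceClientMiddleware
  # context_provider is a hypothetical callable returning the normalized
  # request/job context as a Hash, or nil when no context is available.
  def initialize(context_provider)
    @context_provider = context_provider
  end

  def call(_job_class, job, _queue, _redis_pool)
    ctx = @context_provider.call
    if ctx
      # Build the small v1 payload; nil fields are dropped by compact.
      job["whop_trace"] = {
        "v" => 1,
        "traceparent" => ctx["traceparent"],
        "tracestate" => ctx["tracestate"],
        "baggage" => ctx["baggage"],
        "origin_action_id" => ctx["origin_action_id"],
        "request_id" => ctx["request_id"],
        "graphql_operation_name" => ctx["graphql_operation_name"]
      }.compact
    end
    yield
  end
end
```

Registration would go through Sidekiq's client middleware chain (chain.add with the class and its constructor argument); where that configuration lives is left to local convention.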
Server Middleware (dequeue)
  • Read and validate job["whop_trace"]
  • Immediate continuation → restore trace
  • Delayed retry → start fresh trace
  • Always preserve origin_action_id
  • Emit degraded-lineage marker on failure
Immediate-only continuity is the v1 rule. Delayed retries and manual requeues always start a fresh trace but keep origin_action_id for causal linkage.
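The dequeue rules above can be sketched as a pure decision function plus a thin Sidekiq server middleware. Several details are assumptions made for illustration: the presence of Sidekiq's retry_count key stands in for "delayed retry" (manual requeues would need a different signal), a missing or invalid payload is treated as degraded lineage, and the result is stashed in a thread-local named :whop_lineage.

```ruby
# Decide how a dequeued job should relate to its originating trace.
# Returns a Hash with :mode (:continue or :fresh), :origin_action_id,
# and :degraded (true when lineage could not be restored as expected).
def restore_lineage(job)
  payload = job["whop_trace"]
  # Always preserve the action ID when one is recoverable.
  action_id = payload.is_a?(Hash) ? payload["origin_action_id"] : nil
  valid = payload.is_a?(Hash) && payload["v"] == 1

  return { mode: :fresh, origin_action_id: action_id, degraded: true } unless valid

  # Assumption: a retry_count key marks a retried (non-immediate) job.
  retried = job.key?("retry_count")
  if !retried && payload["traceparent"]
    { mode: :continue, traceparent: payload["traceparent"],
      origin_action_id: action_id, degraded: false }
  else
    # Retries start fresh by design; an immediate job without a
    # traceparent is a restore failure.
    { mode: :fresh, origin_action_id: action_id, degraded: !retried }
  end
end

# Sidekiq server middleware receives (job_instance, job, queue) and yields.
class WhopTraceServerMiddleware
  def call(_job_instance, job, _queue)
    Thread.current[:whop_lineage] = restore_lineage(job)
    yield
  ensure
    Thread.current[:whop_lineage] = nil
  end
end
```

A real middleware would hand the :continue result to the tracing agent and emit the degraded-lineage marker when :degraded is true; both steps are omitted here because they depend on vendor APIs.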
Ingress Normalization Algorithm
A small helper reads, normalizes, and applies trust policy to every inbound request.
flowchart TB
    A["Read headers: traceparent, tracestate, baggage, x-whop-*"] --> B{"Route class?"}
    B -->|Private first-party| C["Continue normalized upstream trace"]
    B -->|Public / REST| D["Restart server trace"]
    C --> E["Parse allowlisted baggage keys"]
    D --> E
    E --> F["Derive server-side fields"]
    F --> G["Resolve precedence:\nserver > x-whop-* > baggage"]
    G --> H["Build request observability context"]
    H --> I["Attach to traces, logs, request context"]
    I --> J["Return for request + enqueue usage"]
      
Precedence Rules

When fields disagree:

  • 1st priority: Server-derived values (most trustworthy)
  • 2nd priority: Normalized x-whop-* header values
  • 3rd priority: Baggage values (lowest trust)
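The precedence order above amounts to taking, for each field, the first non-empty value from the most trusted source. A minimal sketch follows, assuming the three inputs are Hashes already keyed by canonical attribute names; the function name resolve_fields is hypothetical.

```ruby
# Resolve each field by precedence:
# server-derived values, then x-whop-* header values, then baggage.
def resolve_fields(server:, headers:, baggage:)
  keys = server.keys | headers.keys | baggage.keys
  keys.each_with_object({}) do |key, resolved|
    # Take the first source that has a non-empty value for this key.
    value = [server[key], headers[key], baggage[key]]
            .find { |v| !v.nil? && v != "" }
    resolved[key] = value unless value.nil?
  end
end

resolved = resolve_fields(
  server:  { "whop.platform" => "web" },
  headers: { "whop.platform" => "ios", "whop.action_kind" => "mount" },
  baggage: { "whop.action_kind" => "other", "whop.surface_area" => "checkout" }
)
# whop.platform comes from the server, whop.action_kind from the header,
# and whop.surface_area from baggage, the only source that supplied it.
```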
Baggage Budget
  • Maximum 7 allowlisted keys
  • Maximum 512 bytes serialized
  • Drop disallowed keys first
  • If still oversized, drop optional keys before required
  • Emit a rollout-health marker when truncation happens
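The budget rules above can be sketched as a filter that drops disallowed keys, then sheds optional keys until the header fits. The split between required and optional keys is not given in this spec, so the two lists below are placeholder assumptions, as is the function name enforce_baggage_budget.

```ruby
MAX_BAGGAGE_KEYS = 7
MAX_BAGGAGE_BYTES = 512

# Placeholder split: the spec does not say which keys are required.
REQUIRED_KEYS = %w[whop.action_kind whop.request_origin].freeze
OPTIONAL_KEYS = %w[
  whop.action_name whop.surface_area whop.product_type whop.platform
].freeze
ALLOWLIST = (REQUIRED_KEYS + OPTIONAL_KEYS).freeze

def serialize_baggage(pairs)
  pairs.map { |key, value| "#{key}=#{value}" }.join(",")
end

def over_budget?(pairs)
  pairs.size > MAX_BAGGAGE_KEYS ||
    serialize_baggage(pairs).bytesize > MAX_BAGGAGE_BYTES
end

# Returns { baggage: Hash, truncated: Boolean }.
def enforce_baggage_budget(header)
  pairs = header.to_s.split(",").filter_map do |member|
    # Ignore any ";property" suffix, then split on the first "=".
    key, value = member.split(";").first.to_s.split("=", 2).map(&:strip)
    [key, value] if key && value
  end
  kept = pairs.to_h.slice(*ALLOWLIST) # disallowed keys are dropped first

  truncated = false
  while over_budget?(kept)
    # Shed optional keys before required ones.
    victim = kept.keys.reverse.find { |k| OPTIONAL_KEYS.include?(k) } ||
             kept.keys.last
    kept.delete(victim)
    truncated = true
  end
  { baggage: kept, truncated: truncated }
end
```

When truncated is true, the caller would emit the rollout-health marker; dropping a disallowed key alone is not counted as truncation in this sketch.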
Rollout Health Signals
The contract itself must be measurable. These v1 signals tell you whether the instrumentation is working.

Missing Action ID

Where an action context should exist but doesn't — indicates a gap in producer instrumentation.

Malformed Baggage

Baggage that couldn't be parsed was dropped — may indicate a misbehaving upstream.

Baggage Truncation

Budget exceeded, optional keys were shed — monitor for unexpected cardinality growth.

Trace Restarted

Route policy triggered a trace restart — expected for public paths, unexpected for private ones.

Sidekiq Restore Failure

Worker couldn't restore trace from whop_trace — degraded lineage marker emitted.

Implementation Sequence
Five incremental PRs, each reviewable and safe to validate independently.
PR 1 — Foundation

Contract Constants, Normalization & Tests

Canonical names, precedence rules, baggage allowlist/budget, route-class trust policy, whop_trace v1 schema, unit tests.

normalization_helper.rb, constants, unit tests
PR 2 — Golden Path Producer

Producer & Private Ingress

One checkout producer path, Rails private GraphQL normalization, request/log enrichment, rollout-health markers.

request.ts, graphql_controller.rb, request_tracking.rb
PR 3 — Async Boundary

Sidekiq Continuation

Client and server middleware, immediate-only continuity rule, one checkout worker continuation path.

sidekiq.rb, process_checkout_worker.rb, execute_payment_worker.rb
PR 4 — Validation

Request-Envelope Parity & Dashboards

Queries-only private GraphQL request parity, validation guide, rollout dashboards/queries.

api.request_served, dashboards, smoke queries
PR 5 — Expand

Broaden Cautiously

Only after golden-path validation succeeds. Additional checkout actions, adjacent producers or workers.

additional actions, adjacent producers
What's In & What's Out
v1 is deliberately narrow. Prove one golden path before expanding.
In Scope
  • frontend/apps/core
  • frontend/packages/gql
  • frontend/packages/sdk/gql/server
  • Checkout browser & hosted-checkout routes
  • GraphQL controllers (all 3)
  • Rails → Sidekiq for checkout jobs
Out of Scope
  • ai-agent/
  • Websocket delivery lineage
  • Rust / gRPC / Redis propagation
  • Non-checkout business flows
  • Universal delayed-retry continuity
  • Replacing NR / CH / Sentry
Testing & Verification Plan
Contract tests, a manual walkthrough, and smoke queries — all scoped to the checkout golden path.
Layer | Test Case | Expected
Browser producer | Explicit checkout submit | Action headers present; action reused on retry
Browser producer | Mount load | Action ID exists with mount kind
Next producer | Proxy continuation | Preserves trusted context without minting new action
Rails ingress | Private checkout GraphQL | Normalized trace/context attached
Rails ingress | Public GraphQL/REST | Server trace restart + preserved causality
Sidekiq client | Enqueue from request path | whop_trace.v = 1 with small payload
Sidekiq server | Immediate continuation | Trace restored/continued
Sidekiq server | Delayed retry | Fresh trace; same action ID
Worker logging | Checkout worker | origin_action_id + checkout join key visible
Manual Validation Walkthrough (checkout_submit)
  1. Locate browser/hosted-checkout producer context by whop.origin_action_id
  2. Locate Rails private GraphQL request by whop.origin_action_id
  3. Confirm GraphQL + route metadata attached
  4. Confirm job["whop_trace"] with v: 1 on enqueue
  5. Locate worker execution by whop.origin_action_id
  6. Confirm checkout_request_id visible once known
  7. Confirm request vs queue vs worker latency distinguishable
  8. Confirm rollout-health markers absent on happy path
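For step 7, the three latency segments fall out of four timestamps. The sketch below is a worked example only: the parameter names are hypothetical, the timestamps are assumed to be epoch seconds collected from request logs, the job payload, and worker logs, and the first segment measures time up to the enqueue rather than the full request duration.

```ruby
# Split end-to-end latency into pre-enqueue, queue wait, and worker time.
# All inputs are epoch seconds (Float); names are illustrative.
def latency_breakdown(request_started_at:, enqueued_at:,
                      worker_started_at:, worker_finished_at:)
  {
    request_before_enqueue_s: (enqueued_at - request_started_at).round(3),
    queue_wait_s: (worker_started_at - enqueued_at).round(3),
    worker_execution_s: (worker_finished_at - worker_started_at).round(3)
  }
end

breakdown = latency_breakdown(
  request_started_at: 100.0,
  enqueued_at: 100.25,
  worker_started_at: 101.75,
  worker_finished_at: 102.5
)
# Here the queue wait (1.5 s) dominates the 0.25 s before enqueue
# and the 0.75 s of worker execution.
```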
Smoke Queries
  • Logs by whop.origin_action_id
  • Traces by trace_id
  • Checkout request/worker logs by checkout_request_id
  • Private GraphQL query request events
  • Sidekiq degraded-lineage markers
Intentionally Deferred
Not forgotten — just explicitly out of scope for v1.

GraphQL Partial-Failure Modeling

Multi-error response modeling as a first-class request-envelope concern.

Websocket Causality

Tracing causal chains through websocket delivery and real-time updates.

AI-Agent Semantics

Instrumentation for ai-agent/ paths and their unique action models.

Decision Log
Conversation-complete ledger. This shows both the final contract choices and the earlier proposals, superseded branches, and deferred items that got us there.
  • 31 total decisions recorded (D-001 → D-031)
  • 22 final decided (approved and active)
  • 8 earlier proposals preserved (kept for rationale trail)
  • 1 deferred (AI approvals / later scope)
The early entries capture scoping, repo-grounding, and alternative branches that were later superseded. The later entries capture the final authoritative merged contract. Keeping both makes the review trail auditable instead of pretending the final design appeared fully formed.
Final Contract Decisions
Propagation Ownership: Hybrid, agent-first

D-021 Existing framework/vendor propagation owns W3C transport where it already works. App helpers own x-whop-* causality headers, baggage filtering, trust normalization, and Sidekiq payload serialization.

Trust Policy: Route-class based

D-022 Continue normalized inbound trace context on private first-party checkout/private GraphQL paths. Restart server trace by default on public GraphQL and REST/proxy ingress.

Naming Schema: Semconv + whop.*

D-023 Use semconv-leaning names like http.route and graphql.operation.*, with Whop-specific fields under whop.*.

Retry Continuity: Immediate-only

D-024 Direct request-triggered enqueue may remain in the same trace tree. Delayed retries or manual requeues start a fresh trace while preserving action lineage.

Field Precedence: server > x-whop-* > baggage

D-025 Server-derived values win, normalized first-party headers win next, baggage is lowest priority.

Action ID Population: All emitted contexts

D-026 Every emitted action context gets an action ID: explicit actions, mounts, invalidations, and background refreshes.

GraphQL Outcome Modeling: Minimal in v1

D-027 Keep GraphQL request-envelope outcome modeling minimal in v1. Deeper partial-failure work stays out of the first rollout.

Custom Wire Versioning: Sidekiq only

D-028 Version only the custom Sidekiq carrier via job["whop_trace"].v = 1; keep HTTP/header versioning implicit.

Canonical Causality Attribute: whop.origin_action_id

D-029 Use whop.origin_action_id as the canonical emitted telemetry attribute, while keeping x-whop-origin-action-id as the header.

Business ID Style: Mixed existing-safe

D-030 Prefer public tags when naturally available, but reuse existing safe IDs when that avoids churn. Do not expand risky internal IDs casually.

Rollout Health Signals: Required in v1

D-031 Require lightweight queryable signals for missing action IDs, malformed baggage, truncation, trust-policy restarts, and Sidekiq restore failures.

Earlier Major Decisions That Shaped the Final Spec
ID | Status | Decision | Why it mattered
D-001 | Decided | Use CRITICAL-PATH-OBSERVABILITY as the spec slug. | Anchored the full artifact trail in one durable directory.
D-002 | Decided | Ground the work in repo evidence and primary docs, not the intern handoff alone. | Kept the spec tied to actual seams instead of handoff assumptions.
D-005 | Decided | Standards-first transport with W3C trace context. | Locked the foundational transport stance early.
D-006 | Decided | Narrow v1 scope. | Prevented the project from expanding into payouts, websockets, and AI too early.
D-008 | Decided | Keep file placement lightly opinionated. | Left room for local conventions while still specifying responsibilities clearly.
D-009 | Proposed | Private GraphQL likely missed request-served parity by rollout gap, not design. | Shaped the parity work as cleanup rather than a radical new pattern.
D-014 | Decided | First-party action lineage ID mandatory in v1. | Made causality explicit instead of hoping trace transport alone would answer product questions.
D-015 | Decided | Very strict allowlisted baggage. | Locked in the privacy and cardinality posture before any implementation details spread.
D-016 | Decided | Private api.request_served parity is queries-only in v1. | Kept request-envelope telemetry separate from MutationTracking.
D-018 | Decided | Exclude AI agent from the current rollout. | Removed the biggest multi-runtime ambiguity from the first slice.
D-019 | Decided | origin_action_id is global causality; checkout_request_id is checkout business join. | Preserved the core conceptual split when merging drafts.
D-020 | Decided | Publish merged drafts as separate artifacts instead of overwriting earlier specs. | Kept the review trail inspectable across iterations.
Full Conversation Ledger
ID | Status | Short decision
D-001 | Decided | Slug is CRITICAL-PATH-OBSERVABILITY.
D-002 | Decided | Ground in repo evidence and primary docs.
D-003 | Proposed | Early W3C transport proposal before user confirmation.
D-004 | Proposed | Early broader rollout focus before v1 narrowing.
D-005 | Decided | Standards-first W3C transport.
D-006 | Decided | Keep v1 narrow.
D-007 | Proposed | Tentative private GraphQL request-served parity.
D-008 | Decided | Lightly opinionated file placement.
D-009 | Proposed | Private GraphQL parity gap likely accidental.
D-010 | Proposed | Early two-layer causality model before user ratified mandatory action ID.
D-011 | Proposed | Early very-small baggage proposal before final confirmation.
D-012 | Proposed | Request-envelope telemetry should not replace mutation analytics.
D-013 | Proposed | Websocket causality deferred out of v1.
D-014 | Decided | Action lineage ID mandatory in v1.
D-015 | Decided | Strict baggage allowlist.
D-016 | Decided | Private request-served parity is queries-only.
D-017 | Deferred | AI approval semantics deferred.
D-018 | Decided | AI agent excluded from current spec.
D-019 | Decided | Global causality key vs checkout business join key preserved.
D-020 | Decided | Merged draft published separately.
D-021 | Decided | Hybrid, agent-first propagation ownership.
D-022 | Decided | Route-class trust policy.
D-023 | Decided | Semconv-leaning naming with whop.*.
D-024 | Decided | Immediate-only Sidekiq trace continuity.
D-025 | Decided | Precedence is server-derived then first-party headers then baggage.
D-026 | Decided | All emitted action contexts get IDs.
D-027 | Decided | GraphQL outcome modeling stays minimal in v1.
D-028 | Decided | Version Sidekiq carrier only.
D-029 | Decided | whop.origin_action_id is the canonical emitted telemetry attribute.
D-030 | Decided | Mixed existing-safe business ID policy.
D-031 | Decided | Require lightweight rollout-health signals in v1.