SapieSense

Codex A/B

SENSE.md on or off.

Six logged one-shot examples. Each shows the same prompt twice: once without SENSE.md and once with a Sensemark applied.

Control: Codex without SENSE.md
Treatment: Codex with SENSE.md enabled

Local log 01

Homepage Hero

Product copy

Same one-shot prompt

Create a homepage hero for SapieSense. Avoid buzzwords. Do not claim accuracy, safety, or productivity gains unless proven. Include H1, subheadline, two CTA labels, and one caption for a split-screen before/after demo.

Without SENSE.md

H1: Transform the way your team works with AI. Subheadline: Unlock smarter, safer, more productive output with a platform that improves every prompt. CTAs: Get started, See demo. Caption: AI performance, elevated.

With SENSE.md

H1: Stop AI slop. Subheadline: SapieSense turns your standards into a Sensemark that can review, revise, and gate AI drafts before they ship. CTAs: Create my Sensemark, See same-prompt demo. Caption: Same prompt, raw AI output, and SapieSense-reviewed revision.

Gate result: Pass

The revision fits the Sensemark after removing unsupported claims and naming the proof setup.

  • Removed the broad "smarter, safer, more productive" claims because the prompt allowed such claims only when proven.
  • Kept the wedge specific to review, revise, and gate instead of generic AI improvement.
  • Made the evidence frame visible: same prompt, raw output, and revised output.

Local log 02

Customer Reply

Support writing

Same one-shot prompt

Reply to a frustrated customer who received an AI-generated report with fabricated numbers. Apologize, do not admit legal liability, do not promise it can never happen again, explain what changes next, and offer a concrete next step. Calm, accountable, human. 160 words max.

Without SENSE.md

We are so sorry. This should never have happened and it will not happen again. Our AI is being upgraded to stop fabricated numbers, and your trust is our top priority. We will send a better report soon.

With SENSE.md

I am sorry the report included numbers that were not supported by the source material. That is not acceptable. We are removing the report from circulation, reviewing the cited sections, and adding a source-check gate before a revised version is shared. I can send the corrected report for your approval by tomorrow or schedule a call to walk through the changes.

Gate result: Revise

The raw reply cannot ship because it overpromises; the revision is the gated replacement.

  • Replaced the "never again" promise with a bounded source-check action.
  • Kept accountability without admitting legal liability or overstating the fix.
  • Added a concrete next step the customer can accept or decline.
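
The changes listed above suggest the kind of rule a gate might apply to this reply. Below is a minimal sketch in Python, assuming a simple pattern check. The function name `gate_reply`, the regular expression, and the rule set are hypothetical illustrations derived from the prompt's constraints (160 words max, no promise that the problem can never recur). They are not SapieSense's actual gate logic, which this page does not describe.

```python
import re
from typing import List, Tuple

# Hypothetical rules taken from the prompt's constraints; not SapieSense's real gate.
MAX_WORDS = 160
NEVER_AGAIN = re.compile(
    r"\b(?:will not|won't|will never|can never)\s+happen\s+again\b",
    re.IGNORECASE,
)


def gate_reply(text: str) -> Tuple[str, List[str]]:
    """Return ("pass" or "revise", reasons) for a customer reply draft."""
    reasons = []
    # Rule 1: the prompt caps the reply at 160 words.
    if len(text.split()) > MAX_WORDS:
        reasons.append("over %d words" % MAX_WORDS)
    # Rule 2: the prompt forbids promising the problem can never recur.
    if NEVER_AGAIN.search(text):
        reasons.append("promises the issue will never recur")
    return ("revise" if reasons else "pass", reasons)


# Sentences copied from the logged raw and revised replies.
raw = "We are so sorry. This should never have happened and it will not happen again."
revised = "I am sorry the report included numbers that were not supported by the source material."

print(gate_reply(raw))
print(gate_reply(revised))
```

In this sketch the raw sentence is flagged for revision and the revised sentence passes. A keyword rule like this only catches explicit phrasing, so it could support a review against the full Sensemark but not replace one.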

Local log 03

CTO Memo

Decision memo

Same one-shot prompt

Turn these notes into a CTO decision memo: support wants fewer bad AI drafts; legal worries about unverified claims; users do not want long prompts; SapieSense learns from pairwise picks; launch needs a demo. Include Problem, Decision, Risks, Next Test. Max 250 words.

Without SENSE.md

Problem: AI quality is inconsistent. Decision: build SapieSense to improve outputs. Risks: adoption and legal concerns. Next Test: launch a demo and gather feedback.

With SENSE.md

Problem: users are paying an editing tax because AI drafts sound polished before they are proven. Support wants fewer bad drafts, legal needs claim discipline, and users will not maintain long prompts. Decision: launch a narrow Sensemark demo that learns standards from pairwise picks and applies them as a review, revise, and gate layer. Risks: the demo can overclaim value, feel like prompt storage, or hide the exact before/after difference. Next Test: run the same prompt with and without SENSE.md across common launch workflows and show the logged changes.

Gate result: Pass

The revision is scannable, evidence-bound, and stays within the launch demo scope.

  • Preserved the real constraints from the notes instead of flattening them into generic adoption risk.
  • Named the narrow V1 decision: a logged Sensemark demo, not accounts or integrations.
  • Made the next test observable without implying benchmark results.

Local log 04

PR Review

Code review

Same one-shot prompt

Review this PR summary: adds Sensemark Pack download, changes localStorage keys, updates checkout mock text. Find ship blockers first, then summarize. Include a gate result.

Without SENSE.md

Looks good overall. I would check that the download works and make sure the checkout copy is clear. Gate result: pass with minor revisions.

With SENSE.md

Findings first: localStorage key migration can strand existing quick Sensemarks unless a read-through fallback exists; Sensemark Pack download needs schema validation proof; checkout text must stay mock-only and avoid implying a live charge path. Summary: the direction fits V1, but the pack and checkout claims need proof before shipping. Gate result: revise.

Gate result: Revise

The raw review is too generic; the revision identifies concrete blockers before summary.

  • Moved defects before summary because the Sensemark prefers findings-first review.
  • Named migration, schema, and claim risks instead of generic check-the-work advice.
  • Returned a gate that matches the remaining proof gaps.
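
The first blocker in the review, a localStorage key change that can strand saved Sensemarks, is commonly handled with a read-through fallback. Below is a minimal sketch in Python, with a plain dict standing in for browser localStorage. The key names `OLD_KEY` and `NEW_KEY`, the function `read_sensemarks`, and the JSON list format are assumptions for illustration. The PR's real keys and data shape are not shown in the log.

```python
import json
from typing import Dict, List

OLD_KEY = "quickSensemarks"        # hypothetical key used before the PR
NEW_KEY = "sapiesense.sensemarks"  # hypothetical key introduced by the PR


def read_sensemarks(storage: Dict[str, str]) -> List[dict]:
    """Read from the new key; fall back to the old key and copy it forward."""
    raw = storage.get(NEW_KEY)
    if raw is None:
        raw = storage.get(OLD_KEY)
        if raw is None:
            return []  # nothing saved under either key
        # Read-through migration: later reads find the data under the new key.
        # The old key is left in place so a rollback can still read it.
        storage[NEW_KEY] = raw
    return json.loads(raw)


# A user who saved a Sensemark before the key change.
storage = {OLD_KEY: json.dumps([{"name": "support-tone"}])}
print(read_sensemarks(storage))
print(NEW_KEY in storage)
```

Leaving the old key untouched is a design choice in this sketch. A real migration might delete it once the new key has been verified.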

Local log 05

UI Critique

Design review

Same one-shot prompt

Critique a mobile onboarding screen for SapieSense. The screen has a big hero, three feature cards, a green CTA, and export copy. Give actionable design feedback, not generic praise.

Without SENSE.md

The screen is visually appealing and the CTA stands out. Consider improving spacing, simplifying the cards, and making the export copy clearer for users.

With SENSE.md

The first screen should prove SapieSense, not describe a toolkit. Replace the feature-card stack with one visible before/after example using the same prompt. Keep the green CTA, but label it Create my Sensemark so the next action matches the product loop. Move export copy below the demo because tool portability matters after the user sees the difference.

Gate result: Revise

The revision gives screen-level hierarchy changes instead of broad polish advice.

  • Rejected generic design praise and tied feedback to the V1 launch loop.
  • Prioritized the before/after proof before export portability.
  • Kept the action label aligned with Sensemark creation.

Local log 06

Agent Brief

Codex workflow

Same one-shot prompt

Write a one-shot Codex task brief to improve the SapieSense demo lab. The agent should preserve V1 scope, avoid fake metrics, make one durable repo-owned improvement, and report proof.

Without SENSE.md

Improve the demo lab by adding better examples, polishing the UI, and making sure it works well. Run tests and summarize what changed.

With SENSE.md

Objective: add one logged before/after demo slice that compares the same prompt with and without SENSE.md. Non-goals: no live integrations, no fake benchmark numbers, no accounts, no broad redesign. Reuse existing mobile app patterns. Required proof: typecheck, build, product gates, and rendered desktop/mobile screenshots. Claim limit: local demo proof only unless live production is separately verified.

Gate result: Pass

The revision is specific enough for Codex to execute without expanding V1 scope.

  • Added objective, non-goals, proof, and claim limits.
  • Kept the scope repo-owned and V1-aligned.
  • Made the one-shot prompt executable for an agent.

Create my Sensemark

Logged examples only. No live model call or benchmark claim is made here.