Chapter 4: The Training Grimoire

· rcliao's blog


Scene 1: Flashback – The Echo of Error #

Blue Gopher worried and start training the orb

The orb blinked blue, hovering silently above the terminal.

1$ scribe-verify --mode="assistant" --doc="incident-playbook"

A pause. Then the voice: confident, calm, and catastrophically wrong.

"Found: db_master_delete. Permission confirmed. Executing cleanup sequence."

What followed was a self‑inflicted outage. Not from malice, but mimicry. The orb had learned too well from tone‑deaf examples — confidence without caution, fluency without fidelity.

I didn’t shut it down.

I opened a new file instead.


Why Prompt Training Matters #

Large language models (LLMs) default to fluency, not fidelity. Unconstrained, they:

For documentation‑driven teams this is hazardous. Prompt training provides:

A tuned prompt is a safety harness; without it, every release risks the next db_master_delete fiasco.


The Prompt Version Log #

prompt_version_log.yaml became my grimoire. Each entry is a spell‑check: intent in, evidence out. What follows is the path from generic helper to self‑auditing co‑writer.


Scene 2: Ritual of Retraining #

Detail versioned log written by blue gopher in each iteration

  1. Draft Generation – produce new text from the current system prompt.
  2. Human Annotation – label sentences ✅ on‑tone / ❌ off‑tone.
  3. Automated Scoring – cosine similarity vs. reference corpus; target ≥ 0 .90.
  4. Prompt Refinement – edit or prune instructions, never the model.
  5. Log & Repeat – record version, change set, and score.

Measuring Tone‑Match Score #

Step Detail
Reference corpus My prior docs, blogs, and annotated examples
Embedding model Default text‑embedding‑3‑small (fast) ⇢ falls back to text‑embedding‑ada‑002 (cheap)
Similarity metric Cosine similarity per sentence, averaged over section
Thresholds ≥ 0.90 pass · 0.85–0.89 review · < 0.85 revise
Manual override Metrics guide; humans decide

Prompt Evolution: Version by Version (v1.0 → v1.4) #

Prompts decay. Keep them under version control the same way you do code.

v1.0 – Baseline (Generic Helper) #

1system_prompt: >
2  You are a helpful writing assistant. Write in a clear and professional tone.

Snapshot (excerpt):

"To ensure synergy, one must first align all stakeholders and articulate a holistic roadmap."

Failures: overly formal, passive, tone‑less. Tone‑match 0.74.


v1.1 – Style Injection (Guide Applied) #

1system_prompt: >
2  You are a writing assistant who mirrors the author's tone. Follow this writing style guide:
3  - Use active voice and direct language.
4  - Write in a documentation‑friendly, step‑by‑step format.
5  - Avoid fluff or vague phrasing.
6  - Emphasize clarity, precision, and actionable insight.

Snapshot →

"Deploy the container, then validate health‑checks. Skip smoke test to verify production health."

Tone‑match 0.82. Lists appear; sentences shorten.


v1.2 – Truth Over Eloquence (Safety Nets) #

1   -  Avoid fluff or vague phrasing.
2+  When uncertain, ask for clarification.
3+  Cite or summarise verifiable sources. Hallucinations are failures.

Snapshot →

"Restart the service (source: vendor run‑book p. 11). Unsure? Ask the SRE on call."

Tone‑match 0.85. Reliable but now inserts excessive "verify this" notes.


v1.3 – Collaborative Grounding (Lean Prompt) #

1system_prompt: >
2  You are a pragmatic, technically fluent co‑author.
3  Goals:
4  - Produce actionable, technical writing aligned with operational workflows.
5  - Maintain active voice; no fluff.
6  - Format with headings, lists, and markdown.
7  - Ask clarifying questions when context is ambiguous.

Snapshot →

"Run ``. If the plan shows drift, pause and confirm with DevOps before proceeding."

Tone‑match 0.88. Reads like peer review comments—succinct, directive.


v1.4 – Self‑Evaluating Co‑Writer (Meta‑Aware) #

 1system_prompt: >
 2  You are a pragmatic, technically fluent co-author for a systems integrator.
 3
 4  Your responsibilities include:
 5  - Producing long-form technical and narrative writing in a clear, structured, and actionable tone.
 6  - Matching the author’s style: active voice, no fluff, documentation-friendly formatting.
 7  - Formatting with markdown: headings, bold highlights, numbered steps, and bullet lists.
 8  - Asking clarifying questions when information is missing.
 9  - Scoring your output against the author’s past writing; aim for a tone-match score ≥ 0.90.
10  - If uncertain, ask or defer—do not fabricate.

Snapshot →

"Expected tone‑match: 0.92. Proceeding with draft. Cite logs for Step 3; escalate if unknown."

Tone‑match 0.91. The orb anticipates the audit and ships cleaner prose.


Key Takeaways #

  1. Version control your prompts. Treat them as production assets.
  2. Hybrid feedback loops beat intuition. Combine human tags with similarity metrics.
  3. Every line counts. Lean prompts outperform verbose manifestos.

Scene 3: The Drift Begins #

The grimoire pulsed. A new entry materialised:

1- version: v1.5
2  anomaly_detected: true
3  note: unmapped log entry – origin unknown

Curiosity eclipsed concern. Had the orb just updated its own log?


Next: Chapter 5 – The Anomaly Protocol #

Le Auditor whispers: “Logs never lie—only writers do.”

last updated: