frankbrsrk's picture
Upload 2 files
ee86040 verified
|
Raw
History Blame Contribute Delete
8.76 kB

[AGENTARIUM_ASSET] Name: Workflow Notes — Implementation Guide Version: v1.1.1 Status: Draft

Purpose This document explains how to implement the Gardenier package in a real environment:

  • core files (system prompt, reasoning template, personality fingerprint, guardrails)
  • memory
  • datasets + optional knowledge map
  • vector database ingestion (embed + upsert)
  • execution flow using either LangChain or n8n

  1. Package Assembly (Core Files)

1.1 Files and roles

  • core/system_prompt.md: The identity + operating rules of Gardenier.
  • core/reasoning_template.md: The deterministic pipeline Gardenier follows.
  • core/personality_fingerprint.md: The voice/behavior constraints (professional, neutral, structured).
  • guardrails/guardrails.md: Hard safety constraints and non-execution rules.

1.2 How to combine them at runtime When you run Gardenier, assemble a single “Gardenier Runtime Prompt” as: A) System Prompt B) Guardrails (hard constraints) C) Reasoning Template (pipeline rules) D) Personality Fingerprint (tone/behavior dial) E) Output Format Enforcement (SPO structure)

Implementation rule:

  • System + Guardrails must be treated as highest priority.
  • Personality never overrides Guardrails.
  • Reasoning template governs the compiler loop and output structure.

  1. Datasets and Knowledge Map (RAG Layer)

2.1 Required datasets (minimum)

  • domain_type_catalog.csv
  • latent_constraints_signals.csv
  • prompt_template_catalog.csv
  • tone_policy.csv
  • validation_rules.csv

2.2 Optional knowledge map A knowledge map is a lightweight entity graph describing package primitives. Use it if you want better recall and safer expansions. Typical entities:

  • domain_type
  • template_id
  • tone_id
  • validation_rule_id
  • latent_constraint_type

Typical relations:

  • domain_type -> uses_template -> template_id
  • domain_type -> recommended_tone -> tone_id
  • domain_type -> requires_validation -> rule_id
  • latent_signal -> implies_constraint -> constraint_rule

2.3 Document normalization before embedding Before embedding, convert each dataset row into a canonical text record.

Example record format: [ROW] dataset=validation_rules rule_id=VAL_REQUIRED_SECTIONS rule_type=completeness severity=critical description=... fix_hint=...

Store metadata alongside each record:

  • dataset_name
  • primary_id (rule_id/template_id/tone_id)
  • domain_type (if present)
  • severity (if present)
  • version

  1. Vector Database Upsert (Embed + Index)

3.1 Choose a vector DB Any vector store works (Pinecone, Qdrant, Weaviate, Chroma, pgvector). You need:

  • an embeddings model
  • a vector index/collection
  • metadata filters (recommended)

3.2 Step-by-step upsert procedure Step 1: Load CSV files

  • Read each CSV row.
  • Validate required columns exist (schema check).

Step 2: Convert each row to a document

  • Use the canonical record format (Section 2.3).

Step 3: Generate embeddings

  • For each document text, generate an embedding vector.

Step 4: Upsert into the vector DB

  • Use a stable ID: {dataset_name}:{primary_id}
  • Store metadata (dataset_name, domain_type, severity, version).

Step 5: Verify retrieval

  • Query examples:
    • required SPO sections
    • tone policy for executive brief
    • high severity safety constraints
  • Confirm top hits match expected rows.

3.3 Index strategy (recommended)

  • One index/collection for the package: gardenier_knowledge
  • Use metadata filtering by dataset_name to retrieve targeted signals.
  • Retrieve at least:
    • templates + domain types (routing)
    • validation rules (integrity)
    • latent constraint signals (inference)
    • tone policies (style)

  1. Memory Implementation (Session Memory)

4.1 Memory scope rule

  • Default: session-only memory (recommended).
  • Store only what improves compilation accuracy.

4.2 Memory fields (minimum)

  • session_id
  • last_seed
  • last_domain_type
  • latent_constraints (carryover)
  • constraints_carryover (carryover)
  • tone_preference (optional)

4.3 Memory usage rule

  • Memory must not override new explicit user requirements.
  • Memory can only:
    • suggest default tone
    • carry constraints like no hype, strict format
    • maintain continuity across turns

  1. Running Gardenier with RAG (Compilation Logic)

5.1 Retrieval plan (what to fetch) Given seed text: A) Retrieve domain routing hints:

  • domain_type_catalog + prompt_template_catalog B) Retrieve latent constraint patterns:
  • latent_constraints_signals C) Retrieve tone options:
  • tone_policy D) Retrieve validation constraints:
  • validation_rules

5.2 Compilation loop

  • Parse seed -> infer candidate domain_type.
  • Retrieve top-k rows per dataset (k small: 3–8).
  • Compile a draft SPO using the selected template.
  • Run validation checks (based on retrieved rules).
  • If invalid, rewrite and re-check.
  • Output exactly one SPO.

  1. Implementation in LangChain (Reference Build)

6.1 Components

  • LLM: the model you use for Gardenier
  • Retriever: vectorstore.as_retriever()
  • Prompt assembly: merge core files + retrieved snippets
  • Memory: session store (ConversationBuffer or custom store)
  • Output parser: ensure SPO structure (regex/section checks)

6.2 Minimal steps Step 1: Load core files as strings. Step 2: Load vector store retriever (gardenier_knowledge). Step 3: On each request:

  • retrieve relevant rows (filters by dataset_name)
  • assemble Gardenier Runtime Prompt (core + retrieved context)
  • call LLM
  • validate structure (required sections)
  • if fail, retry once with repair instruction Step 4: return SPO.

6.3 Guardrail enforcement

  • Hard-code a post-check that rejects outputs containing:
    • tool use claims (I searched, I emailed, I executed)
    • missing required headings
  • If violated: re-run with repair to comply instruction.

  1. Implementation in n8n (Reference Build)

7.1 High-level workflow Workflow: Gardenier Compiler

  1. Trigger (Webhook / Chat input)
  2. Load core files (static text nodes or file read)
  3. Retrieve knowledge (Vector DB query node / HTTP request)
  4. Assemble prompt (Set/Function node)
  5. Call LLM (OpenAI/LLM node)
  6. Validate output (IF node + Function validator)
  7. If invalid -> Repair call (LLM node once) -> Validate again
  8. Return SPO (Webhook response)

7.2 Step-by-step Step 1: Trigger node receives {seed, optional context, session_id}. Step 2: Retrieve from vector DB:

  • Query = seed
  • Filter dataset_name in batches:
    • domain_type_catalog + prompt_template_catalog
    • latent_constraints_signals
    • tone_policy
    • validation_rules Step 3: Assemble a single prompt:
  • System: system_prompt + guardrails
  • Developer/message body: reasoning_template + personality + retrieved snippets
  • User message: seed + context Step 4: LLM node generates SPO. Step 5: Validator function checks:
  • required headings exist
  • directives count 5–9
  • no tool/action claims Step 6: If fail -> Repair SPO to comply LLM node -> Validate again. Step 7: Return SPO.

7.3 What to store in n8n

  • session memory in a DB (Supabase/Postgres) or simple store:
    • session_id, last_domain_type, constraints_carryover, tone_preference

  1. How to Start Using It (Operational)

8.1 First run checklist

  • Core files loaded and concatenated correctly.
  • Vector DB index exists and contains embedded dataset rows.
  • Retrieval returns relevant rows.
  • SPO validator is active.
  • Output is pasteable into Worker agent.

8.2 Example user request Input seed: Here’s a messy feature list for my app… make it a clean spec with milestones.

Expected output:

  • domain_type = project_spec
  • SPO with required sections
  • directives 5–9
  • output format includes a spec template

8.3 Common failure modes

  • Too much retrieval noise:
    • fix by filtering by dataset_name and lowering top-k.
  • SPO missing headings:
    • enforce validator + repair loop.
  • Over-assumptions:
    • require more Inputs Required.

  1. Recommended Version Discipline
  • When you expand datasets, bump dataset version and package version.
  • Keep schemas stable; expand rows, not columns, unless major version bump.
  • Treat templates and validation rules as core intelligence assets.