	[AGENTARIUM_ASSET]
	Name: Workflow Notes — Implementation Guide
	Version: v1.1.1
	Status: Draft

	Purpose
	This document explains how to implement the Gardenier package in a real environment:
	- core files (system prompt, reasoning template, personality fingerprint, guardrails)
	- memory
	- datasets + optional knowledge map
	- vector database ingestion (embed + upsert)
	- execution flow using either LangChain or n8n

	-------------------------------------------------------------------------------

	1) Package Assembly (Core Files)

	1.1 Files and roles
	- core/system_prompt.md:
	The identity + operating rules of Gardenier.
	- core/reasoning_template.md:
	The deterministic pipeline Gardenier follows.
	- core/personality_fingerprint.md:
	The voice/behavior constraints (professional, neutral, structured).
	- guardrails/guardrails.md:
	Hard safety constraints and non-execution rules.

	1.2 How to combine them at runtime
	When you run Gardenier, assemble a single “Gardenier Runtime Prompt” as:
	A) System Prompt
	B) Guardrails (hard constraints)
	C) Reasoning Template (pipeline rules)
	D) Personality Fingerprint (tone/behavior dial)
	E) Output Format Enforcement (SPO structure)

	Implementation rule:
	- System + Guardrails must be treated as highest priority.
	- Personality never overrides Guardrails.
	- Reasoning template governs the compiler loop and output structure.
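
	The assembly order above can be sketched as a small helper. This is a minimal illustration, not the package's own loader: the "###" section labels, the function names, and the output_format argument are assumptions made for the example.

```python
from pathlib import Path

# Order follows Section 1.2: A) system prompt, B) guardrails,
# C) reasoning template, D) personality fingerprint.
# The labels are illustrative delimiters, not part of the package.
CORE_FILES = [
    ("SYSTEM PROMPT", "core/system_prompt.md"),
    ("GUARDRAILS", "guardrails/guardrails.md"),
    ("REASONING TEMPLATE", "core/reasoning_template.md"),
    ("PERSONALITY FINGERPRINT", "core/personality_fingerprint.md"),
]


def load_core_files(root: str) -> dict[str, str]:
    """Read each core file from disk (paths relative to the package root)."""
    return {
        path: (Path(root) / path).read_text(encoding="utf-8")
        for _, path in CORE_FILES
    }


def assemble_runtime_prompt(sections: dict[str, str], output_format: str) -> str:
    """Join the core sections in priority order, then append E) format enforcement."""
    parts = [f"### {label}\n{sections[path].strip()}" for label, path in CORE_FILES]
    parts.append(f"### OUTPUT FORMAT ENFORCEMENT\n{output_format.strip()}")
    return "\n\n".join(parts)
```

	Placing the system prompt and guardrails first only expresses their priority positionally; the post-checks in later sections are what actually enforce it.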

	-------------------------------------------------------------------------------

	2) Datasets and Knowledge Map (RAG Layer)

	2.1 Required datasets (minimum)
	- domain_type_catalog.csv
	- latent_constraints_signals.csv
	- prompt_template_catalog.csv
	- tone_policy.csv
	- validation_rules.csv

	2.2 Optional knowledge map
	A knowledge map is a lightweight entity graph describing package primitives.
	Use it if you want better recall and safer expansions.
	Typical entities:
	- domain_type
	- template_id
	- tone_id
	- validation_rule_id
	- latent_constraint_type

	Typical relations:
	- domain_type -> uses_template -> template_id
	- domain_type -> recommended_tone -> tone_id
	- domain_type -> requires_validation -> rule_id
	- latent_signal -> implies_constraint -> constraint_rule
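
	One lightweight way to hold such a map is a list of (subject, relation, object) triples with a lookup index. The sketch below assumes that representation; the template and tone IDs are placeholders, and only project_spec and VAL_REQUIRED_SECTIONS appear elsewhere in this guide.

```python
from collections import defaultdict

# (subject, relation, object) triples. TPL_PLACEHOLDER and TONE_PLACEHOLDER
# are made-up IDs used purely for illustration.
TRIPLES = [
    ("project_spec", "uses_template", "TPL_PLACEHOLDER"),
    ("project_spec", "recommended_tone", "TONE_PLACEHOLDER"),
    ("project_spec", "requires_validation", "VAL_REQUIRED_SECTIONS"),
]


def build_index(triples):
    """Group objects by (subject, relation) for constant-time lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index


def related(index, subject, relation):
    """Return all objects linked to subject via relation (empty list if none)."""
    return list(index.get((subject, relation), []))
```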

	2.3 Document normalization before embedding
	Before embedding, convert each dataset row into a canonical text record.

	Example record format:
	[ROW]
	dataset=validation_rules
	rule_id=VAL_REQUIRED_SECTIONS
	rule_type=completeness
	severity=critical
	description=...
	fix_hint=...

	Store metadata alongside each record:
	- dataset_name
	- primary_id (rule_id/template_id/tone_id)
	- domain_type (if present)
	- severity (if present)
	- version
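
	A possible normalizer for the record format and metadata above is sketched here. The function names are illustrative, and skipping empty columns is an assumption of this example rather than a rule of the package.

```python
# Columns that can act as the primary ID, as listed in the metadata section.
ID_COLUMNS = ("rule_id", "template_id", "tone_id")


def row_to_record(dataset_name: str, row: dict[str, str]) -> str:
    """Render one CSV row as a canonical [ROW] text record."""
    lines = ["[ROW]", f"dataset={dataset_name}"]
    # Assumption: empty cells are omitted instead of written as "key=".
    lines += [f"{key}={value}" for key, value in row.items() if value not in (None, "")]
    return "\n".join(lines)


def row_metadata(dataset_name: str, row: dict[str, str], version: str) -> dict:
    """Build the metadata stored next to the record."""
    primary_id = next((row[c] for c in ID_COLUMNS if row.get(c)), None)
    meta = {"dataset_name": dataset_name, "primary_id": primary_id, "version": version}
    for optional in ("domain_type", "severity"):
        if row.get(optional):  # only stored when present
            meta[optional] = row[optional]
    return meta
```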

	-------------------------------------------------------------------------------

	3) Vector Database Upsert (Embed + Index)

	3.1 Choose a vector DB
	Any vector store works (Pinecone, Qdrant, Weaviate, Chroma, pgvector).
	You need:
	- an embeddings model
	- a vector index/collection
	- metadata filters (recommended)

	3.2 Step-by-step upsert procedure
	Step 1: Load CSV files
	- Read each CSV row.
	- Validate required columns exist (schema check).

	Step 2: Convert each row to a document
	- Use the canonical record format (Section 2.3).

	Step 3: Generate embeddings
	- For each document text, generate an embedding vector.

	Step 4: Upsert into the vector DB
	- Use a stable ID: {dataset_name}:{primary_id}
	- Store metadata (dataset_name, domain_type, severity, version).

	Step 5: Verify retrieval
	- Query examples:
	  - "required SPO sections"
	  - "tone policy for executive brief"
	  - "high-severity safety constraints"
	- Confirm top hits match expected rows.
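
	The five steps can be rehearsed end to end without any external service. The sketch below uses a plain dict as the "vector DB" and a hashed bag-of-words function as a stand-in for a real embeddings model; both are assumptions for illustration only, and every function name is hypothetical.

```python
import csv
import hashlib
import io
import math

DIM = 64  # toy dimensionality, not a recommendation


def toy_embed(text: str) -> list[float]:
    """Stand-in for a real embeddings model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def load_rows(csv_text: str, required_columns) -> list[dict]:
    """Step 1: read rows and fail fast if a required column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def upsert(store: dict, dataset_name: str, id_column: str, rows, version: str) -> None:
    """Steps 2-4: normalize, embed, and upsert under a stable ID."""
    for row in rows:
        text = "\n".join(
            ["[ROW]", f"dataset={dataset_name}"] + [f"{k}={v}" for k, v in row.items()]
        )
        doc_id = f"{dataset_name}:{row[id_column]}"  # re-running overwrites, never duplicates
        metadata = {"dataset_name": dataset_name, "version": version}
        for optional in ("domain_type", "severity"):
            if row.get(optional):
                metadata[optional] = row[optional]
        store[doc_id] = {"vector": toy_embed(text), "text": text, "metadata": metadata}


def query(store: dict, text: str, top_k: int = 3) -> list[str]:
    """Step 5: rank stored documents by cosine similarity to the query."""
    q = toy_embed(text)
    scored = [
        (sum(a * b for a, b in zip(q, doc["vector"])), doc_id)
        for doc_id, doc in store.items()
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:top_k]]
```

	With a real vector store, toy_embed and the dict would be replaced by the embeddings model and the store's own upsert and query calls; the stable-ID scheme is the part worth keeping.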

	3.3 Index strategy (recommended)
	- One index/collection for the package: gardenier_knowledge
	- Use metadata filtering by dataset_name to retrieve targeted signals.
	- Retrieve at least:
	  - templates + domain types (routing)
	  - validation rules (integrity)
	  - latent constraint signals (inference)
	  - tone policies (style)

	-------------------------------------------------------------------------------

	4) Memory Implementation (Session Memory)

	4.1 Memory scope rule
	- Default: session-only memory (recommended).
	- Store only what improves compilation accuracy.

	4.2 Memory fields (minimum)
	- session_id
	- last_seed
	- last_domain_type
	- latent_constraints (carryover)
	- constraints_carryover (carryover)
	- tone_preference (optional)

	4.3 Memory usage rule
	- Memory must not override new explicit user requirements.
	- Memory can only:
	  - suggest a default tone
	  - carry constraints such as "no hype" or "strict format"
	  - maintain continuity across turns
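
	A minimal sketch of the memory fields and the usage rule follows. The class and function names are assumptions; the rule it demonstrates is that an explicit value from the current request always wins, and memory only fills gaps. It does not try to detect a carried constraint that contradicts a new explicit one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionMemory:
    """Session-only memory with the minimum fields from Section 4.2."""
    session_id: str
    last_seed: str = ""
    last_domain_type: Optional[str] = None
    latent_constraints: list[str] = field(default_factory=list)
    constraints_carryover: list[str] = field(default_factory=list)
    tone_preference: Optional[str] = None


def effective_settings(memory: SessionMemory, explicit_tone=None, explicit_constraints=()):
    """Merge memory with the current request; explicit values take precedence."""
    # Memory only suggests a tone when the user gave none.
    tone = explicit_tone if explicit_tone is not None else memory.tone_preference
    # Explicit constraints come first; carried ones are appended without duplicates.
    constraints = list(explicit_constraints)
    for carried in memory.constraints_carryover:
        if carried not in constraints:
            constraints.append(carried)
    return {"tone": tone, "constraints": constraints}
```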

	-------------------------------------------------------------------------------

	5) Running Gardenier with RAG (Compilation Logic)

	5.1 Retrieval plan (what to fetch)
	Given seed text:
	A) Retrieve domain routing hints:
	- domain_type_catalog + prompt_template_catalog
	B) Retrieve latent constraint patterns:
	- latent_constraints_signals
	C) Retrieve tone options:
	- tone_policy
	D) Retrieve validation constraints:
	- validation_rules
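
	The retrieval plan can be expressed as one filtered search per dataset, grouped by purpose. In this sketch the search callable and its (query, dataset_name, k) signature are assumptions; in practice it would wrap your vector store's query with a dataset_name metadata filter.

```python
from typing import Callable

# Purpose -> datasets to query, following steps A-D above.
RETRIEVAL_PLAN = {
    "routing": ["domain_type_catalog", "prompt_template_catalog"],
    "inference": ["latent_constraints_signals"],
    "style": ["tone_policy"],
    "integrity": ["validation_rules"],
}

# Hypothetical signature: (query, dataset_name, k) -> list of text snippets.
SearchFn = Callable[[str, str, int], list[str]]


def retrieve_context(seed: str, search: SearchFn, k: int = 5) -> dict[str, list[str]]:
    """Run one filtered search per dataset and group the hits by purpose."""
    context = {}
    for purpose, datasets in RETRIEVAL_PLAN.items():
        hits = []
        for dataset_name in datasets:
            hits.extend(search(seed, dataset_name, k))
        context[purpose] = hits
    return context
```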

	5.2 Compilation loop
	- Parse seed -> infer candidate domain_type.
	- Retrieve top-k rows per dataset (k small: 3–8).
	- Compile a draft SPO using the selected template.
	- Run validation checks (based on retrieved rules).
	- If invalid, rewrite and re-check.
	- Output exactly one SPO.
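
	As a sketch, the loop can be written with each stage injected as a callable, so the control flow is testable without an LLM. All parameter names are hypothetical, and the default of one repair attempt is an assumption borrowed from the reference builds later in this guide.

```python
def compile_spo(seed, infer_domain, retrieve, draft, validate, repair, max_repairs=1):
    """Parse -> retrieve -> draft -> validate -> (repair -> re-check) -> one SPO.

    validate() is expected to return a list of problems (empty means valid).
    """
    domain_type = infer_domain(seed)
    context = retrieve(seed, domain_type)
    spo = draft(seed, domain_type, context)
    for _ in range(max_repairs):
        problems = validate(spo, context)
        if not problems:
            return spo
        spo = repair(spo, problems)
    # Final re-check after the last repair attempt.
    problems = validate(spo, context)
    if problems:
        raise ValueError(f"SPO still invalid after repair: {problems}")
    return spo
```

	Raising on a second failure keeps the "exactly one SPO" rule honest: the caller either gets a validated SPO or an explicit error, never an unchecked draft.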

	-------------------------------------------------------------------------------

	6) Implementation in LangChain (Reference Build)

	6.1 Components
	- LLM: the model you use for Gardenier
	- Retriever: vectorstore.as_retriever()
	- Prompt assembly: merge core files + retrieved snippets
	- Memory: session store (ConversationBufferMemory or a custom store)
	- Output parser: ensure SPO structure (regex/section checks)

	6.2 Minimal steps
	Step 1: Load core files as strings.
	Step 2: Load vector store retriever (gardenier_knowledge).
	Step 3: On each request:
	  - retrieve relevant rows (filtered by dataset_name)
	  - assemble the Gardenier Runtime Prompt (core + retrieved context)
	  - call the LLM
	  - validate structure (required sections)
	  - if validation fails, retry once with a repair instruction
	Step 4: return SPO.

	6.3 Guardrail enforcement
	- Hard-code a post-check that rejects outputs containing:
	  - tool-use claims ("I searched", "I emailed", "I executed")
	  - missing required headings
	- If violated: re-run with a "repair to comply" instruction.
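
	A post-check along these lines could be written with the standard library alone. The sketch assumes each required heading sits on its own line (optionally prefixed with "#" or followed by a colon), and the heading names passed in are the caller's responsibility; only the three claim phrases come from the list above.

```python
import re

# Matches the first-person tool-use claims named above.
TOOL_CLAIM_RE = re.compile(r"\bI\s+(?:searched|emailed|executed)\b", re.IGNORECASE)


def guardrail_violations(output: str, required_headings: list[str]) -> list[str]:
    """Return a list of violations; an empty list means the output passes."""
    violations = []
    match = TOOL_CLAIM_RE.search(output)
    if match:
        violations.append(f"tool-use claim: {match.group(0)!r}")
    # Assumption: a heading is a line of its own, e.g. "Directives" or "## Directives:".
    headings_found = {
        line.strip().lstrip("#").strip().rstrip(":").lower()
        for line in output.splitlines()
    }
    for heading in required_headings:
        if heading.lower() not in headings_found:
            violations.append(f"missing heading: {heading}")
    return violations
```

	The returned list doubles as the body of the repair instruction: pass it back to the model so the retry knows exactly what to fix.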

	-------------------------------------------------------------------------------

	7) Implementation in n8n (Reference Build)

	7.1 High-level workflow
	Workflow: Gardenier Compiler
	1) Trigger (Webhook / Chat input)
	2) Load core files (static text nodes or file read)
	3) Retrieve knowledge (Vector DB query node / HTTP request)
	4) Assemble prompt (Set/Function node)
	5) Call LLM (OpenAI/LLM node)
	6) Validate output (IF node + Function validator)
	7) If invalid -> Repair call (LLM node once) -> Validate again
	8) Return SPO (Webhook response)

	7.2 Step-by-step
	Step 1: Trigger node receives {seed, optional context, session_id}.
	Step 2: Retrieve from the vector DB:
	  - Query = seed
	  - Filter by dataset_name, one batch per group:
	    - domain_type_catalog + prompt_template_catalog
	    - latent_constraints_signals
	    - tone_policy
	    - validation_rules
	Step 3: Assemble a single prompt:
	- System: system_prompt + guardrails
	- Developer/message body: reasoning_template + personality + retrieved snippets
	- User message: seed + context
	Step 4: LLM node generates SPO.
	Step 5: Validator function checks:
	- required headings exist
	- directives count 5–9
	- no tool/action claims
	Step 6: If validation fails -> "Repair SPO to comply" LLM node -> Validate again.
	Step 7: Return SPO.
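
	The directive-count check in Step 5 is the one piece not covered by a plain heading scan. The sketch below shows the logic in Python for consistency with the rest of this guide; it would need porting to whatever language your n8n validator node runs. It assumes directives are list items ("-", "*", "1." or "1)") directly under a "Directives" heading, which is a guess about the SPO layout rather than a documented format.

```python
import re

# A list item: "- text", "* text", "1. text" or "1) text".
ITEM_RE = re.compile(r"^\s*(?:[-*]|\d+[.)])\s+\S")


def count_directives(spo: str, heading: str = "Directives") -> int:
    """Count list items that follow the heading, stopping at the next non-item line."""
    count, in_section = 0, False
    for line in spo.splitlines():
        stripped = line.strip()
        if not in_section:
            # Look for the heading line, tolerating "#" prefixes and a trailing colon.
            in_section = stripped.lstrip("#").strip().rstrip(":").lower() == heading.lower()
            continue
        if ITEM_RE.match(line):
            count += 1
        elif stripped:  # first non-blank, non-item line ends the section
            break
    return count


def directives_ok(spo: str, minimum: int = 5, maximum: int = 9) -> bool:
    """True when the directive count falls inside the 5-9 range."""
    return minimum <= count_directives(spo) <= maximum
```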

	7.3 What to store in n8n
	- session memory in a DB (Supabase/Postgres) or a simple store:
	  - session_id, last_domain_type, constraints_carryover, tone_preference
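
	For a "simple store", a single table keyed by session_id is enough. The sketch uses SQLite from the Python standard library as a stand-in for Supabase/Postgres; the table name, the JSON encoding of constraints_carryover, and the function names are assumptions.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the session-memory table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS session_memory (
               session_id TEXT PRIMARY KEY,
               last_domain_type TEXT,
               constraints_carryover TEXT NOT NULL DEFAULT '[]',
               tone_preference TEXT)"""
    )
    return conn


def save_session(conn, session_id, last_domain_type, constraints_carryover, tone_preference=None):
    """Insert or overwrite the row for this session."""
    conn.execute(
        "INSERT OR REPLACE INTO session_memory VALUES (?, ?, ?, ?)",
        (session_id, last_domain_type, json.dumps(constraints_carryover), tone_preference),
    )
    conn.commit()


def load_session(conn, session_id):
    """Return the stored fields as a dict, or None for an unknown session."""
    row = conn.execute(
        "SELECT last_domain_type, constraints_carryover, tone_preference "
        "FROM session_memory WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "session_id": session_id,
        "last_domain_type": row[0],
        "constraints_carryover": json.loads(row[1]),
        "tone_preference": row[2],
    }
```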

	-------------------------------------------------------------------------------

	8) How to Start Using It (Operational)

	8.1 First run checklist
	- Core files loaded and concatenated correctly.
	- Vector DB index exists and contains embedded dataset rows.
	- Retrieval returns relevant rows.
	- SPO validator is active.
	- Output can be pasted directly into the Worker agent.

	8.2 Example user request
	Input seed:
	Here’s a messy feature list for my app… make it a clean spec with milestones.

	Expected output:
	- domain_type = project_spec
	- SPO with required sections
	- directives 5–9
	- output format includes a spec template

	8.3 Common failure modes
	- Too much retrieval noise:
	  - fix by filtering by dataset_name and lowering top-k.
	- SPO missing headings:
	  - fix by enforcing the validator + repair loop.
	- Over-assumptions:
	  - fix by listing more items under Inputs Required instead of guessing.

	-------------------------------------------------------------------------------

	9) Recommended Version Discipline
	- When you expand datasets, bump dataset version and package version.
	- Keep schemas stable; expand rows, not columns, unless you are making a major version bump.
	- Treat templates and validation rules as core intelligence assets.
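
	One way to make this discipline mechanical is a small helper that picks the bump from what changed. The mapping used here (column change = major, added rows = minor) is an assumption in the spirit of semantic versioning; the guide itself only requires that a column change coincide with a major bump.

```python
def bump_version(version: str, old_columns, new_columns, rows_added: int) -> str:
    """Return the next version string, keeping an optional leading 'v'."""
    prefix = "v" if version.startswith("v") else ""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if set(old_columns) != set(new_columns):
        # Schema changed: major bump, lower parts reset.
        major, minor, patch = major + 1, 0, 0
    elif rows_added > 0:
        # Rows expanded under a stable schema: minor bump (assumed convention).
        minor, patch = minor + 1, 0
    return f"{prefix}{major}.{minor}.{patch}"
```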