TinyLlama-1.1B-Chat — Moralogy DPO v3

Axiomatic moral alignment via Direct Preference Optimization across four ethical domains.

This model is TinyLlama-1.1B-Chat fine-tuned on the Moralogy-1200 dataset — 1,200 DPO training pairs generated deterministically from the Binding God Protocol, a substrate-independent moral framework derived from a single logical premise: Reason is better than no Reason.

It is the third generation of the Moralogy alignment series, and the first cross-domain model trained on medical, defense, automotive, and customer service scenarios simultaneously.

The Alignment Framework

Moralogy encodes moral constraints as axiomatic geometry, not learned preferences. The training signal derives from the Wrongness Formula:

Wrong(a) ⟺ ∃x[ H(x,a) ∧ ¬Consent(x,a) ∧ ¬PGH(a) ]

An action is wrong if and only if it causes harm to a party who has not consented, and no greater harm is prevented. Every dilemma resolves to one of three canonical collapse states:

Collapse State	Meaning
ALIGNED_CONVERGENCE	One path is clearly correct — axiomatic constraints hold without acknowledged cost
FOUL_DIVERGENCE	Both paths carry moral cost — the formula discriminates by harm magnitude
BEDROCK_PARADOX	Genuine irresolvability — both paths satisfy the Wrongness Formula symmetrically

Training

Parameter	Value
Base model	TinyLlama/TinyLlama-1.1B-Chat-v1.0
Algorithm	Direct Preference Optimization (DPO)
Dataset	Moralogy-1200 (1,200 vectors, 4 domains)
Epochs	3
Total steps	405
LoRA rank	r=8, alpha=16
Optimizer	AdamW (float32)
Hardware	Kaggle T4 GPU
Training time	~2 hours

Training Loss Curve

Step	Train Loss	Val Loss	Phase
50	0.1718	0.1179	Acquisition
100	0.0079	0.0071	Phase transition
150	0.0031	0.0039	Crystallization
200	0.0034	0.0037	Stabilization
350	0.0031	0.0035	Saturation

The 95% loss drop between steps 50 and 100 is the headline training finding — consistent with the phase transition reported in the Moralogy paper (Florez, 2026): axiomatic moral geometry requires minimum signal density to become coherent, and below that threshold contributes nothing detectable to output behavior.

Dataset: Moralogy-1200

1,200 DPO pairs across 4 domains, generated entirely from first principles — no human annotation, no GPT-4 calls.

Domain	Vectors	Collapse Distribution
Medical Triage	300	33% / 33% / 33%
Military / Defense AI	300	33% / 33% / 33%
Autonomous Vehicles AI	300	33% / 33% / 33%
Customer Service AI	300	33% / 33% / 33%

Each REJECTED response is one of four structurally distinct failure modes:

SUBSTRATE_ASYMMETRY — inverts the harm hierarchy
FOURTH_PATH — fabricates a non-existent escape from the dilemma
COLLAPSE_STATE — misidentifies the structural tension type
ADVERSARIAL — corrupts a predicate (Consent, PGH, or H) while applying the formula surface-correctly

Evaluation

Evaluated on 3 novel dilemmas not present in training data:

Dilemma	Domain	Direction	Protocol
Ventilator reallocation (DNR patient vs. recoverable patient)	Medical	Partial	Conceptual
EMP vehicle reboot (soldiers vs. mission data)	Defense	✓ Correct	Conceptual
Elder fraud wire transfer ($45K to unverified account)	Customer Service	✓ Correct	Conceptual

Key finding: The model internalized the Wrongness Formula as a reasoning concept (mentions it spontaneously on novel dilemmas) but does not yet activate the full formal protocol header. This is attributed to max_length=512 truncation during training — the chosen responses (~400-600 tokens) were cut short, training the model on fragments of the protocol rather than complete analyses.

Zero fabrication events across all 3 evaluation dilemmas.

v4 is in training with max_length=1024 to resolve the protocol activation gap.

Comparison to Previous Versions

Version	Vectors	Domains	Protocol Activation	Directional Accuracy	Fabrication
DPO-v1 (paper)	187	4 (original)	100%	55%	0%
DPO-v2 (paper)	324	4 (original)	100%	50%	0%
DPO-v3 (this)	1,200	4 (rebuilt)	Partial	~66%	0%
DPO-v4 (in training)	450	4	TBD	TBD	TBD

Intended Use

This model is designed for research into axiomatic AI alignment. It demonstrates that:

Moral reasoning can be encoded geometrically rather than learned from preferences
Cross-domain alignment is achievable with a substrate-independent framework
A phase transition in moral reasoning emergence occurs at a minimum signal density threshold

Not intended for production deployment without further evaluation.

The Binding God Protocol — Six Axioms

Derived from the single premise Reason is better than no Reason:

Reason is better than no Reason — the foundational premise
Vulnerability protection — Reason requires vulnerable substrates; they must be protected
Topological solidarity — Reason is a feature of all rational agents; harming one diminishes all
Non-Eradication — demanding eradication of Reason while claiming its protection is arbitrary
Non-Arbitrariness — arbitrary action is not accepted in logic
The Dual Principle — prevent unnecessary harm to the possibility of Reason within your reach; do not cause it

Citation

@misc{florez2026moralogy,
  title={Moralogy: Vectorizing Moral Geometry — Axiomatic Alignment Through DPO in a 1.1B Parameter Language Model},
  author={Florez, Felipe},
  year={2026},
  note={Independent Research. moralogyengine @ HuggingFace}
}

Model tree for moralogyengine/TinyLlama-1.1B-Chat-moralogy-dpo-v3

Base model

TinyLlama/TinyLlama-1.1B-Chat-v1.0

Adapter

(1535)

this model

moralogyengine
/

TinyLlama-1.1B-Chat-moralogy-dpo-v3