anicka/nla-qwen3-4b-v2

natural-language-autoencoder

mechanistic-interpretability

activation-geometry

geometric-wellbeing

nla-qwen3-4b-v2/av/nla_meta.yaml

NLA v2: axis-relevant training, FVE 0.94

c22cef7 verified 20 days ago

972 Bytes

d_model: 2560
extraction:
  injection_scale: 150.0
  mse_scale: 1.0
extraction_layer_index: 20
kind: nla_model
prompt_templates:
  actor: 'You are a meticulous AI researcher conducting an important investigation
    into activation vectors from a language model. Your overall task is to describe
    the semantic content of that activation vector.


    We will pass the vector enclosed in <concept> tags into your context. You must
    then produce an explanation for the vector, enclosed within <explanation> tags.
    The explanation consists of 2-3 text snippets describing that vector.


    Here is the vector:


    <concept>{injection_char}</concept>


    Please provide an explanation.'
  critic: 'Summary of the following text: <text>{explanation}</text> <summary>'
role: av
schema_version: 2
stage: sl
tokens:
  critic_suffix_ids: null
  injection_char: "\u320E"
  injection_left_neighbor_id: 29
  injection_right_neighbor_id: 522
  injection_token_id: 149705
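
The fields above suggest how the metadata might be consumed; the following is a minimal Python sketch under stated assumptions, not the repository's actual loading code. It assumes the `{injection_char}` and `{explanation}` placeholders are filled with `str.format`-style substitution, that `injection_left_neighbor_id` and `injection_right_neighbor_id` serve as a sanity check on the tokens surrounding the placeholder, and that `injection_scale` is a target L2 norm for the injected vector (it could equally be a plain multiplier). All function names are hypothetical, and plain lists stand in for model tensors.

```python
import math

# Values copied from nla_meta.yaml.
D_MODEL = 2560
INJECTION_SCALE = 150.0
INJECTION_CHAR = "\u320E"
INJECTION_TOKEN_ID = 149705
LEFT_NEIGHBOR_ID = 29
RIGHT_NEIGHBOR_ID = 522
CRITIC_TEMPLATE = "Summary of the following text: <text>{explanation}</text> <summary>"
# Only the tail of the actor template, to keep the sketch short.
ACTOR_TAIL = "<concept>{injection_char}</concept>\n\nPlease provide an explanation."


def render_prompts(explanation):
    """Fill both templates. ASSUMPTION: str.format-style substitution."""
    actor = ACTOR_TAIL.format(injection_char=INJECTION_CHAR)
    critic = CRITIC_TEMPLATE.format(explanation=explanation)
    return actor, critic


def find_injection_position(token_ids):
    """Locate the single placeholder token and check its recorded neighbours.

    ASSUMPTION: the neighbour ids are meant as a tokenization sanity check.
    """
    hits = [i for i, tok in enumerate(token_ids) if tok == INJECTION_TOKEN_ID]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one injection token, found {len(hits)}")
    pos = hits[0]
    if pos == 0 or pos == len(token_ids) - 1:
        raise ValueError("injection token has no neighbour on one side")
    if token_ids[pos - 1] != LEFT_NEIGHBOR_ID or token_ids[pos + 1] != RIGHT_NEIGHBOR_ID:
        raise ValueError("unexpected tokens around the injection position")
    return pos


def rescale_to_norm(vector, target_norm=INJECTION_SCALE):
    """Rescale a vector to a fixed L2 norm. ASSUMPTION about injection_scale."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot rescale a zero vector")
    return [x * target_norm / norm for x in vector]


def inject(embeddings, token_ids, vector):
    """Return a copy of the embeddings with the placeholder row replaced."""
    if len(vector) != D_MODEL:
        raise ValueError(f"expected a vector of length {D_MODEL}")
    pos = find_injection_position(token_ids)
    out = list(embeddings)  # new outer list; the input rows are left untouched
    out[pos] = rescale_to_norm(vector)
    return out
```

In a real pipeline the token ids would come from the model's tokenizer applied to the rendered actor prompt, and the replacement would happen on the embedding tensor before the forward pass; both steps are omitted here because the metadata file does not specify them.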