---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-1.7B-Base
---

# Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B

HCT architecture release. YeAM (Yet Another Merge) implementation invariant.

## What it is

A compact 1.7B-class checkpoint produced via HCT-compatible merging. The checkpoint is published in standard Hugging Face format (safetensors + index).

## YeAM summary

YeAM performs a controlled merge in a real 4D geometric formulation with ray-intersection alignment in parameter space. It also supports targeted knowledge injection (distillation-style) into a chosen model while remaining HF-compatible.

## Notes for this checkpoint

Compared to other YeAM/HCT merges, this checkpoint additionally applies a targeted merge on the attention projection weights. Observed behavior tends to include characteristic Llama-like traits:

- More Llama-style conversation patterns.
- More consistent formatting.
- Stronger RLHF-like refusal/priority behaviors.
- Reasoning / chain-of-thought style output in the model's full native format is expected to work.

Most Qwen3 behavior should, in theory, be preserved. However, because knowledge and logic were injected from the Llama side, some Qwen-specific properties may be partially degraded or inconsistent.

Repetition / looping:

- There is no universally perfect sampling configuration.
- At higher temperature, without a repetition-style penalty, the model may enter repetition loops.
- If you observe cycling, pay special attention to repetition-related controls (e.g. repetition penalty / presence penalty).

Do not ask the model who created it. In this specific merge, it may oscillate between incompatible parents (Alibaba vs Meta), fail to settle, and get stuck in a sad loop.

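The repetition notes above can be made concrete with a small sketch. It assumes the standard Transformers `generate` keyword arguments (`do_sample`, `temperature`, `top_p`, `repetition_penalty`); the numeric values are illustrative placeholders, not settings tuned for this checkpoint. The helper names `sampling_kwargs` and `tail_is_looping` are hypothetical and not part of any library.

```python
def sampling_kwargs(temperature=0.7, top_p=0.9, repetition_penalty=1.1):
    """Build keyword arguments for `model.generate`.

    The defaults are assumptions for illustration only; adjust them
    if you observe cycling.
    """
    return {
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
    }


def tail_is_looping(token_ids, max_period=32, min_repeats=3):
    """Return True if the sequence ends with a block of at most
    `max_period` tokens repeated at least `min_repeats` times in a row.

    `token_ids` should be a plain Python list of ints.
    """
    n = len(token_ids)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # not enough tokens to hold this many repeats
        tail = token_ids[n - span:]
        block = tail[:period]
        if all(tail[i] == block[i % period] for i in range(span)):
            return True
    return False


# Possible usage with the `model` and `inputs` objects from the Usage section:
# out = model.generate(**inputs, max_new_tokens=128, **sampling_kwargs())
# if tail_is_looping(out[0].tolist()):
#     out = model.generate(**inputs, max_new_tokens=128,
#                          **sampling_kwargs(repetition_penalty=1.2))
```

The loop check is a plain heuristic over token IDs: it only flags exact cycles at the end of the output, so near-duplicate paraphrases will pass unnoticed. Retrying with a higher penalty is one option among several; lowering the temperature is another.
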
## Usage (Transformers)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Local path to the downloaded checkpoint.
m = "/path/to/Vikra-HCT-YeAM-3_3.2_QweLLa-1.7B"

tok = AutoTokenizer.from_pretrained(m, use_fast=True)

# Load in bfloat16 on the GPU and switch to inference mode.
model = AutoModelForCausalLM.from_pretrained(
    m,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
).eval()

# Tokenize the prompt and move the tensors to the model's device.
inputs = tok("Hello!", return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```

## GGUF

Convert and quantize with llama.cpp (example):

```bash
# Convert the Hugging Face checkpoint to an f16 GGUF file.
python3 /path/to/llama.cpp/convert_hf_to_gguf.py /path/to/model --outtype f16 --outfile model.f16.gguf

# Quantize the f16 GGUF file to Q8_0.
/path/to/llama.cpp/build/bin/llama-quantize model.f16.gguf model.Q8_0.gguf Q8_0
```