---
license: other
language: ["en"]
base_model: ["JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"]
base_model_relation: adapter
library_name: peft
pipeline_tag: text-generation
tags: ["qwen3", "text-generation", "code", "python", "peft", "lora", "qlora", "agentic-coding", "tessa", "heretic", "codefeedback"]
datasets: ["smirki/Agentic-Coding-Tessa"]
---

# Qwen3 4B Thinking 2507 Heretic CodeFeedback: Agentic Tessa 1K LoRA

This repository contains an experimental **LoRA adapter** trained on top of:

[`JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback`](https://huggingface.co/JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback)

This adapter is a small continuation experiment using:

[`smirki/Agentic-Coding-Tessa`](https://huggingface.co/datasets/smirki/Agentic-Coding-Tessa)

The goal was to test whether a small amount of agentic coding data could improve or preserve coding behavior without degrading strict code-output performance.

## Status

This is a **candidate / experimental adapter**, not a claimed major improvement.

I am testing several datasets to make the model better at coding. This adapter is a tiny improvement at best, not a game changer, but compared with the previous adapter it did not get worse.

In a small local Python coding benchmark, this adapter preserved the previous score:

| Model | Adapter | Passed | Pass rate | Avg tokens/s |
|---|---|---:|---:|---:|
| Before | `heretic_F_lora_python5000_codefeedback5000` | 9/10 | 90.00% | 7.80 |
| After | `heretic_F_lora_tessa_agentic_1000_test` | 9/10 | 90.00% | 7.86 |

Delta:

| Metric | Value |
|---|---:|
| Passes | 0 |
| Pass rate | 0.00% |
| Avg tokens/s | +0.05 |

Unlike the OpenCodeInstruct continuation experiment, this Tessa-based adapter did **not** regress on the small strict-code benchmark.

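A before/after comparison like the one above can be recomputed from the result files in this repository's `benchmark/` directory. The following is a minimal sketch, not the script that produced the tables: it assumes each line of a results file is a JSON object with a boolean `passed` field. That field name is hypothetical, so adjust it to the actual schema of the `.jsonl` files.

```python
import json
from pathlib import Path


def summarize(records):
    """Return (passed, total, pass_rate_percent) for a list of result dicts.

    Assumes each record carries a boolean "passed" field (hypothetical name).
    """
    total = len(records)
    passed = sum(1 for r in records if r.get("passed"))
    rate = 100.0 * passed / total if total else 0.0
    return passed, total, rate


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def compare(before, after):
    """Return the change in pass count and pass rate (after minus before)."""
    b_pass, _, b_rate = summarize(before)
    a_pass, _, a_rate = summarize(after)
    return a_pass - b_pass, a_rate - b_rate


if __name__ == "__main__":
    before_path = Path("benchmark/before_results.jsonl")
    after_path = Path("benchmark/after_results.jsonl")
    # Only run the comparison when both result files are present locally.
    if before_path.exists() and after_path.exists():
        d_pass, d_rate = compare(load_jsonl(before_path), load_jsonl(after_path))
        print(f"Passes: {d_pass:+d}, pass rate: {d_rate:+.2f}%")
```

Keeping `summarize` and `compare` independent of file I/O makes it easy to reuse them when more benchmark runs are added.
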
## Training configuration

| Item | Value |
|---|---|
| Base model | `JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback` |
| Input adapter | `heretic_F_lora_python5000_codefeedback5000` |
| Dataset | `smirki/Agentic-Coding-Tessa` |
| Samples used | 1,000 |
| Sequence length | 1024 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Training method | QLoRA / LoRA |
| Quantized loading during training | 4-bit NF4 |

## Benchmark files

Benchmark artifacts are included under:

~~~text
benchmark/
~~~

Files:

~~~text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/before_results.jsonl
benchmark/after_results.jsonl
~~~

## Intended use

This adapter is intended for testing:

- agentic coding behavior
- coding assistance
- code generation
- code explanation
- tool-use style coding responses
- continued fine-tuning experiments

It should be compared against the main CodeFeedback model before use in any serious coding workflow.

## Loading example

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# Load the base model in 4-bit NF4, the quantization type listed in the training configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter to the quantized base model and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter)
model.eval()
~~~

## Important notes

This is an experimental LoRA adapter.

The benchmark used here is small and should not be treated as a formal coding leaderboard. It is mainly useful for local before/after regression testing.

This adapter preserved the current local benchmark score, but further testing is needed before treating it as a better general-purpose coding model.
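
To make the caveat about benchmark size concrete, the sketch below computes a Wilson score interval for a 9/10 pass count. This is an illustration added here, not part of the original evaluation, and it assumes the ten tasks can be treated as independent trials; `wilson_interval` is a helper defined for this example only.

```python
import math


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a pass rate; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = passed / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


low, high = wilson_interval(9, 10)
print(f"9/10 passed: 95% interval {low:.1%} to {high:.1%}")
```

With only ten tasks the interval runs from roughly 60% to 98%, so an identical 9/10 before and after is consistent with a fairly wide range of underlying pass rates. That is why the score is best read as a regression check rather than a ranking.
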