---
license: other
language: ["en"]
base_model: ["JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"]
base_model_relation: adapter
library_name: peft
pipeline_tag: text-generation
tags: ["qwen3", "text-generation", "code", "python", "peft", "lora", "qlora", "agentic-coding", "tessa", "heretic", "codefeedback"]
datasets: ["smirki/Agentic-Coding-Tessa"]
---

# Qwen3 4B Thinking 2507 Heretic CodeFeedback — Agentic Tessa 1K LoRA

This repository contains an experimental **LoRA adapter** trained on top of:

[`JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback`](https://huggingface.co/JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback)

This adapter is a small continuation experiment using:

[`smirki/Agentic-Coding-Tessa`](https://huggingface.co/datasets/smirki/Agentic-Coding-Tessa)

The goal was to test whether a small amount of agentic coding data could improve or preserve coding behavior without degrading strict code-output performance.

## Status

This is a **candidate / experimental adapter**, not a claimed major improvement.

I'm testing several datasets to improve the model for coding. This adapter is at best a tiny improvement, not a game changer, but compared to the previous adapter it did not get worse.

In a small local Python coding benchmark, this adapter preserved the previous score:

| Model | Adapter | Passed | Pass rate | Avg tokens/s |
|---|---|---:|---:|---:|
| Before | `heretic_F_lora_python5000_codefeedback5000` | 9/10 | 90.00% | 7.80 |
| After | `heretic_F_lora_tessa_agentic_1000_test` | 9/10 | 90.00% | 7.86 |

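Delta (After minus Before):

| Metric | Change |
|---|---:|
| Passes | 0 |
| Pass rate | 0.00 pp |
| Avg tokens/s | +0.05 |

The throughput delta appears to be computed from unrounded averages, which would explain why it differs slightly from the 0.06 gap between the rounded values in the first table.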

Unlike the OpenCodeInstruct continuation experiment, this Tessa-based adapter did **not** regress on the small strict-code benchmark.

## Training configuration

| Item | Value |
|---|---|
| Base model | `JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback` |
| Input adapter | `heretic_F_lora_python5000_codefeedback5000` |
| Dataset | `smirki/Agentic-Coding-Tessa` |
| Samples used | 1,000 |
| Sequence length | 1024 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Training method | QLoRA / LoRA |
| Quantized loading during training | 4-bit NF4 |

## Benchmark files

Benchmark artifacts are included under:

~~~text
benchmark/
~~~

Files:

~~~text
benchmark/before_summary.md
benchmark/after_summary.md
benchmark/COMPARISON.md
benchmark/before_results.jsonl
benchmark/after_results.jsonl
~~~
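
The before/after summaries can in principle be recomputed from the per-task result files. The sketch below assumes, hypothetically, that each line of `before_results.jsonl` and `after_results.jsonl` is a JSON object with a boolean `passed` field and a numeric `tokens_per_s` field; the real field names may differ, so inspect one line and adjust. It also averages per-task throughput, which may not be how the original summaries defined "Avg tokens/s".

~~~python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Iterable, List


@dataclass
class RunSummary:
    passed: int
    total: int
    avg_tokens_per_s: float

    @property
    def pass_rate(self) -> float:
        # Pass rate in percent (0-100); 0.0 for an empty run.
        return 100.0 * self.passed / self.total if self.total else 0.0


def load_jsonl(path: Path) -> List[dict]:
    # One JSON object per non-empty line.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize(records: Iterable[dict]) -> RunSummary:
    # ASSUMPTION: "passed" and "tokens_per_s" are hypothetical field names.
    records = list(records)
    passed = sum(1 for r in records if r["passed"])
    speeds = [float(r["tokens_per_s"]) for r in records]
    avg = sum(speeds) / len(speeds) if speeds else 0.0
    return RunSummary(passed=passed, total=len(records), avg_tokens_per_s=avg)


def compare(before: RunSummary, after: RunSummary) -> Dict[str, float]:
    # All deltas are After minus Before; the pass-rate delta is in percentage points.
    return {
        "passes": after.passed - before.passed,
        "pass_rate_pp": after.pass_rate - before.pass_rate,
        "avg_tokens_per_s": after.avg_tokens_per_s - before.avg_tokens_per_s,
    }
~~~

Usage, assuming the files follow that layout: `compare(summarize(load_jsonl(Path("benchmark/before_results.jsonl"))), summarize(load_jsonl(Path("benchmark/after_results.jsonl"))))`.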

## Intended use

This adapter is intended for testing:

- agentic coding behavior
- coding assistance
- code generation
- code explanation
- tool-use style coding responses
- continued fine-tuning experiments

It should be compared against the main CodeFeedback model before use in any serious coding workflow.

## Loading example

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback"
adapter = "JoaoZaokk/Qwen3-4B-Thinking-2507-Heretic-CodeFeedback-Agentic-Tessa-1K-LoRA"

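# The tokenizer is loaded from the base model repository.
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)

# 4-bit NF4 quantization (the same format used for quantized loading during training).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model and let Accelerate place layers on available devices.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter on top of the base model and switch to inference mode.
model = PeftModel.from_pretrained(model, adapter)
model.eval()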
~~~
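
Qwen3 Thinking-2507 models typically emit their reasoning first, terminated by a closing `</think>` tag, followed by the final answer; this fine-tune presumably inherits that format. For strict code-output checks it can help to separate the two and pull out the first fenced Python block. The helpers below are a hypothetical sketch of that post-processing, not the extraction logic of the benchmark above (which this card does not describe). They operate on the decoded string, e.g. the result of `tokenizer.decode(...)` on the tokens returned by `model.generate(...)`, and treat the whole text as the answer if no `</think>` tag is present.

~~~python
import re
from typing import Optional, Tuple

# A fence opened at line start with ```python or ```py, closed by ``` at line start.
_PY_BLOCK = re.compile(r"^```(?:python|py)[ \t]*\n(.*?)^```", re.DOTALL | re.MULTILINE)


def split_thinking(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer), splitting at the last '</think>' tag."""
    reasoning, sep, answer = text.rpartition("</think>")
    if not sep:
        # No closing tag: treat the entire output as the answer.
        return "", text.strip()
    # The opening tag usually comes from the chat template, but drop it if echoed.
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


def extract_python(answer: str) -> Optional[str]:
    """Return the body of the first fenced Python block, or None if there is none."""
    match = _PY_BLOCK.search(answer)
    return match.group(1) if match else None
~~~

Requiring the `python`/`py` language tag avoids accidentally matching the closing fence of an earlier non-Python block; untagged blocks are ignored by design.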

## Important notes

This is an experimental LoRA adapter.

The benchmark used here is small and should not be treated as a formal coding leaderboard. It is mainly useful for local before/after regression testing.
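
On a 10-task benchmark, an unchanged pass count can still hide one task that started passing and another that started failing. A per-task comparison makes that visible. The sketch below assumes, hypothetically, that each result record carries a `task_id` and a boolean `passed`; these field names are illustrative and may not match the files in `benchmark/`.

~~~python
from typing import Dict, Iterable, List


def index_by_task(records: Iterable[dict]) -> Dict[str, bool]:
    # ASSUMPTION: "task_id" and "passed" are hypothetical field names.
    return {str(r["task_id"]): bool(r["passed"]) for r in records}


def task_flips(before: Iterable[dict], after: Iterable[dict]) -> Dict[str, List[str]]:
    """Group tasks by how their outcome changed between two runs."""
    b = index_by_task(before)
    a = index_by_task(after)
    shared = sorted(b.keys() & a.keys())
    return {
        "fixed": [t for t in shared if not b[t] and a[t]],
        "broken": [t for t in shared if b[t] and not a[t]],
        "missing_in_after": sorted(b.keys() - a.keys()),
    }


def is_regression(flips: Dict[str, List[str]]) -> bool:
    # Any newly failing or missing task counts, even if the total pass count is unchanged.
    return bool(flips["broken"] or flips["missing_in_after"])
~~~

Treating any pass-to-fail flip as a regression is stricter than comparing totals; whether that is the right gate for such a small benchmark is a judgment call.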

This adapter preserved the current local benchmark score, but further testing is needed before treating it as a better general-purpose coding model.