Llama-3.2-1B-Sarcasm-Rewriter-Context

A LoRA fine-tuned version of Llama-3.2-1B-Instruct that rewrites sarcastic news headlines as neutral, factual equivalents. During supervised fine-tuning, each prompt also included the article body as context. This improved sarcasm removal and meaning preservation over headline-only training (see Evaluation).

Built by CS4248 Team 14 (NUS, AY2025/26 Semester 2) as part of a sarcasm style transfer research project.

Why this model

Compared to the sibling Llama-3.2-1B-Sarcasm-Rewriter (headline-only training), this context-enhanced variant:

  • Lower GPT-2 perplexity (318 vs 378)
  • Higher LLM-judged sarcasm removal score (4.96/5 vs 4.74/5)
  • Better LLM-judged meaning preservation (4.32/5 vs 3.80/5), the largest improvement
  • Same near-perfect fluency (4.98/5)

The training targets were generated by an LLM annotator with access to the full article body. As a result, they are more faithful to the underlying story than targets written from the headline alone. The model learned to imitate these more faithful rewrites.

Task

Input: a sarcastic news headline
Output: a non-sarcastic rewrite

Input:  "Inconsiderate Wife Leaves Bathroom A Total Mess After Home Birth"
Output: "Mother of Two Gives Birth at Home"

Training

  • Base model: meta-llama/Llama-3.2-1B-Instruct (1.24B params)
  • Method: LoRA (r=16, α=32, dropout=0.05) targeting q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Trainable parameters: ~11.3M (0.9% of base)
  • Dataset: 6,463 sarcastic→non-sarcastic headline pairs where article bodies were available. Targets generated by StepFun Step-3.5 Flash (LLM annotator with article body access), cross-validated by Nemotron. Split: sar_to_non_context_enhanced with body filter applied.
  • Prompt format (training): system prompt + user turn containing both the sarcastic headline AND the full article body as context
  • Loss: Computed only on the assistant response tokens (target headline), not on the prompt
  • Training setup: 3 epochs on an H200 GPU, learning rate 2e-4 with a cosine schedule, per-device batch size 4 with 4 gradient-accumulation steps (effective batch size 16), bfloat16, gradient checkpointing
  • Best checkpoint: Epoch 1 (eval_loss 1.492)
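Two of the training details above can be checked with plain Python. This is a minimal sketch under stated assumptions.

The layer sizes are taken from the public Llama-3.2-1B configuration and are not stated on this card: hidden size 2048, MLP width 8192, 16 decoder layers, and 8 key/value heads of dimension 64. With those sizes, the first part recomputes the ~11.3M trainable-parameter figure.

The second part shows the common way to set up assistant-only loss. Prompt positions get the label -100, which PyTorch's cross-entropy ignores by default. `build_labels` and the token ids are hypothetical and do not reproduce the team's training code.

```python
# Sketch 1: LoRA trainable-parameter count (r=16, seven projections per block).
# Dimensions are ASSUMED from the public Llama-3.2-1B config, not from this card.
HIDDEN = 2048
INTERMEDIATE = 8192
KV_DIM = 8 * 64  # k_proj/v_proj output width: 8 KV heads x head_dim 64
NUM_LAYERS = 16
RANK = 16

TARGETS = {  # (in_features, out_features) for each targeted layer in one decoder block
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """LoRA adds A (rank x in) and B (out x rank) to each targeted linear layer."""
    return rank * in_features + out_features * rank


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
share = total / 1.24e9  # base model size as stated on this card

print(f"LoRA params per layer: {per_layer:,}")  # 704,512
print(f"LoRA params total:     {total:,}")      # 11,272,192
print(f"Share of base model:   {share:.2%}")    # 0.91%

# Sketch 2: assistant-only loss via label masking (hypothetical helper).
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


# Hypothetical token ids: 4 prompt tokens followed by a 3-token target headline.
input_ids, labels = build_labels([101, 102, 103, 104], [7, 8, 9])
print(labels)  # [-100, -100, -100, -100, 7, 8, 9]
```

Note that lora_alpha (32) does not affect the parameter count. It only sets the scaling factor alpha / r applied to the low-rank update. Hugging Face causal-LM models shift the labels by one position internally, so the labels list can be aligned with input_ids as shown.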

After training, the LoRA adapter was merged into the base weights via merge_and_unload().
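`merge_and_unload()` is the PEFT method that folds each adapter into its base layer. For standard LoRA, the merged weight is W + (alpha / r) * B @ A, where A and B are the low-rank adapter matrices. With this card's r=16 and alpha=32, the scaling factor is 2.

The NumPy sketch below uses toy shapes and random matrices, all of which are assumptions. It shows that the merged layer produces the same output as the base layer plus the separate adapter. It illustrates the arithmetic only, not PEFT's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes standing in for one projection layer (real layers are e.g. 2048 x 2048).
d_in, d_out, r, alpha = 12, 8, 4, 8
scaling = alpha / r  # 2.0, the same scaling as the card's r=16, alpha=32

W = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))      # LoRA down-projection
B = rng.normal(size=(d_out, r))     # LoRA up-projection (zero-initialized in practice; random here so the check is non-trivial)
x = rng.normal(size=(d_in,))        # one input activation vector

# Forward pass with the adapter kept separate, as during training (dropout is off at inference).
y_adapter = W @ x + scaling * (B @ (A @ x))

# Merging folds the low-rank update into a single dense weight of the original shape.
W_merged = W + scaling * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True
```

Because the update is stored in ordinary weight tensors, the published checkpoint loads with plain AutoModelForCausalLM, as in the usage example. It needs no PEFT dependency, and the adapter adds no inference cost.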

Usage — Recommended (headline-only prompt)

Although the model was trained with article bodies in the prompt, inference-time evaluation showed that it performs best with headline-only prompts. When the article body is supplied at inference, the model tends to hallucinate details from it. Use this configuration in production:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SeeYangZhi/Llama-3.2-1B-Sarcasm-Rewriter-Context"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": (
            "You are a writing assistant. Rewrite sarcastic news headlines as neutral, "
            "factual equivalents that preserve the core meaning without irony or mockery. "
            "Respond with only the rewritten headline, no explanation."
        ),
    },
    {
        "role": "user",
        "content": (
            "Rewrite this sarcastic headline as a neutral, non-sarcastic news headline:\n\n"
            "inconsiderate wife leaves bathroom a total mess after home birth"
        ),
    },
]

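# Render the chat template and append the assistant header so the model starts its reply.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The rendered template already begins with <|begin_of_text|>, so don't add a second BOS token.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
# Greedy decoding for deterministic output.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))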

Usage — Alternative (with article body, matches training distribution)

If the source article body is available, you can include it in the prompt. In evaluation, however, this mode produced slightly worse outputs than headline-only prompts because the model hallucinated details from the article body. It is therefore not recommended:

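sarcastic_headline = "inconsiderate wife leaves bathroom a total mess after home birth"
article_body = "..."  # placeholder: paste the full article text here
user_content = (
    "Rewrite this sarcastic headline as a neutral, non-sarcastic news headline.\n\n"
    f"Headline: {sarcastic_headline}\n\n"
    f"Article context:\n{article_body}"
)
messages[1]["content"] = user_content  # reuse `messages` from the example above, then generate as before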
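The helper below builds the messages list for either mode from the prompts shown on this card. It is hypothetical and not part of the released model. It lets you switch between headline-only and article-context prompting with a single argument.

```python
from typing import Optional

# System prompt copied from the recommended usage example on this card.
SYSTEM_PROMPT = (
    "You are a writing assistant. Rewrite sarcastic news headlines as neutral, "
    "factual equivalents that preserve the core meaning without irony or mockery. "
    "Respond with only the rewritten headline, no explanation."
)


def build_messages(headline: str, article_body: Optional[str] = None) -> list:
    """Build chat messages: headline-only by default, article context if a body is given."""
    if article_body is None:
        # Recommended mode: headline only.
        user_content = (
            "Rewrite this sarcastic headline as a neutral, non-sarcastic news headline:\n\n"
            f"{headline}"
        )
    else:
        # Alternative mode: the article-context format shown above.
        user_content = (
            "Rewrite this sarcastic headline as a neutral, non-sarcastic news headline.\n\n"
            f"Headline: {headline}\n\n"
            f"Article context:\n{article_body}"
        )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


messages = build_messages("inconsiderate wife leaves bathroom a total mess after home birth")
```

Pass the result to tokenizer.apply_chat_template exactly as in the recommended example.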

Evaluation

The model was compared against 14 other models (BART variants, T5 variants, ablations, and the previous Llama variant) on a 2,857-sample held-out test split using 7 metrics. Key results versus the previous headline-only variant:

Metric                    Llama-context (this model)    Llama (previous)
Flip rate (classifier)    22.5%                         21.9%
Semantic similarity       0.679                         0.656
Perplexity (GPT-2)        318                           378
LLM sarcasm removed       4.96/5                        4.74/5
LLM meaning preserved     4.32/5                        3.80/5
LLM fluency               4.98/5                        4.98/5
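The card does not spell out how flip rate and perplexity are computed. The sketch below makes two assumptions:

- Flip rate is the fraction of rewrites that a sarcasm classifier labels non-sarcastic (label 0, following NHDSD's is_sarcastic convention).
- Perplexity is the exponential of the mean per-token negative log-likelihood. Here it is computed from precomputed natural-log token probabilities rather than by running GPT-2.

Both function names are hypothetical.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative log-likelihood over tokens (natural-log probabilities)."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


def flip_rate(predicted_labels, non_sarcastic_label=0):
    """Share of rewrites the classifier no longer labels as sarcastic (assumed definition)."""
    flips = sum(1 for p in predicted_labels if p == non_sarcastic_label)
    return flips / len(predicted_labels)


# Toy inputs: four tokens each assigned probability 0.5, and four classifier predictions.
ppl = perplexity([math.log(0.5)] * 4)
rate = flip_rate([0, 1, 0, 0])
print(round(ppl, 6), rate)
```

It is not stated whether the reported perplexity is averaged per headline or pooled over all tokens in the test split, so numbers from this sketch are not directly comparable to the table.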

Full per-metric numbers are published alongside the project webapp.

License

This model is released under the Llama 3.2 Community License. The model name starts with "Llama-" as required by Meta's terms. Built with Llama.

Citation

If you use this model, please cite the underlying Llama 3.2 release and the NHDSD dataset.

Model files: Safetensors, 1B params, BF16 tensors.