KoHRM-Text-1.4B FullSFT BarExam MCQ + Hard Current-Law Precedent Epoch2

Full fine-tune that continues from LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-BarExam-MCQ-1-14-Epoch2 and adds the gyung/korean-bar-exam-hard-current-law-precedent-sft-1000 corpus.

Base Model

  • Base: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-BarExam-MCQ-1-14-Epoch2
  • Relation: full fine-tune (continuation)
  • Runtime: local KoHRM/HRM-Text PrefixLM runtime
  • Export format: single-file model.safetensors plus tokenizer/config

Training

  • Dataset: gyung/korean-bar-exam-hard-current-law-precedent-sft-1000 sft/train.jsonl
  • Rows kept: 1,000 (0 dropped)
  • Tokens: 594,803 (avg sample 595, max 865)
  • Epochs: 2
  • Global batch size: 4,096 tokens
  • Learning rate: 2.0e-5, cosine, warmup 10 steps
  • Hardware: single H200 (CUDA index 7), torchrun nproc_per_node=1
  • Train loss (final): 0.243
  • Run time: ~7 minutes
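The card gives token counts, epochs and a token-based batch size, but not the step count or the exact scheduler. The sketch below is a rough estimate under stated assumptions. It assumes samples are packed into 4,096-token batches and that each epoch's final partial batch counts as one step. It also uses a common linear-warmup plus cosine-decay form with a floor of 0, which may differ from the scheduler actually used. The helper names `optimizer_steps` and `lr_at` are hypothetical.

```python
import math

def optimizer_steps(total_tokens: int, epochs: int, tokens_per_batch: int) -> int:
    # Assumption: samples are packed into fixed-size token batches and the last
    # partial batch of each epoch still counts as one optimizer step.
    return math.ceil(total_tokens / tokens_per_batch) * epochs

def lr_at(step: int, total_steps: int, peak_lr: float = 2.0e-5,
          warmup_steps: int = 10, min_lr: float = 0.0) -> float:
    # Assumed schedule: linear warmup to peak_lr, then cosine decay toward min_lr.
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

total = optimizer_steps(594_803, epochs=2, tokens_per_batch=4_096)
print(total)  # 292 under the packing assumption
for step in (0, 9, 10, total // 2, total - 1):
    print(step, f"{lr_at(step, total):.2e}")
```

Under these assumptions the 10 warmup steps cover only a few percent of the run, so most updates happen on the cosine-decay part of the schedule.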

Subject distribution of the additional SFT set:

subject                 count
공법 (public law)        270
민사법 (civil law)       460
형사법 (criminal law)    270

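Assistant response template (from the source dataset, preserved verbatim; 정답 = answer, 해설 = explanation, 참고 법령 = referenced statutes, <번호> = option number, ㄱ/ㄴ/ㄷ/ㄹ = the statements being judged, 옳다/옳지 않다 = correct/incorrect):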

정답: <번호>

해설: 정답은 <번호>번이다. ㄱ은 옳다/옳지 않다. {법령 인용} ... ㄴ은 ... ㄷ은 ... ㄹ은 ...

참고 법령: <법령1>(url); <법령2>(url); ...

Every response starts with the line "정답: <번호>" ("Answer: <number>"), so the answer number appears within the first few generated tokens and can be extracted from a greedy decode with a simple pattern match.
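A minimal extraction sketch follows. It assumes Arabic option numbers 1 to 5 and a response that begins with the 정답 line. The name `extract_answer` is hypothetical and is not taken from the source repo.

```python
import re
from typing import Optional

# Matches a leading "정답: N" line; the lookahead rejects multi-digit numbers.
ANSWER_RE = re.compile(r"^\s*정답\s*:\s*([1-5])(?!\d)")

def extract_answer(generation: str) -> Optional[int]:
    """Return the option number from a leading '정답: N', or None if absent."""
    match = ANSWER_RE.match(generation)
    return int(match.group(1)) if match else None

print(extract_answer("정답: 3\n\n해설: 정답은 3번이다."))  # 3
```

The lookahead is used instead of `\b` because Hangul counts as a word character in Python's Unicode regexes, so `\b` would not match between "3" and "번".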

Evaluation (round 15)

Round 15 of gyung/korean-bar-exam-moj-multiple-choice is held out from training (145 single-answer questions).

run                      condition   accuracy           parse rate
base (no SFT)            direct      13.1 % (19/145)    61.4 %
parent (1-14 SFT only)   direct      26.9 % (39/145)    100 %
this checkpoint          cot         20.0 % (29/145)    100 %
this checkpoint          direct      22.1 % (32/145)    100 %
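The card does not define the two metrics. This sketch assumes one plausible reading, consistent with the base row (19/145 correct at a 61.4 % parse rate): parse rate is the fraction of generations with an extractable answer, and accuracy is correct answers over all 145 questions, with unparsed items counted as wrong. The `score` helper is hypothetical.

```python
from typing import Optional, Sequence

def score(golds: Sequence[int], preds: Sequence[Optional[int]]) -> tuple[float, float]:
    """Return (accuracy, parse_rate) as fractions of all questions.

    preds[i] is None when no answer could be parsed; such items count as wrong.
    """
    if len(golds) != len(preds):
        raise ValueError("golds and preds must have the same length")
    n = len(golds)
    parsed = sum(p is not None for p in preds)
    correct = sum(p is not None and p == g for g, p in zip(golds, preds))
    return correct / n, parsed / n

acc, parse_rate = score([1, 2, 3, 4], [1, 3, None, 4])
print(f"accuracy {acc:.1%}, parse rate {parse_rate:.1%}")  # accuracy 50.0%, parse rate 75.0%
```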

By subject (this checkpoint, direct condition):

subject                 accuracy
공법 (public law)        20.5 % (8/39)
민사법 (civil law)       20.9 % (14/67)
형사법 (criminal law)    25.6 % (10/39)

The random baseline (single answer, 5 options) is 20 %. The hard-current-law continuation did not improve over the parent checkpoint on round 15: direct accuracy fell from 26.9 % to 22.1 %, close to the random baseline. Inspection of the generations shows that the model still produces short "정답: X" ("Answer: X") outputs inherited from the parent run, rather than the longer 정답/해설/참고 법령 (answer/explanation/referenced statutes) format of this SFT set. This suggests the additional training signal does not fully transfer at inference time.
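To see how far each score is from chance, one option is an exact binomial tail: the probability that a uniform guesser gets at least k of n questions right. This treats the 145 questions as independent with a 1-in-5 guess rate. That is a simplifying assumption, not an analysis reported in this card.

```python
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance a random guesser scores k or more."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, chance = 145, 0.2
for label, correct in [("this checkpoint (direct)", 32), ("parent (direct)", 39)]:
    tail = binom_tail(correct, n, chance)
    print(f"{label}: {correct}/{n} = {correct / n:.1%}, P(X >= {correct}) = {tail:.3f}")
```

With an expected 29/145 correct by chance, 32 correct is a result a guesser would often reach, while 39 correct is considerably less likely under pure guessing.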

Usage

This is not a standard Hugging Face AutoModelForCausalLM chat-model export. It uses the KoHRM/HRM-Text PrefixLM runtime. Tokenizer special tokens (no chat_template):

token                 role               id
<|im_start|>          boq                2
<|im_end|>            eoq                3
<|box_end|>           eoa (eos)          35
<|object_ref_start|>  direct condition   32
<|object_ref_end|>    cot condition      33

The prompt is tokenized as <|im_start|><condition_token>{instruction}<|im_end|>, and generation stops at <|box_end|>. See simple_inference_engine.py in the source repo.
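The prompt layout at the token-id level can be sketched as follows. The ids come from the special-token list in this section. `instruction_ids` is assumed to be the instruction already tokenized without special tokens. The helper names are hypothetical and do not come from simple_inference_engine.py.

```python
from typing import Sequence

# Special-token ids as listed in this card's tokenizer table.
BOQ_ID = 2      # <|im_start|>
EOQ_ID = 3      # <|im_end|>
EOA_ID = 35     # <|box_end|> (eos)
DIRECT_ID = 32  # <|object_ref_start|>, direct condition
COT_ID = 33     # <|object_ref_end|>, cot condition

def build_prompt_ids(instruction_ids: Sequence[int], cot: bool = False) -> list[int]:
    """<|im_start|><condition_token>{instruction}<|im_end|> as token ids."""
    condition = COT_ID if cot else DIRECT_ID
    return [BOQ_ID, condition, *instruction_ids, EOQ_ID]

def truncate_at_eoa(generated_ids: Sequence[int]) -> list[int]:
    """Drop <|box_end|> and anything after it from a generated id sequence."""
    ids = list(generated_ids)
    return ids[: ids.index(EOA_ID)] if EOA_ID in ids else ids

print(build_prompt_ids([100, 101]))  # [2, 32, 100, 101, 3]
```

If the exported tokenizer loads with transformers' AutoTokenizer (not confirmed by this card), `instruction_ids` could come from `tokenizer.encode(instruction, add_special_tokens=False)`. The forward pass itself still requires the KoHRM/HRM-Text PrefixLM runtime.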

Source

  • Additional SFT dataset: gyung/korean-bar-exam-hard-current-law-precedent-sft-1000
  • Parent checkpoint: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-BarExam-MCQ-1-14-Epoch2
  • Source license: Korea Open Government License Type 1 (KOGL Type 1) for statute/precedent data
Safetensors
  • Model size: 1B params
  • Tensor type: BF16

Repository: LLM-OS-Models/KoHRM-Text-1.4B-FullSFT-BarExam-MCQ-1-14-HardCurrentLaw-1000-Epoch2