---
language:
- ko
- en
license: other
base_model: LiquidAI/LFM2-1.2B
license_name: lfm-open-license-v1.0
license_link: https://huggingface.co/LiquidAI/LFM2-1.2B/blob/main/LICENSE
tags:
- translation
- generated_from_trainer
- liquid-ai
- lfm2
- korean
datasets:
- gyung/koen-parallel-100k
metrics:
- chrf
- bleu
model-index:
- name: LFM2-1.2B-KoEn-MT-v4-100k
  results:
  - task:
      type: translation
      name: Translation
    dataset:
      name: Flores-200
      type: flores_200
      config: en-ko
      split: devtest
    metrics:
    - type: chrf
      name: CHrF++
      value: 31.53
      verified: true
    - type: bleu
      name: BLEU
      value: 11.13
      verified: true
---

# 🌊 LFM2-1.2B-KoEn-MT-v4-100k

**LFM2-1.2B-KoEn-MT-v4-100k** is a Korean-English translation model fine-tuned from LiquidAI's `LFM2-1.2B` on a **high-quality parallel dataset of 100,000 samples**.

It was trained with an optimized pipeline on 2 x T4 GPUs (DDP) and delivers efficient, solid translation quality despite its compact 1.2B parameters. In particular, it is competitive with NLLB-200-Distilled-600M (CHrF++ 31.53 vs. 31.97 on Flores-200), which opens up use on mobile and edge devices.

## 📊 Benchmarks

Evaluation results on the **Flores-200** devtest set (1,012 sentences).
(Sorted by CHrF++.)

| Rank | Model | CHrF++ | BLEU | Notes |
| :--- | :--- | :--- | :--- | :--- |
| 1 | **Google Translate** | 39.27 | 18.18 | Commercial service (Target) |
| 2 | **Yanolja-4B-GGUF** | 38.61 | 16.03 | Open Source Model (SOTA) |
| 3 | **NLLB-200 (3.3B)** | 35.09 | 11.68 | 3.3B dedicated translation model |
| 4 | **Gemma-3-4B-it-GGUF** | 32.83 | 11.36 | Google's latest 4B model |
| 5 | **NLLB-200-Distilled-600M** | 31.97 | 10.32 | 600M dedicated translation model |
| 6 | **LFM2-1.2B-KoEn-MT-v4-100k** | **31.53** | **11.13** | **This model (1.2B)** |
| 7 | **lfm2-mt-v1** | 30.85 | 11.17 | Trained on 100 samples |
| 8 | **LFM2-1.2B** | 27.23 | 6.43 | Baseline model |
| 9 | **Qwen3-4B-GGUF** | 25.62 | 7.46 | 4B Base Model |
| 10 | **Gemma-3-1B-it-GGUF** | 24.07 | 6.94 | 1B model |
| 11 | **Qwen3-1.7B-GGUF** | 21.19 | - | 1.7B Base Model |
| 12 | **Qwen3-0.6B-GGUF** | 13.48 | 1.98 | 0.6B Base Model |

## 📈 Training Logs

Loss and learning-rate trajectory over **approximately 6,188 steps** of training. The loss started at about 3.5 and converged stably to a final value of 1.43.

| Step | Epoch | Training Loss (Avg) | Learning Rate | Notes |
| :---: | :---: | :--- | :--- | :--- |
| 0 | 0.00 | 3.57 | 0 | Start |
| 500 | 0.08 | 1.59 | 8.06e-06 | Loss drops sharply during warmup |
| 1000 | 0.16 | 1.57 | 9.88e-06 | Early stabilization |
| 2000 | 0.32 | 1.48 | 8.45e-06 | Loss falls below 1.5 |
| 3000 | 0.49 | 1.46 | 5.99e-06 | Mid-run convergence |
| 4000 | 0.65 | 1.45 | 3.21e-06 | Fine adjustment phase |
| 5000 | 0.81 | 1.44 | 1.08e-06 | Maximizing performance |
| 6000 | 0.98 | 1.43 | 6.30e-09 | Final convergence |

* **Optimizer**: `paged_adamw_8bit`
* **LR Scheduler**: Cosine decay with warmup (ratio 0.1)
* **Max LR**: 1e-5

## 🚀 Usage

The model can be loaded and used for translation with the `transformers` library.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "gyung/lfm2-1.2b-koen-mt-v4-100k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16
)

# Sentence to translate
text = "The model is working correctly now."

# Apply the chat template (ChatML format recommended)
messages = [
    {"role": "system", "content": "Translate to Korean."},
    {"role": "user", "content": text}
]

# Tokenize the input
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

# Generate the translation
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens (skip the prompt)
decoded = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)

print(f"Input: {text}")
print(f"Output: {decoded}")
# Example output: 모델이 정상적으로 작동하고 있습니다.
```

## ⚙️ Training Details

This model was trained on Kaggle with 2 x T4 GPUs, using settings optimized for that environment.

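
As an illustration of the learning-rate schedule used here (cosine decay with a 0.1 warmup ratio and a peak of 1e-5), the following is a minimal pure-Python sketch. It assumes linear warmup from zero and decay to zero over roughly 6,188 total steps, as reported in the training logs. The function name `lr_at_step` is hypothetical and is not part of the training code. The printed values approximate the logged learning rates but may not match them exactly, because the exact step count and logging offsets are not stated.

```python
import math


def lr_at_step(step: int, total_steps: int, max_lr: float = 1e-5,
               warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to max_lr, then cosine decay back to 0."""
    # Assumption: the warmup length is the rounded-up fraction of total steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return max_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total_steps = 6188  # approximate step count from the training logs
    for step in (0, 500, 1000, 2000, 3000, 4000, 5000, 6000):
        print(f"step {step:>5}: lr = {lr_at_step(step, total_steps):.2e}")
```

The learning rate rises during roughly the first 10% of steps, peaks at the maximum value, and then decays smoothly, which is the pattern visible in the Learning Rate column of the training log table.
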
### Configuration

* **Base Model**: `LiquidAI/LFM2-1.2B`
* **Dataset**: `dataset_100000.jsonl` (English-Korean parallel, 100k samples)
* **Hardware**: NVIDIA T4 GPU x 2 (data parallelism, DDP)
* **Epochs**: 1
* **Batch Size**: 1 per device x 16 gradient-accumulation steps x 2 GPUs = effective batch size 32
* **Optimizer**: `paged_adamw_8bit`
* **Learning Rate**: 1e-5 (cosine scheduler, warmup ratio 0.1)
* **Precision**: Mixed precision / FP16 (optimized for T4)

### Training Code Snippet

```python
from trl import SFTConfig

# SFTTrainer configuration used for v4
sft_config = SFTConfig(
    output_dir="/kaggle/working/lfm2-mt-v4",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,   # trade compute for memory on 16 GB T4s
    optim="paged_adamw_8bit",
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=50,
    save_steps=500,
    eval_strategy="no",            # evaluation skipped to optimize for speed
    dataset_text_field="messages",
    packing=False,
    ddp_find_unused_parameters=False,
)
```

## ⚠️ Limitations

* This is a small 1.2B-parameter model, so it may underperform larger models (4B+) on highly complex or specialized contexts.
* Hallucinations may occur for rare words not covered by the training data and for very long sentences.

## 📜 License

This model is released under the **Liquid AI LFM Open License v1.0**.

* **Permitted**: Academic research and personal use are allowed without restriction.
* **Commercial use**: Companies and individuals with annual revenue under USD 10 million (approximately KRW 14 billion) may use the model commercially free of charge.
* **Restricted**: Companies with annual revenue above USD 10 million need a separate license agreement with Liquid AI.

See the [LICENSE](https://huggingface.co/LiquidAI/LFM2-1.2B/blob/main/LICENSE) file for details.

## Citation

**Model**

```bibtex
@misc{lfm2-1.2b-koen-mt-v4-100k,
  author = {Gyung},
  title = {LFM2-1.2B Korean-English Machine Translation Model v4},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v4-100k}}
}
```

**Base Model (LiquidAI LFM2-1.2B)**

```bibtex
@article{liquidai2025lfm2,
  title = {LFM2 Technical Report},
  author = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year = {2025}
}
```

**Evaluation Dataset (Flores-200)**

```bibtex
@article{nllb2022,
  author = {NLLB Team and Costa-juss{\`a}, Marta R. and Cross, James and {\c{C}}elebi, Onur and others},
  title = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  year = {2022},
  journal = {arXiv preprint arXiv:2207.04672}
}
```

**Metrics**

```bibtex
@inproceedings{popovic-2015-chrf,
  title = "chrF: character n-gram F-score for automatic MT evaluation",
  author = "Popovi{\'c}, Maja",
  booktitle = "Proceedings of the Tenth Workshop on Statistical Machine Translation",
  month = sep,
  year = "2015",
  address = "Lisbon, Portugal",
  publisher = "Association for Computational Linguistics",
  pages = "392--395",
}

@inproceedings{post-2018-call,
  title = "A Call for Clarity in Reporting BLEU Scores",
  author = "Post, Matt",
  booktitle = "Proceedings of the Third Conference on Machine Translation: Research Papers",
  month = oct,
  year = "2018",
  address = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  pages = "186--191",
}
```