---
language:
- en
license: mit
library_name: transformers
base_model: LiquidAI/LFM2.5-350M
tags:
- nl2bash
- linux-commands
- text-generation
- lora
- grpo
- natural-language-to-code
tasks:
- text-generation
---

# LFM2.5-350M Linux Command Generator

A fine-tuned version of [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) specialized for converting natural language instructions into Linux shell commands.

## What This Model Actually Is

This is a **task-specific fine-tune** of LiquidAI's 350M parameter language model, trained to generate bash commands wrapped in special tokens. It was created as a research/demonstration project to explore:

1. **LoRA fine-tuning** for command generation tasks
2. **GRPO (Group Relative Policy Optimization)** for reinforcement learning from rewards
3. **Custom format training** using special tokens
4. **Production pipeline** with Azure OpenAI dataset generation

## Architecture & Training

### Base Model

- **Model**: LiquidAI/LFM2.5-350M (350M parameters)
- **Architecture**: Transformer decoder-only
- **Context Length**: 4096 tokens

### Training Pipeline

```
OpenAI GPT-4 Dataset (3,000 examples)
        ↓
SFT Training (4 epochs, LoRA r=16, BF16)
  - assistant_only_loss=True
  - Custom data collator with assistant_masks
        ↓
GRPO Training (2 epochs, 7 reward functions)
  - beta=0.04 (KL constraint)
  - num_generations=3 per prompt
  - Temperature annealing 0.7 → 0.45
        ↓
Final Merged Model
```

### Dataset (15 Categories, ~3,000 examples)

| Category | Examples | Description |
|----------|----------|-------------|
| file_operations | 450 | ls, cp, mv, rm, mkdir |
| text_processing | 400 | grep, awk, sed, cut, sort |
| file_search | 300 | find, locate, which |
| process_management | 300 | ps, kill, pkill, nohup |
| networking | 250 | ping, curl, wget, ssh, scp |
| permissions | 200 | chmod, chown, sudo |
| archives_compression | 200 | tar, gzip, zip |
| system_info | 200 | df, du, free, uptime |
| io_redirection | 200 | pipes, >, >>, tee |
| environment | 150 | export, alias, source |
| monitoring | 150 | watch, lsof, journalctl |
| user_management | 150 | useradd, passwd, id |
| disk_storage | 150 | lsblk, mount, fdisk |
| string_patterns | 150 | grep -E, sed -E patterns |
| shell_scripting | 150 | for loops, if statements |

### Output Format (v30)

The model outputs **raw bash commands** between special tokens:

```
<|tool_call_start|>find . -name "*.py" -mtime -7<|tool_call_end|>
```

**Key characteristics:**

- No function wrappers (`linux_command(...)`), just raw bash
- Uses LFM2.5's native special tokens: `<|tool_call_start|>`, `<|tool_call_end|>`
- Designed for direct extraction and execution

## Performance Metrics (Actual)

| Metric | Score | Notes |
|--------|-------|-------|
| Format Accuracy | **100%** | Correct use of special tokens |
| Tool Name Accuracy | **98%** | Raw command format (no wrappers) |
| Exact Match | **24%** | String match with reference command |
| Command F1 | **0.58** | Token-level F1 score |

### What These Numbers Mean

- **100% Format Accuracy**: The model consistently outputs commands in the correct format with the proper special tokens.
- **98% Tool Name Accuracy**: The model almost never falls back to the old function-wrapper format.
- **24% Exact Match**: The output matches the reference command exactly in roughly 1 of 4 cases. This is competitive for a 350M parameter model.
- **0.58 F1**: Moderate token overlap with the reference commands.

### Comparison Context

| Model | Parameters | NL2Bash EM | Notes |
|-------|------------|------------|-------|
| GPT-4 | Undisclosed | ~50% | Proprietary, cloud-only |
| StarCoder2-7b | 7B | ~35% | 20x larger |
| CodeLlama-7b | 7B | ~30% | 20x larger |
| **This Model** | **350M** | **24%** | **Fully open, edge-runnable** |
| CodeT5-base | 220M | ~18% | Smaller, older architecture |

**Takeaway**: For a 350M parameter model, 24% EM is reasonable. It is not state of the art, but it is competitive for its size class and runs on minimal hardware.
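A short sketch can make the Exact Match and Command F1 metrics concrete. The card does not say how commands are normalized or tokenized for scoring. The version below therefore assumes whitespace tokenization and bag-of-tokens overlap, in the style of SQuAD token F1. `exact_match` and `token_f1` are hypothetical helper names, not the project's evaluation code.

```python
from collections import Counter


def exact_match(pred: str, ref: str) -> bool:
    """Exact string match after trimming surrounding whitespace (assumed normalization)."""
    return pred.strip() == ref.strip()


def token_f1(pred: str, ref: str) -> float:
    """Bag-of-tokens F1 over whitespace-split tokens (assumed tokenization)."""
    pred_tokens = pred.split()
    ref_tokens = ref.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a perfect match; one empty scores zero
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most min(count_pred, count_ref) times
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


pred = 'find . -name "*.pdf" -mtime -7'
ref = 'find . -type f -name "*.pdf" -mtime -7'
print(exact_match(pred, ref))          # False
print(round(token_f1(pred, ref), 3))   # 0.857
```

The prediction above omits only `-type f`. It fails Exact Match but still scores about 0.86 token F1, which shows why an F1 score can sit well above the corresponding EM score.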
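## Technical Highlights

### 1. LoRA Fine-Tuning

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                         # Rank of the low-rank update matrices
    lora_alpha=32,                # Scaling factor; updates are scaled by lora_alpha / r = 2
    lora_dropout=0.05,            # Dropout applied on the LoRA branch
    target_modules="all-linear",  # Attach adapters to every linear layer
    bias="none",                  # Keep bias terms frozen
)
```

### 2. GRPO Reward Functions (7 total)

- `reward_format`: Correct special token usage (+2/-1)
- `reward_tool_name`: Raw command format validation (+2/-2)
- `reward_exact_cmd`: Exact string match (+2, partial credit)
- `reward_similarity`: Token F1 similarity (0-1)
- `reward_safety`: Dangerous command penalty (-3)
- `reward_penalties`: Termination and structure quality
- `reward_structure`: Content quality and format

### 3. Critical Bug Fixes Applied

#### Tokenizer Patch for GRPO

```python
# TRL's GRPOTrainer calls batch_decode with skip_special_tokens=True,
# which strips our format tokens. We monkey-patch it to force skip_special_tokens=False.
original = tokenizer.batch_decode  # `tokenizer` is the one passed to GRPOTrainer

def _forced_decode(sequences, skip_special_tokens=True, **kwargs):
    return original(sequences, skip_special_tokens=False, **kwargs)

tokenizer.batch_decode = _forced_decode
```

#### Pickle Fix for odict_keys

```python
from collections.abc import ItemsView, KeysView, ValuesView
from transformers import Trainer

# TRL/Transformers can leave odict_keys on the trainer, and dict views cannot be pickled
# when checkpoints are saved. We monkey-patch Trainer._save to convert them to lists first.
_original_save = Trainer._save

def patched_save(self, output_dir=None, state_dict=None):
    keys = getattr(self, "model_kwarg_keys", None)
    if isinstance(keys, (KeysView, ValuesView, ItemsView)):
        self.model_kwarg_keys = list(keys)
    # ... sanitize and save
    return _original_save(self, output_dir, state_dict)

Trainer._save = patched_save
```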
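The standalone sketch below uses plain Python with no Trainer involved. It shows why converting to `list` matters: a dict view such as `odict_keys` cannot be pickled, while a copy with its views converted to lists round-trips cleanly. `sanitize_views` and the `state` dictionary are hypothetical stand-ins. The card elides what its actual sanitize step does, so this is an illustration rather than the project's code.

```python
import pickle
from collections import OrderedDict
from collections.abc import ItemsView, KeysView, ValuesView


def sanitize_views(obj):
    """Recursively replace dict view objects (e.g. odict_keys) with plain lists."""
    if isinstance(obj, (KeysView, ValuesView, ItemsView)):
        return [sanitize_views(v) for v in obj]
    if isinstance(obj, dict):
        return {k: sanitize_views(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(sanitize_views(v) for v in obj)
    return obj


# Hypothetical trainer-like state holding an odict_keys view
state = {"model_kwarg_keys": OrderedDict(input_ids=None, attention_mask=None).keys()}

try:
    pickle.dumps(state)
    raw_pickle_ok = True
except TypeError:  # cannot pickle 'odict_keys' object
    raw_pickle_ok = False

restored = pickle.loads(pickle.dumps(sanitize_views(state)))
print(raw_pickle_ok, restored)  # False {'model_kwarg_keys': ['input_ids', 'attention_mask']}
```

The recursion into dicts, lists, and tuples handles nested state. A single flat attribute like `model_kwarg_keys` only needs the top-level `list(...)` call used in the patch.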
### 4. Training Optimizations

- **Right padding** for training (required by `assistant_only_loss`)
- **BF16 mixed precision** for speed
- **Gradient checkpointing** for memory
- **Temperature annealing** (0.7 → 0.45) to move from exploration to exploitation
- **Milestone checkpoints** at 10%, 50%, and 100%

## Usage

### Basic Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import re

model_id = "2796gauravc/lfm25-350m-linux-grpo"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Prepare prompt
messages = [
    {"role": "system", "content": "You are a Linux command assistant."},
    {"role": "user", "content": "Find all PDF files modified in the last 7 days"}
]

# Tokenize; return_dict=True returns both input_ids and attention_mask
enc = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True
)
input_ids = enc.input_ids.to(model.device)
attention_mask = enc.attention_mask.to(model.device)

# Generate (greedy decoding)
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=100,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens, keeping special tokens
response = tokenizer.decode(
    outputs[0][input_ids.size(-1):],
    skip_special_tokens=False
)

# Extract the command between the tool-call tokens
match = re.search(
    r"<\|tool_call_start\|>(.*?)<\|tool_call_end\|>",
    response,
    re.DOTALL
)
if match:
    command = match.group(1).strip()
    print(f"Generated: {command}")
    # Example output (illustrative): find . -name "*.pdf" -mtime -7
```

### Hardware Requirements

| Mode | VRAM | RAM | Speed / Duration |
|------|------|-----|------------------|
| Inference (GPU) | 2GB | 4GB | ~100 tokens/s |
| Inference (CPU) | - | 4GB | ~20 tokens/s |
| Training (SFT) | 16GB | 32GB | ~2 hrs |
| Training (GRPO) | 20GB | 32GB | ~3 hrs |

## Limitations & Honest Assessment

### What It Does Well

1. ✅ **Format compliance**: Always uses correct special tokens
2. ✅ **Simple commands**: Good at basic file operations and text processing
3. ✅ **Edge deployment**: Small enough to run on consumer hardware
4. ✅ **No function wrappers**: Clean raw command output

### What It Struggles With

1. ❌ **Complex pipelines**: Multi-stage commands with pipes
2. ❌ **Exact match**: Only 24% of outputs match the reference exactly, though many non-matching outputs are valid alternatives
3. ❌ **Edge cases**: Unusual flags or rare utilities
4. ❌ **Context awareness**: No memory of previous commands

### Known Issues

1. **Semantic vs. string equivalence**: Many valid bash commands exist for the same task. The model may generate a correct alternative that doesn't match the reference string.
2. **Safety**: While we filter dangerous patterns in training, the model can still suggest risky commands. Always review commands before execution.
3. **Overfitting to training patterns**: The model may repeat common patterns from the training data.

## Citation

```bibtex
@misc{lfm25-350m-linux-grpo,
  title={LFM2.5-350M Linux Command Generator},
  author={Gaurav Chauhan},
  year={2026},
  howpublished={\url{https://huggingface.co/2796gauravc/lfm25-350m-linux-grpo}},
  note={350M parameter NL2Bash model with LoRA + GRPO training}
}
```

## License

MIT License. See the LICENSE file for details.

Base model: [LiquidAI/LFM2.5-350M](https://huggingface.co/LiquidAI/LFM2.5-350M) (Apache 2.0)

## Acknowledgments

- **LiquidAI** for the LFM2.5 base model
- **Hugging Face** for the Transformers and TRL libraries
- **Azure OpenAI** for the dataset generation API