---
language:
- fr
- en
license: apache-2.0
base_model: moncefem/Mistral-7B-v0.3-Legal-Competition
tags:
- mistral
- legal
- law
- instruction-tuning
- chat
- qlora
- lora
---

# Mistral-7B v0.3 Legal Competition – Instruct

This model is an **instruction-tuned** version of **Mistral-7B-v0.3-Legal-Competition**, specialized for **legal reasoning and question answering**, with a strong focus on:

- French and European law
- Competition law
- Regulatory and legal analysis
- Multilingual legal instructions (French / English)

The model was fine-tuned with **QLoRA**, and the LoRA weights were then **fully merged** into the base model, so inference needs no separate adapter.

---

## 🧠 Training Details

- **Base model**: `moncefem/Mistral-7B-v0.3-Legal-Competition`
- **Fine-tuning method**: QLoRA (4-bit quantization during training)
- **Final model**: Fully merged (no LoRA adapter required at inference)
- **Context length**: 4096 tokens
- **Training format**: Mistral `[INST] ... [/INST]` chat template
- **Domains**:
  - Legal Q&A (synthetic + curated)
  - French QA (CATIE-AQ / FQuAD-style)
  - Instruction-following (French & English)
  - Reasoning & math (OpenOrca, GSM8K)

---

## 💬 Chat Format

The model uses the **Mistral Instruct v0.3** format:

```text
[INST] User question [/INST] Assistant answer
```

Example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "moncefem/Mistral-7B-v0.3-Legal-Competition-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto"
)

# "What is the maximum penalty the French Competition Authority can impose?"
messages = [
    {"role": "user", "content": "Quelle est la sanction maximale de l'Autorité de la concurrence ?"}
]

# Render the conversation with the tokenizer's built-in chat template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# The Mistral chat template already inserts the BOS token, so don't add it again.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    do_sample=True,  # required for temperature / top_p to take effect
    temperature=0.7,
    top_p=0.9
)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

## 📜 License

Apache 2.0.
## 🙏 Acknowledgements

- Mistral AI
- Hugging Face
- Open-source legal and QA datasets
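For tools that accept a raw prompt string rather than a list of messages, the `[INST] ... [/INST]` layout described in the Chat Format section can be reproduced by hand. This is a minimal sketch under stated assumptions, not the tokenizer's official template: the helper name `build_inst_prompt`, the literal `<s>` / `</s>` markers, and the exact whitespace are illustrative choices that follow the format as written in this card, and `tokenizer.apply_chat_template` should remain the authoritative reference. It only handles strictly alternating user/assistant turns (no system role).

```python
from typing import Dict, List


def build_inst_prompt(
    messages: List[Dict[str, str]], bos: str = "<s>", eos: str = "</s>"
) -> str:
    """Hypothetical helper: render alternating user/assistant turns into the
    '[INST] question [/INST] answer' layout. Marker strings and spacing are
    assumptions; check them against tokenizer.apply_chat_template."""
    prompt = bos
    for i, msg in enumerate(messages):
        role, content = msg["role"], msg["content"].strip()
        # Turns must alternate, starting with the user.
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"message {i} should be from {expected!r}, got {role!r}")
        if role == "user":
            prompt += f"[INST] {content} [/INST]"
        else:
            # A completed assistant turn is closed with the EOS marker.
            prompt += f" {content}{eos}"
    return prompt


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "Qu'est-ce qu'un abus de position dominante ?"},
        {"role": "assistant", "content": "..."},
        {"role": "user", "content": "Donnez un exemple."},
    ]
    print(build_inst_prompt(history))
```

When a string built this way is passed to the Hugging Face tokenizer, `add_special_tokens=False` avoids prepending a second BOS token, as in the generation example above.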