Instructions for using RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with libraries, notebooks, and local apps:
- Libraries
- llama-cpp-python
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama
llm = Llama.from_pretrained(
    repo_id="RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf",
    filename="Meltemi-7B-Instruct-v1.5.IQ3_M.gguf",
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Πες μου αν έχεις συνείδηση."}],  # "Tell me whether you are conscious."
)
print(response["choices"][0]["message"]["content"])
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
Use Docker
docker model run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Ollama:
ollama run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
- Unsloth Studio
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting
- Docker Model Runner
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Docker Model Runner:
docker model run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
- Lemonade
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
Run and chat with the model
lemonade run user.ilsp_-_Meltemi-7B-Instruct-v1.5-gguf-Q4_K_M
List all available models
lemonade list
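The llama-server commands above start an OpenAI-compatible HTTP server, so any OpenAI-style client can talk to the model. The following is a minimal standard-library sketch, assuming llama-server is listening on its default address `http://localhost:8080`; the helper names `build_chat_request` and `chat` and the sampling defaults are hypothetical choices for illustration, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(messages, model=None, max_tokens=256, temperature=0.7):
    """Build the JSON body of an OpenAI-style chat completion request."""
    body = {
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    if model is not None:
        # Typically optional for a llama-server started with one model;
        # other OpenAI-compatible servers may require it.
        body["model"] = model
    return body


def chat(messages, base_url=BASE_URL, model=None):
    """POST the request to /chat/completions and return the reply text."""
    payload = json.dumps(build_chat_request(messages, model=model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

With the server running, `print(chat([{"role": "user", "content": "Πες μου αν έχεις συνείδηση."}]))` should return the model's Greek reply. Ollama documents a similar OpenAI-compatible endpoint (by default at `http://localhost:11434/v1`); there the `model` argument presumably has to name the pulled model, e.g. `hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M`.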
Quantization made by Richard Erkhov.
Meltemi-7B-Instruct-v1.5 - GGUF
- Model creator: https://huggingface.co/ilsp/
- Original model: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1.5/
Original model description:
language:
- el
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- finetuned
inference: true
Meltemi Instruct Large Language Model for the Greek language
We present Meltemi 7B Instruct v1.5, a Large Language Model (LLM) that is a new and improved instruction-fine-tuned version of Meltemi 7B v1.5.
Model Information
- Vocabulary extension of the Mistral 7B tokenizer with Greek tokens, for lower costs and faster inference (1.52 vs. 6.80 tokens per word for Greek, with the extended vs. the original tokenizer)
- 8192 context length
- Fine-tuning was done with the Odds Ratio Preference Optimization (ORPO) algorithm on 97k preference examples:
- 89,730 Greek preference examples, mostly translated versions of high-quality datasets on Hugging Face
- 7,342 English preference examples
- Our alignment procedure is based on the TRL (Transformer Reinforcement Learning) library and partially on the Hugging Face fine-tuning recipes
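For readers unfamiliar with ORPO, the per-example objective described in the ORPO paper is the negative log-likelihood of the chosen response plus λ times an odds-ratio term, where odds(y|x) = P(y|x) / (1 - P(y|x)) and P(y|x) is the length-normalized likelihood of a response. The sketch below is a minimal pure-Python illustration under those definitions: it assumes per-token log-probabilities have already been computed, the function names are hypothetical, and `lam=0.1` is only an illustrative weight. It is not the TRL implementation and does not reflect the hyperparameters used to train Meltemi.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def mean(values):
    return sum(values) / len(values)


def odds_ratio_term(avg_chosen: float, avg_rejected: float) -> float:
    """L_OR = -log sigmoid(log odds(chosen) - log odds(rejected))."""
    return -log_sigmoid(log_odds(avg_chosen) - log_odds(avg_rejected))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """SFT loss on the chosen response plus lam * odds-ratio penalty."""
    avg_chosen = mean(chosen_token_logps)      # length-normalized log P(y_w|x)
    avg_rejected = mean(rejected_token_logps)  # length-normalized log P(y_l|x)
    sft = -avg_chosen                          # negative log-likelihood of y_w
    return sft + lam * odds_ratio_term(avg_chosen, avg_rejected)
```

When the chosen and rejected responses are equally likely, the odds-ratio term equals log 2; it shrinks toward zero as the chosen response becomes relatively more likely, which is what pushes the model to prefer it.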
Instruction format
The prompt format is the same as the Zephyr format and can be applied through the tokenizer's chat template functionality as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("ilsp/Meltemi-7B-Instruct-v1.5")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1.5")
model.to(device)
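messages = [
# System prompt (English): "You are Meltemi, a language model for the Greek language. You are especially helpful to the user and give short but sufficiently comprehensive answers. Answer with care, politeness, impartiality, honesty, and respect toward the user."
{"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
# User (English): "Tell me whether you are conscious."
{"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]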
# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
#
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)
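# Decode only the newly generated tokens (drop the echoed prompt and special tokens)
response = tokenizer.decode(outputs[0][input_prompt['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.
# (English: "As an AI language model, I do not have the ability to perceive or experience feelings such as consciousness or awareness. However, I can help you with any questions you may have about artificial intelligence and its applications.")
messages.extend([
{"role": "assistant", "content": response},
# User (English): "Do you think people should be afraid of artificial intelligence?"
{"role": "user", "content": "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;"}
])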
# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.</s>
# <|user|>
# Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;</s>
# <|assistant|>
#
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(input_prompt['input_ids'], max_new_tokens=256, do_sample=True)
print(tokenizer.batch_decode(outputs)[0])
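To make the layout concrete without downloading the tokenizer, here is a small sketch that reproduces the rendering shown in the comments above: each turn is the role tag, a newline, the content, `</s>`, and a newline, followed by `<|assistant|>` and a newline as the generation prompt. It is reconstructed only from those comments, the function name `render_zephyr` is hypothetical, it does not insert the BOS token, and the chat template shipped with the tokenizer remains the authoritative definition.

```python
def render_zephyr(messages, add_generation_prompt=True, eos_token="</s>"):
    """Render chat messages in the Zephyr-style layout shown in the comments.

    Each turn becomes the role tag, a newline, the content, the EOS token and
    a newline; the assistant tag is appended when a reply should be generated.
    """
    parts = [f"<|{m['role']}|>\n{m['content']}{eos_token}\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "S"},
    {"role": "user", "content": "U"},
]
print(render_zephyr(example))
```

Comparing this string with `tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)` is a quick way to spot whitespace or special-token differences when porting the format to another framework.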
Please make sure that the BOS token is always included in the tokenized prompts. This might not be the default setting in all evaluation or fine-tuning frameworks.
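One way to enforce this in a custom pipeline is to inspect the first token id after tokenization. The helper below is a hypothetical sketch that works on plain lists of token ids: it prepends the BOS id when it is missing and collapses a duplicated leading BOS, which can happen if both a prompt string and the tokenizer insert one. With a Hugging Face tokenizer, the BOS id is available as `tokenizer.bos_token_id`.

```python
def ensure_single_bos(input_ids, bos_token_id):
    """Return a copy of input_ids that starts with exactly one BOS token."""
    ids = list(input_ids)
    # Collapse repeated leading BOS tokens into one
    while len(ids) >= 2 and ids[0] == bos_token_id and ids[1] == bos_token_id:
        ids.pop(0)
    # Prepend BOS if the sequence does not start with it
    if not ids or ids[0] != bos_token_id:
        ids.insert(0, bos_token_id)
    return ids
```

For example, `ensure_single_bos(tokenizer(prompt)["input_ids"], tokenizer.bos_token_id)` returns ids that can be checked or fed to a framework that does not add BOS on its own.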
Evaluation
The evaluation suite we created includes 6 test sets and has been implemented based on a fork of the lighteval framework.
Our evaluation suite includes:
- Four machine-translated versions (ARC Greek, TruthfulQA Greek, HellaSwag Greek, MMLU Greek) of established English benchmarks for language understanding and reasoning (ARC Challenge, TruthfulQA, HellaSwag, MMLU).
- An existing benchmark for question answering in Greek (Belebele)
- A novel benchmark created by the ILSP team for medical question answering based on the medical exams of DOATAP (Medical MCQA).
Our evaluation is performed in a few-shot setting, consistent with the settings in the Open LLM leaderboard.
Our new training and fine-tuning procedure for Meltemi 7B Instruct v1.5 raises the average score across the Greek test sets by 7.8 percentage points (54.6% vs. 46.8%) compared to the earlier Meltemi 7B Instruct v1 model, improving on five of the six test sets (ARC-Challenge EL drops from 44.4% to 40.8%). The results for the Greek test sets are shown in the following table:
| Model | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
|---|---|---|---|---|---|---|---|
| Mistral 7B | 29.8% | 45.0% | 36.5% | 27.1% | 45.8% | 35.0% | 36.5% |
| Meltemi 7B Instruct v1 | 36.1% | 56.0% | 59.0% | 44.4% | 51.1% | 34.1% | 46.8% |
| Meltemi 7B Instruct v1.5 | 48.0% | 75.5% | 63.7% | 40.8% | 53.8% | 45.9% | 54.6% |
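The Average column is the unweighted mean of the six per-benchmark scores, which reproduces the reported values to one decimal place. The short sketch below copies the scores from the table, recomputes the averages and the v1.5 vs. v1 difference, and lists the test sets on which v1.5 improves over v1; the variable names are illustrative only.

```python
test_sets = [
    "Medical MCQA EL", "Belebele EL", "HellaSwag EL",
    "ARC-Challenge EL", "TruthfulQA MC2 EL", "MMLU EL",
]
# Scores (%) copied from the table above, in the same column order
scores = {
    "Mistral 7B": [29.8, 45.0, 36.5, 27.1, 45.8, 35.0],
    "Meltemi 7B Instruct v1": [36.1, 56.0, 59.0, 44.4, 51.1, 34.1],
    "Meltemi 7B Instruct v1.5": [48.0, 75.5, 63.7, 40.8, 53.8, 45.9],
}

# Unweighted mean over the six test sets
averages = {model: sum(vals) / len(vals) for model, vals in scores.items()}
delta = averages["Meltemi 7B Instruct v1.5"] - averages["Meltemi 7B Instruct v1"]

# Test sets where v1.5 scores higher than v1
improved = [
    name
    for name, old, new in zip(
        test_sets, scores["Meltemi 7B Instruct v1"], scores["Meltemi 7B Instruct v1.5"]
    )
    if new > old
]

for model, avg in averages.items():
    print(f"{model}: {avg:.1f}%")
print(f"v1.5 vs. v1: {delta:+.1f} percentage points on average")
print("Improved on:", ", ".join(improved))
```

The recomputed averages are 36.5%, 46.8% and 54.6%, and the difference is expressed in percentage points because it subtracts two accuracy percentages rather than measuring a relative gain.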
Ethical Considerations
This model has been aligned with human preferences, but it may still generate misleading, harmful, or toxic content.
Acknowledgements
The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the OCRE Cloud framework, providing Amazon Web Services for the Greek Academic and Research Community.
Citation
@misc{voukoutis2024meltemiopenlargelanguage,
title={Meltemi: The first open Large Language Model for Greek},
author={Leon Voukoutis and Dimitris Roussis and Georgios Paraskevopoulos and Sokratis Sofianopoulos and Prokopis Prokopidis and Vassilis Papavasileiou and Athanasios Katsamanis and Stelios Piperidis and Vassilis Katsouros},
year={2024},
eprint={2407.20743},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.20743},
}
