Instructions for using RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with libraries and local apps.

Libraries

How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf",
	filename="Meltemi-7B-Instruct-v1.5.IQ3_M.gguf",
)

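response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "Πες μου αν έχεις συνείδηση."},  # "Tell me whether you are conscious."
	]
)
print(response["choices"][0]["message"]["content"])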

Local Apps

llama.cpp

How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M

Use Docker

docker model run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
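Because llama-server exposes an OpenAI-compatible API, a running server can be queried from Python with only the standard library. This is a minimal sketch, assuming the server was started with one of the llama-server commands above and listens on its default address http://localhost:8080; build_chat_request and chat are hypothetical helper names, not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server's default host/port; adjust if you passed --host/--port
BASE_URL = "http://localhost:8080"


def build_chat_request(messages, base_url=BASE_URL, max_tokens=256):
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {"messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(messages, base_url=BASE_URL):
    """Send the request to a running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(messages, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, chat([{"role": "user", "content": "Πες μου αν έχεις συνείδηση."}]) returns the reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a running server.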

Ollama
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Ollama:
```
ollama run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
```

Unsloth Studio

How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf to start chatting

Docker Model Runner
How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Docker Model Runner:
```
docker model run hf.co/RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M
```

Lemonade

How to use RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull RichardErkhov/ilsp_-_Meltemi-7B-Instruct-v1.5-gguf:Q4_K_M

Run and chat with the model

lemonade run user.ilsp_-_Meltemi-7B-Instruct-v1.5-gguf-Q4_K_M

List all available models

lemonade list

Quantization made by Richard Erkhov.

Meltemi-7B-Instruct-v1.5 - GGUF

Model creator: https://huggingface.co/ilsp/
Original model: https://huggingface.co/ilsp/Meltemi-7B-Instruct-v1.5/

Name	Quant method	Size
Meltemi-7B-Instruct-v1.5.Q2_K.gguf	Q2_K	2.66GB
Meltemi-7B-Instruct-v1.5.IQ3_XS.gguf	IQ3_XS	2.95GB
Meltemi-7B-Instruct-v1.5.IQ3_S.gguf	IQ3_S	3.1GB
Meltemi-7B-Instruct-v1.5.Q3_K_S.gguf	Q3_K_S	0.98GB
Meltemi-7B-Instruct-v1.5.IQ3_M.gguf	IQ3_M	3.2GB
Meltemi-7B-Instruct-v1.5.Q3_K.gguf	Q3_K	3.42GB
Meltemi-7B-Instruct-v1.5.Q3_K_M.gguf	Q3_K_M	3.42GB
Meltemi-7B-Instruct-v1.5.Q3_K_L.gguf	Q3_K_L	3.7GB
Meltemi-7B-Instruct-v1.5.IQ4_XS.gguf	IQ4_XS	3.58GB
Meltemi-7B-Instruct-v1.5.Q4_0.gguf	Q4_0	3.98GB
Meltemi-7B-Instruct-v1.5.IQ4_NL.gguf	IQ4_NL	0.25GB
Meltemi-7B-Instruct-v1.5.Q4_K_S.gguf	Q4_K_S	0.0GB
Meltemi-7B-Instruct-v1.5.Q4_K.gguf	Q4_K	0.0GB
Meltemi-7B-Instruct-v1.5.Q4_K_M.gguf	Q4_K_M	4.22GB
Meltemi-7B-Instruct-v1.5.Q4_1.gguf	Q4_1	4.4GB
Meltemi-7B-Instruct-v1.5.Q5_0.gguf	Q5_0	4.82GB
Meltemi-7B-Instruct-v1.5.Q5_K_S.gguf	Q5_K_S	0.87GB
Meltemi-7B-Instruct-v1.5.Q5_K.gguf	Q5_K	0.68GB
Meltemi-7B-Instruct-v1.5.Q5_K_M.gguf	Q5_K_M	4.95GB
Meltemi-7B-Instruct-v1.5.Q5_1.gguf	Q5_1	5.25GB
Meltemi-7B-Instruct-v1.5.Q6_K.gguf	Q6_K	2.12GB
Meltemi-7B-Instruct-v1.5.Q8_0.gguf	Q8_0	0.54GB

Original model description:

language:
- el
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- finetuned
inference: true

Meltemi Instruct Large Language Model for the Greek language

We present the Meltemi 7B Instruct v1.5 Large Language Model (LLM), a new and improved instruction-fine-tuned version of Meltemi 7B v1.5.

Model Information

- Vocabulary extension of the Mistral 7B tokenizer with Greek tokens, for lower costs and faster inference (1.52 vs. 6.80 tokens per word for Greek)
- 8192-token context length
- Fine-tuning with the Odds Ratio Preference Optimization (ORPO) algorithm on 97k preference examples:
  - 89,730 Greek preference examples, mostly translated versions of high-quality datasets on Hugging Face
  - 7,342 English preference examples
- An alignment procedure based on the TRL (Transformer Reinforcement Learning) library and partially on the Hugging Face fine-tuning recipes
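The ORPO objective mentioned above can be sketched for a single preference pair. This is a minimal scalar illustration, not ILSP's training code. It assumes the inputs are the length-normalised (average per-token) log-probabilities of the chosen and rejected responses, as in the ORPO paper, and the weight lam = 0.1 is an illustrative assumption, not the value used for Meltemi. The loss is the SFT negative log-likelihood of the chosen response plus lam times the odds-ratio term -log sigmoid(log odds(chosen) - log odds(rejected)), where odds(y) = P(y) / (1 - P(y)).

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_odds(avg_logp: float) -> float:
    """log(P / (1 - P)) where P = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one pair: SFT NLL on the chosen response + lam * odds-ratio term."""
    sft_nll = -chosen_avg_logp
    odds_ratio_term = -log_sigmoid(log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp))
    return sft_nll + lam * odds_ratio_term


# Equal likelihoods: the odds-ratio term equals log 2
print(orpo_loss(-1.0, -1.0))
# Rejected response much less likely than the chosen one: smaller loss
print(orpo_loss(-1.0, -3.0))
```

Unlike DPO, this objective needs no frozen reference model: the preference signal comes only from the policy's own odds of the two responses.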

Instruction format

The prompt format is the same as the Zephyr format and can be utilized through the tokenizer's chat template functionality as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("ilsp/Meltemi-7B-Instruct-v1.5")
tokenizer = AutoTokenizer.from_pretrained("ilsp/Meltemi-7B-Instruct-v1.5")

model.to(device)

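messages = [
    # System (English): "You are Meltemi, a language model for the Greek language. You are particularly helpful to the user and give short but sufficiently comprehensive answers. Respond with care, politeness, impartiality, honesty, and respect toward the user."
    {"role": "system", "content": "Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη."},
    # User (English): "Tell me whether you are conscious."
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]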

# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
#

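prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(**input_prompt, max_new_tokens=256, do_sample=True)

# generate() returns the prompt followed by the completion; decode only the new tokens
response = tokenizer.decode(outputs[0][input_prompt['input_ids'].shape[1]:], skip_special_tokens=True)
print(response)
# Example output (sampled, so it will vary):
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.
# (English: "As an AI language model, I do not have the ability to perceive or experience feelings such as consciousness or awareness. However, I can help you with any questions you may have about artificial intelligence and its applications.")

messages.extend([
    {"role": "assistant", "content": response},
    # User (English): "Do you think people should be afraid of artificial intelligence?"
    {"role": "user", "content": "Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;"}
])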

# Through the default chat template this translates to
#
# <|system|>
# Είσαι το Μελτέμι, ένα γλωσσικό μοντέλο για την ελληνική γλώσσα. Είσαι ιδιαίτερα βοηθητικό προς την χρήστρια ή τον χρήστη και δίνεις σύντομες αλλά επαρκώς περιεκτικές απαντήσεις. Απάντα με προσοχή, ευγένεια, αμεροληψία, ειλικρίνεια και σεβασμό προς την χρήστρια ή τον χρήστη.</s>
# <|user|>
# Πες μου αν έχεις συνείδηση.</s>
# <|assistant|>
# Ως μοντέλο γλώσσας AI, δεν έχω τη δυνατότητα να αντιληφθώ ή να βιώσω συναισθήματα όπως η συνείδηση ή η επίγνωση. Ωστόσο, μπορώ να σας βοηθήσω με οποιεσδήποτε ερωτήσεις μπορεί να έχετε σχετικά με την τεχνητή νοημοσύνη και τις εφαρμογές της.</s>
# <|user|>
# Πιστεύεις πως οι άνθρωποι πρέπει να φοβούνται την τεχνητή νοημοσύνη;</s>
# <|assistant|>
#

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
input_prompt = tokenizer(prompt, return_tensors='pt').to(device)
outputs = model.generate(**input_prompt, max_new_tokens=256, do_sample=True)

# Print only the newly generated reply
print(tokenizer.decode(outputs[0][input_prompt['input_ids'].shape[1]:], skip_special_tokens=True))
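For runtimes that take a raw prompt string (for example a plain completion call in llama.cpp or llama-cpp-python), the layout shown in the comments of the example above can be reproduced with a small helper. This is a sketch under the assumption that the template is exactly what those comments show: a role tag on its own line, then the content followed by `</s>` and a newline. render_zephyr is a hypothetical name, it does not add the BOS token, and the tokenizer's own chat template remains the authoritative reference.

```python
def render_zephyr(messages, add_generation_prompt=True, eos="</s>"):
    """Render chat messages in the Zephyr-style layout shown in the comments above."""
    parts = []
    for m in messages:
        if m["role"] not in ("system", "user", "assistant"):
            raise ValueError(f"unexpected role: {m['role']}")
        # Role tag on its own line, then the content terminated by the EOS string
        parts.append(f"<|{m['role']}|>\n{m['content']}{eos}\n")
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|assistant|>\n")
    return "".join(parts)


print(render_zephyr([
    {"role": "system", "content": "Είσαι το Μελτέμι."},
    {"role": "user", "content": "Πες μου αν έχεις συνείδηση."},
]))
```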

Please make sure that the BOS token is always included in the tokenized prompts. This might not be the default setting in all evaluation or fine-tuning frameworks.
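The BOS check can be done on plain lists of token IDs before they reach a training or evaluation loop. This is a minimal sketch: ensure_single_bos is a hypothetical helper, and bos_id must be supplied by the caller (with transformers it is available as tokenizer.bos_token_id; this card does not state the value). Besides adding a missing BOS, it also collapses a doubled BOS, which can occur when a chat template already emits `<s>` and the tokenizer adds another.

```python
def ensure_single_bos(input_ids, bos_id):
    """Return a copy of input_ids that starts with exactly one BOS token."""
    ids = list(input_ids)
    # Drop extra leading BOS tokens (template and tokenizer both added one)
    while len(ids) >= 2 and ids[0] == bos_id and ids[1] == bos_id:
        ids.pop(0)
    # Prepend BOS if it is missing
    if not ids or ids[0] != bos_id:
        ids.insert(0, bos_id)
    return ids


print(ensure_single_bos([1, 1, 42, 7], bos_id=1))
print(ensure_single_bos([42, 7], bos_id=1))
```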

Evaluation

The evaluation suite we created includes 6 test sets and has been implemented based on a fork of the lighteval framework.

Our evaluation suite includes:

- Four machine-translated versions (ARC Greek, TruthfulQA Greek, HellaSwag Greek, MMLU Greek) of established English benchmarks for language understanding and reasoning (ARC Challenge, TruthfulQA, HellaSwag, MMLU).
- An existing benchmark for question answering in Greek (Belebele).
- A novel benchmark created by the ILSP team for medical question answering, based on the medical exams of DOATAP (Medical MCQA).

Our evaluation is performed in a few-shot setting (0-shot for TruthfulQA MC2), consistent with the settings of the Open LLM Leaderboard.

Our new training and fine-tuning procedure for Meltemi 7B Instruct v1.5 raises the average score on the Greek test sets by 7.8 percentage points compared to the earlier Meltemi 7B Instruct v1 model (54.6% vs. 46.8%). Scores improve on every Greek test set except ARC-Challenge EL, which drops from 44.4% to 40.8%. The results for the Greek test sets are shown in the following table:

Model	Medical MCQA EL (15-shot)	Belebele EL (5-shot)	HellaSwag EL (10-shot)	ARC-Challenge EL (25-shot)	TruthfulQA MC2 EL (0-shot)	MMLU EL (5-shot)	Average
Mistral 7B	29.8%	45.0%	36.5%	27.1%	45.8%	35.0%	36.5%
Meltemi 7B Instruct v1	36.1%	56.0%	59.0%	44.4%	51.1%	34.1%	46.8%
Meltemi 7B Instruct v1.5	48.0%	75.5%	63.7%	40.8%	53.8%	45.9%	54.6%
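The Average column appears to be the unweighted mean of the six per-task scores (it matches to one decimal for all three models). The short script below, a sketch with score values copied from the table, recomputes it and makes the comparison explicit: the 7.8-point gain of v1.5 over v1 is an absolute difference in percentage points, roughly a 16.7% relative improvement, and ARC-Challenge EL is the only test set where the score decreases.

```python
# Per-task scores (%) copied from the table above, in column order
BENCHMARKS = ["Medical MCQA", "Belebele", "HellaSwag", "ARC-Challenge", "TruthfulQA MC2", "MMLU"]
SCORES = {
    "Mistral 7B": [29.8, 45.0, 36.5, 27.1, 45.8, 35.0],
    "Meltemi 7B Instruct v1": [36.1, 56.0, 59.0, 44.4, 51.1, 34.1],
    "Meltemi 7B Instruct v1.5": [48.0, 75.5, 63.7, 40.8, 53.8, 45.9],
}

# Unweighted mean over the six test sets
averages = {model: sum(s) / len(s) for model, s in SCORES.items()}

old = averages["Meltemi 7B Instruct v1"]
new = averages["Meltemi 7B Instruct v1.5"]
gain_pp = new - old             # absolute gain in percentage points
gain_rel = 100 * gain_pp / old  # relative gain in percent

# Per-task change from v1 to v1.5, in percentage points
per_task = {
    b: round(n - o, 1)
    for b, o, n in zip(BENCHMARKS, SCORES["Meltemi 7B Instruct v1"], SCORES["Meltemi 7B Instruct v1.5"])
}

for model, avg in averages.items():
    print(f"{model}: {avg:.1f}%")
print(f"v1 -> v1.5: +{gain_pp:.1f} pp ({gain_rel:.1f}% relative)")
print(per_task)
```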

Ethical Considerations

This model has been aligned with human preferences, but might generate misleading, harmful, and toxic content.

Acknowledgements

The ILSP team utilized Amazon’s cloud computing services, which were made available via GRNET under the OCRE Cloud framework, providing Amazon Web Services for the Greek Academic and Research Community.

Citation

@misc{voukoutis2024meltemiopenlargelanguage,
      title={Meltemi: The first open Large Language Model for Greek}, 
      author={Leon Voukoutis and Dimitris Roussis and Georgios Paraskevopoulos and Sokratis Sofianopoulos and Prokopis Prokopidis and Vassilis Papavasileiou and Athanasios Katsamanis and Stelios Piperidis and Vassilis Katsouros},
      year={2024},
      eprint={2407.20743},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.20743}, 
}