	---
	tags:
	- merge
	- mergekit
	- moe
	- frankenmoe
	- abacusai/Llama-3-Smaug-8B
	- cognitivecomputations/dolphin-2.9-llama3-8b
	- Weyaxi/Einstein-v6.1-Llama3-8B
	- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
	base_model:
	- abacusai/Llama-3-Smaug-8B
	- cognitivecomputations/dolphin-2.9-llama3-8b
	- Weyaxi/Einstein-v6.1-Llama3-8B
	- dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
	license: apache-2.0
	---

	![](https://raw.githubusercontent.com/saucam/models/main/skyro.png)

	# 🚀 Skyro-4X8B
	Skyro-4X8B is a Mixture of Experts (MoE) model built from the following models using [Mergekit](https://github.com/arcee-ai/mergekit):

	* [abacusai/Llama-3-Smaug-8B](https://huggingface.co/abacusai/Llama-3-Smaug-8B)
	* [cognitivecomputations/dolphin-2.9-llama3-8b](https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b)
	* [Weyaxi/Einstein-v6.1-Llama3-8B](https://huggingface.co/Weyaxi/Einstein-v6.1-Llama3-8B)
	* [dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2](https://huggingface.co/dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2)

	## 🧩 Configuration

	```yaml
	name: "Skyro-4X8B"
	base_model: meta-llama/Meta-Llama-3-8B
	gate_mode: hidden
	experts:
	  - source_model: abacusai/Llama-3-Smaug-8B
	    positive_prompts:
	      - "chat"
	      - "assistant"
	      - "tell me"
	      - "explain"
	      - "I want"
	  - source_model: cognitivecomputations/dolphin-2.9-llama3-8b
	    positive_prompts:
	      - "math"
	      - "mathematics"
	      - "code"
	      - "engineering"
	      - "solve"
	      - "logic"
	      - "rationality"
	      - "puzzle"
	      - "solve"
	  - source_model: Weyaxi/Einstein-v6.1-Llama3-8B
	    positive_prompts:
	      - "science"
	      - "medical"
	      - "physics"
	      - "engineering"
	      - "math"
	      - "logic"
	      - "rationality"
	      - "mathematics"
	      - "solve"
	  - source_model: dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2
	    positive_prompts:
	      - "story"
	      - "roleplay"
	      - "role-play"
	      - "storywriting"
	      - "character"
	      - "narrative"
	      - "creative"
	```
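
Two of the experts in the configuration above share several positive prompts. As far as Mergekit's documentation describes it, `gate_mode: hidden` derives the router gates from hidden-state representations of these prompts, so overlapping prompt lists plausibly give those experts similar routing signals. The sketch below only lists the overlaps; the `shared_prompts` helper is hypothetical and not part of Mergekit.

```python
from itertools import combinations

# Positive prompts copied from the configuration above, keyed by source model.
positive_prompts = {
    "abacusai/Llama-3-Smaug-8B": ["chat", "assistant", "tell me", "explain", "I want"],
    "cognitivecomputations/dolphin-2.9-llama3-8b": [
        "math", "mathematics", "code", "engineering", "solve",
        "logic", "rationality", "puzzle", "solve",
    ],
    "Weyaxi/Einstein-v6.1-Llama3-8B": [
        "science", "medical", "physics", "engineering", "math",
        "logic", "rationality", "mathematics", "solve",
    ],
    "dreamgen-preview/opus-v1.2-llama-3-8b-base-run3.4-epoch2": [
        "story", "roleplay", "role-play", "storywriting",
        "character", "narrative", "creative",
    ],
}


def shared_prompts(prompts_by_expert):
    """Map each pair of experts to the sorted prompts they have in common."""
    overlaps = {}
    for (a, prompts_a), (b, prompts_b) in combinations(prompts_by_expert.items(), 2):
        common = sorted(set(prompts_a) & set(prompts_b))
        if common:
            overlaps[(a, b)] = common
    return overlaps


overlaps = shared_prompts(positive_prompts)
for (a, b), common in overlaps.items():
    print(f"{a} <-> {b}: {', '.join(common)}")
```

Only the dolphin and Einstein experts overlap; the Smaug and opus prompt lists are disjoint from every other expert.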

	## Evaluation


	| Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
	|---------|-----|-----------|------|------------|------------|-------|
	| 66.39 | 61.26 | 82.38 | 66.67 | 50.15 | 77.66 | 60.2 |
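
The Average column appears to be the unweighted mean of the six benchmark scores. A quick check, assuming equal weights:

```python
# Benchmark scores copied from the table above.
scores = {
    "ARC": 61.26,
    "HellaSwag": 82.38,
    "MMLU": 66.67,
    "TruthfulQA": 50.15,
    "Winogrande": 77.66,
    "GSM8K": 60.2,
}

# Unweighted mean across the six benchmarks (equal weighting is an assumption).
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")
```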

	## 💻 Usage

	```python
	# Install dependencies first (in a notebook cell, prefix the command with "!"):
	#   pip install -qU transformers accelerate

	import torch
	import transformers
	from transformers import AutoTokenizer

	model = "saucam/Skyro-4X8B"
	messages = [{"role": "user", "content": "In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?"}]

	# Render the chat messages into a single prompt string using the model's chat template.
	tokenizer = AutoTokenizer.from_pretrained(model)
	prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

	# Load the model in half precision; device_map="auto" spreads the weights across
	# the available devices and offloads to CPU when they do not fit on the GPU.
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=model,
	    torch_dtype=torch.float16,
	    device_map="auto",
	)

	# Sample up to 256 new tokens and print the prompt plus completion.
	outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
	print(outputs[0]["generated_text"])
	```
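
Judging by the sample output in this README, `apply_chat_template` renders the messages in a ChatML-style format. The sketch below mimics that layout in plain Python for illustration only. `render_chatml` is a hypothetical helper, and the authoritative template is the one shipped with the model's tokenizer.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render chat messages in the ChatML-style layout seen in the sample output."""
    # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    # Optionally open an assistant turn so the model continues from there.
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = render_chatml([{"role": "user", "content": "Hi"}])
print(example)
```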

	## Sample output

	```
	config.json: 100%|██████████████████████████████████████████████████████████████| 878/878 [00:00<00:00, 4.18MB/s]
	model.safetensors.index.json: 100%|██████████████████████████████████████████| 53.5k/53.5k [00:00<00:00, 101MB/s]
	model-00001-of-00006.safetensors: 100%|█████████████████████████████████████| 9.89G/9.89G [03:47<00:00, 43.4MB/s]
	model-00002-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:23<00:00, 49.0MB/s]
	model-00003-of-00006.safetensors: 100%|█████████████████████████████████████| 9.98G/9.98G [03:44<00:00, 44.5MB/s]
	model-00004-of-00006.safetensors: 100%|█████████████████████████████████████| 9.90G/9.90G [03:30<00:00, 46.9MB/s]
	model-00005-of-00006.safetensors: 100%|█████████████████████████████████████| 9.08G/9.08G [03:08<00:00, 48.1MB/s]
	model-00006-of-00006.safetensors: 100%|█████████████████████████████████████| 1.05G/1.05G [00:20<00:00, 51.3MB/s]
	Downloading shards: 100%|█████████████████████████████████████████████████████████| 6/6 [17:58<00:00, 179.78s/it]
	Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 6/6 [01:27<00:00, 14.59s/it]
	WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
	Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

	<|im_start|>user
	In a student council election, candidate A got 20% of the votes while candidate B got 50% more than candidate A's votes. The rest of the votes was given to candidate C. If there were 100 voters, how many votes did candidate C get?<|im_end|>
	<|im_start|>assistant
	Let's denote the number of votes candidate A got as \( A \).

	Candidate B got 50% more votes than candidate A, so candidate B got \( A + 0.5A = 1.5A \) votes.

	Candidate C got the rest of the votes, which means \( C = 100 - (A + 1.5A) \).

	We know that candidate A got 20% of the votes, so \( A = 20\% \times 100 = 20 \).

	Now we can calculate candidate C's votes:
	\( C = 100 - (20 + 1.5 \times 20) \)
	\( C = 100 - (20 + 30) \)
	\( C = 100 - 50 \)
	\( C = 50 \).

	Therefore, candidate C got 50 votes.<|im_end|>
	```
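
The download log above also allows a rough size estimate. Assuming the `G` figures are decimal gigabytes and the weights are stored at 2 bytes per parameter (16-bit storage is an assumption, not something this README states), the six shards imply roughly 25 billion parameters and about 50 GB of weights. That would be consistent with the CPU-offload warning on machines with less accelerator memory than that.

```python
# Shard sizes in GB, copied from the download log above.
shard_sizes_gb = [9.89, 9.98, 9.98, 9.90, 9.08, 1.05]

# Assumption: weights are stored in a 16-bit format (2 bytes per parameter).
BYTES_PER_PARAM = 2

total_gb = sum(shard_sizes_gb)
params_billion = total_gb / BYTES_PER_PARAM  # GB / (bytes per param) = billions of params

print(f"Total checkpoint size: {total_gb:.2f} GB")
print(f"Approximate parameter count: {params_billion:.2f}B")
```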