Instructions to use liminerity/mm4-3b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use liminerity/mm4-3b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="liminerity/mm4-3b") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("liminerity/mm4-3b") model = AutoModelForCausalLM.from_pretrained("liminerity/mm4-3b") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use liminerity/mm4-3b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "liminerity/mm4-3b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "liminerity/mm4-3b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/liminerity/mm4-3b
- SGLang
How to use liminerity/mm4-3b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "liminerity/mm4-3b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "liminerity/mm4-3b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "liminerity/mm4-3b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "liminerity/mm4-3b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use liminerity/mm4-3b with Docker Model Runner:
docker model run hf.co/liminerity/mm4-3b
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("liminerity/mm4-3b")
model = AutoModelForCausalLM.from_pretrained("liminerity/mm4-3b")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))MM4-3b
a llama based model i made thru extensive training and merging ill explain later i literally made so many models today
Title: Divergent Knowledge Enhancement through Retrograde Merging Strategies: Redefining Accuracy Perspectives in Language Model Evolution
Abstract: Have you picked up any bad habits, or have you ever learned to do something incorrectly, only to realize you must completly relearn whatever it is you're trying to accomplish? In this proposal, we present an innovative and unconventional approach to enhancing the performance and knowledge base of natural language models. Our proposed method, titled 'Divergent Knowledge Enhancement through Retrograde Merging Strategies' (DKE-RS), aims to challenge traditional practices in model development by incorporating a deliberate back-and-forth merger between high and low accuracy language models.
The initial conceptualization of DKE-RS stemmed from the realization that learning often encompasses both acquisition and unlearning, as encapsulated by the quote, "learning is just as sacred as unlearning." The proposed technique commences with a baseline model, 'blur-7b,' attaining an accuracy rate of 72.1%, subsequently merged with a Mistral fine-tuned model on the Dolphin dataset, only achieving a 46% accuracy level.
By deliberately merging with less accurate models and retracing the evolutionary process, DKE-RS aims to broaden the knowledge base of the resulting model. This strategy, dubbed 'making the bad good,' intentionally degrades the initial accuracy in an effort to refine it, thus breaking conventional iterative improvements for innovative progression.
image/png
The DKE-RS method challenges the status quo by not solely relying on a linear enhancement trajectory, instead adopting a more holistic and diverse approach. We anticipate that this non-linear merger process will further diversify the model's knowledge base, thereby creating a more resilient and well-rounded language generation tool, capable of handling complex contexts with a broader understanding.
Through thorough experimentation and analysis, we plan to assess the effectiveness and potential drawbacks of DKE-RS, comparing it to traditional merging techniques. The results from such evaluations will provide valuable insights into the efficacy of this divergent strategy in the landscape of natural language model development.
We posit that the Divergent Knowledge Enhancement through Retrograde Merging Strategies approach contributes a significant and compelling step forward in the field, provoking thought-provoking discourse about the nature of accuracy refinement and model progression.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 53.22 |
| AI2 Reasoning Challenge (25-Shot) | 44.80 |
| HellaSwag (10-Shot) | 70.41 |
| MMLU (5-Shot) | 50.90 |
| TruthfulQA (0-shot) | 43.20 |
| Winogrande (5-shot) | 66.22 |
| GSM8k (5-shot) | 43.82 |
- Downloads last month
- 353
Datasets used to train liminerity/mm4-3b
teknium/GPT4-LLM-Cleaned
Evaluation results
- normalized accuracy on AI2 Reasoning Challenge (25-Shot)test set Open LLM Leaderboard44.800
- normalized accuracy on HellaSwag (10-Shot)validation set Open LLM Leaderboard70.410
- accuracy on MMLU (5-Shot)test set Open LLM Leaderboard50.900
- mc2 on TruthfulQA (0-shot)validation set Open LLM Leaderboard43.200
- accuracy on Winogrande (5-shot)validation set Open LLM Leaderboard66.220
- accuracy on GSM8k (5-shot)test set Open LLM Leaderboard43.820
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="liminerity/mm4-3b") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)