Dataset used for training: allenai/OLMoE-mix-0924
How to use allenai/OLMoE-1B-7B-0924-GGUF with llama-cpp-python:
# !pip install llama-cpp-python huggingface-hub  (from_pretrained downloads via huggingface-hub)
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="allenai/OLMoE-1B-7B-0924-GGUF",
    filename="olmoe-1b-7b-0924-f16.gguf",
)

output = llm(
    "Once upon a time,",
    max_tokens=512,
    echo=True,  # include the prompt in the returned text
)
print(output)
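The `output` above is a dictionary in the OpenAI completion format, not a plain string, so `print(output)` shows the whole response object. The following is a minimal sketch for extracting only the generated text. It assumes the `{"choices": [{"text": ...}]}` layout that llama-cpp-python returns for non-streaming calls. The helper name `completion_text` and the sample response are hypothetical and are included only for illustration.

```python
from typing import Any, Mapping


def completion_text(response: Mapping[str, Any]) -> str:
    """Return the generated text from an OpenAI-style completion response.

    Assumes the shape {"choices": [{"text": ...}, ...]}, as returned by
    llama-cpp-python's Llama.__call__ when stream=False.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"]


# Usage with the `output` from the snippet above:
# print(completion_text(output))

# A hand-written response of the same shape (hypothetical, for illustration):
sample = {"choices": [{"text": "Once upon a time, there was a fox.", "index": 0}]}
print(completion_text(sample))
```

With `echo=True`, the extracted text begins with the prompt itself.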
How to use allenai/OLMoE-1B-7B-0924-GGUF with llama.cpp:
Install with Homebrew (macOS and Linux):
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M

Install with winget (Windows):
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M

Use a pre-built binary:
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M

Build from source:
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
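Once `llama-server` is running, any HTTP client can call its OpenAI-compatible API. The following is a minimal standard-library sketch. It assumes the server is on its default local port, 8080, and uses the `/v1/completions` endpoint. That endpoint suits a base (non-chat) model like this one, because it sends the raw prompt instead of applying a chat template. The names `COMPLETIONS_URL`, `build_completion_request`, and `complete` are hypothetical.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port, 8080.
COMPLETIONS_URL = "http://localhost:8080/v1/completions"


def build_completion_request(prompt: str, max_tokens: int = 128,
                             url: str = COMPLETIONS_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {"prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, max_tokens: int = 128) -> str:
    """Send the request to the running server and return the generated text."""
    request = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # OpenAI completion format: the text lives in choices[0]["text"].
    return body["choices"][0]["text"]


# Usage, with llama-server already running:
# print(complete("Once upon a time,", max_tokens=64))
```

Building the request separately from sending it makes the payload easy to inspect before a server is available.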
How to use allenai/OLMoE-1B-7B-0924-GGUF with Ollama:
ollama run hf.co/allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
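After the model has been pulled, Ollama also serves it over a local REST API. The following is a minimal sketch that calls the `/api/generate` endpoint with streaming turned off. It assumes Ollama's default port, 11434, and uses the same `hf.co/...:Q4_K_M` model reference as the command above. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions: Ollama is serving on its default port (11434), and the model
# has already been pulled with the `ollama run hf.co/...` command above.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "hf.co/allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M"


def build_generate_request(prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate."""
    payload = {"model": MODEL, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.load(resp)["response"]


# Usage, with Ollama running:
# print(generate("Once upon a time,"))
```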
How to use allenai/OLMoE-1B-7B-0924-GGUF with Unsloth Studio:
macOS and Linux:
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for allenai/OLMoE-1B-7B-0924-GGUF to start chatting

Windows (PowerShell):
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for allenai/OLMoE-1B-7B-0924-GGUF to start chatting

In the browser (Hugging Face Space):
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for allenai/OLMoE-1B-7B-0924-GGUF to start chatting
How to use allenai/OLMoE-1B-7B-0924-GGUF with Docker Model Runner:
docker model run hf.co/allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
How to use allenai/OLMoE-1B-7B-0924-GGUF with Lemonade:
# Download Lemonade from https://lemonade-server.ai/
lemonade pull allenai/OLMoE-1B-7B-0924-GGUF:Q4_K_M
# Run the model:
lemonade run user.OLMoE-1B-7B-0924-GGUF-Q4_K_M
# List models:
lemonade list
This repository provides GGUF versions of https://huggingface.co/allenai/OLMoE-1B-7B-0924.
@misc{muennighoff2024olmoeopenmixtureofexpertslanguage,
title={OLMoE: Open Mixture-of-Experts Language Models},
author={Niklas Muennighoff and Luca Soldaini and Dirk Groeneveld and Kyle Lo and Jacob Morrison and Sewon Min and Weijia Shi and Pete Walsh and Oyvind Tafjord and Nathan Lambert and Yuling Gu and Shane Arora and Akshita Bhagia and Dustin Schwenk and David Wadden and Alexander Wettig and Binyuan Hui and Tim Dettmers and Douwe Kiela and Ali Farhadi and Noah A. Smith and Pang Wei Koh and Amanpreet Singh and Hannaneh Hajishirzi},
year={2024},
eprint={2409.02060},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.02060},
}
Available quantizations: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit, 16-bit
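As a rough guide to what these bit widths mean for disk and memory use, file size is approximately the parameter count times the bits per weight, divided by 8. OLMoE-1B-7B activates about 1B parameters per token but stores about 7B in total, because every expert must be kept in the file. Size therefore scales with the total count. The following is a back-of-the-envelope sketch under two assumptions: exactly 7e9 parameters (taken from the model name) and the nominal bit width. Real K-quant files such as Q4_K_M come out somewhat larger because they also store per-block scales and keep some tensors at higher precision.

```python
# Rough lower-bound estimate of GGUF file size from bits per weight.
# Assumption: ~7e9 total parameters (the "7B" in the model name); all experts
# are stored, so total rather than active (~1B) parameters determine size.
TOTAL_PARAMS = 7e9
BIT_WIDTHS = [2, 3, 4, 5, 6, 8, 16]


def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in decimal gigabytes: params * bits / 8 bytes / 1e9."""
    return n_params * bits_per_weight / 8 / 1e9


for bits in BIT_WIDTHS:
    print(f"{bits:>2}-bit: ~{approx_size_gb(TOTAL_PARAMS, bits):.1f} GB")
```

For example, the 16-bit file works out to about 7e9 × 2 bytes, or roughly 14 GB, while a 4-bit file is about a quarter of that.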
Base model: allenai/OLMoE-1B-7B-0924