Update README.md
README.md
CHANGED
@@ -54,6 +54,21 @@ https://AlignmentLab.ai
 
 We used [OpenAI's Chat Markup Language (ChatML)](https://github.com/openai/openai-python/blob/main/chatml.md) format, with `<|im_start|>` and `<|im_end|>` tokens added to support this.
 
+## Example Prompt Exchange
+
+```
+<|im_start|>system
+You are LlongOrca, a large language model trained by Alignment Lab AI. Write out your reasoning step-by-step to be sure you get the right answers!
+<|im_end|>
+<|im_start|>user
+How are you<|im_end|>
+<|im_start|>assistant
+I am doing well!<|im_end|>
+<|im_start|>user
+How are you now?<|im_end|>
+```
+
+
 # Evaluation
 
 We have evaluated using the methodology and tools for the HuggingFace Leaderboard, and find that we have significantly improved upon the base long context model.
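The ChatML exchange added in this change can also be assembled programmatically. The following is a minimal sketch, not the authors' own tooling: `build_chatml_prompt` and its `add_generation_prompt` parameter are hypothetical names introduced here for illustration. It assumes every turn is closed with an inline `<|im_end|>` (the README example places the system turn's closing token on its own line) and that a trailing `<|im_start|>assistant` header is appended to cue the model's reply, which the README example does not show.

```python
from typing import Dict, List

# Special tokens named in the README for the ChatML format.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def build_chatml_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Join role/content messages into a single ChatML-style prompt string.

    Hypothetical helper: the layout (inline closing token, one turn per
    block) is an assumption based on the README example.
    """
    parts = []
    for message in messages:
        # Each turn: start token, role, newline, content, end token.
        parts.append(
            f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n"
        )
    if add_generation_prompt:
        # Assumption: an open assistant header asks the model to respond next.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


messages = [
    {
        "role": "system",
        "content": (
            "You are LlongOrca, a large language model trained by Alignment "
            "Lab AI. Write out your reasoning step-by-step to be sure you "
            "get the right answers!"
        ),
    },
    {"role": "user", "content": "How are you"},
    {"role": "assistant", "content": "I am doing well!"},
    {"role": "user", "content": "How are you now?"},
]

prompt = build_chatml_prompt(messages)
print(prompt)
```

The resulting string can be passed to whichever generation interface is in use. If exact whitespace around `<|im_end|>` matters for a given deployment, compare the output against the README example before relying on it.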