Llama-3.1-8B-Instruct
This repository contains a pre-compiled build of meta-llama/Llama-3.1-8B-Instruct for running on FuriosaAI RNGD with Furiosa-LLM.
Overview
Llama-3.1-8B-Instruct is Meta's 8B instruction-tuned model, an auto-regressive dense transformer optimized for multilingual dialogue, instruction following, and tool usage. Its intended use is the same as the upstream meta-llama/Llama-3.1-8B-Instruct, and it is released under the Llama 3.1 Community License.
- Architecture: Llama 3.1 (dense)
- Input / Output: Text / Text
- Supported Inference Engine: Furiosa LLM
- Supported Hardware: FuriosaAI RNGD
Quantization
No quantization: the model runs in its native 16-bit precision.
Features
- Tool calling. The model supports tool (function) calling through the
llama3_json tool-call parser, the parser used by the Llama 3 series.
Parallelism Strategy
On RNGD, Llama-3.1-8B-Instruct runs with a tensor-parallel size of 8 PEs, which maps to a single RNGD card (8 PEs per card).
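The mapping from tensor-parallel size to cards is simple arithmetic. The sketch below only restates the figure given above (8 PEs per card); the helper name `cards_needed` is hypothetical, and applying it to tensor-parallel sizes other than 8 is an assumption, not something this model card documents.

```python
import math

# Stated above: one RNGD card exposes 8 PEs.
PES_PER_CARD = 8


def cards_needed(tensor_parallel_size: int, pes_per_card: int = PES_PER_CARD) -> int:
    """Hypothetical helper: how many cards are needed to host this many PEs."""
    if tensor_parallel_size <= 0 or pes_per_card <= 0:
        raise ValueError("sizes must be positive")
    # Round up, since a partially used card still has to be present.
    return math.ceil(tensor_parallel_size / pes_per_card)


# This model: tensor-parallel size 8 fits on a single card.
print(cards_needed(8))  # 1
```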
Usage
To run this model with Furiosa-LLM, follow the example commands below after installing Furiosa-LLM and its prerequisites.
Launch the server
The simplest way to serve the model is:
# Launch the server, listening on port 8000 by default
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct
When the server is ready, you will see:
INFO: Started server process [27507]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
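If you start the server from a script, you may prefer to wait for it programmatically instead of watching the log. The following is a minimal sketch that polls the TCP port until it accepts connections. The function name `wait_for_port` and the timeout values are illustrative assumptions; it checks only that the port is open and does not rely on any Furiosa-LLM-specific endpoint.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8000,
                  timeout_s: float = 300.0, interval_s: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

Calling `wait_for_port()` with the defaults waits up to five minutes for port 8000 on localhost. The default timeout is a guess and may need adjusting for your environment.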
Launch the server with tool calling
To enable tool (function) calling, start the server with the llama3_json
tool-call parser:
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct \
--enable-auto-tool-choice \
--tool-call-parser llama3_json
Query the server
The server exposes an OpenAI-compatible API. You can send a request with curl:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "furiosa-ai/Llama-3.1-8B-Instruct",
"messages": [{"role": "user", "content": "What is the capital of France?"}]
}' \
| python -m json.tool
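The same request can be sent from Python with only the standard library. This is a sketch under the assumption that the response follows the OpenAI chat-completions schema (`choices[0].message.content`), as an OpenAI-compatible server would; the helper names `build_chat_request` and `ask` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default port from the launch step
MODEL = "furiosa-ai/Llama-3.1-8B-Instruct"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, base_url: str = BASE_URL) -> str:
    """Send the request and return the assistant's reply text.

    Assumes the OpenAI chat-completions response layout.
    """
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` returns the reply as a string.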
Tool calling
With the server launched using --enable-auto-tool-choice --tool-call-parser llama3_json,
you can pass tools and let the model decide when to call them. See the
Tool Calling guide
for a complete client example and details on tool-choice options.
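As a rough illustration of what such a request looks like, the sketch below builds a chat payload with one tool and extracts tool calls from a response. It assumes the OpenAI chat-completions schema for `tools`, `tool_choice`, and `tool_calls`. The `get_weather` tool and the helper names are hypothetical and are not part of Furiosa-LLM or the model.

```python
import json

MODEL = "furiosa-ai/Llama-3.1-8B-Instruct"

# Hypothetical tool definition in the OpenAI function-tool format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}


def build_tool_payload(prompt: str, tools: list) -> dict:
    """Build a chat-completions payload that offers tools to the model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        # "auto" lets the model decide whether to call a tool.
        "tool_choice": "auto",
    }


def extract_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs from an OpenAI-style response dict.

    Arguments arrive as a JSON string and are decoded into a dict here.
    Returns an empty list when the model answered in plain text.
    """
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```

POST the payload to `/v1/chat/completions` as in the query example, then pass the decoded JSON response to `extract_tool_calls` to see which function the model wants to invoke.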
Learn more
- Tool Calling: parsers, tool-choice options, and more examples
- Furiosa-LLM Server (furiosa-llm serve): full OpenAI-compatible API reference and serving options
- meta-llama/Llama-3.1-8B-Instruct: upstream model card