Llama-3.1-8B-Instruct

This repository contains a pre-compiled build of meta-llama/Llama-3.1-8B-Instruct for running it on FuriosaAI RNGD with Furiosa-LLM.

Overview

Llama-3.1-8B-Instruct is Meta's 8-billion-parameter instruction-tuned model, an auto-regressive dense transformer optimized for multilingual dialogue, instruction following, and tool use. Its intended use is the same as that of the upstream meta-llama/Llama-3.1-8B-Instruct, and it is released under the Llama 3.1 Community License.

Architecture: Llama 3.1 (dense)
Input / Output: Text / Text
Supported Inference Engine: Furiosa-LLM
Supported Hardware: FuriosaAI RNGD

Quantization

No quantization — the model runs in its native 16-bit precision.

Features

Tool calling. The model supports tool (function) calling through the llama3_json tool-call parser, the parser used by the Llama 3 series.

Parallelism Strategy

On RNGD, Llama-3.1-8B-Instruct runs with a tensor-parallel size of 8, spanning all 8 processing elements (PEs) of a single RNGD card.
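As a rough, back-of-the-envelope illustration of what 16-bit weights and 8-way tensor parallelism imply, assuming about 8 × 10^9 parameters at 2 bytes each and an even split of the weights across PEs (KV cache, activations, and any replicated tensors are ignored, so real usage is higher):

```latex
\begin{aligned}
M_{\text{weights}} &\approx 8 \times 10^{9}\ \text{parameters} \times 2\ \tfrac{\text{bytes}}{\text{parameter}} = 16 \times 10^{9}\ \text{bytes} \approx 16\ \text{GB},\\
M_{\text{per PE}} &\approx \frac{M_{\text{weights}}}{8\ \text{PEs}} \approx 2\ \text{GB}.
\end{aligned}
```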

Usage

To run this model with Furiosa-LLM, follow the example commands below after installing Furiosa-LLM and its prerequisites.

Launch the server

The simplest way to serve the model is:

# Launch the server, listening on port 8000 by default
furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct

When the server is ready, you should see output similar to the following (the process ID will differ):

INFO:     Started server process [27507]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
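In scripts, it can be more convenient to wait for the server programmatically than to watch the log. The helper below is a minimal sketch; the function name wait_for_server is hypothetical, and it assumes the server implements the standard OpenAI-compatible GET /v1/models endpoint, which this card does not state explicitly.

```shell
# Hypothetical helper: poll the server until it responds, or give up.
# Assumes the OpenAI-compatible GET /v1/models endpoint is available.
wait_for_server() {
  local base_url="${1:-http://localhost:8000}"  # server address (default port 8000)
  local max_attempts="${2:-60}"                 # number of polls before giving up
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: silent, -f: treat HTTP errors as failures, --max-time: don't hang on one poll
    if curl -sf --max-time 5 "$base_url/v1/models" > /dev/null; then
      return 0
    fi
    # Sleep only between attempts, not after the last one
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep 5
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, wait_for_server && echo "Server is ready" polls every 5 seconds with the defaults and gives up after 60 attempts (roughly five minutes).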

Launch the server with tool calling

To enable tool (function) calling, start the server with the llama3_json tool-call parser:

furiosa-llm serve furiosa-ai/Llama-3.1-8B-Instruct \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json

Query the server

The server exposes an OpenAI-compatible API. You can send a request with curl:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "furiosa-ai/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What is the capital of France?"}]
    }' \
    | python -m json.tool
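If you only need the model's answer rather than the full JSON response, a small filter can pull it out. This is a sketch that assumes the standard OpenAI chat-completion layout (choices[0].message.content); the function name extract_reply is hypothetical, and it calls python3 rather than python.

```shell
# Hypothetical helper: read an OpenAI-style chat completion response on stdin
# and print only the assistant's text (choices[0].message.content).
# Prints an empty line if the reply has no text content
# (for example, when the model returned a tool call instead).
extract_reply() {
  python3 -c '
import json, sys
message = json.load(sys.stdin)["choices"][0]["message"]
print(message.get("content") or "")
'
}
```

To use it, replace | python -m json.tool at the end of the curl request above with | extract_reply.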

Tool calling

With the server launched using --enable-auto-tool-choice --tool-call-parser llama3_json, you can pass tools and let the model decide when to call them. See the Tool Calling guide for a complete client example and details on tool-choice options.
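A tool-calling request might look like the sketch below. It assumes the server accepts the standard OpenAI Chat Completions tools and tool_choice request fields; the tool get_current_weather, the variable TOOL_REQUEST, and the function send_tool_request are hypothetical names used only for illustration.

```shell
# Hypothetical tool definition in the OpenAI Chat Completions "tools" format.
# get_current_weather is illustrative; nothing in this repository implements it.
TOOL_REQUEST='{
  "model": "furiosa-ai/Llama-3.1-8B-Instruct",
  "messages": [{"role": "user", "content": "What is the weather like in Paris today?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_current_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name, e.g. Paris"}
        },
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}'

# Send the request to a server started with
# --enable-auto-tool-choice --tool-call-parser llama3_json on port 8000.
send_tool_request() {
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "$TOOL_REQUEST"
}
```

Running send_tool_request | python -m json.tool prints the response. If the server follows the OpenAI response format and the model decides to call the tool, the call should appear under choices[0].message.tool_calls, with the function arguments encoded as a JSON string; running the function and returning its result to the model is up to your client.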

Learn more

Tool Calling — parsers, tool-choice options, and more examples
Furiosa-LLM Server (furiosa-llm serve) — full OpenAI-compatible API reference and serving options
meta-llama/Llama-3.1-8B-Instruct — upstream model card


Model tree for furiosa-ai/Llama-3.1-8B-Instruct

Base model: meta-llama/Llama-3.1-8B
Fine-tuned: meta-llama/Llama-3.1-8B-Instruct
This model: furiosa-ai/Llama-3.1-8B-Instruct (pre-compiled build of the fine-tuned model)

Collection including furiosa-ai/Llama-3.1-8B-Instruct

Llama 3.1 (2 items, updated Aug 28, 2025)