Qwen3-Reranker-8B

This repository contains a pre-compiled build of Qwen/Qwen3-Reranker-8B for running on FuriosaAI RNGD with Furiosa-LLM.

Overview

Qwen3-Reranker-8B is the 8B reranking model in the Qwen3-Embedding series, built on the Qwen3 dense transformer backbone. Given a query and a set of candidate documents, it produces relevance scores used to reorder retrieval results — a common second stage in retrieval-augmented generation (RAG) and search pipelines. Its intended use is the same as the upstream Qwen/Qwen3-Reranker-8B, and it is released under the Apache 2.0 License.

Architecture: Qwen3 (dense)
Input / Output: Text (query-document pairs) / Relevance score
Supported Inference Engine: Furiosa-LLM
Supported Hardware: FuriosaAI RNGD

Quantization

No quantization — the model runs in its native 16-bit precision.
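
As a rough illustration of what 16-bit precision implies for memory, the sketch below estimates the size of the weights alone. The parameter count of 8e9 is an assumption taken from the nominal "8B" in the model name, not an exact figure, and the helper weight_memory_gib is hypothetical. Activations, caches, and runtime buffers are not included.

```python
def weight_memory_gib(num_params: float, bits_per_param: int = 16) -> float:
    """Approximate memory needed to hold the weights, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30


# Assumption: nominal parameter count read off the "8B" in the model name.
NOMINAL_PARAMS = 8e9

estimate = weight_memory_gib(NOMINAL_PARAMS)
print(f"~{estimate:.1f} GiB for weights at 16-bit")
```

Treat the result as a lower bound on the memory footprint, since it counts weights only.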

Parallelism Strategy

On RNGD, Qwen3-Reranker-8B runs with a tensor-parallel size of 8 PEs, which maps to a single RNGD card (8 PEs per card).
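
The mapping from tensor-parallel size to cards is simple ceiling division, sketched below. The figure of 8 PEs per card comes from the text above; the helper cards_needed is hypothetical, and whether this artifact supports any tensor-parallel size other than 8 is not stated here.

```python
PES_PER_CARD = 8  # from the text above: 8 PEs per RNGD card


def cards_needed(tp_size: int, pes_per_card: int = PES_PER_CARD) -> int:
    """Number of cards required to provide tp_size PEs (ceiling division)."""
    if tp_size <= 0 or pes_per_card <= 0:
        raise ValueError("tp_size and pes_per_card must be positive")
    return -(-tp_size // pes_per_card)


# The configuration described above: 8 PEs fit on one card.
print(cards_needed(8))
```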

Usage

To run this model with Furiosa-LLM, follow the example below after installing Furiosa-LLM and its prerequisites.

This is a reranker model, so it is used through the Furiosa-LLM Python API rather than the OpenAI-compatible server. Load the artifact and call score with query-document pairs to obtain relevance scores:

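from furiosa_llm import LLM

# Load the pre-compiled artifact for this model.
llm = LLM.from_artifacts("furiosa-ai/Qwen3-Reranker-8B")

# Each tuple is one (query, document) pair to score.
scores = llm.score([("query", "document1"), ("query", "document2")])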
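
To show how relevance scores could be used to reorder retrieval results, here is a minimal sketch in plain Python. It assumes the scores come back as one number per input pair, in input order, with higher meaning more relevant; none of this is confirmed by the text above. build_pairs and rerank are hypothetical helpers, not part of Furiosa-LLM, and the scores used are placeholder values so the example runs without RNGD hardware.

```python
from typing import List, Optional, Sequence, Tuple


def build_pairs(query: str, documents: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair one query with every candidate document, preserving order."""
    return [(query, doc) for doc in documents]


def rerank(
    documents: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Sort documents by score, highest first, and optionally keep the top_k."""
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


candidates = ["doc about cats", "doc about RNGD", "doc about search"]
pairs = build_pairs("what is RNGD?", candidates)

# Placeholder values standing in for the model's output (assumption:
# one float per pair, aligned with the input order).
placeholder_scores = [0.12, 0.87, 0.45]

for doc, score in rerank(candidates, placeholder_scores, top_k=2):
    print(f"{score:.2f}  {doc}")
```

In a real pipeline, pairs would be passed to the scoring call and its output would replace placeholder_scores, after checking the actual return type in the Furiosa-LLM documentation.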

Learn more

Furiosa-LLM — Furiosa-LLM documentation and API reference
Qwen/Qwen3-Reranker-8B — upstream model card
