Qwen3-Reranker-8B
This repository contains a pre-compiled build of Qwen/Qwen3-Reranker-8B for running on FuriosaAI RNGD with Furiosa-LLM.
Overview
Qwen3-Reranker-8B is the 8B reranking model in the Qwen3-Embedding series, built on the Qwen3 dense transformer backbone. Given a query and a set of candidate documents, it produces relevance scores used to reorder retrieval results, a common second stage in retrieval-augmented generation (RAG) and search pipelines. Its intended use is the same as the upstream Qwen/Qwen3-Reranker-8B, and it is released under the Apache 2.0 License.
- Architecture: Qwen3 (dense)
- Input / Output: Text (query-document pairs) / Relevance score
- Supported Inference Engine: Furiosa-LLM
- Supported Hardware: FuriosaAI RNGD
Quantization
No quantization: the model runs in its native 16-bit precision.
Parallelism Strategy
On RNGD, Qwen3-Reranker-8B runs with a tensor-parallel size of 8 PEs, which maps to a single RNGD card (8 PEs per card).
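The PE-to-card mapping above is simple arithmetic, sketched here for illustration. The figure of 8 PEs per card comes from this card; `cards_needed` is a hypothetical helper, not part of Furiosa-LLM, and this card does not state whether tensor-parallel sizes other than 8 are supported for this artifact.

```python
import math

# Stated in this model card: one RNGD card exposes 8 PEs.
PES_PER_CARD = 8


def cards_needed(tensor_parallel_size: int, pes_per_card: int = PES_PER_CARD) -> int:
    """Return how many cards are required to supply the requested number of PEs.

    Hypothetical helper for illustration only; it performs a ceiling division
    and makes no claim about which configurations the runtime accepts.
    """
    if tensor_parallel_size <= 0 or pes_per_card <= 0:
        raise ValueError("PE counts must be positive")
    return math.ceil(tensor_parallel_size / pes_per_card)


# The configuration described in this card: 8 PEs fit on a single card.
cards = cards_needed(8)
```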
Usage
To run this model with Furiosa-LLM, follow the example below after installing Furiosa-LLM and its prerequisites.
This is a reranker model, so it is used through the Furiosa-LLM Python API rather
than the OpenAI-compatible server. Load the artifact and call score with
query-document pairs to obtain relevance scores:
from furiosa_llm import LLM

# Load the pre-compiled artifact for RNGD.
llm = LLM.from_artifacts("furiosa-ai/Qwen3-Reranker-8B")
# Score each (query, document) pair for relevance.
scores = llm.score([("query", "document1"), ("query", "document2")])
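Once scores are available, the reordering step can be sketched in plain Python. This is a minimal sketch, assuming the scores can be read as one float per query-document pair, in input order, with higher values meaning more relevant; this card does not confirm the exact return type of score. The `rerank` helper and the sample values are hypothetical and are not part of Furiosa-LLM.

```python
from typing import List, Optional, Sequence, Tuple


def rerank(
    documents: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort by descending score.

    Hypothetical helper: assumes scores[i] belongs to documents[i].
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    # sorted() is stable, so documents with equal scores keep their input order.
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder values standing in for reranker output; not real model scores.
docs = ["document1", "document2", "document3"]
example_scores = [0.12, 0.87, 0.45]

# Keep the two highest-scoring documents.
top = rerank(docs, example_scores, top_k=2)
```

In a retrieval pipeline, the candidates would typically come from a first-stage retriever, and only the top-ranked documents would be passed on to the generator.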
Learn more
- Furiosa-LLM: Furiosa-LLM documentation and API reference
- Qwen/Qwen3-Reranker-8B: upstream model card