Qwen3-Reranker-8B

This repository contains a pre-compiled build of Qwen/Qwen3-Reranker-8B that runs on FuriosaAI RNGD with Furiosa-LLM.

Overview

Qwen3-Reranker-8B is the 8B reranking model in the Qwen3-Embedding series, built on the Qwen3 dense transformer backbone. Given a query and a set of candidate documents, it produces relevance scores used to reorder retrieval results, a common second stage in retrieval-augmented generation (RAG) and search pipelines. Its intended use is the same as the upstream Qwen/Qwen3-Reranker-8B, and it is released under the Apache 2.0 License.

  • Architecture: Qwen3 (dense)
  • Input / Output: Text (query-document pairs) / Relevance score
  • Supported Inference Engine: Furiosa-LLM
  • Supported Hardware: FuriosaAI RNGD

Quantization

No quantization is applied; the model runs in its native 16-bit precision.

Parallelism Strategy

On RNGD, Qwen3-Reranker-8B runs with a tensor-parallel size of 8 PEs, which maps to a single RNGD card (8 PEs per card).

Usage

To run this model with Furiosa-LLM, follow the example below after installing Furiosa-LLM and its prerequisites.

This is a reranker model, so it is used through the Furiosa-LLM Python API rather than the OpenAI-compatible server. Load the artifact and call score with query-document pairs to obtain relevance scores:

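from furiosa_llm import LLM

# Load the pre-compiled artifact for this model.
llm = LLM.from_artifacts("furiosa-ai/Qwen3-Reranker-8B")
# Score each (query, document) pair for relevance.
scores = llm.score([("query", "document1"), ("query", "document2")])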
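Once scores are available, reordering the candidates is a plain sort. The following is a minimal sketch, not part of Furiosa-LLM: the `rerank` helper and its `top_k` parameter are hypothetical, and it assumes the scores have already been converted to a flat list of floats in the same order as the input pairs. The exact return type of `score` is not described here, so check it before wiring this in. The numeric values below are placeholders, not model output.

```python
from typing import List, Optional, Sequence, Tuple


def rerank(
    documents: Sequence[str],
    scores: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Pair each document with its score and sort by descending relevance.

    Hypothetical helper: assumes scores[i] belongs to documents[i].
    """
    if len(documents) != len(scores):
        raise ValueError("documents and scores must have the same length")
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Placeholder values standing in for real relevance scores (assumption).
documents = ["document1", "document2", "document3"]
scores = [0.12, 0.87, 0.45]

# Keep the two most relevant candidates for the downstream stage.
top = rerank(documents, scores, top_k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a RAG pipeline, the truncated list would typically be what gets passed to the generator as context.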
