---
language:
- en
license: mit
library_name: transformers
tags:
- reranking
- information-retrieval
- listwise
- generative
- llama
- chain-of-thought
base_model: meta-llama/Llama-3.1-8B
datasets:
- abdoelsayed/DeAR-COT
pipeline_tag: text-generation
---

# DeAR-8B-Reranker-Listwise-v1

## Model Description

**DeAR-8B-Reranker-Listwise-v1** is an 8B-parameter listwise neural reranker that produces document rankings through text generation. Unlike pointwise models, which score each document independently, it reads all candidate documents in a single prompt and outputs an ordering accompanied by Chain-of-Thought reasoning.

## Model Details

- **Model Type:** Listwise Reranker (Causal Language Model)
- **Base Model:** Llama-3.1-8B (`meta-llama/Llama-3.1-8B`)
- **Parameters:** 8 billion
- **Training Method:** Supervised Fine-tuning with Chain-of-Thought
- **Training Data:** [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Training Framework:** LLaMA-Factory
- **Precision:** BFloat16

## Key Features

✅ **Listwise Ranking:** Considers inter-document dependencies  
✅ **Chain-of-Thought:** Generates reasoning for ranking decisions  
✅ **State-of-the-Art:** Best performance on NovelEval (90.97 averaged over NDCG@1, NDCG@5, and NDCG@10)  
✅ **Flexible:** Handles variable numbers of documents  
✅ **Interpretable:** Provides explanations for rankings  

## Performance

| Benchmark | NDCG@10 | vs. RankGPT-4 |
|-----------|---------|---------------|
| TREC DL19 | 77.91 | +2.32 |
| TREC DL20 | 75.63 | +5.07 |
| NovelEval | **92.01** | **+1.56** |
| BEIR (Avg) | 46.8 | +2.3 |

**Key Achievement:** Outperforms RankGPT-4 on NovelEval by 3.09 points when averaged over NDCG@1, NDCG@5, and NDCG@10 (90.97 vs. 87.88).

## Usage

### Quick Start

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model
model_path = "abdoelsayed/dear-8b-reranker-listwise-v1"
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Prepare input
query = "When did Thomas Edison invent the light bulb?"
documents = [
    "Lightning strike at Seoul National University",
    "Thomas Edison tried to invent a device for car but failed",
    "Coffee is good for diet",
    "KEPCO fixes light problems",
    "Thomas Edison invented the light bulb in 1879",
]

# Create listwise prompt
doc_list = "\n".join([f"[{i}] {doc}" for i, doc in enumerate(documents)])
prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""

# Generate ranking
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=2048)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        do_sample=False,  # greedy decoding; temperature only applies when sampling
        pad_token_id=tokenizer.pad_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
ranking_text = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
print(f"Ranking: {ranking_text}")
# Example output (may vary): [4] > [1] > [0] > [3] > [2]
```

### Complete Reranking Pipeline

```python
import torch
from typing import List
from transformers import AutoTokenizer, AutoModelForCausalLM
import re

class ListwiseReranker:
    def __init__(self, model_path: str, device: str = "auto"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            model_path,
            torch_dtype=torch.bfloat16,
            device_map=device,
            low_cpu_mem_usage=True
        )
        
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
    
    def create_prompt(self, query: str, documents: List[str], max_doc_len: int = 300) -> str:
        """Create listwise ranking prompt."""
        doc_list = "\n".join([f"[{i}] {doc[:max_doc_len]}" for i, doc in enumerate(documents)])
        
        prompt = f"""I will provide you with {len(documents)} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

{doc_list}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers."""
        
        return prompt
    
    def parse_ranking(self, output_text: str, num_docs: int) -> List[int]:
        """Parse model output to extract ranking."""
        # Extract bracketed identifiers, dropping out-of-range and repeated indices
        ranked = []
        for n in re.findall(r'\[(\d+)\]', output_text):
            idx = int(n)
            if idx < num_docs and idx not in ranked:
                ranked.append(idx)
        
        # Add missing documents at the end, in their original order
        for i in range(num_docs):
            if i not in ranked:
                ranked.append(i)
        
        return ranked
    
    def rerank(
        self,
        query: str,
        documents: List[str],
        max_new_tokens: int = 50,
        temperature: float = 0.0
    ) -> List[int]:
        """
        Rerank documents for a query.
        
        Args:
            query: Search query
            documents: List of document texts
            max_new_tokens: Max tokens to generate; increase for long
                candidate lists so the ranking is not cut off
            temperature: Sampling temperature; 0 (default) uses greedy decoding
        
        Returns:
            List of document indices ranked by relevance
        """
        prompt = self.create_prompt(query, documents)
        
        inputs = self.tokenizer(
            prompt,
            return_tensors="pt",
            truncation=True,
            max_length=2048
        )
        inputs = {k: v.to(self.model.device) for k, v in inputs.items()}
        
        # Temperature only has an effect when sampling is enabled
        gen_kwargs = {
            "max_new_tokens": max_new_tokens,
            "pad_token_id": self.tokenizer.pad_token_id,
            "do_sample": temperature > 0,
        }
        if temperature > 0:
            gen_kwargs["temperature"] = temperature
        
        with torch.no_grad():
            outputs = self.model.generate(**inputs, **gen_kwargs)
        
        output_text = self.tokenizer.decode(
            outputs[0][inputs['input_ids'].shape[1]:],
            skip_special_tokens=True
        )
        
        ranking = self.parse_ranking(output_text, len(documents))
        return ranking


# Example usage
reranker = ListwiseReranker("abdoelsayed/dear-8b-reranker-listwise-v1")

query = "What are the health benefits of green tea?"
documents = [
    "Green tea is a popular beverage in Asian countries.",
    "Studies show green tea contains antioxidants that may reduce inflammation.",
    "Coffee is another caffeinated drink consumed worldwide.",
    "Green tea has been linked to improved brain function and fat loss.",
    "The weather today is sunny and warm.",
]

ranking = reranker.rerank(query, documents)
print(f"Ranked indices: {ranking}")
# Example output (may vary): [1, 3, 0, 2, 4]

# Display ranked documents
for rank, idx in enumerate(ranking, 1):
    print(f"{rank}. {documents[idx]}")
```


## Training Details

### Training Data
- **Dataset:** [DeAR-COT](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- **Format:** Instruction-following with ranking outputs

### Training Configuration
```yaml
model_name: meta-llama/Llama-3.1-8B
task_type: sft
training_method: listwise_ranking
framework: LLaMA-Factory

hyperparameters:
  learning_rate: 1e-5
  batch_size: 4
  gradient_accumulation: 4
  epochs: 2
  max_length: 2048
  warmup_ratio: 0.1
  weight_decay: 0.01
  optimizer: adamw_torch
  lr_scheduler: cosine

distributed:
  method: torch.distributed.run
  num_gpus: 4
  deepspeed: zero2
```
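The configuration above implies an effective global batch size. The arithmetic below is a small sketch that assumes `batch_size` is the per-GPU micro-batch size; the config does not state this explicitly.

```python
# Assumption: `batch_size` in the config is the per-GPU micro-batch size.
per_device_batch_size = 4
gradient_accumulation = 4
num_gpus = 4

# Sequences contributing to each optimizer step under that assumption
effective_batch_size = per_device_batch_size * gradient_accumulation * num_gpus
print(effective_batch_size)  # 64
```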

### Hardware
- **GPUs:** 4x NVIDIA A100 (80GB)
- **Training Time:** ~30 hours
- **Framework:** LLaMA-Factory with DeepSpeed
- **Memory Usage:** ~70GB per GPU

### Prompt Format

**Training Format:**
```
I will provide you with {N} passages, each indicated by a number identifier [].
Rank the passages based on their relevance to the search query: {query}.

[0] {doc_0}
[1] {doc_1}
...
[N-1] {doc_N-1}

Search Query: {query}.
Rank the passages above based on their relevance to the search query. Output the ranking as a list of numbers.

Answer: [most_relevant] > [second] > ... > [least_relevant]
```
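As an illustration of the template above, the following sketch assembles a prompt and a target string from a query, a passage list, and a gold ranking. The helper names `build_prompt` and `build_target` are hypothetical, and splitting the text so that `Answer:` starts the target is an assumption; the training code may split it differently or include reasoning text.

```python
from typing import List


def build_prompt(query: str, docs: List[str]) -> str:
    """Fill the listwise template with zero-indexed passages."""
    doc_list = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(docs))
    return (
        f"I will provide you with {len(docs)} passages, each indicated by a number identifier [].\n"
        f"Rank the passages based on their relevance to the search query: {query}.\n\n"
        f"{doc_list}\n\n"
        f"Search Query: {query}.\n"
        "Rank the passages above based on their relevance to the search query. "
        "Output the ranking as a list of numbers."
    )


def build_target(ranking: List[int]) -> str:
    """Format a ranking (most relevant first) as '[a] > [b] > ...'."""
    return "Answer: " + " > ".join(f"[{i}]" for i in ranking)


example_prompt = build_prompt("capital of France", ["Paris is in France.", "Tea is a drink."])
example_target = build_target([0, 1])
```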

## Evaluation Results

### TREC Deep Learning

| Method | DL19 (NDCG@10) | DL20 (NDCG@10) | Average |
|--------|----------------|----------------|---------|
| BM25 | 50.58 | 47.96 | 49.27 |
| RankGPT-4 | 75.59 | 70.56 | 73.08 |
| **DeAR-L-8B** | **77.91** | **75.63** | **76.77** |
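
The scores above are NDCG@10 values on a 0-100 scale. For readers unfamiliar with the metric, here is a minimal sketch of NDCG@k. It assumes linear gain with a log2 discount and that `ranked_gains` holds the graded relevance of every judged document for the query in ranked order; the official evaluation tooling used for these benchmarks may differ in details.

```python
import math
from typing import Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k positions."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains: Sequence[float], k: int = 10) -> float:
    """NDCG@k in [0, 1]; multiply by 100 to match the tables."""
    ideal_dcg = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the relevant document is ranked second
```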

### NovelEval-2306 (Novel Query Generalization)

| Method | NDCG@1 | NDCG@5 | NDCG@10 | Average |
|--------|--------|--------|---------|---------|
| BM25 | 33.33 | 45.96 | 55.77 | 45.02 |
| RankGPT-4 | 85.71 | 87.49 | 90.45 | 87.88 |
| **DeAR-L-8B** | **92.86** | **88.04** | **92.01** | **90.97** |

🏆 **+3.09 points over RankGPT-4 on NovelEval, averaged over NDCG@1, NDCG@5, and NDCG@10 (90.97 vs. 87.88)!**
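
The Average column is the mean of the three NDCG cutoffs in each row, and the 3.09-point margin is the difference between those means. The snippet below simply recomputes both from the table values:

```python
# Values copied from the NovelEval-2306 table: (NDCG@1, NDCG@5, NDCG@10)
novel_eval = {
    "BM25": (33.33, 45.96, 55.77),
    "RankGPT-4": (85.71, 87.49, 90.45),
    "DeAR-L-8B": (92.86, 88.04, 92.01),
}

# Mean over the three cutoffs, rounded to two decimals as in the table
averages = {name: round(sum(scores) / len(scores), 2) for name, scores in novel_eval.items()}
gap = round(averages["DeAR-L-8B"] - averages["RankGPT-4"], 2)

print(averages)  # {'BM25': 45.02, 'RankGPT-4': 87.88, 'DeAR-L-8B': 90.97}
print(gap)       # 3.09
```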

### BEIR Benchmark

| Dataset | NDCG@10 |
|---------|---------|
| MS MARCO | 70.2 |
| NQ | 54.1 |
| HotpotQA | 64.5 |
| FiQA | 49.3 |
| ArguAna | 62.1 |
| SciFact | 76.2 |
| TREC-COVID | 88.4 |
| NFCorpus | 40.6 |
| **Average** | **46.8** |

### Efficiency Analysis

| Metric | Value |
|--------|-------|
| Inference Time (20 docs) | 11.16s |
| Throughput | ~1.8 docs/sec |
| GPU Memory (inference) | 22GB |
| Model Size (BF16) | 16GB |

**Comparison with Other Methods:**
- **2.2x faster** than RankGPT-4 (24.5s)
- **1.9x faster** than RankZephyr (21.6s)
- Matches or exceeds RankGPT-4 on TREC DL and NovelEval (see the tables above) at less than half its inference time
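
The derived figures in this section follow directly from the reported timings. The sketch below recomputes them; the BF16 size estimate assumes a nominal 8 billion parameters at 2 bytes each and ignores buffers and activation memory.

```python
dear_seconds = 11.16      # reported time to rerank 20 documents
num_docs = 20

throughput = num_docs / dear_seconds        # documents per second
speedup_rankgpt4 = 24.5 / dear_seconds      # RankGPT-4 reported at 24.5s
speedup_rankzephyr = 21.6 / dear_seconds    # RankZephyr reported at 21.6s

# Approximate weight footprint: nominal 8e9 parameters x 2 bytes (BF16), in GB
bf16_weights_gb = 8e9 * 2 / 1e9

print(round(throughput, 1), round(speedup_rankgpt4, 1), round(speedup_rankzephyr, 1), bf16_weights_gb)
# 1.8 2.2 1.9 16.0
```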

## Advantages over Pointwise Models

| Aspect | Pointwise | Listwise (This Model) |
|--------|-----------|----------------------|
| Document Interaction | ❌ Independent | ✅ Considers relationships |
| Reasoning | ❌ None | ✅ Chain-of-Thought |
| Novel Queries | Good | ✅ **Excellent** (+3-5 NDCG@10) |
| Interpretability | ❌ Score only | ✅ Reasoning provided |
| Speed | ✅ Very Fast (2.2s) | Moderate (11.2s) |

## Model Architecture

```
Input: Listwise Prompt with Query + Multiple Documents
    ↓
LLaMA-3.1-8B Decoder
    ↓
Auto-regressive Generation
    ↓
Output: "[4] > [1] > [0] > [3] > [2]"
    ↓
Parse to Ranking: [4, 1, 0, 3, 2]
```
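
When the generated text contains reasoning before the final ordering, bracketed identifiers mentioned in the reasoning can pollute a naive parse. The sketch below shows one possible way to handle the last step of the diagram: it keeps only the last `[a] > [b] > ...` chain in the output. The function name `extract_final_ranking` is hypothetical, and it assumes the final ranking is the last such chain in the text, which is not verified against this model's actual output format.

```python
import re
from typing import List

# A chain is two or more bracketed numbers joined by '>'
CHAIN_RE = re.compile(r"\[\d+\](?:\s*>\s*\[\d+\])+")


def extract_final_ranking(text: str) -> List[int]:
    """Return indices from the last '[a] > [b] > ...' chain, without repeats."""
    chains = CHAIN_RE.findall(text)
    if not chains:
        return []
    ranking: List[int] = []
    for idx in map(int, re.findall(r"\d+", chains[-1])):
        if idx not in ranking:
            ranking.append(idx)
    return ranking


sample = "Passage [2] is off-topic. Answer: [4] > [1] > [0] > [3] > [2]"
parsed = extract_final_ranking(sample)
print(parsed)  # [4, 1, 0, 3, 2]
```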

## When to Use This Model

**Best for:**
- ✅ Novel/complex queries requiring reasoning
- ✅ Tasks where interpretability matters
- ✅ Small candidate sets (<100 documents)
- ✅ Research and analysis applications

**Consider pointwise models for:**
- ❌ Large-scale reranking (1000s of docs)
- ❌ Real-time, low-latency applications
- ❌ When reasoning is not needed

## Limitations

1. **Inference Speed:** Slower than pointwise models (~5x)
2. **Document Count:** Limited by context length (~20-50 docs optimal)
3. **Parsing Errors:** May occasionally generate malformed rankings
4. **Cost:** Higher computational cost for generation
5. **Language:** English only
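
One common workaround for the document-count limit in listwise rerankers is a sliding window that reranks overlapping chunks from the bottom of the candidate list to the top. The sketch below illustrates that general technique; it is not confirmed to be how DeAR was evaluated. `sliding_window_rerank` and `rank_window` are hypothetical names, and the stub ranker stands in for a real model call such as `ListwiseReranker.rerank`.

```python
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]


def sliding_window_rerank(
    query: str,
    documents: List[str],
    rank_window: RankFn,
    window_size: int = 20,
    step: int = 10,
) -> List[int]:
    """Rerank a long list with overlapping windows, moving back to front."""
    if not 0 < step <= window_size:
        raise ValueError("step must be in (0, window_size]")
    order = list(range(len(documents)))  # current global ordering
    end = len(order)
    while end > 0:
        start = max(0, end - window_size)
        window = order[start:end]
        # rank_window returns positions within the window, best first
        local = rank_window(query, [documents[i] for i in window])
        order[start:end] = [window[j] for j in local]
        if start == 0:
            break
        end -= step
    return order


# Stand-in ranker for illustration only: treats each document as a numeric score
def stub_ranker(query: str, docs: List[str]) -> List[int]:
    return sorted(range(len(docs)), key=lambda j: -int(docs[j]))


docs = ["1", "5", "3", "9", "7", "2", "8"]
result = sliding_window_rerank("q", docs, stub_ranker, window_size=4, step=2)
print(result)  # [3, 6, 0, 1, 2, 4, 5]
```

With an overlap of `window_size - step`, the strongest candidates can travel from the bottom of the list to the top, but only the head of the final ordering is reliable, and each window costs one full generation.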

## Bias and Ethical Considerations

- **Position Bias:** May favor documents in certain positions
- **Training Data Bias:** Inherits biases from CoT annotations
- **Reasoning Artifacts:** Generated explanations may contain hallucinations
- **Fairness:** Should be evaluated for fairness in your domain
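
Position bias can be probed, and partly averaged out, by reranking several shuffled copies of the candidate list and combining the results. The following is a rough sketch under stated assumptions: `permutation_averaged_rerank` and `rank_fn` are hypothetical names, aggregation by summed rank position is one simple choice among many, and the stub ranker replaces a real model call. Inference cost grows linearly with `num_permutations`.

```python
import random
from typing import Callable, List

RankFn = Callable[[str, List[str]], List[int]]


def permutation_averaged_rerank(
    query: str,
    documents: List[str],
    rank_fn: RankFn,
    num_permutations: int = 3,
    seed: int = 0,
) -> List[int]:
    """Aggregate rankings over shuffled input orders by summed rank position."""
    rng = random.Random(seed)
    n = len(documents)
    rank_sums = [0] * n
    for _ in range(num_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # rank_fn sees the shuffled list and returns shuffled positions, best first
        local = rank_fn(query, [documents[i] for i in perm])
        for position, j in enumerate(local):
            rank_sums[perm[j]] += position  # map back to the original index
    # Lower summed position means more relevant; ties keep original order
    return sorted(range(n), key=lambda i: rank_sums[i])


# Stand-in ranker for illustration only: treats each document as a numeric score
def stub_ranker(query: str, docs: List[str]) -> List[int]:
    return sorted(range(len(docs)), key=lambda j: -int(docs[j]))


aggregated = permutation_averaged_rerank("q", ["1", "5", "3", "9"], stub_ranker)
print(aggregated)  # [3, 1, 2, 0]
```

Large disagreement between the per-permutation rankings is itself a useful signal that the model's ordering depends on input position for that query.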

## Related Models

**DeAR Listwise:**
- [DeAR-8B-Listwise-LoRA](https://huggingface.co/abdoelsayed/dear-8b-reranker-listwise-lora-v1) - LoRA adapter version

**DeAR Pointwise (8B):**
- [DeAR-8B-RankNet](https://huggingface.co/abdoelsayed/dear-8b-reranker-ranknet-v1)
- [DeAR-8B-CE](https://huggingface.co/abdoelsayed/dear-8b-reranker-ce-v1)

**Resources:**
- [DeAR-COT Dataset](https://huggingface.co/datasets/abdoelsayed/DeAR-COT)
- [Teacher Model](https://huggingface.co/abdoelsayed/llama2-13b-rankllama-teacher)

## Citation

```bibtex
@article{abdallah2025dear,
  title={DeAR: Dual-Stage Document Reranking with Reasoning Agents via LLM Distillation},
  author={Abdallah, Abdelrahman and Mozafari, Jamshid and Piryani, Bhawna and Jatowt, Adam},
  journal={arXiv preprint arXiv:2508.16998},
  year={2025}
}
```

## License

MIT License

## More Information

- **GitHub:** [DataScienceUIBK/DeAR-Reranking](https://github.com/DataScienceUIBK/DeAR-Reranking)
- **Paper:** [arXiv:2508.16998](https://arxiv.org/abs/2508.16998)
- **Collection:** [DeAR Models](https://huggingface.co/collections/abdoelsayed/dear-reranking)