---
language:
- pt
license: cc-by-nc-nd-4.0
tags:
- text-segmentation
- topic-segmentation
- bert
- next-sentence-prediction
- document-segmentation
- meeting-minutes
library_name: transformers
base_model:
- neuralmind/bert-base-portuguese-cased
---

# NSP-CouncilSeg: Linear Text Segmentation for Municipal Meeting Minutes

## Model Description

**NSP-CouncilSeg** is a BERT model fine-tuned for linear text segmentation of municipal council meeting minutes. It uses the Next Sentence Prediction (NSP) head to detect topic boundaries between consecutive sentences in long-form documents, which makes it well suited to administrative and governmental meeting minutes.

**Try out the model**: [Hugging Face Space Demo](https://huggingface.co/spaces/anonymous15135/nsp-councilseg-demo)

### Key Features

- 🎯 **Specialized for Meeting Minutes**: Fine-tuned on Portuguese municipal council meeting minutes
- ⚡ **Fast Inference**: Efficient BERT-base architecture for real-time segmentation
- 📊 **High Accuracy**: Achieves a BED F-measure of 0.79 on the CouncilSeg dataset
- 🔄 **Sentence-Level Segmentation**: Identifies topic boundaries at sentence granularity

## Model Details

- **Base Model**: `neuralmind/bert-base-portuguese-cased`
- **Architecture**: BERT with Next Sentence Prediction head
- **Parameters**: 110M
- **Max Sequence Length**: 512 tokens
- **Fine-tuning Dataset**: CouncilSeg (Portuguese Municipal Meeting Minutes)
- **Fine-tuning Method**: Focal Loss with boundary-aware weighting
- **Training Framework**: PyTorch + Transformers
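
The card does not spell out the exact loss formulation, so the following is only a minimal sketch of a standard binary focal loss in plain Python. It assumes that "boundary-aware weighting" means a class weight (`alpha`) that favours the rarer boundary label; the function name `binary_focal_loss` and the default values of `gamma` and `alpha` are hypothetical and are not the settings used to train this model.

```python
import math


def binary_focal_loss(p_boundary: float, label: int,
                      gamma: float = 2.0, alpha: float = 0.75) -> float:
    """Focal loss for a single sentence pair.

    p_boundary: predicted probability of label 1 ("not_next").
    label: 1 for a topic boundary, 0 for same topic.
    gamma, alpha: illustrative values only (assumption, not from the card).
    """
    # Probability the model assigned to the true class.
    p_t = p_boundary if label == 1 else 1.0 - p_boundary
    # Class weight: alpha for boundaries, (1 - alpha) for non-boundaries.
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # Guard against log(0).
    p_t = min(max(p_t, 1e-12), 1.0)
    # (1 - p_t)^gamma down-weights pairs the model already classifies well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently detected boundary contributes far less than a missed one.
easy = binary_focal_loss(p_boundary=0.9, label=1)
hard = binary_focal_loss(p_boundary=0.1, label=1)
print(f"detected boundary: {easy:.4f}, missed boundary: {hard:.4f}")
```

With `gamma = 0` and `alpha = 0.5` this reduces to half the usual cross-entropy, which is a convenient sanity check when experimenting with the two parameters.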

## How It Works

The model predicts whether two consecutive sentences belong to the same topic (label 0: "is_next") or represent a topic transition (label 1: "not_next"). By applying this classifier sequentially across all sentence pairs in a document, it identifies topic boundaries.

```text
Sentence A: "Pelo Senhor Presidente foi presente a reunião a ata n.º 28 de 20.12.2023."
Sentence B: "Ponderado e analisado o assunto o Executivo Municipal deliberou por unanimidade aprovar a ata n.º 28 de 20.12.2023."
→ Prediction: Same Topic (confidence: 76%)

Sentence A: "Ponderado e analisado o assunto o Executivo Municipal deliberou por unanimidade aprovar a ata n.º 28 de 20.12.2023."
Sentence B: "Não houve processos e requerimentos diversos a apresentar."
→ Prediction: Topic Boundary (confidence: 82%)
```
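
The sequential procedure described above can be sketched as a small helper that walks over consecutive sentence pairs and opens a new segment whenever the boundary probability exceeds a threshold. This is an illustrative sketch, not the authors' pipeline: `segment_sentences`, `toy_boundary_prob`, and the 0.5 threshold are assumptions, and the toy scorer merely stands in for the model's "not_next" probability.

```python
from typing import Callable, List, Sequence


def segment_sentences(
    sentences: Sequence[str],
    boundary_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[List[str]]:
    """Group sentences into segments by scoring each consecutive pair."""
    if not sentences:
        return []

    segments: List[List[str]] = [[sentences[0]]]
    for prev, curr in zip(sentences, sentences[1:]):
        if boundary_prob(prev, curr) > threshold:
            # Topic transition: start a new segment with the current sentence.
            segments.append([curr])
        else:
            # Same topic: extend the current segment.
            segments[-1].append(curr)
    return segments


def toy_boundary_prob(sentence_a: str, sentence_b: str) -> float:
    """Stand-in scorer (hypothetical); replace with the model's not_next probability."""
    return 0.9 if sentence_b.startswith("Não houve") else 0.1


minutes = [
    "Pelo Senhor Presidente foi presente a reunião a ata n.º 28 de 20.12.2023.",
    "Ponderado e analisado o assunto o Executivo Municipal deliberou por unanimidade aprovar a ata n.º 28 de 20.12.2023.",
    "Não houve processos e requerimentos diversos a apresentar.",
]

segments = segment_sentences(minutes, toy_boundary_prob)
for i, segment in enumerate(segments, start=1):
    print(f"Segment {i}: {len(segment)} sentence(s)")
```

Passing the scorer as a callable keeps the segmentation logic independent of how the probability is computed, so the threshold can be tuned without touching the model code.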

## Usage

### Quick Start with Transformers

```python
from transformers import AutoTokenizer, AutoModelForNextSentencePrediction
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("anonymous15135/nsp-councilseg")
model = AutoModelForNextSentencePrediction.from_pretrained("anonymous15135/nsp-councilseg")

# Prepare input
sentence_a = "Pelo Senhor Presidente foi presente a reunião a ata n.º 28 de 20.12.2023."
sentence_b = "Ponderado e analisado o assunto o Executivo Municipal deliberou por unanimidade aprovar a ata n.º 28 de 20.12.2023."

# Tokenize the pair, truncating to the model's 512-token limit
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt", truncation=True, max_length=512)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probs = torch.softmax(logits, dim=1)
    
# Interpret results
is_next_prob = probs[0][0].item()
not_next_prob = probs[0][1].item()

print(f"Is Next (same topic): {is_next_prob:.3f}")
print(f"Not Next (topic boundary): {not_next_prob:.3f}")

if not_next_prob > 0.5:
    print("🔴 Topic boundary detected!")
else:
    print("🟢 Same topic continues")
```


## Limitations

- **Domain Specificity**: Best performance on administrative/governmental meeting minutes
- **Language**: Optimized for Portuguese; English performance may vary
- **Document Length**: Designed for documents with 10-50 segments
- **Context Window**: Limited to 512 tokens per sentence pair
- **Ambiguous Boundaries**: May struggle with subtle topic transitions

## Model Card Contact

For questions or feedback, please open a discussion in the [model repository](https://huggingface.co/anonymous15135/nsp-councilseg/discussions).

## License

This model is released under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0).