---
license: cc-by-sa-4.0
language:
- de
- en
library_name: transformers
tags:
- text-classification
- counseling
- mental-health
- psychosocial
- multilingual
- eurobert
base_model: EuroBERT/EuroBERT-610m
datasets:
- th-nuernberg/OnCoCoV1
metrics:
- accuracy
- f1
model-index:
- name: eurobert610m-online-counseling-oncoco
  results:
  - task:
      type: text-classification
      name: Multi-Class Text Classification
    dataset:
      name: OnCoCoV1
      type: th-nuernberg/OnCoCoV1
    metrics:
    - type: accuracy
      value: 0.76
      name: Top-1 Accuracy
    - type: f1
      value: 0.69
      name: Top-1 Macro F1
    - type: accuracy
      value: 0.84
      name: Top-2 Accuracy
    - type: f1
      value: 0.79
      name: Top-2 Macro F1
---

# eurobert610m-online-counseling-oncoco

Fine-tuned [EuroBERT/EuroBERT-610m](https://huggingface.co/EuroBERT/EuroBERT-610m) model for fine-grained message classification in psychosocial online counseling conversations. Trained on the [OnCoCo 1.0 dataset](https://huggingface.co/datasets/th-nuernberg/OnCoCoV1).

> **Try it out:** [OnCoCo Message Classifier Space](https://huggingface.co/spaces/th-nuernberg/oncoco)

## Model Description

This model classifies individual messages from online counseling conversations into one of **66 fine-grained categories** (38 counselor and 28 client categories) covering communication acts such as empathic reflection, problem exploration, motivational interviewing techniques, resource activation, and emotional support.

Messages are prefixed with the speaker role (`Counselor:` / `Client:` in English, `Berater:` / `Klient:` in German) to allow the model to resolve the role context. At inference time, logits for the other speaker's categories are masked so predictions always fall within the correct role-specific category set.

The model was developed as part of the [OnCoCo project](https://huggingface.co/datasets/th-nuernberg/OnCoCoV1) at Technische Hochschule Nürnberg.
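The role masking described above can be illustrated with a small, dependency-free sketch. It is not the project's actual inference code: the function names `masked_softmax` and `allowed_indices`, the toy labels, and the `CO-` / `CL-` label prefixes are all hypothetical and chosen only for illustration. How counselor and client codes are really distinguished should be checked against `model.config.id2label`.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over `logits`, restricted to the indices in `allowed`.

    Indices outside `allowed` receive probability 0.0, which has the same
    effect as setting their logits to negative infinity before the softmax.
    """
    if not allowed:
        raise ValueError("allowed must contain at least one index")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(logits[i] for i in allowed)
    exps = [math.exp(x - peak) if i in allowed else 0.0
            for i, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]


def allowed_indices(id2label, role_prefix):
    """Collect label ids whose code starts with `role_prefix`.

    ASSUMPTION: the role is encoded as a prefix of the label code.
    """
    return {i for i, label in id2label.items() if label.startswith(role_prefix)}


# Toy example with made-up labels (hypothetical, not the real taxonomy).
id2label = {0: "CO-A", 1: "CO-B", 2: "CL-A", 3: "CL-B"}
logits = [1.0, 0.5, 3.0, -1.0]  # the highest raw logit belongs to a client label

# The message starts with "Counselor:", so only counselor labels may win.
probs = masked_softmax(logits, allowed_indices(id2label, "CO-"))
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], [round(p, 3) for p in probs])
```

In the toy example the unmasked argmax would pick a client label, while the masked version returns a counselor label. With PyTorch tensors, the same effect can be obtained by filling the disallowed logit positions with `float("-inf")` before calling `softmax`.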
The best model we trained on this dataset is [th-nuernberg/xlm-roberta-large-online-counseling-oncoco](https://huggingface.co/th-nuernberg/xlm-roberta-large-online-counseling-oncoco).

## Evaluation Results

Evaluated on a held-out 20% test split of the OnCoCo 1.0 dataset (bilingual DE+EN):

| Metric | Score |
|---|---|
| Top-1 Accuracy | 0.76 |
| Top-1 Macro F1 | 0.69 |
| Top-2 Accuracy | 0.84 |
| Top-2 Macro F1 | 0.79 |

## Training Details

- **Base model:** `EuroBERT/EuroBERT-610m`
- **Dataset:** [th-nuernberg/OnCoCoV1](https://huggingface.co/datasets/th-nuernberg/OnCoCoV1): 5,556 messages (2,778 DE + 2,778 EN translations), 66 categories
- **Split:** 80/20 stratified train/test
- **Languages:** German (original) and English (GPT-4o translated, manually verified)
- **Role prefixes:** Messages are prefixed with `Counselor:` / `Client:` (EN) or `Berater:` / `Klient:` (DE)

## Usage

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "th-nuernberg/eurobert610m-online-counseling-oncoco"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# The message must start with the speaker-role prefix.
text = "Counselor: It sounds like you're feeling overwhelmed. Can you tell me more?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Forward pass without gradients; softmax turns logits into probabilities.
with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1).squeeze()

# Print the three most probable category codes.
top3 = probs.argsort(descending=True)[:3]
for i in top3:
    print(f"{model.config.id2label[i.item()]}: {probs[i].item():.4f}")
```

To resolve category codes to human-readable descriptions:

```python
import json
from huggingface_hub import hf_hub_download

# Download the code-to-description mapping from the dataset repository.
path = hf_hub_download("th-nuernberg/OnCoCoV1", "code_to_category.json", repo_type="dataset")
with open(path) as f:
    code2cat = json.load(f)

# Reuses `model`, `probs`, and `top3` from the previous snippet.
for i in top3:
    code = model.config.id2label[i.item()]
    print(f"{code} - {code2cat.get(code, '?')}: {probs[i].item():.4f}")
```

## Category Taxonomy

The 66 categories are organized hierarchically for both speaker roles:

**Counselor (38 categories)**
- Formalities (opening, closing)
- Moderation
- Impact factors: analysis & clarification of problems (13), objectives (2), motivation (4), resource activation (5), problem solving (8)
- Other statements

**Client (28 categories)**
- Formalities (opening, closing)
- Empathy expression (3)
- Impact factors: problem analysis (8), objectives (2), motivation (2), resource activation (2), coping assistance (6)
- Other statements

Full label descriptions are available via the [`code_to_category.json`](https://huggingface.co/datasets/th-nuernberg/OnCoCoV1/blob/main/code_to_category.json) file in the dataset repository.

## Intended Use

- Automated content analysis of online counseling conversations
- Research on counselor–client communication patterns
- Educational feedback tools for counselor training
- Conversational AI research in the mental health domain

## Limitations

- Performance varies across categories; rare categories with few training examples show lower F1 scores
- Some semantically overlapping categories (e.g., problem statement vs. problem definition) are harder to distinguish
- English texts are machine-translated from German; some translation artifacts may affect performance on native English counseling texts

## Citation

If you use this model, please cite the OnCoCo paper:

```bibtex
@inproceedings{albrecht-etal-2026-oncoco,
  title     = "{O}n{C}o{C}o 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations",
  author    = "Albrecht, Jens and Lehmann, Robert and Poltermann, Aleksandra and Rudolph, Eric and Steigerwald, Philipp and Stieler, Mara",
  booktitle = "Proceedings of the Joint Workshop on Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) at LREC-COLING 2026",
  month     = may,
  year      = "2026",
  address   = "Palma de Mallorca, Spain",
  publisher = "ELRA and ICCL",
}
```

ArXiv preprint: [arXiv:2512.09804](https://arxiv.org/abs/2512.09804)

## License

[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/), Technische Hochschule Nürnberg