MentalRoBERTa fine-tuned for WhatsApp depression detection

A binary text classifier that labels short, informal messages (WhatsApp, SMS, or similar) as showing early signs of depression or not. Built by fine-tuning mental/mental-roberta-base on the synthetic WhatsApp-style corpus in 5ald/whatsapp-depression-synthetic.

Model summary

  • Task: binary text classification
  • Labels: LABEL_1 = depressive, LABEL_0 = non-depressive
  • Base model: mental/mental-roberta-base (RoBERTa-base continued-pre-trained on 13.6M sentences from seven mental-health subreddits)
  • Language: English
  • Max sequence length: 512 tokens
  • Framework: PyTorch + Hugging Face transformers

Why this model

Public mental-health NLP models are typically trained on long-form Reddit posts. WhatsApp and SMS messages are short, fragmented, emoji-laden, and stylistically very different, so models trained on Reddit transfer poorly without adaptation. This model addresses the register gap by fine-tuning a mental-health-pre-trained RoBERTa on a purpose-built synthetic corpus in the target style.

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_id = "5ald/mental-roberta-whatsapp-depression"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

classifier = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    truncation=True,
    max_length=512,
)

print(classifier("i feel so hopeless lately, nothing is working"))
# [{'label': 'LABEL_1', 'score': 0.93}]

The score is the probability the model assigns to the predicted class. To recover the depression probability specifically:

r = classifier("i feel so hopeless lately, nothing is working")[0]  # first (only) result dict
p_depression = r["score"] if r["label"] == "LABEL_1" else 1 - r["score"]
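
For batches of messages, the same conversion can be wrapped in a small helper. The sketch below is illustrative only: `depression_probability` and `depression_probabilities` are hypothetical names, not part of this repository, and the code assumes each result is a dict with `label` and `score` keys, as the text-classification pipeline returns for a two-label model with default settings.

```python
from typing import Dict, List, Union

Result = Dict[str, Union[str, float]]


def depression_probability(result: Result) -> float:
    """Return P(depressive) from one pipeline result (hypothetical helper)."""
    score = float(result["score"])
    # By default the pipeline reports only the predicted class and its score,
    # so for a LABEL_0 prediction the depressive probability is the complement.
    return score if result["label"] == "LABEL_1" else 1.0 - score


def depression_probabilities(results: List[Result]) -> List[float]:
    """Apply the conversion to a list of pipeline results."""
    return [depression_probability(r) for r in results]


# Hand-written dicts in the pipeline's output format (not real model output).
example = [
    {"label": "LABEL_1", "score": 0.93},
    {"label": "LABEL_0", "score": 0.80},
]
probs = depression_probabilities(example)
```

In practice `example` would be replaced by the list returned from calling `classifier` on a list of strings.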

Training data

Trained on the synthetic subset of 5ald/whatsapp-depression-synthetic:

Split        Messages   Depressive (1)   Non-depressive (0)
train        4,000      2,000            2,000
validation   1,000      500              500
test         1,000      500              500

All messages were generated by claude-opus-4-6 via the Anthropic Batch Messages API using two prompts that target realistic, first-person, casual messaging in the WhatsApp/SMS register. See the dataset card for the full generation procedure.

Training procedure

Fine-tuned with the Hugging Face Trainer API using the following configuration:

Hyperparameter          Value
Base model              mental/mental-roberta-base
Number of labels        2
Epochs                  3
Batch size              24
Learning rate           2e-5
Max sequence length     512
Precision               BF16
Training steps          500
Best-model selection    Best loss at end of epoch
Final training loss     0.034
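
The listed step count can be cross-checked from the train split size, batch size, and epoch count. This is a back-of-the-envelope sketch that assumes a single device, no gradient accumulation, and that the final partial batch of each epoch is kept; none of these settings is stated in this card.

```python
import math

train_messages = 4_000  # size of the train split
batch_size = 24
epochs = 3

# Assumption: one device, no gradient accumulation, final partial batch kept.
steps_per_epoch = math.ceil(train_messages / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 167 501
```

Under these assumptions the total comes to 501 optimiser steps, close to the 500 training steps listed above.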

The classification head (a dense layer plus 2-way output projection) was randomly initialised and trained from scratch alongside the transformer encoder, since the base MentalRoBERTa checkpoint does not ship with a classification head.
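
As a rough size check, the number of newly initialised head parameters can be worked out by hand. The sketch assumes the standard Hugging Face RoBERTa sequence-classification head (a hidden-to-hidden dense layer followed by a hidden-to-labels projection, both with biases) and the RoBERTa-base hidden size of 768; neither detail is stated explicitly in this card.

```python
hidden_size = 768  # RoBERTa-base hidden size (assumption, see above)
num_labels = 2

dense_params = hidden_size * hidden_size + hidden_size   # weights + bias
out_proj_params = hidden_size * num_labels + num_labels  # weights + bias
head_params = dense_params + out_proj_params
print(head_params)  # 592130
```

Under these assumptions the head adds roughly 0.6M parameters, well under 1% of a RoBERTa-base-sized encoder (roughly 125M parameters).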

Evaluation

Evaluated on two held-out sets from 5ald/whatsapp-depression-synthetic: the 1,000-message synthetic test split and the 55-message real test split.

Synthetic test set (1,000 messages)

The model achieves near-ceiling performance on the synthetic test set, which is expected because train and test come from the same generative distribution. This is an in-distribution result, not evidence of generalisation.

Real test set (55 messages)

Performance drops on naturalistic WhatsApp messages written by a real user, quantifying the synthetic-to-real domain gap. This is the more honest indicator of practical usefulness and is reported explicitly in the accompanying project report.

(Exact numbers are reported in the project's Evaluation chapter. See the citation at the bottom of this card.)
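
For readers who want to reproduce the evaluation on either set, the usual binary metrics can be computed as sketched below, treating label 1 (depressive) as the positive class. The function name `binary_metrics` and the toy labels are illustrative assumptions; the project report may use a different metric set or tooling.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 (depressive) as positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n = len(pairs)
    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / n if n else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not results from this model.
toy_true = [1, 1, 1, 0, 0, 0]
toy_pred = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(toy_true, toy_pred)
```

Predicted labels can be obtained by mapping LABEL_1 to 1 and LABEL_0 to 0 in the pipeline output.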

Intended use

  • Research on text-based depression detection in informal conversational registers.
  • Fine-tuning baselines and domain-transfer experiments from public mental-health corpora to private messaging.
  • Educational and demonstrative use.

Out-of-scope use

  • Not a diagnostic tool. The model outputs a probability over two stylistic classes inferred from synthetic text; it has no clinical validation and must not be used to screen, diagnose, triage, or surveil individuals.
  • Not for deployment on anyone's messages without their informed consent. Using this model on conversations without every sender's knowledge is a privacy violation regardless of the technical capability.
  • Not suitable for crisis detection or intervention. Output from this model must not be used as a signal in automated systems that contact, flag, or act on people in distress.
  • English only. Performance on other languages is undefined.

Limitations

  • Synthetic training data bias. The classifier was trained on messages generated by a single LLM (claude-opus-4-6) with two prompts. It may have learned the stylistic fingerprint of an LLM imitating depression rather than the full diversity of genuine depressive expression.
  • Domain gap to real messages. Performance on naturalistic WhatsApp text is substantially lower than on synthetic text. The synthetic-to-real gap is documented in the accompanying project report.
  • Single-annotator real evaluation. The real test set is small (55 messages) and reflects one annotator's labelling, so it is directional rather than a statistically robust benchmark.
  • No demographic, dialectal, or clinical-population diversity is controlled for.
  • Short-message assumption. The model was trained on short informal messages; behaviour on long-form text is not characterised.
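
To make the small-sample caveat concrete, the uncertainty around an accuracy measured on 55 messages can be estimated with a Wilson score interval. This is a sketch: the count of 40 correct predictions is hypothetical and chosen only to show the interval width, not a reported result, and `wilson_interval` is an illustrative helper name.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical count, used only to show the interval width at n = 55.
low, high = wilson_interval(correct=40, total=55)
```

For this hypothetical count the interval spans more than 20 percentage points, which is why results on the real test set are best read as directional.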

Ethical considerations

Automated depression detection on private messages raises serious consent, privacy, and potential harm concerns. Users of this model should:

  • Only run it on messages from participants who have given informed consent.
  • Never use its output as a substitute for clinical judgement.
  • Never deploy it in ways that could pathologise, surveil, or disadvantage individuals.
  • Follow local data-protection law (e.g. UK GDPR) when processing any real message data.

The accompanying system was designed to run entirely locally, with no external data transmission, precisely to keep these concerns tractable for end users.

Citation

If you use this model, please cite the accompanying project:

Khaled Al Buainain. Detection of Early Depression Indicators in WhatsApp Messages.
Final Year Individual Project, King's College London, 2026.

Please also cite the base model:

Ji, S., Zhang, T., Ansari, L., Fu, J., Tiwari, P., and Cambria, E. (2022).
MentalBERT: Publicly Available Pretrained Language Models for Mental Healthcare.
Proceedings of LREC 2022.

License

Released under CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International). Commercial use is not permitted. Use is additionally subject to the licence of the base model mental/mental-roberta-base and to Anthropic's terms covering model-generated content, since the training corpus was generated with an Anthropic model.
