---
license: mit
language:
  - en
library_name: transformers
tags:
  - sentiment-analysis
  - literary-sentiment
  - roberta
  - text-classification
  - sentiment-arcs
datasets:
  - chcaa/fiction4sentiment
  - chcaa/Fiction4EmoBank
base_model: j-hartmann/sentiment-roberta-large-english-3-classes
pipeline_tag: text-classification
model-index:
  - name: sentiment-fiction-seq
    results:
      - task:
          type: text-classification
          name: Sentiment Analysis
        metrics:
          - name: Spearman ρ (Hemingway arc, detrended, vs. human)
            type: spearman_correlation
            value: 0.7812
          - name: Spearman ρ (Hemingway arc, raw, vs. human)
            type: spearman_correlation
            value: 0.7122
          - name: Spearman ρ (Ugly Duckling, detrended, vs. human)
            type: spearman_correlation
            value: 0.7414
---

# sentiment-fiction-seq

A RoBERTa-large model finetuned for 3-class sentiment classification (negative / neutral / positive) on literary and fictional text, with complete narrative sequences held out from training to enable evaluation of detrended sentiment arcs.

This is a variant of [fpianz/sentiment-fiction](https://huggingface.co/fpianz/sentiment-fiction). The two models share the same architecture, base model, and training procedure. They differ only in their training splits: this model excludes complete sequential texts (three Andersen fairy tales and the final section of Hemingway's *The Old Man and the Sea*) to allow uncontaminated evaluation of narrative arc dynamics. Users should validate both models on their own data to determine which best fits their use case.

## Model description

This model is a finetuned version of [j-hartmann/sentiment-roberta-large-english-3-classes](https://huggingface.co/j-hartmann/sentiment-roberta-large-english-3-classes) (RoBERTa-large, 355M parameters). It was trained on a combined corpus of human-annotated fiction sentences using class-weighted cross-entropy loss to handle label imbalance.

### Training data

The training set contains only human-annotated texts. Compared to `sentiment-fiction`, this model excludes all Andersen fairy tale sentences and 400 contiguous Hemingway sentences from training.

| Source | n (train) | Label type |
|--------|-----------|------------|
| Project Gutenberg and Wattpad excerpts | 6,646 | Nine emotion labels → binned to 3 classes |
| EmoBank Fiction (American National Corpus) | 2,164 | Continuous valence → binned to 3 classes |
| Fiction4 Hymns (translated from Danish) | 1,620 | Continuous valence → binned to 3 classes |
| Fiction4 Poetry (Plath) | 1,263 | Continuous valence → binned to 3 classes |
| Hemingway — *The Old Man and the Sea* (first 1,236 sentences) | 1,236 | Continuous 1–10 valence → binned to 3 classes |
| **Total** | **12,929** | |

Continuous valence scores were binned using the thresholds: ≤4 → negative, (4, 6] → neutral, >6 → positive on a 0–10 scale.
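
The thresholds above can be sketched as a small helper. This is an illustration only: the function name `bin_valence` is hypothetical, and the snippet assumes the score is already on the 0–10 scale (how scores from sources with other native scales were rescaled is not covered here).

```python
def bin_valence(score: float) -> str:
    """Map a continuous valence score (0-10 scale) to a 3-class label."""
    if score <= 4:
        return "negative"   # <= 4
    if score <= 6:
        return "neutral"    # (4, 6]
    return "positive"       # > 6

# Boundary values fall into the lower bin, as the intervals above specify.
examples = [2.5, 4.0, 4.1, 6.0, 6.5]
labels = [bin_valence(s) for s in examples]
print(labels)
# ['negative', 'negative', 'neutral', 'neutral', 'positive']
```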

### Intended use

This model is intended for research on literary sentiment, narrative emotion arcs, and computational literary studies. It can be used for:

- Sentence-level sentiment classification of fiction and literary prose
- Generating continuous sentiment arcs by converting class probabilities to a valence score: `valence = p(positive) - p(negative)`
- Studying detrended sentiment dynamics in sequential narrative text

## Evaluation

### Sentence-level (raw) correlation

Spearman ρ between model-predicted continuous valence and human annotations, on sequential held-out texts.
Continuous valence for correlation is computed as `p(positive) − p(negative)` from the model's softmax probabilities, yielding a score in [−1, +1] rather than a discrete class label.
Accuracy is computed on the 3-class prediction (argmax over negative/neutral/positive) against human valence binned with the same thresholds used for training (≤4 → negative, (4, 6] → neutral, >6 → positive).
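Note that literary texts are heavily neutral-skewed, so a majority baseline that always predicts "neutral" can match or exceed the model's accuracy (as it does for *The Ugly Duckling* and *The Shadow* in the table below). For this reason, the continuous valence correlation (Spearman ρ) is the more meaningful metric here.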

| Eval set | n | Spearman ρ (Tr) | Spearman ρ (Sy) | Accuracy | Majority Baseline |
|----------|---|----------------|----------------|---------|---------|
| Hemingway — *The Old Man and the Sea* | 400 | **0.712** | 0.465 | 0.818 | 0.688 |
| Andersen — *The Ugly Duckling* | 211 | **0.600** | 0.469 | 0.668 | 0.692 |
| Andersen — *The Little Mermaid* | 293 | **0.654** | 0.523 | 0.614 | 0.474 |
| Andersen — *The Shadow* | 267 | **0.734** | 0.456 | 0.704 | 0.742 |

Tr = Transformer (this model), Sy = Syuzhet lexicon baseline (Jockers, 2015).
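
The metrics in this table can be reproduced along the following lines. This is a minimal, dependency-free sketch rather than the evaluation script used for the reported numbers: the helper names are hypothetical, the toy data are invented, and the class index order (0 = negative, 1 = neutral, 2 = positive) is an assumption. In practice `scipy.stats.spearmanr` would be the usual choice for the correlation.

```python
from collections import Counter

LABELS = ["negative", "neutral", "positive"]  # assumed index order 0, 1, 2

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho is the Pearson correlation of the rank-transformed series."""
    return pearson(average_ranks(x), average_ranks(y))

def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)

def majority_baseline(gold):
    """Accuracy obtained by always predicting the most frequent gold label."""
    return Counter(gold).most_common(1)[0][1] / len(gold)

# Toy data, invented for illustration (not taken from the evaluation sets).
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.1, 0.3, 0.6], [0.2, 0.7, 0.1]]
human = [2.0, 5.0, 8.0, 5.5]                            # human valence, 0-10 scale
gold = ["negative", "neutral", "positive", "neutral"]   # human valence after binning

valence = [p[2] - p[0] for p in probs]                  # p(positive) - p(negative)
pred = [LABELS[max(range(3), key=lambda i: p[i])] for p in probs]  # argmax class

rho = spearman(valence, human)
acc = accuracy(pred, gold)
base = majority_baseline(gold)
print(f"rho={rho:.3f} acc={acc:.3f} baseline={base:.3f}")
```

Comparing `acc` against `base` makes the neutral skew visible: accuracy is only informative to the extent that it exceeds the majority baseline.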

### Detrended arc correlation

Detrending follows Hu et al. (2021): the sentiment arc is integrated into a random walk, a nonlinear adaptive filter extracts the global trend, and the residuals capture local narrative dynamics. Spearman ρ is computed between the detrended model arc and the detrended human annotation arc, at window size L/8.

| Eval set | n | Raw Spearman ρ (Tr) | Detrended Spearman ρ (Tr) | Δ (Tr) | Raw Spearman ρ (Sy) | Detrended Spearman ρ (Sy) |
|----------|---|-----------|-----------------|---|-----|------------|
| Hemingway | 400 | 0.712 | **0.781** | +0.069 | 0.465 | 0.335 |
| *The Ugly Duckling* | 211 | 0.600 | **0.741** | +0.141 | 0.469 | 0.584 |
| *The Little Mermaid* | 293 | 0.654 | **0.754** | +0.100 | 0.523 | 0.624 |
| *The Shadow* | 267 | 0.734 | **0.796** | +0.062 | 0.456 | 0.657 |

Detrending improves the transformer's correlation with human annotations on all four texts (Δ between +0.062 and +0.141), suggesting that the model captures arc-level narrative dynamics beyond sentence-level sentiment. For comparison, the Hemingway inter-annotator agreement (Spearman ρ between two human annotators) is 0.613 on this subset.
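
The shape of the detrending pipeline (integrate, extract a trend, keep the residual) can be illustrated with a simplified sketch. Note the substitution: the code below uses a plain centred moving average as a stand-in for the nonlinear adaptive filter of Hu et al. (2021), so it will not reproduce the numbers in the table. The function names, the `fraction` parameter, and the way the L/8 window is applied are assumptions made for illustration.

```python
def integrate(arc):
    """Cumulative sum of the mean-centred arc (the 'random walk' profile)."""
    mean = sum(arc) / len(arc)
    walk, total = [], 0.0
    for value in arc:
        total += value - mean
        walk.append(total)
    return walk

def moving_average(series, window):
    """Centred moving average; the window shrinks near the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed

def detrend(arc, fraction=8):
    """Residual of the integrated arc around a smooth trend (window ~ L/fraction)."""
    walk = integrate(arc)
    window = max(3, len(arc) // fraction)
    trend = moving_average(walk, window)  # stand-in for the adaptive filter
    return [w - t for w, t in zip(walk, trend)]

# Toy valence arc, invented for illustration.
arc = [0.2, -0.1, 0.4, -0.6, -0.3, 0.1, 0.5, 0.3,
       -0.2, -0.4, 0.0, 0.6, 0.2, -0.5, -0.1, 0.3]
residual = detrend(arc)
print(len(residual))
```

Applying the same function to the model arc and to the human annotation arc, then correlating the two residual series, gives the kind of comparison reported in the detrended columns.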

## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="fpianz/sentiment-fiction-seq")
result = classifier("The old man was thin and gaunt with deep wrinkles in the back of his neck.")
print(result)
# Example output (the exact score may differ): [{'label': 'negative', 'score': 0.82}]
```

For continuous sentiment arcs:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("fpianz/sentiment-fiction-seq")
model = AutoModelForSequenceClassification.from_pretrained("fpianz/sentiment-fiction-seq")

def valence(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # Assumes label order 0=negative, 1=neutral, 2=positive; check model.config.id2label.
    return (probs[2] - probs[0]).item()  # p(positive) - p(negative)

score = valence("He was an old man who fished alone in a skiff in the Gulf Stream.")
print(f"Valence: {score:.3f}")  # value lies in [-1, +1]
```

## Training details

- **Base model:** j-hartmann/sentiment-roberta-large-english-3-classes
- **Architecture:** RoBERTa-large (355M parameters)
- **Loss:** Class-weighted cross-entropy (weights: negative=0.99, neutral=0.74, positive=1.56)
- **Epochs:** 5 (with early stopping, patience=3)
- **Learning rate:** 2e-5
- **Batch size:** 16
- **Max sequence length:** 512
- **Optimizer:** AdamW (weight decay=0.01, warmup ratio=0.1)
- **Precision:** FP16
- **Hardware:** NVIDIA A100 (University of Groningen Habrok HPC)
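
To make the effect of the class weights concrete, here is a dependency-free sketch of class-weighted cross-entropy using the weights listed above. It normalises by the sum of the target weights, which is how `torch.nn.CrossEntropyLoss(weight=...)` behaves with its default mean reduction; whether training used that exact class is an assumption, as are the function names, the toy logits, and the class index order (0 = negative, 1 = neutral, 2 = positive).

```python
import math

CLASS_WEIGHTS = [0.99, 0.74, 1.56]  # negative, neutral, positive (from the list above)

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(batch_logits, targets, weights=CLASS_WEIGHTS):
    """Weighted mean of per-example negative log-likelihoods."""
    numerator, denominator = 0.0, 0.0
    for logits, target in zip(batch_logits, targets):
        nll = -math.log(softmax(logits)[target])
        numerator += weights[target] * nll   # rarer classes count for more
        denominator += weights[target]
    return numerator / denominator

# Toy batch, invented for illustration: one example per class.
logits = [[2.0, 0.5, -1.0], [0.0, 1.5, 0.0], [-1.0, 0.0, 2.0]]
targets = [0, 1, 2]
loss = weighted_cross_entropy(logits, targets)
print(f"weighted loss: {loss:.4f}")
```

With these weights, an error on a positive sentence contributes roughly twice as much to the loss as the same error on a neutral one (1.56 vs. 0.74), which counteracts the neutral skew of the training data.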

## Limitations

- The detrended arc evaluation is limited to three Andersen fairy tales (translated from Danish) and one section of a Hemingway novella. These results may not generalize to other genres, periods, or languages.
- The Danish-language Fiction4 texts (the hymns and the Andersen fairy tales) are Google-translated into English (Feldkamp et al., 2024); translation artifacts may affect evaluation scores for the fairy tales.
- The 3-class label scheme (negative/neutral/positive) collapses the valence spectrum. The continuous valence conversion (`p(pos) - p(neg)`) provides finer granularity but is an approximation.
- This model has slightly less training data than `sentiment-fiction` (12,929 vs. 13,864 sentences). For sentence-level classification where arc evaluation is not needed, `sentiment-fiction` may be preferable.

## References

- [Sentiment Below the Surface: Omissive and Evocative Strategies in Literature and Beyond](https://ceur-ws.org/Vol-3834/paper98.pdf) (Feldkamp et al., CHR 2024)
- [DENS: A Dataset for Multi-class Emotion Analysis](https://aclanthology.org/D19-1656/) (Liu et al., EMNLP-IJCNLP 2019)
- [Comparing Tools for Sentiment Analysis of Danish Literature from Hymns to Fairy Tales: Low-Resource Language and Domain Challenges](https://aclanthology.org/2024.wassa-1.15/) (Feldkamp et al., WASSA 2024)
- [Dynamic evolution of sentiments in *Never Let Me Go*: Insights from multifractal theory and its implications for literary analysis](https://doi.org/10.1093/llc/fqz092) (Hu et al., DSH 2021)

## Citation

*Paper under review — citation will be added upon publication.*