cbcc / README.md

v29NA serving — canonical manifest (model_kind=multi_task, trainer_id=multi_task_gist); self-contained merged Qwen3-4B encoder + sklearn heads (4 tasks)

language:
  - en
  - es
  - pt
  - fr
  - ar
  - zh
license: cc-by-nc-4.0
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
  - central-bank-communication
  - multi-dimensional-classification
  - multi_task_gist
model-index:
  - name: cbcc
    results:
      - task:
          type: text-classification
          name: topic classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.8008
      - task:
          type: text-classification
          name: temporal_orientation classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.8957
      - task:
          type: text-classification
          name: audience classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.759
      - task:
          type: text-classification
          name: sentiment classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.7511
base_model: Qwen/Qwen3-Embedding-4B

cbcc

A multi-dimensional classifier for central-bank communications, produced by the CBCommunication training pipeline (multi_task_gist rung).

Provenance

Field	Value
Trainer	`multi_task_gist`
Model kind	`multi_task`
Encoder body	`Qwen/Qwen3-Embedding-4B`
Loss	`cached_gist`
Taxonomy version	`2026-04-rev2` (sha256 `e7c237aac8db66ca`)
Training examples	3584
Validation examples	1809
Git commit	`9d90b862` (dirty)
Created	2026-04-28T02:41:30.780109+00:00

Dimensions and labels

`topic` (21 classes)

Climate change
Crisis management
Currency circulation and management
Financial inclusion
Financial stability
Fiscal policy
Governance
MP - balance sheet size and asset purchase programs
MP - credit
MP - economic activity
MP - exchange rate
MP - inflation
MP - interest rate
MP - labor market
MP - open market operations
MP - reserve requirements
Metadata
Payment system
Structural economic reform
Supervision and regulation
Technological innovation and fintech

`temporal_orientation` (2 classes)

Backward-looking
Forward-looking

`audience` (6 classes)

Business Sector
Financial Sector
General Public
Government
International Stakeholders
Metadata

`sentiment` (6 classes)

Confidence-building
Dovish
Hawkish
Neutral/Balanced
Not applicable
Risk-highlighting
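The label sets above can be mirrored in code to sanity-check predictions before they reach downstream analysis. The sketch below is illustrative only: the `TAXONOMY` dict layout and the `is_valid_prediction` helper are hypothetical names, not part of the `cb_communication` package, and the assumed prediction shape (one label per dimension) is not confirmed by this card.

```python
# Label sets copied from the lists above. The dict layout and the helper
# are hypothetical; they are not part of the cb_communication package.
TAXONOMY: dict[str, list[str]] = {
    "topic": [
        "Climate change",
        "Crisis management",
        "Currency circulation and management",
        "Financial inclusion",
        "Financial stability",
        "Fiscal policy",
        "Governance",
        "MP - balance sheet size and asset purchase programs",
        "MP - credit",
        "MP - economic activity",
        "MP - exchange rate",
        "MP - inflation",
        "MP - interest rate",
        "MP - labor market",
        "MP - open market operations",
        "MP - reserve requirements",
        "Metadata",
        "Payment system",
        "Structural economic reform",
        "Supervision and regulation",
        "Technological innovation and fintech",
    ],
    "temporal_orientation": ["Backward-looking", "Forward-looking"],
    "audience": [
        "Business Sector",
        "Financial Sector",
        "General Public",
        "Government",
        "International Stakeholders",
        "Metadata",
    ],
    "sentiment": [
        "Confidence-building",
        "Dovish",
        "Hawkish",
        "Neutral/Balanced",
        "Not applicable",
        "Risk-highlighting",
    ],
}


def is_valid_prediction(prediction: dict[str, str]) -> bool:
    """Return True if `prediction` names all four dimensions with known labels."""
    return prediction.keys() == TAXONOMY.keys() and all(
        label in TAXONOMY[dimension] for dimension, label in prediction.items()
    )
```

Note that `Metadata` is a valid label in both `topic` and `audience`, so labels are only unambiguous when paired with their dimension.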

Evaluation (held-out validation set)

Dimension	Macro F1
`topic`	0.8008
`temporal_orientation`	0.8957
`audience`	0.7590
`sentiment`	0.7511
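For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so a rare class counts as much as a common one. The sketch below is a minimal pure-Python version on made-up labels; it is not the evaluation code that produced the table, which this card does not show.

```python
from collections import Counter


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one example")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up toy data: 8 majority-class examples, 2 minority-class examples,
# with one of the two minority examples misclassified.
y_true = ["Forward-looking"] * 8 + ["Backward-looking"] * 2
y_pred = ["Forward-looking"] * 9 + ["Backward-looking"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
```

In this toy case a single error on the two-example class pulls macro F1 well below accuracy, which is why scores for low-support classes should be read with care.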

Intended use

Classify sentences from central-bank speeches, press releases, and financial-stability reports along the four CBC taxonomy dimensions (topic, temporal orientation, audience, sentiment). Produced for research and policy analysis at the IMF.

Limitations

Trained on a small labeled set (3584 training examples, with up to 21 classes per dimension); per-class metrics for low-support tail classes are less reliable.
Multilingual coverage depends on the encoder, Qwen/Qwen3-Embedding-4B; performance varies across the six declared languages (en, es, pt, fr, ar, zh).
Sentiment / temporal labels reflect the taxonomy decision rules in the source workbook; downstream consumers should re-read those rules before interpreting per-class deltas.

How to load

# Recommended (canonical port — Phase 1.1, 2026-04):
from cb_communication.processing.classification import load_named_classifier

# Auto-resolves the artefact via config/classifiers.toml. When the
# artefact is absent locally and ``hub_repo`` is configured, the loader
# pulls from this Hub repo automatically (private collaborators set
# their own ``HF_TOKEN``).
with load_named_classifier("cbcc") as clf:
    results = clf.classify_chunk([
        "Inflation expectations remain anchored at 2 percent.",
    ])

Alternative — explicit path-based load:

from huggingface_hub import snapshot_download
from cb_communication.processing.classification import load_classifier

local = snapshot_download(repo_id="thiagochris/cbcc")
with load_classifier(local) as clf:
    ...

Both call paths return an object that satisfies the canonical MultiTaskClassifier Protocol (see cb_communication/processing/classification/multi_task_classifier.py); the runtime selects the implementation from the model_kind field in manifest.json.
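The dispatch-on-`model_kind` idea can be sketched as follows. This is not the package's actual loader: `ClassifierLike`, `LOADERS`, `register`, `load_from_manifest`, and `EchoClassifier` are hypothetical names, the return type of `classify_chunk` is not documented here, and the sketch assumes `model_kind` is a top-level key in `manifest.json`.

```python
import json
from pathlib import Path
from typing import Any, Callable, Protocol, Sequence


class ClassifierLike(Protocol):
    """Hypothetical stand-in for MultiTaskClassifier.

    Only classify_chunk is shown in this README; its return type is unknown.
    """

    def classify_chunk(self, texts: Sequence[str]) -> Any: ...


# Hypothetical registry mapping a model_kind string to a loader function.
LOADERS: dict[str, Callable[[Path], ClassifierLike]] = {}


def register(model_kind: str):
    """Decorator that registers a loader under the given model_kind."""
    def decorator(loader: Callable[[Path], ClassifierLike]):
        LOADERS[model_kind] = loader
        return loader
    return decorator


class EchoClassifier:
    """Dummy classifier used only to make this sketch runnable."""

    def classify_chunk(self, texts: Sequence[str]) -> Any:
        return [{"text": text} for text in texts]


@register("multi_task")
def _load_multi_task(artefact_dir: Path) -> ClassifierLike:
    # A real loader would read the encoder and heads from artefact_dir.
    return EchoClassifier()


def load_from_manifest(artefact_dir: Path) -> ClassifierLike:
    """Read manifest.json and dispatch on its model_kind (assumed top-level key)."""
    manifest_path = artefact_dir / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    kind = manifest["model_kind"]
    try:
        loader = LOADERS[kind]
    except KeyError:
        raise ValueError(f"unsupported model_kind: {kind!r}") from None
    return loader(artefact_dir)
```

Because `Protocol` relies on structural typing, any class with a matching `classify_chunk` method satisfies `ClassifierLike` without inheriting from it, which is what lets one call site serve several model kinds.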

Serving optimisations

This artefact has been optimised for inference (see serving_manifest.json). Production-relevant flags:

Flag	Value
LoRA adapters merged	yes
Weight dtype	`bf16`
Head format	`sklearn_joblib`
safetensors mmap	yes
Optimised at	2026-05-02T20:53:23.465044+00:00
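As a rough guide to what the `bf16` flag means for memory, the arithmetic below estimates the encoder weight footprint at 2 bytes per parameter. The parameter count is an assumption taken from the nominal "4B" in the encoder name, not a figure stated in this card, and the estimate ignores the sklearn heads, activations, and framework overhead.

```python
# Bytes per parameter for common weight dtypes.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2}

# Assumption: nominal parameter count implied by the "4B" in the encoder name.
NOMINAL_PARAMS = 4_000_000_000


def weight_footprint_gib(n_params: int, dtype: str) -> float:
    """Approximate size of the raw weights in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in ("fp32", "bf16"):
    print(f"{dtype}: ~{weight_footprint_gib(NOMINAL_PARAMS, dtype):.1f} GiB")
```

Under that assumption, storing weights in `bf16` roughly halves the footprint relative to `fp32`.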