v29NA serving — canonical manifest (model_kind=multi_task, trainer_id=multi_task_gist); self-contained merged Qwen3-4B encoder + sklearn heads (4 tasks)
b694609 verified
metadata
language:
  - en
  - es
  - pt
  - fr
  - ar
  - zh
license: cc-by-nc-4.0
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
  - central-bank-communication
  - multi-dimensional-classification
  - multi_task_gist
model-index:
  - name: cbcc
    results:
      - task:
          type: text-classification
          name: topic classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.8008
      - task:
          type: text-classification
          name: temporal_orientation classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.8957
      - task:
          type: text-classification
          name: audience classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.759
      - task:
          type: text-classification
          name: sentiment classification
        dataset:
          name: CBC Held-out Eval
          type: private
        metrics:
          - type: f1_macro
            value: 0.7511
base_model: Qwen/Qwen3-Embedding-4B

cbcc

Multi-dimensional classifier for central-bank communications produced by the CBCommunication training pipeline (multi_task_gist rung).

Provenance

| Field | Value |
|---|---|
| Trainer | multi_task_gist |
| Model kind | multi_task |
| Encoder body | Qwen/Qwen3-Embedding-4B |
| Loss | cached_gist |
| Taxonomy version | 2026-04-rev2 (sha256 e7c237aac8db66ca) |
| Training examples | 3584 |
| Validation examples | 1809 |
| Git commit | 9d90b862 (dirty) |
| Created | 2026-04-28T02:41:30.780109+00:00 |

Dimensions and labels

topic (21 classes)

  • Climate change
  • Crisis management
  • Currency circulation and management
  • Financial inclusion
  • Financial stability
  • Fiscal policy
  • Governance
  • MP - balance sheet size and asset purchase programs
  • MP - credit
  • MP - economic activity
  • MP - exchange rate
  • MP - inflation
  • MP - interest rate
  • MP - labor market
  • MP - open market operations
  • MP - reserve requirements
  • Metadata
  • Payment system
  • Structural economic reform
  • Supervision and regulation
  • Technological innovation and fintech

temporal_orientation (2 classes)

  • Backward-looking
  • Forward-looking

audience (6 classes)

  • Business Sector
  • Financial Sector
  • General Public
  • Government
  • International Stakeholders
  • Metadata

sentiment (6 classes)

  • Confidence-building
  • Dovish
  • Hawkish
  • Neutral/Balanced
  • Not applicable
  • Risk-highlighting
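The label space above can be written down as a plain mapping, which is handy for sanity-checking downstream output. The sketch below copies the labels from the lists in this section; the prediction shape (a dict from dimension name to a single label) and the helper `invalid_labels` are hypothetical illustrations, not the format returned by the classifier.

```python
# Label space copied from the lists above. The prediction shape used below
# (dimension name -> one label) is a hypothetical illustration.
TAXONOMY = {
    "topic": [
        "Climate change",
        "Crisis management",
        "Currency circulation and management",
        "Financial inclusion",
        "Financial stability",
        "Fiscal policy",
        "Governance",
        "MP - balance sheet size and asset purchase programs",
        "MP - credit",
        "MP - economic activity",
        "MP - exchange rate",
        "MP - inflation",
        "MP - interest rate",
        "MP - labor market",
        "MP - open market operations",
        "MP - reserve requirements",
        "Metadata",
        "Payment system",
        "Structural economic reform",
        "Supervision and regulation",
        "Technological innovation and fintech",
    ],
    "temporal_orientation": ["Backward-looking", "Forward-looking"],
    "audience": [
        "Business Sector",
        "Financial Sector",
        "General Public",
        "Government",
        "International Stakeholders",
        "Metadata",
    ],
    "sentiment": [
        "Confidence-building",
        "Dovish",
        "Hawkish",
        "Neutral/Balanced",
        "Not applicable",
        "Risk-highlighting",
    ],
}


def invalid_labels(prediction):
    """Return {dimension: label} for entries that are missing or outside the taxonomy."""
    problems = {}
    for dimension, allowed in TAXONOMY.items():
        label = prediction.get(dimension)  # None when the dimension is missing
        if label not in allowed:
            problems[dimension] = label
    return problems


example = {
    "topic": "MP - inflation",
    "temporal_orientation": "Forward-looking",
    "audience": "General Public",
    "sentiment": "Confidence-building",
}
bad = dict(example, sentiment="Bullish")  # "Bullish" is not a taxonomy label
```

Here `invalid_labels(example)` returns an empty dict, while `invalid_labels(bad)` flags the `sentiment` entry.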

Evaluation (held-out validation set)

| Dimension | Macro F1 |
|---|---|
| topic | 0.8008 |
| temporal_orientation | 0.8957 |
| audience | 0.7590 |
| sentiment | 0.7511 |
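For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as frequent ones. The following is a minimal pure-Python sketch on made-up toy labels; it assumes the standard definition and does not reproduce the evaluation code or data behind the figures above.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (not from the evaluation set).
y_true = ["Forward-looking", "Forward-looking", "Backward-looking", "Backward-looking"]
y_pred = ["Forward-looking", "Backward-looking", "Backward-looking", "Backward-looking"]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F1 values are 2/3 (Forward-looking) and 0.8 (Backward-looking), so `score` is their plain average.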

Intended use

Classify sentences from central-bank speeches, press releases, and financial-stability reports along the four CBC taxonomy dimensions (topic, temporal orientation, audience, sentiment). Produced for research and policy analysis at the IMF.

Limitations

  • Trained on a small labeled set (3584 training examples); per-class metrics for tail classes with low support are less reliable.
  • Multilingual coverage depends on the encoder (Qwen/Qwen3-Embedding-4B); performance varies across the six listed languages (en, es, pt, fr, ar, zh).
  • Sentiment / temporal labels reflect the taxonomy decision rules in the source workbook; downstream consumers should re-read those rules before interpreting per-class deltas.
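To make the first limitation concrete, the sketch below computes an approximate 95% Wilson score interval for a per-class recall at two support levels. The recall of 0.8 and the supports of 10 and 200 are made-up illustration values, not figures from this model's evaluation; the Wilson interval is one common choice for proportions, not necessarily what the pipeline uses.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Same observed recall (0.8) at two hypothetical support levels.
tail_low, tail_high = wilson_interval(8, 10)      # tail class, 10 examples
head_low, head_high = wilson_interval(160, 200)   # frequent class, 200 examples

tail_width = tail_high - tail_low
head_width = head_high - head_low
print(f"support=10:  [{tail_low:.3f}, {tail_high:.3f}]")
print(f"support=200: [{head_low:.3f}, {head_high:.3f}]")
```

The interval at support 10 is several times wider than at support 200, which is why per-class scores for tail classes should be read with caution.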

How to load

```python
# Recommended (canonical port, Phase 1.1, 2026-04):
from cb_communication.processing.classification import load_named_classifier

# Auto-resolves the artefact via config/classifiers.toml. When the
# artefact is absent locally and ``hub_repo`` is configured, the loader
# pulls from this Hub repo automatically (private collaborators set
# their own ``HF_TOKEN``).
with load_named_classifier("cbcc") as clf:
    results = clf.classify_chunk([
        "Inflation expectations remain anchored at 2 percent.",
    ])
```

Alternative — explicit path-based load:

```python
from huggingface_hub import snapshot_download
from cb_communication.processing.classification import load_classifier

# Download the repo snapshot (or reuse the cached copy) and get its local path.
local = snapshot_download(repo_id="thiagochris/cbcc")
with load_classifier(local) as clf:
    ...
```

Both call paths return a classifier that satisfies the canonical MultiTaskClassifier Protocol (see cb_communication/processing/classification/multi_task_classifier.py); the runtime dispatches on model_kind from manifest.json.
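As a rough illustration of dispatching on model_kind, the sketch below reads manifest.json from an artefact directory and looks up a loader in a registry. The names `LOADERS`, `load_multi_task` and `load_from_manifest` are hypothetical, and the manifest is assumed to carry only the two fields named in this card; this is not the implementation in cb_communication.

```python
import json
import tempfile
from pathlib import Path


def load_multi_task(artefact_dir):
    # Hypothetical stand-in: a real loader would build the encoder and heads.
    return {"model_kind": "multi_task", "artefact_dir": str(artefact_dir)}


# Hypothetical registry mapping manifest model_kind values to loader callables.
LOADERS = {"multi_task": load_multi_task}


def load_from_manifest(artefact_dir):
    """Read manifest.json and call the loader registered for its model_kind."""
    manifest_path = Path(artefact_dir) / "manifest.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    kind = manifest.get("model_kind")
    if kind not in LOADERS:
        raise ValueError(f"unsupported model_kind: {kind!r}")
    return LOADERS[kind](artefact_dir)


# Demo against a throwaway directory holding a minimal, assumed manifest.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "manifest.json").write_text(
        json.dumps({"model_kind": "multi_task", "trainer_id": "multi_task_gist"}),
        encoding="utf-8",
    )
    loaded = load_from_manifest(tmp)
```

Keeping the lookup in a registry means a new model kind only needs a new entry, and an unknown kind fails early with a clear error instead of partway through loading.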

Serving optimisations

This artefact has been optimised for inference (see serving_manifest.json). Production-relevant flags:

| Flag | Value |
|---|---|
| LoRA adapters merged | yes |
| Weight dtype | bf16 |
| Head format | sklearn_joblib |
| safetensors mmap | yes |
| Optimised at | 2026-05-02T20:53:23.465044+00:00 |
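As a back-of-the-envelope check on what the bf16 weight dtype means for memory, the sketch below estimates the size of the encoder weights alone. The parameter count of roughly 4e9 is an assumption read off the "4B" in the encoder name, not a figure stated in this card, and the estimate ignores the sklearn heads, activations, and runtime overhead.

```python
# Assumption: ~4e9 parameters, inferred from the "4B" in the encoder name.
ASSUMED_PARAMS = 4_000_000_000

# Storage size per parameter for each dtype, in bytes.
BYTES_PER_PARAM = {"bf16": 2, "fp32": 4}


def weight_gib(n_params, dtype):
    """Approximate weight size in GiB for a given parameter count and dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


bf16_gib = weight_gib(ASSUMED_PARAMS, "bf16")
fp32_gib = weight_gib(ASSUMED_PARAMS, "fp32")
print(f"bf16: {bf16_gib:.2f} GiB, fp32: {fp32_gib:.2f} GiB")
```

Under that assumption, bf16 weights take half the space of fp32 weights, which is the main memory benefit of the dtype listed above.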