Spaces: ngocdang83/HachimiMT-demo

ngocdang83 committed 13 days ago

Commit b97e373 (verified) · 1 parent: f3dac7a

Add Hirashiba CT2 model options

Files changed (5)

README.md +210 -34
hachimimt-local.zip +2 -2
requirements.txt +5 -0
src/test_pronoun_harmonizer_v9.py +4 -1
src/translator.py +156 -8

README.md CHANGED

@@ -1,34 +1,210 @@
----
-title: HachimiMT - Chinese-Vietnamese Translation
-emoji: 📜
-colorFrom: red
-colorTo: yellow
-sdk: gradio
-sdk_version: 6.18.0
-app_file: app.py
-pinned: false
-short_description: Translate Chinese novels → Vietnamese
----
-# HachimiMT: Chinese → Vietnamese Translation
-A tool for translating Chinese novels into Vietnamese with MarianMT models (CTranslate2 INT8):
-- [HachimiMT-60-zh-vi](https://huggingface.co/ngocdang83/HachimiMT-60-zh-vi) (default)
-- [HachimiMT-30-zh-vi](https://huggingface.co/ngocdang83/HachimiMT-30-zh-vi) (lightweight, ~35 MB)
-- [MoxhiMT-60](https://huggingface.co/DanVP/MoxhiMT-60) · [MoxhiMT-30](https://huggingface.co/DanVP/MoxhiMT-30)
-## Features
-- Translate text directly with a **side-by-side comparison** per sentence/paragraph, or translate a `.txt` file.
-- **Model selection** (HachimiMT / MoxhiMT, 60 or 30 variant): each model downloads automatically from Hugging Face the
-  first time it is selected (lazy) and runs on CPU.
-- **Chinese character normalization** from traditional → simplified before translation (the models were trained on simplified).
-- **Optional address-term normalization** (advanced section, off by default): converts forms of address to Sino-Vietnamese
-  based on explicit terms in the source:
-  - **Kinship**: `chị → tỷ`, `anh trai → ca ca`, `chị em → tỷ muội`… when the source has 姐姐/哥哥/姐妹…
-  - **Pronouns**: `cậu → ngươi`, `cô ấy → nàng`, `tôi → ta`, applied only to historical/xianxia (cultivation) prose.
-  - **Modern person stabilization**: adjusts the person of address from context (thầy/em teacher/student, mẹ/con mother/child…) for modern stories.
-> The Space runs on **CPU** (CTranslate2 INT8). Long texts are slower than on a GPU machine; chunking **by sentence**
-> helps reduce proper-name drift.

+---
+title: HachimiMT - Chinese-Vietnamese Translation
+emoji: 📜
+colorFrom: red
+colorTo: yellow
+sdk: gradio
+sdk_version: 6.18.0
+app_file: app.py
+pinned: false
+short_description: Translate Chinese novels → Vietnamese
+---
+# HachimiMT: Chinese-Vietnamese Translation
+A tool for translating Chinese novels into Vietnamese, using MarianMT models hosted on Hugging Face:
+- [HachimiMT-60-zh-vi](https://huggingface.co/ngocdang83/HachimiMT-60-zh-vi)
+- [HachimiMT-30-zh-vi](https://huggingface.co/ngocdang83/HachimiMT-30-zh-vi)
+- [MoxhiMT-60](https://huggingface.co/DanVP/MoxhiMT-60)
+- [MoxhiMT-30](https://huggingface.co/DanVP/MoxhiMT-30)
+- [HirashibaMT-Medium](https://huggingface.co/Moleys/hirashiba-mt-medium) (CT2 mirror: [ngungodan/hirashiba-mt-medium-ct2](https://huggingface.co/ngungodan/hirashiba-mt-medium-ct2))
+- [HirashibaMT-Tiny](https://huggingface.co/chi-vi/hirashiba-mt-tiny-zh-vi) (CT2 mirror: [ngungodan/hirashiba-mt-tiny-zh-vi-ct2](https://huggingface.co/ngungodan/hirashiba-mt-tiny-zh-vi-ct2))
+> Ready-made scripts are included for **Windows** (`setup.bat` / `start.bat`) and **macOS** (`setup_macos.sh` / `start_macos.sh`). Linux can follow nearly the same steps as macOS.
+## Features
+- Choose the translation model (HachimiMT / MoxhiMT / HirashibaMT)
+- Translate text directly with a **side-by-side comparison** per sentence/paragraph
+- Translate `.txt` files and download the translation
+- **CTranslate2 INT8** (default): much faster than PyTorch, supports batch inference
+- Optional address-term post-processing:
+  - Sino-Vietnamese kinship terms/pronouns based on the source (`tỷ/muội/ca ca`, `ngươi/hắn/nàng/ta`)
+  - Modern person-of-address stabilization, V9 beta (`thầy/em`, `mẹ/con`, `anh/em`, interview/school settings; keeps `ta/ngươi` in xianxia/monologue passages)
+- **Automatic hardware detection**: CPU/GPU, thread count, VRAM → picks a suitable batch size and thread count. Weak and strong machines both work; only the speed differs.
+- Detects an NVIDIA GPU automatically when the CUDA libraries are available (via torch CUDA); otherwise it falls back safely to CPU without crashing. See the section on running with an NVIDIA GPU.
+## About the models (downloaded from Hugging Face)
+The repo **does not bundle any models** (too large for git). Models are downloaded from Hugging Face into the `models/` folder **on demand** (lazy):
+- `setup.py` pre-downloads only **1 default model** (HachimiMT-60, CT2 build ~57 MB) so the app is usable right away.
+- The other models in the list **are downloaded only the first time you translate with them**, so adding more models to the app does not make setup any heavier.
+- In the UI, each model has a **status badge**: "✓ Downloaded: ready to translate" or "⬇ Not downloaded (~XX MB): will download automatically...". When you translate with a model that is not yet present, the progress bar clearly says "Downloading … from Hugging Face".
+- Each model is downloaded **only once**; later runs reuse it from `models/` without downloading again.
+- Switching to the **PyTorch engine** requires installing the extra PyTorch dependency group; the app then downloads the (heavier) PyTorch weights the first time you translate with that engine.
+- An **internet connection** is needed to download a model the first time. After that the app runs offline.
+> Tip: to move to a machine without internet access, copy the downloaded `models/` folder over as well; the models in it will run immediately.
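
The on-demand rule above boils down to "check the local folder, fetch only if something is missing". The sketch below illustrates that idea under stated assumptions; it is not the app's code. `REQUIRED_FILES`, `is_ready`, `ensure_local` and the `fetch` callback are hypothetical names. In this commit the real check lives in `ensure_model_files` (via `_ct2_ready` / `_tokenizer_ready`) and the fetch is `huggingface_hub.snapshot_download`.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical: files a model folder must contain to count as "downloaded".
REQUIRED_FILES = ("config.json", "source.spm", "target.spm")


def is_ready(local_dir: Path, required: tuple[str, ...] = REQUIRED_FILES) -> bool:
    """True when every required file already exists in local_dir."""
    return all((local_dir / name).exists() for name in required)


def ensure_local(local_dir: Path, fetch: Callable[[Path], None]) -> Path:
    """Fetch only when files are missing; later calls reuse the cached folder."""
    if not is_ready(local_dir):
        local_dir.mkdir(parents=True, exist_ok=True)
        fetch(local_dir)  # in the app this role is played by snapshot_download(...)
    return local_dir


# Demo with a fake fetch that just writes the required files.
fetch_calls: list[Path] = []


def fake_fetch(target: Path) -> None:
    fetch_calls.append(target)
    for name in REQUIRED_FILES:
        (target / name).write_text("stub", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # Folder name = last part of model_id, like model_local_dir() in src/translator.py.
    model_dir = Path(tmp) / "models" / "HachimiMT-60-zh-vi"
    ensure_local(model_dir, fake_fetch)  # first use: fetches
    ensure_local(model_dir, fake_fetch)  # second use: cache hit, no fetch
    ready_after = is_ready(model_dir)

print(len(fetch_calls), ready_after)  # 1 True
```

Because readiness is decided purely by files on disk, copying `models/` to another machine is enough for the check to pass there, which is what the offline tip relies on.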
+### Adding a new model to the app
+Open `src/translator.py` and add an entry to the `MODELS` dict (use the existing entries as templates). Notes:
+- `model_id`: the HF repo path (e.g. `DanVP/MoxhiMT-60`).
+- `ct2_model_id`: only needed when the CT2 build lives in a different repo from the base model.
+- `ct2_subdir`: name of the folder holding the CT2 build in the repo; **defaults to `ct2-int8_float32`**. If the repo uses a different name (e.g. `ct2-int8`), set it explicitly.
+- `ct2_size_mb`: size of the CT2 build (so the badge can show "~XX MB"); may be left empty.
+- `use_marian_class`: `True` if the model's config.json says `MarianMTModel`.
+Nothing else needs to change: the new model appears in the dropdown automatically and is lazy-downloaded when used.
+## Installation (Windows)
+**Quick way:** run `setup.bat` and then `start.bat`. The script installs the default libraries for CTranslate2, downloads the default CT2 model, and opens the browser. By default it **does not install torch**.
+**Manual way** (if you want a virtual environment, which is recommended):
+```bat
+cd D:\Projects\qt2
+python -m venv .venv
+.venv\Scripts\activate
+:: 1) Install the defaults: CTranslate2 + Gradio + tokenizer tooling
+pip install -r requirements.txt
+:: 2) Download the model the first time (~57 MB: only the default CT2 model, once)
+cd src
+python setup.py
+```
+### Running on an NVIDIA GPU (much faster)
+The default engine is **CTranslate2**, running on CPU. For CTranslate2 to use an NVIDIA GPU, the machine needs the **CUDA libraries (cuBLAS/cuDNN)**, which usually come with installing a **CUDA build of torch**. The app then detects the GPU automatically; nothing else needs to change. On long texts the GPU is **many times** faster than the CPU (measured: ~16× when translating thousands of sentences).
+**Easiest way, a button inside the app (no terminal needed):** if the machine has an NVIDIA GPU but the app is running on CPU, the UI shows a **"⚡ GPU detected… Install torch to enable the GPU"** panel. Click that button and the app picks the right torch CUDA build for your driver and installs it (~2–3 GB download, ~5 GB of free disk space, one time only). After installing, the app **checks** both torch CUDA and CTranslate2 CUDA; then **close and reopen the app** (`stop.bat`, then `start.bat`) to run on the GPU. If you prefer to install manually, see the commands below.
+> ⚠️ **Important (error `cublas64_12.dll not found`):** CTranslate2 probes for an NVIDIA GPU **independently of torch**. If the machine **has an NVIDIA GPU but no torch CUDA**, CT2 tries to use the GPU and then crashes because `cublas64_12.dll` is missing (this DLL is supplied by torch CUDA). To prevent this, the app **automatically forces CT2 onto the CPU** when torch CUDA is not found, so by default you will **not** hit this error; the app simply runs on CPU.
+>
+> - Want the GPU: install **torch CUDA** (commands below) → the app enables the GPU automatically.
+> - Already have cuBLAS (e.g. from a separate CUDA Toolkit install) and want to force CT2 onto the GPU without torch: set the environment variable `HACHIMIMT_FORCE_CT2_CUDA=1` before starting (at your own risk if the DLL is missing).
+Torch is only needed if you want the GPU for CT2 (as above) or want to switch to the experimental **PyTorch** backend. The PyTorch backend is heavier and usually slower than CT2. To use PyTorch:
+```bat
+:: CPU-only PyTorch
+pip install torch --index-url https://download.pytorch.org/whl/cpu
+pip install -r requirements-pytorch.txt
+:: or PyTorch CUDA for NVIDIA; pick the matching build at pytorch.org
+pip install torch --index-url https://download.pytorch.org/whl/cu128
+pip install -r requirements-pytorch.txt
+```
+The exact command for each configuration is at <https://pytorch.org/get-started/locally/>.
+## Installation (macOS)
+Requires Python 3.10+. On macOS the app runs on **CTranslate2 CPU** by default. CTranslate2 currently does not use the Apple GPU (Metal/MPS), so M-series Macs still run on the CPU.
+```bash
+cd /path/to/qt2
+chmod +x setup_macos.sh start_macos.sh stop_macos.sh
+./setup_macos.sh
+./start_macos.sh
+```
+The macOS script creates `.venv`, installs the default dependencies **without torch**, downloads the default CT2 model, and then runs Gradio at `http://127.0.0.1:7860`.
+To try the PyTorch backend on macOS, install torch inside the venv:
+```bash
+source .venv/bin/activate
+pip install torch
+pip install -r requirements-pytorch.txt
+```
+Note: the PyTorch backend in this app is not yet specifically optimized for MPS/Apple GPU.
+## Running
+```bat
+start.bat
+```
+or manually:
+```bat
+cd src
+python app.py
+```
+Open your browser at `http://127.0.0.1:7860` (the app is served only on your own machine, not exposed to the network).
+## Usage tips
+- **CTranslate2 engine** (default): INT8 builds are available on HF; translates many chunks per batch.
+- **PyTorch engine**: experimental option; requires installing `torch` and `requirements-pytorch.txt`; usually slower than CT2.
+- **HachimiMT-30**: lightweight (~35 MB); suitable for testing or weak machines.
+- **HirashibaMT-Medium**: a reference model not trained by this repo; its CT2 INT8 build lives in the auxiliary repo `ngungodan/hirashiba-mt-medium-ct2`.
+- **HirashibaMT-Tiny**: a very light reference model; CT2 must use the `ct2-int8-keeppad` build because the stock converter does not correctly keep Marian tiny's `<pad>` decoder-start token. The app defaults to beam 1 to avoid the slight repetition seen at higher beams.
+- Chunking **by sentence** reduces proper-name drift on long texts (as the model author recommends).
+- The models were trained on **simplified Chinese** (简体). By default the app converts **traditional → simplified** before translating; you can switch this back to "Keep as-is" in the settings.
+- `Chuẩn hóa xưng hô thân tộc` (kinship-address normalization) fixes errors such as `chị em` ↔ `tỷ muội` when the source contains `姐妹/姐姐/哥哥...`.
+- `Ổn định ngôi xưng hiện đại V9 (beta)` (modern person-of-address stabilization) is a separate option, off by default; use it when a modern chapter flips between `ta/tôi`, `ngươi/cậu`, `thầy/em`, `mẹ/con`, `anh/em`. V9 prioritizes relationships with clear signals such as interviews/school settings, teacher-student, mother-child, siblings, and loans; xianxia/monologue passages may still keep `ta/ngươi` to avoid over-correcting.
+- Post-processing routing is decided per chapter: on the modern route V9 owns the pronouns, on the historical route the Sino-Vietnamese normalizer owns them, and on the mixed/unknown route a guard prevents wrong automatic corrections.
+- Input `.txt` files may be UTF-8, UTF-8 with BOM, GB18030, GBK or Big5; translations are written as UTF-8.
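
The README lists the accepted input encodings but not how one is chosen. A common approach, shown here only as a hypothetical sketch (`decode_source_bytes` and the candidate order are assumptions, not taken from `src/`), is to try strict decoders in order and keep the first that succeeds. Caveat: strict decoding cannot reliably tell GB18030 from Big5, because most Big5 byte pairs are also valid GB18030, so in this order a Big5 file would usually be mis-decoded as GB18030; a real implementation would need an extra heuristic for that case.

```python
# Hypothetical candidate order; the app's real logic is not shown in this commit.
# "utf-8-sig" also accepts plain UTF-8 and strips a leading BOM when present.
# GB18030 is a superset of GBK, so GBK files decode in the GB18030 step.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "big5")


def decode_source_bytes(raw: bytes) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes strictly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("Input is not UTF-8, GB18030/GBK or Big5 text.")


with_bom = "\ufeff他笑了。".encode("utf-8")
gbk_bytes = "他笑了。".encode("gbk")
print(decode_source_bytes(with_bom))  # ('他笑了。', 'utf-8-sig')
print(decode_source_bytes(gbk_bytes))  # ('他笑了。', 'gb18030')
```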
+## Speed configuration (optional)
+The app detects CPU/GPU and chooses batch/thread settings for your machine. You can override them with environment variables before starting:
+```bat
+set HACHIMIMT_BATCH_SIZE=72
+set HACHIMIMT_THREADS=12
+set HACHIMIMT_TOKENIZE_WORKERS=16
+set HACHIMIMT_TOKENIZE_JOB_SIZE=32
+set HACHIMIMT_CT2_WINDOW_MULTIPLIER=4
+set HACHIMIMT_CT2_BATCH_TYPE=tokens
+set HACHIMIMT_INTER_THREADS=1
+set HACHIMIMT_COMPUTE_TYPE=int8_float16
+set HACHIMIMT_PROGRESS_SECONDS=0.5
+start.bat
+```
+- `HACHIMIMT_COMPUTE_TYPE`: defaults to `int8_float16` on GPU and `int8_float32` on CPU.
+- `HACHIMIMT_BATCH_SIZE`: increase it if the GPU has spare VRAM; decrease it if translation fails or runs out of memory (OOM).
+- `HACHIMIMT_THREADS`: the total CT2 thread budget; when there are several CT2 workers/replicas, the app divides `intra_threads` among them to avoid oversubscribing the CPU.
+- `HACHIMIMT_TOKENIZE_WORKERS`: increase it when the CPU has headroom and the GPU is waiting for data.
+- `HACHIMIMT_TOKENIZE_JOB_SIZE`: number of chunks per tokenizer job; defaults to `32`.
+- `HACHIMIMT_CT2_WINDOW_MULTIPLIER`: number of batches passed to each CT2 call; defaults to `4`. With multiple GPUs and `batch_type=tokens`, the app automatically raises the effective window so CT2 has enough sub-batches to spread across the GPUs.
+- `HACHIMIMT_CT2_BATCH_TYPE`: defaults to `tokens`; switch to `examples` for comparison or as a fallback.
+- `HACHIMIMT_INTER_THREADS`: defaults to `1`; only increase it if a benchmark shows it is faster.
+- `HACHIMIMT_GPU_INDICES`: selects the GPUs used by CT2, e.g. `0`, `1`, or `0,1`. If unset, a local PC uses GPU `0` by default; Kaggle/Colab automatically use all GPUs.
+- `HACHIMIMT_AUTO_ALL_GPUS=1`: makes auto mode use all GPUs when `HACHIMIMT_GPU_INDICES` is not set; useful when you are sure the GPUs are identical.
+- `HACHIMIMT_PROGRESS_SECONDS`: interval between UI progress updates, in seconds; defaults to `0.5`.
+## Linux (manual commands)
+The steps are equivalent to macOS:
+```bash
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+cd src && python setup.py        # download the model the first time
+python app.py                    # run the app
+```
+## Benchmarking a file
+```bat
+cd D:\Projects\qt2
+python src\benchmark_file.py path\to\file.txt --model HachimiMT-60 --backend ct2 --beam 2 --chunk-mode sentence
+```
+By default the benchmark also normalizes traditional → simplified, like the UI. To disable it:
+```bat
+python src\benchmark_file.py path\to\file.txt --normalize none
+```
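
`--chunk-mode sentence` and the tip that sentence-level chunks reduce proper-name drift both imply cutting the source at sentence boundaries before batching. The splitter below is a hypothetical sketch, not the app's implementation: it cuts after Chinese sentence-final punctuation (。！？ and ASCII `!?`), absorbs runs like `？！`, and keeps a closing quote or bracket attached to the sentence it ends. Ellipses and other edge cases are deliberately ignored.

```python
# Sentence-final punctuation and the closing quotes/brackets that may follow it.
SENTENCE_END = set("。！？!?")
CLOSERS = set("」』\u201d\u2019）)")


def split_sentences(text: str) -> list[str]:
    """Split Chinese prose after 。！？ (keeping trailing closing quotes)."""
    sentences: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        current.append(ch)
        i += 1
        if ch in SENTENCE_END:
            # Absorb runs like ？！ and any closing quote/bracket right after them.
            while i < len(text) and (text[i] in SENTENCE_END or text[i] in CLOSERS):
                current.append(text[i])
                i += 1
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)  # last sentence without final punctuation
    return sentences


sample = "他笑了。\u201c你来了？\u201d她说。真的吗？！没有标点"
print(split_sentences(sample))
```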

hachimimt-local.zip CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1bd070a05e1cfb9cb45b78ab2caf13b04dacf52486856b321caa045b49021389
-size 89978
+oid sha256:9a1c47e5e6973d9bd8f0277d928c27839f560bb1b14f0e8690ba562e8723b8c7
+size 94764
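
`hachimimt-local.zip` is stored with Git LFS, so the diff only changes its pointer file: a `version` line, the `oid` (SHA-256 of the real content) and the content `size` in bytes. A small sketch of reading such a pointer and checking a local file against it. `parse_lfs_pointer` and `matches_pointer` are illustrative names, not part of this repo.

```python
import hashlib
import tempfile
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.strip().partition(" ")
            fields[key] = value
    return fields


def matches_pointer(blob: Path, pointer: dict[str, str]) -> bool:
    """True if blob has the size and SHA-256 recorded in the pointer."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256" or blob.stat().st_size != int(pointer["size"]):
        return False
    return hashlib.sha256(blob.read_bytes()).hexdigest() == expected


new_pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:9a1c47e5e6973d9bd8f0277d928c27839f560bb1b14f0e8690ba562e8723b8c7\n"
    "size 94764\n"
)
print(new_pointer["size"])  # 94764

# Self-check on a tiny file whose pointer we build ourselves.
with tempfile.TemporaryDirectory() as tmp:
    blob = Path(tmp) / "sample.bin"
    blob.write_bytes(b"hello")
    sample_pointer = {"oid": "sha256:" + hashlib.sha256(b"hello").hexdigest(), "size": "5"}
    sample_ok = matches_pointer(blob, sample_pointer)
```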

requirements.txt CHANGED

@@ -1,4 +1,9 @@
+# NOTE: do NOT install torch from this file.
+# The default CTranslate2 engine needs neither torch nor transformers.
+# Install the separate PyTorch group only for the experimental PyTorch backend; see the README.
 sentencepiece>=0.2.0
+tokenizers>=0.19.0
 gradio>=6.0.0,<7
 ctranslate2>=4.0.0
 huggingface_hub>=0.23.0
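
The new `tokenizers` requirement is only exercised by models whose CT2 folder provides `tokenizer.json` instead of the `source.spm`/`target.spm` pair (see `_load_ct2_tokenizer` in `src/translator.py`). `CT2FastTokenizer` imports it lazily inside `__init__` and turns a failed import into a `RuntimeError`. A generic sketch of that optional-import pattern; `require_module` is a hypothetical helper, and unlike the commit, which catches any `Exception`, it catches only `ImportError`.

```python
import importlib
from types import ModuleType


def require_module(name: str, hint: str) -> ModuleType:
    """Import a module on first use; fail with an actionable message if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        raise RuntimeError(f"{hint}: cannot import {name!r}") from exc


json_module = require_module("json", "stdlib module")
try:
    require_module("hachimimt_not_a_real_package", "This model needs an extra package")
except RuntimeError as err:
    missing_message = str(err)
```

Deferring the import keeps `tokenizers` off the startup path for SentencePiece-only models while still failing with a clear message for models that need it.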

src/test_pronoun_harmonizer_v9.py CHANGED

@@ -673,4 +673,7 @@ check_rows(
 )
 print(f"RESULT: {_pass} PASS / {_fail} FAIL")
-sys.exit(1 if _fail else 0)
+if _fail:
+    raise AssertionError(f"pronoun_harmonizer_v9 suite failed: {_fail}")
+if __name__ == "__main__":
+    sys.exit(0)
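
Previously the module always ended with `sys.exit(...)`, which raises `SystemExit` even on success whenever the file is imported rather than run. Now a passing suite is silent on import, a failing one raises `AssertionError`, and the explicit `sys.exit(0)` only runs as a script. Run as a script, an uncaught `AssertionError` still ends the process with exit status 1, so shell/CI checks keep working. A stand-alone sketch of the same tally-then-raise pattern; `Tally`, `check` and `finish` are illustrative names, not the helpers in `src/test_pronoun_harmonizer_v9.py`.

```python
from dataclasses import dataclass, field


@dataclass
class Tally:
    passed: int = 0
    failed: int = 0
    failures: list[str] = field(default_factory=list)

    def check(self, name: str, got: str, expected: str) -> None:
        if got == expected:
            self.passed += 1
        else:
            self.failed += 1
            self.failures.append(f"{name}: got {got!r}, expected {expected!r}")

    def finish(self, suite: str) -> None:
        print(f"RESULT: {self.passed} PASS / {self.failed} FAIL")
        if self.failed:
            # Raise instead of sys.exit(1): importers see an ordinary exception,
            # and a script run still exits with status 1.
            raise AssertionError(f"{suite} suite failed: {self.failed}")


ok = Tally()
ok.check("kinship", "tỷ muội", "tỷ muội")
ok.finish("demo")  # prints the summary, no exception

bad = Tally()
bad.check("pronoun", "tôi", "ta")
try:
    bad.finish("demo")
except AssertionError as err:
    failure_message = str(err)
```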

src/translator.py CHANGED

@@ -52,6 +52,9 @@ class ModelConfig:
     # Name of the subfolder that holds the CT2 build in the HF repo. Defaults to "ct2-int8_float32";
     # some repos use a different name (e.g. "ct2-int8"); override it here per model.
     ct2_subdir: str = "ct2-int8_float32"
+    # If the CT2 build lives in a different repo from the base model, declare it here. The PyTorch
+    # backend still downloads from model_id; CT2 downloads from this repo into the same local cache folder.
+    ct2_model_id: str | None = None
 MODELS: dict[str, ModelConfig] = {
@@ -111,11 +114,42 @@ MODELS: dict[str, ModelConfig] = {
             "no_repeat_ngram_size": 2,
             "repetition_penalty": 1.2,
         },
-        ct2_max_input_tokens=512,
+        # 30M model drifts/hallucinates on dense paragraph chunks. Keep the
+        # source cap short enough to split long entity-heavy paragraphs.
+        ct2_max_input_tokens=160,
         ct2_max_output_tokens=512,
         default_beam=1,
         ct2_size_mb=38,
     ),
+    "HirashibaMT-Medium": ModelConfig(
+        label="HirashibaMT-Medium",
+        model_id="Moleys/hirashiba-mt-medium",
+        use_marian_class=True,
+        generate_kwargs={
+            "max_new_tokens": 256,
+        },
+        ct2_max_input_tokens=128,
+        ct2_max_output_tokens=256,
+        default_beam=4,
+        ct2_size_mb=62,
+        ct2_model_id="ngungodan/hirashiba-mt-medium-ct2",
+    ),
+    "HirashibaMT-Tiny": ModelConfig(
+        label="HirashibaMT-Tiny",
+        model_id="chi-vi/hirashiba-mt-tiny-zh-vi",
+        use_marian_class=True,
+        generate_kwargs={
+            "max_length": 512,
+        },
+        # Tiny uses a 4-layer Marian model. Keep paragraph chunks short and use
+        # greedy/low-beam decoding; higher beams can introduce light duplicates.
+        ct2_max_input_tokens=160,
+        ct2_max_output_tokens=512,
+        default_beam=1,
+        ct2_size_mb=17,
+        ct2_subdir="ct2-int8-keeppad",
+        ct2_model_id="ngungodan/hirashiba-mt-tiny-zh-vi-ct2",
+    ),
 }
 # Model pre-downloaded by setup (usable right away); other models are lazy-downloaded.
@@ -128,14 +162,20 @@ DEFAULT_CT2_SUBDIR = "ct2-int8_float32"
 def _ct2_download_patterns(config: ModelConfig) -> list[str]:
     return [
         "config.json",
+        "generation_config.json",
         "source.spm",
         "target.spm",
+        "tokenizer.json",
         "vocab.json",
         "tokenizer_config.json",
         f"{config.ct2_subdir}/*",
     ]
+def _ct2_repo_id(config: ModelConfig) -> str:
+    return config.ct2_model_id or config.model_id
 SourceTokenJobs = list[Future[list[list[str]]]]
@@ -176,14 +216,18 @@ def _ct2_translator_kwargs(
     inter_threads: int,
     gpu_indices: list[int] | None = None,
 ) -> tuple[dict[str, object], int, str | None]:
+    actual_inter_threads = max(1, int(inter_threads))
+    requested_intra_threads = max(1, int(intra_threads))
     kwargs: dict[str, object] = dict(
         device=device,
         compute_type=compute_type,
-        intra_threads=intra_threads,
-        inter_threads=max(1, int(inter_threads)),
+        inter_threads=actual_inter_threads,
     )
+    worker_count = actual_inter_threads
+    device_indices_label = None
     if device != "cuda":
-        return kwargs, max(1, int(inter_threads)), None
+        kwargs["intra_threads"] = max(1, requested_intra_threads // worker_count)
+        return kwargs, worker_count, device_indices_label
     if not gpu_indices:
         raise RuntimeError("No CUDA GPU available.")
@@ -199,7 +243,10 @@ def _ct2_translator_kwargs(
         kwargs["inter_threads"] = 1
     actual_inter_threads = int(kwargs["inter_threads"])
-    return kwargs, len(selected) * actual_inter_threads, ",".join(str(i) for i in selected)
+    worker_count = len(selected) * actual_inter_threads
+    kwargs["intra_threads"] = max(1, requested_intra_threads // worker_count)
+    device_indices_label = ",".join(str(i) for i in selected)
+    return kwargs, worker_count, device_indices_label
 @lru_cache(maxsize=1)
@@ -342,6 +389,100 @@ class CT2SentencePieceTokenizer:
         return decoded
+class CT2FastTokenizer:
+    """Minimal tokenizer.json wrapper for CTranslate2 inference."""
+    def __init__(self, model_path: Path) -> None:
+        try:
+            from tokenizers import Tokenizer
+        except Exception as exc:
+            raise RuntimeError(
+                "This model uses tokenizer.json; the tokenizers package must be installed."
+            ) from exc
+        self._tokenizer = Tokenizer.from_file(str(model_path / "tokenizer.json"))
+        self.pad_token_id = self._tokenizer.token_to_id("<pad>")
+        self._eos_token_id = self._tokenizer.token_to_id("</s>")
+    def _encode_one(
+        self,
+        text: str,
+        *,
+        truncation: bool = False,
+        max_length: int | None = None,
+    ) -> list[int]:
+        token_ids = list(self._tokenizer.encode(text).ids)
+        if truncation and max_length is not None and len(token_ids) > max_length:
+            token_ids = token_ids[:max_length]
+            if token_ids and self._eos_token_id is not None:
+                token_ids[-1] = self._eos_token_id
+        return token_ids
+    def __call__(
+        self,
+        text_or_texts: str | list[str],
+        *,
+        truncation: bool = False,
+        max_length: int | None = None,
+        padding: bool = False,
+    ) -> dict[str, list[int] | list[list[int]]]:
+        del padding
+        if isinstance(text_or_texts, str):
+            return {
+                "input_ids": self._encode_one(
+                    text_or_texts,
+                    truncation=truncation,
+                    max_length=max_length,
+                )
+            }
+        return {
+            "input_ids": [
+                self._encode_one(text, truncation=truncation, max_length=max_length)
+                for text in text_or_texts
+            ]
+        }
+    def convert_ids_to_tokens(self, token_ids: list[int]) -> list[str]:
+        tokens: list[str] = []
+        for token_id in token_ids:
+            token = self._tokenizer.id_to_token(int(token_id))
+            if token is None:
+                raise ValueError(f"Token id not in vocab: {token_id}")
+            tokens.append(token)
+        return tokens
+    def convert_tokens_to_ids(self, tokens: list[str]) -> list[int]:
+        token_ids: list[int] = []
+        for token in tokens:
+            token_id = self._tokenizer.token_to_id(token)
+            if token_id is None:
+                raise ValueError(f"Token not in vocab: {token!r}")
+            token_ids.append(int(token_id))
+        return token_ids
+    def decode(self, token_ids: list[int], *, skip_special_tokens: bool = True) -> str:
+        return self.batch_decode([token_ids], skip_special_tokens=skip_special_tokens)[0]
+    def batch_decode(
+        self,
+        token_ids_batch: list[list[int]],
+        *,
+        skip_special_tokens: bool = True,
+    ) -> list[str]:
+        return [
+            self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
+            for token_ids in token_ids_batch
+        ]
+def _load_ct2_tokenizer(model_path: Path):
+    if (model_path / "source.spm").exists() and (model_path / "target.spm").exists():
+        return CT2SentencePieceTokenizer(model_path)
+    if (model_path / "tokenizer.json").exists():
+        return CT2FastTokenizer(model_path)
+    raise RuntimeError("CT2 tokenizer not found: need source.spm/target.spm or tokenizer.json.")
 def model_local_dir(config: ModelConfig) -> Path:
     return MODELS_DIR / config.model_id.split("/")[-1]
@@ -356,7 +497,8 @@ def _pytorch_ready(path: Path) -> bool:
 def _tokenizer_ready(path: Path) -> bool:
-    return (path / "source.spm").exists() or (path / "tokenizer_config.json").exists()
+    has_sentencepiece = (path / "source.spm").exists() and (path / "target.spm").exists()
+    return has_sentencepiece or (path / "tokenizer.json").exists()
 def is_model_downloaded(model_key: str, backend: Backend | str = Backend.CT2) -> bool:
@@ -387,13 +529,15 @@ def ensure_model_files(config: ModelConfig, backend: Backend) -> Path:
         if _ct2_ready(local_dir, config.ct2_subdir) and _tokenizer_ready(local_dir):
             return local_dir
         patterns = _ct2_download_patterns(config)
+        repo_id = _ct2_repo_id(config)
     else:
         if _pytorch_ready(local_dir) and _tokenizer_ready(local_dir):
             return local_dir
         patterns = None
+        repo_id = config.model_id
     snapshot_download(
-        config.model_id,
+        repo_id,
         local_dir=str(local_dir),
         allow_patterns=patterns,
     )
@@ -433,6 +577,7 @@ class HachimiTranslator:
         batch_type = os.environ.get("HACHIMIMT_CT2_BATCH_TYPE", "tokens").strip().lower()
         self._ct2_batch_type = batch_type if batch_type in {"examples", "tokens"} else "tokens"
         self._ct2_compute_type: str | None = None
+        self._ct2_actual_intra_threads = self._ct2_threads
         self._ct2_actual_inter_threads = self._ct2_inter_threads
         self._ct2_worker_count = 1
         self._ct2_device_indices_label: str | None = None
@@ -543,6 +688,7 @@ class HachimiTranslator:
             msg += (
                 f" · batch_type={self._ct2_batch_type}"
                 f" · {window_part}"
+                f" · intra={self._ct2_actual_intra_threads}"
                 f" · inter={self._ct2_actual_inter_threads}"
             )
             if self._ct2_worker_count > 1:
@@ -563,6 +709,7 @@ class HachimiTranslator:
         self._tokenizer = None
         self._model_path = None
         self._ct2_compute_type = None
+        self._ct2_actual_intra_threads = self._ct2_threads
         self._ct2_actual_inter_threads = self._ct2_inter_threads
         self._ct2_worker_count = 1
         self._ct2_device_indices_label = None
@@ -611,7 +758,7 @@ class HachimiTranslator:
     def _load_ct2(self, config: ModelConfig) -> None:
         model_path = ensure_model_files(config, Backend.CT2)
-        tokenizer = CT2SentencePieceTokenizer(model_path)
+        tokenizer = _load_ct2_tokenizer(model_path)
         env_compute_type = os.environ.get("HACHIMIMT_COMPUTE_TYPE", "").strip()
         ct2_device = "cuda" if self._profile.has_cuda else "cpu"
@@ -654,6 +801,7 @@ class HachimiTranslator:
                     str(model_path / config.ct2_subdir), **kwargs
                 )
                 self._ct2_compute_type = compute_type
+                self._ct2_actual_intra_threads = int(kwargs["intra_threads"])
                 self._ct2_actual_inter_threads = int(kwargs["inter_threads"])
                 self._ct2_worker_count = worker_count
                 self._ct2_device_indices_label = device_indices_label