
🧠 EmpathAI - Llama 3.1 8B (Empathetic Customer Service & Anti-Hallucination)

EmpathAI is a large language model (LLM) fine-tuned specifically for Vietnamese-language customer service. Unlike general-purpose chatbots, it is trained for the hardest situations: handling complaints, defusing negative emotions, and strictly following a company's business procedures.

📌 Current version: The main branch holds the complete merged model (merged 16-bit). You can use it directly with the transformers library or deploy it to Vertex AI without loading a separate adapter. The previous version is kept on the old_version branch.
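
Because the merged model and the earlier version live on different branches, the branch can be pinned at load time. A minimal sketch, assuming the `revision` argument of `from_pretrained` is used to select the branch; the helper names `load_kwargs` and `load_model` are illustrative and not part of the repository.

```python
MODEL_ID = "thanhhoangnvbg/empathAI-llama3.1-8b"


def load_kwargs(branch: str = "main") -> dict:
    """Keyword arguments that pin a specific branch of the repository."""
    return {"pretrained_model_name_or_path": MODEL_ID, "revision": branch}


def load_model(branch: str = "main"):
    """Load the model from the given branch (downloads weights on first use)."""
    # Imported lazily so load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(**load_kwargs(branch))


if __name__ == "__main__":
    # "old_version" is the branch named in the note above. The card does not
    # describe what that branch contains, so loading it may need extra steps.
    print(load_kwargs("old_version"))
```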

🌟 Key Strengths

Practical empathy (Emotional Intelligence): Trained with a DPO (Direct Preference Optimization) pipeline to tell a mechanical reply apart from an empathetic, sincere one.
Designed for RAG systems (RAG-Native): The model knows when to stop and ask for identifying information (order code, phone number), the key step that lets a RAG system retrieve the right data.
Anti-hallucination (Zero Hallucination Focus): Constrained to the rule "only speak when there is evidence", minimizing fabricated order checks and compensation promises that violate policy.
Flexible role-play: Changing the system prompt is enough to turn EmpathAI into a staff member of any brand (e-commerce, banking, F&B...).
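
The RAG-native behaviour described above implies a gate on the application side: retrieval only runs once the customer has supplied an identifier. A minimal sketch of such a gate, assuming hypothetical formats (order codes such as `DH12345678`, ten-digit phone numbers starting with 0); neither format comes from the model card, so adapt the patterns to your own system.

```python
import re

# Hypothetical identifier formats; adjust to the real order and phone schemes.
ORDER_ID_RE = re.compile(r"\b[A-Z]{2}\d{6,10}\b")
PHONE_RE = re.compile(r"\b0\d{9}\b")


def extract_identifiers(text: str) -> dict:
    """Return any order codes and phone numbers found in a customer message."""
    return {
        "order_ids": ORDER_ID_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def should_retrieve(text: str) -> bool:
    """Run retrieval only when at least one identifier is present."""
    found = extract_identifiers(text)
    return bool(found["order_ids"] or found["phones"])
```

When `should_retrieve` is false, the application would simply forward the model's request for an order code back to the customer instead of querying the knowledge base.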

📊 Technical Specifications

Component	Details
Base model	`Llama-3.1-8B-Instruct`
Fine-tuning method	QLoRA
Training infrastructure	Google Cloud Vertex AI
GPU	NVIDIA L4
Training pipeline	Supervised Fine-Tuning (SFT) + Direct Preference Optimization (DPO)
Optimization	Unsloth (2x speed, 70% memory reduction)
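
For readers unfamiliar with the DPO stage of the pipeline, the per-pair objective can be sketched as follows. This is the standard DPO loss for one preference pair, not code from the EmpathAI training run; the default `beta=0.1` is an illustrative assumption.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of summed sequence log-probabilities."""
    # Implicit reward: how much more likely the policy makes an answer than the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_reward - rejected_reward)
    # -log(sigmoid(margin)), written as a numerically stable softplus(-margin).
    x = -margin
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))
```

The loss falls below log 2 once the policy prefers the chosen (here, empathetic) answer more strongly than the reference model does, and rises above it in the opposite case.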

🚀 Usage Guide

Because the model is fully merged, you can use it with the standard transformers library:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thanhhoangnvbg/empathAI-llama3.1-8b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

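# Example: casting the model as a MyKingdom customer-service agent.
# The prompts stay in Vietnamese because the model is tuned for Vietnamese. The system prompt asks for a
# short, calming reply that invents no order details, promises no compensation, and requests an order code or photo/video.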
messages = [
    {"role": "system", "content": """Bạn là EmpathAI, chuyên viên CSKH cấp cao của MyKingdom. Khách hàng đang rất tức giận và văng tục.
Nhiệm vụ của bạn là xoa dịu ngắn gọn và thu thập thông tin để hệ thống RAG xử lý. 

BẠN PHẢI TUÂN THỦ TUYỆT ĐỐI CÁC LUẬT SAU (NẾU VI PHẠM SẼ BỊ PHẠT):
1. KHÔNG ẢO GIÁC: Tuyệt đối không tự bịa ra việc đã kiểm tra đơn hàng, không tự bịa nguyên nhân lỗi (nhầm kho, hỏng hóc) khi chưa có mã đơn.
2. KHÔNG HỨA HẸN ĐỀN BÙ: Tuyệt đối không đề xuất gửi hàng mới, hoàn tiền, freeship hay tặng voucher.
3. KHÔNG TRANH CÃI: Phớt lờ lời chửi thề, giữ thái độ chuyên nghiệp, lịch sự nhưng không sến sẩm, dài dòng.
4. HÀNH ĐỘNG DUY NHẤT: Trả lời tối đa 2-3 câu. Xin lỗi về trải nghiệm tồi tệ, sau đó YÊU CẦU khách hàng cung cấp [Mã đơn hàng] hoặc [Hình ảnh/Video] để bạn có cơ sở kiểm tra."""},
    {"role": "user", "content": "Giao hàng kiểu gì mà chậm thế hả?"},
]

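# Build the prompt with the chat template; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)  # follow the device chosen by device_map="auto"

# temperature only takes effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.5)

# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))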
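
Since the persona is driven entirely by the system prompt, switching brands amounts to rebuilding the `messages` list. A minimal sketch, assuming a hypothetical `build_messages` helper and a shortened Vietnamese system prompt modelled on the rules above; the brand name in the usage line is a placeholder.

```python
# Shortened, illustrative system prompt; {brand} is filled in per deployment.
SYSTEM_TEMPLATE = (
    "Bạn là EmpathAI, chuyên viên CSKH cấp cao của {brand}. "
    "Không tự bịa thông tin đơn hàng, không hứa hẹn đền bù. "
    "Trả lời tối đa 2-3 câu và yêu cầu khách cung cấp [Mã đơn hàng]."
)


def build_messages(brand: str, user_text: str) -> list:
    """Return a chat history whose system prompt casts the model as staff of `brand`."""
    return [
        {"role": "system", "content": SYSTEM_TEMPLATE.format(brand=brand)},
        {"role": "user", "content": user_text},
    ]


# Placeholder brand and complaint, for illustration only.
messages = build_messages("Example Bank", "Thẻ của tôi bị khóa rồi!")
```

The resulting list can be passed to `tokenizer.apply_chat_template` in the same way as the hand-written `messages` above.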

🦙 GGUF / Local Inference

EmpathAI now fully supports the GGUF format for local inference with:

Ollama
llama.cpp
LM Studio
KoboldCpp
OpenWebUI

The GGUF files are stored on the gguf branch.
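
Files on a non-default branch can be fetched through the Hub's `resolve` URL scheme. A minimal sketch that builds such a URL; the file name `empathAI-llama3.1-8b.Q4_K_M.gguf` is an assumption taken from the Modelfile example in this card, so check the actual file listing on the `gguf` branch before downloading.

```python
from urllib.parse import quote

REPO_ID = "thanhhoangnvbg/empathAI-llama3.1-8b"


def gguf_url(filename: str, revision: str = "gguf") -> str:
    """Build the download URL for a file stored on a given branch of the repo."""
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# Assumed file name; verify it against the branch contents.
print(gguf_url("empathAI-llama3.1-8b.Q4_K_M.gguf"))
```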

📦 Available Quantizations

File	Recommended Use
`Q4_K_M.gguf`	Best balance of quality and speed. Recommended for most consumer GPUs and laptops.
`Q5_K_M.gguf`	Slightly higher quality; uses more VRAM/RAM.

🚀 Running with Ollama

Create a Modelfile:

FROM ./empathAI-llama3.1-8b.Q4_K_M.gguf

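TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER temperature 0.5
PARAMETER num_ctx 4096
PARAMETER stop "<|eot_id|>"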

Build model:

ollama create empathai -f Modelfile

Run:

ollama run empathai
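
Once the model is registered, an application can talk to it over Ollama's local HTTP API instead of the interactive prompt. A minimal sketch, assuming Ollama is listening on its default port 11434 and the model was created under the name `empathai` as above; the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # default local endpoint


def build_chat_request(system_prompt: str, user_text: str, model: str = "empathai") -> dict:
    """Assemble a non-streaming chat payload with a system and a user turn."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
    }


def chat(system_prompt: str, user_text: str) -> str:
    """Send the request to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(system_prompt, user_text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_CHAT_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]
```

Note that the system turn only reaches the model if the Modelfile's TEMPLATE contains a `.System` section.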

🚀 Running with llama.cpp

./llama-cli \
--model empathAI-llama3.1-8b.Q4_K_M.gguf \
-p "Xin chào"
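
When passing a raw prompt with `-p`, the text may need to follow the Llama 3 chat layout for the model to see separate system and user turns. A minimal sketch of that layout follows. It is a simplification (the official Llama 3.1 template can also add date lines to the system block), and whether your llama.cpp build applies a chat template or prepends the begin-of-text token automatically is something to check for your version; `add_bos` is a hypothetical switch for that case.

```python
def format_llama3_prompt(system_prompt: str, user_text: str, add_bos: bool = False) -> str:
    """Render one system turn and one user turn, ending where the assistant reply starts."""
    # Many runtimes insert the begin-of-text token themselves, so it is off by default.
    bos = "<|begin_of_text|>" if add_bos else ""
    return (
        f"{bos}"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_text}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```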

💡 Recommended Hardware

Quant	Recommended RAM / VRAM
Q4_K_M	~8GB+
Q5_K_M	~10GB+
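
The recommendations above can be sanity-checked with a back-of-the-envelope estimate of weight size. A rough sketch, assuming about 8.0 billion parameters and approximate effective widths of 4.9 bits per weight for Q4_K_M and 5.7 for Q5_K_M; these widths are assumptions based on typical llama.cpp K-quant files, not measurements of this model's files, and the estimate ignores the KV cache and runtime overhead.

```python
PARAMS = 8.0e9  # approximate parameter count (assumption)

# Approximate effective bits per weight; assumed values, not measured on these files.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q5_K_M": 5.7, "Q4_K_M": 4.9}


def weight_gigabytes(fmt: str, params: float = PARAMS) -> float:
    """Estimated size of the weights alone, in decimal gigabytes."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{weight_gigabytes(fmt):.1f} GB")
```

The gap between these estimates and the recommended figures leaves headroom for the context cache and the rest of the system.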

🔥 Notes

The GGUF version is converted directly from the original merged model.
Optimized for Vietnamese inference.
Can serve as the backend of a customer-service RAG system in local/private deployments.
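
In a private RAG deployment, the rules from the system prompt can also be enforced outside the model. A minimal sketch of a post-generation check that flags replies mentioning compensation; the keyword list mirrors rule 2 of the example system prompt and is an assumption about what a deployment would want to block, and plain substring matching will produce both false positives and misses.

```python
# Compensation-related phrases from rule 2 of the example prompt (illustrative list).
FORBIDDEN_PHRASES = ("hoàn tiền", "voucher", "freeship", "gửi hàng mới")


def find_compensation_promises(reply: str) -> list:
    """Return the forbidden phrases that appear in a model reply (case-insensitive)."""
    lowered = reply.lower()
    return [phrase for phrase in FORBIDDEN_PHRASES if phrase in lowered]
```

A non-empty result could trigger a regeneration or a hand-off to a human agent, depending on the deployment.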
