🎉 License Updated! We are pleased to announce our more flexible licensing terms 🤗
✈️ Try on FriendliAI (licensed for commercial purposes)
📢 EXAONE 4.0 is officially supported by llama.cpp! Please check the guide below
EXAONE-4.0-1.2B-GGUF
Introduction
We introduce EXAONE 4.0, which integrates a Non-reasoning mode and Reasoning mode to achieve both the excellent usability of EXAONE 3.5 and the advanced reasoning abilities of EXAONE Deep. To pave the way for the agentic AI era, EXAONE 4.0 incorporates essential features such as agentic tool use, and its multilingual capabilities are extended to support Spanish in addition to English and Korean.
The EXAONE 4.0 model series consists of two sizes: a mid-size 32B model optimized for high performance, and a small-size 1.2B model designed for on-device applications.
The EXAONE 4.0 architecture introduces the following changes compared to previous EXAONE models:
- Hybrid Attention: For the 32B model, we adopt a hybrid attention scheme, which combines local attention (sliding window attention) with global attention (full attention) in a 3:1 ratio. We do not use RoPE (Rotary Positional Embedding) for global attention, for better global context understanding.
- QK-Reorder-Norm: We reorder the LayerNorm position from the traditional Pre-LN scheme by applying LayerNorm directly to the attention and MLP outputs, and we add RMS normalization right after the Q and K projections. This yields better performance on downstream tasks at the cost of more computation.
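The two ideas above can be illustrated with a small, framework-free sketch. It is not the EXAONE implementation: the function names are hypothetical, the ordering of layers inside each group (three local layers followed by one global layer) is an assumption, and the RMS normalization omits the learned gain a real layer would carry.

```python
import math


def attention_layout(num_layers, local_per_global=3):
    """Label each layer 'local' or 'global' in a repeating 3:1 pattern.

    Assumption: every group is three local layers followed by one global layer.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


def rms_norm(vector, eps=1e-6):
    """Rescale a vector to roughly unit root-mean-square (no learned gain)."""
    mean_square = sum(x * x for x in vector) / len(vector)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [x * inv_rms for x in vector]


# Layer types for a toy 8-layer stack.
layout = attention_layout(8)
print(layout)

# Normalizing a toy query vector, as would happen right after the Q projection.
query_head = rms_norm([3.0, 4.0])
print(query_head)
```

Normalizing Q and K this way bounds the scale of the attention logits, which is one common motivation for QK normalization.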
For more details, please refer to our technical report, HuggingFace paper, blog, and GitHub.
Model Configuration
- Number of Parameters (without embeddings): 1.07B
- Number of Layers: 30
- Number of Attention Heads: GQA with 32-heads and 8-KV heads
- Vocab Size: 102,400
- Context Length: 65,536 tokens
- Quantization: `Q8_0`, `Q6_K`, `Q5_K_M`, `Q4_K_M`, `IQ4_XS` in GGUF format (also includes `BF16` weights)
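As a rough illustration of what the configuration means for memory at full context, the sketch below estimates the key/value cache size for one sequence. The head dimension of 64 is an assumption (it is not stated in this card), as are the 16-bit cache precision and the helper name `kv_cache_bytes`. The comparison against 32 KV heads shows the saving that GQA with 8 KV heads provides.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for a single sequence."""
    # The factor 2 covers the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value


HEAD_DIM = 64  # assumption: not given in the model configuration above

# Values from the model configuration: 30 layers, 8 KV heads, 65,536-token context.
gqa = kv_cache_bytes(num_layers=30, num_kv_heads=8, head_dim=HEAD_DIM, context_len=65536)
# Hypothetical variant without GQA, where every one of the 32 heads keeps its own K/V.
mha = kv_cache_bytes(num_layers=30, num_kv_heads=32, head_dim=HEAD_DIM, context_len=65536)

print(f"GQA (8 KV heads):  {gqa / 2**30:.2f} GiB")
print(f"MHA (32 KV heads): {mha / 2**30:.2f} GiB")
```

Under these assumptions the cache at full context is about 3.75 GiB with GQA, a quarter of the 15 GiB that full multi-head attention would need.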
Quickstart
llama.cpp
You can run EXAONE models locally using llama.cpp by following these steps:
1. Install the latest version of llama.cpp (version >= `b5932`). Please check the official installation guide from llama.cpp.
2. Download the EXAONE 4.0 model weights in GGUF format.

   ```bash
   huggingface-cli download LGAI-EXAONE/EXAONE-4.0-1.2B-GGUF \
       --include "EXAONE-4.0-1.2B-Q4_K_M.gguf" \
       --local-dir .
   ```
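If `huggingface-cli` is not available, a direct download link can be built by hand. The sketch below is only a convenience: the helper names are hypothetical, and the file-naming pattern `EXAONE-4.0-1.2B-<QUANT>.gguf` is an assumption extrapolated from the `Q4_K_M` file name used in the download command above, so check the repository file list before relying on it.

```python
from urllib.parse import quote

REPO_ID = "LGAI-EXAONE/EXAONE-4.0-1.2B-GGUF"
# Quantization tags listed in the model configuration.
QUANT_TAGS = ("Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "IQ4_XS", "BF16")


def gguf_filename(quant):
    """Build the file name for a quantization tag (naming pattern is an assumption)."""
    if quant not in QUANT_TAGS:
        raise ValueError(f"unknown quantization tag: {quant!r}")
    return f"EXAONE-4.0-1.2B-{quant}.gguf"


def download_url(quant, revision="main"):
    """Direct-download URL on the Hugging Face Hub for the chosen file."""
    return (f"https://huggingface.co/{REPO_ID}/resolve/"
            f"{revision}/{quote(gguf_filename(quant))}")


print(download_url("Q4_K_M"))
```

The printed URL can be passed to any downloader, such as `curl -L -O`.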
Generation with `llama-cli`
1. Apply the chat template using transformers.

   This step is necessary to avoid issues with the current EXAONE modeling code in `llama.cpp`. This is a work in progress at our PR. We will update this once these issues are solved.

   ```python
   from transformers import AutoTokenizer

   model_name = "LGAI-EXAONE/EXAONE-4.0-1.2B"
   tokenizer = AutoTokenizer.from_pretrained(model_name)

   messages = [
       {"role": "user", "content": "Let's work together on local system!"}
   ]
   # Render the conversation into the prompt string the model expects.
   input_text = tokenizer.apply_chat_template(
       messages,
       tokenize=False,
       add_generation_prompt=True,
   )

   print(repr(input_text))
   # Save the prompt so that llama-cli can read it with -f.
   with open("inputs.txt", "w", encoding="utf-8") as f:
       f.write(input_text)
   ```

2. Generate the result with greedy decoding.

   ```bash
   llama-cli -m EXAONE-4.0-1.2B-Q4_K_M.gguf \
       -fa -ngl 31 \
       --temp 0.0 --top-k 1 \
       -f inputs.txt -no-cnv
   ```
OpenAI compatible server with `llama-server`
1. Run llama-server with the EXAONE 4.0 Jinja template. You can find the chat template file in this repository.

   ```bash
   llama-server -m EXAONE-4.0-1.2B-Q4_K_M.gguf \
       -c 65536 -fa -ngl 31 \
       --temp 0.6 --top-p 0.95 \
       --jinja --chat-template-file chat_template.jinja \
       --host 0.0.0.0 --port 8820 \
       -a EXAONE-4.0-1.2B-Q4_K_M
   ```

2. Use OpenAI chat completion to test the GGUF model.

   ```bash
   curl -X POST http://localhost:8820/v1/chat/completions \
       -H "Content-Type: application/json" \
       -d '{
           "model": "EXAONE-4.0-1.2B-Q4_K_M",
           "messages": [
               {"role": "user", "content": "Let'\''s work together on server!"}
           ],
           "max_tokens": 1024,
           "temperature": 0.6,
           "top_p": 0.95,
           "chat_template_kwargs": {"enable_thinking": false}
       }'
   ```
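The same request can be sent from Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_chat_request` and `send_chat_request` are hypothetical, the payload fields mirror the curl example, and the response is assumed to follow the usual OpenAI chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request

SERVER_URL = "http://localhost:8820/v1/chat/completions"


def build_chat_request(user_message, enable_thinking=False):
    """Assemble the JSON payload, mirroring the fields of the curl example."""
    return {
        "model": "EXAONE-4.0-1.2B-Q4_K_M",
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 1024,
        "temperature": 0.6,
        "top_p": 0.95,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def send_chat_request(payload, url=SERVER_URL, timeout=120):
    """POST the payload and return the text of the first choice."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_chat_request("Let's work together on server!")
print(json.dumps(payload, indent=2))
# With llama-server running, uncomment to send the request:
# print(send_chat_request(payload))
```

Passing `enable_thinking=True` switches the request to reasoning mode through the same `chat_template_kwargs` field.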
Performance
The following tables show the evaluation results of each model in reasoning and non-reasoning modes. The evaluation details can be found in the technical report.
- ✅ denotes that the model has a hybrid reasoning capability and was evaluated in reasoning or non-reasoning mode as appropriate for each table.
- To assess Korean practical and professional knowledge, we adopt both the KMMLU-Redux and KMMLU-Pro benchmarks. Both datasets are publicly released!
- The evaluation results are based on the original models, not the quantized models.
32B Reasoning Mode
| | EXAONE 4.0 32B | Phi 4 reasoning-plus | Magistral Small-2506 | Qwen 3 32B | Qwen 3 235B | DeepSeek R1-0528 |
|---|---|---|---|---|---|---|
| Model Size | 32.0B | 14.7B | 23.6B | 32.8B | 235B | 671B |
| Hybrid Reasoning | ✅ | | | ✅ | ✅ | |
| World Knowledge | ||||||
| MMLU-Redux | 92.3 | 90.8 | 86.8 | 90.9 | 92.7 | 93.4 |
| MMLU-Pro | 81.8 | 76.0 | 73.4 | 80.0 | 83.0 | 85.0 |
| GPQA-Diamond | 75.4 | 68.9 | 68.2 | 68.4 | 71.1 | 81.0 |
| Math/Coding | ||||||
| AIME 2025 | 85.3 | 78.0 | 62.8 | 72.9 | 81.5 | 87.5 |
| HMMT Feb 2025 | 72.9 | 53.6 | 43.5 | 50.4 | 62.5 | 79.4 |
| LiveCodeBench v5 | 72.6 | 51.7 | 55.8 | 65.7 | 70.7 | 75.2 |
| LiveCodeBench v6 | 66.7 | 47.1 | 47.4 | 60.1 | 58.9 | 70.3 |
| Instruction Following | ||||||
| IFEval | 83.7 | 84.9 | 37.9 | 85.0 | 83.4 | 80.8 |
| Multi-IF (EN) | 73.5 | 56.1 | 27.4 | 73.4 | 73.4 | 72.0 |
| Agentic Tool Use | ||||||
| BFCL-v3 | 63.9 | N/A | 40.4 | 70.3 | 70.8 | 64.7 |
| Tau-Bench (Airline) | 51.5 | N/A | 38.5 | 34.5 | 37.5 | 53.5 |
| Tau-Bench (Retail) | 62.8 | N/A | 10.2 | 55.2 | 58.3 | 63.9 |
| Multilinguality | ||||||
| KMMLU-Pro | 67.7 | 55.8 | 51.5 | 61.4 | 68.1 | 71.7 |
| KMMLU-Redux | 72.7 | 62.7 | 54.6 | 67.5 | 74.5 | 77.0 |
| KSM | 87.6 | 79.8 | 71.9 | 82.8 | 86.2 | 86.7 |
| MMMLU (ES) | 85.6 | 84.3 | 68.9 | 82.8 | 86.7 | 88.2 |
| MATH500 (ES) | 95.8 | 94.2 | 83.5 | 94.3 | 95.1 | 96.0 |
32B Non-Reasoning Mode
| | EXAONE 4.0 32B | Phi 4 | Mistral-Small-2506 | Gemma3 27B | Qwen3 32B | Qwen3 235B | Llama-4-Maverick | DeepSeek V3-0324 |
|---|---|---|---|---|---|---|---|---|
| Model Size | 32.0B | 14.7B | 24.0B | 27.4B | 32.8B | 235B | 402B | 671B |
| Hybrid Reasoning | ✅ | | | | ✅ | ✅ | | |
| World Knowledge | ||||||||
| MMLU-Redux | 89.8 | 88.3 | 85.9 | 85.0 | 85.7 | 89.2 | 92.3 | 92.3 |
| MMLU-Pro | 77.6 | 70.4 | 69.1 | 67.5 | 74.4 | 77.4 | 80.5 | 81.2 |
| GPQA-Diamond | 63.7 | 56.1 | 46.1 | 42.4 | 54.6 | 62.9 | 69.8 | 68.4 |
| Math/Coding | ||||||||
| AIME 2025 | 35.9 | 17.8 | 30.2 | 23.8 | 20.2 | 24.7 | 18.0 | 50.0 |
| HMMT Feb 2025 | 21.8 | 4.0 | 16.9 | 10.3 | 9.8 | 11.9 | 7.3 | 29.2 |
| LiveCodeBench v5 | 43.3 | 24.6 | 25.8 | 27.5 | 31.3 | 35.3 | 43.4 | 46.7 |
| LiveCodeBench v6 | 43.1 | 27.4 | 26.9 | 29.7 | 28.0 | 31.4 | 32.7 | 44.0 |
| Instruction Following | ||||||||
| IFEval | 84.8 | 63.0 | 77.8 | 82.6 | 83.2 | 83.2 | 85.4 | 81.2 |
| Multi-IF (EN) | 71.6 | 47.7 | 63.2 | 72.1 | 71.9 | 72.5 | 77.9 | 68.3 |
| Long Context | ||||||||
| HELMET | 58.3 | N/A | 61.9 | 58.3 | 54.5 | 63.3 | 13.7 | N/A |
| RULER | 88.2 | N/A | 71.8 | 66.0 | 85.6 | 90.6 | 2.9 | N/A |
| LongBench v1 | 48.1 | N/A | 51.5 | 51.5 | 44.2 | 45.3 | 34.7 | N/A |
| Agentic Tool Use | ||||||||
| BFCL-v3 | 65.2 | N/A | 57.7 | N/A | 63.0 | 68.0 | 52.9 | 63.8 |
| Tau-Bench (Airline) | 25.5 | N/A | 36.1 | N/A | 16.0 | 27.0 | 38.0 | 40.5 |
| Tau-Bench (Retail) | 55.9 | N/A | 35.5 | N/A | 47.6 | 56.5 | 6.5 | 68.5 |
| Multilinguality | ||||||||
| KMMLU-Pro | 60.0 | 44.8 | 51.0 | 50.7 | 58.3 | 64.4 | 68.8 | 67.3 |
| KMMLU-Redux | 64.8 | 50.1 | 53.6 | 53.3 | 64.4 | 71.7 | 76.9 | 72.2 |
| KSM | 59.8 | 29.1 | 35.5 | 36.1 | 41.3 | 46.6 | 40.6 | 63.5 |
| Ko-LongBench | 76.9 | N/A | 55.4 | 72.0 | 73.9 | 74.6 | 65.6 | N/A |
| MMMLU (ES) | 80.6 | 81.2 | 78.4 | 78.7 | 82.1 | 83.7 | 86.9 | 86.7 |
| MATH500 (ES) | 87.3 | 78.2 | 83.4 | 86.8 | 84.7 | 87.2 | 78.7 | 89.2 |
| WMT24++ (ES) | 90.7 | 89.3 | 92.2 | 93.1 | 91.4 | 92.9 | 92.7 | 94.3 |
1.2B Reasoning Mode
| | EXAONE 4.0 1.2B | EXAONE Deep 2.4B | Qwen 3 0.6B | Qwen 3 1.7B | SmolLM 3 3B |
|---|---|---|---|---|---|
| Model Size | 1.28B | 2.41B | 596M | 1.72B | 3.08B |
| Hybrid Reasoning | ✅ | | ✅ | ✅ | ✅ |
| World Knowledge | |||||
| MMLU-Redux | 71.5 | 68.9 | 55.6 | 73.9 | 74.8 |
| MMLU-Pro | 59.3 | 56.4 | 38.3 | 57.7 | 57.8 |
| GPQA-Diamond | 52.0 | 54.3 | 27.9 | 40.1 | 41.7 |
| Math/Coding | |||||
| AIME 2025 | 45.2 | 47.9 | 15.1 | 36.8 | 36.7 |
| HMMT Feb 2025 | 34.0 | 27.3 | 7.0 | 21.8 | 26.0 |
| LiveCodeBench v5 | 44.6 | 47.2 | 12.3 | 33.2 | 27.6 |
| LiveCodeBench v6 | 45.3 | 43.1 | 16.4 | 29.9 | 29.1 |
| Instruction Following | |||||
| IFEval | 67.8 | 71.0 | 59.2 | 72.5 | 71.2 |
| Multi-IF (EN) | 53.9 | 54.5 | 37.5 | 53.5 | 47.5 |
| Agentic Tool Use | |||||
| BFCL-v3 | 52.9 | N/A | 46.4 | 56.6 | 37.1 |
| Tau-Bench (Airline) | 20.5 | N/A | 22.0 | 31.0 | 37.0 |
| Tau-Bench (Retail) | 28.1 | N/A | 3.3 | 6.5 | 5.4 |
| Multilinguality | |||||
| KMMLU-Pro | 42.7 | 24.6 | 21.6 | 38.3 | 30.5 |
| KMMLU-Redux | 46.9 | 25.0 | 24.5 | 38.0 | 33.7 |
| KSM | 60.6 | 60.9 | 22.8 | 52.9 | 49.7 |
| MMMLU (ES) | 62.4 | 51.4 | 48.8 | 64.5 | 64.7 |
| MATH500 (ES) | 88.8 | 84.5 | 70.6 | 87.9 | 87.5 |
1.2B Non-Reasoning Mode
| | EXAONE 4.0 1.2B | Qwen 3 0.6B | Gemma 3 1B | Qwen 3 1.7B | SmolLM 3 3B |
|---|---|---|---|---|---|
| Model Size | 1.28B | 596M | 1.00B | 1.72B | 3.08B |
| Hybrid Reasoning | ✅ | ✅ | | ✅ | ✅ |
| World Knowledge | |||||
| MMLU-Redux | 66.9 | 44.6 | 40.9 | 63.4 | 65.0 |
| MMLU-Pro | 52.0 | 26.6 | 14.7 | 43.7 | 43.6 |
| GPQA-Diamond | 40.1 | 22.9 | 19.2 | 28.6 | 35.7 |
| Math/Coding | |||||
| AIME 2025 | 23.5 | 2.6 | 2.1 | 9.8 | 9.3 |
| HMMT Feb 2025 | 13.0 | 1.0 | 1.5 | 5.1 | 4.7 |
| LiveCodeBench v5 | 26.4 | 3.6 | 1.8 | 11.6 | 11.4 |
| LiveCodeBench v6 | 30.1 | 6.9 | 2.3 | 16.6 | 20.6 |
| Instruction Following | |||||
| IFEval | 74.7 | 54.5 | 80.2 | 68.2 | 76.7 |
| Multi-IF (EN) | 62.1 | 37.5 | 32.5 | 51.0 | 51.9 |
| Long Context | |||||
| HELMET | 41.2 | 21.1 | N/A | 33.8 | 38.6 |
| RULER | 77.4 | 55.1 | N/A | 65.9 | 66.3 |
| LongBench v1 | 36.9 | 32.4 | N/A | 41.9 | 39.9 |
| Agentic Tool Use | |||||
| BFCL-v3 | 55.7 | 44.1 | N/A | 52.2 | 47.3 |
| Tau-Bench (Airline) | 10.0 | 31.5 | N/A | 13.5 | 38.0 |
| Tau-Bench (Retail) | 21.7 | 5.7 | N/A | 4.6 | 6.7 |
| Multilinguality | |||||
| KMMLU-Pro | 37.5 | 24.6 | 9.7 | 29.5 | 27.6 |
| KMMLU-Redux | 40.4 | 22.8 | 19.4 | 29.8 | 26.4 |
| KSM | 26.3 | 0.1 | 22.8 | 16.3 | 16.1 |
| Ko-LongBench | 69.8 | 16.4 | N/A | 57.1 | 15.7 |
| MMMLU (ES) | 54.6 | 39.5 | 35.9 | 54.3 | 55.1 |
| MATH500 (ES) | 71.2 | 38.5 | 41.2 | 66.0 | 62.4 |
| WMT24++ (ES) | 65.9 | 58.2 | 76.9 | 76.7 | 84.0 |
Usage Guideline
To achieve the expected performance, we recommend using the following configurations:
- For non-reasoning mode, we recommend using a lower temperature value such as `temperature<0.6` for better performance.
- For reasoning mode (using the `<think>` block), we recommend using `temperature=0.6` and `top_p=0.95`.
- If you suffer from model degeneration, we recommend using `presence_penalty=1.5`.
- For Korean general conversation with the 1.2B model, we suggest using `temperature=0.1` to avoid code switching.
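The settings above all act on the model's next-token distribution. The following plain-Python sketch shows what each one does to a toy set of logits. It is not the sampler code of llama.cpp or any other backend: the function names and logit values are illustrative, and the presence penalty is assumed to follow the OpenAI API convention of subtracting a flat value from the logit of every token that has already appeared.

```python
import math


def apply_presence_penalty(logits, seen_token_ids, penalty):
    """Subtract a flat penalty from every token that already appeared."""
    seen = set(seen_token_ids)
    return [logit - penalty if i in seen else logit
            for i, logit in enumerate(logits)]


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens them."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # toy values for a 4-token vocabulary

# Reasoning-mode settings: temperature=0.6, top_p=0.95.
probs = softmax_with_temperature(logits, temperature=0.6)
nucleus = top_p_filter(probs, top_p=0.95)
print(sorted(nucleus))  # token ids that remain eligible for sampling

# presence_penalty=1.5 applied after token 0 has already been generated.
penalized = softmax_with_temperature(
    apply_presence_penalty(logits, seen_token_ids=[0], penalty=1.5),
    temperature=0.6,
)
print(probs[0], penalized[0])
```

In this toy case the least likely token falls outside the nucleus, and the penalty lowers the probability of repeating token 0, which is how it counteracts degenerate repetition.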
Limitation
The EXAONE language model has certain limitations and may occasionally generate inappropriate responses. The language model generates responses based on the output probability of tokens, which is determined during learning from the training data. While we have made every effort to exclude personal, harmful, and biased information from the training data, some problematic content may still be included, potentially leading to undesirable responses. Please note that the text generated by the EXAONE language model does not reflect the views of LG AI Research.
- Inappropriate answers may be generated, which contain personal, harmful or other inappropriate information.
- Biased responses may be generated, which are associated with age, gender, race, and so on.
- The generated responses rely heavily on statistics from the training data, which can result in the generation of semantically or syntactically incorrect sentences.
- Since the model does not reflect the latest information, the responses may be false or contradictory.
LG AI Research strives to reduce potential risks that may arise from EXAONE language models. Users are not allowed to engage in any malicious activities (e.g., keying in illegal information) that may induce the creation of inappropriate outputs violating LG AI's ethical principles when using EXAONE language models.
License
The model is licensed under the EXAONE AI Model License Agreement 1.2 - NC.
The main differences from the previous version are as follows:
- We removed the claim of model output ownership from the license.
- We restrict use of the model for developing models that compete with EXAONE.
- We allow the model to be used for educational purposes, not just research.
Citation
@article{exaone-4.0,
title={EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes},
author={{LG AI Research}},
journal={arXiv preprint arXiv:2507.11407},
year={2025}
}
Contact
LG AI Research Technical Support: contact_us@lgresearch.ai