
Libraries

How to use Sooryeon/qwen3.5-27b-ocr-sft-v1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Sooryeon/qwen3.5-27b-ocr-sft-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Sooryeon/qwen3.5-27b-ocr-sft-v1")
model = AutoModelForMultimodalLM.from_pretrained("Sooryeon/qwen3.5-27b-ocr-sft-v1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use Sooryeon/qwen3.5-27b-ocr-sft-v1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Sooryeon/qwen3.5-27b-ocr-sft-v1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sooryeon/qwen3.5-27b-ocr-sft-v1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/Sooryeon/qwen3.5-27b-ocr-sft-v1

SGLang

How to use Sooryeon/qwen3.5-27b-ocr-sft-v1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sooryeon/qwen3.5-27b-ocr-sft-v1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Sooryeon/qwen3.5-27b-ocr-sft-v1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sooryeon/qwen3.5-27b-ocr-sft-v1" \
        --host 0.0.0.0 \
        --port 30000


Qwen3.5-27B OCR SFT v1

Korean Public Documents & English Academic Papers · OCR · Semantic Analysis & Structuring · Fine-tuned Model

This model is not a plain OCR model. It is a vision-language checkpoint based on `Qwen/Qwen3.5-27B`, fine-tuned to interpret the meaning of document content and re-emit it as structured output (Markdown, hierarchical tables, key-value fields, sectioned blocks). The training corpus balances Korean and English, drawing on Korean public-sector documents from AIHub and English research-paper datasets hosted on Hugging Face.

⚠️ v1: experimental checkpoint. Evaluation metrics and the training recipe will be expanded in later revisions.

🎯 What this model is good at

A generic OCR model "reads the glyphs." This model reads, then understands, then reorganizes.

Structured output: identifies titles, body text, tables, lists, signature blocks, and stamp regions, and re-emits them as Markdown, HTML tables, or JSON-like structures.
Semantic analysis: not a raw string dump; fields are grouped by what they mean (발신기관, 문서번호, 결재선, 수신처, 시행일자, 붙임 / sender, doc-number, routing, addressee, effective date, attachments).
Korean public-document specificity: handles official-letter formats (공문 서식), official seal regions (관인/직인), agency-specific notation (○○시장, 붙임, 수신자 참조), and hierarchical legal numbering (제1조–제2항–제3호).
English academic papers: abstract / section / figure-caption segmentation, citation-friendly reading order, and restoration of the structure around tables and equations.
Complex tables: merged cells, multi-row headers, empty cells, mixed units, and footnoted tables are reconstructed as semantic units.
Long context: up to 262K tokens, so multi-page documents (dozens of pages) can be processed in a single call.

📌 Specifications


| Item | Value |
|---|---|
| Base model | `Qwen/Qwen3.5-27B` |
| Architecture | `Qwen3_5ForConditionalGeneration` (Hybrid Linear + Full Attention) |
| Parameters | ≈ 27B · merged full weights (no adapter) |
| Precision | bfloat16 |
| Context length | 262,144 tokens |
| Vocab size | 248,320 |
| Vision patch / merge | 16 / 2 (Qwen3VLProcessor) |
| MTP module | 1 layer (`model-mtp.safetensors`, optional serving) |
| Fine-tune type | Full-parameter SFT, merged checkpoint |
| Languages | Korean ↔ English (balanced) |
| License | Apache-2.0 |
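As a rough sizing aid for the figures in the table, the sketch below estimates weight memory from the parameter count and precision. It counts weights only; the KV cache, activations, and any runtime buffers are not included, and `27e9` is the rounded parameter count from the table rather than an exact figure.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the approximate memory needed for model weights, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# ~27B parameters in bfloat16 (2 bytes each); rounded figure, weights only.
approx_gib = weight_memory_gib(27e9, bytes_per_param=2)
print(f"Weights only: about {approx_gib:.1f} GiB")
```

Actual serving needs more headroom than this number suggests, which is one reason the serving example further down caps the context length.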

📚 Training Data

Balanced Korean / English corpus, document-centric:

AIHub Korean public-sector document datasets: official letters (공문), public notices (고시), announcements (공고), application forms (신청서), certificates (증명서), approval documents (결재 문서), meeting minutes (회의록), and assorted administrative forms.
Hugging Face English academic paper datasets: abstracts, figures, tables, and bibliography-style layouts.

Tasks covered during SFT: OCR ground-truth transcription, Markdown/HTML structuring, field extraction, and short semantic summaries.

🚀 Quick Start

1) Transformers

from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_id = "Sooryeon/qwen3.5-27b-ocr-sft-v1"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype="bfloat16", device_map="auto"
)

image = Image.open("document.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text":
            "Convert this document to Markdown, preserving layout. "
            "Render tables as HTML <table> with rowspan/colspan. "
            "Emit metadata fields (doc number, date, sender, recipient) as a separate block."
        },
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

out = model.generate(**inputs, max_new_tokens=8192)
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

2) vLLM serving

vllm serve Sooryeon/qwen3.5-27b-ocr-sft-v1 \
    --dtype bfloat16 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.90 \
    --trust-remote-code

This exposes an OpenAI-compatible multimodal endpoint. Send images as image_url content parts.
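To make the `image_url` content part concrete, here is a minimal client sketch using only the Python standard library. It assumes the server listens on `http://localhost:8000` (the vLLM default) and that the image is sent inline as a base64 data URL; the helper names `build_chat_payload` and `post_chat`, and the `temperature` and `max_tokens` values, are illustrative choices rather than settings taken from this card.

```python
import base64
import json
import urllib.request


def build_chat_payload(image_bytes: bytes, prompt: str,
                       model: str = "Sooryeon/qwen3.5-27b-ocr-sft-v1",
                       mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "temperature": 0.0,   # assumed: deterministic decoding
        "max_tokens": 4096,   # assumed: adjust to the document length
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, a call would look like `post_chat(build_chat_payload(open("document.png", "rb").read(), "Convert this document to Markdown."))`.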

💡 Prompt Patterns

Different instructions elicit different levels of output from the same model.

① Layout-faithful Markdown (document structure restoration)

Convert this document to Markdown preserving the original layout.
- Heading hierarchy as #/##/###
- Tables as HTML <table> including rowspan/colspan
- Footnotes, attachments, signature blocks as separate sections
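When the model returns tables as HTML with `rowspan`/`colspan`, downstream code often wants a plain rectangular grid. The sketch below is one way to expand the spans using the standard-library `html.parser`; it assumes a single, non-nested `<table>` in the input, and `expand_table` is a hypothetical helper name, not part of this model's tooling.

```python
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # [text_parts, rowspan, colspan] while inside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])  # tolerate a cell that appears before any <tr>
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self.rows[-1].append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None


def expand_table(html: str) -> list[list[str]]:
    """Expand rowspan/colspan so every row has one entry per grid column."""
    collector = _CellCollector()
    collector.feed(html)
    grid = {}  # (row, col) -> cell text
    for r, row in enumerate(collector.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:  # skip slots already filled by a rowspan above
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]
```

Repeating the text of a merged cell in every slot it covers is a design choice that keeps each row self-describing; blanking the repeats instead is a small change in the inner loop.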

② Field extraction (JSON)

Extract the following fields from this document:
  sender_org, recipient, doc_number, effective_date, title, author,
  approval_status, attachments
Respond as JSON only.
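Even with "Respond as JSON only", a reply may arrive wrapped in a Markdown code fence or with extra keys. The following sketch shows one defensive way to parse such a reply into the requested fields; `parse_fields` is a hypothetical helper, and the choice to fill missing fields with `None` is an assumption rather than documented model behavior.

```python
import json
import re

EXPECTED_FIELDS = (
    "sender_org", "recipient", "doc_number", "effective_date",
    "title", "author", "approval_status", "attachments",
)

FENCE = "`" * 3  # a Markdown code fence, built here to keep this example readable
FENCED_REPLY = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)


def parse_fields(raw_reply: str) -> dict:
    """Parse a JSON object from a model reply and keep the requested fields."""
    text = raw_reply.strip()
    # Strip a surrounding Markdown code fence if the model added one.
    fenced = FENCED_REPLY.match(text)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    # Keep only the requested fields; missing ones become None.
    return {name: data.get(name) for name in EXPECTED_FIELDS}
```

Letting `json.JSONDecodeError` propagate makes malformed replies visible to the caller, who can then retry or fall back to the Markdown prompt.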

③ Semantic summary + structure

Summarize the document in 3 sentences, then list the key items as
hierarchical bullet points grouped by semantic role.

⚙️ Serving Tips

Context length: short-to-medium documents are in-distribution; for very long documents, page-chunked serving is more stable (tune --max-model-len per use case).
Image resolution: the image processor accepts up to longest_edge = 16,777,216 pixels, but the practical VRAM-friendly range is a longest side of 1,792–2,560 px.
Sampling: for structured output, use temperature=0.2–0.4; when number and table fidelity matter most, use temperature=0.0.
MTP: model-mtp.safetensors contains Multi-Token Prediction weights. Enable it only on engines that support MTP; HF transformers inference does not require it.
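The resolution and chunking tips above can be sketched as two small helpers. Both are illustrative: `fit_longest_side` only computes a target size (2,560 px is the upper end of the recommended range), and the default of four pages per call in `chunk_pages` is an arbitrary assumption to tune against your own documents.

```python
def fit_longest_side(width: int, height: int, max_side: int = 2560) -> tuple[int, int]:
    """Scale (width, height) down so the longest side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def chunk_pages(pages: list, pages_per_call: int = 4) -> list[list]:
    """Split a list of pages into consecutive groups, one group per request."""
    if pages_per_call < 1:
        raise ValueError("pages_per_call must be at least 1")
    return [pages[i:i + pages_per_call]
            for i in range(0, len(pages), pages_per_call)]
```

With Pillow, the computed size can be applied as `image.resize(fit_longest_side(*image.size))` before the image is passed to the processor or encoded for the server.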

📉 Limitations

v1 experimental checkpoint: natural images, handwriting, equations, and other domains outside the training mix show higher variance.
Potential hallucination: semantic summarization and structuring can embellish the source. Cross-check against the original document for any legal or official use.
PII awareness: public documents may contain Korean resident registration numbers, phone numbers, and similar data. Apply masking in your downstream pipeline.
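As a starting point for the masking step mentioned above, the sketch below masks two common patterns in model output: resident registration numbers written as six digits, a hyphen, and seven digits, and mobile numbers starting with 01x. The regular expressions are illustrative assumptions and are not exhaustive; landline numbers, unhyphenated IDs, and other identifiers would need additional rules and manual review.

```python
import re

# Illustrative patterns only; real documents need broader coverage and review.
# Digit lookarounds are used instead of \b because Hangul counts as a word character.
RRN_PATTERN = re.compile(r"(?<!\d)(\d{6})-\d{7}(?!\d)")
PHONE_PATTERN = re.compile(r"(?<!\d)(01[016789])-?\d{3,4}-?\d{4}(?!\d)")


def mask_pii(text: str) -> str:
    """Mask resident registration numbers and mobile numbers in model output."""
    # Keep the birth-date part of the ID and hide the remaining seven digits.
    text = RRN_PATTERN.sub(r"\1-*******", text)
    # Keep the mobile prefix and hide the subscriber digits.
    text = PHONE_PATTERN.sub(r"\1-****-****", text)
    return text
```

Keeping the leading part of each number is a design choice that preserves some context for reviewers; replace the whole match if your policy requires full redaction.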

📄 License

Apache-2.0 — see LICENSE or apache.org/licenses/LICENSE-2.0. Base model usage also follows Qwen/Qwen3.5-27B terms.

📚 Citation

@misc{sooryeon2026qwen35ocrsftv1,
  title  = {Qwen3.5-27B OCR SFT v1: Korean Public Documents and English
            Academic Papers with Semantic Structuring},
  author = {Sooryeon},
  year   = {2026},
  url    = {https://huggingface.co/Sooryeon/qwen3.5-27b-ocr-sft-v1}
}

🔗 Links

Base model: Qwen/Qwen3.5-27B
This model: Sooryeon/qwen3.5-27b-ocr-sft-v1

Fine-tuned with care for Korean public-sector and English academic document understanding.
