Qwen3.5-27B OCR SFT v1
Korean Public Documents & English Academic Papers · OCR · Semantic Analysis & Structuring · Fine-tuned Model
This model is not a plain OCR model. It is a Qwen3.5-VL 27B checkpoint fine-tuned to interpret the meaning of document content and re-emit it as structured output (Markdown, hierarchical tables, key-value fields, sectioned blocks). It was trained on a corpus balanced between Korean and English, drawing on Korean public-agency documents from AIHub and English research-paper datasets hosted on HuggingFace.
⚠️ v1: experimental checkpoint. Evaluation metrics and the training recipe will be expanded in later revisions.
🎯 What this model is good at
A generic OCR model "reads the glyphs." This model reads, then understands, then reorganizes.
- Structured output: identifies titles, body text, tables, lists, signature blocks, and stamp regions, and re-emits them as Markdown, HTML tables, or JSON-like structure.
- Semantic analysis: not a raw string dump. Fields are grouped by what they mean: 발신기관 (sender), 문서번호 (document number), 결재선 (approval routing), 수신처 (addressee), 시행일자 (effective date), 붙임 (attachments).
- Korean public-document specificity: handles 공문 (official letter) formats, 관인/직인 (official seal) regions, agency-specific notation (○○시장, 붙임, 수신자 참조), and hierarchical legal numbering (제1조–제2항–제3호, i.e. Article, Paragraph, Item).
- English academic papers: abstract / section / figure-caption segmentation, citation-friendly reading order, and math-adjacent tables.
- Complex tables: merged cells, multi-row headers, empty cells, mixed units, and footnoted tables are reconstructed as semantic units.
- Long context: up to 262K tokens, so multi-page documents (tens of pages) can be processed in a single call.
📌 Specifications
| Base model | Qwen/Qwen3.5-27B |
| Architecture | Qwen3_5ForConditionalGeneration (Hybrid Linear + Full Attention) |
| Parameters | ≈ 27B · merged full weights (no adapter) |
| Precision | bfloat16 |
| Context length | 262,144 tokens |
| Vocab size | 248,320 |
| Vision patch / merge | 16 / 2 (Qwen3VLProcessor) |
| MTP module | 1 layer (model-mtp.safetensors, optional serving) |
| Fine-tune type | Full-parameter SFT, merged checkpoint |
| Languages | Korean ↔ English (balanced) |
| License | Apache-2.0 |
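To give a feel for how the vision patch / merge values relate to context usage, here is a minimal sketch that estimates the number of visual tokens for one page image. It assumes each visual token covers a square block of (patch × merge) pixels per side and ignores any resizing or special tokens the processor may add, so treat the result as a rough budget, not an exact count. The function name `estimate_visual_tokens` is hypothetical and not part of any library.

```python
import math

PATCH_SIZE = 16  # vision patch size, taken from the specifications table
MERGE_SIZE = 2   # spatial merge factor, taken from the specifications table


def estimate_visual_tokens(width: int, height: int) -> int:
    """Rough visual-token estimate for one image.

    Assumption: one token per (PATCH_SIZE * MERGE_SIZE)-pixel square block,
    with each side rounded up. Real processors may resize the image first.
    """
    block = PATCH_SIZE * MERGE_SIZE  # 32 pixels per merged-token side
    return math.ceil(width / block) * math.ceil(height / block)


# A 1792 x 2560 page: 56 blocks wide, 80 blocks tall.
print(estimate_visual_tokens(1792, 2560))  # → 4480
```

Under these assumptions, a page scanned at 1,792 × 2,560 px costs a few thousand tokens, which helps when deciding how many pages fit under a given `--max-model-len`.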
📚 Training Data
A document-centric corpus balanced between Korean and English:
- AIHub Korean public-sector document datasets: 공문 (official letters), 고시 (public notices), 공고 (announcements), 신청서 (application forms), 증명서 (certificates), 결재 문서 (approval documents), 회의록 (meeting minutes), and assorted administrative forms.
- HuggingFace English academic paper datasets: abstract, figure, table, and bibliography-style layouts.
Tasks covered during SFT: OCR ground-truth transcription, Markdown/HTML structuring, field extraction, and short semantic summaries.
🚀 Quick Start
1) Transformers
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_id = "Sooryeon/qwen3.5-27b-ocr-sft-v1"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype="bfloat16", device_map="auto"
)

# Load one page image and wrap it in a single-turn chat message.
image = Image.open("document.png").convert("RGB")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text":
            "Convert this document to Markdown, preserving layout. "
            "Render tables as HTML <table> with rowspan/colspan. "
            "Emit metadata fields (doc number, date, sender, recipient) as a separate block."
        },
    ],
}]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

# Generate, then decode only the new tokens (everything after the prompt).
out = model.generate(**inputs, max_new_tokens=8192)
print(processor.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
2) vLLM serving
vllm serve Sooryeon/qwen3.5-27b-ocr-sft-v1 \
--dtype bfloat16 \
--max-model-len 32768 \
--gpu-memory-utilization 0.90 \
--trust-remote-code
This starts an OpenAI-compatible multimodal endpoint. Send images as image_url content parts.
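As an illustration of the image_url content part, the sketch below builds a chat-completions request body with a local image embedded as a base64 data URL. It uses only the Python standard library. The helper name `build_ocr_request` is hypothetical, and the assumption that the server accepts `data:` URLs in `image_url` (and listens on `http://localhost:8000/v1/chat/completions`) should be checked against your vLLM version and launch flags.

```python
import base64
import json


def build_ocr_request(image_bytes: bytes, prompt: str,
                      model: str = "Sooryeon/qwen3.5-27b-ocr-sft-v1",
                      mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with one image and one text part.

    Assumption: the serving engine accepts base64 data URLs in image_url.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime};base64,{encoded}"
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": data_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        # Greedy decoding; a choice made here for transcription fidelity.
        "temperature": 0.0,
    }


# Placeholder bytes stand in for a real file read with open(path, "rb").
payload = build_ocr_request(b"placeholder", "Convert this document to Markdown.")
body = json.dumps(payload)  # POST this string as application/json
```

The resulting `body` can be sent with any HTTP client to the server's chat-completions route.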
💡 Recommended Prompt Patterns
Different instructions unlock different layers of output from the same model.
① Layout-faithful Markdown (document structure restoration)
Convert this document to Markdown preserving the original layout.
- Heading hierarchy as #/##/###
- Tables as HTML <table> including rowspan/colspan
- Footnotes, attachments, signature blocks as separate sections
② Field extraction (JSON)
Extract the following fields from this document:
sender_org, recipient, doc_number, effective_date, title, author,
approval_status, attachments
Respond as JSON only.
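Even with a "JSON only" instruction, a response may arrive wrapped in extra text or a Markdown code fence. The following sketch shows one way to pull out the JSON object and normalize it to the field list in the prompt. `parse_fields` and the sample response string are hypothetical illustrations, not output from this model.

```python
import json
import re

# Field names copied from the extraction prompt.
EXPECTED_FIELDS = [
    "sender_org", "recipient", "doc_number", "effective_date",
    "title", "author", "approval_status", "attachments",
]


def parse_fields(response_text: str) -> dict:
    """Extract the outermost JSON object from a model response.

    Missing fields are returned as None so downstream code sees a fixed schema.
    """
    # Grab everything from the first "{" to the last "}" to skip any wrapper text.
    match = re.search(r"\{.*\}", response_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    data = json.loads(match.group(0))
    return {key: data.get(key) for key in EXPECTED_FIELDS}


# Hypothetical response text, for illustration only.
sample = 'Result:\n{"title": "Notice", "doc_number": "A-1", "attachments": []}'
fields = parse_fields(sample)
print(fields["title"], fields["sender_org"])
```

Returning a fixed schema makes it easy to spot fields the model skipped, which is worth checking before any official use.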
③ Semantic summary + structure
Summarize the document in 3 sentences, then list the key items as
hierarchical bullet points grouped by semantic role.
⚙️ Serving Tips
- Context length: short-to-medium documents are in-distribution. For very long documents, page-chunked serving is more stable (tune --max-model-len per use case).
- Image resolution: the image processor accepts up to longest_edge = 16,777,216 pixels, but the practical VRAM-friendly range is a longest side of 1,792–2,560 px.
- Sampling: for structured output, use temperature=0.2–0.4; when number and table fidelity matter most, use temperature=0.0.
- MTP: model-mtp.safetensors contains Multi-Token Prediction weights. Enable it only on engines that support MTP; HF transformers inference does not require it.
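The resolution and sampling tips can be sketched as two small helpers. This is a minimal illustration under stated assumptions: the default cap of 2,560 px is the upper end of the recommended range, 0.3 is an arbitrary midpoint of the 0.2–0.4 band, and both function names are hypothetical.

```python
from typing import Tuple


def fit_longest_side(width: int, height: int, max_side: int = 2560) -> Tuple[int, int]:
    """Return a size whose longest side is at most max_side, keeping aspect ratio.

    Images already within the limit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def pick_temperature(fidelity_first: bool) -> float:
    """0.0 when numbers and tables must be exact, otherwise a value in 0.2-0.4."""
    return 0.0 if fidelity_first else 0.3


print(fit_longest_side(5120, 3000))  # → (2560, 1500)
print(pick_temperature(True))        # → 0.0
```

With Pillow, the computed size can be passed to `image.resize(fit_longest_side(*image.size))` before the image is sent to the model.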
📉 Limitations
- v1 experimental checkpoint: natural images, handwriting, equations, and other domains outside the training mix show higher variance.
- Potential hallucination: semantic summarization and structuring can embellish the source. Cross-check against the original document for any legal or official use.
- PII awareness: public documents may contain Korean resident registration numbers, phone numbers, and similar data. Apply masking in your downstream pipeline.
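A downstream masking step might look like the sketch below. The two regular expressions are illustrative assumptions covering only the common written forms (a 6-digit, hyphen, 7-digit resident registration number and hyphenated phone numbers starting with 0); they are not an exhaustive PII detector, and `mask_pii` is a hypothetical helper name.

```python
import re

# Illustrative patterns only (assumptions, not a complete PII detector).
# Digit lookarounds are used instead of \b because Hangul counts as a word
# character, so \b would miss numbers glued to Korean text.
RRN_PATTERN = re.compile(r"(?<!\d)(\d{6})-(\d{7})(?!\d)")
PHONE_PATTERN = re.compile(r"(?<!\d)(0\d{1,2})-(\d{3,4})-(\d{4})(?!\d)")


def mask_pii(text: str) -> str:
    """Mask resident registration numbers and phone numbers in OCR output."""
    # Keep the birth-date part, hide the 7-digit serial part.
    text = RRN_PATTERN.sub(lambda m: f"{m.group(1)}-*******", text)
    # Keep the area/carrier prefix and last four digits, hide the middle block.
    text = PHONE_PATTERN.sub(lambda m: f"{m.group(1)}-****-{m.group(3)}", text)
    return text


print(mask_pii("ID 900101-1234567 tel 010-1234-5678"))
# → ID 900101-******* tel 010-****-5678
```

How much of each number to keep is a policy decision; stricter pipelines may prefer to replace the whole match.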
📄 License
Apache-2.0 — see LICENSE or apache.org/licenses/LICENSE-2.0.
Base model usage also follows Qwen/Qwen3.5-27B terms.
📚 Citation
@misc{sooryeon2026qwen35ocrsftv1,
title = {Qwen3.5-27B OCR SFT v1: Korean Public Documents and English
Academic Papers with Semantic Structuring},
author = {Sooryeon},
year = {2026},
url = {https://huggingface.co/Sooryeon/qwen3.5-27b-ocr-sft-v1}
}
🔗 Links
- Base model: Qwen/Qwen3.5-27B
- This model: Sooryeon/qwen3.5-27b-ocr-sft-v1
Fine-tuned with care for Korean public-sector and English academic document understanding.