---
license: apache-2.0
library_name: transformers
tags:
- multimodal
- vision-language
- code-generation
- tikz
- geometric-reasoning
- computer-vision
- cvpr2026
- internvl
- internlm2
- instruction-tuning
datasets:
- SJY-1995/GeoTikz-Base
- SJY-1995/GeoTikz-Instruct
model-index:
- name: GeoTikzBridge-Instruct-8B
  results:
  - task:
      type: image-to-text
      name: Instruction-Guided Geometric Code Generation
    dataset:
      name: GeoTikz-Instruct
      type: SJY-1995/GeoTikz-Instruct
    metrics:
    - type: CLIP-S
      value: 99.2
      name: CLIP-S
---

# GeoTikzBridge-Instruct-8B

## Model Overview

GeoTikzBridge-Instruct-8B is the instruction-tuned variant of the GeoTikzBridge series, proposed in the **CVPR 2026** accepted paper *GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning*. Built on GeoTikzBridge-Base-8B, this model is further fine-tuned on the 419k-scale GeoTikz-Instruct dataset, enabling strong instruction-following capabilities for geometric tasks.

Beyond basic image-to-TikZ conversion, it supports instruction-guided auxiliary line generation, interactive geometric modification, and step-by-step geometric reasoning, making it a powerful tool for educational and research scenarios requiring interactive geometric manipulation.

## Model Details

### Core Architecture

- Backbone Foundation: Initialized from GeoTikzBridge-Base-8B (InternVL2 8B with InternLM2 7B backbone), inheriting its strong geometric perception and code generation capabilities.
- Parameter Scale: 8 billion parameters, maintaining efficient inference while supporting complex instruction understanding.
- Instruction Tuning: Fine-tuned on a diverse set of geometric instruction-response pairs, enabling the model to understand and execute natural language instructions for geometric figure manipulation.

### Core Capabilities

1. **Instruction-Guided TikZ Generation**: Generates TikZ code based on natural language instructions (e.g., "Draw a right triangle with a height of 5cm and label the right angle").
2. **Auxiliary Line Generation**: Adds auxiliary lines (e.g., perpendicular bisectors, angle bisectors, medians) to existing geometric figures as instructed, supporting geometric problem-solving.
3. **Interactive Geometric Modification**: Modifies existing geometric figures (e.g., resizing, rotating, adding/removing elements) according to user instructions.
4. **Basic Geometric Reasoning**: Provides step-by-step geometric reasoning processes (in text form) alongside TikZ code generation for simple geometric problems.

## Intended Use & Limitations

### Intended Use Cases

- Core Scenarios: Interactive geometric teaching aids, step-by-step geometry problem-solving assistance, dynamic geometric illustration generation for educational materials, and research on geometric reasoning with multimodal models.
- Research Purposes: Serves as a baseline for instruction-tuned multimodal code generation and geometric reasoning research.
- Downstream Expansion: Can be integrated into educational platforms or geometric drawing tools to provide intelligent, interactive support.

### Out-of-Scope Use Cases

- Non-geometric image manipulation or code generation.
- High-precision engineering drawing generation requiring professional CAD software.
- Solving advanced mathematical proofs or complex geometric problems beyond the scope of plane geometry.

### Model Limitations

- The model primarily understands and executes instructions in English; instructions in other languages may lead to suboptimal results.
- While it can generate reasoning text for simple problems, it is not a substitute for professional mathematical proof systems.
- Complex or ambiguous instructions may require multiple rounds of clarification to achieve the desired result.
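
### Rendering Generated TikZ (Sketch)

Because every capability listed above ultimately produces TikZ source, downstream tools usually need to separate that source from any accompanying reasoning text and wrap it in a compilable document. The following is a minimal sketch of that post-processing step, not part of the GeoTikzBridge codebase. It assumes the model's reply contains one or more `tikzpicture` environments, possibly mixed with prose; the actual output format may differ. The helper names `extract_tikz` and `wrap_standalone` are hypothetical.

```python
import re

# Assumption: the reply contains tikzpicture environments, possibly surrounded
# by reasoning text. The model's real output format may differ.
TIKZ_ENV = re.compile(r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}", re.DOTALL)


def extract_tikz(reply: str) -> list[str]:
    """Return every tikzpicture environment found in the reply, in order."""
    return TIKZ_ENV.findall(reply)


def wrap_standalone(tikz: str) -> str:
    """Wrap one tikzpicture in a minimal LaTeX document using the standalone class."""
    return (
        "\\documentclass[tikz,border=2pt]{standalone}\n"
        "\\begin{document}\n"
        f"{tikz}\n"
        "\\end{document}\n"
    )


# Hand-written stand-in for a model reply, used only to exercise the helpers.
reply = (
    "Step 1: draw the triangle.\n"
    "\\begin{tikzpicture}\n"
    "\\draw (0,0) -- (4,0) -- (0,3) -- cycle;\n"
    "\\end{tikzpicture}\n"
    "Step 2: the right angle is at the origin."
)
pictures = extract_tikz(reply)
document = wrap_standalone(pictures[0])
print(document)
```

The resulting string can be written to a `.tex` file and compiled with a LaTeX engine such as `pdflatex` to check that the generated figure actually renders.
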
## Quick Start

### Environment Setup

Install basic dependencies:

```bash
pip install transformers torch pillow accelerate
```

For full training/inference dependencies, please refer to the official project repository: [GeoTikzBridge GitHub](https://github.com/sjy-1995/GeoTikzBridge/)

### Inference Example

Quickly load the model for instruction-guided geometric code generation:

```python
from transformers import AutoProcessor, AutoModelForCausalLM
import torch
from PIL import Image

# Load model and processor
model_name = "SJY-1995/GeoTikzBridge-Instruct-8B"
processor = AutoProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)

# Load the input geometric image. This is needed when the instruction modifies
# an existing figure; whether it can be omitted when generating a new figure
# from scratch depends on the model's input requirements.
image = Image.open("existing_geometric_figure.png").convert("RGB")

# Build an instruction-guided prompt.
# Illustrative instruction only; replace it with your own. The exact prompt
# template expected by the model is defined by the repository's remote code.
prompt = "Add the perpendicular bisector of side AB to the figure and output the TikZ code."
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

# Generate TikZ code with greedy decoding. temperature and top_p only take
# effect when do_sample=True, so they are not passed here.
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False
    )

# Decode and output the generated result
tikz_code = processor.decode(output[0], skip_special_tokens=True)
print("Generated TikZ Code:\n", tikz_code)
```

## Training Details

### Training Dataset

The model is initialized from GeoTikzBridge-Base-8B and further fine-tuned on the **GeoTikz-Instruct** dataset, which contains approximately 419k high-quality instruction-geometric response pairs.
The dataset covers diverse instruction types, including figure generation, auxiliary line addition, figure modification, and basic reasoning.

Dataset Links:

- GeoTikz-Base: [SJY-1995/GeoTikz-Base](https://huggingface.co/datasets/SJY-1995/GeoTikz-Base)
- GeoTikz-Instruct: [SJY-1995/GeoTikz-Instruct](https://huggingface.co/datasets/SJY-1995/GeoTikz-Instruct)

### Key Training Hyperparameters

(Refer to the paper or official repository for detailed Instruct-version hyperparameters; the following are illustrative.)

| Hyperparameter | Configuration |
|----------------|---------------|
| Global Batch Size | 64 |
| Peak Learning Rate | 2e-7 |
| Training Epochs | 2 |
| Max Sequence Length | 12800 |
| Training Precision | BF16 |

### Training Framework & Scripts

Refer to the official project repository for training details: [GeoTikzBridge GitHub](https://github.com/sjy-1995/GeoTikzBridge/)

## Model Family

| Model Name | Parameter Size | Core Capability | Model Link |
|------------|----------------|-----------------|------------|
| GeoTikzBridge-Base-8B | 8B | Basic geometric image-to-TikZ code generation | [🤗 Hugging Face](https://huggingface.co/SJY-1995/GeoTikzBridge-Base-8B) |
| GeoTikzBridge-Base-38B | 38B | High-precision complex geometric figure TikZ code generation | [🤗 Hugging Face](https://huggingface.co/SJY-1995/GeoTikzBridge-Base-38B) |
| GeoTikzBridge-Instruct-8B | 8B | Instruction following, auxiliary line generation, interactive geometric reasoning | [🤗 Hugging Face](https://huggingface.co/SJY-1995/GeoTikzBridge-Instruct-8B) |

## Citation

If you use this model, related datasets or code in your research or projects, please cite the following paper:

```bibtex
@inproceedings{geotikzbridge,
  title={GeoTikzBridge: Advancing Multimodal Code Generation for Geometric Perception and Reasoning},
  author={Jiayin Sun and Caixia Sun and Boyu Yang and Hailin Li and Xiao Chen and Yi Zhang and Errui Ding and Liang Li and Chao Deng and Junlan Feng},
  booktitle={2026 IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2026}
}
```