QASem Hebrew LoRA Adapter (DictaLM 2.0)

This repository provides a LoRA adapter for Hebrew QA-based semantic parsing (QASem).

Overview

QASem represents predicate–argument structure using natural-language question–answer pairs, rather than predefined semantic role labels. This makes the representation more interpretable and flexible across languages.

The adapter is built on top of the base model dicta-il/dictalm2.0-instruct and was trained with LoRA, a parameter-efficient fine-tuning method.

✨ Why this model matters

Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.

This model takes a different approach by:

Representing semantics using natural-language question–answer pairs
Enabling automatic dataset construction via cross-lingual projection
Supporting scalable semantic parsing across languages
Achieving strong performance with efficient fine-tuned models

This makes it possible to build semantic parsers for new languages without costly expert annotation.

Use Cases

This model can be used for:

Research in QA-based semantic parsing (QASem) and semantic representation learning
Extraction of predicate–argument structures from Hebrew text
Automatic dataset creation for training semantic models in new languages
Downstream NLP applications such as:
- Information extraction
- Text understanding
- Factuality and attribution evaluation

Language

Hebrew 🇮🇱

Training Data

The model was trained on the Multilingual QASem Dataset:

👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets

The dataset includes:

Automatically generated QASem annotations
Train / Development / Test splits
Multiple languages: French, Hebrew, Russian
Tens of thousands of QA pairs per language

The data was constructed using a cross-lingual projection approach, so the same construction process can be applied to additional languages.

📄 Associated Work

This model and the underlying dataset are introduced in the paper "Effective QA-Driven Annotation of Predicate–Argument Relations Across Languages" (EACL 2026, Volume 1: Long Papers).

The paper presents the full methodology, dataset construction process, and evaluation across multiple languages.

🚀 Quick Start (Recommended)

Using the XQASem Parser

For a simple and structured interface, you can use the XQASem parser.

Installation

pip install xqasem

Basic Example

from xqasem import XQasemParser

parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

df = parser(sentences)

print(df)

Output Format

The model produces structured predicate–argument representations in the form of:

A predicate (verb or nominal)
A natural-language question
A corresponding answer span from the sentence

This structure can be easily converted into tabular or JSON format for downstream use.

Example Output

sentence	predicate	predicate_type	question	answer
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.	הדגישו	verb	מי הדגיש משהו?	המומחים
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.	הדגישו	verb	מה מישהו הדגיש?	שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.	מאיץ	verb	מה מאיץ משהו?	האלגוריתם החדש
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות.	מאיץ	verb	מה משהו מאיץ?	את עיבוד הבקשות המורכבות
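The conversion to JSON mentioned above can be sketched as follows. This is a minimal sketch, not part of the xqasem API: it assumes the rows are available as dictionaries keyed by the column names of the example output (with a pandas DataFrame, `df.to_dict(orient="records")` would produce this shape, assuming the parser returns a DataFrame). The helper `group_by_predicate` and the `char_span` field are illustrative names introduced here.

```python
import json
from collections import defaultdict

SENTENCE = "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."

# Rows copied from the example output; one dictionary per QA pair.
rows = [
    {"sentence": SENTENCE, "predicate": "הדגישו", "predicate_type": "verb",
     "question": "מי הדגיש משהו?", "answer": "המומחים"},
    {"sentence": SENTENCE, "predicate": "הדגישו", "predicate_type": "verb",
     "question": "מה מישהו הדגיש?",
     "answer": "שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות"},
    {"sentence": SENTENCE, "predicate": "מאיץ", "predicate_type": "verb",
     "question": "מה מאיץ משהו?", "answer": "האלגוריתם החדש"},
    {"sentence": SENTENCE, "predicate": "מאיץ", "predicate_type": "verb",
     "question": "מה משהו מאיץ?", "answer": "את עיבוד הבקשות המורכבות"},
]


def group_by_predicate(rows):
    """Nest QA pairs under their predicate and locate each answer in the sentence."""
    grouped = defaultdict(list)
    for row in rows:
        # find() returns -1 when the answer is not a literal substring.
        start = row["sentence"].find(row["answer"])
        span = [start, start + len(row["answer"])] if start >= 0 else None
        key = (row["sentence"], row["predicate"], row["predicate_type"])
        grouped[key].append(
            {"question": row["question"], "answer": row["answer"], "char_span": span}
        )
    return [
        {"sentence": s, "predicate": p, "predicate_type": t, "qa_pairs": qas}
        for (s, p, t), qas in grouped.items()
    ]


records = group_by_predicate(rows)
# ensure_ascii=False keeps the Hebrew text readable in the JSON output.
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Grouping by predicate keeps all questions about one predicate together, which is often more convenient downstream than one flat row per QA pair.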

👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem

Manual Model Loading (Advanced)

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "YonatanDavidov/qasem-he-dictalm2-lora"

# Load the tokenizer, then the base model with the LoRA adapter applied.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoPeftModelForCausalLM.from_pretrained(model_id)
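On memory-constrained hardware it may help to load the weights in reduced precision. The following is a minimal sketch under stated assumptions: it assumes `torch` and `accelerate` are installed alongside `transformers` and `peft`, and relies on `AutoPeftModelForCausalLM.from_pretrained` forwarding `torch_dtype` and `device_map` to the base model loader. The helper name `load_qasem_adapter` is hypothetical, and bfloat16 is a choice made here, not a setting documented for this model.

```python
def load_qasem_adapter(model_id="YonatanDavidov/qasem-he-dictalm2-lora"):
    """Load the tokenizer and the adapter-equipped model in bfloat16 (assumed setting)."""
    # Imports are local so that defining the helper does not require the libraries.
    import torch
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoPeftModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-size weights compared with float32
        device_map="auto",           # let accelerate place layers on available devices
    )
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_qasem_adapter()` downloads the adapter and its base model on first use. The prompt format expected by the adapter is not described here, so for actual parsing the XQASem parser above remains the safer entry point.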

Limitations

Performance may degrade on out-of-domain text
Complex or ambiguous predicates may lead to inconsistent outputs
The model is optimized for QASem-style generation and not for general-purpose text generation

📄 Citation

If you use this model, please cite our work:

@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan  and
      Slobodkin, Aviv  and
      Klein, Shmuel Tomi  and
      Tsarfaty, Reut  and
      Dagan, Ido  and
      Klein, Ayal",
    editor = "Demberg, Vera  and
      Inui, Kentaro  and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}
