QASem Hebrew LoRA Adapter (DictaLM 2.0)

This repository provides a LoRA adapter for Hebrew QA-based semantic parsing (QASem).

Overview

QASem represents predicate–argument structure using natural-language question–answer pairs rather than predefined semantic role labels. This makes the representation more interpretable and easier to apply across languages.

The adapter is built on top of the base model dicta-il/dictalm2.0-instruct and was trained with LoRA, a parameter-efficient fine-tuning method, so only the small adapter weights are distributed here.

✨ Why this model matters

Traditional semantic role labeling methods rely on fixed label schemas and costly expert annotation.

This model takes a different approach by:

  • Representing semantics using natural-language question–answer pairs
  • Enabling automatic dataset construction via cross-lingual projection
  • Supporting scalable semantic parsing across languages
  • Achieving strong performance with efficient fine-tuned models

This makes it possible to build semantic parsers for new languages without costly expert annotation.

Use Cases

This model can be used for:

  • Research in QA-based semantic parsing (QASem) and semantic representation learning
  • Extraction of predicate–argument structures from Hebrew text
  • Automatic dataset creation for training semantic models in new languages
  • Downstream NLP applications such as:
    • Information extraction
    • Text understanding
    • Factuality and attribution evaluation

Language

  • Hebrew 🇮🇱

Training Data

The model was trained on the Multilingual QASem Dataset:

👉 https://huggingface.co/datasets/biu-nlp/Multilingual_QASem_Datasets

The dataset includes:

  • Automatically generated QASem annotations
  • Train / Development / Test splits
  • Multiple languages: French, Hebrew, Russian
  • Tens of thousands of QA pairs per language

The data was constructed with a cross-lingual projection approach, which allows the same pipeline to be applied to additional languages.

📄 Associated Work

This model and the underlying dataset are introduced in: Effective QA-Driven Annotation of Predicate-Argument Relations Across Languages.

The paper presents the full methodology, dataset construction process, and evaluation across multiple languages.

🚀 Quick Start (Recommended)

Using the XQASem Parser

For a simple and structured interface, you can use the XQASem parser.

Installation

pip install xqasem

Basic Example

from xqasem import XQasemParser

parser = XQasemParser.from_language("he")

sentences = [
    "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
]

df = parser(sentences)

print(df)

Output Format

The model produces structured predicate–argument representations in the form of:

  • A predicate (verb or nominal)
  • A natural-language question
  • A corresponding answer span from the sentence

This structure can be easily converted into tabular or JSON format for downstream use.

Example Output

sentence | predicate | predicate_type | question | answer
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מי הדגיש משהו? | המומחים
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | הדגישו | verb | מה מישהו הדגיש? | שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה מאיץ משהו? | האלגוריתם החדש
המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות. | מאיץ | verb | מה משהו מאיץ? | את עיבוד הבקשות המורכבות
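
The conversion to JSON mentioned under Output Format can be sketched as follows. This is a minimal sketch, assuming flat rows with the five columns shown in the example output (sentence, predicate, predicate_type, question, answer). The helper name `group_qa_rows` is hypothetical and is not part of xqasem. If the parser returns a pandas DataFrame, as the name `df` in the basic example suggests, `df.to_dict(orient="records")` should yield rows in this shape.

```python
import json


def group_qa_rows(rows):
    """Group flat QASem rows into one record per (sentence, predicate).

    Hypothetical helper for illustration; not part of the xqasem package.
    Each row is a dict with the keys: sentence, predicate, predicate_type,
    question, answer.
    """
    grouped = {}  # dicts preserve insertion order, so predicates keep sentence order
    for row in rows:
        key = (row["sentence"], row["predicate"], row["predicate_type"])
        record = grouped.setdefault(
            key,
            {
                "sentence": row["sentence"],
                "predicate": row["predicate"],
                "predicate_type": row["predicate_type"],
                "qa_pairs": [],
            },
        )
        record["qa_pairs"].append(
            {"question": row["question"], "answer": row["answer"]}
        )
    return list(grouped.values())


# Rows copied from the example output above.
SENTENCE = "המומחים הדגישו שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות."
rows = [
    {"sentence": SENTENCE, "predicate": "הדגישו", "predicate_type": "verb",
     "question": "מי הדגיש משהו?", "answer": "המומחים"},
    {"sentence": SENTENCE, "predicate": "הדגישו", "predicate_type": "verb",
     "question": "מה מישהו הדגיש?",
     "answer": "שהאלגוריתם החדש מאיץ משמעותית את עיבוד הבקשות המורכבות"},
    {"sentence": SENTENCE, "predicate": "מאיץ", "predicate_type": "verb",
     "question": "מה מאיץ משהו?", "answer": "האלגוריתם החדש"},
    {"sentence": SENTENCE, "predicate": "מאיץ", "predicate_type": "verb",
     "question": "מה משהו מאיץ?", "answer": "את עיבוד הבקשות המורכבות"},
]

records = group_qa_rows(rows)

# ensure_ascii=False keeps the Hebrew text readable in the JSON output.
print(json.dumps(records, ensure_ascii=False, indent=2))
```

Grouping by predicate keeps all questions about one event together, which tends to be more convenient for downstream consumers than one row per QA pair.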

👉 For more details and advanced usage, see the project repository:
https://github.com/JohnnieDavidov/xqasem

Manual Model Loading (Advanced)

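from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model_id = "YonatanDavidov/qasem-he-dictalm2-lora"

# Load the tokenizer from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Reads the adapter config, loads the base model it names, and attaches the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(model_id)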

Limitations

  • Performance may degrade on out-of-domain text
  • Complex or ambiguous predicates may lead to inconsistent outputs
  • The model is optimized for QASem-style generation and not for general-purpose text generation

📄 Citation

If you use this model, please cite our work:

@inproceedings{davidov-etal-2026-effective,
    title = "Effective {QA}-Driven Annotation of Predicate{--}Argument Relations Across Languages",
    author = "Davidov, Jonathan  and
      Slobodkin, Aviv  and
      Klein, Shmuel Tomi  and
      Tsarfaty, Reut  and
      Dagan, Ido  and
      Klein, Ayal",
    editor = "Demberg, Vera  and
      Inui, Kentaro  and
      Marquez, Llu{\'i}s",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.112/",
    doi = "10.18653/v1/2026.eacl-long.112",
    pages = "2484--2502",
    ISBN = "979-8-89176-380-7",
}