---
language:
- ar
pipeline_tag: feature-extraction
---
We trained a [FastText](https://fasttext.cc/)-based Word2Vec model on our dataset with an embedding dimension of 100.  
The model generates vector representations for individual words and their subwords (character n-grams), which allows it to capture both semantic and morphological relationships in the text.  
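
To show how subwords come into play, here is a minimal sketch of how a FastText-style model builds a word vector. It wraps the word in `<` and `>` boundary markers, extracts its character n-grams, and averages the vector of the word itself with the vectors of those n-grams. The n-gram range `minn=3`, `maxn=6` is FastText's default and is an assumption here, since this card does not state the values used for training. The `vectors` dictionary is a hypothetical stand-in for the model's lookup tables. Real FastText hashes n-grams into a fixed number of buckets rather than storing them by string.

```python
import numpy as np


def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of `word` after wrapping it in '<' and '>'.

    minn=3 / maxn=6 are FastText's defaults; the values used for this model are assumed.
    Python slicing works on Unicode code points, so each Arabic letter counts as one character.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def compose_word_vector(word, vectors, dim=100, minn=3, maxn=6):
    """Average the vectors of the word itself and of its character n-grams.

    `vectors` is a hypothetical dict mapping strings to arrays of shape (dim,).
    Keys missing from it are skipped; if nothing is found, a zero vector is returned.
    """
    keys = [word] + char_ngrams(word, minn, maxn)
    found = [np.asarray(vectors[k], dtype=np.float64) for k in keys if k in vectors]
    if not found:
        return np.zeros(dim)
    return np.mean(found, axis=0)
```

Because an unseen word still shares n-grams with words seen during training, this composition can assign it a meaningful vector. That property is useful for a morphologically rich language such as Arabic.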
To obtain sentence-level representations, we compute embeddings for all constituent words and subwords in a given text and average them.  
The resulting sentence embedding is intended to capture the overall meaning of the text; note, however, that plain averaging does not encode word order.
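
The averaging step can be sketched as follows. This is a minimal illustration, not necessarily our exact pipeline. It assumes whitespace tokenization and accepts any `get_word_vector` callable, such as the method of the same name on a model loaded with the `fasttext` Python package. As far as we know, FastText's own `get_sentence_vector` for unsupervised models L2-normalizes each word vector and skips zero-norm vectors before averaging. The `normalize` flag reproduces that behavior, and `normalize=False` gives a plain mean. This card does not specify which variant was used.

```python
import numpy as np


def sentence_vector(text, get_word_vector, dim=100, normalize=True):
    """Average the word vectors of the whitespace-separated tokens in `text`.

    get_word_vector: callable mapping a token (str) to a vector of shape (dim,).
    normalize=True: scale each word vector to unit L2 norm and skip zero vectors,
    mirroring (to our understanding) FastText's unsupervised sentence vectors.
    Returns a zero vector of length `dim` when no token contributes.
    """
    total = np.zeros(dim)
    count = 0
    for token in text.split():
        vec = np.asarray(get_word_vector(token), dtype=np.float64)
        if normalize:
            norm = np.linalg.norm(vec)
            if norm == 0:
                continue  # zero-norm vectors are skipped
            vec = vec / norm
        total += vec
        count += 1
    return total / count if count else total
```

With the `fasttext` package, usage would look like `model = fasttext.load_model("model.bin")` followed by `sentence_vector(text, model.get_word_vector)`, where `"model.bin"` is a hypothetical file name. Because the result is a simple mean, reordering the words of a sentence yields the same embedding.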