language:
  - ar
pipeline_tag: feature-extraction

We trained a FastText-based Word2Vec model on our dataset with an embedding size of 100 dimensions.
This model is designed to generate vector representations for individual words and sub-words, allowing it to effectively capture semantic and morphological relationships within the text.
To obtain representations at the sentence level, we compute embeddings for all constituent words and sub-words in a given text and then apply averaging.
This approach yields a sentence embedding that captures the overall meaning of the text while preserving contextual nuances.