Omartificial-Intelligence-Space's Collections

Huggingface FineWeb2 Arabic Dataset Portions

updated Nov 28, 2025

A collection of Arabic text datasets sourced from the FineWeb2 project, covering Modern Standard Arabic (MSA) and several Arabic dialects.

  • HuggingFaceFW/fineweb-2

    Viewer • Updated Oct 27, 2025 • 4.48B • 55.7k • 806

    Note: This is the original FineWeb2 repository, which includes 1000s of languages. Find the Arabic portions below.


  • Omartificial-Intelligence-Space/FineWeb2-MSA

    Viewer • Updated Dec 15, 2024 • 907M • 123 • 2

  • Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic

    Viewer • Updated Dec 12, 2024 • 23.9M • 92 • 2

  • Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic

    Viewer • Updated Dec 12, 2024 • 69.6M • 335 • 3

  • Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic

    Viewer • Updated Dec 12, 2024 • 223k • 43 • 2

  • Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic

    Viewer • Updated Dec 12, 2024 • 48.4M • 68 • 3
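
The portions listed above can be addressed by their repository IDs. The following is a minimal sketch, not an official recipe: the short keys (`msa`, `najdi`, and so on) and the helper names are hypothetical labels introduced here, and the `"train"` split name is an assumption that should be checked against each dataset card. Streaming relies on the `datasets` library's `load_dataset` and needs network access.

```python
from itertools import islice

# Repository IDs copied from the collection listing above.
# The short keys are hypothetical labels chosen for this sketch.
ARABIC_PORTIONS = {
    "msa": "Omartificial-Intelligence-Space/FineWeb2-MSA",
    "egyptian": "Omartificial-Intelligence-Space/FineWeb2-Egyptian-Arabic",
    "moroccan": "Omartificial-Intelligence-Space/FineWeb2-Moroccan-Arabic",
    "north_levantine": "Omartificial-Intelligence-Space/FineWeb2-North-Levantine-Arabic",
    "najdi": "Omartificial-Intelligence-Space/FineWeb2-Najdi-Arabic",
}


def dataset_url(key: str) -> str:
    """Return the Hub page URL for one portion."""
    return f"https://huggingface.co/datasets/{ARABIC_PORTIONS[key]}"


def stream_portion(key: str, split: str = "train"):
    """Stream one portion without downloading it in full.

    Requires `pip install datasets` and network access.
    The default split name "train" is an assumption.
    """
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(ARABIC_PORTIONS[key], split=split, streaming=True)


def peek(key: str, n: int = 3) -> list:
    """Return the first n records of a portion as a list."""
    return list(islice(stream_portion(key), n))
```

For example, `peek("north_levantine")` would fetch a few records from the smallest portion listed, which is a cheap way to inspect the record fields before streaming the larger MSA portion.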