Open LLM Leaderboard
Track, rank and evaluate open LLMs and chatbots
Embedding Leaderboard
Compare LLM hardware performance and find the best model
Explore and compare speech recognition model benchmarks
Explore and compare code model performance on a leaderboard
View the LMArena leaderboard in full-screen
Request evaluation for a new model
Display leaderboard of language models
Submit model evaluation results to leaderboard
Serve a web page from a Flask server
Browse and compare AI model evaluations
View and submit LLM evaluations
Submit model evaluations and view the leaderboard
Compare model answers to questions in French
Explore and filter LLM benchmark results
Submit video model evaluation results to a public benchmark
Launch a Streamlit web app interface
Evaluate LLMs' cybersecurity risks and capabilities
Submit and evaluate models for contextual understanding tasks
Search for model performance across languages and benchmarks
Explore and submit LLM benchmarks
VLMEvalKit Evaluation Results Collection
Explore and compare model scores on RewardBench benchmarks
Jailbreak the LLM and privacy guardrails
Filter data on contamination in datasets and models
Track, rank and evaluate open Arabic LLMs and chatbots
Explore and compare QA and long doc benchmarks
Submit and evaluate model results on MM-UPD benchmarks
Explore code-generation model leaderboards and task details
Evaluate open LLMs in the languages of LATAM and Spain
Explore and compare LLM performance on financial benchmarks