!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-4M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 8 \
  --output_path results/

gives:

Tasks	Version	Filter	Metric		Value		Stderr
blimp	2	none	acc	↑	0.6287	±	0.0016
arc_easy	1	none	acc	↑	0.2736	±	0.0091
wikitext		none	byte_perplexity	↓	4.6778	±	N/A
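
For anyone copying these rows into the leaderboard by hand, here is a minimal sketch of parsing them. It assumes the rows are tab-separated exactly as pasted above, including the empty Version cell for wikitext. `parse_row` is a hypothetical helper for illustration, not part of lm_eval.

```python
def parse_row(line):
    """Split one tab-separated result row into a dict.

    Assumed column layout (an assumption based on the pasted table):
    Tasks, Version, Filter, Metric, arrow, Value, "±", Stderr.
    """
    task, version, filt, metric, arrow, value, _pm, stderr = line.split("\t")
    return {
        "task": task,
        "version": version or None,          # wikitext has an empty Version cell
        "filter": filt,
        "metric": metric,
        "higher_is_better": arrow == "↑",
        "value": float(value),
        "stderr": None if stderr.strip() == "N/A" else float(stderr),
    }


# Rows copied from the CinnabarLM-4M-Base table above.
rows = [
    "blimp\t2\tnone\tacc\t↑\t0.6287\t±\t0.0016",
    "arc_easy\t1\tnone\tacc\t↑\t0.2736\t±\t0.0091",
    "wikitext\t\tnone\tbyte_perplexity\t↓\t4.6778\t±\tN/A",
]

parsed = [parse_row(r) for r in rows]
for record in parsed:
    print(record["task"], record["metric"], record["value"], record["stderr"])
```

The `higher_is_better` flag comes straight from the arrow column, so a leaderboard sort can flip direction for perplexity-style metrics.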

CompactAI

GlintResearch org 25 days ago

I need CinnabarLM 1.4M and CinnabarLM 1.5M now

MihaiPopa-1

25 days ago

> I need CinnabarLM 1.4M and CinnabarLM 1.5M now

We're waiting for the numbers as well!

MihaiPopa-1

25 days ago

•

edited 25 days ago

Here are the results for CinnabarLM 1.5M as well!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-1.5M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 256 \
  --output_path results/

gives:

Tasks	Version	Filter	Metric		Value		Stderr
blimp	2	none	acc	↑	0.6051	±	0.0017
arc_easy	1	none	acc	↑	0.2668	±	0.0091
wikitext		none	byte_perplexity	↓	5.0949	±	N/A

MihaiPopa-1

25 days ago

CinnabarLM 1.4M results!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-1.4M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 256 \
  --output_path results/

gives:

Tasks	Version	Filter	Metric		Value		Stderr
blimp	2	none	acc	↑	0.6070	±	0.0017
arc_easy	1	none	acc	↑	0.2458	±	0.0088
wikitext		none	byte_perplexity	↓	4.9808	±	N/A
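
As a rough sanity check on the accuracy numbers posted so far, the sketch below computes how many reported standard errors each score sits above chance. The chance levels are assumptions, not something lm_eval prints: 0.50 for blimp (minimal pairs) and 0.25 for arc_easy (treated as four-option multiple choice). `z_vs_chance` is a hypothetical helper.

```python
# Assumed chance-level accuracy per task (an assumption, see above).
CHANCE = {"blimp": 0.50, "arc_easy": 0.25}

# (accuracy, stderr) pairs copied from the tables in this thread.
RESULTS = {
    "CinnabarLM-4M-Base": {"blimp": (0.6287, 0.0016), "arc_easy": (0.2736, 0.0091)},
    "CinnabarLM-1.5M-Base": {"blimp": (0.6051, 0.0017), "arc_easy": (0.2668, 0.0091)},
    "CinnabarLM-1.4M-Base": {"blimp": (0.6070, 0.0017), "arc_easy": (0.2458, 0.0088)},
}


def z_vs_chance(acc, stderr, chance):
    """Number of standard errors by which `acc` exceeds `chance`."""
    return (acc - chance) / stderr


for model, tasks in RESULTS.items():
    for task, (acc, stderr) in tasks.items():
        z = z_vs_chance(acc, stderr, CHANCE[task])
        print(f"{model:22s} {task:9s} acc={acc:.4f} z={z:+.1f}")
```

Under these assumptions the blimp scores are far above chance, while the arc_easy scores are within a few standard errors of 0.25.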

We can conclude you can merge (but clean it up!)

MihaiPopa-1

25 days ago

So, ALL DONE! You can fix and merge it!

CompactAI

GlintResearch org 25 days ago

thx
I'm not going to merge this because I'm unsure how to edit PRs lol
but there's going to be an update within like 5 mins

CompactAI

GlintResearch org 25 days ago

you currently lead on the wikitext perplexity score
did you train on wikitext lol

CompactAI changed pull request status to closed 25 days ago

CompactAI

GlintResearch org 25 days ago

it's updated now

CompactAI deleted the refs/pr/5 ref 25 days ago

MihaiPopa-1

25 days ago

No, I didn't train on Wikitext, I trained on FineWeb.

Datdanboi25

25 days ago

Haha almost looks too good to be true

MihaiPopa-1

25 days ago

•

edited 25 days ago

Next LLM to add will be PotentSulfurLM 500K (https://huggingface.co/MihaiPopa-1/PotentSulfurLM-500K-Base)!

It's an LLM with 587K parameters. Trained on ~200M tokens, ~4x the amount used to train CinnabarLM 1.5M!
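
To put those figures in perspective, here is a quick back-of-the-envelope calculation. It only uses the approximate numbers stated above (587K parameters, ~200M tokens, ~4x), so the outputs are rough estimates, and the CinnabarLM 1.5M token count is merely what the "~4x" implies.

```python
params = 587_000          # PotentSulfurLM parameter count, as stated above
tokens = 200_000_000      # approximate training tokens, as stated above
ratio_vs_cinnabar = 4     # "~4x" from the comment above

# Training tokens seen per model parameter.
tokens_per_param = tokens / params

# Token count for CinnabarLM 1.5M implied by the "~4x" figure (an estimate).
implied_cinnabar_tokens = tokens / ratio_vs_cinnabar

print(f"tokens per parameter: ~{tokens_per_param:.0f}")
print(f"implied CinnabarLM 1.5M tokens: ~{implied_cinnabar_tokens / 1e6:.0f}M")
```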

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/PotentSulfurLM-500K-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 1024 \
  --output_path results/

gives:

Tasks	Version	Filter	Metric		Value		Stderr
blimp	2	none	acc	↑	0.5901	±	0.0017
arc_easy	1	none	acc	↑	0.2706	±	0.0091
wikitext	2	none	bits_per_byte	↓	2.6131	±	N/A
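
Note that this table reports `bits_per_byte` for wikitext, while the earlier tables report `byte_perplexity`. As far as I understand the harness, the two are related by byte_perplexity = 2^bits_per_byte, so they can be converted for comparison. A minimal sketch under that assumption (the helper names are hypothetical):

```python
import math


def bpb_to_byte_perplexity(bits_per_byte):
    """Convert bits per byte to byte-level perplexity (assumes ppl = 2**bpb)."""
    return 2.0 ** bits_per_byte


def byte_perplexity_to_bpb(byte_perplexity):
    """Inverse conversion: bits per byte is log2 of byte-level perplexity."""
    return math.log2(byte_perplexity)


# PotentSulfurLM 500K, from the table above (roughly 6.12 as byte perplexity).
print(f"{bpb_to_byte_perplexity(2.6131):.4f}")

# CinnabarLM byte perplexities from earlier in the thread, as bits per byte.
for name, ppl in [("4M", 4.6778), ("1.5M", 5.0949), ("1.4M", 4.9808)]:
    print(f"CinnabarLM {name}: {byte_perplexity_to_bpb(ppl):.4f} bits/byte")
```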

MRiabov

25 days ago

Guys, just curious: did you try RL training it? I can't help thinking it would be worth training on verifiable rewards (though yes, perplexity might still be too high for that).

CompactAI unpinned discussion 11 days ago
