Not sure if it looks good, but we'll see!

#5
No description provided.
GlintResearch org

gimme like 5s to review

OK, we'll give some time to review!

GlintResearch org

Is it possible for you to include any benchmark data?

I don't have any! Maybe I will try to include benchmark data, but I'll need to test the models first! What tool and script did you use?

GlintResearch org

How to evaluate my models?

GlintResearch org

Should explain that in the readme :P
it has plenty of examples

CompactAI pinned discussion

Ok, we wait for the numbers!

Here are the results for CinnabarLM 4M!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-4M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 8 \
  --output_path results/

gives:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| blimp | 2 | none | 0 | acc | 0.6287 | ± 0.0016 |
| arc_easy | 1 | none | 0 | acc | 0.2736 | ± 0.0091 |
| wikitext | | none | 0 | byte_perplexity | 4.6778 | ± N/A |

GlintResearch org

I need CinnabarLM 1.4M and CinnabarLM 1.5M now

We wait for the numbers as well!

Here are the results for CinnabarLM 1.5M as well!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-1.5M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 256 \
  --output_path results/

gives:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| blimp | 2 | none | 0 | acc | 0.6051 | ± 0.0017 |
| arc_easy | 1 | none | 0 | acc | 0.2668 | ± 0.0091 |
| wikitext | | none | 0 | byte_perplexity | 5.0949 | ± N/A |

CinnabarLM 1.4M results!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/CinnabarLM-1.4M-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 256 \
  --output_path results/

gives:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| blimp | 2 | none | 0 | acc | 0.6070 | ± 0.0017 |
| arc_easy | 1 | none | 0 | acc | 0.2458 | ± 0.0088 |
| wikitext | | none | 0 | byte_perplexity | 4.9808 | ± N/A |
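
For anyone who wants to compare the three CinnabarLM runs side by side, here is a minimal sketch. The numbers are copied from the tables above; the dictionary layout and the `best` helper are just illustrative names, not anything from lm_eval itself.

```python
# Zero-shot scores copied from the three result tables in this thread.
results = {
    "CinnabarLM-4M-Base": {
        "blimp_acc": 0.6287, "arc_easy_acc": 0.2736, "wikitext_byte_ppl": 4.6778,
    },
    "CinnabarLM-1.5M-Base": {
        "blimp_acc": 0.6051, "arc_easy_acc": 0.2668, "wikitext_byte_ppl": 5.0949,
    },
    "CinnabarLM-1.4M-Base": {
        "blimp_acc": 0.6070, "arc_easy_acc": 0.2458, "wikitext_byte_ppl": 4.9808,
    },
}


def best(metric: str, higher_is_better: bool = True) -> str:
    """Return the model name with the best score for one metric."""
    pick = max if higher_is_better else min
    return pick(results, key=lambda name: results[name][metric])


# Print a small comparison table.
print(f"{'model':<22}{'blimp':>8}{'arc_easy':>10}{'byte_ppl':>10}")
for name, scores in results.items():
    print(
        f"{name:<22}{scores['blimp_acc']:>8.4f}"
        f"{scores['arc_easy_acc']:>10.4f}{scores['wikitext_byte_ppl']:>10.4f}"
    )

# Accuracy: higher is better. Perplexity: lower is better.
print("best blimp:   ", best("blimp_acc"))
print("best arc_easy:", best("arc_easy_acc"))
print("best wikitext:", best("wikitext_byte_ppl", higher_is_better=False))
```

Note that perplexity is ranked with `higher_is_better=False`, since a lower value is the better one there.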

We can conclude you can merge (but clean it up!)

So, ALL DONE! You can fix and merge it!

GlintResearch org

thx
I'm not going to merge this because I'm unsure how to edit PRs lol
but there's going to be an update within like 5 mins

GlintResearch org

you currently lead on wikitext perplexity
did you train on wikitext lol

CompactAI changed pull request status to closed
GlintResearch org

it's updated now

CompactAI deleted the refs/pr/5 ref

No, I didn't train on Wikitext, I trained on FineWeb.

Haha almost looks too good to be true

Next LLM to add will be PotentSulfurLM 500K (https://huggingface.co/MihaiPopa-1/PotentSulfurLM-500K-Base)!

It's an LLM with 587K parameters, trained on ~200M tokens, ~4x the amount used to train CinnabarLM 1.5M!

!lm_eval --model hf \
  --model_args pretrained=MihaiPopa-1/PotentSulfurLM-500K-Base \
  --tasks blimp,arc_easy,wikitext \
  --device cuda \
  --batch_size 1024 \
  --output_path results/

gives:

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| blimp | 2 | none | 0 | acc | 0.5901 | ± 0.0017 |
| arc_easy | 1 | none | 0 | acc | 0.2706 | ± 0.0091 |
| wikitext | 2 | none | 0 | bits_per_byte | 2.6131 | ± N/A |
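
Heads up: this table reports `bits_per_byte`, while the CinnabarLM tables report `byte_perplexity`, so the wikitext numbers are not directly comparable. As far as I understand the harness, the two are related by byte_perplexity = 2^bits_per_byte. Here is a small sketch of the conversion under that assumption (the function names are made up for illustration). By that relation, 2.6131 bits per byte works out to a byte perplexity of roughly 6.1.

```python
import math


def bits_per_byte_from_byte_perplexity(byte_ppl: float) -> float:
    """Convert byte-level perplexity to bits per byte (log base 2)."""
    return math.log2(byte_ppl)


def byte_perplexity_from_bits_per_byte(bpb: float) -> float:
    """Convert bits per byte back to byte-level perplexity."""
    return 2.0 ** bpb


# PotentSulfurLM 500K was reported in bits_per_byte.
sulfur_byte_ppl = byte_perplexity_from_bits_per_byte(2.6131)
print(f"PotentSulfurLM-500K byte_perplexity ~ {sulfur_byte_ppl:.4f}")

# The CinnabarLM models were reported in byte_perplexity.
for name, ppl in [
    ("CinnabarLM-4M", 4.6778),
    ("CinnabarLM-1.5M", 5.0949),
    ("CinnabarLM-1.4M", 4.9808),
]:
    print(f"{name} bits_per_byte ~ {bits_per_byte_from_byte_perplexity(ppl):.4f}")
```

Once everything is in the same unit, the four models can be ranked on wikitext directly.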

Guys, just curious: did you try RL training on it? I can't help wanting to see it trained on verifiable rewards (though yes, the perplexity might still be too high).

CompactAI unpinned discussion
