sha-index

Recent Activity

FlameF0X submitted a paper 19 days ago

Triplet-Block Diffusion RWKV

FlameF0X updated a dataset 2 months ago

SHA-index/model-dna-index

FlameF0X updated a Space 2 months ago

SHA-index/Search-SHA

posted an update 6 days ago

My models on the Intel Low-Bit LLM Leaderboard

Figured I'd share where my quantized models landed on Intel/low_bit_open_llm_leaderboard since I hadn't posted about it yet.

FlameF0X/Qwen3-4B-Distilled-Claude-4.6 (NVFP4 and MXFP4) sit at ranks 23 and 24 with 62.68% and 61.18% average, right below the base Qwen3-4B. Not bad considering they were distilled from Claude 4.6 rather than trained from scratch.

FlameF0X/LFM2.5-1.2B-Distilled-Claude-4.6 and FlameF0X/LFM2.5-1.2B-Thinking-CodeX land around rank 47-49, competitive with MiniCPM5-1B and the Qwen3 sub-1B models despite being a larger base architecture.

The funny ones are FlameF0X/Qwen2-0.2B-pt and FlameF0X/Qwen2-0.2B-it. They're not properly trained (genuinely undertrained, basically undefined), and they still beat openai/gpt-oss-20b at rank 66. The 20B model. Not sure what that says but it's something.

FlameF0X/LFM2-Research is at the bottom of my lineup but it's a research artifact, not meant to be competitive.

Chart below showing my models vs nearby competitors, with size vs performance on the left.

Chart made by Claude

posted an update 14 days ago

MiniMax-M3 coming soon.
https://github.com/MiniMax-AI/MiniMax-M3

submitted a paper to Daily Papers 19 days ago

Triplet-Block Diffusion RWKV

Paper • 2605.25969 • Published 23 days ago • 25

posted an update about 1 month ago

I did some testing on the scalability of FWKV. It hits a speed bottleneck at 1B due to the T4’s memory bandwidth limitations. Theoretically, it would match RWKV’s inference speed if the GPU had more bandwidth, so the speed measured at the 1B size is not accurate.
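
A rough way to see why memory bandwidth can cap decoding speed at larger sizes is the back-of-envelope estimate below. It is only a sketch under stated assumptions: every weight is read once per generated token, weights are stored in fp16 (2 bytes each), and the T4's peak bandwidth is taken as NVIDIA's published 320 GB/s. The function name is illustrative and does not come from the FWKV code.

```python
def max_tokens_per_second(n_params: float, bytes_per_param: float, bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when generation is memory-bandwidth bound.

    Assumes each generated token requires reading every weight exactly once.
    """
    bytes_per_token = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumption: published peak memory bandwidth of the NVIDIA T4, in GB/s.
T4_BANDWIDTH_GB_S = 320.0

# Compare a 29M-parameter model with a 1B-parameter model, both in fp16.
for n_params in (29e6, 1e9):
    bound = max_tokens_per_second(n_params, 2.0, T4_BANDWIDTH_GB_S)
    print(f"{n_params / 1e6:.0f}M params: at most ~{bound:.0f} tokens/s")
```

Measured throughput will sit below this bound. The point is only that the ceiling drops in proportion to model size, so a 1B model is much closer to the bandwidth limit than a 29M one.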

posted an update about 1 month ago

Greetings Hugging Face!

I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), an RWKV-style LM that uses FFNNs (Feed-Forward Neural Networks) instead of an RNN and floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV.

So far I have:

- FlameF0X/FWKV-29M — this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.

The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories — trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.

updated a dataset 2 months ago

SHA-index/model-dna-index

Viewer • Updated Apr 7 • 11.3k • 8

updated a Space 2 months ago

Search SHA

Trace model weight origins using SHA256 hashes
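
As a rough illustration of the idea, a weight file's SHA256 fingerprint can be computed with Python's standard library and looked up in a table of known hashes. This is a minimal sketch, not the Space's actual code: `sha256_of_file`, the stand-in file, and the `known_hashes` dictionary are all hypothetical, and how SHA-index/model-dna-index actually stores its hashes is an assumption.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a small stand-in file instead of a real weights file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake weights")
    demo_path = tmp.name

demo_hash = sha256_of_file(demo_path)
os.remove(demo_path)

# Hypothetical index mapping a hash to the repository it was first seen in.
known_hashes = {demo_hash: "example-org/example-model"}
print(known_hashes.get(demo_hash, "unknown origin"))
```

Because the hash covers the exact bytes of the file, two uploads match only if the weight files are byte-identical; any re-quantization or fine-tuning produces a different digest.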

in SHA-index/model-dna-index 6 months ago

[bot] Conversion to Parquet (#1 opened 6 months ago by parquet-converter)

updated a Space 6 months ago

SHA-Index

published a Space 6 months ago

SHA-Index

published a dataset 6 months ago

SHA-index/model-dna-index

Viewer • Updated Apr 7 • 11.3k • 8

published a Space 6 months ago

Search SHA

Trace model weight origins using SHA256 hashes

posted an update 10 months ago

I am very sad to say that the budget for creating the SnowflakeCore-G1 1B and 7B MoE models ran out, and I can't pre-train them anymore.

posted an update 10 months ago

The training for SnowflakeCore-G1-1B and 7B will be restarted because I have now implemented DeepSpeed and managed to use two GPUs.

posted an update 11 months ago

The development of SnowflakeCore-G1-7B-MoE is getting delayed. In the meantime, I am working on SnowflakeCore-G1-1B-MoE, which will be a pre-trained chatbot.

posted an update 11 months ago

SnowflakeCore-G1-7B-MoE is in development. I can't say yet when it will be published because it's big and requires a lot of computational power.

posted an update 11 months ago

I just finished the benchmarks for https://huggingface.co/FlameF0X/SnowflakeCore-G1-Tiny and https://huggingface.co/FlameF0X/SnowflakeCore-G1-Tiny2 in comparison with openai-community/gpt2.

posted an update 11 months ago

Hello! Important announcement: I will rename SnowflakeCore-G1-Medium to SnowflakeCore-G1-Tiny2 because it's going to have the same parameters as the Tiny version, but trained on more data.

posted an update 11 months ago

Currently working on SnowflakeCore-G1-Medium. [Updated loss curve]

posted an update 11 months ago

Hello there world! I am happy to announce that you can now fine-tune https://huggingface.co/FlameF0X/SnowflakeCore-G1-Tiny . The code for that is in the model card.

I also lost the training log 😐