AI & ML interests

None defined yet.

Recent Activity

FlameF0X submitted a paper 19 days ago
Triplet-Block Diffusion RWKV
FlameF0X updated a dataset 2 months ago
SHA-index/model-dna-index
FlameF0X updated a Space 2 months ago
SHA-index/Search-SHA

FlameF0X
posted an update 6 days ago

My models on the Intel Low-Bit LLM Leaderboard

Figured I'd share where my quantized models landed on Intel/low_bit_open_llm_leaderboard since I hadn't posted about it yet.

FlameF0X/Qwen3-4B-Distilled-Claude-4.6 (NVFP4 and MXFP4) sit at ranks 23 and 24 with 62.68% and 61.18% average, right below the base Qwen3-4B. Not bad considering they were distilled from Claude 4.6 rather than trained from scratch.

FlameF0X/LFM2.5-1.2B-Distilled-Claude-4.6 and FlameF0X/LFM2.5-1.2B-Thinking-CodeX land around rank 47-49, competitive with MiniCPM5-1B and the Qwen3 sub-1B models despite being a larger base architecture.

The funny ones are FlameF0X/Qwen2-0.2B-pt and FlameF0X/Qwen2-0.2B-it. They're not properly trained (genuinely undertrained, basically undefined) and they still beat openai/gpt-oss-20b at rank 66. The 20B model. Not sure what that says but it's something.

FlameF0X/LFM2-Research is at the bottom of my lineup but it's a research artifact, not meant to be competitive.

Chart below showing my models vs nearby competitors, with size vs performance on the left.

Chart made by Claude
  • 1 reply
FlameF0X
posted an update 14 days ago
FlameF0X
posted an update about 1 month ago
I did some testing on the scalability of FWKV. It hits a speed bottleneck at 1B due to the T4's bandwidth limitations. Theoretically, it should match RWKV's inference speed on a GPU with more bandwidth, so the 1B speed result is not accurate.
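To get a feel for why bandwidth would cap speed at 1B, here is a rough back-of-the-envelope sketch. It assumes the bottleneck is memory bandwidth, that decoding runs one token at a time, that every weight is read once per token, and that weights are stored in 16-bit precision. None of these are confirmed details of FWKV, and the function name is only illustrative. The 320 GB/s figure is NVIDIA's published peak for the T4; real throughput will be lower.

```python
def max_decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                              bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode speed for a bandwidth-bound model.

    Assumes each parameter is read from GPU memory exactly once per token
    and ignores activations, recurrent state, and kernel launch overhead.
    """
    bytes_per_token = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token


T4_BANDWIDTH_GB_S = 320.0  # published peak for the T4 (assumption: memory-bound)
BYTES_PER_PARAM = 2.0      # assumption: fp16/bf16 weights

# Compare a small model with a 1B model under the same assumptions.
for name, n_params in [("29M", 29e6), ("1B", 1e9)]:
    bound = max_decode_tokens_per_sec(n_params, BYTES_PER_PARAM, T4_BANDWIDTH_GB_S)
    print(f"{name}: at most {bound:,.0f} tokens/s")
```

Under these assumptions the ceiling shrinks in proportion to model size, so a 1B model runs much closer to the hardware limit than a 29M one, whatever the architecture.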
FlameF0X
posted an update about 1 month ago
Greetings Hugging Face!

I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), an RWKV-style LM that uses FFNNs (Feed-Forward Neural Networks) instead of an RNN, together with floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV.

So far I have:

- FlameF0X/FWKV-29M: this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.

The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories: trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.
  • 2 replies
FlameF0X
updated a Space 6 months ago
FlameF0X
published a Space 6 months ago
FlameF0X
posted an update 10 months ago
I am very sad to say that the budget for creating the SnowflakeCore-G1 1B and 7B MoE models ran out, and I can't pre-train them anymore.
  • 7 replies
FlameF0X
posted an update 10 months ago
The training for SnowflakeCore-G1-1B and 7B will be restarted because I have now implemented DeepSpeed and managed to use two GPUs.
FlameF0X
posted an update 11 months ago
The development of SnowflakeCore-G1-7B-MoE is getting delayed. In the meantime I am working on SnowflakeCore-G1-1B-MoE, which will be a pre-trained chatbot.
  • 1 reply
FlameF0X
posted an update 11 months ago
The development of SnowflakeCore-G1-7B-MoE. I can't say yet when it will be published because it's big and it requires a lot of computational power.
  • 1 reply
FlameF0X
posted an update 11 months ago
FlameF0X
posted an update 11 months ago
Hello! Important announcement: I will rename SnowflakeCore-G1-Medium to SnowflakeCore-G1-Tiny2 because it's going to have the same number of parameters as the Tiny version, but this one is trained on more data.
  • 1 reply
FlameF0X
posted an update 11 months ago
Currently working on SnowflakeCore-G1-Medium. [Updated loss curve]
  • 3 replies
FlameF0X
posted an update 11 months ago