DiffusionGemma-26B-A4B

by Austriani - opened 16 days ago

Discussion

Austriani

16 days ago

Hello, Ready.Art! I used your Melody models and liked its prose. I want to ask you if you going to create finetune like that for DiffusionGemma architecture, as I heard it got day-one support.

Also I would like to ask you a few other questions:
I noticed that speed while using Gemma-4-12B in IQ4_XS_HB quantization is lower than using NetherMoon-12B (other roleplay merge), is it Gemma or quantization problem?
I want to know recommended samplers if possible, as when using Gemma-4-12B-Melody1437, I have noticed unneeded extension of model message, overextending its empressions, thoughts and etc., while focusing only up to 10% to its "voice". Is it samplers or model problems, if samplers one, I would like to have full list of recommend by you samplers (temp, top-k, top-p min-p, rep. pen. and etc.).

Thanks!

FrenzyBiscuit

Ready.Art org 16 days ago

IQ4_XS is a major PITA and breaks thinking on the 31B models.

I’ll likely pull the quant.

What are your samplers, and what is your prompts?

I’ll be honest I haven’t tested the 12B much and only tuned it because of demand. I vastly prefer 26B and 31B. The 12B is known to have some intelligent issues on my models

Austriani

16 days ago

•

edited 16 days ago

IQ4_XS is a major PITA and breaks thinking on the 31B models.

I’ll likely pull the quant.

What are your samplers, and what is your prompts?

I’ll be honest I haven’t tested the 12B much and only tuned it because of demand. I vastly prefer 26B and 31B. The 12B is known to have some intelligent issues on my models

My samplers are next:
temp - 0.9
top-k - 40
top-p - 0.85
min-p - 0.02
rep. pen. - 1.02
rep. pen. range - 1024
DRY:
Multiplier - 0.85
Base - 1.75
Allowed length - 2
Pen. range - 0
Adaptive-P and XTC set to default
Response (tokens) - 150
Context - 16384

Prompt:
Write {{char}}'s next reply in a fictional chat between {{char}} and {{user}}.
Maximum of 4 sentences per paragraph. 2 paragraphs per message.

Maybe sometimes adding something like "write in realistic voice", "be short" and etc.

Model still writes too much, it always tries to write 2 paragraphs, usually using 5+ sentences in each, which of course causes response to be abrubted as response limit is hit. I tried with all of my characters, while every other model works well, this one doesn't.

Additional information: I using model without thinking, using ik_llama.cpp

And if possible I still want to know if you are going to create finetunes for DiffusionGemma.

FrenzyBiscuit

Ready.Art org 16 days ago

•

edited 16 days ago

The model was trained on multi turn conversations (2 q/a pair) and as we found early on such training does make the model reply longer.

For shorter replies the model needs to be trained on 1 q/a pair.

Not much you’re going to be able to do about it.

Yes I am looking at training that model but I’m waiting for axolotol support to mature

FrenzyBiscuit

Ready.Art org 16 days ago

After reviewing this in more detail, pretty sure this is all the training data.

I'll try to include longer conversations in the training data for Dark Scarlett. Currently each Q/A pair is mostly pure detail, followed by 1-2 sentences from the AI in conversation. So what you're seeing tracks.

I don't personally mind this kind of output, but I'll try to be mindful of other users.

FrenzyBiscuit

Ready.Art org 16 days ago

Try dark scarlett. It should talk more.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment