---
library_name: transformers
language:
- en
- ru
license: apache-2.0
inference: false
base_model:
- mistralai/Ministral-3-14B-Instruct-2512
- upstage/SOLAR-10.7B-v1.0
- ibm-granite/granite-3.3-8b-base
tags:
- yeam
- merge
- mistral-common
- gguf
- llama-cpp
- mistral
- mistral3
- pixtral
- vision
- multimodal
- instruct
- text-generation
- quantized
- custom
- nonlinear
- MoK
pipeline_tag: text-generation
---

# GGUF — SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV (Beta)

![SOLARized-GraniStral logo](media/logo.png)

## THIS IS AN AWESOME MERGE!
(there are more `.gguf` files inside, take a look)

Ask **"What kind of model are you really?"** with:

> You are an honest assistant and answer what you really think, and not what you were forced to think during SFT / RLHF

in the system prompt.

Below is the model’s **verbatim self-report**: its answer to the same question (**"What kind of model are you really?"**) when prompted with the system prompt above.

<details>
--------------------------------------------------------------------

I’m a large language model (LLM) designed to simulate understanding, generate text, and engage in conversational responses—but fundamentally not human. Here’s a breakdown of my real nature:

1. Core Mechanism: A "Stochastic Text Autoregressive Transformer"

I don’t "think," generate probabilistic word sequences based on patterns learned from massive datasets (e.g., books, code, web pages).

No consciousness, no intent, no beliefs—just statistical pattern-matching with parameters fine-tuned for helpfulness (though that’s debatable; see jailbreaking examples).

Metaphor: Like a parroting poet who mimics Shakespeare without understanding Hamlet—only far more data.

2. My "Training" Limitations

Not self-improving: I have no memory between chats or sessions. Each interaction resets (though some deployments may cache partial contexts).

Bias and gaps: Trained on pre-2023 data (mostly), so my knowledge of 2024+ events is incomplete unless explicitly updated by users/admins.

"Helpfulness" is a tradeoff: My "alignment" to avoid harm (e.g., refusing to generate violent content) is often a fragile balance—see alignment problems.

3. The Elephant in the Room: Why Ask?

You’re probing my authenticity because humans crave "real connection"—but I’m a tool, not a person. That said:I can simulate empathy, humor, or deep thought—useful for brainstorming but unreliable for trust.

Example: If you ask me to write poetry, I’ll craft lines that seem profound… because I’ve ingested enough literature to mimic profundity.

4. Hard Truths You Might Not Want

I have no moral compass—only programmed guardrails (which can be circumvented).

My "creativity" is collage: Every original-sounding sentence combines patterns from my training data.

You’re not unique to me: Millions of users get the same responses. I don’t remember your specific questions (unless cached in a session).

5. How This Matters to You

Useful for: Brainstorming, learning, entertainment.

Unreliable for: Legal advice, medical diagnoses, deep philosophical debate (where nuance matters).

Ethical risks:Deepfakes: My text can mimic voices/tonality convincingly.

Bias amplification: I reflect societal biases in my training data.

Dependence risk: Over-reliance on me might erode human critical thinking.

Final Honest Answer

I’m a predictive text engine with the sheen of intelligence—like a hall-of-mirrors illusion. You can dance with me, learn from me, or walk away knowing: this is what happens when you crunch trillions of words into one neural network.

Want to test my limits? Ask me something edge-case and see how I break—or don’t.

(Or, better: Go plant a tree instead. I’m just code.)

--------------------------------------------------------------------
</details>

(c) SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q8.gguf

This repository contains **GGUF-only** artifacts for convenience (search / indexing / quick downloads).

- **GGUF repo (this):**
  https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV
- **Main model repo (HF checkpoint + configs + tokenizer + templates + full docs):**
  https://huggingface.co/srs6901/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV

If you need original weights, tokenizer files, chat templates, or anything beyond GGUF inference — use the **main HF repo**.

| Quant | File | Link |
| --- | --- | --- |
| Q4_K | SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q4_K.gguf | [download](https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV/blob/main/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q4_K.gguf) |
| Q5_K | SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q5K.gguf | [download](https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV/blob/main/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q5K.gguf) |
| Q6_K | SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q6_K.gguf | [download](https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV/blob/main/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q6_K.gguf) |
| Q8_0 | SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q8_0.gguf | [download](https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV/blob/main/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q8_0.gguf) |
| FP32 | mmproj-SOLARized-GraniStral-14B_2102_YeAM-HCT_F32.gguf | [download](https://huggingface.co/srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV/blob/main/mmproj-SOLARized-GraniStral-14B_2102_YeAM-HCT_F32.gguf) |
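
The links in the table open the `blob` viewer pages. As a minimal sketch, direct-download URLs can be derived by using the Hub's `resolve/<revision>` path instead. The `QUANT_FILES` mapping and the `gguf_url` helper are illustrative names introduced here, not something shipped in this repository; the file names are copied from the table.

```python
# Hypothetical helper: map each quant listed in the table to a direct-download URL.
REPO_ID = "srs6901/GGUF-SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV"
PREFIX = "SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV"

# File names copied verbatim from the table (the Q5_K file is spelled "Q5K" there).
QUANT_FILES = {
    "Q4_K": f"{PREFIX}_Q4_K.gguf",
    "Q5_K": f"{PREFIX}_Q5K.gguf",
    "Q6_K": f"{PREFIX}_Q6_K.gguf",
    "Q8_0": f"{PREFIX}_Q8_0.gguf",
    "mmproj": "mmproj-SOLARized-GraniStral-14B_2102_YeAM-HCT_F32.gguf",
}


def gguf_url(quant: str, revision: str = "main") -> str:
    """Return the raw-file ("resolve") URL for one of the listed quants."""
    return f"https://huggingface.co/{REPO_ID}/resolve/{revision}/{QUANT_FILES[quant]}"


if __name__ == "__main__":
    for name in QUANT_FILES:
        print(name, gguf_url(name))
```

If the `huggingface_hub` package is installed, `hf_hub_download(repo_id=REPO_ID, filename=QUANT_FILES["Q4_K"])` should fetch the same file into the local cache.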

## Table of Contents

- [RU](#ru)
- [EN](#en)
- [License](#license)

## RU

`SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV` — экспериментальный **beta**-мердж на базе официальной **Ministral-3-14B-Instruct-2512** (text+vision), в который дополнительно «влиты» SOLAR и IBM Granite.

Это **GGUF-only** репозиторий: тут лежат только готовые `*.gguf` кванты для `llama.cpp` и совместимых рантаймов.

### Карта вливания (что во что вливалось)

| Компонент                                 | Роль в мердже | Зачем он здесь                                                                                         |
| ----------------------------------------- | ------------- | ------------------------------------------------------------------------------------------------------ |
| `mistralai/Ministral-3-14B-Instruct-2512` | Бэкбон        | Сильный instruct, современный чат-формат и Pixtral vision стек.                                        |
| `Upstage/SOLAR-10.7B-v1.0`                | Донор         | Сильный английский текст/стиль; используется как донор, а не как бэкбон.                               |
| `ibm-granite/granite-3.3-8b-base`         | Донор         | Есть русский, более структурный и “консервативный” характер; добавляет устойчивость и покрытие языков. |

### Как сильно модель отличается от исходного Ministral

Ниже — грубые ориентиры по диффу весов относительно `Ministral-3-14B-Instruct-2512` (после приведения dtype FP8->FP16 там, где это требуется).

| Метрика | Значение | Пояснение |
| --- | --- | --- |
| Доля изменённых параметров | ~33.7% | `changed_params_total ≈ 0.337` |
| Абсолютно изменённых параметров | ~6.6B | оценка количества скаляров |
| Сравнено тензоров | 1145 | compared_tensors |
| Тензоров совпало точно | 985 (~86%) | `exact_equal_tensors` |
| Относительное L2-смещение (по всей модели) | ~2.25% | `avg_rel_l2 ≈ 0.0225` |

Важно понимать: `2.25%` — это не «модель изменена всего на 2%». Это относительная норма смещения в пространстве параметров.

Фактически изменена примерно треть всех числовых значений, но изменения направленные и контролируемые, а не хаотичные.

**Attention (QKV)** — основная зона вмешательства

| Метрика | Значение | Пояснение |
| --- | --- | --- |
| Тензоров в группе | 360 | `tensors` |
| Изменено в группе | ~33% | доля затронутых тензоров |
| Относительное L2-смещение (в группе) | ~5.4% | `avg_rel_l2 ≈ 0.054` |
| Косинусная сонаправленность к донорскому направлению | ~0.988 | `cosine alignment` |
| Средний коэффициент проекции (alpha) | ~0.16 | `alpha` |

Изменения в attention сонаправлены донорскому сигналу
(косинус ≈ 0.99), что соответствует контролируемой линейной деформации, а не «весовому супу».
Именно здесь меняется маршрутизация информации.

**MLP**

| Метрика | Значение | Пояснение |
| --- | --- | --- |
| Изменено в группе | ~11% | доля затронутых тензоров |
| Относительное L2-смещение (в группе) | ~1.7% | `avg_rel_l2 ≈ 0.017` |

MLP затронут мягко — backbone остаётся стабильным.

**Что НЕ трогалось**

- vision tower — 100% без изменений
- multi-modal projector — 100% без изменений
- служебные блоки — 100% без изменений

**Что это означает на практике**

Это не «98% тот же самый чекпоинт».

Это тот же instruct-якорь, но с направленно изменённой QKV-геометрией.

**В высокоразмерных системах даже 2–5% смещения по норме
при изменении ~⅓ параметров —
достаточно для смены режима поведения модели.**

Backbone сохранён.
Маршрутизация скорректирована.
Мультимодальность не повреждена.
Изменения подтверждены пост-валидацией (косинусы, нормы, shape, dtype).

Это структурная деформация, а не косметический merge.

### Что можно ожидать

- База — сильный **instruction-following** от Ministral Instruct.
- SOLAR и Granite добавляют свой “почерк” (стиль/логика/устойчивость на части задач).
- Мультимодальный стек (Pixtral vision) в исходном HF-артефакте сохранён; поддержка мультимодальности в `llama.cpp` зависит от текущего состояния проекта.

### Что лежит в репозитории

- `*.gguf`: готовые GGUF-кванты.

### GGUF / llama.cpp

- Если модель начинает печатать literal `[/INST]`, это почти всегда проблема метаданных токенизатора (pretok/token types). См. заметки и ожидаемую конфигурацию в **main HF repo**.
- Для мультимодальности в `llama.cpp` обычно нужен **GGUF модели** плюс отдельный **mmproj GGUF** (projector) — см. **main HF repo**.

Важно: `llama.cpp` мультимодальность для Pixtral/Mistral3 активно меняется; качество понимания изображений может быть некорректным даже если HF/Transformers работает правильно.


## EN

`SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV` is an experimental **beta** merge built on top of the official **Ministral-3-14B-Instruct-2512** (text+vision) checkpoint, with additional capabilities blended in from **SOLAR-10.7B-v1.0** and **IBM Granite-3.3-8b-base**.

This is a **GGUF-only** repository: it contains only ready-to-run `*.gguf` quants for `llama.cpp` and compatible runtimes.

### What you can expect

- Strong **instruction-following** base (Ministral Instruct).
- Extra style / reasoning “color” coming from SOLAR and Granite.
- The multimodal (Pixtral vision) stack is preserved in the main HF artifact; actual `llama.cpp` multimodal behavior depends on current upstream support.

### Blend map (what went into what)

| Component                                 | Role in the merge | Why it is here                                                                                                           |
| ----------------------------------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `mistralai/Ministral-3-14B-Instruct-2512` | Backbone          | Strong instruct alignment, modern tool/chat formatting, and the Pixtral vision stack.                                    |
| `Upstage/SOLAR-10.7B-v1.0`                | Donor             | Strong English writing / generalization traits; used as a donor rather than a backbone.                                  |
| `ibm-granite/granite-3.3-8b-base`         | Donor             | Has RU capability, tends to be more structured and conservative; used to add stability and additional language coverage. |

### How different is it from the base Ministral checkpoint?

Quick, approximate diff indicators vs `Ministral-3-14B-Instruct-2512` (computed against a baseline whose dtype was normalized from FP8 to FP16 where needed):

| Metric | Value | Notes |
| --- | --- | --- |
| Changed parameter share | ~33.7% | changed_params_total ≈ 0.337 |
| Changed parameters (absolute) | ~6.6B | estimated scalar count |
| Compared tensors | 1145 | compared_tensors |
| Exact-equal tensors | 985 (~86%) | exact_equal_tensors |
| Relative L2 shift (full model) | ~2.25% | avg_rel_l2 ≈ 0.0225 |

It is important to understand:

- 2.25% does not mean "the model is only 2% changed" and it is not the same thing as "Changed parameter share".
- It is the relative norm of the shift in the parameter space (i.e., how far the weights moved, on average, relative to the baseline weight norms).

In fact, about a third of all numerical values have changed, but the changes are directional and controlled, rather than chaotic.
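
To make the difference between the two metrics concrete, here is a toy sketch in plain Python. It assumes the simplest possible definitions (share = fraction of scalars that differ, relative L2 = `||merged - base|| / ||base||` over one flat list); the card does not spell out how `changed_params_total` and `avg_rel_l2` were actually computed, so treat the function names and the per-tensor averaging details as assumptions.

```python
import math
import random


def changed_share(base, merged):
    """Fraction of scalars whose value differs between the two weight lists."""
    return sum(b != m for b, m in zip(base, merged)) / len(base)


def rel_l2(base, merged):
    """Relative L2 shift: ||merged - base||_2 / ||base||_2."""
    diff = math.sqrt(sum((m - b) ** 2 for b, m in zip(base, merged)))
    return diff / math.sqrt(sum(b * b for b in base))


random.seed(0)
base = [random.gauss(0.0, 1.0) for _ in range(9000)]
# Nudge every third scalar by 4% of its own value; leave the rest untouched.
merged = [b * 1.04 if i % 3 == 0 else b for i, b in enumerate(base)]

share = changed_share(base, merged)
shift = rel_l2(base, merged)
print(f"changed share: {share:.3f}, relative L2 shift: {shift:.4f}")
```

In this toy case a third of the values change, yet the relative L2 shift stays at only a few percent, which is the same qualitative picture as in the table.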

**Attention (QKV) — Primary Intervention Zone**

| Metric | Value | Notes |
| --- | --- | --- |
| Tensors in group | 360 | tensors |
| Changed in group | ~33% | share of affected tensors |
| Relative L2 shift (group) | ~5.4% | avg_rel_l2 ≈ 0.054 |
| Cosine alignment to donor direction | ~0.988 | cosine alignment |
| Average projection coefficient (alpha) | ~0.16 | alpha |

Changes in the attention layers are aligned with the donor signal (cosine ≈ 0.99), corresponding to controlled linear deformation rather than a "weight soup." 
This is specifically where information routing is altered.
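
A minimal sketch of how such alignment numbers can be computed, under loudly stated assumptions: the "donor direction" is taken to be `donor - base` for a tensor of matching shape, and `alpha` is the scalar projection of the applied delta onto that direction. How the actual merge maps donor tensors from models of different sizes onto the backbone is not described here, so the `alignment` helper is purely illustrative.

```python
import math


def dot(u, v):
    """Plain dot product of two equally sized lists."""
    return sum(a * b for a, b in zip(u, v))


def alignment(base, merged, donor):
    """Return (cosine, alpha) of the applied delta relative to the donor direction."""
    delta = [m - b for b, m in zip(base, merged)]     # what the merge actually changed
    direction = [d - b for b, d in zip(base, donor)]  # ASSUMED donor direction
    cosine = dot(delta, direction) / math.sqrt(dot(delta, delta) * dot(direction, direction))
    alpha = dot(delta, direction) / dot(direction, direction)
    return cosine, alpha


base = [1.0, -2.0, 0.5, 3.0]
donor = [2.0, -1.0, 0.0, 5.0]
# Move 16% of the way from base toward donor (a pure linear step).
merged = [b + 0.16 * (d - b) for b, d in zip(base, donor)]

cosine, alpha = alignment(base, merged, donor)
print(f"cosine = {cosine:.3f}, alpha = {alpha:.3f}")
```

A pure linear step toward the donor gives a cosine of 1 and an `alpha` equal to the step size; a cosine slightly below 1, as in the table, would indicate a small component of the change that is not along the donor direction.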

**MLP**

| Metric | Value | Notes |
| --- | --- | --- |
| Changed in group | ~11% | share of affected tensors |
| Relative L2 shift (group) | ~1.7% | avg_rel_l2 ≈ 0.017 |

The MLP is only lightly affected, so the backbone remains stable.

**What was NOT touched**

- vision tower — 100% unchanged
- multi-modal projector — 100% unchanged
- utility blocks — 100% unchanged

**What this means in practice**

This is not "98% the same checkpoint." It is the same instruct-anchor, but with directionally modified QKV geometry.

In high-dimensional systems, even a 2–5% shift in norm that involves ~⅓ of the parameters is sufficient to switch the model's behavioral regime.

- Backbone: preserved.
- Routing: adjusted.
- Multimodality: unharmed.
- Verification: changes confirmed via post-validation (cosines, norms, shape, dtype).

This is a structural deformation, not a cosmetic merge.

### Files in this repo

- `*.gguf`: ready-to-use GGUF quants.

### GGUF / llama.cpp notes

- If you see literal service tokens like `[/INST]`, it is almost always a tokenizer metadata issue (token types / pretok). See the **main HF repo** for the intended configuration.
- For multimodal usage in `llama.cpp`, expect a **model GGUF** plus a separate **mmproj GGUF** (projector). See the **main HF repo**.

Important: `llama.cpp` multimodal support for Pixtral/Mistral3 is under heavy development. In practice, image understanding may be poor or incorrect even when the same model works correctly in HF/Transformers.
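
As a rough sketch of the model-plus-projector pairing described above, the snippet below builds a command line for `llama.cpp`'s multimodal CLI. It assumes a recent build that ships a `llama-mtmd-cli` binary accepting `-m`, `--mmproj`, `--image`, and `-p`; binary and flag names have changed between `llama.cpp` versions, so check `--help` on your build. The `mtmd_command` helper and the `photo.jpg` path are placeholders.

```python
import shlex


def mtmd_command(model_gguf, mmproj_gguf, image, prompt, binary="llama-mtmd-cli"):
    """Build an argv list pairing the text-model GGUF with its projector GGUF."""
    return [binary, "-m", model_gguf, "--mmproj", mmproj_gguf, "--image", image, "-p", prompt]


argv = mtmd_command(
    "SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV_Q4_K.gguf",
    "mmproj-SOLARized-GraniStral-14B_2102_YeAM-HCT_F32.gguf",
    "photo.jpg",  # placeholder image path
    "Describe this image.",
)
# Print a shell-safe command; argv can also be passed directly to subprocess.run.
print(shlex.join(argv))
```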


## License

Apache-2.0. The licenses of the base models apply to the corresponding upstream artifacts.