Any-to-Any
Transformers
Safetensors
neo_chat
feature-extraction
multimodal
text-to-image
image-to-text
image-editing
interleaved-generation
custom_code
Instructions to use sensenova/SenseNova-U1-8B-MoT-SFT with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use sensenova/SenseNova-U1-8B-MoT-SFT with Transformers:
# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("sensenova/SenseNova-U1-8B-MoT-SFT", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
Upload folder using huggingface_hub
Browse files- .gitattributes +11 -0
- docs/assets/perform_vs_speed_avg3.webp +3 -0
- docs/assets/perform_vs_speed_avg8.webp +3 -0
- docs/assets/showcases/interleave/reasoning.png +3 -0
- docs/assets/showcases/t2i_infographic/0029.webp +3 -0
- docs/assets/showcases/t2i_infographic/0030.webp +3 -0
- docs/assets/showcases/t2i_infographic/0031.webp +3 -0
- docs/assets/showcases/t2i_infographic/0032.webp +3 -0
- docs/assets/showcases/t2i_infographic/0033.webp +3 -0
- docs/assets/teaser.webp +3 -0
- docs/assets/teaser_1.webp +3 -0
- docs/assets/teaser_2.webp +3 -0
- docs/inference_infra.md +5 -5
- docs/inference_infra_CN.md +5 -4
- docs/prompt_enhancement.md +12 -3
- docs/showcases.md +7 -1
- docs/showcases_CN.md +7 -1
.gitattributes
CHANGED
|
@@ -175,3 +175,14 @@ docs/assets/showcases/interleave/case_0007_bowie_slide_design.webp filter=lfs di
|
|
| 175 |
docs/assets/teaser_2.png filter=lfs diff=lfs merge=lfs -text
|
| 176 |
docs/assets/showcases/t2i_infographic/0028.webp filter=lfs diff=lfs merge=lfs -text
|
| 177 |
docs/assets/showcases/t2i_infographic/u1-case2.webp filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 175 |
docs/assets/teaser_2.png filter=lfs diff=lfs merge=lfs -text
|
| 176 |
docs/assets/showcases/t2i_infographic/0028.webp filter=lfs diff=lfs merge=lfs -text
|
| 177 |
docs/assets/showcases/t2i_infographic/u1-case2.webp filter=lfs diff=lfs merge=lfs -text
|
| 178 |
+
docs/assets/perform_vs_speed_avg3.webp filter=lfs diff=lfs merge=lfs -text
|
| 179 |
+
docs/assets/perform_vs_speed_avg8.webp filter=lfs diff=lfs merge=lfs -text
|
| 180 |
+
docs/assets/showcases/interleave/reasoning.png filter=lfs diff=lfs merge=lfs -text
|
| 181 |
+
docs/assets/showcases/t2i_infographic/0029.webp filter=lfs diff=lfs merge=lfs -text
|
| 182 |
+
docs/assets/showcases/t2i_infographic/0030.webp filter=lfs diff=lfs merge=lfs -text
|
| 183 |
+
docs/assets/showcases/t2i_infographic/0031.webp filter=lfs diff=lfs merge=lfs -text
|
| 184 |
+
docs/assets/showcases/t2i_infographic/0032.webp filter=lfs diff=lfs merge=lfs -text
|
| 185 |
+
docs/assets/showcases/t2i_infographic/0033.webp filter=lfs diff=lfs merge=lfs -text
|
| 186 |
+
docs/assets/teaser.webp filter=lfs diff=lfs merge=lfs -text
|
| 187 |
+
docs/assets/teaser_1.webp filter=lfs diff=lfs merge=lfs -text
|
| 188 |
+
docs/assets/teaser_2.webp filter=lfs diff=lfs merge=lfs -text
|
docs/assets/perform_vs_speed_avg3.webp
ADDED
|
Git LFS Details
|
docs/assets/perform_vs_speed_avg8.webp
ADDED
|
Git LFS Details
|
docs/assets/showcases/interleave/reasoning.png
ADDED
|
Git LFS Details
|
docs/assets/showcases/t2i_infographic/0029.webp
ADDED
|
Git LFS Details
|
docs/assets/showcases/t2i_infographic/0030.webp
ADDED
|
Git LFS Details
|
docs/assets/showcases/t2i_infographic/0031.webp
ADDED
|
Git LFS Details
|
docs/assets/showcases/t2i_infographic/0032.webp
ADDED
|
Git LFS Details
|
docs/assets/showcases/t2i_infographic/0033.webp
ADDED
|
Git LFS Details
|
docs/assets/teaser.webp
ADDED
|
Git LFS Details
|
docs/assets/teaser_1.webp
ADDED
|
Git LFS Details
|
docs/assets/teaser_2.webp
ADDED
|
Git LFS Details
|
docs/inference_infra.md
CHANGED
|
@@ -78,8 +78,8 @@ see [`deployment.md`](./deployment.md).
|
|
| 78 |
|
| 79 |
### Generation Performance
|
| 80 |
|
| 81 |
-
The table below
|
| 82 |
-
Fill in measured numbers for each machine and deployment profile.
|
| 83 |
Note: TP2+CFG2 means Tensor Parallel=2 + CFG Parallel=2.
|
| 84 |
|
| 85 |
<div align="center">
|
|
@@ -100,7 +100,7 @@ In NEO-Unify, the KV cache for the generation stage is provided by the understan
|
|
| 100 |
|
| 101 |
The table below compares the latency of a single diffusion step for
|
| 102 |
**2048x2048** image generation with **CFG enabled**. Unless otherwise noted,
|
| 103 |
-
all measurements are taken on **H100**; the `
|
| 104 |
`2x H100`.
|
| 105 |
Note: TP2+CFG2 means Tensor Parallel=2 + CFG Parallel=2.
|
| 106 |
|
|
@@ -113,7 +113,7 @@ Note: TP2+CFG2 means Tensor Parallel=2 + CFG Parallel=2.
|
|
| 113 |
| GLM-Image | 9B | 7B | 1.394 |
|
| 114 |
| ERNIE-Image | 8B | 8B | 1.565 |
|
| 115 |
| LongCat-Image | 8B | 6B | 0.796 |
|
| 116 |
-
|
|
| 117 |
-
|
|
| 118 |
|
| 119 |
</div>
|
|
|
|
| 78 |
|
| 79 |
### Generation Performance
|
| 80 |
|
| 81 |
+
The table below reports **2048x2048** image generation latency for
|
| 82 |
+
**SenseNova-U1-8B-MoT(NEO-Unify)**. Fill in measured numbers for each machine and deployment profile.
|
| 83 |
Note: TP2+CFG2 means Tensor Parallel=2 + CFG Parallel=2.
|
| 84 |
|
| 85 |
<div align="center">
|
|
|
|
| 100 |
|
| 101 |
The table below compares the latency of a single diffusion step for
|
| 102 |
**2048x2048** image generation with **CFG enabled**. Unless otherwise noted,
|
| 103 |
+
all measurements are taken on **H100**; the `TP2+CFG2` result uses
|
| 104 |
`2x H100`.
|
| 105 |
Note: TP2+CFG2 means Tensor Parallel=2 + CFG Parallel=2.
|
| 106 |
|
|
|
|
| 113 |
| GLM-Image | 9B | 7B | 1.394 |
|
| 114 |
| ERNIE-Image | 8B | 8B | 1.565 |
|
| 115 |
| LongCat-Image | 8B | 6B | 0.796 |
|
| 116 |
+
| SenseNova-U1-8B-MoT (Neo-Unify) | 8B | 8B | 0.312 |
|
| 117 |
+
| SenseNova-U1-8B-MoT (Neo-Unify, TP2+CFG2) | 8B | 8B | 0.158 |
|
| 118 |
|
| 119 |
</div>
|
docs/inference_infra_CN.md
CHANGED
|
@@ -76,7 +76,8 @@ Docker 镜像、启动命令与 API 测试的简明操作手册,请参见 [`de
|
|
| 76 |
|
| 77 |
### 生成性能
|
| 78 |
|
| 79 |
-
下表
|
|
|
|
| 80 |
注:TP2+CFG2 表示张量并行=2 + CFG 并行=2。
|
| 81 |
|
| 82 |
<div align="center">
|
|
@@ -94,7 +95,7 @@ Docker 镜像、启动命令与 API 测试的简明操作手册,请参见 [`de
|
|
| 94 |
|
| 95 |
### 跨模型速度对比
|
| 96 |
|
| 97 |
-
下表对比了在启用**CFG**条件下,生成 **2048x2048** 图像时单个 diffusion step 的延迟。除特别说明外,所有数据均在 **H100** 上测得;其中 `NEO-Unify
|
| 98 |
注:TP2+CFG2 表示张量并行=2 + CFG 并行=2。
|
| 99 |
|
| 100 |
<div align="center">
|
|
@@ -106,7 +107,7 @@ Docker 镜像、启动命令与 API 测试的简明操作手册,请参见 [`de
|
|
| 106 |
| GLM-Image | 9B | 7B | 1.394 |
|
| 107 |
| ERNIE-Image | 8B | 8B | 1.565 |
|
| 108 |
| LongCat-Image | 8B | 6B | 0.796 |
|
| 109 |
-
|
|
| 110 |
-
| NEO-Unify
|
| 111 |
|
| 112 |
</div>
|
|
|
|
| 76 |
|
| 77 |
### 生成性能
|
| 78 |
|
| 79 |
+
下表给出 **SenseNova-U1-8B-MoT(NEO-Unify)** 在
|
| 80 |
+
**2048x2048** 图像生成任务上的基准模版。列出了不同机型与部署配置下的实测数据。
|
| 81 |
注:TP2+CFG2 表示张量并行=2 + CFG 并行=2。
|
| 82 |
|
| 83 |
<div align="center">
|
|
|
|
| 95 |
|
| 96 |
### 跨模型速度对比
|
| 97 |
|
| 98 |
+
下表对比了在启用**CFG**条件下,生成 **2048x2048** 图像时单个 diffusion step 的延迟。除特别说明外,所有数据均在 **H100** 上测得;其中 `SenseNova-U1-8B-MoT (NEO-Unify, TP2+CFG2)` 使用的是 `2x H100`。
|
| 99 |
注:TP2+CFG2 表示张量并行=2 + CFG 并行=2。
|
| 100 |
|
| 101 |
<div align="center">
|
|
|
|
| 107 |
| GLM-Image | 9B | 7B | 1.394 |
|
| 108 |
| ERNIE-Image | 8B | 8B | 1.565 |
|
| 109 |
| LongCat-Image | 8B | 6B | 0.796 |
|
| 110 |
+
| SenseNova-U1-8B-MoT (NEO-Unify) | 8B | 8B | 0.312 |
|
| 111 |
+
| SenseNova-U1-8B-MoT (NEO-Unify, TP2+CFG2) | 8B | 8B | 0.158 |
|
| 112 |
|
| 113 |
</div>
|
docs/prompt_enhancement.md
CHANGED
|
@@ -27,8 +27,6 @@ Skip `--enhance` when:
|
|
| 27 |
user prompt ──► LLM (system prompt = infographic expander) ──► expanded prompt ──► SenseNova-U1
|
| 28 |
```
|
| 29 |
|
| 30 |
-
Upstream system prompt: [SenseNova-Skills / u1-infographic](https://github.com/OpenSenseNova/SenseNova-Skills/blob/main/skills/u1-infographic/references/prompts-expand-system.md).
|
| 31 |
-
|
| 32 |
## 3. Configuration
|
| 33 |
|
| 34 |
All configuration is environment-variable based so the same script can
|
|
@@ -45,12 +43,23 @@ First, create a `.env` file and populate it with the four required parameters. T
|
|
| 45 |
Add `--print_enhance` to echo the original + enhanced prompt for
|
| 46 |
debugging.
|
| 47 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 48 |
### 3.1 Recommended backends
|
| 49 |
|
| 50 |
| Model | Backend | Endpoint template | Notes |
|
| 51 |
| :---- | :------ | :---------------- | :---- |
|
| 52 |
| **Gemini 3.1 Pro** (Default) | `chat_completions` | `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions` | Best overall infographic quality in our internal bench. Excellent at structured / hierarchical content. |
|
| 53 |
-
| SenseNova
|
| 54 |
| Anthropic Claude (Sonnet/Opus) | `anthropic` | `https://api.anthropic.com/v1/messages` | Strong typography discipline, slightly less "information-dense" out of the box. |
|
| 55 |
| Kimi 2.5 | `chat_completions` | `https://api.moonshot.cn/v1/chat/completions` | Good Chinese enhancements, weaker for English-dense infographics in our runs. |
|
| 56 |
| Gemini 3.1 Flash-Lite (Third-party service) | `chat_completions` | `https://aigateway.edgecloudapp.com/v1/f194fd69361cd590f1fa136c9c90eca1/senseai` | The overall quality of the information chart is high and its generation speed is fast. |
|
|
|
|
| 27 |
user prompt ──► LLM (system prompt = infographic expander) ──► expanded prompt ──► SenseNova-U1
|
| 28 |
```
|
| 29 |
|
|
|
|
|
|
|
| 30 |
## 3. Configuration
|
| 31 |
|
| 32 |
All configuration is environment-variable based so the same script can
|
|
|
|
| 43 |
Add `--print_enhance` to echo the original + enhanced prompt for
|
| 44 |
debugging.
|
| 45 |
|
| 46 |
+
To use **SenseNova 6.7 Flash-Lite** as the enhancer, get an API key from
|
| 47 |
+
[SenseNova Console · token-plan](https://platform.sensenova.cn/token-plan),
|
| 48 |
+
then set:
|
| 49 |
+
|
| 50 |
+
```bash
|
| 51 |
+
U1_ENHANCE_BACKEND=chat_completions
|
| 52 |
+
U1_ENHANCE_ENDPOINT=https://token.sensenova.cn/v1/chat/completions
|
| 53 |
+
U1_ENHANCE_MODEL=sensenova-6.7-flash-lite
|
| 54 |
+
U1_ENHANCE_API_KEY=<your SenseNova API key>
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
### 3.1 Recommended backends
|
| 58 |
|
| 59 |
| Model | Backend | Endpoint template | Notes |
|
| 60 |
| :---- | :------ | :---------------- | :---- |
|
| 61 |
| **Gemini 3.1 Pro** (Default) | `chat_completions` | `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions` | Best overall infographic quality in our internal bench. Excellent at structured / hierarchical content. |
|
| 62 |
+
| SenseNova 6.7 Flash-Lite | `chat_completions` | `https://token.sensenova.cn/v1/chat/completions` | Near Gemini 3.1 Pro quality on Chinese content at lower per-token cost, preferred for production. |
|
| 63 |
| Anthropic Claude (Sonnet/Opus) | `anthropic` | `https://api.anthropic.com/v1/messages` | Strong typography discipline, slightly less "information-dense" out of the box. |
|
| 64 |
| Kimi 2.5 | `chat_completions` | `https://api.moonshot.cn/v1/chat/completions` | Good Chinese enhancements, weaker for English-dense infographics in our runs. |
|
| 65 |
| Gemini 3.1 Flash-Lite (Third-party service) | `chat_completions` | `https://aigateway.edgecloudapp.com/v1/f194fd69361cd590f1fa136c9c90eca1/senseai` | The overall quality of the information chart is high and its generation speed is fast. |
|
docs/showcases.md
CHANGED
|
@@ -115,6 +115,12 @@ Reproducible prompts are in
|
|
| 115 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0027.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0027.webp"></a></td>
|
| 116 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0026.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0026.webp"></a></td>
|
| 117 |
</tr>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 118 |
</table>
|
| 119 |
|
| 120 |
|
|
@@ -235,7 +241,7 @@ Reproducible prompts are in
|
|
| 235 |
|
| 236 |
| |
|
| 237 |
| :---: |
|
| 238 |
-
| [<img alt="interleave case 05" src="./assets/showcases/interleave/
|
| 239 |
|
| 240 |
|
| 241 |
---
|
|
|
|
| 115 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0027.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0027.webp"></a></td>
|
| 116 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0026.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0026.webp"></a></td>
|
| 117 |
</tr>
|
| 118 |
+
<tr>
|
| 119 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0029.webp"><img width="230" alt="t2i image 0022" src="./assets/showcases/t2i_infographic/0029.webp"></a></td>
|
| 120 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0030.webp"><img width="230" alt="t2i image 0023" src="./assets/showcases/t2i_infographic/0030.webp"></a></td>
|
| 121 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0031.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0031.webp"></a></td>
|
| 122 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0032.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0032.webp"></a></td>
|
| 123 |
+
</tr>
|
| 124 |
</table>
|
| 125 |
|
| 126 |
|
|
|
|
| 241 |
|
| 242 |
| |
|
| 243 |
| :---: |
|
| 244 |
+
| [<img alt="interleave case 05" src="./assets/showcases/interleave/reasoning.png">](./assets/showcases/interleave/reasoning.png) |
|
| 245 |
|
| 246 |
|
| 247 |
---
|
docs/showcases_CN.md
CHANGED
|
@@ -109,6 +109,12 @@
|
|
| 109 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0027.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0027.webp"></a></td>
|
| 110 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0026.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0026.webp"></a></td>
|
| 111 |
</tr>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 112 |
</table>
|
| 113 |
|
| 114 |
|
|
@@ -214,7 +220,7 @@
|
|
| 214 |
|
| 215 |
| |
|
| 216 |
| :---: |
|
| 217 |
-
| [<img alt="interleave reasoning case 2" src="./assets/showcases/interleave/
|
| 218 |
|
| 219 |
---
|
| 220 |
|
|
|
|
| 109 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0027.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0027.webp"></a></td>
|
| 110 |
<td align="center"><a href="./assets/showcases/t2i_infographic/0026.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0026.webp"></a></td>
|
| 111 |
</tr>
|
| 112 |
+
<tr>
|
| 113 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0029.webp"><img width="230" alt="t2i image 0022" src="./assets/showcases/t2i_infographic/0029.webp"></a></td>
|
| 114 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0030.webp"><img width="230" alt="t2i image 0023" src="./assets/showcases/t2i_infographic/0030.webp"></a></td>
|
| 115 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0031.webp"><img width="230" alt="t2i image 0024" src="./assets/showcases/t2i_infographic/0031.webp"></a></td>
|
| 116 |
+
<td align="center"><a href="./assets/showcases/t2i_infographic/0032.webp"><img width="230" alt="t2i image 0025" src="./assets/showcases/t2i_infographic/0032.webp"></a></td>
|
| 117 |
+
</tr>
|
| 118 |
</table>
|
| 119 |
|
| 120 |
|
|
|
|
| 220 |
|
| 221 |
| |
|
| 222 |
| :---: |
|
| 223 |
+
| [<img alt="interleave reasoning case 2" src="./assets/showcases/interleave/reasoning.png">](./assets/showcases/interleave/reasoning.png) |
|
| 224 |
|
| 225 |
---
|
| 226 |
|