TAI Research committed on May 23

Commit

29691f6

0 Parent(s):

Initial commit: Lumina_Dev_Legacy (archived)

Files changed (30)

LICENSE +21 -0
README.md +158 -0
SOLUTION_SUMMARY.md +49 -0
configs/data/laion_filtered.yaml +52 -0
configs/model/diffusion.yaml +19 -0
configs/model/unet_light.yaml +22 -0
configs/training/p4_optimized.yaml +69 -0
configs/training/schedule_256.yaml +33 -0
data/laion/dataset_info.json +14 -0
data/laion/metadata.parquet +0 -0
docs/LAION_DATASET_GUIDE.md +279 -0
requirements.txt +28 -0
scripts/benchmark.py +490 -0
scripts/download_laion.py +272 -0
scripts/export.py +270 -0
scripts/train.py +293 -0
src/data/__pycache__/dataset.cpython-313.pyc +0 -0
src/data/dataset.py +298 -0
src/data/preprocessing.py +295 -0
src/data/text_encoder.py +227 -0
src/inference/api.py +631 -0
src/inference/optimization.py +427 -0
src/inference/sampler.py +428 -0
src/models/attention.py +143 -0
src/models/diffusion.py +263 -0
src/models/unet_light.py +379 -0
src/training/callbacks.py +324 -0
src/training/memory_manager.py +245 -0
src/training/trainer_p4.py +378 -0
tests/test_basic.py +250 -0

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2026 TAI Research
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md ADDED

	@@ -0,0 +1,158 @@

+---
+license: mit
+tags:
+- image-generation
+- legacy
+- archived
+- research
+- pytorch
+---
+# 🎨 Lumina_Dev_Legacy — [ARCHIVED]
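+> **⚠️ Project status: archived / no longer developed**
+>
+> This is an early exploratory project from TAI Research. Development has ended and the code is kept for archival purposes only. The model was never fully trained, and no pretrained weights are provided.
+| Project info | |
+|---------|---|
+| **Status** | 🔴 Deprecated (Archived) |
+| **Reason** | Change of project direction; resources reallocated to other research areas |
+| **Last updated** | 2026 |
+| **Training status** | ❌ Training not completed |
+| **Available resources** | Code framework only; no pretrained weights |
+---
+## Background
+Lumina was originally meant to be a **lightweight image generation model optimized for limited hardware (specifically the NVIDIA P4 with 8 GB of VRAM)**, built on a diffusion architecture and focused on text-to-image generation.
+### Original design goals
+- **Aggressive VRAM optimization**: tuned specifically for GPUs with 8 GB of VRAM
+- **Lightweight architecture**: fewer than 20 million parameters
+- **Complete training pipeline**: from data preprocessing to model training
+- **Efficient inference**: support for multiple samplers
+- **Modular design**: easy to extend and customize
+### Why was it abandoned?
+1. **Change of direction**: TAI Research shifted its focus to language models and AI safety
+2. **Resource constraints**: training image generation models requires substantial compute
+3. **Competitive landscape**: the open-source community already offers mature solutions (Stable Diffusion, Flux, etc.)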
+---
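+## Repository contents
+This repository contains only the code framework from the development phase:
+```
+lumina_legacy/
+├── configs/                    # Configuration files
+│   ├── model/                 # Model configs (UNet architecture)
+│   ├── training/              # Training configs (P4-optimized)
+│   └── data/                  # Data configs
+├── src/                       # Source code
+│   ├── models/                # UNet + attention
+│   ├── training/              # Trainer + memory optimization
+│   ├── data/                  # LAION data processing
+│   └── inference/             # Samplers (DDIM/DPM/LCM)
+├── scripts/                   # Utility scripts
+│   ├── train.py              # Training entry point
+│   ├── download_laion.py     # Data download
+│   └── webui.py              # Gradio UI
+└── tests/                     # Unit tests
+```
+**⚠️ Note**:
+- The code has not been fully tested and may contain bugs
+- The training pipeline has not been validated
+- Reproducibility is not guaranteed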
+---
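+## Tech stack
+| Component | Technology |
+|------|------|
+| Framework | PyTorch 2.0+ |
+| Architecture | Lightweight UNet + cross-attention |
+| Diffusion | DDPM / DDIM |
+| Text encoding | CLIP (pretrained, frozen) |
+| Precision | FP16 mixed precision |
+| Optimization | Gradient checkpointing + gradient accumulation |
+### Original architecture design
+```
+Input (4×64×64)
+    ↓
+Conv2d (4→64)
+    ↓
+[Downsampling blocks × 4]
+    ↓
+[Attention layer (8×8)]
+    ↓
+[Upsampling blocks × 4]
+    ↓
+Conv2d (64→4)
+    ↓
+Output (4×64×64)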
+```
+---
+## How to use this code
+> ⚠️ The code is archived as-is, with no guarantees of any kind
+```bash
+# Clone the repository
+git clone https://huggingface.co/TAI-Research/Lumina_Dev_Legacy
+cd Lumina_Dev_Legacy
+# Install dependencies (if needed)
+pip install -r requirements.txt
+# Try running training (only checks whether the code runs)
+python scripts/train.py --config configs/training/p4_optimized.yaml --dummy
+```
+---
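+## Known issues
+- [ ] Training pipeline not fully validated
+- [ ] Data processing module has performance problems
+- [ ] Inference samplers may contain bugs
+- [ ] Unit test coverage is lacking
+---
+## What next
+If you are interested in this project, we recommend using mature open-source alternatives instead: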
+- [Stable Diffusion](https://github.com/CompVis/stable-diffusion)
+- [Hugging Face Diffusers](https://github.com/huggingface/diffusers)
+- [Flux](https://github.com/black-forest-labs/flux)
+---
+## License
+MIT License: the code is free to use, but comes with no warranty.
+---
+## Related links
+- [TAI Research Hugging Face](https://huggingface.co/TAI-Research)
+- [GTC-Guard-0](https://huggingface.co/TAI-Research/GTC-Guard-0)
+---
+**Last updated**: 2026-05-23
+**Archived by**: TAI Research

SOLUTION_SUMMARY.md ADDED

	@@ -0,0 +1,49 @@

+---
+license: mit
+tags:
+- image-generation
+- legacy
+- archived
+- research
+- pytorch
+---
+# 🎨 Lumina_Dev_Legacy — [ARCHIVED]
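+> **⚠️ Project status: archived / no longer developed**
+>
+> This is an early exploratory project from TAI Research. Development has ended and the code is kept for archival purposes only. The model was never fully trained, and no pretrained weights are provided.
+| Project info | |
+|---------|---|
+| **Status** | 🔴 Deprecated (Archived) |
+| **Reason** | Change of project direction; resources reallocated to other research areas |
+| **Last updated** | 2026 |
+| **Training status** | ❌ Training not completed |
+| **Available resources** | Code framework only; no pretrained weights |
+---
+## Background
+Lumina was originally meant to be a **lightweight image generation model optimized for limited hardware (specifically the NVIDIA P4 with 8 GB of VRAM)**, built on a diffusion architecture and focused on text-to-image generation.
+### Original design goals
+- **Aggressive VRAM optimization**: tuned specifically for GPUs with 8 GB of VRAM
+- **Lightweight architecture**: fewer than 20 million parameters
+- **Complete training pipeline**: from data preprocessing to model training
+- **Efficient inference**: support for multiple samplers
+- **Modular design**: easy to extend and customize
+### Why was it abandoned?
+1. **Change of direction**: TAI Research shifted its focus to language models and AI safety
+2. **Resource constraints**: training image generation models requires substantial compute
+3. **Competitive landscape**: the open-source community already offers mature solutions (Stable Diffusion, Flux, etc.)
+---
+## Repository contents
+This repository contains only the code framework from the development phase: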

configs/data/laion_filtered.yaml ADDED

	@@ -0,0 +1,52 @@

+# Data processing configuration
+dataset:
+  name: "laion-aesthetic"
+  path: "./data/laion"  # Data directory
+  metadata_file: "./data/laion/metadata.parquet"
+  # Filter criteria
+  filters:
+    aesthetic_score: 6.0
+    watermark_prob: 0.5
+    nsfw: false
+  # Train/validation split
+  split:
+    train: 0.95
+    val: 0.05
+    seed: 42
+  # Processing options
+  max_samples: 2000000  # Maximum number of samples
+  shuffle: true
+  shuffle_seed: 42
+preprocessing:
+  # Image processing
+  target_size: 512
+  resize_mode: "center_crop"  # "center_crop", "random_crop", "resize"
+  random_crop: true
+  random_flip: true
+  # Normalization: (x - 0.5) / 0.5 maps pixel values from [0, 1] to [-1, 1]
+  normalize:
+    mean: [0.5, 0.5, 0.5]
+    std: [0.5, 0.5, 0.5]
+  # Text processing
+  tokenizer: "openai/clip-vit-base-patch32"
+  max_length: 77
+  truncation: true
+  padding: "max_length"
+  # Caching
+  use_cache: true
+  cache_dir: "./data/cache"
+  cache_compression: true
+loader:
+  batch_size: 1
+  shuffle: true
+  num_workers: 2
+  prefetch_factor: 2
+  persistent_workers: true
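The `filters` and `split` blocks above imply a filter-then-split step over the metadata records. The loader that consumes them (`src/data/dataset.py`) is not shown in this view, so what follows is only a minimal stdlib sketch under stated assumptions: a record must have `aesthetic_score` of at least 6.0 and `watermark_prob` below 0.5, `NSFW` is treated as a boolean column, and the split is a seeded shuffle. The helper names `passes_filters` and `split_records` are hypothetical.

```python
import random

# Thresholds mirror dataset.filters in laion_filtered.yaml; the comparison
# directions (>= for aesthetic_score, < for watermark_prob) are assumptions.
MIN_AESTHETIC = 6.0
MAX_WATERMARK = 0.5


def passes_filters(record: dict, allow_nsfw: bool = False) -> bool:
    """Return True if a metadata record survives the configured filters."""
    if record["aesthetic_score"] < MIN_AESTHETIC:
        return False
    if record["watermark_prob"] >= MAX_WATERMARK:
        return False
    # Assumes NSFW is a boolean flag; real LAION metadata may store strings
    # such as "UNLIKELY", which would need to be mapped to booleans first.
    if not allow_nsfw and record["NSFW"]:
        return False
    return True


def split_records(records, train_frac: float = 0.95, seed: int = 42):
    """Shuffle deterministically with `seed`, then cut into train/val lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]


records = [
    {"url": "a", "aesthetic_score": 6.4, "watermark_prob": 0.1, "NSFW": False},
    {"url": "b", "aesthetic_score": 5.2, "watermark_prob": 0.1, "NSFW": False},
    {"url": "c", "aesthetic_score": 7.1, "watermark_prob": 0.8, "NSFW": False},
    {"url": "d", "aesthetic_score": 6.9, "watermark_prob": 0.2, "NSFW": True},
]
kept = [r for r in records if passes_filters(r)]
train, val = split_records(range(100))
print([r["url"] for r in kept], len(train), len(val))
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the split reproducible even if other code draws random numbers first.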

configs/model/diffusion.yaml ADDED

	@@ -0,0 +1,19 @@

+# Diffusion process configuration
+diffusion:
+  beta_schedule: "scaled_linear"
+  beta_start: 0.00085
+  beta_end: 0.012
+  num_train_timesteps: 1000
+  num_inference_timesteps: 50
+  # Loss function
+  loss_type: "l2"  # or "l1"
+  snr_gamma: null
+  # Sampler
+  sampler_type: "ddim"  # or "dpm++_2m"
+  prediction_type: "epsilon"
+  # Training parameters
+  vae_scale_factor: 0.18215
+  offset_noise_strength: 0.1
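With `beta_schedule: "scaled_linear"` and these `beta_start`/`beta_end` values (the same ones Stable Diffusion v1 uses), the square roots of the two endpoints are interpolated linearly over `num_train_timesteps` steps and then squared. The sketch below computes that schedule and the cumulative product alpha_bar_t = prod(1 - beta_s) in plain Python. It assumes `DiffusionProcess` follows the same definition as Hugging Face Diffusers; the repository source shown here does not confirm that.

```python
import math


def scaled_linear_betas(beta_start: float, beta_end: float, num_steps: int) -> list[float]:
    """betas = linspace(sqrt(beta_start), sqrt(beta_end), num_steps) ** 2"""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    if num_steps == 1:
        return [beta_start]
    step = (hi - lo) / (num_steps - 1)
    return [(lo + i * step) ** 2 for i in range(num_steps)]


def cumulative_alphas(betas: list[float]) -> list[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s).

    The noised latent is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    """
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


# Values from diffusion.yaml
betas = scaled_linear_betas(0.00085, 0.012, 1000)
abar = cumulative_alphas(betas)
print(f"beta_1={betas[0]:.5f}  beta_T={betas[-1]:.5f}  alpha_bar_T={abar[-1]:.4f}")
```

Because alpha_bar_T ends up well below 1%, the last training step is close to pure noise. That is the regime in which samplers such as DDIM start generation.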

configs/model/unet_light.yaml ADDED

	@@ -0,0 +1,22 @@

+# Lightweight UNet configuration
+model:
+  in_channels: 4  # Number of latent-space channels
+  out_channels: 4
+  base_channels: 64
+  channel_mults: [1, 2, 4, 8]  # 4 downsampling stages
+  num_res_blocks: 2
+  attention_resolutions: [8]  # Apply attention only at the lowest resolution
+  dropout: 0.0
+  use_checkpoint: true
+  num_heads: 4
+  # Text conditioning
+  context_dim: 768  # CLIP text embedding dimension
+  use_linear_projection: true
+  # Timestep embedding
+  time_embed_dim: 256
+  # Optimization settings
+  use_flash_attention: false  # Not supported on the P4, but the option is kept
+  gradient_checkpointing: true
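`base_channels` and `channel_mults` determine the width of each UNet level. The sketch below lists the per-level channel counts and spatial sizes for a 64×64 latent (the README's 4×64×64 input). It assumes a 2× downsample between consecutive levels but not after the last one, and that `attention_resolutions` lists spatial sizes rather than downsampling factors. Both are assumptions about `src/models/unet_light.py`, which is not shown. Under these assumptions only three downsampling ops occur, which is why the lowest level is 8×8. If the network really downsampled four times, the lowest resolution would be 4×4 instead.

```python
def unet_levels(base_channels, channel_mults, latent_size, attention_resolutions):
    """Describe each UNet level as {channels, size, attention}.

    Assumes a 2x downsample between consecutive levels (none after the last),
    and that attention_resolutions holds spatial sizes, not factors.
    """
    levels = []
    size = latent_size
    for i, mult in enumerate(channel_mults):
        levels.append({
            "channels": base_channels * mult,
            "size": size,
            "attention": size in attention_resolutions,
        })
        if i < len(channel_mults) - 1:
            size //= 2  # downsample before the next level
    return levels


# Values from unet_light.yaml; 64 is the latent side length of a 512x512 image.
for level in unet_levels(64, [1, 2, 4, 8], 64, [8]):
    print(level)
```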

configs/training/p4_optimized.yaml ADDED

	@@ -0,0 +1,69 @@

+# P4-optimized training configuration
+hardware:
+  gpu_memory: 8  # GB
+  batch_size: 1  # Actual per-step batch size
+  gradient_accumulation_steps: 8
+  num_workers: 2
+  pin_memory: true
+  # VRAM optimization strategies
+  mixed_precision: "fp16"
+  gradient_checkpointing: true
+  attention_slicing: "auto"
+  cpu_offload: true
+  tiled_vae: false  # Set to true to enable tiled VAE decoding
+  # Dynamic VRAM management
+  memory_threshold_gb: 6.5
+  warning_threshold_gb: 6.0
+  cleanup_frequency: 100
+training:
+  max_epochs: 50
+  learning_rate: 1.0e-4  # keep the decimal point: PyYAML (YAML 1.1) loads 1e-4 as a string
+  learning_rate_scheduler: "cosine"
+  warmup_steps: 1000
+  weight_decay: 0.01
+  adam_beta1: 0.9
+  adam_beta2: 0.999
+  adam_epsilon: 1.0e-8
+  # Training strategy
+  gradient_clip: 1.0
+  use_ema: true
+  ema_decay: 0.9999
+  save_checkpoint_every: 1000
+  save_best_model: true
+  # Validation and monitoring
+  validation_steps: 500
+  sample_steps: 500
+  log_steps: 50
+  # Optimizer state
+  optimizer_on_cpu: true
+data:
+  resolution: 512
+  center_crop: true
+  random_flip: true
+  cache_dataset: true
+  # Data augmentation
+  augmentation:
+    random_crop: true
+    color_jitter: 0.05
+    random_rotation: 5.0  # degrees
+logging:
+  use_wandb: false
+  use_tensorboard: true
+  log_dir: "./logs"
+  project_name: "lumina"
+  run_name: "lumina-v0.1"
+checkpoint:
+  save_dir: "./checkpoints"
+  keep_last: 5
+  save_compressed: true
+  save_onnx: false
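With `batch_size: 1` and `gradient_accumulation_steps: 8`, each optimizer update averages gradients over 8 samples. The arithmetic below turns that into step counts. The sample count is hypothetical: `max_samples` (2,000,000) times the 0.95 train split, assuming every sample survives filtering. Counting a trailing, partially filled accumulation window as a full step is also an assumption about how the trainer behaves.

```python
import math


def training_steps(num_samples: int, batch_size: int, grad_accum: int, epochs: int) -> dict:
    """Effective batch size and optimizer-step counts under gradient accumulation."""
    micro_batches = math.ceil(num_samples / batch_size)      # forward/backward passes per epoch
    steps_per_epoch = math.ceil(micro_batches / grad_accum)  # optimizer.step() calls per epoch
    return {
        "effective_batch_size": batch_size * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * epochs,
    }


# Hypothetical dataset size: max_samples x train split; epochs = max_epochs.
stats = training_steps(num_samples=1_900_000, batch_size=1, grad_accum=8, epochs=50)
print(stats)
```

Whether `warmup_steps`, `save_checkpoint_every` and `validation_steps` count optimizer steps or micro-batches depends on the trainer, and the two readings differ by a factor of 8 here.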

configs/training/schedule_256.yaml ADDED

	@@ -0,0 +1,33 @@

+# 256x256 training schedule
+phases:
+  - name: "phase1_warmup"
+    epochs: 5
+    resolution: 256
+    learning_rate: 1.0e-5
+    batch_size: 1
+    gradient_accumulation: 8
+    description: "Warm-up phase at low resolution"
+  - name: "phase2_main"
+    epochs: 20
+    resolution: 256
+    learning_rate: 1.0e-4
+    batch_size: 1
+    gradient_accumulation: 8
+    description: "Main training phase"
+  - name: "phase3_refine"
+    epochs: 10
+    resolution: 256
+    learning_rate: 5.0e-5
+    batch_size: 1
+    gradient_accumulation: 8
+    description: "Fine-tuning"
+  - name: "phase4_upscale"
+    epochs: 15
+    resolution: 512
+    learning_rate: 2.0e-5
+    batch_size: 1
+    gradient_accumulation: 4
+    description: "Scale up to 512 resolution"
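The four phases add up to 50 epochs, the same value as `max_epochs` in `p4_optimized.yaml`. The last phase halves the accumulation, so its effective batch drops from 8 to 4. The sketch below checks this mechanically. It hard-codes the phase values (a real script would load the YAML) and assumes the VAE downsamples by 8, as the README's 4×64×64 latent input for 512-pixel images implies.

```python
# Phase table copied from schedule_256.yaml (learning rates omitted).
PHASES = [
    {"name": "phase1_warmup", "epochs": 5, "resolution": 256, "batch_size": 1, "gradient_accumulation": 8},
    {"name": "phase2_main", "epochs": 20, "resolution": 256, "batch_size": 1, "gradient_accumulation": 8},
    {"name": "phase3_refine", "epochs": 10, "resolution": 256, "batch_size": 1, "gradient_accumulation": 8},
    {"name": "phase4_upscale", "epochs": 15, "resolution": 512, "batch_size": 1, "gradient_accumulation": 4},
]
VAE_DOWNSAMPLE = 8  # assumption: 512 px -> 64 latent, matching the README's 4x64x64 input


def summarize(phases):
    """Return total epochs and rows of (name, start_epoch, end_epoch, latent_size, effective_batch)."""
    rows, epoch = [], 0
    for p in phases:
        start, epoch = epoch, epoch + p["epochs"]
        rows.append((p["name"], start, epoch,
                     p["resolution"] // VAE_DOWNSAMPLE,
                     p["batch_size"] * p["gradient_accumulation"]))
    return epoch, rows


total_epochs, rows = summarize(PHASES)
print("total epochs:", total_epochs)
for row in rows:
    print(row)
```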

data/laion/dataset_info.json ADDED

	@@ -0,0 +1,14 @@

+{
+  "total_samples": 10,
+  "columns": [
+    "url",
+    "caption",
+    "aesthetic_score",
+    "watermark_prob",
+    "NSFW",
+    "image_file"
+  ],
+  "dataset": "dummy",
+  "description": "Dummy test dataset (10 records)",
+  "note": "Contains no actual image files; for testing the code pipeline only"
+}

data/laion/metadata.parquet ADDED

Binary file (4.91 kB).

docs/LAION_DATASET_GUIDE.md ADDED

	@@ -0,0 +1,279 @@

+# LAION Dataset Download Guide
+## Problem description
+Running the code fails with the following error:
+```
+FileNotFoundError: [Errno 2] No such file or directory: './data/laion/metadata.parquet'
+```
+This happens because the project needs the LAION dataset and the dataset files are missing.
+## About LAION
+LAION (Large-scale Artificial Intelligence Open Network) publishes large-scale multimodal datasets, including:
+- **LAION-5B**: 5.85 billion image-text pairs
+- **LAION-Aesthetic**: a high-quality subset filtered by aesthetic score
+- **LAION-400M**: 400 million image-text pairs
+## Solutions
+### Option 1: Test with a dummy dataset (recommended)
+For testing and development you can use a dummy dataset:
+```bash
+# Create a dummy dataset
+python scripts/download_laion.py --dummy --dummy-size 100
+# Or simply run
+python scripts/download_laion.py --dummy
+```
+This creates a metadata file with 100 dummy records for exercising the code pipeline.
+### Option 2: Download the LAION-Aesthetic subset
+LAION-Aesthetic is a high-quality subset filtered by aesthetic score:
+```bash
+# Download the LAION-Aesthetic 6.5+ subset (default)
+python scripts/download_laion.py
+# Download the LAION-Aesthetic 7.0+ subset (higher quality)
+python scripts/download_laion.py --subset "7.0+"
+```
+### Option 3: Download a LAION-5B sample
+```bash
+# Download a sample of 10,000 records
+python scripts/download_laion.py --sample-size 10000
+```
+## Manual download
+### Method 1: Download from Hugging Face
+1. **Open the Hugging Face dataset page**:
+   - LAION-Aesthetic 6.5+: https://huggingface.co/datasets/laion/laion-aesthetic-6.5plus
+   - LAION-Aesthetic 7.0+: https://huggingface.co/datasets/laion/laion-aesthetic-7.0plus
+   - LAION-5B: https://huggingface.co/datasets/laion/laion2b-en
+2. **Download the metadata file**:
+   ```bash
+   # Create the directory
+   mkdir -p data/laion
+   # Download the LAION-Aesthetic 6.5+ metadata
+   wget https://huggingface.co/datasets/laion/laion-aesthetic-6.5plus/resolve/main/data/00000.parquet -O data/laion/metadata.parquet
+   # Or use curl
+   curl -L https://huggingface.co/datasets/laion/laion-aesthetic-6.5plus/resolve/main/data/00000.parquet -o data/laion/metadata.parquet
+   ```
+### Method 2: Use the img2dataset tool
+`img2dataset` is a tool built for downloading LAION-style datasets:
+```bash
+# Install img2dataset
+pip install img2dataset
+# Download a LAION-400M subset
+img2dataset \
+  --url_list "path/to/laion-400m.parquet" \
+  --input_format "parquet" \
+  --url_col "URL" \
+  --caption_col "TEXT" \
+  --output_folder "data/laion/images" \
+  --processes_count 16 \
+  --thread_count 64 \
+  --image_size 512 \
+  --resize_mode "keep_ratio" \
+  --output_format "webdataset"
+```
+### Method 3: Use the official scripts
+LAION provides download tooling:
+```bash
+# Clone the tooling repository
+git clone https://github.com/rom1504/img2dataset.git
+cd img2dataset
+# Show usage
+python -m img2dataset --help
+```
+## Dataset layout
+After downloading, the dataset directory should look like this:
+```
+data/
+└── laion/
+    ├── metadata.parquet          # Metadata file (required)
+    ├── dataset_info.json         # Dataset info file
+    └── images/                   # Image directory (optional)
+        ├── 00000.tar
+        ├── 00001.tar
+        └── ...
+```
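The `.tar` shards under `images/` are the kind img2dataset writes with `--output_format "webdataset"`. In that format, files that share a key, such as `000000.jpg` and `000000.txt`, belong to one sample. The `webdataset` library handles this in practice. The stdlib-only sketch below just shows the grouping rule, for inspecting a shard by hand. The helper names and the demo shard it builds are made up for illustration.

```python
import io
import tarfile


def iter_webdataset_samples(fileobj):
    """Yield (key, {extension: bytes}) for each sample in a WebDataset-style tar.

    Files sharing a key (the path up to the first dot of the file name) form
    one sample; grouping assumes they are stored consecutively in the tar.
    """
    current_key, sample = None, {}
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            directory, _, filename = member.name.rpartition("/")
            stem, dot, ext = filename.partition(".")
            if not dot:
                continue  # skip files without an extension
            key = f"{directory}/{stem}" if directory else stem
            if key != current_key and sample:
                yield current_key, sample
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
    if sample:
        yield current_key, sample


def make_demo_shard() -> io.BytesIO:
    """Build a tiny in-memory shard with two image/caption samples."""
    buf = io.BytesIO()
    files = [("000000.jpg", b"<jpeg bytes>"), ("000000.txt", b"a red fox"),
             ("000001.jpg", b"<jpeg bytes>"), ("000001.txt", b"a snowy hill")]
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


for key, sample in iter_webdataset_samples(make_demo_shard()):
    print(key, sorted(sample), sample["txt"].decode())
```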
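+### Metadata file format
+`metadata.parquet` usually contains the following columns:
+- `url`: image URL
+- `caption` or `text`: image caption text
+- `aesthetic_score`: aesthetic score (specific to LAION-Aesthetic)
+- `watermark_prob`: watermark probability
+- `NSFW`: adult-content flag
+- `width`/`height`: image dimensions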
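The caption column may be called `caption` or `text`, and the img2dataset example above uses `URL` and `TEXT`, so a loader has to resolve column names before reading records. A minimal sketch of that lookup follows. The helper name and the case-insensitive matching are assumptions, not the project's actual loader.

```python
def resolve_metadata_columns(columns):
    """Map the logical fields `url` and `caption` to actual column names.

    Matching is case-insensitive, and `caption` falls back to `text`,
    because metadata files differ (e.g. `url`/`caption` vs. `URL`/`TEXT`).
    """
    by_lower = {c.lower(): c for c in columns}
    candidates = {"url": ("url",), "caption": ("caption", "text")}
    resolved = {}
    for field, names in candidates.items():
        match = next((by_lower[n] for n in names if n in by_lower), None)
        if match is None:
            raise KeyError(f"no column for {field!r} among {list(columns)}")
        resolved[field] = match
    return resolved


print(resolve_metadata_columns(["url", "caption", "aesthetic_score", "NSFW"]))
print(resolve_metadata_columns(["URL", "TEXT", "width", "height"]))
```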
+## Verifying the dataset
+After downloading, check that the dataset is correct:
+```python
+import os
+import pandas as pd
+# Check that the file exists
+metadata_path = "./data/laion/metadata.parquet"
+if os.path.exists(metadata_path):
+    print(f"Metadata file found: {metadata_path}")
+    # Load the file and inspect the first rows
+    df = pd.read_parquet(metadata_path)
+    print(f"Number of records: {len(df)}")
+    print(f"Columns: {list(df.columns)}")
+    print("\nFirst 5 records:")
+    print(df.head())
+else:
+    print(f"Error: file not found: {metadata_path}")
+```
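+## FAQ
+### Issue 1: Slow downloads
+- **Solution**: use a mirror in mainland China or a proxy
+- For Python packages you can try the Tsinghua PyPI mirror (this only speeds up `pip` installs, not dataset downloads): `pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple`
+### Issue 2: Not enough disk space
+- **Solution**:
+  1. Download a smaller subset (e.g. LAION-Aesthetic)
+  2. Use the dummy dataset for testing
+  3. Download only the metadata, not the images
+### Issue 3: Network connectivity problems
+- **Solution**:
+  1. Create a dummy dataset with the `--dummy` flag
+  2. Download small sample files manually
+  3. Use an existing local dataset
+### Issue 4: Errors reading Parquet files
+- **Solution**:
+  ```bash
+  # Install compatible versions of pandas and pyarrow
+  pip install pandas pyarrow fastparquet
+  # Or read the file with dask (quoted so shells like zsh do not expand the brackets)
+  pip install "dask[dataframe]"
+  ```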
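+## Advanced usage
+### Custom datasets
+To use a custom dataset, edit the configuration file:
+```yaml
+# configs/data/laion_filtered.yaml
+dataset:
+  name: "custom-dataset"
+  path: "./data/custom"  # Change the path
+  metadata_file: "./data/custom/metadata.parquet"  # Change the metadata file path
+```
+### Dataset preprocessing
+The project includes a data preprocessing module:
+```python
+from src.data.preprocessing import get_transform
+# Assumes `config` holds the parsed data config (e.g. configs/data/laion_filtered.yaml)
+# Build the data transform
+transform = get_transform(config, mode='train')
+# Create the dataset
+from src.data.dataset import LAIONDataset
+dataset = LAIONDataset(config, transform=transform, split='train')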
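+### Downloading images in bulk
+If you need the actual image files:
+```python
+import os
+from io import BytesIO
+import pandas as pd
+import requests
+from PIL import Image
+output_dir = "./data/laion/images"
+os.makedirs(output_dir, exist_ok=True)  # make sure the target directory exists
+# Read the metadata
+df = pd.read_parquet("./data/laion/metadata.parquet")
+# Download the first N images
+for i, row in df.head(10).iterrows():
+    try:
+        response = requests.get(row['url'], timeout=10)
+        response.raise_for_status()  # treat HTTP errors (404, 403, ...) as failures
+        # JPEG cannot store RGBA or palette images, so convert to RGB first
+        img = Image.open(BytesIO(response.content)).convert("RGB")
+        img.save(os.path.join(output_dir, f"image_{i:06d}.jpg"))
+        print(f"Downloaded: {i}")
+    except Exception as e:
+        print(f"Failed to download {row['url']}: {e}")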
+## Performance tips
+1. **Use caching**: enable dataset caching to speed up training
+   ```yaml
+   preprocessing:
+     use_cache: true
+     cache_dir: "./data/cache"
+   ```
+2. **Parallel data loading**: use multiple workers to load data
+   ```yaml
+   loader:
+     num_workers: 4
+     prefetch_factor: 2
+   ```
+3. **Memory mapping**: for large datasets, use memory-mapped files
+   ```python
+   df = pd.read_parquet("metadata.parquet", memory_map=True)
+   ```
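+## References
+1. [LAION official website](https://laion.ai/)
+2. [LAION-5B paper](https://arxiv.org/abs/2210.08402)
+3. [Hugging Face datasets](https://huggingface.co/datasets/laion)
+4. [img2dataset](https://github.com/rom1504/img2dataset)
+5. [WebDataset format](https://github.com/webdataset/webdataset)
+## Getting help
+If you run into problems:
+1. Check the error message
+2. Look at the log files
+3. Read the project README
+4. Search GitHub Issues for similar problems
+5. Open a new issue to ask for help
+---
+**Note**: Content referenced by the LAION dataset may be protected by copyright; make sure you comply with the applicable terms of use and license requirements.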

requirements.txt ADDED

	@@ -0,0 +1,28 @@

+# Core dependencies
+torch>=2.0.0
+torchvision>=0.15.0
+transformers>=4.30.0
+diffusers>=0.20.0
+accelerate>=0.21.0
+# Data processing
+Pillow>=10.0.0
+numpy>=1.24.0
+pandas>=2.0.0
+pyarrow>=12.0.0
+# Training tools
+wandb>=0.15.0
+tqdm>=4.65.0
+matplotlib>=3.7.0
+tensorboard>=2.13.0
+# API and deployment
+gradio>=3.41.0
+fastapi>=0.100.0
+uvicorn>=0.23.0
+# Development tools
+black>=23.7.0
+flake8>=6.0.0
+isort>=5.12.0

scripts/benchmark.py ADDED

	@@ -0,0 +1,490 @@

+#!/usr/bin/env python3
+"""
+Performance benchmark script
+Measures model training and inference performance
+"""
+import os
+import sys
+import time
+import argparse
+import yaml
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+import numpy as np
+from tqdm import tqdm
+# Add the project root directory (as an absolute path) to the Python path
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from src.models.unet_light import UNetLight
+from src.models.diffusion import DiffusionProcess
+from src.data.dataset import create_data_loaders
+from src.inference.optimization import InferenceBenchmark
+from src.inference.sampler import DDIMSampler
+def load_config(config_path: str) -> dict:
+    """Load a YAML configuration file"""
+    with open(config_path, 'r', encoding='utf-8') as f:
+        config = yaml.safe_load(f)
+    return config
+def benchmark_training(config: dict):
+    """Benchmark training performance"""
+    print("=" * 60)
+    print("Training performance benchmark")
+    print("=" * 60)
+    # Device
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    # Load the model config
+    model_config = load_config('configs/model/unet_light.yaml')
+    # Build the model
+    model = UNetLight(model_config).to(device)
+    # Build the diffusion process
+    diffusion_config = load_config('configs/model/diffusion.yaml')
+    diffusion = DiffusionProcess(diffusion_config)
+    # Build the data loaders
+    data_config = load_config('configs/data/laion_filtered.yaml')
+    train_loader, _ = create_data_loaders(data_config)
+    # Optimizer
+    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
+    # Warm-up (gradients are computed and discarded; weights are not updated)
+    print("Warming up...")
+    warmup_batches = 5
+    for i, batch in enumerate(train_loader):
+        if i >= warmup_batches:
+            break
+        images = batch['images'].to(device)
+        text_embeddings = batch['text_embeddings'].to(device)
+        # Forward pass
+        loss = diffusion.compute_loss(model, images, text_embeddings)
+        # Backward pass
+        loss.backward()
+        optimizer.zero_grad()
+    # Synchronize
+    if torch.cuda.is_available():
+        torch.cuda.synchronize()
+    # Benchmark
+    print("Running training benchmark...")
+    num_batches = 20
+    batch_times = []
+    memory_usage = []
+    model.train()
+    for i, batch in enumerate(train_loader):
+        if i >= num_batches:
+            break
+        images = batch['images'].to(device)
+        text_embeddings = batch['text_embeddings'].to(device)
+        # Make sure pending GPU work (e.g. the host-to-device copies) is not timed
+        if torch.cuda.is_available():
+            torch.cuda.synchronize()
+        # Start timing (perf_counter is monotonic and high-resolution)
+        start_time = time.perf_counter()
+        # Forward pass
+        loss = diffusion.compute_loss(model, images, text_embeddings)
+        # Backward pass and optimizer step
+        loss.backward()
+        optimizer.step()
+        optimizer.zero_grad()
+        # Wait for the GPU kernels to finish before stopping the timer
+        if torch.cuda.is_available():
+            torch.cuda.synchronize()
+        # Stop timing
+        batch_time = time.perf_counter() - start_time
+        batch_times.append(batch_time)
+        # Record memory usage
+        if torch.cuda.is_available():
+            memory_allocated = torch.cuda.memory_allocated() / 1024**3
+            memory_usage.append(memory_allocated)
+        # Progress
+        print(f"Batch {i+1}/{num_batches}: {batch_time:.3f}s")
+    # Statistics
+    batch_times = np.array(batch_times)
+    print("\n" + "=" * 60)
+    print("Training benchmark results:")
+    print(f"  Mean batch time: {batch_times.mean():.3f} ± {batch_times.std():.3f} s")
+    print(f"  Min batch time: {batch_times.min():.3f} s")
+    print(f"  Max batch time: {batch_times.max():.3f} s")
+    print(f"  Throughput: {1 / batch_times.mean():.2f} batches/s")
+    if memory_usage:
+        memory_usage = np.array(memory_usage)
+        print(f"  Mean GPU memory allocated: {memory_usage.mean():.2f} ± {memory_usage.std():.2f} GB")
+    print("=" * 60)
+    return batch_times.mean()
+def benchmark_inference(config: dict):
+    """Benchmark inference performance"""
+    print("\n" + "=" * 60)
+    print("Inference performance benchmark")
+    print("=" * 60)
+    # Device
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    # Load the model config
+    model_config = load_config('configs/model/unet_light.yaml')
+    # Build the model
+    model = UNetLight(model_config).to(device)
+    model.eval()
+    # Build the diffusion process
+    diffusion_config = load_config('configs/model/diffusion.yaml')
+    diffusion = DiffusionProcess(diffusion_config)
+    # Build the benchmark helper
+    benchmark = InferenceBenchmark(model, device)
+    # Test several resolutions
+    resolutions = [(256, 256), (512, 512), (768, 768)]
+    results = {}
+    for height, width in resolutions:
+        print(f"\nTesting resolution: {width}x{height}")
+        # Latent-space size (pixel size divided by the VAE factor of 8)
+        latent_height = height // 8
+        latent_width = width // 8
+        # Run the benchmark
+        stats = benchmark.benchmark(
+            input_shape=(1, model.in_channels, latent_height, latent_width),
+            num_iterations=10,
+            warmup_iterations=3
+        )
+        results[f"{width}x{height}"] = stats
+    # Print a summary
+    print("\n" + "=" * 60)
+    print("Inference benchmark summary:")
+    for resolution, stats in results.items():
+        print(f"\n  Resolution {resolution}:")
+        print(f"    Mean time: {stats['mean_ms']:.1f} ms")
+        print(f"    FPS: {stats['fps']:.1f}")
+    print("=" * 60)
+    return results
+def benchmark_sampling(config: dict):
+    """Benchmark sampling performance"""
+    print("\n" + "=" * 60)
+    print("Sampling performance benchmark")
+    print("=" * 60)
+    # Device
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    # Load the model config
+    model_config = load_config('configs/model/unet_light.yaml')
+    # Build the model
+    model = UNetLight(model_config).to(device)
+    model.eval()
+    # Build the diffusion process
+    diffusion_config = load_config('configs/model/diffusion.yaml')
+    diffusion = DiffusionProcess(diffusion_config)
+    # Build the sampler
+    sampler = DDIMSampler(model, diffusion, num_inference_steps=50)
+    # Test several step counts
+    step_configs = [20, 30, 50]
+    results = {}
+    for num_steps in step_configs:
+        print(f"\nTesting sampling steps: {num_steps}")
+        # Set the number of sampling steps
+        sampler.set_timesteps(num_steps)
+        # Prepare inputs (random stand-in text embeddings: 77 tokens x 768 dims)
+        prompt_embeds = torch.randn(1, 77, 768, device=device)
+        # Warm-up
+        print("  Warming up...")
+        with torch.no_grad():
+            for _ in range(3):
+                _ = sampler.sample(
+                    prompt_embeds=prompt_embeds,
+                    height=512,
+                    width=512,
+                    progress_bar=False
+                )
+        # Benchmark
+        print("  Running benchmark...")
+        times = []
+        for i in range(5):
+            # Drain queued GPU work so it is not attributed to this iteration
+            if torch.cuda.is_available():
+                torch.cuda.synchronize()
+            start_time = time.perf_counter()
+            with torch.no_grad():
+                _ = sampler.sample(
+                    prompt_embeds=prompt_embeds,
+                    height=512,
+                    width=512,
+                    progress_bar=False
+                )
+            if torch.cuda.is_available():
+                torch.cuda.synchronize()
+            end_time = time.perf_counter()
+            times.append(end_time - start_time)
+            print(f"    Iteration {i+1}: {times[-1]:.2f}s")
+        # Statistics
+        times = np.array(times)
+        results[num_steps] = {
+            'mean_time': times.mean(),
+            'std_time': times.std(),
+            'fps': 1 / times.mean()
+        }
+    # Print a summary
+    print("\n" + "=" * 60)
+    print("Sampling benchmark summary:")
+    for num_steps, stats in results.items():
+        print(f"\n  Sampling steps {num_steps}:")
+        print(f"    Mean time: {stats['mean_time']:.2f} ± {stats['std_time']:.2f} s")
+        print(f"    FPS: {stats['fps']:.2f}")
+    print("=" * 60)
+    return results
+def benchmark_memory(config: dict):
+    """Benchmark GPU memory usage"""
+    print("\n" + "=" * 60)
+    print("Memory usage benchmark")
+    print("=" * 60)
+    if not torch.cuda.is_available():
+        print("GPU not available; skipping memory benchmark")
+        return {}
+    # Device
+    device = torch.device('cuda')
+    # Load the model config
+    model_config = load_config('configs/model/unet_light.yaml')
+    # Test several batch sizes
+    batch_sizes = [1, 2, 4, 8]
+    results = {}
+    for batch_size in batch_sizes:
+        print(f"\nTesting batch size: {batch_size}")
+        # Build the model
+        model = UNetLight(model_config).to(device)
+        model.eval()
+        # Prepare inputs
+        input_shape = (batch_size, model.in_channels, 64, 64)
+        x = torch.randn(*input_shape, device=device)
+        t = torch.tensor([500] * batch_size, device=device)
+        context = torch.randn(batch_size, 77, 768, device=device)
+        # Free cached blocks and reset the peak counter; otherwise
+        # max_memory_allocated() would report the peak of an earlier batch size
+        torch.cuda.empty_cache()
+        torch.cuda.reset_peak_memory_stats()
+        # Record the baseline (model weights + inputs)
+        initial_memory = torch.cuda.memory_allocated()
+        # Forward pass
+        with torch.no_grad():
+            _ = model(x, t, context)
+        # Record peak memory
+        peak_memory = torch.cuda.max_memory_allocated()
+        current_memory = torch.cuda.memory_allocated()
+        # Memory used by the forward pass
+        memory_used = peak_memory - initial_memory
+        results[batch_size] = {
+            'initial_memory_gb': initial_memory / 1024**3,
+            'peak_memory_gb': peak_memory / 1024**3,
+            'current_memory_gb': current_memory / 1024**3,
+            'memory_used_gb': memory_used / 1024**3,
+            'memory_per_sample_gb': memory_used / (batch_size * 1024**3)
+        }
+        print(f"  Initial memory: {initial_memory / 1024**3:.2f} GB")
+        print(f"  Peak memory: {peak_memory / 1024**3:.2f} GB")
+        print(f"  Current memory: {current_memory / 1024**3:.2f} GB")
+        print(f"  Memory used: {memory_used / 1024**3:.2f} GB")
+        print(f"  Memory per sample: {memory_used / (batch_size * 1024**3):.2f} GB")
+        # Clean up (release the inputs and output too, so they do not
+        # inflate the baseline of the next batch size)
+        del model, x, t, context, _
+        torch.cuda.empty_cache()
+    # Print a summary
+    print("\n" + "=" * 60)
+    print("Memory benchmark summary:")
+    for batch_size, stats in results.items():
+        print(f"\n  Batch size {batch_size}:")
+        print(f"    Total memory used: {stats['memory_used_gb']:.2f} GB")
+        print(f"    Memory per sample: {stats['memory_per_sample_gb']:.2f} GB")
+    print("=" * 60)
+    return results
+def generate_report(results: dict, output_file: str = "benchmark_report.md"):
+    """Write the benchmark report as Markdown"""
+    print(f"\nGenerating report: {output_file}")
+    with open(output_file, 'w', encoding='utf-8') as f:
+        f.write("# Lumina Performance Benchmark Report\n\n")
+        f.write(f"Generated: {time.strftime('%Y-%m-%d %H:%M:%S')}\n\n")
+        f.write("## System information\n")
+        f.write(f"- PyTorch version: {torch.__version__}\n")
+        f.write(f"- CUDA available: {torch.cuda.is_available()}\n")
+        if torch.cuda.is_available():
+            f.write(f"- GPU: {torch.cuda.get_device_name(0)}\n")
+            f.write(f"- CUDA version: {torch.version.cuda}\n")
+        # Note: os.sysconf is only available on Unix-like systems
+        f.write(f"- System memory: {os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 1024**3:.1f} GB\n\n")
+        if 'training' in results:
+            f.write("## Training performance\n")
+            f.write(f"- Mean batch time: {results['training']:.3f} s\n")
+            f.write(f"- Throughput: {1/results['training']:.2f} batches/s\n\n")
+        if 'inference' in results:
+            f.write("## Inference performance\n")
+            for resolution, stats in results['inference'].items():
+                f.write(f"### Resolution {resolution}\n")
+                f.write(f"- Mean inference time: {stats['mean_ms']:.1f} ms\n")
+                f.write(f"- FPS: {stats['fps']:.1f}\n\n")
+        if 'sampling' in results:
+            f.write("## Sampling performance\n")
+            for num_steps, stats in results['sampling'].items():
+                f.write(f"### Sampling steps {num_steps}\n")
+                f.write(f"- Mean sampling time: {stats['mean_time']:.2f} s\n")
+                f.write(f"- FPS: {stats['fps']:.2f}\n\n")
+        if 'memory' in results:
+            f.write("## Memory usage\n")
+            for batch_size, stats in results['memory'].items():
+                f.write(f"### Batch size {batch_size}\n")
+                f.write(f"- Total memory used: {stats['memory_used_gb']:.2f} GB\n")
+                f.write(f"- Memory per sample: {stats['memory_per_sample_gb']:.2f} GB\n\n")
+        f.write("## Recommendations\n")
+        f.write("1. Choose a batch size that fits in GPU memory\n")
+        f.write("2. At inference time, choose a step count that balances quality and speed\n")
+        f.write("3. During training, use gradient accumulation to emulate larger batches\n")
+    print(f"Report saved: {output_file}")
+def main():
+    """Entry point"""
+    parser = argparse.ArgumentParser(description="Lumina performance benchmarks")
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="configs/training/p4_optimized.yaml",
+        help="Path to the configuration file"
+    )
+    parser.add_argument(
+        "--test",
+        type=str,
+        nargs="+",
+        default=['all'],
+        choices=['training', 'inference', 'sampling', 'memory', 'all'],
+        help="Benchmarks to run"
+    )
+    parser.add_argument(
+        "--output",
+        type=str,
+        default="benchmark_report.md",
+        help="Path of the output report"
+    )
+    args = parser.parse_args()
+    # Load the configuration
+    config = load_config(args.config)
+    # Run the benchmarks
+    results = {}
+    if 'all' in args.test or 'training' in args.test:
+        try:
+            results['training'] = benchmark_training(config)
+        except Exception as e:
+            print(f"Training benchmark failed: {e}")
+    if 'all' in args.test or 'inference' in args.test:
+        try:
+            results['inference'] = benchmark_inference(config)
+        except Exception as e:
+            print(f"Inference benchmark failed: {e}")
+    if 'all' in args.test or 'sampling' in args.test:
+        try:
+            results['sampling'] = benchmark_sampling(config)
+        except Exception as e:
+            print(f"Sampling benchmark failed: {e}")
+    if 'all' in args.test or 'memory' in args.test:
+        try:
+            results['memory'] = benchmark_memory(config)
+        except Exception as e:
+            print(f"Memory benchmark failed: {e}")
+    # Generate the report
+    generate_report(results, args.output)
+    generate_report(results, args.output)
+if __name__ == "__main__":
+    main()
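`benchmark_inference` reads `mean_ms` and `fps` from `InferenceBenchmark.benchmark()`, whose implementation in `src/inference/optimization.py` is not part of this view. The hypothetical helper below produces a dict with those keys using only the standard library. It runs warm-up calls, brackets each timed call with an optional `sync` barrier because CUDA kernels launch asynchronously, and uses the monotonic `time.perf_counter` clock. It is a sketch of the timing pattern, not the project's class.

```python
import statistics
import time


def benchmark(fn, num_iterations=10, warmup_iterations=3, sync=None):
    """Time `fn()` and return {'mean_ms', 'std_ms', 'fps'}.

    `sync` is an optional barrier such as torch.cuda.synchronize; without it,
    the timer may stop before asynchronously launched GPU work has finished.
    """
    for _ in range(warmup_iterations):
        fn()
    times = []
    for _ in range(num_iterations):
        if sync is not None:
            sync()  # drain queued work so it is not attributed to this iteration
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this iteration's work to finish
        times.append(time.perf_counter() - start)
    mean_s = statistics.mean(times)
    return {
        "mean_ms": mean_s * 1000,
        "std_ms": statistics.pstdev(times) * 1000,
        "fps": 1.0 / mean_s if mean_s > 0 else float("inf"),
    }


calls = []
stats = benchmark(lambda: calls.append(sum(range(10_000))), num_iterations=5, warmup_iterations=2)
print(f"{stats['mean_ms']:.3f} ms, {stats['fps']:.1f} it/s, {len(calls)} calls")
```

On a GPU you would pass `sync=torch.cuda.synchronize`. On CPU the barrier can be left out.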

scripts/download_laion.py ADDED

	@@ -0,0 +1,272 @@

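+#!/usr/bin/env python3
+"""
+LAION dataset download script
+This script helps download different versions of the LAION dataset.
+LAION is large; using it usually means downloading both the metadata files and the image files.
+Note: the full LAION dataset is extremely large (several TB), so downloading a subset or reusing an existing cache is recommended.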
+"""
+import os
+import argparse
+import subprocess
+import pandas as pd
+from pathlib import Path
+import requests
+import json
+from tqdm import tqdm
+import sys
+def download_file(url, output_path, chunk_size=8192):
+    """Download a file while showing a progress bar"""
+    response = requests.get(url, stream=True, timeout=30)  # avoid hanging forever on a stalled connection
+    response.raise_for_status()
+    total_size = int(response.headers.get('content-length', 0))
+    with open(output_path, 'wb') as f, tqdm(
+        desc=os.path.basename(output_path),
+        total=total_size,
+        unit='B',
+        unit_scale=True,
+        unit_divisor=1024,
+    ) as pbar:
+        for chunk in response.iter_content(chunk_size=chunk_size):
+            f.write(chunk)
+            pbar.update(len(chunk))
+    return output_path
+def download_laion_aesthetic(output_dir="./data/laion", subset="6.5+"):
+    """
+    Download the LAION-Aesthetic dataset
+    Args:
+        output_dir: output directory
+        subset: subset version, either "6.5+" (score 6.5 and above) or "7.0+" (score 7.0 and above)
+    """
+    os.makedirs(output_dir, exist_ok=True)
+    # LAION-Aesthetic dataset info
+    datasets = {
+        "6.5+": {
+            "metadata": "https://huggingface.co/datasets/laion/laion-aesthetic-6.5plus/resolve/main/data/00000.parquet",
+            "description": "LAION-Aesthetic 6.5+ (aesthetic score 6.5 and above)"
+        },
+        "7.0+": {
+            "metadata": "https://huggingface.co/datasets/laion/laion-aesthetic-7.0plus/resolve/main/data/00000.parquet",
+            "description": "LAION-Aesthetic 7.0+ (aesthetic score 7.0 and above)"
+        }
+    }
+    if subset not in datasets:
+        print(f"Error: unsupported subset {subset}")
+        print(f"Available subsets: {list(datasets.keys())}")
+        return False
+    dataset_info = datasets[subset]
+    print(f"Downloading {dataset_info['description']}")
+    # Download the metadata file
+    metadata_url = dataset_info["metadata"]
+    metadata_path = os.path.join(output_dir, "metadata.parquet")
+    print(f"Downloading metadata file to: {metadata_path}")
+    try:
+        download_file(metadata_url, metadata_path)
+        print(f"Metadata file downloaded: {metadata_path}")
+        # Validate the file
+        df = pd.read_parquet(metadata_path)
+        print(f"Metadata contains {len(df)} records")
+        print(f"Columns: {list(df.columns)}")
+        # Save dataset info
+        sample_info = {
+            "total_samples": len(df),
+            "columns": list(df.columns),
+            "subset": subset,
+            "description": dataset_info["description"]
+        }
+        with open(os.path.join(output_dir, "dataset_info.json"), "w") as f:
+            json.dump(sample_info, f, indent=2)
+        print(f"Dataset info saved to: {os.path.join(output_dir, 'dataset_info.json')}")
+        return True
+    except Exception as e:
+        print(f"Download failed: {e}")
+        return False
+def download_laion_5b_sample(output_dir="./data/laion", num_samples=10000):
+    """
+    Download a sample of the LAION-5B dataset metadata.
+    Args:
+        output_dir: output directory
+        num_samples: number of records to keep
+    """
+    os.makedirs(output_dir, exist_ok=True)
+    print(f"Downloading LAION-5B metadata sample ({num_samples} records)")
+    # Example LAION-5B shard URL
+    # Note: the full dataset has tens of thousands of shards; only one sample shard is downloaded here
+    sample_shard = "https://huggingface.co/datasets/laion/laion2b-en/resolve/main/part-00000-5b54c5d5-bbcf-484d-a2ce-0d6f73df1a36-c000.snappy.parquet"
+    metadata_path = os.path.join(output_dir, "metadata_sample.parquet")
+    print(f"Downloading sample shard to: {metadata_path}")
+    try:
+        download_file(sample_shard, metadata_path)
+        print(f"Sample shard downloaded: {metadata_path}")
+        # Read the shard and subsample it
+        df = pd.read_parquet(metadata_path)
+        original_samples = len(df)  # record before subsampling instead of re-reading the file later
+        if len(df) > num_samples:
+            df = df.sample(num_samples, random_state=42)
+        # Save the subsampled metadata
+        sampled_path = os.path.join(output_dir, "metadata.parquet")
+        df.to_parquet(sampled_path)
+        print(f"Sampled data saved to: {sampled_path}")
+        print(f"Sample contains {len(df)} records")
+        print(f"Columns: {list(df.columns)}")
+        # Save summary information about the dataset
+        sample_info = {
+            "total_samples": len(df),
+            "original_samples": original_samples,
+            "columns": list(df.columns),
+            "dataset": "LAION-5B-sample",
+            "description": f"LAION-5B metadata sample ({len(df)} records)"
+        }
+        with open(os.path.join(output_dir, "dataset_info.json"), "w") as f:
+            json.dump(sample_info, f, indent=2)
+        print(f"Dataset info saved to: {os.path.join(output_dir, 'dataset_info.json')}")
+        return True
+    except Exception as e:
+        print(f"Download failed: {e}")
+        return False
+def create_dummy_dataset(output_dir="./data/laion", num_samples=100):
+    """
+    Create a dummy dataset for testing.
+    Args:
+        output_dir: output directory
+        num_samples: number of records
+    """
+    os.makedirs(output_dir, exist_ok=True)
+    print(f"Creating dummy dataset ({num_samples} records)")
+    import numpy as np
+    # Build dummy metadata
+    data = {
+        'url': [f'https://example.com/image_{i}.jpg' for i in range(num_samples)],
+        'caption': [f'A beautiful image number {i}' for i in range(num_samples)],
+        'aesthetic_score': np.random.uniform(5.0, 9.0, num_samples),
+        'watermark_prob': np.random.uniform(0.0, 1.0, num_samples),
+        'NSFW': ['UNLIKELY'] * num_samples,
+        'image_file': [f'image_{i:06d}.jpg' for i in range(num_samples)]
+    }
+    df = pd.DataFrame(data)
+    # Save the metadata
+    metadata_path = os.path.join(output_dir, "metadata.parquet")
+    df.to_parquet(metadata_path)
+    print(f"Dummy metadata created: {metadata_path}")
+    print(f"Contains {len(df)} records")
+    print(f"Columns: {list(df.columns)}")
+    # Create an (empty) image directory
+    images_dir = os.path.join(output_dir, "images")
+    os.makedirs(images_dir, exist_ok=True)
+    print(f"Dummy image directory: {images_dir}")
+    print("Note: the dummy dataset contains no actual image files; it only exercises the code path")
+    # Save dataset information
+    sample_info = {
+        "total_samples": len(df),
+        "columns": list(df.columns),
+        "dataset": "dummy",
+        "description": f"Dummy test dataset ({num_samples} records)",
+        "note": "Contains no actual image files; for testing the code path only"
+    }
+    with open(os.path.join(output_dir, "dataset_info.json"), "w") as f:
+        json.dump(sample_info, f, indent=2)
+    print(f"Dataset info saved to: {os.path.join(output_dir, 'dataset_info.json')}")
+    return True
+def main():
+    parser = argparse.ArgumentParser(description="Download the LAION dataset")
+    parser.add_argument("--output-dir", default="./data/laion", help="Output directory")
+    parser.add_argument("--subset", default="6.5+", choices=["6.5+", "7.0+"],
+                       help="LAION-Aesthetic subset version")
+    parser.add_argument("--sample-size", type=int, default=10000,
+                       help="LAION-5B sample size")
+    parser.add_argument("--dummy", action="store_true",
+                       help="Create a dummy dataset for testing")
+    parser.add_argument("--dummy-size", type=int, default=100,
+                       help="Dummy dataset size")
+    args = parser.parse_args()
+    print("=" * 60)
+    print("LAION dataset download tool")
+    print("=" * 60)
+    if args.dummy:
+        print("\nDummy dataset mode...")
+        success = create_dummy_dataset(args.output_dir, args.dummy_size)
+    else:
+        print("\nDownloading the LAION-Aesthetic dataset...")
+        print(f"Output directory: {args.output_dir}")
+        print(f"Subset: {args.subset}")
+        print("\nNotes:")
+        print("1. LAION datasets are large; downloading takes time and storage space")
+        print("2. Metadata files are typically a few hundred MB to several GB")
+        print("3. Image files must be downloaded separately")
+        print("4. Test the code path with the dummy dataset first")
+        response = input("\nContinue? (y/n): ")
+        if response.lower() != 'y':
+            print("Download cancelled")
+            return
+        success = download_laion_aesthetic(args.output_dir, args.subset)
+    if success:
+        print("\n" + "=" * 60)
+        print("Download complete!")
+        print("=" * 60)
+        print("\nNext steps:")
+        print("1. Check the downloaded files:")
+        print(f"   ls -lh {args.output_dir}/")
+        print("2. Test loading the dataset:")
+        print("   python -c \"import pandas as pd; df=pd.read_parquet('{}'); print('Records:', len(df))\"".format(
+            os.path.join(args.output_dir, "metadata.parquet")))
+        print("3. Run the test script:")
+        print("   python src/data/dataset.py")
+    else:
+        print("\nDownload failed; check the error messages above")
+if __name__ == "__main__":
+    main()
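The `download_file` loop streams the response in fixed-size chunks and advances the progress bar by each chunk's length, so the whole file never has to fit in memory. A minimal stdlib-only sketch of that pattern, with an in-memory file standing in for the HTTP response body; `copy_in_chunks` and its `on_progress` callback are hypothetical names, not part of this repository:

```python
import io
from typing import BinaryIO, Callable, Optional


def copy_in_chunks(
    src: BinaryIO,
    dst: BinaryIO,
    chunk_size: int = 8192,
    on_progress: Optional[Callable[[int], None]] = None,
) -> int:
    """Copy src to dst chunk by chunk and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        if on_progress is not None:
            on_progress(len(chunk))  # plays the role of pbar.update(len(chunk))
    return total


if __name__ == "__main__":
    payload = b"x" * 20000
    src, dst = io.BytesIO(payload), io.BytesIO()
    updates = []
    written = copy_in_chunks(src, dst, chunk_size=8192, on_progress=updates.append)
    print(written, updates)  # 20000 [8192, 8192, 3616]
```

Passing a tqdm bar's `update` method as `on_progress` would reproduce the progress display; the last chunk is usually shorter than `chunk_size`.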

scripts/export.py ADDED

	@@ -0,0 +1,270 @@

+#!/usr/bin/env python3
+"""
+Model export script
+Exports a trained model to different formats
+"""
+import os
+import sys
+import argparse
+import yaml
+import torch
+import torch.nn as nn
+# Add the project root directory to the Python path
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from src.models.unet_light import UNetLight
+from src.models.diffusion import DiffusionProcess
+from src.inference.optimization import ModelOptimizer, ONNXExporter, optimize_model_for_p4
+def load_config(config_path: str) -> dict:
+    """Load a YAML configuration file"""
+    with open(config_path, 'r') as f:
+        config = yaml.safe_load(f)
+    return config
+def load_model(checkpoint_path: str, config: dict, device: torch.device) -> nn.Module:
+    """Load the model from a checkpoint"""
+    # Load the model configuration
+    model_config_path = config.get('model_config', 'configs/model/unet_light.yaml')
+    model_config = load_config(model_config_path)
+    # Build the model
+    model = UNetLight(model_config)
+    # Load the checkpoint
+    print(f"Loading checkpoint: {checkpoint_path}")
+    checkpoint = torch.load(checkpoint_path, map_location='cpu')
+    # Load the weights (full training checkpoint or bare state dict)
+    if 'model_state_dict' in checkpoint:
+        model.load_state_dict(checkpoint['model_state_dict'])
+    elif 'state_dict' in checkpoint:
+        model.load_state_dict(checkpoint['state_dict'])
+    else:
+        model.load_state_dict(checkpoint)
+    # Move to the target device and switch to inference mode
+    model = model.to(device)
+    model.eval()
+    print("Model loaded")
+    return model
+def export_torchscript(model: nn.Module, output_path: str):
+    """Export the model to TorchScript (via tracing)"""
+    print(f"Exporting to TorchScript: {output_path}")
+    # Build example inputs on the same device and dtype as the model weights
+    param = next(model.parameters())
+    example_input = torch.randn(1, model.in_channels, 64, 64, device=param.device, dtype=param.dtype)
+    example_timestep = torch.tensor([500], device=param.device)
+    example_context = torch.randn(1, 77, 768, device=param.device, dtype=param.dtype)
+    # Trace the model
+    traced_model = torch.jit.trace(
+        model,
+        (example_input, example_timestep, example_context),
+        check_trace=False
+    )
+    # Save
+    traced_model.save(output_path)
+    print(f"TorchScript model saved: {output_path}")
+    return traced_model
+def export_onnx(model: nn.Module, output_path: str, opset_version: int = 14):
+    """Export the model to ONNX"""
+    print(f"Exporting to ONNX: {output_path}")
+    # Build example inputs on the same device and dtype as the model weights
+    param = next(model.parameters())
+    example_input = torch.randn(1, model.in_channels, 64, 64, device=param.device, dtype=param.dtype)
+    example_timestep = torch.tensor([500], device=param.device)
+    example_context = torch.randn(1, 77, 768, device=param.device, dtype=param.dtype)
+    # Mark the batch dimension as dynamic
+    dynamic_axes = {
+        'input': {0: 'batch_size'},
+        'timestep': {0: 'batch_size'},
+        'context': {0: 'batch_size'},
+        'output': {0: 'batch_size'}
+    }
+    # Export
+    torch.onnx.export(
+        model,
+        (example_input, example_timestep, example_context),
+        output_path,
+        input_names=['input', 'timestep', 'context'],
+        output_names=['output'],
+        dynamic_axes=dynamic_axes,
+        opset_version=opset_version,
+        do_constant_folding=True,
+        verbose=False
+    )
+    print(f"ONNX model saved: {output_path}")
+    # Validate the exported ONNX model
+    import onnx
+    onnx_model = onnx.load(output_path)
+    onnx.checker.check_model(onnx_model)
+    print("ONNX model validation passed")
+def export_safetensors(model: nn.Module, output_path: str):
+    """Export the weights in safetensors format"""
+    try:
+        from safetensors.torch import save_file
+        # Convert the state dict to safetensors format
+        state_dict = model.state_dict()
+        save_file(state_dict, output_path)
+        print(f"Safetensors model saved: {output_path}")
+    except ImportError:
+        print("safetensors is not installed; skipping safetensors export")
+        print("Install with: pip install safetensors")
+def optimize_and_export(
+    checkpoint_path: str,
+    output_dir: str,
+    formats: list = None,
+    optimize_for_p4: bool = True
+):
+    """Optionally optimize the model, then export it"""
+    # Avoid a mutable default argument
+    if formats is None:
+        formats = ['torchscript', 'onnx', 'safetensors']
+    # Create the output directory
+    os.makedirs(output_dir, exist_ok=True)
+    # Load the training configuration
+    config_path = "configs/training/p4_optimized.yaml"
+    config = load_config(config_path)
+    # Select the device
+    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+    # Load the model
+    model = load_model(checkpoint_path, config, device)
+    # Optimize the model for the P4
+    if optimize_for_p4:
+        print("Optimizing model for P4...")
+        model = optimize_model_for_p4(model)
+    # Report model statistics
+    total_params = sum(p.numel() for p in model.parameters())
+    model_size_mb = total_params * 4 / 1024**2  # assumes 4 bytes per fp32 parameter
+    print("\nModel info:")
+    print(f"  Parameters: {total_params:,}")
+    print(f"  Model size: {model_size_mb:.2f} MB (fp32)")
+    # Export in each requested format
+    base_name = os.path.splitext(os.path.basename(checkpoint_path))[0]
+    for fmt in formats:
+        if fmt == 'torchscript':
+            output_path = os.path.join(output_dir, f"{base_name}.torchscript.pt")
+            export_torchscript(model, output_path)
+        elif fmt == 'onnx':
+            output_path = os.path.join(output_dir, f"{base_name}.onnx")
+            export_onnx(model, output_path)
+        elif fmt == 'safetensors':
+            output_path = os.path.join(output_dir, f"{base_name}.safetensors")
+            export_safetensors(model, output_path)
+        elif fmt == 'pth':
+            output_path = os.path.join(output_dir, f"{base_name}.pth")
+            torch.save(model.state_dict(), output_path)
+            print(f"PyTorch model saved: {output_path}")
+        else:
+            print(f"Unknown format: {fmt}")
+    print(f"\nAll models exported to: {output_dir}")
+def create_lite_version(model: nn.Module, reduction_factor: float = 0.5) -> nn.Module:
+    """Create a lightweight version (by reducing channel counts)"""
+    # Note: this is a placeholder and must be adapted to the actual model structure
+    print(f"Creating lite version, reduction factor: {reduction_factor}")
+    # The concrete slimming logic belongs here,
+    # e.g. reducing the UNet's channel counts; currently the model is returned unchanged
+    return model
+def main():
+    """Entry point"""
+    parser = argparse.ArgumentParser(description="Export a Lumina model")
+    parser.add_argument(
+        "--checkpoint",
+        type=str,
+        required=True,
+        help="Path to the model checkpoint"
+    )
+    parser.add_argument(
+        "--output-dir",
+        type=str,
+        default="./exported_models",
+        help="Output directory"
+    )
+    parser.add_argument(
+        "--formats",
+        type=str,
+        nargs="+",
+        default=['torchscript', 'onnx'],
+        choices=['torchscript', 'onnx', 'safetensors', 'pth'],
+        help="Export formats"
+    )
+    parser.add_argument(
+        "--optimize",
+        action="store_true",
+        help="Optimize the model for the P4"
+    )
+    parser.add_argument(
+        "--lite",
+        action="store_true",
+        help="Create a lite version"
+    )
+    parser.add_argument(
+        "--lite-factor",
+        type=float,
+        default=0.5,
+        help="Reduction factor for the lite version"
+    )
+    args = parser.parse_args()
+    # Check the input file
+    if not os.path.exists(args.checkpoint):
+        print(f"Error: checkpoint file not found: {args.checkpoint}")
+        return
+    # Optimize and export
+    optimize_and_export(
+        checkpoint_path=args.checkpoint,
+        output_dir=args.output_dir,
+        formats=args.formats,
+        optimize_for_p4=args.optimize
+    )
+if __name__ == "__main__":
+    main()
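`optimize_and_export` estimates model size as parameter count × 4 bytes, which only holds for fp32 weights, and divides by `1024**2`, so the reported "MB" are strictly MiB. A small sketch generalizing the estimate to other storage dtypes; `estimate_size_mb` and the bytes-per-parameter table are illustrative assumptions, and buffers and file-format overhead are ignored:

```python
# Bytes needed to store one parameter in each dtype
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def estimate_size_mb(num_params: int, dtype: str = "fp32") -> float:
    """Approximate weight size in MiB (parameters only; buffers and file overhead ignored)."""
    if dtype not in BYTES_PER_PARAM:
        raise ValueError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 1024**2


if __name__ == "__main__":
    n = 50_000_000  # hypothetical parameter count
    for dtype in ("fp32", "fp16", "int8"):
        print(f"{dtype}: {estimate_size_mb(n, dtype):.2f} MB")
```

Halving the bytes per parameter halves the estimate, which is why fp16 or int8 export is the usual lever when weights must fit a small GPU.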

scripts/train.py ADDED

	@@ -0,0 +1,293 @@

+#!/usr/bin/env python3
+"""
+Lumina training script
+Trains the lightweight image generation model
+"""
+import os
+import sys
+import argparse
+import yaml
+import torch
+import torch.nn as nn
+from torch.utils.data import DataLoader
+import warnings
+# Add the project root directory to the Python path
+sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from src.models.unet_light import UNetLight
+from src.models.diffusion import DiffusionProcess, DiffusionModel
+from src.data.dataset import create_data_loaders
+from src.data.text_encoder import create_text_encoder
+from src.training.trainer_p4 import P4Trainer
+from src.training.memory_manager import MemoryOptimizer
+from src.training.callbacks import create_default_callbacks
+def load_config(config_path: str) -> dict:
+    """Load a YAML configuration file"""
+    with open(config_path, 'r') as f:
+        config = yaml.safe_load(f)
+    return config
+def setup_environment(config: dict):
+    """Set up the training environment"""
+    # Set random seeds
+    seed = config.get('seed', 42)
+    torch.manual_seed(seed)
+    if torch.cuda.is_available():
+        torch.cuda.manual_seed(seed)
+    # Select the device
+    device = config.get('device', 'cuda' if torch.cuda.is_available() else 'cpu')
+    if device == 'cuda' and not torch.cuda.is_available():
+        warnings.warn("CUDA is not available; falling back to CPU")
+        device = 'cpu'
+    # Create the output directory
+    output_dir = config.get('output_dir', './output')
+    os.makedirs(output_dir, exist_ok=True)
+    # Create the log directory
+    log_dir = config.get('log_dir', './logs')
+    os.makedirs(log_dir, exist_ok=True)
+    print("Environment ready:")
+    print(f"  Device: {device}")
+    print(f"  Random seed: {seed}")
+    print(f"  Output directory: {output_dir}")
+    print(f"  Log directory: {log_dir}")
+    return device
+def create_model(config: dict, device: torch.device) -> nn.Module:
+    """Build the model"""
+    # Load the model configuration
+    model_config_path = config.get('model_config', 'configs/model/unet_light.yaml')
+    model_config = load_config(model_config_path)
+    # Build the UNet
+    model = UNetLight(model_config)
+    # Load pretrained weights, if provided
+    pretrained_path = config.get('pretrained_path')
+    if pretrained_path and os.path.exists(pretrained_path):
+        print(f"Loading pretrained weights: {pretrained_path}")
+        checkpoint = torch.load(pretrained_path, map_location='cpu')
+        # Accept either a full training checkpoint or a bare state dict
+        state_dict = checkpoint.get('model_state_dict', checkpoint)
+        model.load_state_dict(state_dict)
+    # Move to the device
+    model = model.to(device)
+    # Print model statistics
+    total_params = sum(p.numel() for p in model.parameters())
+    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
+    print("Model created:")
+    print(f"  Total parameters: {total_params:,}")
+    print(f"  Trainable parameters: {trainable_params:,}")
+    print(f"  Model size: {total_params * 4 / 1024**2:.2f} MB (fp32)")
+    return model
+def create_diffusion(config: dict) -> DiffusionProcess:
+    """Build the diffusion process"""
+    diffusion_config_path = config.get('diffusion_config', 'configs/model/diffusion.yaml')
+    diffusion_config = load_config(diffusion_config_path)
+    diffusion = DiffusionProcess(diffusion_config)
+    print("Diffusion process created:")
+    print(f"  Training timesteps: {diffusion.num_train_timesteps}")
+    print(f"  Inference timesteps: {diffusion.num_inference_timesteps}")
+    print(f"  Beta schedule: {diffusion.beta_schedule}")
+    return diffusion
+def create_data_pipeline(config: dict):
+    """Build the data pipeline"""
+    data_config_path = config.get('data_config', 'configs/data/laion_filtered.yaml')
+    data_config = load_config(data_config_path)
+    # Create the text encoder
+    text_encoder = create_text_encoder(data_config)
+    # Create the data loaders
+    train_loader, val_loader = create_data_loaders(data_config)
+    print("Data pipeline created:")
+    print(f"  Training set size: {len(train_loader.dataset)}")
+    print(f"  Validation set size: {len(val_loader.dataset) if val_loader else 0}")
+    print(f"  Batch size: {train_loader.batch_size}")
+    print(f"  Gradient accumulation steps: {config.get('gradient_accumulation_steps', 8)}")
+    return train_loader, val_loader, text_encoder
+def create_optimizer(model: nn.Module, config: dict):
+    """Create the optimizer"""
+    optimizer_config = config.get('optimizer', {})
+    optimizer_type = optimizer_config.get('type', 'AdamW')
+    learning_rate = optimizer_config.get('learning_rate', 1e-4)
+    weight_decay = optimizer_config.get('weight_decay', 0.01)
+    if optimizer_type == 'AdamW':
+        optimizer = torch.optim.AdamW(
+            model.parameters(),
+            lr=learning_rate,
+            weight_decay=weight_decay,
+            betas=(0.9, 0.999),
+            eps=1e-8
+        )
+    elif optimizer_type == 'Adam':
+        optimizer = torch.optim.Adam(
+            model.parameters(),
+            lr=learning_rate,
+            weight_decay=weight_decay
+        )
+    else:
+        raise ValueError(f"Unknown optimizer type: {optimizer_type}")
+    print("Optimizer created:")
+    print(f"  Type: {optimizer_type}")
+    print(f"  Learning rate: {learning_rate}")
+    print(f"  Weight decay: {weight_decay}")
+    return optimizer
+def setup_memory_optimization(model: nn.Module, optimizer, config: dict):
+    """Set up memory optimizations"""
+    memory_optimizer = MemoryOptimizer(config)
+    memory_optimizer.setup_model_optimizations(model, optimizer)
+    # Print GPU memory usage
+    if torch.cuda.is_available():
+        allocated = torch.cuda.memory_allocated() / 1024**3
+        reserved = torch.cuda.memory_reserved() / 1024**3
+        print("Memory optimization configured:")
+        print(f"  GPU allocated: {allocated:.2f} GB")
+        print(f"  GPU reserved: {reserved:.2f} GB")
+    return memory_optimizer
+def train(config_path: str, resume_from: str = None):
+    """Main training routine"""
+    print("=" * 60)
+    print("Lumina training started")
+    print("=" * 60)
+    # Load the configuration
+    config = load_config(config_path)
+    # Set up the environment
+    device = setup_environment(config)
+    # Build the model
+    model = create_model(config, device)
+    # Build the diffusion process
+    diffusion = create_diffusion(config)
+    # Wrap the model and diffusion process together
+    diffusion_model = DiffusionModel(model, diffusion)
+    # Build the data pipeline
+    train_loader, val_loader, text_encoder = create_data_pipeline(config)
+    # Create the optimizer
+    optimizer = create_optimizer(model, config)
+    # Set up memory optimizations
+    memory_optimizer = setup_memory_optimization(model, optimizer, config)
+    # Create the trainer
+    trainer = P4Trainer(
+        model=model,
+        diffusion=diffusion,
+        optimizer=optimizer,
+        train_loader=train_loader,
+        val_loader=val_loader,
+        config=config,
+        device=device
+    )
+    # Create callbacks
+    callbacks = create_default_callbacks(config)
+    # Resume from a checkpoint, if one was given
+    if resume_from:
+        if os.path.exists(resume_from):
+            print(f"Resuming training from checkpoint: {resume_from}")
+            trainer.load_checkpoint(resume_from)
+        else:
+            warnings.warn(f"Checkpoint not found, starting from scratch: {resume_from}")
+    # Start training
+    try:
+        print("\nStarting training...")
+        trainer.train()
+        print("\n" + "=" * 60)
+        print("Training complete!")
+        print(f"Best validation loss: {trainer.best_loss:.4f}")
+        print(f"Total training steps: {trainer.global_step}")
+        print("=" * 60)
+    except KeyboardInterrupt:
+        print("\nTraining interrupted")
+    except Exception as e:
+        print(f"\nTraining failed: {e}")
+        import traceback
+        traceback.print_exc()
+    finally:
+        # Always save a final checkpoint (also after an interruption or error)
+        checkpoint_dir = config.get('checkpoint_dir', './checkpoints')
+        os.makedirs(checkpoint_dir, exist_ok=True)
+        final_checkpoint = os.path.join(checkpoint_dir, 'final_model.pt')
+        trainer.save_checkpoint(final_checkpoint)
+def main():
+    """Entry point"""
+    parser = argparse.ArgumentParser(description="Train the Lumina image generation model")
+    parser.add_argument(
+        "--config",
+        type=str,
+        default="configs/training/p4_optimized.yaml",
+        help="Path to the training config file"
+    )
+    parser.add_argument(
+        "--resume",
+        type=str,
+        help="Checkpoint to resume training from"
+    )
+    parser.add_argument(
+        "--debug",
+        action="store_true",
+        help="Debug mode"
+    )
+    args = parser.parse_args()
+    # Debug mode: surface all warnings and enable autograd anomaly detection
+    if args.debug:
+        warnings.filterwarnings("always")
+        torch.autograd.set_detect_anomaly(True)
+        print("Debug mode enabled")
+    # Start training
+    train(args.config, args.resume)
+if __name__ == "__main__":
+    main()
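`create_data_pipeline` reports `gradient_accumulation_steps` (default 8) next to the loader batch size; with accumulation, the effective batch per optimizer update is the product of the two. The `P4Trainer` internals are not shown here, so the following is only a sketch of a common accumulation schedule under that assumption: it lists the micro-batch indices after which `optimizer.step()` would run and flushes a trailing partial group. Both function names are hypothetical.

```python
from typing import List


def optimizer_step_positions(num_micro_batches: int, accumulation_steps: int) -> List[int]:
    """Return 0-based micro-batch indices after which optimizer.step() would run.

    Gradients of `accumulation_steps` consecutive micro-batches are accumulated
    (typically with each loss scaled by 1 / accumulation_steps). A trailing
    partial group is also stepped so its gradients are not silently dropped.
    """
    if accumulation_steps < 1:
        raise ValueError("accumulation_steps must be >= 1")
    positions = [
        i for i in range(num_micro_batches)
        if (i + 1) % accumulation_steps == 0
    ]
    if num_micro_batches % accumulation_steps != 0:
        positions.append(num_micro_batches - 1)  # flush the partial last group
    return positions


def effective_batch_size(batch_size: int, accumulation_steps: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return batch_size * accumulation_steps


if __name__ == "__main__":
    # e.g. batch_size=1 with 8 accumulation steps over 20 micro-batches
    print(effective_batch_size(1, 8))       # 8
    print(optimizer_step_positions(20, 8))  # [7, 15, 19]
```

Flushing the partial group is a design choice; some trainers instead drop it or carry it into the next epoch.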

src/data/__pycache__/dataset.cpython-313.pyc ADDED

Binary file (13.8 kB).

src/data/dataset.py ADDED

	@@ -0,0 +1,298 @@

+import torch
+from torch.utils.data import Dataset, DataLoader
+from PIL import Image
+import pandas as pd
+import os
+from typing import Dict, List, Optional, Tuple
+import numpy as np
+import json
+from pathlib import Path
+class LAIONDataset(Dataset):
+    """LAION dataset"""
+    def __init__(self, config: dict, transform=None, split: str = 'train'):
+        self.config = config
+        self.transform = transform
+        self.split = split
+        # Load the metadata
+        metadata_path = config['dataset'].get('metadata_file', './data/laion/metadata.parquet')
+        self.metadata = pd.read_parquet(metadata_path)
+        # Apply filters
+        self._apply_filters(config.get('filters', {}))
+        # Split into train/val
+        self._split_data(config.get('split', {}))
+        # Caching
+        self.use_cache = config.get('use_cache', False)
+        self.cache_dir = config.get('cache_dir', './data/cache')
+        if self.use_cache:
+            os.makedirs(self.cache_dir, exist_ok=True)
+        # Limit the number of samples
+        max_samples = config.get('max_samples', None)
+        if max_samples is not None and len(self.metadata) > max_samples:
+            self.metadata = self.metadata.sample(max_samples, random_state=42)
+        # In-memory text cache
+        self.text_cache = {}
+        print(f"Dataset loaded: {len(self.metadata)} samples ({split} split)")
+    def _apply_filters(self, filters: dict):
+        """Apply metadata filters"""
+        if 'aesthetic_score' in filters:
+            threshold = filters['aesthetic_score']
+            if 'aesthetic_score' in self.metadata.columns:
+                self.metadata = self.metadata[self.metadata['aesthetic_score'] >= threshold]
+        if 'watermark_prob' in filters:
+            threshold = filters['watermark_prob']
+            if 'watermark_prob' in self.metadata.columns:
+                self.metadata = self.metadata[self.metadata['watermark_prob'] <= threshold]
+        if 'nsfw' in filters and not filters['nsfw']:
+            if 'NSFW' in self.metadata.columns:
+                self.metadata = self.metadata[self.metadata['NSFW'] != 'NSFW']
+    def _split_data(self, split_config: dict):
+        """Split the dataset into train/val"""
+        if self.split not in ['train', 'val']:
+            return
+        train_ratio = split_config.get('train', 0.95)
+        val_ratio = split_config.get('val', 0.05)
+        # Normalize so the split ratios sum to 1
+        total = train_ratio + val_ratio
+        train_ratio /= total
+        val_ratio /= total
+        # Random split; the fixed seed gives the same permutation for both splits, so they are disjoint
+        seed = split_config.get('seed', 42)
+        shuffled = self.metadata.sample(frac=1, random_state=seed).reset_index(drop=True)
+        if self.split == 'train':
+            split_point = int(len(shuffled) * train_ratio)
+            self.metadata = shuffled[:split_point]
+        else:
+            split_point = int(len(shuffled) * train_ratio)
+            self.metadata = shuffled[split_point:]
+    def __len__(self) -> int:
+        return len(self.metadata)
+    def _get_image_path(self, row) -> str:
+        """Resolve the image path for a metadata row"""
+        # Try several possible column names
+        for col in ['image_file', 'filepath', 'path', 'url_local']:
+            if col in row:
+                path = row[col]
+                # Prepend the base path to relative paths
+                if not os.path.isabs(path):
+                    base_path = self.config['dataset'].get('path', './data/laion')
+                    path = os.path.join(base_path, path)
+                return path
+        # No path column: fall back to a file name derived from the MD5 hash of the URL
+        if 'url' in row:
+            import hashlib
+            url_hash = hashlib.md5(row['url'].encode()).hexdigest()
+            base_path = self.config['dataset'].get('path', './data/laion')
+            path = os.path.join(base_path, f"{url_hash}.jpg")
+            return path
+        raise ValueError(f"Could not determine image path: {row}")
+    def __getitem__(self, idx: int) -> Dict:
+        row = self.metadata.iloc[idx]
+        # Cache key
+        cache_key = f"{self.split}_{idx}"
+        # Check the text cache
+        if self.use_cache and cache_key in self.text_cache:
+            text_embedding = self.text_cache[cache_key]
+        else:
+            # Get the caption
+            text = row.get('caption', row.get('text', row.get('description', '')))
+            # A text encoder should be applied here; for simplicity the raw text is returned.
+            # In practice, use a pretrained CLIP encoder.
+            text_embedding = text
+            # Cache
+            if self.use_cache:
+                self.text_cache[cache_key] = text_embedding
+        # Load the image (image_path is initialized so the error message below never hits a NameError)
+        image_path = ''
+        try:
+            image_path = self._get_image_path(row)
+            image = Image.open(image_path).convert('RGB')
+            # Apply transforms
+            if self.transform:
+                image = self.transform(image)
+        except Exception as e:
+            # If loading fails, fall back to a blank image of the configured size
+            print(f"Failed to load image {image_path}: {e}")
+            target_size = self.config.get('preprocessing', {}).get('target_size', 512)
+            image = torch.zeros(3, target_size, target_size)
+            text_embedding = "invalid image"
+        sample = {
+            'image': image,
+            'text': text_embedding if isinstance(text_embedding, str) else '',
+            'image_path': image_path,
+            'index': idx
+        }
+        # default_collate cannot batch None, so only add text_embedding when it is a real embedding
+        if not isinstance(text_embedding, str):
+            sample['text_embedding'] = text_embedding
+        return sample
+class TextImageDataset(Dataset):
+    """Text-image pair dataset"""
+    def __init__(self, image_dir: str, caption_file: str, transform=None):
+        self.image_dir = image_dir
+        self.transform = transform
+        # Load the caption file
+        if caption_file.endswith('.json'):
+            with open(caption_file, 'r') as f:
+                self.captions = json.load(f)
+        elif caption_file.endswith('.csv'):
+            # Iterate rows as (image_name, caption) tuples; iterating a DataFrame directly yields column names
+            self.captions = list(pd.read_csv(caption_file).itertuples(index=False, name=None))
+        else:
+            raise ValueError(f"Unsupported caption file format: {caption_file}")
+        # Keep only samples whose image file exists
+        self.valid_samples = []
+        for item in self.captions:
+            if isinstance(item, dict):
+                image_name = item.get('image_name', item.get('file_name', ''))
+                caption = item.get('caption', '')
+            else:
+                image_name = item[0]
+                caption = item[1]
+            image_path = os.path.join(self.image_dir, image_name)
+            if os.path.exists(image_path):
+                self.valid_samples.append((image_path, caption))
+        print(f"Found {len(self.valid_samples)} valid samples")
+    def __len__(self) -> int:
+        return len(self.valid_samples)
+    def __getitem__(self, idx: int) -> Dict:
+        image_path, caption = self.valid_samples[idx]
+        # Load the image
+        image = Image.open(image_path).convert('RGB')
+        # Apply transforms
+        if self.transform:
+            image = self.transform(image)
+        return {
+            'image': image,
+            'text': caption,
+            'image_path': image_path
+        }
+class CachedDataset(Dataset):
+    """Disk-cached dataset wrapper to speed up training"""
+    def __init__(self, dataset: Dataset, cache_dir: str = './cache'):
+        self.dataset = dataset
+        self.cache_dir = cache_dir
+        os.makedirs(cache_dir, exist_ok=True)
+        self.cache_files = []
+        for i in range(len(dataset)):
+            cache_file = os.path.join(cache_dir, f'sample_{i}.pt')
+            self.cache_files.append(cache_file)
+    def __len__(self) -> int:
+        return len(self.dataset)
+    def __getitem__(self, idx: int) -> Dict:
+        cache_file = self.cache_files[idx]
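+        # Load from the cache file if it exists
+        if os.path.exists(cache_file):
+            try:
+                return torch.load(cache_file)
+            except Exception:
+                # Unreadable or corrupt cache file: regenerate it below
+                pass
+        # Otherwise load from the wrapped dataset and cache the result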
+        sample = self.dataset[idx]
+        torch.save(sample, cache_file)
+        return sample
+def create_data_loaders(config: dict) -> Tuple[DataLoader, Optional[DataLoader]]:
+    """Create the data loaders"""
+    from .preprocessing import get_transform
+    # Build the data transforms
+    train_transform = get_transform(config, mode='train')
+    val_transform = get_transform(config, mode='val')
+    # Create the datasets
+    train_dataset = LAIONDataset(config, transform=train_transform, split='train')
+    val_dataset = LAIONDataset(config, transform=val_transform, split='val')
+    # Optional: enable the on-disk cache.
+    # Note: samples are cached after the transform, so random augmentations
+    # (crop, flip, color jitter) are frozen after the first epoch.
+    if config.get('cache_dataset', True):
+        train_dataset = CachedDataset(train_dataset, cache_dir='./data/cache/train')
+        val_dataset = CachedDataset(val_dataset, cache_dir='./data/cache/val')
+    # prefetch_factor and persistent_workers are only valid when num_workers > 0
+    num_workers = config.get('num_workers', 2)
+    worker_kwargs = {}
+    if num_workers > 0:
+        worker_kwargs = {
+            'prefetch_factor': config.get('prefetch_factor', 2),
+            'persistent_workers': config.get('persistent_workers', True),
+        }
+    # Create the data loaders
+    train_loader = DataLoader(
+        train_dataset,
+        batch_size=config.get('batch_size', 1),
+        shuffle=config.get('shuffle', True),
+        num_workers=num_workers,
+        pin_memory=config.get('pin_memory', True),
+        **worker_kwargs
+    )
+    val_loader = DataLoader(
+        val_dataset,
+        batch_size=1,  # validation uses batch size 1
+        shuffle=False,
+        num_workers=num_workers,
+        pin_memory=True
+    )
+    return train_loader, val_loader
+def test_dataset():
+    """Smoke-test the dataset"""
+    import yaml
+    # Load the config
+    with open('configs/data/laion_filtered.yaml', 'r') as f:
+        config = yaml.safe_load(f)
+    # Create the dataset
+    dataset = LAIONDataset(config, split='train')
+    # Inspect one sample
+    sample = dataset[0]
+    print(f"Sample keys: {list(sample.keys())}")
+    print(f"Image shape: {sample['image'].shape if hasattr(sample['image'], 'shape') else type(sample['image'])}")
+    print(f"Text: {sample['text'][:100]}...")
+    return dataset
+if __name__ == '__main__':
+    test_dataset()
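`LAIONDataset._split_data` builds each split from a fresh shuffle with the same seed, so the train and validation instances see the same permutation and take complementary slices of it. A stdlib-only sketch of that idea over row indices; `split_indices` is a hypothetical helper, and `random.Random` will not reproduce the permutation produced by pandas' `sample(frac=1, random_state=seed)`:

```python
import random
from typing import List, Tuple


def split_indices(
    n: int,
    train_ratio: float = 0.95,
    val_ratio: float = 0.05,
    seed: int = 42,
) -> Tuple[List[int], List[int]]:
    """Deterministically shuffle range(n) and cut it into train/val index lists."""
    total = train_ratio + val_ratio
    train_ratio /= total  # normalize so the two ratios sum to 1
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation on every call
    split_point = int(n * train_ratio)
    return indices[:split_point], indices[split_point:]


if __name__ == "__main__":
    train, val = split_indices(100)
    print(len(train), len(val))         # 95 5
    print(set(train).isdisjoint(val))   # True
```

Because `max_samples` is applied after the split in `LAIONDataset.__init__`, each split is subsampled independently, which keeps the two splits disjoint.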

src/data/preprocessing.py ADDED

	@@ -0,0 +1,295 @@

+import torch
+import torchvision.transforms as T
+from PIL import Image
+import numpy as np
+from typing import Dict, List, Optional, Tuple
+import random
+def get_transform(config: dict, mode: str = 'train') -> T.Compose:
+    """获取数据预处理变换"""
+    preprocessing = config.get('preprocessing', {})
+    target_size = preprocessing.get('target_size', 512)
+    resize_mode = preprocessing.get('resize_mode', 'center_crop')
+    transforms_list = []
+    # 训练和验证的不同变换
+    if mode == 'train':
+        # 随机裁剪
+        if preprocessing.get('random_crop', True):
+            transforms_list.append(T.RandomResizedCrop(
+                target_size,
+                scale=(0.8, 1.0),
+                ratio=(0.8, 1.2)
+            ))
+        else:
+            transforms_list.append(T.Resize(target_size, interpolation=T.InterpolationMode.BILINEAR))
+        # 随机水平翻转
+        if preprocessing.get('random_flip', True):
+            transforms_list.append(T.RandomHorizontalFlip(p=0.5))
+        # 颜色抖动
+        if preprocessing.get('color_jitter', 0.05) > 0:
+            color_jitter = preprocessing['color_jitter']
+            transforms_list.append(T.ColorJitter(
+                brightness=color_jitter,
+                contrast=color_jitter,
+                saturation=color_jitter,
+                hue=min(0.1, color_jitter)
+            ))
+        # 随机旋转
+        if preprocessing.get('random_rotation', 0.0) > 0:
+            max_angle = preprocessing['random_rotation']
+            transforms_list.append(T.RandomRotation(degrees=(-max_angle, max_angle)))
+    else:  # Validation/test mode
+        if resize_mode == 'center_crop':
+            # Resize the shorter side, then center-crop
+            transforms_list.extend([
+                T.Resize(target_size, interpolation=T.InterpolationMode.BILINEAR),
+                T.CenterCrop(target_size)
+            ])
+        elif resize_mode == 'resize':
+            # Resize directly to a square (aspect ratio is not preserved)
+            transforms_list.append(T.Resize((target_size, target_size), interpolation=T.InterpolationMode.BILINEAR))
+        elif resize_mode == 'random_crop':
+            transforms_list.append(T.RandomCrop(target_size, pad_if_needed=True))
+        else:
+            raise ValueError(f"Unknown resize_mode: {resize_mode}")
+    # Convert to a tensor with values in [0, 1]
+    transforms_list.append(T.ToTensor())
+    # Normalize; the default mean/std of 0.5 maps [0, 1] to [-1, 1]
+    normalize_config = preprocessing.get('normalize', {})
+    mean = normalize_config.get('mean', [0.5, 0.5, 0.5])
+    std = normalize_config.get('std', [0.5, 0.5, 0.5])
+    transforms_list.append(T.Normalize(mean=mean, std=std))
+    return T.Compose(transforms_list)
+class TextPreprocessor:
+    """Text preprocessor (tokenization)."""
+    def __init__(self, config: dict):
+        self.config = config.get('preprocessing', {})
+        # Text processing parameters
+        self.max_length = self.config.get('max_length', 77)
+        self.truncation = self.config.get('truncation', True)
+        self.padding = self.config.get('padding', 'max_length')
+        # Try to load the tokenizer
+        self.tokenizer = None
+        self._init_tokenizer()
+    def _init_tokenizer(self):
+        """Initialize the CLIP tokenizer if transformers is available."""
+        try:
+            from transformers import CLIPTokenizer
+            tokenizer_name = self.config.get('tokenizer', 'openai/clip-vit-base-patch32')
+            self.tokenizer = CLIPTokenizer.from_pretrained(tokenizer_name)
+        except ImportError:
+            print("Warning: transformers is not installed; the CLIP tokenizer is unavailable")
+        except Exception as e:
+            print(f"Failed to load tokenizer: {e}")
+    def preprocess_text(self, text: str) -> Dict:
+        """Tokenize a single text."""
+        if self.tokenizer is not None:
+            # Tokenize with the CLIP tokenizer
+            inputs = self.tokenizer(
+                text,
+                max_length=self.max_length,
+                padding=self.padding,
+                truncation=self.truncation,
+                return_tensors="pt"
+            )
+            return {
+                'input_ids': inputs['input_ids'].squeeze(0),
+                'attention_mask': inputs['attention_mask'].squeeze(0)
+            }
+        else:
+            # Fallback without a tokenizer: return the raw text and its length
+            return {
+                'text': text,
+                'length': len(text)
+            }
+    def batch_preprocess(self, texts: List[str]) -> Dict:
+        """Tokenize a batch of texts."""
+        if self.tokenizer is not None:
+            inputs = self.tokenizer(
+                texts,
+                max_length=self.max_length,
+                padding=self.padding,
+                truncation=self.truncation,
+                return_tensors="pt"
+            )
+            return inputs
+        else:
+            return {'texts': texts}
+class ImagePreprocessor:
+    """Image preprocessor (applies the training transform)."""
+    def __init__(self, config: dict):
+        self.config = config.get('preprocessing', {})
+        self.transform = get_transform(config, mode='train')
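+    def preprocess_image(self, image: Image.Image) -> torch.Tensor:
+        """Preprocess a single image."""
+        return self.transform(image)
+    def batch_preprocess(self, images: List[Image.Image]) -> torch.Tensor:
+        """Preprocess a batch of images into one (N, C, H, W) tensor."""
+        return torch.stack([self.transform(img) for img in images])
+    def preprocess_for_vae(self, image: torch.Tensor) -> torch.Tensor:
+        """Map an image tensor from [0, 1] to the [-1, 1] range expected by the VAE.
+        Note: tensors from get_transform with the default mean/std of 0.5 are
+        already in [-1, 1]; applying this to them would double-scale the values."""
+        return image * 2.0 - 1.0
+    def postprocess_from_vae(self, latents: torch.Tensor) -> torch.Tensor:
+        """Map a VAE-decoded image from [-1, 1] back to [0, 1].
+        Despite the parameter name, this expects decoded pixels, not latents."""
+        return (latents + 1.0) / 2.0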
+class DataPreprocessor:
+    """Data preprocessor that combines image and text processing."""
+    def __init__(self, config: dict):
+        self.config = config
+        self.image_preprocessor = ImagePreprocessor(config)
+        self.text_preprocessor = TextPreprocessor(config)
+        # Frozen CLIP text encoder
+        self.text_encoder = None
+        self._init_text_encoder()
+    def _init_text_encoder(self):
+        """Initialize the text encoder."""
+        try:
+            from transformers import CLIPTextModel
+            model_name = self.config.get('preprocessing', {}).get('text_encoder', 'openai/clip-vit-base-patch32')
+            self.text_encoder = CLIPTextModel.from_pretrained(model_name)
+            # Freeze parameters
+            for param in self.text_encoder.parameters():
+                param.requires_grad = False
+            # Switch to evaluation mode
+            self.text_encoder.eval()
+            print(f"Loaded text encoder: {model_name}")
+        except Exception as e:
+            print(f"Failed to load text encoder: {e}")
+    def encode_text(self, text: str) -> torch.Tensor:
+        """Encode a single text into a sequence of embedding vectors."""
+        if self.text_encoder is None:
+            raise ValueError("Text encoder is not initialized")
+        # Tokenize the text
+        inputs = self.text_preprocessor.preprocess_text(text)
+        # Encode
+        with torch.no_grad():
+            if 'input_ids' in inputs:
+                outputs = self.text_encoder(
+                    input_ids=inputs['input_ids'].unsqueeze(0),
+                    attention_mask=inputs['attention_mask'].unsqueeze(0) if 'attention_mask' in inputs else None
+                )
+                return outputs.last_hidden_state.squeeze(0)
+            else:
+                # Fallback when no tokenizer is available: a random placeholder
+                # (NOT a real text embedding) sized to the encoder's hidden dimension
+                return torch.randn(self.text_preprocessor.max_length, self.text_encoder.config.hidden_size)
+    def batch_encode_text(self, texts: List[str]) -> torch.Tensor:
+        """Encode a batch of texts."""
+        if self.text_encoder is None:
+            raise ValueError("Text encoder is not initialized")
+        # Tokenize the texts
+        inputs = self.text_preprocessor.batch_preprocess(texts)
+        # Encode
+        with torch.no_grad():
+            if 'input_ids' in inputs:
+                outputs = self.text_encoder(
+                    input_ids=inputs['input_ids'],
+                    attention_mask=inputs.get('attention_mask', None)
+                )
+                return outputs.last_hidden_state
+            else:
+                # Fallback when no tokenizer is available: random placeholders
+                # (NOT real text embeddings) sized to the encoder's hidden dimension
+                batch_size = len(texts)
+                return torch.randn(batch_size, self.text_preprocessor.max_length, self.text_encoder.config.hidden_size)
+    def preprocess_batch(self, batch: List[Dict]) -> Dict:
+        """Preprocess a batch of {'image', 'text'} samples."""
+        images = [item['image'] for item in batch]
+        texts = [item['text'] for item in batch]
+        # Preprocess images
+        image_tensors = self.image_preprocessor.batch_preprocess(images)
+        # Encode texts (only if a text encoder is available)
+        if self.text_encoder is not None:
+            text_embeddings = self.batch_encode_text(texts)
+        else:
+            text_embeddings = None
+        return {
+            'images': image_tensors,
+            'text_embeddings': text_embeddings,
+            'texts': texts,
+            'image_paths': [item.get('image_path', '') for item in batch]
+        }
+def test_preprocessing():
+    """Smoke test for the preprocessing pipeline."""
+    import yaml
+    from PIL import Image
+    # Create a test image
+    test_image = Image.new('RGB', (512, 512), color='red')
+    # Load the data config
+    with open('configs/data/laion_filtered.yaml', 'r') as f:
+        config = yaml.safe_load(f)
+    # Test image preprocessing
+    image_preprocessor = ImagePreprocessor(config)
+    processed_image = image_preprocessor.preprocess_image(test_image)
+    print(f"Original image size: {test_image.size}")
+    print(f"Processed image shape: {processed_image.shape}")
+    # Test text preprocessing
+    text_preprocessor = TextPreprocessor(config)
+    processed_text = text_preprocessor.preprocess_text("A red square image")
+    print(f"Text preprocessing result: {processed_text}")
+    # Test the combined data preprocessor
+    data_preprocessor = DataPreprocessor(config)
+    test_batch = [
+        {'image': test_image, 'text': "A red square"},
+        {'image': test_image, 'text': "A blue circle"}
+    ]
+    processed_batch = data_preprocessor.preprocess_batch(test_batch)
+    print(f"Batch image shape: {processed_batch['images'].shape}")
+    print(f"Text embedding shape: {processed_batch['text_embeddings'].shape if processed_batch['text_embeddings'] is not None else 'None'}")
+    return processed_batch
+if __name__ == '__main__':
+    test_preprocessing()
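The default `normalize` settings in `get_transform` (mean and std of 0.5) already map pixel values from [0, 1] to [-1, 1], the same range `ImagePreprocessor.preprocess_for_vae` produces. The dependency-free sketch below reproduces the per-channel formula `(x - mean) / std` that torchvision's `T.Normalize` applies, so the two mappings can be compared side by side. The helpers `normalize`, `denormalize`, and `to_vae_range` are hypothetical illustrations, not functions from this repository.

```python
from typing import List, Sequence

# Hypothetical helpers for illustration only (not part of src/data/preprocessing.py).


def normalize(pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Per-channel (x - mean) / std, the formula torchvision's Normalize applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


def denormalize(pixel: Sequence[float], mean: Sequence[float], std: Sequence[float]) -> List[float]:
    """Inverse of normalize: x * std + mean."""
    return [x * s + m for x, m, s in zip(pixel, mean, std)]


def to_vae_range(x: float) -> float:
    """Same affine map as preprocess_for_vae: [0, 1] -> [-1, 1]."""
    return x * 2.0 - 1.0


MEAN = [0.5, 0.5, 0.5]  # defaults used by get_transform
STD = [0.5, 0.5, 0.5]

black = [0.0, 0.0, 0.0]
white = [1.0, 1.0, 1.0]
print(normalize(black, MEAN, STD))  # [-1.0, -1.0, -1.0]
print(normalize(white, MEAN, STD))  # [1.0, 1.0, 1.0]

# Applying the VAE mapping on top of an already-normalized value double-scales it:
print([to_vae_range(x) for x in normalize(black, MEAN, STD)])  # [-3.0, -3.0, -3.0]
```

With the default config, the output of `get_transform` can go to the VAE as-is; `preprocess_for_vae` only makes sense for tensors that are still in [0, 1], for example when `normalize` is configured with mean 0 and std 1.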

src/data/text_encoder.py ADDED

	@@ -0,0 +1,227 @@

+import torch
+import torch.nn as nn
+from transformers import CLIPTextModel, CLIPTokenizer, CLIPConfig
+from typing import Optional, Tuple, List
+import os
+class LightTextEncoder(nn.Module):
+    """Lightweight Transformer text encoder."""
+    def __init__(self, config: dict):
+        super().__init__()
+        self.config = config
+        # Encoder hyperparameters
+        self.vocab_size = config.get('vocab_size', 49408)
+        self.hidden_size = config.get('hidden_size', 512)
+        self.num_hidden_layers = config.get('num_hidden_layers', 8)
+        self.num_attention_heads = config.get('num_attention_heads', 8)
+        self.max_position_embeddings = config.get('max_position_embeddings', 77)
+        # Build the embeddings
+        self.token_embedding = nn.Embedding(self.vocab_size, self.hidden_size)
+        self.position_embedding = nn.Embedding(self.max_position_embeddings, self.hidden_size)
+        # Transformer layers
+        self.layers = nn.ModuleList([
+            TransformerLayer(self.hidden_size, self.num_attention_heads)
+            for _ in range(self.num_hidden_layers)
+        ])
+        self.final_layer_norm = nn.LayerNorm(self.hidden_size)
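+    def forward(self, input_ids: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+        # Token + position embeddings
+        token_embeddings = self.token_embedding(input_ids)
+        position_ids = torch.arange(input_ids.shape[1], device=input_ids.device).unsqueeze(0)
+        position_embeddings = self.position_embedding(position_ids)
+        hidden_states = token_embeddings + position_embeddings
+        # attention_mask follows the Hugging Face convention (1 = token, 0 = padding);
+        # nn.MultiheadAttention expects key_padding_mask with True = ignore, so invert it
+        key_padding_mask = (attention_mask == 0) if attention_mask is not None else None
+        # Transformer layers
+        for layer in self.layers:
+            hidden_states = layer(hidden_states, key_padding_mask)
+        # Final layer norm
+        hidden_states = self.final_layer_norm(hidden_states)
+        return hidden_states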
+class TransformerLayer(nn.Module):
+    """Post-norm Transformer encoder layer (self-attention followed by an MLP)."""
+    def __init__(self, hidden_size: int, num_heads: int):
+        super().__init__()
+        self.attention = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
+        self.attention_norm = nn.LayerNorm(hidden_size)
+        self.mlp = nn.Sequential(
+            nn.Linear(hidden_size, hidden_size * 4),
+            nn.GELU(),
+            nn.Linear(hidden_size * 4, hidden_size)
+        )
+        self.mlp_norm = nn.LayerNorm(hidden_size)
+    def forward(self, x: torch.Tensor, attention_mask: Optional[torch.Tensor] = None) -> torch.Tensor:
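+        # Self-attention; attention_mask is passed as key_padding_mask (bool, True = padded position to ignore)
+        attn_output, _ = self.attention(x, x, x, key_padding_mask=attention_mask)
+        x = self.attention_norm(x + attn_output)
+        # Feed-forward network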
+        mlp_output = self.mlp(x)
+        x = self.mlp_norm(x + mlp_output)
+        return x
+class CLIPTextEncoderWrapper:
+    """Wrapper around a pretrained CLIP text encoder."""
+    def __init__(self, model_name: str = 'openai/clip-vit-base-patch32', device: str = 'cuda'):
+        self.model_name = model_name
+        self.device = device
+        # Load the tokenizer and model
+        self.tokenizer = CLIPTokenizer.from_pretrained(model_name)
+        # Load only the text tower
+        self.text_model = CLIPTextModel.from_pretrained(model_name).to(device)
+        # Freeze parameters
+        for param in self.text_model.parameters():
+            param.requires_grad = False
+        # Switch to evaluation mode
+        self.text_model.eval()
+        print(f"Loaded CLIP text encoder: {model_name}")
+    def encode(self, texts: List[str], return_tensors: str = 'pt') -> torch.Tensor:
+        """Encode a list of texts into (N, 77, hidden_size) embeddings."""
+        # Tokenize; pad to a fixed length so embeddings from different
+        # batches share the same sequence length and can be concatenated
+        inputs = self.tokenizer(
+            texts,
+            padding='max_length',
+            truncation=True,
+            max_length=77,
+            return_tensors=return_tensors
+        )
+        # Move inputs to the model's device
+        inputs = {k: v.to(self.device) for k, v in inputs.items()}
+        # Encode
+        with torch.no_grad():
+            outputs = self.text_model(**inputs)
+            return outputs.last_hidden_state
+    def encode_batch(self, texts: List[str], batch_size: int = 32) -> torch.Tensor:
+        """Encode texts in mini-batches and return the result on the CPU."""
+        all_embeddings = []
+        for i in range(0, len(texts), batch_size):
+            batch_texts = texts[i:i + batch_size]
+            batch_embeddings = self.encode(batch_texts)
+            all_embeddings.append(batch_embeddings.cpu())
+        return torch.cat(all_embeddings, dim=0)
+    def save_embeddings(self, texts: List[str], save_path: str):
+        """Encode texts and save the embeddings to disk."""
+        embeddings = self.encode_batch(texts)
+        torch.save(embeddings, save_path)
+        print(f"Embeddings saved to: {save_path}")
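+class CachedTextEncoder:
+    """Text encoder with in-memory and on-disk caching, keyed by the MD5 of the text.
+    Note: the key does not include the model name; clear cache_dir when switching encoders."""
+    def __init__(self, encoder, cache_dir: str = './text_cache'):
+        self.encoder = encoder
+        self.cache_dir = cache_dir
+        os.makedirs(cache_dir, exist_ok=True)
+        # In-memory cache
+        self.memory_cache = {}
+    def encode(self, text: str) -> torch.Tensor:
+        """Encode a single text, using the cache when possible.
+        Embeddings are kept on the CPU so cached and fresh results can be mixed."""
+        # Build the cache key
+        import hashlib
+        cache_key = hashlib.md5(text.encode()).hexdigest()
+        # Check the in-memory cache
+        if cache_key in self.memory_cache:
+            return self.memory_cache[cache_key]
+        # Check the on-disk cache
+        cache_file = os.path.join(self.cache_dir, f"{cache_key}.pt")
+        if os.path.exists(cache_file):
+            embedding = torch.load(cache_file, map_location='cpu')
+            self.memory_cache[cache_key] = embedding
+            return embedding
+        # Encode, then cache (on the CPU)
+        embedding = self.encoder.encode([text])[0].cpu()
+        # Store in the in-memory cache
+        self.memory_cache[cache_key] = embedding
+        # Store in the on-disk cache
+        torch.save(embedding, cache_file)
+        return embedding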
+    def encode_batch(self, texts: List[str]) -> torch.Tensor:
+        """Encode a batch of texts with one cached lookup per text."""
+        embeddings = []
+        for text in texts:
+            embedding = self.encode(text)
+            embeddings.append(embedding.unsqueeze(0))
+        return torch.cat(embeddings, dim=0)
+def create_text_encoder(config: dict) -> "CLIPTextEncoderWrapper | CachedTextEncoder":
+    """Create a CLIP text encoder, optionally wrapped in a cache."""
+    model_name = config.get('preprocessing', {}).get('tokenizer', 'openai/clip-vit-base-patch32')
+    device = config.get('device', 'cuda' if torch.cuda.is_available() else 'cpu')
+    encoder = CLIPTextEncoderWrapper(model_name, device)
+    # Wrap the encoder in a cache if requested (enabled by default)
+    if config.get('use_cache', True):
+        cache_dir = config.get('cache_dir', './data/text_cache')
+        encoder = CachedTextEncoder(encoder, cache_dir)
+    return encoder
+def test_text_encoder():
+    """Smoke test for the text encoder."""
+    config = {
+        'preprocessing': {
+            'tokenizer': 'openai/clip-vit-base-patch32'
+        },
+        'device': 'cuda' if torch.cuda.is_available() else 'cpu'
+    }
+    encoder = create_text_encoder(config)
+    # Encode a few sample texts
+    texts = [
+        "A beautiful sunset over the mountains",
+        "A cute cat playing with a ball",
+        "An astronaut riding a horse on Mars"
+    ]
+    # encode_batch works for both CLIPTextEncoderWrapper and CachedTextEncoder
+    # (CachedTextEncoder.encode accepts a single string, not a list)
+    embeddings = encoder.encode_batch(texts)
+    print(f"Number of texts: {len(texts)}")
+    print(f"Embedding shape: {embeddings.shape}")
+    print(f"Embedding range: [{embeddings.min():.4f}, {embeddings.max():.4f}]")
+    return encoder, embeddings
+if __name__ == '__main__':
+    encoder, embeddings = test_text_encoder()
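`LightTextEncoder` takes an `attention_mask`, and Hugging Face tokenizers such as `CLIPTokenizer` return masks with 1 for real tokens and 0 for padding, while `torch.nn.MultiheadAttention` reads a boolean `key_padding_mask` the other way round (True means the key is padding and is ignored). In PyTorch the conversion is the one-liner `attention_mask == 0`. The dependency-free sketch below shows the same inversion plus a masked softmax over attention scores, to illustrate that padded keys should end up with zero weight; `to_key_padding_mask` and `masked_softmax` are hypothetical helpers, not part of this repository.

```python
import math
from typing import List


def to_key_padding_mask(attention_mask: List[List[int]]) -> List[List[bool]]:
    """Hugging Face mask (1 = token, 0 = padding) -> key_padding_mask (True = ignore)."""
    return [[value == 0 for value in row] for row in attention_mask]


def masked_softmax(scores: List[float], key_padding_mask: List[bool]) -> List[float]:
    """Softmax over keys that gives zero weight where key_padding_mask is True."""
    kept = [s for s, pad in zip(scores, key_padding_mask) if not pad]
    if not kept:
        raise ValueError("every key is masked; softmax is undefined")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [0.0 if pad else math.exp(s - peak) for s, pad in zip(scores, key_padding_mask)]
    total = sum(exps)
    return [e / total for e in exps]


hf_mask = [[1, 1, 1, 0]]  # three real tokens, one padding position
kpm = to_key_padding_mask(hf_mask)
print(kpm)  # [[False, False, False, True]]
print(masked_softmax([0.0, 0.0, 0.0, 5.0], kpm[0]))  # the padding key gets weight 0.0
```

Passing the raw 0/1 mask straight through would either be rejected (recent PyTorch releases accept only bool or float key-padding masks) or, once cast to bool, hide the real tokens instead of the padding, which is why the mask is inverted before it reaches `nn.MultiheadAttention`.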

src/inference/api.py ADDED

	@@ -0,0 +1,631 @@

+from fastapi import FastAPI, UploadFile, File, Form, HTTPException
+from fastapi.responses import JSONResponse, FileResponse, StreamingResponse
+from pydantic import BaseModel
+from typing import Optional, List, Dict, Any
+import torch
+import io
+from PIL import Image
+import base64
+import json
+import time
+from datetime import datetime
+import asyncio
+from concurrent.futures import ThreadPoolExecutor
+import uuid
+import os
+from .sampler import TextToImagePipeline, SamplerFactory
+from ..data.text_encoder import CLIPTextEncoderWrapper
+# Request/response models
+class TextToImageRequest(BaseModel):
+    prompt: str
+    negative_prompt: Optional[str] = ""
+    width: int = 512
+    height: int = 512
+    num_steps: int = 50
+    guidance_scale: float = 7.5
+    num_images: int = 1
+    seed: Optional[int] = None
+    sampler: str = "ddim"
+class ImageResponse(BaseModel):
+    images: List[str]  # base64-encoded PNG images
+    metadata: Dict[str, Any]
+    request_id: str
+    generation_time: float
+class BatchRequest(BaseModel):
+    requests: List[TextToImageRequest]
+    priority: int = 0
+class StatusResponse(BaseModel):
+    status: str
+    queue_length: int
+    active_tasks: int
+    gpu_memory_usage: float
+    uptime: float
+# API application
+class LuminaAPI:
+    """Lumina API server."""
+    def __init__(
+        self,
+        model,
+        diffusion,
+        text_encoder,
+        vae_decoder=None,
+        host: str = "0.0.0.0",
+        port: int = 8000,
+        max_queue_size: int = 100,
+        max_workers: int = 2
+    ):
+        self.model = model
+        self.diffusion = diffusion
+        self.text_encoder = text_encoder
+        self.vae_decoder = vae_decoder
+        self.host = host
+        self.port = port
+        # Create the FastAPI application
+        self.app = FastAPI(
+            title="Lumina Image Generation API",
+            description="API for a lightweight image generation model",
+            version="1.0.0"
+        )
+        # Task queue and worker thread pool
+        self.task_queue = asyncio.Queue(maxsize=max_queue_size)
+        self.executor = ThreadPoolExecutor(max_workers=max_workers)
+        self.active_tasks = 0
+        # Request history
+        self.request_history = []
+        self.max_history = 1000
+        # Statistics
+        self.start_time = time.time()
+        self.total_requests = 0
+        self.total_images = 0
+        # Initialize the generation pipeline
+        self.pipeline = TextToImagePipeline(
+            model=model,
+            diffusion=diffusion,
+            text_encoder=text_encoder,
+            vae_decoder=vae_decoder,
+            sampler_type="ddim"
+        )
+        # Register the routes
+        self._setup_routes()
+    def _setup_routes(self):
+        """Register the API routes."""
+        @self.app.get("/")
+        async def root():
+            return {
+                "message": "Lumina Image Generation API",
+                "version": "1.0.0",
+                "docs": "/docs",
+                "endpoints": [
+                    "/generate",
+                    "/batch_generate",
+                    "/txt2img",
+                    "/status",
+                    "/stats",
+                    "/history",
+                    "/health"
+                ]
+            }
+        @self.app.get("/health")
+        async def health_check():
+            """Health check."""
+            gpu_available = torch.cuda.is_available()
+            gpu_memory = torch.cuda.memory_allocated() / 1024**3 if gpu_available else 0
+            return {
+                "status": "healthy",
+                "gpu_available": gpu_available,
+                "gpu_memory_gb": gpu_memory,
+                "model_loaded": self.model is not None,
+                "text_encoder_loaded": self.text_encoder is not None
+            }
+        @self.app.get("/status")
+        async def get_status():
+            """Return the current service status."""
+            uptime = time.time() - self.start_time
+            # GPU memory currently allocated by PyTorch (GB)
+            if torch.cuda.is_available():
+                gpu_memory = torch.cuda.memory_allocated() / 1024**3
+            else:
+                gpu_memory = 0
+            return StatusResponse(
+                status="running",
+                queue_length=self.task_queue.qsize(),
+                active_tasks=self.active_tasks,
+                gpu_memory_usage=gpu_memory,
+                uptime=uptime
+            )
+        @self.app.post("/generate", response_model=ImageResponse)
+        async def generate_image(request: TextToImageRequest):
+            """Generate images for a single prompt."""
+            request_id = str(uuid.uuid4())
+            # Record the request
+            self.request_history.append({
+                "request_id": request_id,
+                "prompt": request.prompt,
+                "timestamp": datetime.now().isoformat()
+            })
+            # Cap the history size
+            if len(self.request_history) > self.max_history:
+                self.request_history = self.request_history[-self.max_history:]
+            # Generate the images
+            start_time = time.time()
+            try:
+                # Run the blocking generation in the thread pool
+                loop = asyncio.get_running_loop()
+                images = await loop.run_in_executor(
+                    self.executor,
+                    self._generate_sync,
+                    request
+                )
+                generation_time = time.time() - start_time
+                # Convert to base64-encoded PNGs
+                image_b64_list = []
+                for img_tensor in images:
+                    if isinstance(img_tensor, torch.Tensor):
+                        # Drop a leading batch dimension, if present
+                        if img_tensor.dim() == 4:
+                            img_tensor = img_tensor.squeeze(0)
+                        # Scale from [0, 1] to [0, 255]
+                        img_tensor = torch.clamp(img_tensor * 255, 0, 255).byte()
+                        # Convert to a NumPy array in HWC layout
+                        if img_tensor.shape[0] == 3:  # CHW layout
+                            img_array = img_tensor.permute(1, 2, 0).cpu().numpy()
+                        else:
+                            img_array = img_tensor.cpu().numpy()
+                        # Convert to a PIL image
+                        img = Image.fromarray(img_array)
+                        # Encode as a base64 PNG
+                        buffered = io.BytesIO()
+                        img.save(buffered, format="PNG")
+                        img_b64 = base64.b64encode(buffered.getvalue()).decode()
+                        image_b64_list.append(img_b64)
+                    else:
+                        image_b64_list.append("")
+                # Update statistics
+                self.total_requests += 1
+                self.total_images += len(images)
+                return ImageResponse(
+                    images=image_b64_list,
+                    metadata={
+                        "prompt": request.prompt,
+                        "negative_prompt": request.negative_prompt,
+                        "width": request.width,
+                        "height": request.height,
+                        "num_steps": request.num_steps,
+                        "guidance_scale": request.guidance_scale,
+                        "seed": request.seed,
+                        "sampler": request.sampler
+                    },
+                    request_id=request_id,
+                    generation_time=generation_time
+                )
+            except Exception as e:
+                raise HTTPException(status_code=500, detail=str(e))
+        @self.app.post("/batch_generate")
+        async def batch_generate(batch_request: BatchRequest):
+            """Generate images for several requests concurrently."""
+            request_ids = [str(uuid.uuid4()) for _ in batch_request.requests]
+            # Create one task per request
+            tasks = []
+            for req, req_id in zip(batch_request.requests, request_ids):
+                task = asyncio.create_task(self._process_single_request(req, req_id))
+                tasks.append(task)
+            # Wait for all tasks to finish
+            results = await asyncio.gather(*tasks, return_exceptions=True)
+            # Split the results into successes and failures
+            successful_results = []
+            failed_results = []
+            for result in results:
+                if isinstance(result, Exception):
+                    failed_results.append({"error": str(result)})
+                elif not result.get("success", False):
+                    # _process_single_request reports errors as {"success": False, ...}
+                    failed_results.append(result)
+                else:
+                    successful_results.append(result)
+            return {
+                "successful": successful_results,
+                "failed": failed_results,
+                "total_requests": len(batch_request.requests),
+                "successful_count": len(successful_results),
+                "failed_count": len(failed_results)
+            }
+        @self.app.get("/history")
+        async def get_history(limit: int = 50):
+            """Return the most recent request history entries."""
+            return self.request_history[-limit:]
+        @self.app.post("/txt2img")  # Form-based endpoint with Stable Diffusion WebUI-style parameter names
+        async def txt2img(
+            prompt: str = Form(...),
+            negative_prompt: str = Form(""),
+            width: int = Form(512),
+            height: int = Form(512),
+            steps: int = Form(50),
+            cfg_scale: float = Form(7.5),
+            seed: int = Form(-1)
+        ):
+            """Form-based text-to-image endpoint using WebUI-style parameter names; returns the first image as a PNG stream."""
+            request = TextToImageRequest(
+                prompt=prompt,
+                negative_prompt=negative_prompt,
+                width=width,
+                height=height,
+                num_steps=steps,
+                guidance_scale=cfg_scale,
+                seed=seed if seed != -1 else None
+            )
+            response = await generate_image(request)
+            # Return the first image
+            if response.images:
+                return StreamingResponse(
+                    io.BytesIO(base64.b64decode(response.images[0])),
+                    media_type="image/png"
+                )
+            else:
+                raise HTTPException(status_code=500, detail="Generation failed")
+        @self.app.get("/stats")
+        async def get_stats():
+            """Return usage statistics."""
+            uptime = time.time() - self.start_time
+            return {
+                "total_requests": self.total_requests,
+                "total_images": self.total_images,
+                "requests_per_minute": self.total_requests / (uptime / 60) if uptime > 0 else 0,
+                "avg_generation_time": None,  # Not tracked yet
+                "uptime_seconds": uptime,
+                "queue_size": self.task_queue.qsize(),
+                "active_workers": self.active_tasks
+            }
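+    def _generate_sync(self, request: TextToImageRequest) -> List[torch.Tensor]:
+        """Generate images synchronously (runs in a worker thread).
+        Note: this replaces the shared pipeline's sampler, so with max_workers > 1
+        concurrent requests can race on self.pipeline.sampler."""
+        # Use the sampler requested by the client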
+        self.pipeline.sampler = SamplerFactory.create_sampler(
+            request.sampler,
+            self.model,
+            self.diffusion,
+            request.num_steps
+        )
+        # Generate the images
+        images = self.pipeline(
+            prompt=request.prompt,
+            negative_prompt=request.negative_prompt,
+            height=request.height,
+            width=request.width,
+            num_inference_steps=request.num_steps,
+            guidance_scale=request.guidance_scale,
+            num_images=request.num_images,
+            seed=request.seed,
+            progress_bar=False
+        )
+        return images
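+    async def _process_single_request(self, request: TextToImageRequest, request_id: str) -> Dict:
+        """Process one request from a batch; errors are returned, not raised."""
+        # Register the request in the bounded queue (used only for /status accounting;
+        # nothing consumes it, so the entry is removed again in `finally`)
+        await self.task_queue.put((request, request_id))
+        self.active_tasks += 1
+        try:
+            loop = asyncio.get_running_loop()
+            images = await loop.run_in_executor(
+                self.executor,
+                self._generate_sync,
+                request
+            )
+            # Convert each image tensor in [0, 1] to a base64-encoded PNG
+            image_b64_list = []
+            for img_tensor in images:
+                if img_tensor.dim() == 4:
+                    img_tensor = img_tensor.squeeze(0)
+                img_tensor = torch.clamp(img_tensor * 255, 0, 255).byte()
+                if img_tensor.shape[0] == 3:  # CHW -> HWC
+                    img_array = img_tensor.permute(1, 2, 0).cpu().numpy()
+                else:
+                    img_array = img_tensor.cpu().numpy()
+                buffered = io.BytesIO()
+                Image.fromarray(img_array).save(buffered, format="PNG")
+                image_b64_list.append(base64.b64encode(buffered.getvalue()).decode())
+            return {
+                "request_id": request_id,
+                "images": image_b64_list,
+                "success": True
+            }
+        except Exception as e:
+            return {
+                "request_id": request_id,
+                "error": str(e),
+                "success": False
+            }
+        finally:
+            self.active_tasks -= 1
+            self.task_queue.get_nowait()
+            self.task_queue.task_done()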
+    def run(self):
+        """Run the API server with uvicorn."""
+        import uvicorn
+        print(f"Starting Lumina API server at http://{self.host}:{self.port}")
+        print(f"API docs: http://{self.host}:{self.port}/docs")
+        uvicorn.run(
+            self.app,
+            host=self.host,
+            port=self.port,
+            log_level="info"
+        )
+class SimpleWebUI:
+    """Simple web UI built with Gradio."""
+    def __init__(self, pipeline: TextToImagePipeline):
+        self.pipeline = pipeline
+    def create_interface(self):
+        """Build the Gradio interface."""
+        try:
+            import gradio as gr
+        except ImportError:
+            print("Gradio is not installed; cannot create the web UI")
+            return None
+        def generate_image_ui(
+            prompt,
+            negative_prompt,
+            width,
+            height,
+            num_steps,
+            guidance_scale,
+            seed,
+            sampler
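+        ):
+            """Generation callback for the UI.
+            Note: the `sampler` choice is currently not applied; the pipeline keeps its configured sampler."""
+            # gr.Number returns a float; convert it to an int seed (-1 means random)
+            seed = int(seed) if seed is not None else -1
+            # Set the random seed
+            if seed > 0:
+                torch.manual_seed(seed)
+                if torch.cuda.is_available():
+                    torch.cuda.manual_seed(seed)
+            # Generate the image
+            images = self.pipeline(
+                prompt=prompt,
+                negative_prompt=negative_prompt,
+                height=int(height),
+                width=int(width),
+                num_inference_steps=int(num_steps),
+                guidance_scale=guidance_scale,
+                num_images=1,
+                seed=seed if seed > 0 else None,
+                progress_bar=False
+            )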
+            # Convert the first result to a PIL image
+            if images:
+                img_tensor = images[0]
+                if isinstance(img_tensor, torch.Tensor):
+                    if img_tensor.dim() == 4:
+                        img_tensor = img_tensor.squeeze(0)
+                    # Scale from [0, 1] to [0, 255]
+                    img_tensor = torch.clamp(img_tensor * 255, 0, 255).byte()
+                    # Convert to a NumPy array in HWC layout
+                    if img_tensor.shape[0] == 3:  # CHW layout
+                        img_array = img_tensor.permute(1, 2, 0).cpu().numpy()
+                    else:
+                        img_array = img_tensor.cpu().numpy()
+                    # Convert to a PIL image (Image is imported at module level)
+                    img = Image.fromarray(img_array)
+                    return img
+            return None
+        # Build the interface
+        with gr.Blocks(title="Lumina Image Generator") as interface:
+            gr.Markdown("# 🎨 Lumina - Lightweight Image Generation")
+            gr.Markdown("Diffusion-based text-to-image generation")
+            with gr.Row():
+                with gr.Column():
+                    prompt = gr.Textbox(
+                        label="Prompt",
+                        placeholder="Describe the image you want to generate...",
+                        lines=3
+                    )
+                    negative_prompt = gr.Textbox(
+                        label="Negative prompt",
+                        placeholder="Things you don't want in the image...",
+                        lines=2
+                    )
+                    with gr.Row():
+                        width = gr.Slider(
+                            minimum=256,
+                            maximum=1024,
+                            value=512,
+                            step=64,
+                            label="Width"
+                        )
+                        height = gr.Slider(
+                            minimum=256,
+                            maximum=1024,
+                            value=512,
+                            step=64,
+                            label="Height"
+                        )
+                    with gr.Row():
+                        num_steps = gr.Slider(
+                            minimum=1,
+                            maximum=100,
+                            value=30,
+                            step=1,
+                            label="Sampling steps"
+                        )
+                        guidance_scale = gr.Slider(
+                            minimum=1.0,
+                            maximum=20.0,
+                            value=7.5,
+                            step=0.5,
+                            label="Guidance scale"
+                        )
+                    with gr.Row():
+                        seed = gr.Number(
+                            value=-1,
+                            label="Seed (-1 for random)"
+                        )
+                        sampler = gr.Dropdown(
+                            choices=["ddim", "dpm", "lcm"],
+                            value="ddim",
+                            label="Sampler"
+                        )
+                    generate_btn = gr.Button("Generate image", variant="primary")
+                with gr.Column():
+                    output_image = gr.Image(
+                        label="Generated image",
+                        type="pil"
+                    )
+            # Examples
+            gr.Markdown("### Example prompts")
+            examples = gr.Examples(
+                examples=[
+                    ["A beautiful sunset over mountains, digital art", "", 512, 512, 30, 7.5, -1],
+                    ["A cute cat playing with a ball of yarn", "blurry, deformed", 512, 512, 25, 8.0, -1],
+                    ["An astronaut riding a horse on Mars", "cartoon, anime", 512, 512, 40, 7.0, -1]
+                ],
+                inputs=[prompt, negative_prompt, width, height, num_steps, guidance_scale, seed]
+            )
+            # Event handling
+            generate_btn.click(
+                fn=generate_image_ui,
+                inputs=[prompt, negative_prompt, width, height, num_steps, guidance_scale, seed, sampler],
+                outputs=output_image
+            )
+        return interface
+    def launch(self, share: bool = False, server_name: str = "0.0.0.0", server_port: int = 7860):
+        """Launch the Web UI."""
+        interface = self.create_interface()
+        if interface:
+            interface.launch(
+                share=share,
+                server_name=server_name,
+                server_port=server_port
+            )
+def create_api_server(config: dict, model, diffusion, text_encoder, vae_decoder=None):
+    """Create the API server."""
+    # Determine host and port
+    host = config.get('host', '0.0.0.0')
+    port = config.get('port', 8000)
+    # Create the API server
+    api_server = LuminaAPI(
+        model=model,
+        diffusion=diffusion,
+        text_encoder=text_encoder,
+        vae_decoder=vae_decoder,
+        host=host,
+        port=port,
+        max_queue_size=config.get('max_queue_size', 100),
+        max_workers=config.get('max_workers', 2)
+    )
+    return api_server
+def test_api():
+    """Test the API."""
+    import torch.nn as nn
+    # Create mock components
+    class MockModel(nn.Module):
+        def forward(self, x, t, context):
+            return torch.randn_like(x)
+    class MockDiffusion:
+        pass
+    class MockTextEncoder:
+        def encode(self, texts):
+            return torch.randn(len(texts), 77, 768)
+    model = MockModel()
+    diffusion = MockDiffusion()
+    text_encoder = MockTextEncoder()
+    # Create the API server
+    api = LuminaAPI(
+        model=model,
+        diffusion=diffusion,
+        text_encoder=text_encoder
+    )
+    print("API server created")
+    print("Endpoints:")
+    print("  POST /generate - generate an image")
+    print("  GET  /health   - health check")
+    print("  GET  /status   - status information")
+    return api
+if __name__ == '__main__':
+    # Test the API
+    api = test_api()
+    # Note: actually serving requests requires calling api.run()
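The Web UI's seed field treats -1 as "pick a random seed". Below is a sketch of how a handler such as `generate_image_ui` could turn that value into reproducible noise. It assumes the value may arrive from `gr.Number` as a float. `resolve_seed` and `make_generator` are hypothetical helpers, not part of this repository.

```python
import random

import torch


def resolve_seed(seed) -> int:
    """Map the UI convention (-1 means random) to a concrete non-negative seed."""
    seed = int(seed)  # a numeric UI field may deliver e.g. 42.0
    if seed < 0:
        seed = random.randint(0, 2**31 - 1)
    return seed


def make_generator(seed, device: str = "cpu"):
    """Return (resolved_seed, torch.Generator seeded with it)."""
    resolved = resolve_seed(seed)
    generator = torch.Generator(device=device).manual_seed(resolved)
    return resolved, generator


if __name__ == "__main__":
    seed_a, gen_a = make_generator(42.0)
    seed_b, gen_b = make_generator(42)
    # The same resolved seed yields the same initial noise
    same = torch.equal(torch.randn(1, 4, 8, 8, generator=gen_a),
                       torch.randn(1, 4, 8, 8, generator=gen_b))
    print(seed_a, seed_b, same)
```

Returning the resolved seed together with the generator lets the UI report which seed produced an image, so a "random" result can be reproduced later. The generator can be passed as the `generator` argument of `DDIMSampler.sample`.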

src/inference/optimization.py ADDED

	@@ -0,0 +1,427 @@

+import torch
+import torch.nn as nn
+from typing import Optional, Dict, Any
+import onnx
+import onnxruntime as ort
+import os
+class ModelOptimizer:
+    """Model optimizer."""
+    def __init__(self, model: nn.Module, device: str = 'cuda'):
+        self.model = model
+        self.device = device
+        self.optimized_model = None
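+    def optimize_for_inference(self, use_jit: bool = True, use_cuda_graph: bool = False):
+        """Optimize the model for inference."""
+        # Set eval mode, move to the target device and (on CUDA) cast to half precision
+        # first, so any traced or captured graph below is built with the final dtype
+        self._apply_inference_optimizations()
+        if use_jit:
+            self._jit_compile()
+        if use_cuda_graph and torch.cuda.is_available():
+            self._capture_cuda_graph()
+        # Fall back to torch.compile only if no traced/captured graph was produced
+        if self.optimized_model is None:
+            self._torch_compile()
+        return self.optimized_model if self.optimized_model is not None else self.model
+    def _example_inputs(self):
+        """Example inputs matching the model's device and parameter dtype."""
+        dtype = next(self.model.parameters()).dtype
+        example_input = torch.randn(1, 4, 64, 64, device=self.device, dtype=dtype)
+        example_timestep = torch.tensor([500], device=self.device)
+        example_context = torch.randn(1, 77, 768, device=self.device, dtype=dtype)
+        return example_input, example_timestep, example_context
+    def _jit_compile(self):
+        """Compile the model with TorchScript (tracing)."""
+        try:
+            # torch.jit.trace records the operations executed for the example inputs
+            with torch.no_grad():
+                traced_model = torch.jit.trace(self.model, self._example_inputs(), check_trace=False)
+            self.optimized_model = traced_model
+            print("Model compiled with TorchScript")
+        except Exception as e:
+            print(f"TorchScript compilation failed: {e}")
+    def _capture_cuda_graph(self):
+        """Capture a CUDA graph (for repeated inference with fixed input shapes)."""
+        if not torch.cuda.is_available():
+            return
+        # Static input buffers; every replay reads from these exact tensors
+        static_input, static_timestep, static_context = self._example_inputs()
+        with torch.no_grad():
+            # Warm up on a side stream before capture, as recommended in the PyTorch docs
+            side_stream = torch.cuda.Stream()
+            side_stream.wait_stream(torch.cuda.current_stream())
+            with torch.cuda.stream(side_stream):
+                for _ in range(3):
+                    _ = self.model(static_input, static_timestep, static_context)
+            torch.cuda.current_stream().wait_stream(side_stream)
+            # Capture the graph
+            graph = torch.cuda.CUDAGraph()
+            with torch.cuda.graph(graph):
+                static_output = self.model(static_input, static_timestep, static_context)
+        # Wrapper: copy new inputs into the static buffers, replay, return a copy of the output
+        def graph_executor(input_tensor, timestep, context):
+            static_input.copy_(input_tensor)
+            static_timestep.copy_(timestep)
+            static_context.copy_(context)
+            graph.replay()
+            return static_output.clone()
+        self.optimized_model = graph_executor
+        print("CUDA graph captured")
+    def _apply_inference_optimizations(self):
+        """Apply basic inference settings: eval mode, device placement and half precision."""
+        self.model.eval()
+        self.model.to(self.device)
+        # Use half precision on CUDA
+        if str(self.device).startswith('cuda'):
+            self.model.half()
+            print("Model converted to half precision")
+    def _torch_compile(self):
+        """Optimize with torch.compile when available (PyTorch 2.0+)."""
+        if hasattr(torch, 'compile'):
+            try:
+                # Compilation is lazy: the actual work happens on the first forward call
+                self.model = torch.compile(self.model, mode='max-autotune')
+                print("Model optimized with torch.compile")
+            except Exception as e:
+                print(f"torch.compile failed: {e}")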
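+    def quantize(self, quantization_mode: str = 'dynamic'):
+        """Quantize the model."""
+        if quantization_mode == 'dynamic':
+            # Dynamic int8 quantization. PyTorch supports it for nn.Linear (and RNN layers)
+            # but not nn.Conv2d, and the quantized kernels run on CPU, so the model is
+            # converted to float32 on CPU first (this modifies self.model in place)
+            quantized_model = torch.quantization.quantize_dynamic(
+                self.model.float().cpu(),
+                {nn.Linear},
+                dtype=torch.qint8
+            )
+            self.optimized_model = quantized_model
+            print("Model dynamically quantized")
+        elif quantization_mode == 'static':
+            # Static quantization needs calibration data
+            print("Static quantization requires calibration data and is not implemented yet")
+        else:
+            raise ValueError(f"Unknown quantization mode: {quantization_mode}")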
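+    def prune(self, pruning_rate: float = 0.2):
+        """Prune the model (unstructured L1-magnitude pruning)."""
+        from torch.nn.utils import prune
+        # Prune the weights of every linear and convolutional layer. Unstructured pruning
+        # only zeroes individual weights; the dense tensors (and their memory) keep their size
+        for name, module in self.model.named_modules():
+            if isinstance(module, (nn.Linear, nn.Conv2d)):
+                prune.l1_unstructured(module, name='weight', amount=pruning_rate)
+                # Make the pruning permanent (drop the mask and reparametrization)
+                prune.remove(module, 'weight')
+        print(f"Model pruned, pruning rate: {pruning_rate}")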
+    def get_model_size(self) -> Dict[str, float]:
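+        """Get the model size."""
+        # Count parameters
+        total_params = sum(p.numel() for p in self.model.parameters())
+        trainable_params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
+        # Model size in MB (parameters + buffers); packed weights of dynamically
+        # quantized layers are neither, so they may not be counted here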
+        param_size = 0
+        for param in self.model.parameters():
+            param_size += param.nelement() * param.element_size()
+        buffer_size = 0
+        for buffer in self.model.buffers():
+            buffer_size += buffer.nelement() * buffer.element_size()
+        size_mb = (param_size + buffer_size) / 1024**2
+        return {
+            'total_params': total_params,
+            'trainable_params': trainable_params,
+            'size_mb': size_mb
+        }
+class ONNXExporter:
+    """ONNX exporter."""
+    def __init__(self, model: nn.Module):
+        self.model = model
+    def export(
+        self,
+        output_path: str,
+        input_shape: tuple = (1, 4, 64, 64),
+        opset_version: int = 14,
+        dynamic_axes: Optional[Dict] = None
+    ):
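+        """Export the model to ONNX format."""
+        # Switch to eval mode
+        self.model.eval()
+        # Create example inputs on the model's device and dtype, with a consistent batch size
+        param = next(self.model.parameters())
+        batch_size = input_shape[0]
+        dummy_input = torch.randn(*input_shape, device=param.device, dtype=param.dtype)
+        dummy_timestep = torch.full((batch_size,), 500, device=param.device, dtype=torch.long)
+        dummy_context = torch.randn(batch_size, 77, 768, device=param.device, dtype=param.dtype)
+        # Default dynamic axes
+        if dynamic_axes is None:
+            dynamic_axes = {
+                'input': {0: 'batch_size'},
+                'timestep': {0: 'batch_size'},
+                'context': {0: 'batch_size'},
+                'output': {0: 'batch_size'}
+            }
+        # Export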
+        torch.onnx.export(
+            self.model,
+            (dummy_input, dummy_timestep, dummy_context),
+            output_path,
+            input_names=['input', 'timestep', 'context'],
+            output_names=['output'],
+            dynamic_axes=dynamic_axes,
+            opset_version=opset_version,
+            do_constant_folding=True,
+            verbose=False
+        )
+        print(f"Model exported to ONNX: {output_path}")
+        # Validate the ONNX model
+        self._validate_onnx(output_path)
+    def _validate_onnx(self, onnx_path: str):
+        """Validate the ONNX model."""
+        try:
+            onnx_model = onnx.load(onnx_path)
+            onnx.checker.check_model(onnx_model)
+            print("ONNX model validation passed")
+        except Exception as e:
+            print(f"ONNX model validation failed: {e}")
+    def optimize_onnx(self, onnx_path: str, optimized_path: str):
+        """Optimize an ONNX model with ONNX Runtime graph optimizations."""
+        try:
+            sess_options = ort.SessionOptions()
+            sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
+            # ONNX Runtime serializes the optimized graph to this path when the session is created
+            sess_options.optimized_model_filepath = optimized_path
+            ort.InferenceSession(onnx_path, sess_options, providers=['CPUExecutionProvider'])
+            print(f"ONNX model optimized: {optimized_path}")
+        except Exception as e:
+            print(f"ONNX optimization failed: {e}")
+class MemoryEfficientInference:
+    """Memory-efficient inference."""
+    def __init__(self, model: nn.Module, chunk_size: int = 32):
+        self.model = model
+        self.chunk_size = chunk_size
+    def chunked_inference(self, x: torch.Tensor, t: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
+        """Chunked inference over spatial blocks to reduce memory use.
+        Each block is processed independently, without context from its neighbours,
+        so seams may appear at block borders; block sizes must also suit the
+        model's downsampling factor."""
+        B, C, H, W = x.shape
+        output = torch.zeros_like(x)
+        # Process block by block
+        for i in range(0, H, self.chunk_size):
+            for j in range(0, W, self.chunk_size):
+                # Extract the block
+                chunk = x[:, :, i:i+self.chunk_size, j:j+self.chunk_size]
+                # Run the model
+                with torch.no_grad():
+                    chunk_output = self.model(chunk, t, context)
+                # Store the result
+                output[:, :, i:i+self.chunk_size, j:j+self.chunk_size] = chunk_output
+        return output
+    def tiled_inference(self, x: torch.Tensor, t: torch.Tensor, context: torch.Tensor, tile_size: int = 512) -> torch.Tensor:
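+        """Tiled inference for large inputs (non-overlapping tiles).
+        tile_size is measured in the spatial units of x (latent pixels when x is a latent)."""
+        B, C, H, W = x.shape
+        # If the input fits in a single tile, run the model directly
+        if H <= tile_size and W <= tile_size:
+            with torch.no_grad():
+                return self.model(x, t, context)
+        # Number of tiles along each axis (ceiling division)
+        n_tiles_h = (H + tile_size - 1) // tile_size
+        n_tiles_w = (W + tile_size - 1) // tile_size
+        output = torch.zeros_like(x)
+        # Process each tile
+        for i in range(n_tiles_h):
+            for j in range(n_tiles_w):
+                # Tile bounds (tiles on the bottom/right edge may be smaller)
+                h_start = i * tile_size
+                w_start = j * tile_size
+                h_end = min(h_start + tile_size, H)
+                w_end = min(w_start + tile_size, W)
+                # Extract the tile
+                tile = x[:, :, h_start:h_end, w_start:w_end]
+                # Run the model
+                with torch.no_grad():
+                    tile_output = self.model(tile, t, context)
+                # Store the result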
+                output[:, :, h_start:h_end, w_start:w_end] = tile_output
+        return output
+class InferenceBenchmark:
+    """Inference benchmark."""
+    def __init__(self, model: nn.Module, device: str = 'cuda'):
+        self.model = model
+        self.device = device
+    def benchmark(
+        self,
+        input_shape: tuple = (1, 4, 64, 64),
+        num_iterations: int = 100,
+        warmup_iterations: int = 10
+    ) -> Dict[str, float]:
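+        """Run the benchmark."""
+        # Prepare inputs matching the model's dtype (e.g. after half-precision conversion)
+        dtype = next(self.model.parameters()).dtype
+        batch_size = input_shape[0]
+        x = torch.randn(*input_shape, device=self.device, dtype=dtype)
+        t = torch.full((batch_size,), 500, device=self.device, dtype=torch.long)
+        context = torch.randn(batch_size, 77, 768, device=self.device, dtype=dtype)
+        # Warm-up
+        print("Warming up...")
+        with torch.no_grad():
+            for _ in range(warmup_iterations):
+                _ = self.model(x, t, context)
+        # Synchronize so queued warm-up kernels do not leak into the timings
+        use_cuda = str(self.device).startswith('cuda')
+        if use_cuda:
+            torch.cuda.synchronize()
+        # Benchmark
+        print("Running benchmark...")
+        import time
+        times = []
+        for i in range(num_iterations):
+            # perf_counter is a monotonic, high-resolution clock suited to timing
+            start_time = time.perf_counter()
+            with torch.no_grad():
+                _ = self.model(x, t, context)
+            if use_cuda:
+                torch.cuda.synchronize()
+            end_time = time.perf_counter()
+            times.append(end_time - start_time)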
+        # Statistics
+        times = torch.tensor(times)
+        stats = {
+            'mean_ms': times.mean().item() * 1000,
+            'std_ms': times.std().item() * 1000,
+            'min_ms': times.min().item() * 1000,
+            'max_ms': times.max().item() * 1000,
+            # Forward passes per second; multiply by the batch size for images per second
+            'fps': 1 / times.mean().item(),
+            'num_iterations': num_iterations
+        }
+        # Print the results
+        print("\n" + "="*50)
+        print("Inference benchmark results:")
+        print(f"Mean inference time: {stats['mean_ms']:.2f} ms")
+        print(f"Std dev: {stats['std_ms']:.2f} ms")
+        print(f"Min inference time: {stats['min_ms']:.2f} ms")
+        print(f"Max inference time: {stats['max_ms']:.2f} ms")
+        print(f"FPS: {stats['fps']:.2f}")
+        print("="*50)
+        return stats
+def optimize_model_for_p4(model: nn.Module) -> nn.Module:
+    """Optimize the model for the P4 GPU."""
+    optimizer = ModelOptimizer(model)
+    # Model size before optimization
+    size_info = optimizer.get_model_size()
+    print(f"Model size before optimization: {size_info['size_mb']:.2f} MB")
+    # Apply optimizations
+    optimized_model = optimizer.optimize_for_inference(
+        use_jit=True,
+        use_cuda_graph=False  # may not be supported on the P4
+    )
+    # Dynamic int8 quantization (ModelOptimizer.quantize) is not applied here: PyTorch's
+    # dynamic quantization yields CPU-only kernels, so it would not speed up GPU inference
+    # Model size after optimization
+    size_info_after = optimizer.get_model_size()
+    print(f"Model size after optimization: {size_info_after['size_mb']:.2f} MB")
+    print(f"Compression ratio: {size_info['size_mb'] / size_info_after['size_mb']:.2f}x")
+    return optimized_model
+def test_optimization():
+    """Test the optimizations."""
+    import torch.nn as nn
+    # Create a mock model
+    class MockModel(nn.Module):
+        def __init__(self):
+            super().__init__()
+            self.conv1 = nn.Conv2d(4, 64, 3, padding=1)
+            self.conv2 = nn.Conv2d(64, 4, 3, padding=1)
+        def forward(self, x, t, context):
+            x = self.conv1(x)
+            x = nn.functional.relu(x)
+            x = self.conv2(x)
+            return x
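+    model = MockModel()
+    device = 'cuda' if torch.cuda.is_available() else 'cpu'
+    # Test the optimizer
+    optimizer = ModelOptimizer(model, device=device)
+    optimized_model = optimizer.optimize_for_inference()
+    # Test the benchmark
+    benchmark = InferenceBenchmark(model, device=device)
+    stats = benchmark.benchmark(num_iterations=10)
+    # Test the ONNX export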
+    exporter = ONNXExporter(model)
+    exporter.export('./test_model.onnx', input_shape=(1, 4, 64, 64))
+    return optimized_model, stats
+if __name__ == '__main__':
+    optimized_model, stats = test_optimization()
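`chunked_inference` and `tiled_inference` write non-overlapping tiles, so each tile is denoised without any context from its neighbours, and seams can appear along tile borders. A common mitigation is to overlap the tiles and blend the overlapping predictions. The sketch below uses uniform averaging as the blending rule. `overlapped_tiled_apply` is a hypothetical helper, not part of this repository. A denoiser call can be passed in as a lambda such as `lambda tile: model(tile, t, context)`.

```python
import torch


def overlapped_tiled_apply(fn, x: torch.Tensor, tile: int = 32, overlap: int = 8) -> torch.Tensor:
    """Apply fn to overlapping spatial tiles of x (B, C, H, W) and average the overlaps.

    fn must return a tensor with the same shape as the tile it receives.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    _, _, H, W = x.shape
    stride = tile - overlap

    def starts(size: int) -> list:
        # Tile origins; the last tile is shifted back so it ends exactly at the border
        if size <= tile:
            return [0]
        origins = list(range(0, size - tile, stride))
        origins.append(size - tile)
        return origins

    out = torch.zeros_like(x)
    weight = torch.zeros(1, 1, H, W, dtype=x.dtype, device=x.device)
    with torch.no_grad():
        for h0 in starts(H):
            for w0 in starts(W):
                h1, w1 = min(h0 + tile, H), min(w0 + tile, W)
                out[:, :, h0:h1, w0:w1] += fn(x[:, :, h0:h1, w0:w1])
                # Count how many tiles covered each pixel
                weight[:, :, h0:h1, w0:w1] += 1
    # Uniform blending: every pixel is covered at least once, so no division by zero
    return out / weight


if __name__ == "__main__":
    latents = torch.randn(1, 4, 64, 96)
    # A pointwise "model" gives the same result with or without tiling
    tiled = overlapped_tiled_apply(lambda t: 2 * t + 1, latents, tile=32, overlap=8)
    print(torch.allclose(tiled, 2 * latents + 1, atol=1e-5))
```

The last tile along each axis is shifted back to end at the border. As a result, every tile has the full `tile x tile` size whenever the input is at least that large, which helps keep tile sizes compatible with a UNet's downsampling factor. Feathered weights, such as a linear ramp across the overlap, usually hide seams better than uniform averaging.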

src/inference/sampler.py ADDED

	@@ -0,0 +1,428 @@

+import torch
+import torch.nn as nn
+from typing import Optional, Tuple, List, Union
+import numpy as np
+from tqdm import tqdm
+import math
+class DDIMSampler:
+    """DDIM sampler."""
+    def __init__(self, model: nn.Module, diffusion, num_inference_steps: int = 50):
+        self.model = model
+        self.diffusion = diffusion
+        self.num_inference_steps = num_inference_steps
+        # Set up the timesteps
+        self.set_timesteps(num_inference_steps)
+    def set_timesteps(self, num_inference_steps: int):
+        """Set the inference timesteps."""
+        self.num_inference_steps = num_inference_steps
+        # Select evenly spaced timesteps
+        if self.diffusion.num_train_timesteps == num_inference_steps:
+            timesteps = np.arange(0, self.diffusion.num_train_timesteps)
+        else:
+            step_ratio = self.diffusion.num_train_timesteps // num_inference_steps
+            timesteps = np.arange(0, self.diffusion.num_train_timesteps, step_ratio)
+        self.timesteps = torch.from_numpy(timesteps).long().flip(0)
+    @torch.no_grad()
+    def step(
+        self,
+        model_output: torch.Tensor,
+        timestep: int,
+        sample: torch.Tensor,
+        eta: float = 0.0,
+        use_clipped_model_output: bool = False
+    ) -> torch.Tensor:
+        """Single DDIM sampling step."""
+        # Current and previous timesteps
+        prev_timestep = timestep - self.diffusion.num_train_timesteps // self.num_inference_steps
+        # Extract the cumulative alpha products
+        alpha_prod_t = self.diffusion.extract(self.diffusion.alphas_cumprod, timestep, sample.shape)
+        alpha_prod_t_prev = self.diffusion.extract(
+            self.diffusion.alphas_cumprod,
+            prev_timestep,
+            sample.shape
+        ) if prev_timestep >= 0 else torch.ones_like(alpha_prod_t)
+        # Convert the model output according to the prediction type
+        if self.diffusion.prediction_type == "epsilon":
+            pred_original_sample = (sample - (1 - alpha_prod_t) ** 0.5 * model_output) / alpha_prod_t ** 0.5
+            pred_epsilon = model_output
+        elif self.diffusion.prediction_type == "sample":
+            pred_original_sample = model_output
+            pred_epsilon = (sample - alpha_prod_t ** 0.5 * pred_original_sample) / (1 - alpha_prod_t) ** 0.5
+        elif self.diffusion.prediction_type == "v_prediction":
+            pred_original_sample = (alpha_prod_t ** 0.5) * sample - (1 - alpha_prod_t) ** 0.5 * model_output
+            pred_epsilon = (alpha_prod_t ** 0.5) * model_output + (1 - alpha_prod_t) ** 0.5 * sample
+        else:
+            raise ValueError(f"Unsupported prediction type: {self.diffusion.prediction_type}")
+        # Clip the predicted original sample to [-1, 1]
+        if use_clipped_model_output:
+            pred_original_sample = torch.clamp(pred_original_sample, -1, 1)
+        # Variance of x_{t-1}
+        variance = (1 - alpha_prod_t_prev) / (1 - alpha_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)
+        std_dev_t = eta * variance ** 0.5
+        # Use stochastic sampling when eta > 0
+        if eta > 0:
+            noise = torch.randn_like(model_output)
+            variance = std_dev_t ** 2
+        else:
+            noise = 0
+            variance = 0
+        # Mean of x_{t-1}
+        pred_sample_direction = (1 - alpha_prod_t_prev - variance) ** 0.5 * pred_epsilon
+        prev_sample = alpha_prod_t_prev ** 0.5 * pred_original_sample + pred_sample_direction
+        # Add noise
+        if eta > 0:
+            prev_sample = prev_sample + std_dev_t * noise
+        return prev_sample
+    @torch.no_grad()
+    def sample(
+        self,
+        prompt_embeds: torch.Tensor,
+        negative_prompt_embeds: Optional[torch.Tensor] = None,
+        height: int = 512,
+        width: int = 512,
+        num_images_per_prompt: int = 1,
+        guidance_scale: float = 7.5,
+        eta: float = 0.0,
+        generator: Optional[torch.Generator] = None,
+        progress_bar: bool = True
+    ) -> torch.Tensor:
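+        """Generate samples."""
+        # Put the model in eval mode
+        self.model.eval()
+        # Total batch: one set of latents per image per prompt
+        batch_size = prompt_embeds.shape[0] * num_images_per_prompt
+        # Initialize the latents
+        latents = torch.randn(
+            (batch_size, self.model.in_channels, height // 8, width // 8),
+            device=prompt_embeds.device,
+            dtype=prompt_embeds.dtype,
+            generator=generator
+        )
+        # Repeat the text embeddings once per image generated for each prompt
+        prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+        # Classifier-free guidance: stack [unconditional, conditional] embeddings
+        do_classifier_free_guidance = guidance_scale > 1.0
+        if do_classifier_free_guidance:
+            if negative_prompt_embeds is None:
+                # Assumption: zero embeddings as the unconditional input; encoding the
+                # empty prompt "" is the more common choice
+                negative_prompt_embeds = torch.zeros_like(prompt_embeds)
+            else:
+                negative_prompt_embeds = negative_prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+            prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
+        # Sampling loop
+        timesteps = self.timesteps.to(latents.device)
+        if progress_bar:
+            timesteps = tqdm(timesteps, desc="DDIM Sampling")
+        for i, t in enumerate(timesteps):
+            # Duplicate the latents so one forward pass covers both guidance branches
+            latent_model_input = torch.cat([latents] * 2) if do_classifier_free_guidance else latents
+            latent_model_input = self.diffusion.scale_model_input(latent_model_input, t)
+            # Predict the noise
+            noise_pred = self.model(latent_model_input, t, prompt_embeds)
+            # Apply classifier-free guidance
+            if do_classifier_free_guidance:
+                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
+                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
+            # Compute the previous sample x_{t-1}
+            latents = self.step(noise_pred, t, latents, eta)
+        return latents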
+class DPMSampler:
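+    """Simplified fast sampler registered as "dpm".
+    Note: the update below is a plain Euler-style step on a linear time grid;
+    it is not an implementation of DPM-Solver."""
+    def __init__(self, model: nn.Module, diffusion, num_inference_steps: int = 20):
+        self.model = model
+        self.diffusion = diffusion
+        self.num_inference_steps = num_inference_steps
+    @torch.no_grad()
+    def sample(
+        self,
+        prompt_embeds: torch.Tensor,
+        negative_prompt_embeds: Optional[torch.Tensor] = None,
+        height: int = 512,
+        width: int = 512,
+        num_images_per_prompt: int = 1,
+        guidance_scale: float = 7.5,
+        progress_bar: bool = True
+    ) -> torch.Tensor:
+        """DPM sampling (simplified)."""
+        self.model.eval()
+        # One set of latents per image per prompt
+        prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+        if negative_prompt_embeds is not None:
+            negative_prompt_embeds = negative_prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+        # Initialize the latents
+        latents = torch.randn(
+            (prompt_embeds.shape[0], self.model.in_channels, height // 8, width // 8),
+            device=prompt_embeds.device
+        )
+        # Time grid from t=1 (pure noise) down to t=0
+        timesteps = torch.linspace(1, 0, self.num_inference_steps + 1, device=latents.device)
+        if progress_bar:
+            timesteps_iter = tqdm(enumerate(timesteps[:-1]), total=len(timesteps)-1, desc="DPM Sampling")
+        else:
+            timesteps_iter = enumerate(timesteps[:-1])
+        for i, t in timesteps_iter:
+            # Map t in [0, 1] to the model's timestep range [0, 999]
+            t_model = t.unsqueeze(0) * 999
+            # Predict the noise
+            noise_pred = self.model(latents, t_model, prompt_embeds)
+            # Classifier-free guidance, when unconditional embeddings are available
+            if guidance_scale > 1.0 and negative_prompt_embeds is not None:
+                noise_pred_uncond = self.model(latents, t_model, negative_prompt_embeds)
+                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred - noise_pred_uncond)
+            # Euler-style update step
+            dt = timesteps[i + 1] - t
+            latents = latents + dt * noise_pred
+        return latents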
+class LCMSampler:
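+    """LCM (Latent Consistency Model) sampler, designed for very few steps.
+    Note: c_skip, c_out, c_in and c_noise are fixed at 1.0 here, whereas LCM uses
+    timestep-dependent consistency-model scalings, so this is a simplified placeholder."""
+    def __init__(self, model: nn.Module, diffusion, num_inference_steps: int = 4):
+        self.model = model
+        self.diffusion = diffusion
+        self.num_inference_steps = num_inference_steps
+        # LCM-specific scaling parameters (constant placeholders)
+        self.c_skip = 1.0
+        self.c_out = 1.0
+        self.c_in = 1.0
+        self.c_noise = 1.0
+    @torch.no_grad()
+    def sample(
+        self,
+        prompt_embeds: torch.Tensor,
+        negative_prompt_embeds: Optional[torch.Tensor] = None,
+        height: int = 512,
+        width: int = 512,
+        num_images_per_prompt: int = 1,
+        guidance_scale: float = 7.5,
+        progress_bar: bool = True
+    ) -> torch.Tensor:
+        """LCM sampling (intended for only 4-8 steps).
+        negative_prompt_embeds and guidance_scale are accepted for interface
+        compatibility but are not used by this update rule."""
+        self.model.eval()
+        # One set of latents per image per prompt
+        prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+        latents = torch.randn(
+            (prompt_embeds.shape[0], self.model.in_channels, height // 8, width // 8),
+            device=prompt_embeds.device
+        )
+        # LCM sampling loop
+        timesteps = torch.linspace(1, 0, self.num_inference_steps + 1, device=latents.device)
+        if progress_bar:
+            timesteps_iter = tqdm(enumerate(timesteps[:-1]), total=len(timesteps)-1, desc="LCM Sampling")
+        else:
+            timesteps_iter = enumerate(timesteps[:-1])
+        for i, t in timesteps_iter:
+            # LCM-specific scalings
+            c_skip = self.c_skip
+            c_out = self.c_out
+            c_in = self.c_in
+            c_noise = self.c_noise
+            # Scale the input
+            scaled_latents = c_in * latents
+            # Predict
+            noise_pred = self.model(scaled_latents, c_noise * t.unsqueeze(0), prompt_embeds)
+            # LCM update rule
+            denoised = c_skip * latents + c_out * noise_pred
+            # Update the latents
+            dt = timesteps[i + 1] - t
+            latents = denoised + dt * noise_pred
+        return latents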
+class SamplerFactory:
+    """Sampler factory."""
+    @staticmethod
+    def create_sampler(
+        sampler_type: str,
+        model: nn.Module,
+        diffusion,
+        num_inference_steps: int = 50
+    ):
+        """Create a sampler."""
+        if sampler_type == "ddim":
+            return DDIMSampler(model, diffusion, num_inference_steps)
+        elif sampler_type == "dpm":
+            return DPMSampler(model, diffusion, num_inference_steps)
+        elif sampler_type == "lcm":
+            return LCMSampler(model, diffusion, num_inference_steps)
+        else:
+            raise ValueError(f"Unknown sampler type: {sampler_type}")
+class TextToImagePipeline:
+    """Text-to-image pipeline."""
+    def __init__(
+        self,
+        model: nn.Module,
+        diffusion,
+        text_encoder,
+        vae_decoder,
+        sampler_type: str = "ddim",
+        device: str = "cuda"
+    ):
+        self.model = model.to(device)
+        self.diffusion = diffusion
+        self.text_encoder = text_encoder
+        self.vae_decoder = vae_decoder
+        self.sampler_type = sampler_type
+        self.device = device
+        # Create the sampler
+        self.sampler = SamplerFactory.create_sampler(
+            sampler_type, model, diffusion
+        )
+        # Set eval mode
+        self.model.eval()
+        if self.vae_decoder is not None:
+            self.vae_decoder.eval()
+    @torch.no_grad()
+    def __call__(
+        self,
+        prompt: str,
+        negative_prompt: str = "",
+        height: int = 512,
+        width: int = 512,
+        num_inference_steps: int = 50,
+        guidance_scale: float = 7.5,
+        num_images: int = 1,
+        seed: Optional[int] = None,
+        progress_bar: bool = True
+    ) -> List:
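+        """Generate images."""
+        # Set the random seed
+        if seed is not None:
+            torch.manual_seed(seed)
+            if torch.cuda.is_available():
+                torch.cuda.manual_seed(seed)
+        # Apply the requested number of sampling steps
+        if hasattr(self.sampler, 'set_timesteps'):
+            self.sampler.set_timesteps(num_inference_steps)
+        else:
+            self.sampler.num_inference_steps = num_inference_steps
+        # Encode the prompts; with guidance enabled, the (possibly empty) negative
+        # prompt serves as the unconditional input
+        prompt_embeds = self.text_encoder.encode([prompt]).to(self.device)
+        negative_prompt_embeds = None
+        if negative_prompt or guidance_scale > 1.0:
+            negative_prompt_embeds = self.text_encoder.encode([negative_prompt]).to(self.device)
+        # Generate the latents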
+        latents = self.sampler.sample(
+            prompt_embeds=prompt_embeds,
+            negative_prompt_embeds=negative_prompt_embeds,
+            height=height,
+            width=width,
+            num_images_per_prompt=num_images,
+            guidance_scale=guidance_scale,
+            progress_bar=progress_bar
+        )
+        # Decode the latents into images
+        images = []
+        for i in range(num_images):
+            latent = latents[i:i+1]
+            if self.vae_decoder is not None:
+                image = self.vae_decoder(latent)
+            else:
+                # Without a VAE decoder, return the latent itself
+                image = latent
+            images.append(image.cpu())
+        return images
+    def generate_grid(
+        self,
+        prompts: List[str],
+        grid_size: Tuple[int, int] = (2, 2),
+        **kwargs
+    ) -> torch.Tensor:
+        """Generate a grid of images."""
+        images = []
+        for prompt in prompts[:grid_size[0] * grid_size[1]]:
+            image = self(prompt, **kwargs)[0]
+            images.append(image)
+        # Build the grid
+        from torchvision.utils import make_grid
+        grid = make_grid(torch.cat(images, dim=0), nrow=grid_size[1])
+        return grid
+def test_sampler():
+    """Test the sampler."""
+    import torch.nn as nn
+    # Create a mock model
+    class MockModel(nn.Module):
+        def __init__(self):
+            super().__init__()
+            self.in_channels = 4
+        def forward(self, x, t, context):
+            # Return random noise
+            return torch.randn_like(x)
+    # Create a mock diffusion process
+    class MockDiffusion:
+        def __init__(self):
+            self.num_train_timesteps = 1000
+            self.alphas_cumprod = torch.ones(1000)
+            self.prediction_type = "epsilon"
+        def extract(self, a, t, x_shape):
+            return torch.ones(x_shape[0], 1, 1, 1)
+        def scale_model_input(self, x, t):
+            return x
+    model = MockModel()
+    diffusion = MockDiffusion()
+    # Test the DDIM sampler
+    sampler = DDIMSampler(model, diffusion, num_inference_steps=10)
+    # Test sampling
+    prompt_embeds = torch.randn(1, 77, 768)
+    latents = sampler.sample(prompt_embeds, height=64, width=64, progress_bar=False)
+    print(f"DDIM sampling finished, latent shape: {latents.shape}")
+    return sampler, latents
+if __name__ == '__main__':
+    test_sampler()
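For reference, the sketch below writes out two pieces of the sampler as small standalone functions. The first is the deterministic (`eta = 0`) DDIM update that `DDIMSampler.step` performs for epsilon prediction. The second is the classifier-free guidance combination used in `DDIMSampler.sample`. The sketch works on plain tensors: `alpha_bar_t` and `alpha_bar_prev` stand in for the values the repository obtains via `diffusion.extract`, and the function names are illustrative.

```python
import torch


def cfg_combine(noise_uncond: torch.Tensor, noise_cond: torch.Tensor,
                guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)


def ddim_step_eps(x_t: torch.Tensor, eps: torch.Tensor,
                  alpha_bar_t: torch.Tensor, alpha_bar_prev: torch.Tensor) -> torch.Tensor:
    """Deterministic DDIM step (eta = 0) for an epsilon-predicting model."""
    # Estimate the clean sample: x0 = (x_t - sqrt(1 - a_t) * eps) / sqrt(a_t)
    x0_pred = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    # Move x0 to the previous noise level along the same noise direction
    return alpha_bar_prev.sqrt() * x0_pred + (1 - alpha_bar_prev).sqrt() * eps


if __name__ == "__main__":
    torch.manual_seed(0)
    x0 = torch.randn(1, 4, 8, 8)
    eps = torch.randn_like(x0)
    a_t, a_prev = torch.tensor(0.5), torch.tensor(0.9)
    # Forward diffusion: x_t = sqrt(a_t) * x0 + sqrt(1 - a_t) * eps
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * eps
    x_prev = ddim_step_eps(x_t, eps, a_t, a_prev)
    expected = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    print(torch.allclose(x_prev, expected, atol=1e-5))
```

The `__main__` check uses the true noise. In that case the step lands exactly on the forward-diffused sample at the previous noise level. With a learned noise prediction, the same update follows the model's estimate instead.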

src/models/attention.py ADDED

	@@ -0,0 +1,143 @@

+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import math
+from typing import Optional, Tuple
+class MemoryEfficientAttention(nn.Module):
+    """Memory-efficient multi-head self-attention (queries processed in chunks)."""
+    def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False, dropout: float = 0.0):
+        super().__init__()
+        self.dim = dim
+        self.num_heads = num_heads
+        self.head_dim = dim // num_heads
+        self.scale = self.head_dim ** -0.5
+        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
+        self.proj = nn.Linear(dim, dim)
+        self.dropout = nn.Dropout(dropout)
+    def forward(self, x: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+        B, N, C = x.shape
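+        # Compute Q, K, V, each of shape (B, num_heads, N, head_dim)
+        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
+        q, k, v = qkv[0], qkv[1], qkv[2]
+        # Compute attention in query chunks to avoid OOM: the score tensor is
+        # (chunk_size x N) per head instead of (N x N)
+        chunk_size = min(32, N)
+        attn_output = torch.zeros(B, self.num_heads, N, self.head_dim, device=x.device, dtype=q.dtype)
+        for i in range(0, N, chunk_size):
+            q_chunk = q[:, :, i:i+chunk_size, :]
+            # Attention scores: (B, num_heads, chunk, N)
+            attn_scores = torch.matmul(q_chunk, k.transpose(-2, -1)) * self.scale
+            if mask is not None:
+                # Use the mask rows of this query chunk; a mask with a singleton
+                # query dimension broadcasts as-is
+                if mask.dim() >= 2 and mask.shape[-2] == N:
+                    attn_scores = attn_scores + mask[..., i:i+chunk_size, :]
+                else:
+                    attn_scores = attn_scores + mask
+            attn_probs = F.softmax(attn_scores, dim=-1)
+            attn_probs = self.dropout(attn_probs)
+            # Weighted sum of the values
+            attn_output[:, :, i:i+chunk_size, :] = torch.matmul(attn_probs, v)
+        # Merge the heads
+        attn_output = attn_output.transpose(1, 2).reshape(B, N, C)
+        # Output projection
+        output = self.proj(attn_output)
+        output = self.dropout(output)
+        return output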
+class CrossAttention(nn.Module):
+    """Cross-attention (for text conditioning)."""
+    def __init__(self, query_dim: int, context_dim: int, num_heads: int = 8, dropout: float = 0.0):
+        super().__init__()
+        self.query_dim = query_dim
+        self.context_dim = context_dim
+        self.num_heads = num_heads
+        self.head_dim = query_dim // num_heads
+        self.scale = self.head_dim ** -0.5
+        self.to_q = nn.Linear(query_dim, query_dim)
+        self.to_k = nn.Linear(context_dim, query_dim)
+        self.to_v = nn.Linear(context_dim, query_dim)
+        self.proj = nn.Linear(query_dim, query_dim)
+        self.dropout = nn.Dropout(dropout)
+    def forward(self, x: torch.Tensor, context: torch.Tensor, mask: Optional[torch.Tensor] = None) -> torch.Tensor:
+        B, N, C = x.shape
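+        # Compute Q from the input features: (B, num_heads, N, head_dim)
+        q = self.to_q(x).reshape(B, N, self.num_heads, self.head_dim).transpose(1, 2)
+        # Compute K, V from the context (e.g. text embeddings): (B, num_heads, M, head_dim)
+        k = self.to_k(context).reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
+        v = self.to_v(context).reshape(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
+        # Compute attention in query chunks to limit peak memory
+        chunk_size = min(32, N)
+        attn_output = torch.zeros(B, self.num_heads, N, self.head_dim, device=x.device, dtype=q.dtype)
+        for i in range(0, N, chunk_size):
+            q_chunk = q[:, :, i:i+chunk_size, :]
+            # Attention scores: (B, num_heads, chunk, M)
+            attn_scores = torch.matmul(q_chunk, k.transpose(-2, -1)) * self.scale
+            if mask is not None:
+                # Use the mask rows of this query chunk; a mask with a singleton
+                # query dimension (e.g. a padding mask) broadcasts as-is
+                if mask.dim() >= 2 and mask.shape[-2] == N:
+                    attn_scores = attn_scores + mask[..., i:i+chunk_size, :]
+                else:
+                    attn_scores = attn_scores + mask
+            attn_probs = F.softmax(attn_scores, dim=-1)
+            attn_probs = self.dropout(attn_probs)
+            # Weighted sum of the values
+            attn_output[:, :, i:i+chunk_size, :] = torch.matmul(attn_probs, v)
+        # Merge the heads
+        attn_output = attn_output.transpose(1, 2).reshape(B, N, C)
+        # Output projection
+        output = self.proj(attn_output)
+        return output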
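+class FlashAttentionWrapper(nn.Module):
+    """FlashAttention wrapper (falls back to MemoryEfficientAttention when flash_attn is not installed)"""
+    def __init__(self, dim: int, num_heads: int = 8):
+        super().__init__()
+        self.dim = dim
+        self.num_heads = num_heads
+        try:
+            from flash_attn import flash_attn_qkvpacked_func  # noqa: F401
+            self.use_flash = True
+        except ImportError:
+            self.use_flash = False
+        if self.use_flash:
+            # flash_attn only computes attention, so the QKV and output projections are defined here
+            self.qkv = nn.Linear(dim, dim * 3)
+            self.proj = nn.Linear(dim, dim)
+        else:
+            self.attention = MemoryEfficientAttention(dim, num_heads)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        if self.use_flash:
+            return self._flash_attention(x)
+        else:
+            return self.attention(x)
+    def _flash_attention(self, x: torch.Tensor) -> torch.Tensor:
+        from flash_attn import flash_attn_qkvpacked_func
+        B, N, C = x.shape
+        # flash_attn_qkvpacked_func expects qkv laid out as [B, N, 3, num_heads, head_dim] (no permute)
+        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
+        # The FlashAttention kernels only accept fp16/bf16 CUDA tensors
+        orig_dtype = qkv.dtype
+        if orig_dtype not in (torch.float16, torch.bfloat16):
+            qkv = qkv.to(torch.float16)
+        output = flash_attn_qkvpacked_func(qkv)  # [B, N, num_heads, head_dim]
+        output = output.to(orig_dtype).reshape(B, N, C)
+        return self.proj(output)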
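
CrossAttention, AttentionBlock, and CrossAttentionBlock all split the query rows into chunks of at most 32 so that only a [heads, chunk, M] score matrix is alive at any time. Because the softmax runs over the key axis only, each query row is independent and chunking should leave the result unchanged. The NumPy sketch below checks that claim; `full_attention` and `chunked_attention` are hypothetical helpers written for illustration, not functions in this repository, and the 10x10 latent and 77-token context are assumed sizes.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Reference scaled dot-product attention for [heads, N, d] queries."""
    scale = q.shape[-1] ** -0.5
    scores = q @ k.swapaxes(-2, -1) * scale  # [heads, N, M], fully materialized
    return softmax(scores) @ v               # [heads, N, d]


def chunked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                      chunk_size: int = 32) -> np.ndarray:
    """Same computation, processing query rows in blocks of chunk_size."""
    scale = q.shape[-1] ** -0.5
    n = q.shape[-2]
    out = np.zeros(q.shape[:-1] + (v.shape[-1],), dtype=q.dtype)
    for i in range(0, n, chunk_size):
        q_chunk = q[..., i:i + chunk_size, :]
        # Only a [heads, chunk, M] score block exists at any one time
        scores = q_chunk @ k.swapaxes(-2, -1) * scale
        out[..., i:i + chunk_size, :] = softmax(scores) @ v
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((4, 100, 16))  # 4 heads, 100 queries (e.g. a 10x10 latent)
k = rng.standard_normal((4, 77, 16))   # 77 context tokens
v = rng.standard_normal((4, 77, 16))
print(np.allclose(full_attention(q, k, v), chunked_attention(q, k, v)))  # prints True
```

Chunking lowers peak memory, not compute: the same multiply-adds are performed, just in smaller pieces, and the last chunk may be shorter than chunk_size.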

src/models/diffusion.py ADDED

	@@ -0,0 +1,263 @@

+import torch
+import torch.nn as nn
+import torch.nn.functional as F  # needed for F.pad and F.mse_loss below
+import numpy as np
+from typing import Optional, Tuple, Union
+import math
+class BetaScheduler:
+    """Beta (noise variance) schedules"""
+    @staticmethod
+    def linear(num_timesteps: int, beta_start: float = 0.0001, beta_end: float = 0.02) -> np.ndarray:
+        """Linear schedule"""
+        return np.linspace(beta_start, beta_end, num_timesteps, dtype=np.float32)
+    @staticmethod
+    def cosine(num_timesteps: int, s: float = 0.008) -> np.ndarray:
+        """Cosine schedule"""
+        steps = num_timesteps + 1
+        x = np.linspace(0, num_timesteps, steps)
+        alphas_cumprod = np.cos(((x / num_timesteps) + s) / (1 + s) * np.pi * 0.5) ** 2
+        alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
+        betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
+        return np.clip(betas, 0, 0.999)
+    @staticmethod
+    def scaled_linear(num_timesteps: int, beta_start: float = 0.00085, beta_end: float = 0.012) -> np.ndarray:
+        """Scaled linear schedule (Stable Diffusion default): linear in sqrt(beta), then squared"""
+        return np.linspace(beta_start**0.5, beta_end**0.5, num_timesteps) ** 2
+class DiffusionProcess:
+    """Manages the diffusion process and its noise schedule"""
+    def __init__(self, config: dict):
+        self.config = config
+        diff_config = config.get('diffusion', {})
+        self.num_train_timesteps = diff_config.get('num_train_timesteps', 1000)
+        self.num_inference_timesteps = diff_config.get('num_inference_timesteps', 50)
+        self.beta_start = diff_config.get('beta_start', 0.00085)
+        self.beta_end = diff_config.get('beta_end', 0.012)
+        self.beta_schedule = diff_config.get('beta_schedule', 'scaled_linear')
+        self.prediction_type = diff_config.get('prediction_type', 'epsilon')
+        # Initialize the schedule parameters
+        self._init_schedule()
+    def _init_schedule(self):
+        """Initialize the diffusion schedule parameters"""
+        # Compute betas
+        if self.beta_schedule == "linear":
+            betas = BetaScheduler.linear(
+                self.num_train_timesteps,
+                self.beta_start,
+                self.beta_end
+            )
+        elif self.beta_schedule == "cosine":
+            betas = BetaScheduler.cosine(self.num_train_timesteps)
+        elif self.beta_schedule == "scaled_linear":
+            # Pass the configured endpoints instead of silently using the defaults
+            betas = BetaScheduler.scaled_linear(
+                self.num_train_timesteps,
+                self.beta_start,
+                self.beta_end
+            )
+        else:
+            raise ValueError(f"Unknown beta schedule: {self.beta_schedule}")
+        self.betas = torch.from_numpy(betas).float()
+        # Compute alphas and their cumulative products
+        self.alphas = 1.0 - self.betas
+        self.alphas_cumprod = torch.cumprod(self.alphas, dim=0)
+        self.alphas_cumprod_prev = F.pad(self.alphas_cumprod[:-1], (1, 0), value=1.0)
+        # Posterior variance of q(x_{t-1} | x_t, x_0)
+        self.variance = self.betas * (1.0 - self.alphas_cumprod_prev) / (1.0 - self.alphas_cumprod)
+        # DiffusionProcess is not an nn.Module, so this mimics register_buffer with plain attributes;
+        # the tensors stay on the CPU and extract() moves the gathered values to the target device
+        self.register_buffer = lambda name, tensor: setattr(self, name, tensor)
+        self.register_buffer('betas', self.betas)
+        self.register_buffer('alphas', self.alphas)
+        self.register_buffer('alphas_cumprod', self.alphas_cumprod)
+        self.register_buffer('alphas_cumprod_prev', self.alphas_cumprod_prev)
+        self.register_buffer('variance', self.variance)
+        # Precompute the coefficients used by q_sample and the samplers
+        self.register_buffer('sqrt_alphas_cumprod', torch.sqrt(self.alphas_cumprod))
+        self.register_buffer('sqrt_one_minus_alphas_cumprod', torch.sqrt(1.0 - self.alphas_cumprod))
+        self.register_buffer('log_one_minus_alphas_cumprod', torch.log(1.0 - self.alphas_cumprod))
+        self.register_buffer('sqrt_recip_alphas_cumprod', torch.sqrt(1.0 / self.alphas_cumprod))
+        self.register_buffer('sqrt_recipm1_alphas_cumprod', torch.sqrt(1.0 / self.alphas_cumprod - 1))
+    def q_sample(self, x_start: torch.Tensor, t: torch.Tensor, noise: Optional[torch.Tensor] = None) -> torch.Tensor:
+        """Forward diffusion: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise"""
+        if noise is None:
+            noise = torch.randn_like(x_start)
+        sqrt_alphas_cumprod_t = self.extract(self.sqrt_alphas_cumprod, t, x_start.shape)
+        sqrt_one_minus_alphas_cumprod_t = self.extract(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape)
+        return sqrt_alphas_cumprod_t * x_start + sqrt_one_minus_alphas_cumprod_t * noise
+    def extract(self, a: torch.Tensor, t: torch.Tensor, x_shape: Tuple[int, ...]) -> torch.Tensor:
+        """Gather a[t] per batch element and reshape to [B, 1, ..., 1] for broadcasting against x_shape"""
+        batch_size = t.shape[0]
+        out = a.gather(-1, t.cpu())
+        return out.reshape(batch_size, *((1,) * (len(x_shape) - 1))).to(t.device)
+    def get_loss_weight(self, snr: torch.Tensor, gamma: Optional[float] = 5.0) -> torch.Tensor:
+        """Min-SNR-gamma loss weight min(SNR, gamma) / SNR; returns all ones when gamma is None"""
+        if gamma is None:
+            return torch.ones_like(snr)
+        snr = torch.clamp(snr, min=1e-8)
+        min_snr = torch.tensor(gamma, device=snr.device)
+        weight = torch.minimum(snr, min_snr) / snr
+        return weight
+    def compute_snr(self, timesteps: torch.Tensor) -> torch.Tensor:
+        """Signal-to-noise ratio SNR(t) = alpha_bar_t / (1 - alpha_bar_t)"""
+        alphas_cumprod = self.extract(self.alphas_cumprod, timesteps, timesteps.shape)
+        snr = alphas_cumprod / (1 - alphas_cumprod)
+        return snr
+class DDIMScheduler:
+    """DDIM sampler"""
+    def __init__(self, diffusion: DiffusionProcess):
+        self.diffusion = diffusion
+        self.num_train_timesteps = diffusion.num_train_timesteps
+        self.num_inference_timesteps = diffusion.num_inference_timesteps
+        # Set up the inference timesteps
+        self.set_timesteps(self.num_inference_timesteps)
+    def set_timesteps(self, num_inference_timesteps: int):
+        """Set the inference timesteps"""
+        self.num_inference_timesteps = num_inference_timesteps
+        # Select evenly spaced timesteps
+        if self.num_train_timesteps == self.num_inference_timesteps:
+            self.timesteps = torch.arange(0, self.num_train_timesteps).long()
+        else:
+            step_ratio = self.num_train_timesteps // self.num_inference_timesteps
+            self.timesteps = torch.arange(0, self.num_train_timesteps, step_ratio).long()
+        self.timesteps = self.timesteps.flip(0)  # from T down to 0
+    @torch.no_grad()
+    def step(self, model_output: torch.Tensor, timestep: int, sample: torch.Tensor, eta: float = 0.0) -> torch.Tensor:
+        """Single DDIM sampling step; returns x_{t-1} as a tensor"""
+        # timestep may be a Python int or a 0-dim tensor taken from self.timesteps
+        timestep = int(timestep)
+        prev_timestep = timestep - self.num_train_timesteps // self.num_inference_timesteps
+        # Look up alpha_bar for t and t_prev; the 0-dim results broadcast over the whole batch
+        alphas_cumprod = self.diffusion.alphas_cumprod.to(sample.device)
+        alpha_prod_t = alphas_cumprod[timestep]
+        alpha_prod_t_prev = alphas_cumprod[prev_timestep] if prev_timestep >= 0 else torch.ones_like(alpha_prod_t)
+        # Convert the model output into x_0 and epsilon estimates according to the prediction type
+        if self.diffusion.prediction_type == "epsilon":
+            pred_original_sample = (sample - (1 - alpha_prod_t) ** 0.5 * model_output) / alpha_prod_t ** 0.5
+            pred_epsilon = model_output
+        elif self.diffusion.prediction_type == "sample":
+            pred_original_sample = model_output
+            pred_epsilon = (sample - alpha_prod_t ** 0.5 * pred_original_sample) / (1 - alpha_prod_t) ** 0.5
+        elif self.diffusion.prediction_type == "v_prediction":
+            pred_original_sample = (alpha_prod_t ** 0.5) * sample - (1 - alpha_prod_t) ** 0.5 * model_output
+            pred_epsilon = (alpha_prod_t ** 0.5) * model_output + (1 - alpha_prod_t) ** 0.5 * sample
+        else:
+            raise ValueError(f"Unsupported prediction type: {self.diffusion.prediction_type}")
+        # Variance sigma_t^2 for x_{t-1}
+        variance = (1 - alpha_prod_t_prev) / (1 - alpha_prod_t) * (1 - alpha_prod_t / alpha_prod_t_prev)
+        std_dev_t = eta * variance ** 0.5
+        # Deterministic part of x_{t-1}: rescaled x_0 estimate plus the direction pointing to x_t
+        pred_sample_direction = (1 - alpha_prod_t_prev - std_dev_t**2) ** 0.5 * pred_epsilon
+        prev_sample = alpha_prod_t_prev ** 0.5 * pred_original_sample + pred_sample_direction
+        # Add fresh noise when eta > 0 (eta = 0 is fully deterministic)
+        if eta > 0:
+            noise = torch.randn_like(model_output)
+            prev_sample = prev_sample + std_dev_t * noise
+        return prev_sample
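+class DiffusionModel(nn.Module):
+    """Wraps a denoising UNet together with its diffusion process and DDIM scheduler"""
+    def __init__(self, unet: nn.Module, diffusion: DiffusionProcess):
+        super().__init__()
+        self.unet = unet
+        self.diffusion = diffusion
+        self.scheduler = DDIMScheduler(diffusion)
+    def forward(self, x: torch.Tensor, timesteps: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
+        """Forward pass: predict the model target (the noise, for epsilon prediction)"""
+        return self.unet(x, timesteps, context)
+    def compute_loss(self, x_start: torch.Tensor, context: torch.Tensor, noise: Optional[torch.Tensor] = None) -> torch.Tensor:
+        """Compute the diffusion training loss"""
+        if noise is None:
+            noise = torch.randn_like(x_start)
+        # Sample a random timestep for each batch element
+        batch_size = x_start.shape[0]
+        timesteps = torch.randint(
+            0, self.diffusion.num_train_timesteps,
+            (batch_size,), device=x_start.device
+        ).long()
+        # Forward diffusion
+        x_noisy = self.diffusion.q_sample(x_start, timesteps, noise)
+        # Predict with the UNet
+        model_output = self.unet(x_noisy, timesteps, context)
+        # Regress onto the target matching prediction_type, the same convention DDIMScheduler.step assumes
+        prediction_type = self.diffusion.prediction_type
+        if prediction_type == "epsilon":
+            target = noise
+        elif prediction_type == "sample":
+            target = x_start
+        elif prediction_type == "v_prediction":
+            sqrt_alpha = self.diffusion.extract(self.diffusion.sqrt_alphas_cumprod, timesteps, x_start.shape)
+            sqrt_one_minus_alpha = self.diffusion.extract(self.diffusion.sqrt_one_minus_alphas_cumprod, timesteps, x_start.shape)
+            target = sqrt_alpha * noise - sqrt_one_minus_alpha * x_start
+        else:
+            raise ValueError(f"Unsupported prediction type: {prediction_type}")
+        loss = F.mse_loss(model_output, target)
+        return loss
+    @torch.no_grad()
+    def generate(
+        self,
+        context: torch.Tensor,
+        num_samples: int = 1,
+        height: int = 512,
+        width: int = 512,
+        guidance_scale: float = 7.5,
+        uncond_context: Optional[torch.Tensor] = None
+    ) -> torch.Tensor:
+        """Generate latents (height and width are in pixels; the latents are 8x smaller)"""
+        device = next(self.unet.parameters()).device
+        # Start from pure Gaussian noise
+        latents = torch.randn(
+            (num_samples, self.unet.in_channels, height // 8, width // 8),
+            device=device
+        )
+        use_cfg = guidance_scale > 1.0
+        if use_cfg:
+            # Classifier-free guidance needs an unconditional context. Zeros are only a placeholder;
+            # passing the text encoder's embedding of the empty prompt as uncond_context is preferable.
+            if uncond_context is None:
+                uncond_context = torch.zeros_like(context)
+            # Stack [unconditional; conditional] to match the doubled latent batch
+            context = torch.cat([uncond_context, context])
+        # DDIM sampling loop
+        self.scheduler.set_timesteps(self.diffusion.num_inference_timesteps)
+        for t in self.scheduler.timesteps:
+            # Duplicate the latents for the unconditional and conditional passes
+            latent_model_input = torch.cat([latents] * 2) if use_cfg else latents
+            # Predict noise (DDIM needs no input scaling)
+            timesteps = torch.full((latent_model_input.shape[0],), int(t), device=device, dtype=torch.long)
+            noise_pred = self.unet(latent_model_input, timesteps, context)
+            # Apply classifier-free guidance
+            if use_cfg:
+                noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
+                noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
+            # Compute the previous sample x_{t-1}; step() returns a plain tensor
+            latents = self.scheduler.step(noise_pred, t, latents)
+        return latents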
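
DDIMScheduler.step inverts the forward process: it turns an epsilon prediction into an x_0 estimate, then re-noises that estimate to the previous timestep. A useful sanity check is that, with eta = 0 and a perfect noise prediction, one step from x_t should land exactly on the q_sample of the same (x_0, noise) pair at t_prev. The NumPy sketch below assumes the default scaled_linear schedule and the default 1000/50 train/inference split (stride 20); `ddim_step` is a hypothetical standalone re-implementation for illustration, not the repository's API.

```python
import numpy as np


def scaled_linear_alphas_cumprod(num_timesteps: int = 1000,
                                 beta_start: float = 0.00085,
                                 beta_end: float = 0.012) -> np.ndarray:
    # Same schedule as BetaScheduler.scaled_linear: linear in sqrt(beta), then squared
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_timesteps) ** 2
    return np.cumprod(1.0 - betas)


def ddim_step(eps_pred, x_t, a_t, a_prev, eta=0.0, rng=None):
    """One DDIM update for epsilon prediction (mirrors the math in DDIMScheduler.step)."""
    x0_pred = (x_t - np.sqrt(1 - a_t) * eps_pred) / np.sqrt(a_t)
    var = (1 - a_prev) / (1 - a_t) * (1 - a_t / a_prev)
    std = eta * np.sqrt(var)
    direction = np.sqrt(1 - a_prev - std ** 2) * eps_pred
    x_prev = np.sqrt(a_prev) * x0_pred + direction
    if eta > 0:
        # Stochastic variant: add fresh Gaussian noise scaled by sigma_t
        x_prev = x_prev + std * rng.standard_normal(x_t.shape)
    return x_prev


alphas_cumprod = scaled_linear_alphas_cumprod()
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal((4, 8, 8))
t, t_prev = 980, 960  # first two timesteps for 1000 train / 50 inference steps
a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
x_t = np.sqrt(a_t) * x0 + np.sqrt(1 - a_t) * eps  # q_sample at t
x_prev = ddim_step(eps, x_t, a_t, a_prev)
expected = np.sqrt(a_prev) * x0 + np.sqrt(1 - a_prev) * eps  # q_sample at t_prev
print(np.allclose(x_prev, expected))  # prints True
```

On the final step the scheduler uses alpha_bar_prev = 1, in which case the update returns the x_0 estimate directly.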

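DiffusionProcess.get_loss_weight implements Min-SNR-gamma weighting: each timestep's loss is scaled by min(SNR(t), gamma) / SNR(t), where SNR(t) = alpha_bar_t / (1 - alpha_bar_t) as returned by compute_snr. compute_loss does not currently apply this weight. The NumPy sketch below evaluates the weight over the default scaled_linear schedule to show its shape; gamma = 5.0 matches the method's default, and `min_snr_weight` is a hypothetical standalone helper.

```python
import numpy as np

# Default scaled_linear schedule with 1000 training timesteps
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
alphas_cumprod = np.cumprod(1.0 - betas)
snr = alphas_cumprod / (1.0 - alphas_cumprod)  # SNR(t), as in compute_snr


def min_snr_weight(snr: np.ndarray, gamma: float = 5.0) -> np.ndarray:
    """Min-SNR-gamma weight for epsilon prediction: min(SNR, gamma) / SNR."""
    snr = np.maximum(snr, 1e-8)  # same lower clamp as get_loss_weight
    return np.minimum(snr, gamma) / snr


w = min_snr_weight(snr)
# Low-noise steps (SNR > gamma) are down-weighted; noisy steps keep weight 1
for t in (0, 100, 500, 999):
    print(t, round(float(snr[t]), 3), round(float(w[t]), 4))
```

In a training loop this weight would typically multiply a per-sample MSE (computed with `reduction="none"` and averaged over the non-batch dimensions) before the final mean.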
src/models/unet_light.py ADDED

	@@ -0,0 +1,379 @@

+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+import torch.utils.checkpoint  # makes torch.utils.checkpoint.checkpoint available
+from typing import Optional, Tuple, Union
+import math
+class TimestepEmbedding(nn.Module):
+    """Timestep embedding: sinusoidal features followed by an MLP"""
+    def __init__(self, embedding_dim: int, time_embed_dim: int):
+        super().__init__()
+        self.embedding_dim = embedding_dim
+        self.time_embed_dim = time_embed_dim
+        self.mlp = nn.Sequential(
+            nn.Linear(embedding_dim, time_embed_dim),
+            nn.SiLU(),
+            nn.Linear(time_embed_dim, time_embed_dim)
+        )
+    def forward(self, timestep: torch.Tensor) -> torch.Tensor:
+        # Sinusoidal positional encoding of the timestep
+        half_dim = self.embedding_dim // 2
+        emb = math.log(10000) / (half_dim - 1)
+        emb = torch.exp(torch.arange(half_dim, device=timestep.device) * -emb)
+        emb = timestep[:, None] * emb[None, :]
+        emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1)
+        if self.embedding_dim % 2 == 1:
+            emb = F.pad(emb, (0, 1, 0, 0))
+        return self.mlp(emb)
+class AttentionBlock(nn.Module):
+    """Memory-efficient self-attention block"""
+    def __init__(self, channels: int, num_heads: int = 4, use_checkpoint: bool = True):
+        super().__init__()
+        self.channels = channels
+        self.num_heads = num_heads
+        self.use_checkpoint = use_checkpoint
+        self.norm = nn.GroupNorm(32, channels)
+        self.qkv = nn.Conv2d(channels, channels * 3, 1)
+        self.proj_out = nn.Conv2d(channels, channels, 1)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        if self.use_checkpoint and self.training:
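+            # Non-reentrant checkpointing; recent PyTorch releases warn if use_reentrant is not passed explicitly
+            return torch.utils.checkpoint.checkpoint(self._forward, x, use_reentrant=False)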
+        return self._forward(x)
+    def _forward(self, x: torch.Tensor) -> torch.Tensor:
+        B, C, H, W = x.shape
+        qkv = self.qkv(self.norm(x))
+        q, k, v = qkv.chunk(3, dim=1)
+        # Compute attention in chunks to avoid running out of memory
+        chunk_size = min(32, H * W)
+        # q, v: [B, heads, HW, head_dim]; k stays [B, heads, head_dim, HW] so q @ k needs no transpose
+        q = q.view(B, self.num_heads, C // self.num_heads, H * W).permute(0, 1, 3, 2)
+        k = k.view(B, self.num_heads, C // self.num_heads, H * W)
+        v = v.view(B, self.num_heads, C // self.num_heads, H * W).permute(0, 1, 3, 2)
+        # Process the query rows chunk by chunk
+        attn_output = torch.zeros_like(q)
+        for i in range(0, H * W, chunk_size):
+            q_chunk = q[:, :, i:i+chunk_size, :]
+            scores = torch.matmul(q_chunk, k) / math.sqrt(C // self.num_heads)
+            attn = F.softmax(scores, dim=-1)
+            attn_output[:, :, i:i+chunk_size, :] = torch.matmul(attn, v)
+        attn_output = attn_output.permute(0, 1, 3, 2).reshape(B, C, H, W)
+        return x + self.proj_out(attn_output)
+class ResNetBlock(nn.Module):
+    """Residual block conditioned on the timestep embedding"""
+    def __init__(self, in_channels: int, out_channels: int, time_embed_dim: int, dropout: float = 0.0):
+        super().__init__()
+        self.in_channels = in_channels
+        self.out_channels = out_channels
+        # First GroupNorm + activation + convolution
+        self.norm1 = nn.GroupNorm(32, in_channels)
+        self.act1 = nn.SiLU()
+        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1)
+        # Project the timestep embedding to out_channels
+        self.time_emb_proj = nn.Linear(time_embed_dim, out_channels)
+        # Second GroupNorm + activation + convolution
+        self.norm2 = nn.GroupNorm(32, out_channels)
+        self.act2 = nn.SiLU()
+        self.dropout = nn.Dropout(dropout)
+        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
+        # 1x1 convolution on the skip path when the channel count changes
+        if in_channels != out_channels:
+            self.skip_conv = nn.Conv2d(in_channels, out_channels, 1)
+        else:
+            self.skip_conv = nn.Identity()
+    def forward(self, x: torch.Tensor, time_emb: torch.Tensor) -> torch.Tensor:
+        h = self.conv1(self.act1(self.norm1(x)))
+        # Add the timestep embedding, broadcast over H and W
+        time_emb = self.time_emb_proj(F.silu(time_emb))
+        h = h + time_emb[:, :, None, None]
+        h = self.conv2(self.dropout(self.act2(self.norm2(h))))
+        return h + self.skip_conv(x)
+class CrossAttentionBlock(nn.Module):
+    """Cross-attention block (text conditioning)"""
+    def __init__(self, query_dim: int, context_dim: int, num_heads: int = 4, use_checkpoint: bool = True):
+        super().__init__()
+        self.query_dim = query_dim
+        self.context_dim = context_dim
+        self.num_heads = num_heads
+        self.use_checkpoint = use_checkpoint
+        self.norm = nn.GroupNorm(32, query_dim)
+        self.to_q = nn.Linear(query_dim, query_dim)
+        self.to_k = nn.Linear(context_dim, query_dim)
+        self.to_v = nn.Linear(context_dim, query_dim)
+        self.to_out = nn.Linear(query_dim, query_dim)
+    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
+        if self.use_checkpoint and self.training:
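+            # Non-reentrant checkpointing; recent PyTorch releases warn if use_reentrant is not passed explicitly
+            return torch.utils.checkpoint.checkpoint(self._forward, x, context, use_reentrant=False)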
+        return self._forward(x, context)
+    def _forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
+        B, C, H, W = x.shape
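+        x_reshaped = x.reshape(B, C, H * W).permute(0, 2, 1)  # [B, HW, C], kept for the residual
+        # GroupNorm expects channels in dim 1, so normalize before flattening into tokens
+        x_norm = self.norm(x).reshape(B, C, H * W).permute(0, 2, 1)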
+        q = self.to_q(x_norm)
+        k = self.to_k(context)
+        v = self.to_v(context)
+        # Split into heads: [B, num_heads, tokens, head_dim]
+        q = q.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)
+        k = k.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)
+        v = v.view(B, -1, self.num_heads, C // self.num_heads).transpose(1, 2)
+        # Compute attention over chunks of queries
+        chunk_size = min(32, q.shape[2])
+        attn_output = torch.zeros_like(q)
+        for i in range(0, q.shape[2], chunk_size):
+            q_chunk = q[:, :, i:i+chunk_size, :]
+            scores = torch.matmul(q_chunk, k.transpose(-2, -1)) / math.sqrt(C // self.num_heads)
+            attn = F.softmax(scores, dim=-1)
+            attn_output[:, :, i:i+chunk_size, :] = torch.matmul(attn, v)
+        attn_output = attn_output.transpose(1, 2).reshape(B, -1, C)
+        attn_output = self.to_out(attn_output)
+        return (x_reshaped + attn_output).permute(0, 2, 1).reshape(B, C, H, W)
+class DownsampleBlock(nn.Module):
+    """Downsampling block (stride-2 convolution or 2x average pooling)"""
+    def __init__(self, channels: int, use_conv: bool = True):
+        super().__init__()
+        if use_conv:
+            self.op = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
+        else:
+            self.op = nn.AvgPool2d(2)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        return self.op(x)
+class UpsampleBlock(nn.Module):
+    """Upsampling block (2x nearest-neighbor interpolation, optionally followed by a convolution)"""
+    def __init__(self, channels: int, use_conv: bool = True):
+        super().__init__()
+        self.channels = channels
+        if use_conv:
+            self.conv = nn.Conv2d(channels, channels, 3, padding=1)
+        else:
+            self.conv = nn.Identity()
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = F.interpolate(x, scale_factor=2, mode="nearest")
+        return self.conv(x)
+class UNetLight(nn.Module):
+    """Lightweight UNet model"""
+    def __init__(self, config: dict):
+        super().__init__()
+        self.config = config
+        model_config = config.get('model', {})
+        # Basic parameters
+        self.in_channels = model_config.get('in_channels', 4)
+        self.out_channels = model_config.get('out_channels', 4)
+        self.base_channels = model_config.get('base_channels', 64)
+        self.channel_mults = model_config.get('channel_mults', [1, 2, 4, 8])
+        self.num_res_blocks = model_config.get('num_res_blocks', 2)
+        self.attention_resolutions = model_config.get('attention_resolutions', [8])
+        self.dropout = model_config.get('dropout', 0.0)
+        self.use_checkpoint = model_config.get('use_checkpoint', True)
+        self.num_heads = model_config.get('num_heads', 4)
+        # Conditioning parameters
+        self.context_dim = model_config.get('context_dim', 768)
+        self.use_linear_projection = model_config.get('use_linear_projection', True)
+        # Timestep embedding
+        self.time_embed_dim = model_config.get('time_embed_dim', 256)
+        self.time_embed = TimestepEmbedding(self.base_channels, self.time_embed_dim)
+        # Input convolution
+        self.input_conv = nn.Conv2d(self.in_channels, self.base_channels, 3, padding=1)
+        # Build the downsampling path
+        self.down_blocks = nn.ModuleList()
+        self.down_attention_blocks = nn.ModuleList()
+        self.downsample_blocks = nn.ModuleList()
+        in_ch = self.base_channels
+        resolution = 1
+        for i, mult in enumerate(self.channel_mults):
+            out_ch = self.base_channels * mult
+            resolution *= 2
+            # Residual blocks
+            for _ in range(self.num_res_blocks):
+                block = ResNetBlock(in_ch, out_ch, self.time_embed_dim, self.dropout)
+                self.down_blocks.append(block)
+                # Add cross-attention at the configured resolutions
+                if resolution in self.attention_resolutions:
+                    attn = CrossAttentionBlock(out_ch, self.context_dim, self.num_heads, self.use_checkpoint)
+                    self.down_attention_blocks.append(attn)
+                else:
+                    self.down_attention_blocks.append(None)
+                in_ch = out_ch
+            # Downsample after every level except the last
+            if i != len(self.channel_mults) - 1:
+                downsample = DownsampleBlock(in_ch, use_conv=True)
+                self.downsample_blocks.append(downsample)
+        # Middle block
+        self.mid_block1 = ResNetBlock(in_ch, in_ch, self.time_embed_dim, self.dropout)
+        self.mid_attention = CrossAttentionBlock(in_ch, self.context_dim, self.num_heads, self.use_checkpoint)
+        self.mid_block2 = ResNetBlock(in_ch, in_ch, self.time_embed_dim, self.dropout)
+        # Build the upsampling path
+        self.up_blocks = nn.ModuleList()
+        self.up_attention_blocks = nn.ModuleList()
+        self.upsample_blocks = nn.ModuleList()
+        for i, mult in enumerate(reversed(self.channel_mults)):
+            out_ch = self.base_channels * mult
+            # Upsample block (every level except the first)
+            if i > 0:
+                upsample = UpsampleBlock(in_ch, use_conv=True)
+                self.upsample_blocks.append(upsample)
+            # Residual blocks
+            for j in range(self.num_res_blocks + 1):
+                # The first block concatenates one skip connection with out_ch channels
+                # (the last residual output of the mirrored down level)
+                block_in_ch = in_ch + out_ch if j == 0 else out_ch
+                block = ResNetBlock(block_in_ch, out_ch, self.time_embed_dim, self.dropout)
+                self.up_blocks.append(block)
+                # Add cross-attention at the same resolutions as the down path
+                if resolution in self.attention_resolutions:
+                    attn = CrossAttentionBlock(out_ch, self.context_dim, self.num_heads, self.use_checkpoint)
+                    self.up_attention_blocks.append(attn)
+                else:
+                    self.up_attention_blocks.append(None)
+                in_ch = out_ch
+            # Step the resolution counter after each level so it mirrors the down path
+            resolution //= 2
+        # Output layers
+        self.norm_out = nn.GroupNorm(32, self.base_channels)
+        self.act_out = nn.SiLU()
+        self.output_conv = nn.Conv2d(self.base_channels, self.out_channels, 3, padding=1)
+        # Initialize weights
+        self.apply(self._init_weights)
+    def _init_weights(self, module):
+        if isinstance(module, (nn.Conv2d, nn.Linear)):
+            nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
+            if module.bias is not None:
+                nn.init.constant_(module.bias, 0)
+        elif isinstance(module, nn.GroupNorm):
+            nn.init.constant_(module.weight, 1)
+            nn.init.constant_(module.bias, 0)
+    def forward(self, x: torch.Tensor, timesteps: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
+        # Timestep embedding
+        time_emb = self.time_embed(timesteps)
+        # Initial convolution
+        h = self.input_conv(x)
+        # Skip connections: one per level, taken after that level's last residual block
+        down_h = []
+        # Downsampling path
+        down_idx = 0
+        attn_idx = 0
+        for i, mult in enumerate(self.channel_mults):
+            for j in range(self.num_res_blocks):
+                block = self.down_blocks[down_idx]
+                h = block(h, time_emb)
+                down_idx += 1
+                # Cross-attention
+                if self.down_attention_blocks[attn_idx] is not None:
+                    h = self.down_attention_blocks[attn_idx](h, context)
+                attn_idx += 1
+            down_h.append(h)
+            # Downsample (every level except the last)
+            if i != len(self.channel_mults) - 1:
+                downsample = self.downsample_blocks[i]
+                h = downsample(h)
+        # Middle block
+        h = self.mid_block1(h, time_emb)
+        h = self.mid_attention(h, context)
+        h = self.mid_block2(h, time_emb)
+        # Upsampling path
+        up_idx = 0
+        attn_up_idx = 0
+        upsample_idx = 0
+        for i, mult in enumerate(reversed(self.channel_mults)):
+            # Upsample (every level except the first)
+            if i > 0:
+                upsample = self.upsample_blocks[upsample_idx]
+                h = upsample(h)
+                upsample_idx += 1
+            for j in range(self.num_res_blocks + 1):
+                # Concatenate the skip connection in the first block of each level
+                if j == 0:
+                    skip = down_h.pop()
+                    h = torch.cat([h, skip], dim=1)
+                block = self.up_blocks[up_idx]
+                h = block(h, time_emb)
+                up_idx += 1
+                # Cross-attention
+                if self.up_attention_blocks[attn_up_idx] is not None:
+                    h = self.up_attention_blocks[attn_up_idx](h, context)
+                attn_up_idx += 1
+        # Output layers
+        h = self.norm_out(h)
+        h = self.act_out(h)
+        h = self.output_conv(h)
+        return h
+    def enable_gradient_checkpointing(self):
+        """Enable gradient checkpointing in every submodule that supports it"""
+        self.use_checkpoint = True
+        for module in self.modules():
+            if hasattr(module, 'use_checkpoint'):
+                module.use_checkpoint = True
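
TimestepEmbedding maps each integer timestep to sines and cosines at geometrically spaced frequencies before passing the result through a two-layer MLP. The NumPy sketch below reproduces only the sinusoidal part (no learned MLP) so the feature layout is easy to inspect; `sinusoidal_embedding` is a hypothetical helper, and dim = 64 mirrors the default base_channels.

```python
import math

import numpy as np


def sinusoidal_embedding(timesteps: np.ndarray, dim: int) -> np.ndarray:
    """Sinusoidal timestep features, following TimestepEmbedding.forward before its MLP."""
    half_dim = dim // 2
    # Frequencies decay geometrically from 1 down to 1/10000 across half_dim entries
    freqs = np.exp(np.arange(half_dim) * -(math.log(10000) / (half_dim - 1)))
    args = timesteps.astype(np.float64)[:, None] * freqs[None, :]
    # First half holds the sines, second half the cosines
    emb = np.concatenate([np.sin(args), np.cos(args)], axis=1)
    if dim % 2 == 1:
        emb = np.pad(emb, ((0, 0), (0, 1)))  # zero-pad odd dimensions
    return emb


emb = sinusoidal_embedding(np.array([0, 1, 500, 999]), dim=64)
print(emb.shape)                  # (4, 64)
print(emb[0, :3], emb[0, 32:35])  # t = 0: the sin half is all 0, the cos half is all 1
```

Each sin/cos pair lies on the unit circle, so every timestep gets a bounded code that varies smoothly with t; the learned MLP then projects it to time_embed_dim.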

src/training/callbacks.py ADDED

	@@ -0,0 +1,324 @@

+import torch
+from typing import Dict, Any, Optional, List
+import os
+from datetime import datetime
+import numpy as np
+from PIL import Image
+import torchvision.transforms as T
+class Callback:
+    """Base class for training callbacks"""
+    def on_train_begin(self, trainer):
+        pass
+    def on_train_end(self, trainer):
+        pass
+    def on_epoch_begin(self, trainer, epoch):
+        pass
+    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
+        pass
+    def on_batch_begin(self, trainer, batch_idx, batch):
+        pass
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        pass
+    def on_validation_begin(self, trainer):
+        pass
+    def on_validation_end(self, trainer, val_loss):
+        pass
+class EarlyStopping(Callback):
+    """Early stopping callback"""
+    def __init__(self, patience: int = 10, min_delta: float = 1e-4):
+        self.patience = patience
+        self.min_delta = min_delta
+        self.best_loss = float('inf')
+        self.counter = 0
+        self.should_stop = False
+    def on_validation_end(self, trainer, val_loss):
+        if val_loss < self.best_loss - self.min_delta:
+            self.best_loss = val_loss
+            self.counter = 0
+        else:
+            self.counter += 1
+        if self.counter >= self.patience:
+            self.should_stop = True
+            print(f"Early stopping triggered, best loss: {self.best_loss:.4f}")
+class ModelCheckpoint(Callback):
+    """Model checkpoint callback"""
+    def __init__(
+        self,
+        save_dir: str = './checkpoints',
+        save_best_only: bool = True,
+        save_freq: int = 1,
+        monitor: str = 'val_loss',
+        mode: str = 'min'
+    ):
+        self.save_dir = save_dir
+        self.save_best_only = save_best_only
+        self.save_freq = save_freq
+        self.monitor = monitor
+        self.mode = mode
+        os.makedirs(save_dir, exist_ok=True)
+        self.best_value = float('inf') if mode == 'min' else -float('inf')
+    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
+        if epoch % self.save_freq != 0:
+            return
+        # Read the monitored value (unknown monitor names fall back to val_loss)
+        if self.monitor == 'val_loss':
+            value = val_loss
+        elif self.monitor == 'train_loss':
+            value = train_loss
+        else:
+            value = val_loss
+        # Decide whether to save
+        should_save = False
+        if self.save_best_only:
+            if self.mode == 'min' and value < self.best_value:
+                self.best_value = value
+                should_save = True
+            elif self.mode == 'max' and value > self.best_value:
+                self.best_value = value
+                should_save = True
+        else:
+            should_save = True
+        if should_save:
+            # Save the checkpoint
+            checkpoint = {
+                'epoch': epoch,
+                'model_state_dict': trainer.model.state_dict(),
+                'optimizer_state_dict': trainer.optimizer.state_dict(),
+                'train_loss': train_loss,
+                'val_loss': val_loss,
+            }
+            if trainer.use_ema:
+                checkpoint['ema_model_state_dict'] = trainer.ema_model.state_dict()
+            filename = f'checkpoint_epoch_{epoch}.pt' if not self.save_best_only else 'best_model.pt'
+            save_path = os.path.join(self.save_dir, filename)
+            torch.save(checkpoint, save_path)
+            print(f"Checkpoint saved: {save_path}")
+class LearningRateSchedulerCallback(Callback):
+    """Learning rate scheduler callback"""
+    def __init__(self, scheduler, update_on: str = 'epoch'):
+        self.scheduler = scheduler
+        self.update_on = update_on  # 'epoch' or 'batch'
+    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
+        if self.update_on == 'epoch':
+            self.scheduler.step()
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        if self.update_on == 'batch':
+            self.scheduler.step()
+class TensorBoardLogger(Callback):
+    """TensorBoard logger"""
+    def __init__(self, log_dir: str = './logs'):
+        from torch.utils.tensorboard import SummaryWriter
+        self.writer = SummaryWriter(log_dir=log_dir)
+        self.global_step = 0
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        self.writer.add_scalar('train/loss', loss, self.global_step)
+        self.writer.add_scalar('train/lr', trainer.optimizer.param_groups[0]['lr'], self.global_step)
+        self.global_step += 1
+    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
+        self.writer.add_scalar('epoch/train_loss', train_loss, epoch)
+        self.writer.add_scalar('epoch/val_loss', val_loss, epoch)
+    def on_train_end(self, trainer):
+        self.writer.close()
+class SampleGeneratorCallback(Callback):
+    """Periodically generates samples during training"""
+    def __init__(
+        self,
+        sample_freq: int = 500,
+        num_samples: int = 4,
+        save_dir: str = './samples'
+    ):
+        self.sample_freq = sample_freq
+        self.num_samples = num_samples
+        self.save_dir = save_dir
+        os.makedirs(save_dir, exist_ok=True)
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        if trainer.global_step % self.sample_freq != 0:
+            return
+        # Generate samples
+        trainer.model.eval()
+        with torch.no_grad():
+            # Use the text embeddings of the first validation batch as prompts
+            sample_batch = next(iter(trainer.val_loader))
+            text_embeddings = sample_batch['text_embeddings'][:self.num_samples].to(trainer.device)
+            # Generate latents
+            latents = trainer.diffusion.generate(
+                context=text_embeddings,
+                num_samples=self.num_samples,
+                guidance_scale=7.5
+            )
+            # Save each sample's latent tensor (not a decoded image)
+            for i in range(self.num_samples):
+                sample_path = os.path.join(
+                    self.save_dir,
+                    f'step_{trainer.global_step}_sample_{i}.pt'
+                )
+                torch.save(latents[i].cpu(), sample_path)
+        trainer.model.train()
+class MemoryMonitorCallback(Callback):
+    """Memory monitoring callback"""
+    def __init__(self, monitor_freq: int = 100):
+        self.monitor_freq = monitor_freq
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        if trainer.global_step % self.monitor_freq == 0:
+            if hasattr(trainer, 'memory_manager'):
+                trainer.memory_manager.print_memory_stats()
+class GradientMonitorCallback(Callback):
+    """Gradient norm monitoring callback."""
+    def __init__(self, monitor_freq: int = 100):
+        self.monitor_freq = monitor_freq
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        if trainer.global_step % self.monitor_freq == 0:
+            grad_norm = self._compute_gradient_norm(trainer.model)
+            if hasattr(trainer, 'writer'):
+                trainer.writer.add_scalar('train/grad_norm', grad_norm, trainer.global_step)
+    def _compute_gradient_norm(self, model) -> float:
+        total_norm = 0.0
+        for p in model.parameters():
+            if p.grad is not None:
+                param_norm = p.grad.data.norm(2)
+                total_norm += param_norm.item() ** 2
+        return total_norm ** 0.5
+class CallbackHandler:
+    """Callback handler: forwards each training event to every registered callback."""
+    def __init__(self):
+        self.callbacks = []
+    def add_callback(self, callback: Callback):
+        self.callbacks.append(callback)
+    def on_train_begin(self, trainer):
+        for callback in self.callbacks:
+            callback.on_train_begin(trainer)
+    def on_train_end(self, trainer):
+        for callback in self.callbacks:
+            callback.on_train_end(trainer)
+    def on_epoch_begin(self, trainer, epoch):
+        for callback in self.callbacks:
+            callback.on_epoch_begin(trainer, epoch)
+    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
+        for callback in self.callbacks:
+            callback.on_epoch_end(trainer, epoch, train_loss, val_loss)
+    def on_batch_begin(self, trainer, batch_idx, batch):
+        for callback in self.callbacks:
+            callback.on_batch_begin(trainer, batch_idx, batch)
+    def on_batch_end(self, trainer, batch_idx, batch, loss):
+        for callback in self.callbacks:
+            callback.on_batch_end(trainer, batch_idx, batch, loss)
+    def on_validation_begin(self, trainer):
+        for callback in self.callbacks:
+            callback.on_validation_begin(trainer)
+    def on_validation_end(self, trainer, val_loss):
+        for callback in self.callbacks:
+            callback.on_validation_end(trainer, val_loss)
+def create_default_callbacks(config: dict) -> CallbackHandler:
+    """Create the default callbacks from a config dict."""
+    handler = CallbackHandler()
+    # Model checkpointing
+    checkpoint_callback = ModelCheckpoint(
+        save_dir=config.get('checkpoint_dir', './checkpoints'),
+        save_best_only=config.get('save_best_model', True),
+        save_freq=config.get('save_checkpoint_every', 1),
+        monitor='val_loss',
+        mode='min'
+    )
+    handler.add_callback(checkpoint_callback)
+    # TensorBoard logging
+    if config.get('use_tensorboard', True):
+        tb_logger = TensorBoardLogger(
+            log_dir=config.get('log_dir', './logs')
+        )
+        handler.add_callback(tb_logger)
+    # Sample generation
+    if config.get('sample_steps', 500) > 0:
+        sample_callback = SampleGeneratorCallback(
+            sample_freq=config.get('sample_steps', 500),
+            num_samples=4,
+            save_dir=config.get('sample_dir', './samples')
+        )
+        handler.add_callback(sample_callback)
+    # Memory monitoring
+    memory_callback = MemoryMonitorCallback(
+        monitor_freq=config.get('log_steps', 50)
+    )
+    handler.add_callback(memory_callback)
+    # Gradient monitoring
+    grad_callback = GradientMonitorCallback(
+        monitor_freq=config.get('log_steps', 50)
+    )
+    handler.add_callback(grad_callback)
+    # Early stopping
+    if config.get('early_stopping', False):
+        early_stop = EarlyStopping(
+            patience=config.get('early_stopping_patience', 10),
+            min_delta=config.get('early_stopping_min_delta', 1e-4)
+        )
+        handler.add_callback(early_stop)
+    return handler
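The hook-dispatch pattern used by `CallbackHandler` can be exercised without a model. The following is a minimal sketch, not the project's code: `Callback` and `CallbackHandler` are cut down to two hooks, and `LossRecorder` and `ToyTrainer` are hypothetical stand-ins for a real callback and for `P4Trainer`.

```python
from typing import List


class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def on_batch_end(self, trainer, batch_idx, batch, loss):
        pass

    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
        pass


class CallbackHandler:
    """Forwards each hook call to every registered callback, in registration order."""

    def __init__(self):
        self.callbacks: List[Callback] = []

    def add_callback(self, callback: Callback):
        self.callbacks.append(callback)

    def on_batch_end(self, trainer, batch_idx, batch, loss):
        for callback in self.callbacks:
            callback.on_batch_end(trainer, batch_idx, batch, loss)

    def on_epoch_end(self, trainer, epoch, train_loss, val_loss):
        for callback in self.callbacks:
            callback.on_epoch_end(trainer, epoch, train_loss, val_loss)


class LossRecorder(Callback):
    """Hypothetical callback: records (global_step, loss) every `freq` steps."""

    def __init__(self, freq: int):
        self.freq = freq
        self.history = []

    def on_batch_end(self, trainer, batch_idx, batch, loss):
        if trainer.global_step % self.freq == 0:
            self.history.append((trainer.global_step, loss))


class ToyTrainer:
    """Hypothetical trainer: advances a step counter and fires on_batch_end."""

    def __init__(self, handler: CallbackHandler):
        self.handler = handler
        self.global_step = 0

    def fit(self, losses):
        for batch_idx, loss in enumerate(losses):
            self.global_step += 1
            self.handler.on_batch_end(self, batch_idx, None, loss)


handler = CallbackHandler()
recorder = LossRecorder(freq=2)
handler.add_callback(recorder)
ToyTrainer(handler).fit([0.9, 0.7, 0.6, 0.5, 0.4])
print(recorder.history)  # [(2, 0.7), (4, 0.5)]
```

Because `ToyTrainer` increments `global_step` before dispatching, the `% freq == 0` test never fires at step 0; the callbacks in `callbacks.py` read whatever counter the trainer exposes, so whether they fire at step 0 depends on when the trainer increments it.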

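`GradientMonitorCallback._compute_gradient_norm` sums the squared per-parameter L2 norms and takes the square root. That equals the L2 norm of all gradient entries concatenated into one vector, which is also the total norm `torch.nn.utils.clip_grad_norm_` reports with its default `norm_type=2`. A small check, using plain Python lists as hypothetical stand-ins for gradient tensors:

```python
import math


def global_grad_norm(per_param_grads):
    """Mirror of _compute_gradient_norm: sqrt of the sum of squared per-parameter L2 norms."""
    total_norm = 0.0
    for grad in per_param_grads:
        if grad is None:  # parameters without a gradient are skipped
            continue
        param_norm = math.sqrt(sum(g * g for g in grad))
        total_norm += param_norm ** 2
    return total_norm ** 0.5


def concatenated_norm(per_param_grads):
    """L2 norm of all gradient entries flattened into a single vector."""
    flat = [g for grad in per_param_grads if grad is not None for g in grad]
    return math.sqrt(sum(g * g for g in flat))


grads = [[3.0, 4.0], None, [12.0]]  # two parameters with gradients, one without
print(global_grad_norm(grads))   # 13.0
print(concatenated_norm(grads))  # 13.0
```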
src/training/memory_manager.py ADDED

	@@ -0,0 +1,245 @@

+import torch
+import gc
+from typing import Optional
+import psutil
+import os
+class CPUMemoryManager:
+    """CPU (host) memory manager."""
+    def __init__(self, warning_threshold: float = 0.9):
+        """
+        Args:
+            warning_threshold: system memory usage fraction (0-1) above which a warning is printed
+        """
+        self.warning_threshold = warning_threshold
+    def get_memory_usage(self) -> dict:
+        """Return process and system memory usage."""
+        process = psutil.Process(os.getpid())
+        memory_info = process.memory_info()
+        # System-wide memory information
+        system_memory = psutil.virtual_memory()
+        return {
+            'process_rss_mb': memory_info.rss / 1024 / 1024,
+            'process_vms_mb': memory_info.vms / 1024 / 1024,
+            'system_total_mb': system_memory.total / 1024 / 1024,
+            'system_available_mb': system_memory.available / 1024 / 1024,
+            'system_used_percent': system_memory.percent
+        }
+    def check_memory(self) -> bool:
+        """Return False (and print a warning) if system memory usage exceeds the threshold."""
+        memory_info = self.get_memory_usage()
+        if memory_info['system_used_percent'] > self.warning_threshold * 100:
+            print(f"Warning: system memory usage is high: {memory_info['system_used_percent']:.1f}%")
+            return False
+        return True
+class OptimizerCPUOffload:
+    """Optimizer state CPU offload."""
+    def __init__(self, optimizer: torch.optim.Optimizer):
+        self.optimizer = optimizer
+    def _move_state(self, device: torch.device):
+        """Move every optimizer state tensor except 'step' to `device`."""
+        for state in self.optimizer.state.values():
+            for key, value in list(state.items()):
+                # 'step' is skipped: Adam-style optimizers keep it as a CPU scalar
+                # tensor by default (unless capturable/fused is enabled)
+                if key != 'step' and torch.is_tensor(value):
+                    state[key] = value.to(device)
+    def offload_to_cpu(self):
+        """Move optimizer state to CPU.
+
+        No reference to the GPU tensors is kept, so their memory can actually be freed.
+        """
+        self._move_state(torch.device('cpu'))
+    def load_to_gpu(self, device: torch.device):
+        """Move optimizer state back to `device` before the next optimizer step."""
+        self._move_state(device)
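+class ActivationCPUOffload:
+    """Activation CPU offload.
+
+    Tensors that autograd saves for the backward pass are packed to CPU during the
+    model's forward pass and copied back to their original device during backward,
+    via torch.autograd.graph.save_on_cpu. Module outputs stay on the GPU: moving
+    them to CPU in a forward hook would hand CPU tensors to the next GPU layer and
+    fail with a device mismatch.
+    """
+    def __init__(self, model: torch.nn.Module, pin_memory: bool = True):
+        self.model = model
+        self.pin_memory = pin_memory
+        self.hooks = []
+        self._ctx = None
+    def register_hooks(self):
+        """Enable saved-tensor offloading for the duration of each model forward pass."""
+        def pre_hook(module, inputs):
+            self._ctx = torch.autograd.graph.save_on_cpu(pin_memory=self.pin_memory)
+            self._ctx.__enter__()
+        def post_hook(module, inputs, output):
+            if self._ctx is not None:
+                self._ctx.__exit__(None, None, None)
+                self._ctx = None
+        # Registered on the top-level model only; tensors saved inside submodules are
+        # still covered because the context stays active for the whole forward pass
+        self.hooks.append(self.model.register_forward_pre_hook(pre_hook))
+        self.hooks.append(self.model.register_forward_hook(post_hook))
+    def remove_hooks(self):
+        """Remove all hooks."""
+        for hook in self.hooks:
+            hook.remove()
+        self.hooks = []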
+class MemoryOptimizer:
+    """Combined memory optimizer."""
+    def __init__(self, config: dict):
+        self.config = config
+        # GPU memory thresholds (configured in GB, stored in bytes)
+        self.gpu_warning_threshold = config.get('warning_threshold_gb', 6.0) * 1024**3
+        self.gpu_critical_threshold = config.get('memory_threshold_gb', 6.5) * 1024**3
+        # CPU memory management
+        self.cpu_manager = CPUMemoryManager(
+            warning_threshold=config.get('cpu_warning_threshold', 0.85)
+        )
+        # Cleanup frequency (in steps)
+        self.cleanup_frequency = config.get('cleanup_frequency', 100)
+        # Offload helpers, created in setup_model_optimizations
+        self.optimizer_offloader = None
+        self.activation_offloader = None
+    def setup_model_optimizations(self, model: torch.nn.Module, optimizer: Optional[torch.optim.Optimizer] = None):
+        """Apply memory optimizations to the model (and optimizer)."""
+        # Enable gradient checkpointing
+        if hasattr(model, 'enable_gradient_checkpointing'):
+            model.enable_gradient_checkpointing()
+        # Optimizer state CPU offload
+        if optimizer is not None and self.config.get('optimizer_on_cpu', True):
+            self.optimizer_offloader = OptimizerCPUOffload(optimizer)
+        # Activation CPU offload
+        if self.config.get('cpu_offload', True):
+            self.activation_offloader = ActivationCPUOffload(model)
+            self.activation_offloader.register_hooks()
+        # Attention slicing
+        if self.config.get('attention_slicing', 'auto') == 'auto':
+            self._enable_attention_slicing(model)
+    def _enable_attention_slicing(self, model: torch.nn.Module):
+        """Enable attention slicing on every module that supports it."""
+        for module in model.modules():
+            if hasattr(module, 'set_attention_slice'):
+                module.set_attention_slice('auto')
+    def step_start(self):
+        """Memory management at the start of a training step."""
+        # Bring optimizer state back to the parameters' device (if it was offloaded)
+        if self.optimizer_offloader is not None:
+            device = self.optimizer_offloader.optimizer.param_groups[0]['params'][0].device
+            self.optimizer_offloader.load_to_gpu(device)
+        # Check memory
+        self.check_all_memory()
+    def step_end(self, step: int):
+        """Memory management at the end of a training step."""
+        # Offload optimizer state to CPU
+        if self.optimizer_offloader is not None:
+            self.optimizer_offloader.offload_to_cpu()
+        # Periodic cleanup
+        if step % self.cleanup_frequency == 0:
+            self.cleanup()
+        # Check memory
+        self.check_all_memory()
+    def check_all_memory(self):
+        """Check GPU and CPU memory usage."""
+        # GPU memory
+        if torch.cuda.is_available():
+            gpu_allocated = torch.cuda.memory_allocated()
+            if gpu_allocated > self.gpu_critical_threshold:
+                self._handle_gpu_oom()
+            elif gpu_allocated > self.gpu_warning_threshold:
+                print(f"GPU memory warning: {gpu_allocated / 1024**3:.2f} GB allocated")
+        # CPU memory
+        if not self.cpu_manager.check_memory():
+            self._handle_cpu_oom()
+    def _handle_gpu_oom(self):
+        """Handle GPU memory above the critical threshold."""
+        print("GPU memory is running out, attempting cleanup...")
+        self.cleanup(force=True)
+        # If still above the threshold, abort
+        if torch.cuda.memory_allocated() > self.gpu_critical_threshold:
+            raise RuntimeError("Out of GPU memory; cannot continue training")
+    def _handle_cpu_oom(self):
+        """Handle high CPU memory usage."""
+        print("CPU memory is running out, attempting cleanup...")
+        gc.collect()
+    def cleanup(self, force: bool = False):
+        """Free unreferenced memory."""
+        gc.collect()
+        if torch.cuda.is_available():
+            torch.cuda.empty_cache()
+            # On forced cleanup, also synchronize and collect CUDA IPC memory
+            if force:
+                torch.cuda.synchronize()
+                torch.cuda.ipc_collect()
+    def get_memory_stats(self) -> dict:
+        """Collect memory statistics."""
+        stats = {}
+        # GPU statistics
+        if torch.cuda.is_available():
+            stats['gpu'] = {
+                'allocated_gb': torch.cuda.memory_allocated() / 1024**3,
+                'reserved_gb': torch.cuda.memory_reserved() / 1024**3,
+                'max_allocated_gb': torch.cuda.max_memory_allocated() / 1024**3,
+            }
+        # CPU statistics
+        cpu_stats = self.cpu_manager.get_memory_usage()
+        stats['cpu'] = cpu_stats
+        return stats
+    def print_memory_stats(self):
+        """Print memory statistics."""
+        stats = self.get_memory_stats()
+        print("=" * 50)
+        print("Memory usage:")
+        if 'gpu' in stats:
+            gpu = stats['gpu']
+            print(f"GPU - allocated: {gpu['allocated_gb']:.2f} GB, "
+                  f"reserved: {gpu['reserved_gb']:.2f} GB, "
+                  f"max allocated: {gpu['max_allocated_gb']:.2f} GB")
+        if 'cpu' in stats:
+            cpu = stats['cpu']
+            print(f"CPU - process RSS: {cpu['process_rss_mb']:.1f} MB, "
+                  f"system usage: {cpu['system_used_percent']:.1f}%")
+        print("=" * 50)
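`MemoryOptimizer.check_all_memory` converts the GB thresholds from the config into bytes (multiplying by 1024³) and tests the critical threshold before the warning one. A minimal pure-Python sketch of that classification, with `classify_gpu_memory` as a hypothetical helper and the config defaults (6.0 GB warning, 6.5 GB critical):

```python
GIB = 1024 ** 3  # bytes per GB, matching the config conversion (* 1024**3)


def classify_gpu_memory(allocated_bytes, warning_gb=6.0, critical_gb=6.5):
    """Return 'critical', 'warning' or 'ok', checking the critical threshold first."""
    if allocated_bytes > critical_gb * GIB:
        return 'critical'
    if allocated_bytes > warning_gb * GIB:
        return 'warning'
    return 'ok'


for gb in (5.0, 6.2, 6.8):
    print(gb, classify_gpu_memory(int(gb * GIB)))
# 5.0 ok
# 6.2 warning
# 6.8 critical
```

`torch.cuda.memory_allocated()` counts memory occupied by live tensors only; `torch.cuda.memory_reserved()` also includes blocks held by PyTorch's caching allocator, so it is the closer match to what `nvidia-smi` reports.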

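Offloading only frees GPU memory if nothing else still references the GPU tensor: PyTorch hands a tensor's memory back to its caching allocator once the last reference is gone. The sketch below illustrates this with ordinary Python objects; `FakeGpuTensor` and `offload` are hypothetical, and a `weakref` stands in for checking whether the "GPU" copy has been released.

```python
import gc
import weakref


class FakeGpuTensor:
    """Hypothetical stand-in for a CUDA tensor whose memory is freed when the object dies."""

    def __init__(self, name):
        self.name = name

    def cpu(self):
        # Return a separate "CPU copy"; the original object is untouched
        return ('cpu-copy', self.name)


def offload(state, keep_reference):
    """Replace each value with its CPU copy, optionally remembering the original."""
    kept = {}
    for key, value in list(state.items()):
        if keep_reference:
            kept[key] = value  # keeping the original alive defeats the offload
        state[key] = value.cpu()
    return kept


for keep in (True, False):
    state = {'exp_avg': FakeGpuTensor('exp_avg')}
    probe = weakref.ref(state['exp_avg'])  # observes the object without keeping it alive
    kept = offload(state, keep_reference=keep)
    gc.collect()
    print(keep, probe() is None)
# True False   (the "GPU" object is still referenced, nothing freed)
# False True   (the "GPU" object was released)
```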
src/training/trainer_p4.py ADDED

	@@ -0,0 +1,378 @@

+import torch
+import torch.nn as nn
+from torch.cuda.amp import autocast, GradScaler
+from torch.utils.data import DataLoader
+from typing import Optional, Dict, Any
+import wandb
+import os
+from tqdm import tqdm
+import numpy as np
+from src.models.diffusion import DiffusionProcess
+class MemoryManager:
+    """GPU memory manager for the P4."""
+    def __init__(self, config: dict):
+        self.config = config
+        self.warning_threshold = config.get('warning_threshold_gb', 6.0) * 1024**3
+        self.critical_threshold = config.get('memory_threshold_gb', 6.5) * 1024**3
+        self.cleanup_frequency = config.get('cleanup_frequency', 100)
+    def check_memory(self, step: int):
+        """Check GPU memory usage; clean up every `cleanup_frequency` steps."""
+        if step % self.cleanup_frequency == 0:
+            self.cleanup()
+        allocated = torch.cuda.memory_allocated()
+        if allocated > self.critical_threshold:
+            raise RuntimeError(f"GPU memory above the critical threshold: {allocated / 1024**3:.2f} GB")
+        elif allocated > self.warning_threshold:
+            print(f"Warning: high GPU memory usage: {allocated / 1024**3:.2f} GB")
+    def cleanup(self):
+        """Release unreferenced and cached GPU memory."""
+        import gc
+        gc.collect()
+        torch.cuda.empty_cache()
+    def auto_adjust_batch_size(self, model: nn.Module, data_shape: tuple) -> int:
+        """Return the largest batch size in [1, 2, 4, 8] whose forward pass fits in memory.
+
+        The probe runs under no_grad with a 77x768 dummy text context, so it does not
+        include activations saved for backward; treat the result as an upper bound.
+        """
+        max_batch = 1
+        device = next(model.parameters()).device
+        for batch_size in [1, 2, 4, 8]:
+            try:
+                # Probe memory with dummy inputs
+                dummy_input = torch.randn(batch_size, *data_shape, device=device)
+                dummy_timestep = torch.randint(0, 1000, (batch_size,), device=device)
+                dummy_context = torch.randn(batch_size, 77, 768, device=device)
+                with torch.no_grad():
+                    _ = model(dummy_input, dummy_timestep, dummy_context)
+                torch.cuda.empty_cache()
+                max_batch = batch_size
+            except RuntimeError as e:
+                if "CUDA out of memory" in str(e):
+                    # Free whatever the failed probe left cached
+                    torch.cuda.empty_cache()
+                    break
+                else:
+                    raise e
+        return max_batch
+class GradientAccumulationScheduler:
+    """Gradient accumulation scheduler."""
+    def __init__(self, config: dict):
+        self.initial_steps = config.get('gradient_accumulation_steps', 8)
+        self.current_steps = self.initial_steps
+        self.warmup_epochs = config.get('warmup_epochs', 5)
+    def update(self, epoch: int):
+        """Update the number of accumulation steps for the given epoch."""
+        if epoch < self.warmup_epochs:
+            self.current_steps = self.initial_steps
+        else:
+            # After warmup, halve the accumulation steps each epoch (minimum 4) to speed up training
+            self.current_steps = max(4, self.current_steps // 2)
+class P4Trainer:
+    """Trainer optimized for the P4 GPU."""
+    def __init__(
+        self,
+        model: nn.Module,
+        diffusion: DiffusionProcess,
+        optimizer: torch.optim.Optimizer,
+        train_loader: DataLoader,
+        val_loader: Optional[DataLoader],
+        config: dict,
+        device: torch.device
+    ):
+        self.model = model
+        self.diffusion = diffusion
+        self.optimizer = optimizer
+        self.train_loader = train_loader
+        self.val_loader = val_loader
+        self.config = config
+        self.device = device
+        # Training state
+        self.current_epoch = 0
+        self.global_step = 0
+        self.best_loss = float('inf')
+        # Helpers
+        self.memory_manager = MemoryManager(config)
+        self.grad_scheduler = GradientAccumulationScheduler(config)
+        # Mixed-precision training
+        self.use_amp = config.get('mixed_precision', 'fp16') != 'no'
+        self.scaler = GradScaler(enabled=self.use_amp)
+        # Learning-rate scheduler
+        self.lr_scheduler = self._create_lr_scheduler(config)
+        # EMA model
+        self.use_ema = config.get('use_ema', True)
+        if self.use_ema:
+            self.ema_model = self._create_ema_model(model, config.get('ema_decay', 0.9999))
+        # Logging
+        self.use_wandb = config.get('use_wandb', False)
+        self.log_dir = config.get('log_dir', './logs')
+        os.makedirs(self.log_dir, exist_ok=True)
+        # Checkpoints
+        self.checkpoint_dir = config.get('checkpoint_dir', './checkpoints')
+        os.makedirs(self.checkpoint_dir, exist_ok=True)
+    def _create_lr_scheduler(self, config: dict):
+        """Create the learning-rate scheduler."""
+        scheduler_type = config.get('learning_rate_scheduler', 'cosine')
+        warmup_steps = config.get('warmup_steps', 1000)
+        if scheduler_type == 'cosine':
+            scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
+                self.optimizer,
+                T_max=config.get('max_epochs', 50) * len(self.train_loader),
+                eta_min=1e-6
+            )
+        elif scheduler_type == 'linear':
+            scheduler = torch.optim.lr_scheduler.LinearLR(
+                self.optimizer,
+                start_factor=0.01,
+                total_iters=warmup_steps
+            )
+        else:
+            scheduler = torch.optim.lr_scheduler.ConstantLR(self.optimizer, factor=1.0)
+        return scheduler
+    def _create_ema_model(self, model: nn.Module, decay: float):
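+        """Create an EMA copy of the model."""
+        from torch.optim.swa_utils import AveragedModel
+        # AveragedModel calls avg_fn(averaged_param, current_param, num_averaged); the third
+        # argument must not be bound to `decay`, or the decay becomes the update count
+        return AveragedModel(
+            model,
+            device=self.device,
+            avg_fn=lambda avg, new, num_averaged: decay * avg + (1 - decay) * new
+        )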
+    def train_epoch(self) -> float:
+        """Train for one epoch and return the mean training loss."""
+        self.model.train()
+        total_loss = 0.0
+        num_batches = len(self.train_loader)
+        # Gradient accumulation
+        accumulation_steps = self.grad_scheduler.current_steps
+        log_steps = self.config.get('log_steps', 50)
+        sample_steps = self.config.get('sample_steps', 500)
+        self.optimizer.zero_grad()
+        pbar = tqdm(self.train_loader, desc=f"Epoch {self.current_epoch}")
+        for batch_idx, batch in enumerate(pbar):
+            # Move data to the device
+            images = batch['images'].to(self.device)
+            text_embeddings = batch['text_embeddings'].to(self.device)
+            # Mixed-precision forward pass
+            with autocast(enabled=self.use_amp):
+                loss = self.diffusion.compute_loss(images, text_embeddings)
+                loss = loss / accumulation_steps
+            # Backward pass (gradients accumulate across micro-batches)
+            self.scaler.scale(loss).backward()
+            # Track the loss before division by accumulation_steps
+            total_loss += loss.item() * accumulation_steps
+            current_loss = total_loss / (batch_idx + 1)
+            # Update the progress bar
+            pbar.set_postfix({
+                'loss': f'{current_loss:.4f}',
+                'lr': f'{self.optimizer.param_groups[0]["lr"]:.2e}'
+            })
+            # Optimizer step once every `accumulation_steps` micro-batches
+            if (batch_idx + 1) % accumulation_steps == 0:
+                # Unscale, then clip; clip_grad_norm_ returns the total norm before clipping
+                self.scaler.unscale_(self.optimizer)
+                grad_norm = torch.nn.utils.clip_grad_norm_(
+                    self.model.parameters(),
+                    max_norm=self.config.get('gradient_clip', 1.0)
+                )
+                # Update parameters
+                self.scaler.step(self.optimizer)
+                self.scaler.update()
+                self.optimizer.zero_grad()
+                # Update the EMA model
+                if self.use_ema:
+                    self.ema_model.update_parameters(self.model)
+                # Update the learning rate
+                self.lr_scheduler.step()
+                self.global_step += 1
+                # Log and sample right after an optimizer step, so each global step triggers them once
+                if self.global_step % log_steps == 0:
+                    self._log_metrics({
+                        'train/loss': current_loss,
+                        'train/lr': self.optimizer.param_groups[0]['lr'],
+                        'train/grad_norm': grad_norm.item(),
+                    })
+                if sample_steps > 0 and self.global_step % sample_steps == 0:
+                    self._generate_samples()
+            # GPU memory management
+            self.memory_manager.check_memory(self.global_step)
+        epoch_loss = total_loss / num_batches
+        return epoch_loss
+    @torch.no_grad()
+    def validate(self) -> float:
+        """Run validation and return the mean loss."""
+        if self.val_loader is None:
+            return float('inf')
+        self.model.eval()
+        total_loss = 0.0
+        for batch in tqdm(self.val_loader, desc="Validation"):
+            images = batch['images'].to(self.device)
+            text_embeddings = batch['text_embeddings'].to(self.device)
+            with autocast(enabled=self.use_amp):
+                loss = self.diffusion.compute_loss(images, text_embeddings)
+            total_loss += loss.item()
+        val_loss = total_loss / len(self.val_loader)
+        # Log validation metrics
+        self._log_metrics({'val/loss': val_loss})
+        return val_loss
+    def train(self, num_epochs: Optional[int] = None):
+        """Main training loop."""
+        if num_epochs is None:
+            num_epochs = self.config.get('max_epochs', 50)
+        for epoch in range(self.current_epoch, num_epochs):
+            self.current_epoch = epoch
+            # Update the gradient accumulation schedule
+            self.grad_scheduler.update(epoch)
+            # Train for one epoch
+            train_loss = self.train_epoch()
+            # Validate
+            val_loss = self.validate()
+            # Save the best model
+            if val_loss < self.best_loss:
+                self.best_loss = val_loss
+                self.save_checkpoint('best_model.pt')
+            # Save a checkpoint periodically
+            if (epoch + 1) % self.config.get('save_checkpoint_every', 5) == 0:
+                self.save_checkpoint(f'checkpoint_epoch_{epoch+1}.pt')
+            # Log epoch metrics
+            self._log_metrics({
+                'epoch/train_loss': train_loss,
+                'epoch/val_loss': val_loss,
+                'epoch': epoch
+            })
+            print(f"Epoch {epoch+1}: Train Loss = {train_loss:.4f}, Val Loss = {val_loss:.4f}")
+    def _get_grad_norm(self) -> float:
+        """Compute the global L2 norm of all gradients."""
+        total_norm = 0.0
+        for p in self.model.parameters():
+            if p.grad is not None:
+                param_norm = p.grad.data.norm(2)
+                total_norm += param_norm.item() ** 2
+        return total_norm ** 0.5
+    @torch.no_grad()
+    def _generate_samples(self):
+        """Generate sample latents for monitoring."""
+        if self.val_loader is None:
+            return
+        self.model.eval()
+        # Use the first few prompts of the validation set
+        sample_batch = next(iter(self.val_loader))
+        text_embeddings = sample_batch['text_embeddings'][:4].to(self.device)
+        # Generate samples
+        with autocast(enabled=self.use_amp):
+            latents = self.diffusion.generate(
+                context=text_embeddings,
+                num_samples=4,
+                guidance_scale=7.5
+            )
+        # Decoding to images needs a VAE decoder; for now only the latents are logged
+        if self.use_wandb:
+            wandb.log({
+                'samples/latents': wandb.Image(latents[0].cpu().numpy())
+            })
+        # This runs mid-epoch, so restore training mode
+        self.model.train()
+    def _log_metrics(self, metrics: Dict[str, Any]):
+        """Log metrics to W&B (if enabled) and to a local CSV file."""
+        if self.use_wandb:
+            wandb.log(metrics)
+        # Also append to a local CSV in long format (step, metric, value), so calls
+        # with different metric keys share a single, consistent header
+        log_file = os.path.join(self.log_dir, 'training_log.csv')
+        write_header = not os.path.exists(log_file)
+        with open(log_file, 'a') as f:
+            if write_header:
+                f.write('step,metric,value\n')
+            for key, value in metrics.items():
+                f.write(f'{self.global_step},{key},{value}\n')
+    def save_checkpoint(self, filename: str):
+        """Save a checkpoint."""
+        checkpoint = {
+            'epoch': self.current_epoch,
+            'global_step': self.global_step,
+            'model_state_dict': self.model.state_dict(),
+            'optimizer_state_dict': self.optimizer.state_dict(),
+            'scaler_state_dict': self.scaler.state_dict(),
+            'best_loss': self.best_loss,
+            'config': self.config
+        }
+        if self.use_ema:
+            checkpoint['ema_model_state_dict'] = self.ema_model.state_dict()
+        save_path = os.path.join(self.checkpoint_dir, filename)
+        torch.save(checkpoint, save_path)
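+        # Optionally also write a gzip-compressed copy (<name>.pt.gz). torch.save already
+        # uses the zipfile format by default, so re-saving in that format would only
+        # duplicate the checkpoint. Load the copy with
+        # torch.load(io.BytesIO(gzip.decompress(data))).
+        if self.config.get('save_compressed', True):
+            import gzip
+            import io
+            buffer = io.BytesIO()
+            torch.save(checkpoint, buffer)
+            with gzip.open(save_path + '.gz', 'wb') as f:
+                f.write(buffer.getvalue())
+        print(f"Checkpoint saved: {save_path}")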
+    def load_checkpoint(self, checkpoint_path: str):
+        """Load a checkpoint; training resumes with the epoch after the saved one."""
+        checkpoint = torch.load(checkpoint_path, map_location=self.device)
+        self.model.load_state_dict(checkpoint['model_state_dict'])
+        self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
+        self.scaler.load_state_dict(checkpoint['scaler_state_dict'])
+        if self.use_ema and 'ema_model_state_dict' in checkpoint:
+            self.ema_model.load_state_dict(checkpoint['ema_model_state_dict'])
+        # Checkpoints are saved after an epoch completes, so continue with the next one
+        self.current_epoch = checkpoint['epoch'] + 1
+        self.global_step = checkpoint['global_step']
+        self.best_loss = checkpoint['best_loss']
+        print(f"Loaded checkpoint: {checkpoint_path}")
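`torch.optim.swa_utils.AveragedModel` calls `avg_fn(averaged_param, current_param, num_averaged)` with three positional arguments, so a lambda whose third parameter is `decay=decay` has its decay silently replaced by the update count. The sketch below replays that calling convention on plain floats; `run` is a simplified, hypothetical stand-in for `AveragedModel.update_parameters` (the first call copies the value, later calls apply `avg_fn`), and `decay=0.5` is chosen only so the results are exact.

```python
def make_fixed_avg_fn(decay):
    """EMA avg_fn with AveragedModel's signature; num_averaged is accepted and ignored."""
    return lambda avg, new, num_averaged: decay * avg + (1 - decay) * new


def make_original_avg_fn(decay):
    """Same shape as the original lambda: the third positional argument lands in `decay`."""
    return lambda avg, new, decay=decay: decay * avg + (1 - decay) * new


def run(avg_fn, values):
    """Copy the first value, then call avg_fn(avg, new, num_averaged) positionally."""
    avg, num_averaged = None, 0
    for value in values:
        avg = value if num_averaged == 0 else avg_fn(avg, value, num_averaged)
        num_averaged += 1
    return avg


values = [0.0, 1.0, 1.0, 1.0]
print(run(make_fixed_avg_fn(0.5), values))     # 0.875
print(run(make_original_avg_fn(0.5), values))  # -5.0 (decay took the values 1, 2, 3)
```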

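`GradientAccumulationScheduler.update` keeps the initial accumulation steps during warmup and then halves them each epoch with a floor of 4, while `train_epoch` steps the optimizer (and the LR scheduler) only when `(batch_idx + 1) % accumulation_steps == 0`. The sketch below replays both rules for a hypothetical run of 8 epochs with 1000 batches each, using the config defaults (8 accumulation steps, 5 warmup epochs), and compares the number of scheduler steps with the cosine `T_max = max_epochs * len(train_loader)` set in `_create_lr_scheduler`. The helper names are illustrative only.

```python
def accumulation_schedule(num_epochs, initial_steps=8, warmup_epochs=5, floor=4):
    """Replay GradientAccumulationScheduler.update and collect current_steps per epoch."""
    current_steps = initial_steps
    schedule = []
    for epoch in range(num_epochs):
        if epoch < warmup_epochs:
            current_steps = initial_steps
        else:
            current_steps = max(floor, current_steps // 2)
        schedule.append(current_steps)
    return schedule


def optimizer_steps(num_batches, accumulation_steps):
    """Optimizer (and LR scheduler) steps in one epoch of train_epoch."""
    # A step happens whenever batch_idx + 1 is a multiple of accumulation_steps;
    # a trailing partial window never steps
    return num_batches // accumulation_steps


max_epochs, batches_per_epoch = 8, 1000  # hypothetical run
schedule = accumulation_schedule(max_epochs)
total_steps = sum(optimizer_steps(batches_per_epoch, a) for a in schedule)
cosine_t_max = max_epochs * batches_per_epoch  # T_max as computed in _create_lr_scheduler

print(schedule)      # [8, 8, 8, 8, 8, 4, 4, 4]
print(total_steps)   # 1375 (5 * 125 + 3 * 250)
print(cosine_t_max)  # 8000
```

Under these assumptions the scheduler takes 1375 steps against a `T_max` of 8000, so only the first part of the cosine curve is used and the learning rate stays well above `eta_min`. Micro-batches left over at the end of an epoch (when the batch count is not a multiple of the accumulation steps) never trigger a step; their gradients are discarded by the `zero_grad()` call at the start of the next `train_epoch`.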
tests/test_basic.py ADDED

	@@ -0,0 +1,250 @@

+#!/usr/bin/env python3
+"""
+Basic tests
+Checks the project's core components
+"""
+import os
+import sys
+import torch
+import torch.nn as nn
+import numpy as np
+# Add the project root (the parent of tests/) to the Python path
+sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+from src.models.unet_light import UNetLight, TimestepEmbedding, ResNetBlock
+from src.models.attention import MemoryEfficientAttention
+from src.models.diffusion import DiffusionProcess
+def test_timestep_embedding():
+    """Test the timestep embedding."""
+    print("Testing timestep embedding...")
+    embedding_dim = 256
+    time_embed_dim = 512
+    embedder = TimestepEmbedding(embedding_dim, time_embed_dim)
+    # Forward pass
+    timesteps = torch.tensor([100, 200, 300])
+    embeddings = embedder(timesteps)
+    assert embeddings.shape == (3, time_embed_dim)
+    print(f"  Shape OK: {embeddings.shape}")
+    return embedder
+def test_resnet_block():
+    """Test the residual block."""
+    print("Testing residual block...")
+    in_channels = 64
+    out_channels = 128
+    time_embed_dim = 256
+    block = ResNetBlock(in_channels, out_channels, time_embed_dim)
+    # Forward pass
+    x = torch.randn(2, in_channels, 32, 32)
+    time_emb = torch.randn(2, time_embed_dim)
+    output = block(x, time_emb)
+    assert output.shape == (2, out_channels, 32, 32)
+    print(f"  Shape OK: {output.shape}")
+    # Identity skip connection (in_channels == out_channels)
+    block_same = ResNetBlock(in_channels, in_channels, time_embed_dim)
+    output_same = block_same(x, time_emb)
+    assert output_same.shape == x.shape
+    print("  Skip connection OK")
+    return block
+def test_attention():
+    """Test the attention module."""
+    print("Testing attention...")
+    dim = 256
+    num_heads = 8
+    attention = MemoryEfficientAttention(dim, num_heads)
+    # Forward pass
+    x = torch.randn(2, 16, dim)  # [batch, seq_len, dim]
+    output = attention(x)
+    assert output.shape == x.shape
+    print(f"  Shape OK: {output.shape}")
+    return attention
+def test_unet_light():
+    """Test the lightweight UNet."""
+    print("Testing lightweight UNet...")
+    config = {
+        'model': {
+            'in_channels': 4,
+            'out_channels': 4,
+            'base_channels': 32,  # small model for testing
+            'channel_mults': [1, 2, 4],
+            'num_res_blocks': 1,
+            'attention_resolutions': [8],
+            'dropout': 0.0,
+            'use_checkpoint': False,
+            'num_heads': 4,
+            'context_dim': 256,
+            'use_linear_projection': True,
+            'time_embed_dim': 128
+        }
+    }
+    model = UNetLight(config)
+    # Forward pass
+    batch_size = 2
+    x = torch.randn(batch_size, 4, 64, 64)
+    timesteps = torch.randint(0, 1000, (batch_size,))
+    context = torch.randn(batch_size, 77, 256)
+    output = model(x, timesteps, context)
+    assert output.shape == x.shape
+    print(f"  Shape OK: {output.shape}")
+    # Gradient checkpointing
+    model.enable_gradient_checkpointing()
+    print("  Gradient checkpointing enabled")
+    return model
+def test_diffusion_process():
+    """Test the diffusion process."""
+    print("Testing diffusion process...")
+    config = {
+        'diffusion': {
+            'beta_schedule': 'linear',
+            'beta_start': 0.0001,
+            'beta_end': 0.02,
+            'num_train_timesteps': 100,
+            'num_inference_timesteps': 20
+        }
+    }
+    diffusion = DiffusionProcess(config)
+    # Forward diffusion (q_sample)
+    x_start = torch.randn(2, 3, 32, 32)
+    t = torch.randint(0, 100, (2,))
+    x_noisy = diffusion.q_sample(x_start, t)
+    assert x_noisy.shape == x_start.shape
+    print(f"  Forward diffusion shape OK: {x_noisy.shape}")
+    # Coefficient extraction (broadcast to the input's rank)
+    extracted = diffusion.extract(diffusion.sqrt_alphas_cumprod, t, x_start.shape)
+    assert extracted.shape == (2, 1, 1, 1)
+    print(f"  Extracted coefficient shape OK: {extracted.shape}")
+    return diffusion
+def test_memory_efficiency():
+    """Test memory efficiency."""
+    print("Testing memory efficiency...")
+    # Measure the model's memory use at different batch sizes
+    config = {
+        'model': {
+            'in_channels': 4,
+            'out_channels': 4,
+            'base_channels': 32,
+            'channel_mults': [1, 2],
+            'num_res_blocks': 1,
+            'attention_resolutions': [],
+            'dropout': 0.0,
+            'use_checkpoint': False,
+            'num_heads': 4,
+            'context_dim': 256,
+            'use_linear_projection': True,
+            'time_embed_dim': 128
+        }
+    }
+    model = UNetLight(config)
+    model.eval()
+    if torch.cuda.is_available():
+        device = torch.device('cuda')
+        model = model.to(device)
+        print("  GPU memory test:")
+        for batch_size in [1, 2, 4]:
+            # Clear the cache and reset the peak counter so each batch size is measured separately
+            torch.cuda.empty_cache()
+            torch.cuda.reset_peak_memory_stats()
+            # Baseline memory
+            initial_memory = torch.cuda.memory_allocated()
+            # Forward pass
+            x = torch.randn(batch_size, 4, 64, 64, device=device)
+            t = torch.randint(0, 1000, (batch_size,), device=device)
+            context = torch.randn(batch_size, 77, 256, device=device)
+            with torch.no_grad():
+                _ = model(x, t, context)
+            # Peak memory since the reset
+            peak_memory = torch.cuda.max_memory_allocated()
+            memory_used = (peak_memory - initial_memory) / 1024**3  # GB
+            print(f"    Batch size {batch_size}: {memory_used:.2f} GB")
+    else:
+        print("  GPU not available, skipping memory test")
+    return model
+def run_all_tests():
+    """Run all tests."""
+    print("=" * 60)
+    print("Running Lumina basic tests")
+    print("=" * 60)
+    try:
+        # Test each component
+        test_timestep_embedding()
+        test_resnet_block()
+        test_attention()
+        test_unet_light()
+        test_diffusion_process()
+        test_memory_efficiency()
+        print("\n" + "=" * 60)
+        print("All tests passed!")
+        print("=" * 60)
+        return True
+    except Exception as e:
+        print(f"\nTest failed: {e}")
+        import traceback
+        traceback.print_exc()
+        return False
+if __name__ == "__main__":
+    success = run_all_tests()
+    sys.exit(0 if success else 1)
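`torch.cuda.max_memory_allocated()` is a high-water mark since the program started or since the last `torch.cuda.reset_peak_memory_stats()`, so a per-batch measurement without a reset can report a peak left over from an earlier, larger allocation. The same effect can be shown on the CPU with the standard library's `tracemalloc` (Python 3.9+ for `reset_peak`), used here as a stand-in for CUDA's allocator statistics; `peak_bytes_per_run` is a hypothetical helper and `bytearray(n)` a stand-in workload.

```python
import tracemalloc


def peak_bytes_per_run(sizes, reset_between_runs=True):
    """Return, for each size, the peak traced memory above the pre-run baseline."""
    tracemalloc.start()
    peaks = []
    try:
        for n in sizes:
            if reset_between_runs:
                # Plays the role of torch.cuda.reset_peak_memory_stats()
                tracemalloc.reset_peak()
            baseline, _ = tracemalloc.get_traced_memory()
            data = bytearray(n)  # stand-in workload that allocates n bytes
            _, peak = tracemalloc.get_traced_memory()
            peaks.append(peak - baseline)
            del data
    finally:
        tracemalloc.stop()
    return peaks


# A large run followed by a small one
print(peak_bytes_per_run([4_000_000, 1_000_000]))
print(peak_bytes_per_run([4_000_000, 1_000_000], reset_between_runs=False))
```

With the reset, the second entry is about 1 MB; without it, the second entry still reflects the 4 MB run.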