Remove cache files from SageAttention torch 2.11/2.12 builds
- .gitattributes +1 -12
- build/torch211-cxx11-cu130-x86_64-linux/__pycache__/__init__.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/__pycache__/sm100_compile.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/__init__.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/core.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/quant.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm80_compile.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm89_compile.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm90_compile.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/__init__.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_block_varlen.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal_varlen.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block_varlen.cpython-312.pyc +0 -0
- build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_thread.cpython-312.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/__pycache__/__init__.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/__pycache__/sm100_compile.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/__init__.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/core.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/quant.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm80_compile.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm89_compile.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm90_compile.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/__init__.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_block_varlen.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal_varlen.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block_varlen.cpython-313.pyc +0 -0
- build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_thread.cpython-313.pyc +0 -0
.gitattributes
CHANGED
@@ -34,15 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 build/torch211-cxx11-cu130-x86_64-linux/_sage_attention_cuda_d7d1339_dirty.abi3.so filter=lfs diff=lfs merge=lfs -text
-build/
-build/torch211-cxx11-cu130-x86_64-linux/fp4quant_cuda.cpython-312-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch211-cxx11-cu130-x86_64-linux/sageattention/_fused.cpython-312-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch211-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm80.cpython-312-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch211-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm89.cpython-312-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch211-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm90.cpython-312-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/fp4attn_cuda.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/fp4quant_cuda.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/sageattention/_fused.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm80.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm89.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
-build/torch212-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm90.cpython-313-x86_64-linux-gnu.so filter=lfs diff=lfs merge=lfs -text
+build/**/*.so filter=lfs diff=lfs merge=lfs -text

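The .gitattributes change above swaps the per-file LFS entries for the single pattern build/**/*.so. As a rough sanity check, the sketch below approximates that pattern in Python and confirms it covers a few of the removed entries. The helper name matches_build_so is hypothetical, and the check is only an approximation of Git's own pattern matching, not a reimplementation of it.

```python
from pathlib import PurePosixPath


def matches_build_so(path: str) -> bool:
    """Approximate the gitattributes pattern build/**/*.so (assumption: repo-relative POSIX path)."""
    p = PurePosixPath(path)
    # "build/" anchors the first component; "**/" allows zero or more directories;
    # "*.so" constrains only the file name.
    return len(p.parts) >= 2 and p.parts[0] == "build" and p.name.endswith(".so")


# A few of the explicit entries removed by this commit.
removed_entries = [
    "build/torch211-cxx11-cu130-x86_64-linux/fp4quant_cuda.cpython-312-x86_64-linux-gnu.so",
    "build/torch211-cxx11-cu130-x86_64-linux/sageattention/_fused.cpython-312-x86_64-linux-gnu.so",
    "build/torch212-cxx11-cu130-x86_64-linux/sageattention/_qattn_sm90.cpython-313-x86_64-linux-gnu.so",
]

for entry in removed_entries:
    print(entry, matches_build_so(entry))
```

For an authoritative answer on a real checkout, git check-attr filter -- <path> reports which attribute Git actually applies to a path.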
build/torch211-cxx11-cu130-x86_64-linux/__pycache__/__init__.cpython-312.pyc
DELETED
Binary file (816 Bytes)

build/torch211-cxx11-cu130-x86_64-linux/__pycache__/sm100_compile.cpython-312.pyc
DELETED
Binary file (9.34 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/__init__.cpython-312.pyc
DELETED
Binary file (487 Bytes)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/core.cpython-312.pyc
DELETED
Binary file (48.8 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/quant.cpython-312.pyc
DELETED
Binary file (14.1 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm80_compile.cpython-312.pyc
DELETED
Binary file (5.89 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm89_compile.cpython-312.pyc
DELETED
Binary file (6.08 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm90_compile.cpython-312.pyc
DELETED
Binary file (4.12 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/__init__.cpython-312.pyc
DELETED
Binary file (211 Bytes)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_block_varlen.cpython-312.pyc
DELETED
Binary file (8.3 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block.cpython-312.pyc
DELETED
Binary file (11 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal.cpython-312.pyc
DELETED
Binary file (10 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal_varlen.cpython-312.pyc
DELETED
Binary file (8.83 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block.cpython-312.pyc
DELETED
Binary file (6.11 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block_varlen.cpython-312.pyc
DELETED
Binary file (5.82 kB)

build/torch211-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_thread.cpython-312.pyc
DELETED
Binary file (13.1 kB)

build/torch212-cxx11-cu130-x86_64-linux/__pycache__/__init__.cpython-313.pyc
DELETED
Binary file (818 Bytes)

build/torch212-cxx11-cu130-x86_64-linux/__pycache__/sm100_compile.cpython-313.pyc
DELETED
Binary file (9.14 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/__init__.cpython-313.pyc
DELETED
Binary file (487 Bytes)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/core.cpython-313.pyc
DELETED
Binary file (47.7 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/quant.cpython-313.pyc
DELETED
Binary file (13.4 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm80_compile.cpython-313.pyc
DELETED
Binary file (5.78 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm89_compile.cpython-313.pyc
DELETED
Binary file (5.98 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/__pycache__/sm90_compile.cpython-313.pyc
DELETED
Binary file (4.07 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/__init__.cpython-313.pyc
DELETED
Binary file (211 Bytes)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_block_varlen.cpython-313.pyc
DELETED
Binary file (8.27 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block.cpython-313.pyc
DELETED
Binary file (11 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal.cpython-313.pyc
DELETED
Binary file (9.96 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/attn_qk_int8_per_block_causal_varlen.cpython-313.pyc
DELETED
Binary file (8.79 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block.cpython-313.pyc
DELETED
Binary file (6.02 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_block_varlen.cpython-313.pyc
DELETED
Binary file (5.73 kB)

build/torch212-cxx11-cu130-x86_64-linux/sageattention/triton/__pycache__/quant_per_thread.cpython-313.pyc
DELETED
Binary file (13 kB)
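
The deletions above all target __pycache__ directories under build/. A minimal sketch of how such a cleanup could be scripted locally is shown below. It is not necessarily how this commit was produced, and the function name remove_pycache is hypothetical.

```python
import shutil
from pathlib import Path


def remove_pycache(root: Path) -> list[Path]:
    """Delete every __pycache__ directory under root and return the removed paths."""
    # Collect targets first so the tree is not mutated while it is being walked.
    targets = sorted(p for p in root.rglob("__pycache__") if p.is_dir())
    for target in targets:
        # Guard against a target already removed as part of a parent directory.
        if target.exists():
            shutil.rmtree(target)
    return targets


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    for removed in remove_pycache(Path("build")):
        print(f"removed {removed}")
```

Listing __pycache__/ in .gitignore keeps the bytecode caches from being staged again after the next local import of the package.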