---
tags:
- kernels
license: apache-2.0
---
# Optimizer

Optimizer is a Python package that provides:

- PyTorch implementations of recent optimizer algorithms
- Support for parallelism techniques for efficient large-scale training

## Currently implemented

- Parallel Muon with N-D sharding
  - [arXiv paper](https://arxiv.org/abs/2511.07464)
  - Supports **general N-D sharding configurations**
    - The implementation is not tied to any specific parallel strategy.
    - Verified from basic FSDP2 setups up to hybrid configurations such as
      **(2 TP + 2 DP-Replicate + 2 DP-Shard)**.
    - Verified configurations can be found in [test_muon.py](./test/test_muon.py)
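
To make the hybrid configuration above concrete, the following sketch shows how 8 ranks could be laid out on a 3-D mesh of shape (2, 2, 2). The dimension names, their order (`dp_replicate`, `dp_shard`, `tp`), and the row-major rank assignment are assumptions for illustration only; the configurations actually verified by this package are the ones defined in `test_muon.py`.

```python
# Assumed mesh: the names and their ordering are illustrative,
# not taken from this package.
MESH_DIM_NAMES = ("dp_replicate", "dp_shard", "tp")
MESH_SHAPE = (2, 2, 2)


def rank_to_coords(rank, shape=MESH_SHAPE):
    """Map a global rank to its coordinate on the mesh (row-major order)."""
    coords = []
    for size in reversed(shape):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))


def groups_along(dim_name, shape=MESH_SHAPE, names=MESH_DIM_NAMES):
    """List the rank groups that communicate along one mesh dimension."""
    axis = names.index(dim_name)
    world_size = 1
    for size in shape:
        world_size *= size
    groups = {}
    for rank in range(world_size):
        coords = rank_to_coords(rank, shape)
        # Ranks that agree on every other coordinate share a group.
        key = coords[:axis] + coords[axis + 1:]
        groups.setdefault(key, []).append(rank)
    return list(groups.values())


if __name__ == "__main__":
    for name in MESH_DIM_NAMES:
        print(name, groups_along(name))
```

Under these assumptions, each tensor-parallel group holds two adjacent ranks, while the two data-parallel dimensions pair ranks that are 2 and 4 apart. A different dimension order would change which ranks end up in which group.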
## Usage

```python
# Requires the `kernels` package: pip install kernels
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from kernels import get_kernel

optimizer = get_kernel("motif-technologies/optimizer")
get_default_muon_param_groups = optimizer.muon.get_default_muon_param_groups

model = None  # replace with your own torch.nn.Module
fsdp_model = FSDP(model)

# Muon, by nature, cannot be applied to 1-D tensors.
# We provide a helper function that groups such tensors.
# You can write your own is_muon_func if necessary.
params = get_default_muon_param_groups(model)

optim = optimizer.Muon(
    params,
    lr=0.01,
    momentum=0.9,
    weight_decay=1e-4,
)
```
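
The grouping step in the usage example can be sketched as follows. This is not the package's `get_default_muon_param_groups`: the names `split_params_for_muon` and `default_is_muon`, the `use_muon` group key, and the `is_muon_func(name, param)` signature are all hypothetical, and the rule (tensors with two or more dimensions go to Muon, everything else to a fallback group) is an assumed default. The stand-in parameters expose only `ndim` and `requires_grad`, so the sketch runs without PyTorch; with a real model you would pass `model.named_parameters()` instead.

```python
from types import SimpleNamespace


def default_is_muon(name, param):
    """Assumed default rule: Muon handles tensors with at least 2 dimensions."""
    return param.ndim >= 2


def split_params_for_muon(named_params, is_muon_func=default_is_muon):
    """Split (name, param) pairs into a Muon group and a fallback group."""
    muon_params, other_params = [], []
    for name, param in named_params:
        if not param.requires_grad:
            continue  # frozen parameters are not optimized at all
        if is_muon_func(name, param):
            muon_params.append(param)
        else:
            other_params.append(param)
    # "use_muon" is a hypothetical key, not necessarily what the package uses.
    return [
        {"params": muon_params, "use_muon": True},
        {"params": other_params, "use_muon": False},
    ]


# Stand-ins exposing only the attributes the sketch reads.
named_params = [
    ("linear.weight", SimpleNamespace(ndim=2, requires_grad=True)),
    ("linear.bias", SimpleNamespace(ndim=1, requires_grad=True)),
    ("norm.weight", SimpleNamespace(ndim=1, requires_grad=True)),
]
groups = split_params_for_muon(named_params)
print(len(groups[0]["params"]), len(groups[1]["params"]))
```

A custom `is_muon_func` could use the parameter name as well as its shape, for example to keep selected 2-D tensors out of the Muon group.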
## Documentation

- [Implementation Guide](./docs/implementation.md): Detailed walkthrough of the internal architecture, parallel pipeline, distributed utilities, and QK clipping. Recommended for code reviewers and new contributors.
- [PyTorch 2.10 TP Fix](./docs/pytorch-2.10-tp-fix.md): Root cause analysis and fixes for `_StridedShard` compatibility with PyTorch 2.10+.
## Test

- Check [test/README.md](./test/README.md) for how to run the tests.
## Pre-commit Hooks

This project uses [pre-commit](https://pre-commit.com/) to automatically check and format code before commits.

### Setup

1. Install pre-commit:

   ```bash
   pip install pre-commit
   ```

2. Install the git hooks:

   ```bash
   pre-commit install
   ```

Once installed, the configured hooks will run automatically on each commit.
### Included Hooks

The following tools are run via pre-commit:

- **[yapf](https://github.com/google/yapf)**: Python code formatter
- **[typos](https://github.com/crate-ci/typos)**: Spell checker for common typos
- **[isort](https://github.com/PyCQA/isort)**: Organizes and sorts Python imports
- **[clang-format](https://clang.llvm.org/docs/ClangFormat.html)**: Formats C++/CUDA code (`--style=file`)
- **[pymarkdown](https://github.com/jackdewinter/pymarkdown)**: Lints and auto-fixes Markdown files
- **[actionlint](https://github.com/rhysd/actionlint)**: Validates GitHub Actions workflows
### Usage

- Run all checks on the entire codebase:

  ```bash
  pre-commit run --all-files
  ```

- Run a specific hook (example: isort):

  ```bash
  pre-commit run isort --all-files
  ```
### Test

- There is a [simple unit test for Parallel Muon](./test/test_muon/README.md).