Instructions to use AGofficial/minai-flash-lite-1b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use AGofficial/minai-flash-lite-1b with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="AGofficial/minai-flash-lite-1b", filename="minai-flash-lite-1b.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AGofficial/minai-flash-lite-1b with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf AGofficial/minai-flash-lite-1b # Run inference directly in the terminal: llama-cli -hf AGofficial/minai-flash-lite-1b
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf AGofficial/minai-flash-lite-1b # Run inference directly in the terminal: llama-cli -hf AGofficial/minai-flash-lite-1b
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf AGofficial/minai-flash-lite-1b # Run inference directly in the terminal: ./llama-cli -hf AGofficial/minai-flash-lite-1b
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf AGofficial/minai-flash-lite-1b # Run inference directly in the terminal: ./build/bin/llama-cli -hf AGofficial/minai-flash-lite-1b
Use Docker
docker model run hf.co/AGofficial/minai-flash-lite-1b
- LM Studio
- Jan
- Ollama
How to use AGofficial/minai-flash-lite-1b with Ollama:
ollama run hf.co/AGofficial/minai-flash-lite-1b
- Unsloth Studio new
How to use AGofficial/minai-flash-lite-1b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for AGofficial/minai-flash-lite-1b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for AGofficial/minai-flash-lite-1b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for AGofficial/minai-flash-lite-1b to start chatting
- Docker Model Runner
How to use AGofficial/minai-flash-lite-1b with Docker Model Runner:
docker model run hf.co/AGofficial/minai-flash-lite-1b
- Lemonade
How to use AGofficial/minai-flash-lite-1b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull AGofficial/minai-flash-lite-1b
Run and chat with the model
lemonade run user.minai-flash-lite-1b-{{QUANT_TAG}}List all available models
lemonade list
Minai Flash Lite 1B
A lightweight, locally-runnable conversational AI packaged for instant use.
What is Minai Flash Lite 1B?
Minai Flash Lite 1B is a ready-to-run GGUF model that delivers fast, capable conversational AI on consumer hardware with just a single file and a Python script.
Designed to run entirely offline on your local machine β no cloud, no API keys, no data leaving your device.
Contents
| File | Description |
|---|---|
minai-flash-lite-1b.gguf |
The model in GGUF format (float16). ~2 GB. |
chat.py |
Interactive CLI chat script with streaming output. |
README.md |
This file. |
Requirements
- Python 3.10+
llama-cpp-pythonwith Metal support (for Apple Silicon GPU acceleration)
Install llama-cpp-python with Metal (Apple Silicon)
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python
Linux with NVIDIA GPU: Use
CMAKE_ARGS="-DGGML_CUDA=on"instead. CPU-only: Just runpip install llama-cpp-python.
Run the Chat
python3 chat.py
You'll see a styled terminal interface. Start chatting immediately.
Chat commands:
/resetβ Clear the conversation history/exitβ Quit the chat
Running on Other Platforms
The GGUF file is compatible with any llama.cpp-based runtime:
- Downloads last month
- 143
We're not able to determine the quantization variants.