Minai Flash Lite 1B

A lightweight, locally-runnable conversational AI packaged for instant use.


What is Minai Flash Lite 1B?

Minai Flash Lite 1B is a ready-to-run GGUF model that delivers fast, capable conversational AI on consumer hardware with just a single file and a Python script.

Designed to run entirely offline on your local machine β€” no cloud, no API keys, no data leaving your device.


Contents

File Description
minai-flash-lite-1b.gguf The model in GGUF format (float16). ~2 GB.
chat.py Interactive CLI chat script with streaming output.
README.md This file.

Requirements

  • Python 3.10+
  • llama-cpp-python with Metal support (for Apple Silicon GPU acceleration)

Install llama-cpp-python with Metal (Apple Silicon)

CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python

Linux with NVIDIA GPU: Use CMAKE_ARGS="-DGGML_CUDA=on" instead. CPU-only: Just run pip install llama-cpp-python.


Run the Chat

python3 chat.py

You'll see a styled terminal interface. Start chatting immediately.

Chat commands:

  • /reset β€” Clear the conversation history
  • /exit β€” Quit the chat

Running on Other Platforms

The GGUF file is compatible with any llama.cpp-based runtime:

  • LM Studio β€” Drop the .gguf file in and chat via GUI.
  • Ollama β€” Import the GGUF and run via CLI.
  • Jan β€” Desktop app for local LLMs.
  • llama.cpp β€” Low-level CLI inference. jeminai
Downloads last month
143
GGUF
Model size
1.0B params
Architecture
gemma3
Hardware compatibility
Log In to add your hardware

We're not able to determine the quantization variants.

Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support