Instructions to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16", filename="Emollm-InternLM2.5-7B-chat-GGUF-fp16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with llama.cpp:
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh # Start a local OpenAI-compatible server with a web UI: llama serve -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 # Run inference directly in the terminal: llama cli -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama serve -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 # Run inference directly in the terminal: llama cli -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 # Run inference directly in the terminal: ./llama-cli -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Use Docker
docker model run hf.co/Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
- LM Studio
- Jan
- Ollama
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with Ollama:
ollama run hf.co/Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
- Unsloth Studio
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 to start chatting
- Atomic Chat new
- Docker Model Runner
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with Docker Model Runner:
docker model run hf.co/Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
- Lemonade
How to use Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Run and chat with the model
lemonade run user.Emollm-InternLM2.5-7B-chat-GGUF-fp16-{{QUANT_TAG}}List all available models
lemonade list
Model Details
Model Description
- Developed by: AITA
- Model type: Full-Precision Text Generation LLM (FP16 GGUF format)
- Original Model: https://modelscope.cn/models/chg0901/EmoLLMV3.0/summary
- Precision: FP16 (non-quantized full-precision version)
Repository
- GGUF Converter: llama.cpp
- Huggingface Hub: https://huggingface.co/Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16/
Usage
Method 1: llama.cpp Backend Server + Chatbox
Step 1: Start .llama.cpp Server
./llama-server \
-m /path/to/model.gguf \
-c 2048 \ # Context length
--host 0.0.0.0 \ # Allow remote connections
--port 8080 \ # Server port
--n-gpu-layers 35 # GPU acceleration (if available)
Step 2: Connect via Chatbox
- Download Chatbox
- Configure API endpoint:
API URL: http://localhost:8080 Model: (leave empty) API Type: llama.cpp - Set generation parameters:
{ "temperature": 0.7, "max_tokens": 512, "top_p": 0.9 }
Method 2: LM Studio
- Download LM Studio
- Load GGUF file:
- Launch LM Studio
- Search Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
- Configure settings:
Context Length: 2048 GPU Offload: Recommended (enable if available) Batch Size: 512 - Start chatting through the built-in UI
Precision Details
| Filename | Precision | Size | Characteristics |
|---|---|---|---|
| emollmv3.gguf | FP16 | [15.5GB] | Full original model precision |
Hardware Requirements
Minimum:
- 24GB RAM (for 7B model)
- CPU with AVX/AVX2 instruction set support
Recommended:
- 32GB RAM
- CUDA-capable GPU (for acceleration)
- Fast SSD storage (due to large model size)
Key Notes
- Requires latest llama.cpp (v3+ recommended)
- Use
--n-gpu-layers 35for GPU acceleration (requires CUDA-enabled build) - Initial loading takes longer (2-5 minutes)
- Requires more memory/storage than quantized versions
- Use
--mlockto prevent swapping
Advantages
- Preserves original model precision
- Ideal for precision-sensitive applications
- No quantization loss
- Suitable for continued fine-tuning
Ethical Considerations
All open-source code and models in this repository are licensed under the MIT License. As the currently open-sourced EmoLLM model may have certain limitations, we hereby state the following:
EmoLLM is currently only capable of providing emotional support and related advisory services, and cannot yet offer professional psychological counseling or psychotherapy services. EmoLLM is not a substitute for qualified mental health professionals or psychotherapists, and may exhibit inherent limitations while potentially generating erroneous, harmful, offensive, or otherwise undesirable outputs. In critical or high-risk scenarios, users must exercise prudence and refrain from treating EmoLLM's outputs as definitive decision-making references, to avoid personal harm, property loss, or other significant damages.
Under no circumstances shall the authors, contributors, or copyright holders be liable for any claims, damages, or other liabilities (whether in contract, tort, or otherwise) arising from the use of or transactions related to the EmoLLM software.
By using EmoLLM, you agree to the above terms and conditions, acknowledge awareness of its potential risks, and further agree to indemnify and hold harmless the authors, contributors, and copyright holders from any claims, damages, or liabilities resulting from your use of EmoLLM.
Citation
@misc{2024EmoLLM,
title={EmoLLM: Reinventing Mental Health Support with Large Language Models},
author={EmoLLM Team},
howpublished={\url{https://github.com/SmartFlowAI/EmoLLM}},
year={2024}
}
- Downloads last month
- 22
We're not able to determine the quantization variants.
Model tree for Slipstream-Max/Emollm-InternLM2.5-7B-chat-GGUF-fp16
Base model
internlm/internlm2_5-7b-chat