---
license: apache-2.0
base_model: Replete-AI/Replete-Coder-Qwen2-1.5b
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
datasets:
- Replete-AI/code_bagel_hermes-2.5
- Replete-AI/code_bagel
- Replete-AI/OpenHermes-2.5-Uncensored
- teknium/OpenHermes-2.5
- layoric/tiny-codes-alpaca
- glaiveai/glaive-code-assistant-v3
- ajibawa-2023/Code-290k-ShareGPT
- TIGER-Lab/MathInstruct
- chargoddard/commitpack-ft-instruct-rated
- iamturun/code_instructions_120k_alpaca
- ise-uiuc/Magicoder-Evol-Instruct-110K
- cognitivecomputations/dolphin-coder
- nickrosh/Evol-Instruct-Code-80k-v1
- coseal/CodeUltraFeedback_binarized
- glaiveai/glaive-function-calling-v2
- CyberNative/Code_Vulnerability_Security_DPO
- jondurbin/airoboros-2.2
- camel-ai
- lmsys/lmsys-chat-1m
- CollectiveCognition/chats-data-2023-09-22
- CoT-Alpaca-GPT4
- WizardLM/WizardLM_evol_instruct_70k
- WizardLM/WizardLM_evol_instruct_V2_196k
- teknium/GPT4-LLM-Cleaned
- GPTeacher
- OpenGPT
- meta-math/MetaMathQA
- Open-Orca/SlimOrca
- garage-bAInd/Open-Platypus
- anon8231489123/ShareGPT_Vicuna_unfiltered
- Unnatural-Instructions-GPT4
---
# Quant Infos
- Quants were made with an importance matrix (imatrix) to reduce quantization loss
- GGUFs and the imatrix were generated from bf16 for "optimal" accuracy
- Wide coverage of different GGUF quant types, from Q8\_0 down to IQ1\_S
- Quantized with [llama.cpp](https://github.com/ggerganov/llama.cpp) commit [4bfe50f741479c1df1c377260c3ff5702586719e](https://github.com/ggerganov/llama.cpp/commit/4bfe50f741479c1df1c377260c3ff5702586719e) (master as of 2024-06-11)
- Imatrix generated with [this](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) multi-purpose dataset by [bartowski](https://huggingface.co/bartowski)
```
./imatrix -c 512 -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix
```
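For anyone scripting the same step over several models, the command above could be assembled programmatically. The following is only a sketch: it reuses exactly the flags shown in the command, while the helper name `imatrix_command` and the example model name are illustrative assumptions, not part of the quantization scripts actually used.

```python
import shlex


def imatrix_command(model_name, calibration_file="calibration_datav3.txt", ctx=512):
    """Build the argument list for the imatrix invocation shown above.

    Only the flags from the original command are used (-c, -m, -f, -o).
    """
    return [
        "./imatrix",
        "-c", str(ctx),
        "-m", f"{model_name}-bf16.gguf",
        "-f", calibration_file,
        "-o", f"{model_name}.imatrix",
    ]


# Hypothetical model name, chosen only for illustration.
cmd = imatrix_command("replete-coder-qwen2-1.5b")
print(shlex.join(cmd))
```

The list form can be passed to `subprocess.run(cmd, check=True)` from a directory that contains the `imatrix` binary.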
# Original Model Card:
# Replete-Coder-Qwen2-1.5b
Finetuned by: Rombodawg
### More than just a coding model!
Although Replete-Coder has amazing coding capabilities, it's trained on a vast amount of non-coding data, fully cleaned and uncensored. Don't just use it for coding, use it for all your needs! We are truly trying to make the GPT killer!

![image/png](https://cdn-uploads.huggingface.co/production/uploads/642cc1c253e76b4c2286c58e/-0dERC793D9XeFsJ9uHbx.png)

Thank you to TensorDock for sponsoring Replete-Coder-llama3-8b and Replete-Coder-Qwen2-1.5b.
You can check out their website for cloud compute rental below.
- https://tensordock.com
__________________________________________________________________________________________________
Replete-Coder-Qwen2-1.5b is a general-purpose model that is specially trained for coding in over 100 coding languages. The training data consists of 25% non-code instruction data and 75% coding instruction data, totaling 3.9 million lines, roughly 1 billion tokens, or 7.27 GB of instruct data. The data was 100% uncensored and was fully deduplicated before training.
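
As a rough sanity check on these figures, the arithmetic can be worked through directly. This is only a back-of-the-envelope sketch under two assumptions that the card does not state: that the 25%/75% split is measured in lines, and that "gb" means decimal gigabytes (10^9 bytes).

```python
# Figures quoted in the paragraph above.
TOTAL_LINES = 3_900_000
TOTAL_TOKENS = 1_000_000_000  # "roughly 1 billion tokens"
TOTAL_BYTES = 7.27e9          # assumption: decimal gigabytes

# Assumption: the 25% / 75% split is counted in lines.
non_code_lines = TOTAL_LINES * 25 // 100
code_lines = TOTAL_LINES - non_code_lines

# Derived averages; approximate, since the token count itself is rounded.
tokens_per_line = TOTAL_TOKENS / TOTAL_LINES
bytes_per_token = TOTAL_BYTES / TOTAL_TOKENS

print(f"non-code lines: {non_code_lines:,}")
print(f"code lines: {code_lines:,}")
print(f"average tokens per line: {tokens_per_line:.0f}")
print(f"average bytes per token: {bytes_per_token:.2f}")
```

Under those assumptions, an average line is a few hundred tokens long, comfortably inside the 8192-token fine-tuning window.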
The Replete-Coder models (including Replete-Coder-llama3-8b and Replete-Coder-Qwen2-1.5b) feature the following:
- Advanced coding capabilities in over 100 coding languages
- Advanced code translation (between languages)
- Coding capabilities related to security and vulnerability prevention
- General purpose use
- Uncensored use
- Function calling
- Advanced math use
- Use on low-end (8b) and mobile (1.5b) platforms

Notice: The Replete-Coder series of models is fine-tuned on a context window of 8192 tokens. Performance past this context window is not guaranteed.
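
One way to stay inside that window is to budget the prompt and the generated output together. The helpers below are a minimal sketch of that idea. They operate on plain lists of token IDs, so they do not depend on any particular tokenizer, and the function names are illustrative assumptions rather than part of any library.

```python
CONTEXT_WINDOW = 8192  # fine-tuning context length stated in the notice above


def max_new_tokens(prompt_tokens, context_window=CONTEXT_WINDOW):
    """Return how many tokens can still be generated without exceeding the window."""
    return max(0, context_window - prompt_tokens)


def truncate_prompt(token_ids, reserve_for_output, context_window=CONTEXT_WINDOW):
    """Keep only the most recent tokens so that prompt + output fits in the window."""
    budget = context_window - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output must be smaller than the context window")
    return token_ids[-budget:]


# Example: a 10,000-token prompt, reserving 512 tokens for the reply.
prompt = list(range(10_000))
kept = truncate_prompt(prompt, reserve_for_output=512)
print(len(kept), max_new_tokens(len(kept)))
```

Dropping the oldest tokens is only one possible policy. For chat use, removing whole earlier turns while keeping the system prompt may work better.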
|  | |
| __________________________________________________________________________________________________ | |
| You can find the 25% non-coding instruction below: | |
| - https://huggingface.co/datasets/Replete-AI/OpenHermes-2.5-Uncensored | |
| And the 75% coding specific instruction data below: | |
| - https://huggingface.co/datasets/Replete-AI/code_bagel | |
| These two datasets were combined to create the final dataset for training, which is linked below: | |
| - https://huggingface.co/datasets/Replete-AI/code_bagel_hermes-2.5 | |
| __________________________________________________________________________________________________ | |
## Prompt Template: ChatML
```
<|im_start|>system
{}<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
{}
```
Note: The system prompt varies in training data, but the most commonly used one is:
```
Below is an instruction that describes a task, Write a response that appropriately completes the request.
```
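
The template and default system prompt above can be filled in with a few lines of Python. This is a minimal sketch for a single-turn prompt. The function name `build_chatml_prompt` is an illustrative assumption, and the final `{}` slot of the template is left empty so the model writes the assistant reply.

```python
# Most commonly used system prompt, copied verbatim from the note above.
DEFAULT_SYSTEM_PROMPT = (
    "Below is an instruction that describes a task, "
    "Write a response that appropriately completes the request."
)


def build_chatml_prompt(user_message, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Fill the ChatML template for one user turn, ending at the assistant slot."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(build_chatml_prompt("Write a function that reverses a string in Rust."))
```

The resulting string can be passed as a raw prompt to whichever runtime is in use. Runtimes that apply the chat template embedded in the GGUF file handle this formatting themselves, in which case manual formatting should be skipped.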
End token:
```
<|endoftext|>
```
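
If a runtime returns raw text without stopping at the end token, the completion can be cut there by hand. The snippet below is a small sketch of that post-processing step. The helper name `trim_at_end_token` is an illustrative assumption, and many runtimes already stop at this token on their own.

```python
END_TOKEN = "<|endoftext|>"  # end token given above


def trim_at_end_token(text, end_token=END_TOKEN):
    """Return the text before the first end token, without trailing whitespace."""
    head, _, _ = text.partition(end_token)
    return head.rstrip()


raw = "fn main() {}\n<|endoftext|>unrelated continuation"
print(trim_at_end_token(raw))
```

Text that contains no end token is returned unchanged apart from the trailing-whitespace strip.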
__________________________________________________________________________________________________
Thank you to the community for your contributions to the Replete-AI/code_bagel_hermes-2.5 dataset. Without the participation of so many members making their datasets free and open source for anyone to use, this amazing AI model wouldn't be possible.

Extra special thanks to Teknium for the Open-Hermes-2.5 dataset and jondurbin for the bagel dataset and the naming idea for the code_bagel series of datasets. You can find both of their Hugging Face accounts linked below:
- https://huggingface.co/teknium
- https://huggingface.co/jondurbin

Another special thanks to unsloth for being the main method of training for Replete-Coder. Below you can find their GitHub, as well as the special Replete-AI secret sauce (Unsloth + QLoRA + GaLore) Colab document that was used to train this model.
- https://github.com/unslothai/unsloth
- https://colab.research.google.com/drive/1eXGqy5M--0yW4u0uRnmNgBka-tDk2Li0?usp=sharing
__________________________________________________________________________________________________
## Join the Replete-AI Discord! We are a great and loving community!
- https://discord.gg/ZZbnsmVnjD