Transformers
GGUF
English
llama
text-generation-inference
torch
trl
unsloth
How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
# Run inference directly in the terminal:
llama-cli -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
# Run inference directly in the terminal:
llama-cli -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
Use Docker
docker model run hf.co/student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf:F16
Quick Links

Uploaded model

  • Developed by: student-abdullah
  • License: apache-2.0
  • Finetuned from model: meta-llama/Meta-Llama-3.1-8B
  • Created on: 25th September, 2024

Acknowledgement


Model Description

This model is fine-tuned from the meta-llama/Meta-Llama-3.1-8B base model to enhance its capabilities in generating relevant and accurate responses related to generic medications under the PMBJP scheme. The fine-tuning process included the following hyperparameters:

  • Fine Tuning Template: Llama 3.1 Q&A
  • Max Tokens: 512
  • LoRA Alpha: 10
  • LoRA Rank (r): 128
  • Learning rate: 2e-4
  • Gradient Accumulation Steps: 32
  • Batch Size: 4
  • Qunatization: 16 bits

Model Quantitative Performace

  • Training Quantitative Loss: 0.1676 (at final 160th epoch)

Limitations

  • Token Limitations: With a max token limit of 512, the model might not handle very long queries or contexts effectively.
  • Training Data Limitations: The model’s performance is contingent on the quality and coverage of the fine-tuning dataset, which may affect its generalizability to different contexts or medications not covered in the dataset.
  • Potential Biases: As with any model fine-tuned on specific data, there may be biases based on the dataset used for training.
Downloads last month
13
GGUF
Model size
8B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf

Quantized
(325)
this model

Dataset used to train student-abdullah/Llama3.1_medicine_fine-tuned_24-09_16bit_gguf