Mistral-7B-WikiFineTuned
This project fine-tunes the Mistral-7B-Instruct model on the Wikitext-103-raw-v1 dataset, a corpus drawn from Wikipedia articles. The goal is a model that generates accurate, informative text in coherent, well-structured language.
Model Description
- Base Model: Mistral-7B
- Fine-Tuned on: Wikitext-103-raw-v1
- Purpose: Learn as much as possible from a short training run while keeping the generated text accurate, informative, coherent, and well structured.
- License: MIT
How to Use
To use this model, you can load it with the Hugging Face transformers library. Below is a basic example of how to use the model for text generation:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_id = "Mesutby/mistral-7B-wikitext-finetuned"

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the model in 4-bit precision (requires the bitsandbytes package)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Create the pipeline
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Generate text
prompt = "The future of AI is"
output = generator(prompt, max_new_tokens=50)
print(output[0]["generated_text"])
Inference API
You can also use the model directly via the Hugging Face Inference API:
import requests

API_URL = "https://api-inference.huggingface.co/models/Mesutby/mistral-7B-wikitext-finetuned"
# Replace YOUR_HF_TOKEN with a Hugging Face access token
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}


def query(payload):
    """POST the payload to the Inference API and return the parsed JSON."""
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


output = query({"inputs": "The future of AI is"})
print(output)
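The request above sends only the prompt. As a minimal sketch, assuming the endpoint follows the Inference API's text-generation convention (an optional `parameters` object in the request, and a list of `{"generated_text": ...}` items or an `{"error": ...}` object in the response), the helpers below build a payload and unpack a reply. `build_payload` and `extract_text` are hypothetical names introduced here, not part of any library.

```python
def build_payload(prompt, max_new_tokens=50, temperature=None):
    """Build a text-generation request body with optional sampling settings."""
    parameters = {"max_new_tokens": max_new_tokens}
    if temperature is not None:
        parameters["temperature"] = temperature
    return {"inputs": prompt, "parameters": parameters}


def extract_text(response_json):
    """Return the generated text, or raise if the API reported an error."""
    if isinstance(response_json, dict) and "error" in response_json:
        raise RuntimeError(f"Inference API error: {response_json['error']}")
    return response_json[0]["generated_text"]


payload = build_payload("The future of AI is", max_new_tokens=50)
print(payload)
```

The payload can be passed to the `query` function shown earlier, and its return value to `extract_text`.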
Training Details
- Framework Used: PyTorch
- Optimization Techniques:
  - 4-bit quantization using bitsandbytes to reduce memory usage.
  - Training accelerated using peft and accelerate.
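To give a rough sense of what 4-bit quantization saves, the sketch below estimates the memory needed for the model weights alone at 16 and 4 bits per parameter. The parameter count of about 7.24 billion is an assumption about Mistral-7B, not a figure from this card, and the estimate ignores quantization metadata, activations, LoRA adapters, and optimizer state.

```python
# Approximate parameter count for Mistral-7B (assumption; not stated in this card).
NUM_PARAMS = 7.24e9


def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
```

Under these assumptions the weights shrink by a factor of four, from roughly 13.5 GiB to roughly 3.4 GiB.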
Dataset
The model was fine-tuned on the Wikitext-103-raw-v1 dataset, split into training and evaluation subsets.
Training Configuration
- Learning Rate: 2e-4
- Batch Size: 4 (with gradient accumulation)
- Max Steps: 125 (for demonstration; should ideally be higher, e.g., 1000)
- Optimizer: Paged AdamW (32-bit)
- Evaluation Strategy: Evaluation every 25 steps
- PEFT Configuration: LoRA with rank 8 and dropout of 0.1
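A rank-8 LoRA adapter trains only a small number of parameters. For each adapted weight matrix of shape d_out x d_in, LoRA adds two low-rank factors with r * (d_in + d_out) parameters in total. The sketch below works through that arithmetic. The model dimensions (hidden size 4096, key/value projection width 1024, 32 layers) and the choice of adapting only `q_proj` and `v_proj` are assumptions; the card does not state which modules were targeted.

```python
def lora_params(d_in, d_out, rank):
    """LoRA adds A (rank x d_in) and B (d_out x rank) per adapted weight matrix."""
    return rank * (d_in + d_out)


RANK = 8  # from the Training Configuration list

# Assumptions (not from the card): Mistral-7B dimensions and adapted modules.
HIDDEN = 4096      # model hidden size
KV_DIM = 1024      # 8 key/value heads x 128 dimensions each
NUM_LAYERS = 32

q_proj = lora_params(HIDDEN, HIDDEN, RANK)   # query projection
v_proj = lora_params(HIDDEN, KV_DIM, RANK)   # value projection
total = (q_proj + v_proj) * NUM_LAYERS

print(f"{total:,} trainable LoRA parameters")
```

With these assumed targets the adapter holds about 3.4 million trainable parameters, a small fraction of the base model's size.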
Hyperparameters
- Learning Rate: 2e-4
- Batch Size: 4
- Max Steps: 125 (demo)
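The values listed in this card could be expressed as keyword arguments for `transformers.TrainingArguments` and `peft.LoraConfig`. The following is a sketch under stated assumptions, not the author's training script: the output path is hypothetical, the batch size of 4 is assumed to be per device, and the gradient accumulation steps and LoRA alpha are left at library defaults because the card does not give them. The libraries are imported lazily inside `build_configs` (a hypothetical helper) so the values can be inspected without installing them.

```python
# Values taken from this card's Training Configuration section.
training_kwargs = {
    "output_dir": "./mistral-7b-wikitext-lora",  # hypothetical path
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,  # assumption: 4 is the per-device size
    "max_steps": 125,
    "optim": "paged_adamw_32bit",
    "eval_steps": 25,
}

lora_kwargs = {"r": 8, "lora_dropout": 0.1, "task_type": "CAUSAL_LM"}


def build_configs():
    """Instantiate the config objects; requires transformers and peft."""
    from peft import LoraConfig
    from transformers import TrainingArguments

    # Older transformers releases name this argument `evaluation_strategy`.
    args = TrainingArguments(eval_strategy="steps", **training_kwargs)
    return args, LoraConfig(**lora_kwargs)


num_evals = training_kwargs["max_steps"] // training_kwargs["eval_steps"]
print(f"Evaluations during training: {num_evals}")
```

With 125 steps and evaluation every 25 steps, the run is evaluated five times.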
Evaluation
The model was evaluated on a subset of the Wikitext-103-raw-v1 dataset. Evaluation metrics are logged during training, every 25 steps under the configuration above.
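The card does not say which metrics are logged. If the logged value is the mean token-level cross-entropy loss, which is an assumption here, it can be converted to perplexity, a common language-modeling measure, by exponentiation. The loss value below is hypothetical and chosen only for illustration; it is not a reported result.

```python
import math


def perplexity(mean_cross_entropy):
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


example_eval_loss = 2.0  # hypothetical value for illustration, not a reported result
print(f"perplexity = {perplexity(example_eval_loss):.2f}")
```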
Limitations and Biases
While the model performs well on a variety of text generation tasks, it may still exhibit biases present in the training data. Users should be cautious when deploying this model in sensitive or high-stakes applications.
License
This model is licensed under the MIT License. See the LICENSE file for more details.
Contact
For any questions or issues, please contact bymuhammedmesut@gmail.com.