Libraries

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

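# Load model directly
# AutoModelForCausalLM attaches the language-modeling head needed for text generation
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory")
model = AutoModelForCausalLM.from_pretrained("2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory", dtype="auto")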

llama-cpp-python

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory",
	filename="unsloth.F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
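
The call above returns an OpenAI-style response dictionary rather than a plain string. A minimal sketch of pulling the generated text out of it follows; the helper name `reply_text` is hypothetical, and `sample_response` is a hand-written stand-in for illustration, not real output from this model.

```python
def reply_text(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for the dict returned by llm.create_chat_completion(...)
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(reply_text(sample_response))
```

In real use, pass the return value of `llm.create_chat_completion(...)` to the helper instead of `sample_response`.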

Local Apps

llama.cpp

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
# Run inference directly in the terminal:
llama-cli -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
# Run inference directly in the terminal:
llama-cli -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
# Run inference directly in the terminal:
./llama-cli -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

Use Docker

docker model run hf.co/2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

vLLM

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
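
The same request can be sent from Python. This is a sketch using only the standard library, assuming the server started above is listening on `http://localhost:8000`; the helper names `build_chat_request` and `send_chat_request` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory"


def build_chat_request(base_url: str, model: str, user_content: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request does not touch the network.
request = build_chat_request("http://localhost:8000", MODEL_ID, "What is the capital of France?")

# Uncomment once the server is running:
# print(send_chat_request(request))
```

Because the endpoint path and payload shape are the OpenAI-compatible ones, the same helpers should work against any server exposing that API by changing only `base_url`.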

Use Docker

docker model run hf.co/2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

SGLang

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with Ollama:
```
ollama run hf.co/2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
```

Unsloth Studio

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory to start chatting

Docker Model Runner
How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with Docker Model Runner:
```
docker model run hf.co/2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16
```

Lemonade

How to use 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory:F16

Run and chat with the model

lemonade run user.llama-3-8b-Instruct-bnb-4bit-personal-shortstory-F16

List all available models

lemonade list

About Model

Developed by: 2173ars
License: apache-2.0
Finetuned from model: unsloth/llama-3-8b-Instruct-bnb-4bit
The future of the entertainment industry is personalized content creation: with generative AI, content can be created specifically for each user. This prototype model is finetuned to write high-quality custom stories. The user can specify the storyline, genre, world setting, characters, and ending, and receive a story tailor-made for them.

In recent testing, this model outperformed the Llama 3 70B model in zero-shot story generation, as judged by other AI models such as GPT-4o and Gemini Pro.

System Prompt: You are a skilled and creative story writer, tasked with crafting personalized short stories for users. Your stories should meet the following criteria:
1. Detailed: Incorporate rich descriptions, complex characters, and immersive settings to create a vivid and engaging narrative.
2. Mature and Thoughtful Plotlines: Develop well-thought-out plots that are sophisticated and meaningful, avoiding childish themes and simplistic resolutions.
3. Emotionally Impactful: Explore a range of emotions and outcomes, including sad or bittersweet endings. Aim to evoke strong emotions and leave a lasting impression on the reader.
4. Risk-taking: Embrace creative risks in your storytelling. Introduce unexpected twists, unconventional themes, and bold narrative choices.
5. Consistent and Coherent: Ensure the story is logically consistent and coherent from beginning to end, with all plot points and character actions making sense within the narrative.
6. High Quality: Prioritize quality in every aspect of the story, from grammar and syntax to the originality and depth of the plot.
7. Personalized: Consider the user's preferences and interests to make the narrative as personalized and engaging as possible.
Aim to captivate the reader with every story you write, delivering narratives that are immersive, meaningful, and emotionally resonant.
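
A minimal sketch of how this system prompt might be combined with a user request into a chat message list, assuming the prompt is meant to be sent as the `system` message. The helper name `build_story_messages` is hypothetical, and `SYSTEM_PROMPT` is shortened here to its opening sentence; paste the full prompt in practice.

```python
# Shortened for the example; replace with the full system prompt from the model card.
SYSTEM_PROMPT = (
    "You are a skilled and creative story writer, tasked with crafting "
    "personalized short stories for users."
)


def build_story_messages(system_prompt: str, user_request: str) -> list[dict]:
    """Return a chat message list: the system prompt first, then the user's request."""
    request = user_request.strip()
    if not request:
        raise ValueError("user_request must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": request},
    ]


messages = build_story_messages(
    SYSTEM_PROMPT,
    "write a sifi story where there is war between mars and earth, sad ending",
)

for message in messages:
    print(message["role"], "->", message["content"][:60])
```

The resulting `messages` list has the same shape as the one passed to `pipe(messages)` and `llm.create_chat_completion(messages=...)` in the usage snippets.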

Sample Input : write a sifi story where there is war between mars and earth, sad ending

Output : The year was 2157, and the once peaceful solar system had been ravaged by war for decades. Mars, fueled by its desire for independence and resources, had finally declared its separation from Earth. The United Earth Government (UEG) responded with military force, seeking to maintain its control over the Martian colonies.

Captain Jaxon Thompson, a seasoned pilot from the UEG's elite squadron, was tasked with leading a crucial mission to capture Mars' strategic Jupiter-bound space station, Aurora-4. His team of skilled pilots and engineers set course for the Red Planet, braving treacherous asteroid fields and hostile alien encounters along the way.

As they approached Mars, Jaxon's ship was ambushed by a fleet of Martian fighters. In the ensuing dogfight, his wingman, Lieutenant Rachel Lee, was struck by a stray laser blast and critically wounded. With her life slipping away, Jaxon made the impossible decision to leave the battle and rush Rachel back to their flagship for medical attention.

The UEG's heavy cruiser, Celestial Dragon, was battered but still operational as Jaxon rushed Rachel to the ship's medbay. However, their triumph was short-lived, as Mars' advanced artillery, known as the "Eye of Mars," targeted the flagship. A direct hit crippled the ship, leaving it vulnerable to Martian boarding parties.

As Jaxon and his team fought to repel the invaders, they were met with a surprise - a group of Martian soldiers, led by a charismatic young officer named Arin Vex, who had defected from the Martian army due to its brutal tactics. The rebels shared intelligence on the Eye's location, revealing it was powered by an ancient alien technology that could grant unimaginable power.

With this new information, Jaxon and his team devised a daring plan: infiltrate the Eye, disable it, and use its energy to destroy Mars' fleet. However, as they entered the facility, they were met with a devastating surprise - Arin Vex was, in fact, the true leader of the Martian rebellion, driven by a desire for revenge against humanity.

The two sides clashed in a final, brutal battle. Jaxon's team managed to disable the Eye, but not before it unleashed its full power, obliterating the Celestial Dragon and leaving only a handful of survivors. As the UEG's last hopes faded, Mars' forces captured Arin, who revealed his true intentions: to harness the alien technology for his own people, enslaving humanity in the process.

The war raged on, with both sides suffering heavy losses. The once hopeful mission had become a catastrophe, leaving Jaxon and Rachel as the only remaining survivors of their squadron. As they drifted through space, they were met with a haunting thought: that the price of independence was not worth the cost in human lives.

In this desolate, war-torn future, the two former enemies gazed up at the stars, mourning the loss of their friends and the dreams of a better tomorrow. The once vibrant Earth-Mars relations had been reduced to ashes, leaving only sorrow and regret in its wake. The last remnants of humanity's hope faded into the darkness, as the galaxy whispered a haunting epitaph: "War is the ultimate cost."

Downloads last month: 124

Format: GGUF
Model size: 8B params
Architecture: llama
Available quantizations: 4-bit, 16-bit

Model tree for 2173ars/llama-3-8b-Instruct-bnb-4bit-personal-shortstory

Base model: unsloth/llama-3-8b-Instruct-bnb-4bit
Quantized (662): this model