1xrtx3090m24-fineweb-edu / modeling_gpjtgpt2.py

gpjt

5.0.0 compatibility

841ab53 verified 5 months ago

1.75 kB

	import torch

	from transformers import PreTrainedModel
	from transformers.generation import GenerationMixin
	from transformers.modeling_outputs import CausalLMOutput

	from .configuration_gpjtgpt2 import GPJTGPT2Config
	from .gpt import GPTModel


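	class GPJTGPT2Model(PreTrainedModel):
	    """Bare model wrapper: returns whatever GPTModel produces for input_ids."""

	    config_class = GPJTGPT2Config

	    def __init__(self, config):
	        super().__init__(config)
	        # GPTModel is built from the settings stored in config.cfg.
	        self.model = GPTModel(config.cfg)
	        self.post_init()

	    def forward(self, input_ids, **kwargs):
	        # Extra keyword arguments (e.g. attention_mask) are accepted but unused.
	        return self.model.forward(input_ids)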



	class GPJTGPT2ModelForCausalLM(PreTrainedModel, GenerationMixin):
	    """Causal-LM wrapper: adds a next-token loss and the embedding accessors."""

	    config_class = GPJTGPT2Config

	    def __init__(self, config):
	        super().__init__(config)
	        self.model = GPTModel(config.cfg)
	        self.post_init()

	    def forward(self, input_ids, attention_mask=None, labels=None, **kwargs):
	        # attention_mask only masks the loss below; it is not passed to GPTModel.
	        logits = self.model.forward(input_ids)

	        loss = None
	        if labels is not None:
	            # Position t predicts token t + 1: drop the last logit and the first label.
	            shifted_logits = logits[:, :-1, :]
	            shifted_labels = labels[:, 1:]

	            if attention_mask is not None:
	                # Set targets at padded positions to -100 so cross_entropy skips them.
	                shifted_mask = attention_mask[:, 1:]
	                shifted_labels = shifted_labels.masked_fill(
	                    shifted_mask == 0, -100
	                )

	            loss = torch.nn.functional.cross_entropy(
	                shifted_logits.flatten(0, 1), shifted_labels.flatten(),
	                ignore_index=-100
	            )

	        return CausalLMOutput(logits=logits, loss=loss)

	    def get_input_embeddings(self):
	        return self.model.tok_emb

	    def get_output_embeddings(self):
	        return self.model.out_head

	    def set_output_embeddings(self, new_embeddings):
	        self.model.out_head = new_embeddings
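
The loss in `GPJTGPT2ModelForCausalLM.forward` shifts logits and labels by one position and replaces targets at padded positions with -100. The sketch below replays that bookkeeping for a single toy sequence in plain Python, without torch, purely as an illustration. The helper names `shift_and_mask` and `mean_cross_entropy`, the toy labels, and the uniform logits are made up for this example and are not part of the repository.

```python
import math

IGNORE_INDEX = -100  # the same sentinel forward() passes as ignore_index


def shift_and_mask(labels, attention_mask=None):
    """Next-token targets for one sequence; padded positions become IGNORE_INDEX."""
    targets = list(labels[1:])  # drop the first label, as labels[:, 1:] does
    if attention_mask is not None:
        targets = [
            target if mask != 0 else IGNORE_INDEX
            for target, mask in zip(targets, attention_mask[1:])
        ]
    return targets


def mean_cross_entropy(logits, targets):
    """Mean negative log-likelihood over the targets that are not ignored."""
    losses = []
    for row, target in zip(logits, targets):
        if target == IGNORE_INDEX:
            continue
        log_normaliser = math.log(sum(math.exp(score) for score in row))
        losses.append(log_normaliser - row[target])
    return sum(losses) / len(losses)


# Hypothetical four-token sequence whose last token is padding.
labels = [2, 1, 0, 1]
attention_mask = [1, 1, 1, 0]

# Hypothetical uniform logits over a vocabulary of three tokens, one row per position.
logits = [[0.0, 0.0, 0.0] for _ in labels]

targets = shift_and_mask(labels, attention_mask)
shifted_logits = logits[:-1]  # drop the last position, as logits[:, :-1, :] does
loss = mean_cross_entropy(shifted_logits, targets)

print(targets)  # [1, 0, -100]
print(loss)
```

With uniform logits every kept position contributes log(3), so the mean equals log(3) whether or not the padded position is excluded. The masking only changes the result once the logits differ across positions.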