Instructions for using h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 with libraries and local apps.

Libraries

How to use h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 with Transformers:

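```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3", trust_remote_code=True)

# Load model directly (the tokenizer is needed to encode prompts and decode outputs)
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3")
# dtype="auto" requires a recent transformers release; older releases use torch_dtype="auto"
model = AutoModelForCausalLM.from_pretrained("h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3", trust_remote_code=True, dtype="auto")
```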
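When you load the model directly instead of through the pipeline, you have to apply the prompt template yourself. The sketch below reuses the `<|prompt|>{instruction}<|endoftext|><|answer|>` template from the repository's custom pipeline code further down this page. It assumes a `model` loaded as above and its matching tokenizer (for example from `AutoTokenizer.from_pretrained`); `format_prompt` and `generate_answer` are hypothetical helper names, not part of the repository.

```python
# Prompt template copied from the STYLE constant in the repository's custom pipeline.
PROMPT_TEMPLATE = "<|prompt|>{instruction}<|endoftext|><|answer|>"


def format_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the h2oGPT prompt template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


def generate_answer(model, tokenizer, instruction: str, max_new_tokens: int = 256) -> str:
    """Hypothetical helper: generate a reply with a directly loaded model.

    Assumes `model` is a transformers causal LM and `tokenizer` is its tokenizer.
    """
    inputs = tokenizer(format_prompt(instruction), return_tensors="pt").to(model.device)
    # Pass only input_ids and attention_mask: some tokenizers also return
    # token_type_ids, which Falcon models may reject during generation.
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True).strip()
```

For example, `generate_answer(model, tokenizer, "Why is drinking water so healthy?")` returns the decoded reply as a plain string.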

Local Apps

vLLM

How to use h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
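The `curl` call above sends a bare `"Once upon a time,"` prompt. Because this is an instruction-tuned model, answers are likely to be better when the request uses the same template as the repository's custom pipeline. Below is a minimal standard-library sketch, assuming the vLLM server is listening on the default `http://localhost:8000`. `build_completion_request` and `send_completion` are hypothetical names, and the `stop` entry is an assumption added for illustration, not part of the original snippet.

```python
import json
import urllib.request

# Template from the repository's custom pipeline (STYLE constant).
PROMPT_TEMPLATE = "<|prompt|>{instruction}<|endoftext|><|answer|>"


def build_completion_request(instruction: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Build an OpenAI-style /v1/completions body for this model (hypothetical helper)."""
    return {
        "model": "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
        "prompt": PROMPT_TEMPLATE.format(instruction=instruction),
        "max_tokens": max_tokens,
        "temperature": temperature,
        # Assumption: stop if the model starts a new user turn, mirroring the
        # custom pipeline's postprocess step, which cuts the text at "<|prompt|>".
        "stop": ["<|prompt|>"],
    }


def send_completion(body: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the body to a running server and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

`send_completion(build_completion_request("Why is drinking water so healthy?"))` returns the parsed JSON response. For the SGLang server below, pass `base_url="http://localhost:30000"`.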

Use Docker

```shell
docker model run hf.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
```

SGLang

How to use h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
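Both servers answer with an OpenAI-style completion object. Here is a short sketch for pulling out the generated text, assuming the usual `choices[0].text` layout. `extract_completion_text` is a hypothetical helper, and the JSON body in the example is hand-written for illustration, not captured from SGLang.

```python
import json


def extract_completion_text(response: dict) -> str:
    """Return the first choice's text from an OpenAI-style completion response.

    Hypothetical helper; assumes the standard {"choices": [{"text": ...}]} layout.
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    text = choices[0].get("text", "")
    # Mirror the custom pipeline's postprocess: drop anything after a new "<|prompt|>" turn.
    return text.split("<|prompt|>")[0].strip()


# Hand-written example body, trimmed to the fields used here (not real server output).
raw_body = '{"object": "text_completion", "choices": [{"index": 0, "text": " Water keeps you hydrated.<|prompt|>Next?", "finish_reason": "length"}]}'
print(extract_completion_text(json.loads(raw_body)))  # → Water keeps you hydrated.
```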

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
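Loading the model inside the container can take a while, so a `curl` issued right after `docker run` may be refused. The following hedged standard-library sketch polls the server until it responds. `wait_for_server` is a hypothetical helper, and polling `/v1/models` assumes SGLang's OpenAI-compatible model-listing route is available.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:30000/v1/models", timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Hypothetical helper: returns True once the server responds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError (e.g. connection refused) and HTTPError (e.g. 503 while
            # the model loads) are both OSError subclasses: "not ready yet".
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` before the first request gives the container up to ten minutes to finish loading; shorten `timeout` for smaller machines or quicker feedback.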

Docker Model Runner
How to use h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 with Docker Model Runner:
```
docker model run hf.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
```

h2ogpt-gm-oasst1-en-2048-falcon-7b-v3


```python
from transformers import TextGenerationPipeline
from transformers.pipelines.text_generation import ReturnType

# Prompt template applied to every input: the user instruction is wrapped in
# <|prompt|> ... <|endoftext|>, and the model continues after <|answer|>.
STYLE = "<|prompt|>{instruction}<|endoftext|><|answer|>"


class H2OTextGenerationPipeline(TextGenerationPipeline):
    """Text-generation pipeline that applies the h2oGPT prompt template."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.prompt = STYLE

    def preprocess(
        self, prompt_text, prefix="", handle_long_generation=None, **generate_kwargs
    ):
        # Wrap the raw instruction in the template before tokenization.
        prompt_text = self.prompt.format(instruction=prompt_text)
        return super().preprocess(
            prompt_text,
            prefix=prefix,
            handle_long_generation=handle_long_generation,
            **generate_kwargs,
        )

    def postprocess(
        self,
        model_outputs,
        return_type=ReturnType.FULL_TEXT,
        clean_up_tokenization_spaces=True,
    ):
        records = super().postprocess(
            model_outputs,
            return_type=return_type,
            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
        )
        # Keep only the answer: drop everything up to <|answer|> (this relies on
        # the full text, prompt included, being returned) and cut off any new
        # <|prompt|> turn the model may have started.
        for rec in records:
            rec["generated_text"] = (
                rec["generated_text"]
                .split("<|answer|>")[1]
                .strip()
                .split("<|prompt|>")[0]
                .strip()
            )
        return records
```
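The `postprocess` override indexes `split("<|answer|>")[1]`, which raises `IndexError` whenever the marker is missing from the text. One example is a call with `return_full_text=False`, which returns only the newly generated text. Below is a standalone sketch of the same trimming that falls back to the whole string instead. `extract_answer` is a hypothetical helper, not part of the repository; when the marker is present it behaves exactly like the original chain of `split` and `strip` calls.

```python
def extract_answer(full_text: str) -> str:
    """Trim generated text the way the pipeline's postprocess does.

    Hypothetical variant: if "<|answer|>" is absent, use the whole text
    instead of raising IndexError like split(...)[1] would.
    """
    parts = full_text.split("<|answer|>")
    answer = parts[1] if len(parts) > 1 else full_text
    # Cut at the start of any new "<|prompt|>" turn, then trim whitespace.
    return answer.strip().split("<|prompt|>")[0].strip()


sample = "<|prompt|>Why is drinking water so healthy?<|endoftext|><|answer|> It keeps you hydrated. <|prompt|>More?"
print(extract_answer(sample))  # → It keeps you hydrated.
```

If you keep a local copy of the pipeline file, the class can also wrap an already-loaded model and tokenizer, for example `H2OTextGenerationPipeline(model=model, tokenizer=tokenizer)`.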