Instructions to use Plaban81/gemma-medical_qa-Finetune with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Plaban81/gemma-medical_qa-Finetune with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Plaban81/gemma-medical_qa-Finetune")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Plaban81/gemma-medical_qa-Finetune")
model = AutoModelForCausalLM.from_pretrained("Plaban81/gemma-medical_qa-Finetune")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Plaban81/gemma-medical_qa-Finetune with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Plaban81/gemma-medical_qa-Finetune"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Plaban81/gemma-medical_qa-Finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Plaban81/gemma-medical_qa-Finetune

SGLang

How to use Plaban81/gemma-medical_qa-Finetune with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Plaban81/gemma-medical_qa-Finetune" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Plaban81/gemma-medical_qa-Finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Plaban81/gemma-medical_qa-Finetune" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Plaban81/gemma-medical_qa-Finetune",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Plaban81/gemma-medical_qa-Finetune with Docker Model Runner:
```
docker model run hf.co/Plaban81/gemma-medical_qa-Finetune
```

Inference API

by singhjagpreet - opened Apr 5, 2024

Discussion

singhjagpreet

Apr 5, 2024

Hi,

I have uploaded my text to sql finetuned gemma-2b on hugging face hub, I am trying to create a Inference API for my model, I am facing the error:Could not load model singhjagpreet/gemma-2b_text_to_sql with any of the following classes: (<class 'transformers.models.gemma.modeling_gemma.GemmaForCausalLM'>,). See the original errors: while loading with GemmaForCausalLM, an error is thrown: Traceback (most recent call last): File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download metadata = get_hf_file_metadata( ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata r = _request_wrapper( ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper response = _request_wrapper( ^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper response = get_session().request(method=method, url=url, **params) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 587, in request resp = self.send(prep, **send_kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/requests/sessions.py", line 701, in send r = adapter.send(request, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_http.py", line 78, in send raise OfflineModeIsEnabled( huggingface_hub.utils._http.OfflineModeIsEnabled: Cannot reach https://huggingface.co/google/gemma-2b/resolve/4a94259823ea6b4ef94d21ae0a9c64ce68da53b4/config.json: offline mode is enabled. To disable it, please unset the HF_HUB_OFFLINE environment variable. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/src/transformers/src/transformers/utils/hub.py", line 398, in cached_file resolved_file = hf_hub_download( ^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn return fn(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1371, in hf_hub_download raise LocalEntryNotFoundError( huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/src/transformers/src/transformers/pipelines/base.py", line 279, in infer_framework_load_model model = model_class.from_pretrained(model, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/modeling_utils.py", line 2983, in from_pretrained config, model_kwargs = cls.config_class.from_pretrained( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/configuration_utils.py", line 602, in from_pretrained config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/configuration_utils.py", line 631, in get_config_dict config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/src/transformers/src/transformers/configuration_utils.py", line 686, in _get_config_dict resolved_config_file = cached_file( ^^^^^^^^^^^^ File "/src/transformers/src/transformers/utils/hub.py", line 441, in cached_file raise EnvironmentError( OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like google/gemma-2b is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

even for your model I can not see the inference working normally. Could you please let me if you have any information.

Plaban81

Owner Apr 6, 2024

•

edited Apr 6, 2024

Code used for inferencing

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0

from peft import LoraConfig,PeftModel,AutoPeftModelForCausalLM
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

set the LoRA configurations

peft_config =LoraConfig(
r=64,
lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
#
peft_model_id = "Plaban81/gemma-medical_qa-Finetune"
config = peft_config.from_pretrained(peft_model_id)
#
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
return_dict=True,
load_in_4bit=True,
device_map="auto",
)

ptokenizer= AutoTokenizer.from_pretrained(peft_model_id)
#
def get_completion(query: str, model, tokenizer) -> str:
device = "cuda:0"

prompt_template = """
user
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{query}
\nmodel

"""
prompt = prompt_template.format(query=query)

encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

model_inputs = encodeds.to(device)
#
generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
#
decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
return (decoded)
#
query = """Please answer with one of the option in the bracket. Write reasoning in between . Write answer in between .here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:\nUrea 50 mg/dL\nCreatinine 1.4 mg/dL\nProtein Negative\nRBC Numerous\nThe patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis?? \n{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}"""
result = get_completion(query=query, model=model, tokenizer=ptokenizer)
print(f"Model Answer : \n {result.split('model')[-1]}")

singhjagpreet

Apr 6, 2024

Thanks for the code.

Inference is working fine for my model offline in a google colab.

I am specifically talking about Inference API at hugging face model card.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment