stefanoscotta committed on Jul 25, 2023

Commit

ab03ec0

1 Parent(s): 8be35b4

Update README.md

Files changed (1)

README.md +69 -60

@@ -31,42 +31,92 @@ This repository contains the model merged with the LoRA adapters obtained in the
 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
@@ -108,8 +158,6 @@ The fine-tuning procedure was done using [LoRA](https://arxiv.org/pdf/2106.09685
-[More Information Needed]
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
@@ -121,47 +169,8 @@ Carbon emissions can be estimated using the [Machine Learning Impact calculator]
 - **Cloud Provider:** Private Infrastructure
 - **Carbon Emitted:** 7.34 kg eq. CO2
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
 Stefano Scotta (stefano.scotta@rai.it)

 ## Uses
 <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+The model can be used as is to respond to simple instructions in Italian or can be further fine-tuned to perform specific tasks.
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+Like any other LLM, the model may generate content that does not correspond to reality, as well as wrong, biased, offensive, or inappropriate answers.
+## How to Get Started with the Model
+ **Prompt template:**
+ ``` python
+"Di seguito è riportata un'istruzione che descrive un compito, abbinata a un input che fornisce un ulteriore contesto. Scrivete una risposta che completi in modo appropriato la richiesta.
+### Istruzione:
+{instruction}
+### Input:
+{input}
+### Risposta:"
+```
+ **Usage:**
+Use the code below to get started with the model.
+ ``` python
+import os
+import torch
+import sys
+from transformers import LlamaTokenizer, LlamaForCausalLM
+if torch.cuda.is_available():
+    device = "cuda"
+else:
+    device = "cpu"
+def generate_prompt(instruction, input=None):
+    if input:
+        return f"""Di seguito è riportata un'istruzione che descrive un compito, abbinata a un input che fornisce un ulteriore contesto. Scrivete una risposta che completi in modo appropriato la richiesta.
+### Istruzione:
+{instruction}
+### Input:
+{input}
+### Risposta:"""
+    else:
+        return f"""Di seguito è riportata un'istruzione che descrive un compito. Scrivete una risposta che completi in modo appropriato la richiesta..
+### Istruzione:
+{instruction}
+### Risposta:"""
+model_name = "raicrits/OpenLLama13b_Loquace_ITA"
+model = LlamaForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.float16,
+    device_map="auto"
+)
+tokenizer = LlamaTokenizer.from_pretrained(model_name)
+instruction = "qual'è la relazione tra i seguenti oggetti"
+input = "sedia, tavolo, divano"
+prompt = generate_prompt(instruction, input)
+inputs = tokenizer(prompt, return_tensors="pt")
+input_ids = inputs["input_ids"].to(device)
+generation_output = model.generate(
+    input_ids=input_ids,
+    max_new_tokens=256,
+)
+output = tokenizer.decode(generation_output[0])
+output = output.split("### Risposta:")[1].strip().replace("</s>","")
+print(output)
+```
+``` python
+"Sedia, tavolo e divano sono tutti oggetti che possono essere utilizzati per creare un'atmosfera rilassante in una stanza."
+```
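The usage snippet recovers the answer by splitting the decoded text on `### Risposta:` and taking element `[1]`, which raises `IndexError` if the marker is ever missing from the decoded string. The following is a minimal sketch of a more defensive variant, not part of the committed README: `extract_response`, `RESPONSE_MARKER`, and `EOS_TOKEN` are hypothetical names introduced here for illustration, and the sample string is made up rather than real model output.

```python
RESPONSE_MARKER = "### Risposta:"  # marker taken from the prompt template
EOS_TOKEN = "</s>"  # end-of-sequence string stripped in the usage snippet


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, without the EOS token.

    Hypothetical helper: mirrors the split("### Risposta:")[1] logic of the
    usage snippet, but falls back to the whole string when the marker is
    absent instead of raising IndexError.
    """
    parts = decoded.split(RESPONSE_MARKER)
    answer = parts[1] if len(parts) > 1 else parts[0]
    return answer.replace(EOS_TOKEN, "").strip()


# Illustrative decoded string (made up, not real model output).
decoded = "### Istruzione:\nsaluta\n### Risposta: Ciao!</s>"
print(extract_response(decoded))
```

Passing `skip_special_tokens=True` to `tokenizer.decode` is another way to drop the end-of-sequence token before the text reaches a helper like this one.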
 ## Training Details
 ## Environmental Impact
 <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 - **Cloud Provider:** Private Infrastructure
 - **Carbon Emitted:** 7.34 kg eq. CO2
+## Model Card Authors
 Stefano Scotta (stefano.scotta@rai.it)