---
license: apache-2.0
datasets:
- Intel/orca_dpo_pairs
language:
- en
metrics:
- accuracy
pipeline_tag: text-generation
---
This model applies Direct Preference Optimization (DPO) to TinyLlama-1.1B-intermediate-step-1431k-3T using the orca_dpo_pairs dataset.

This is only an experimental model, created by following the instructions in the blog post [Fine-tune a Mistral-7b model with Direct Preference Optimization](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).
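
For readers unfamiliar with DPO, the sketch below shows the per-pair loss that the method minimizes, written in plain Python. It is an illustration only, not the training code used for this model: the function name `dpo_loss`, the value `beta=0.1`, and the log-probabilities in the usage lines are assumptions made for the example, and the card does not state the actual hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full answer under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an illustrative value, not taken from this model card.
    """
    # Implicit rewards: how much more likely the policy makes each answer
    # compared with the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed as softplus(-margin) in a stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Made-up log-probabilities: the policy prefers the chosen answer more
# strongly than the reference does, so the loss falls below log(2).
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss equals log(2) when the policy and the reference agree, and it shrinks as the policy raises the chosen answer relative to the rejected one.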
You can run this model using the following code:

```python
import transformers
from transformers import AutoTokenizer

# Hub ID of the model (assumed here to be this repository)
new_model = "sreeramajay/TinyLlama-1.1B-step-1431k-orca-dpo-v1.0"

# Format prompt
message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(new_model)
prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=new_model,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])

# <s>[INST] <<SYS>>
# You are a helpful assistant chatbot.
# <</SYS>>
#
# What is a Large Language Model? [/INST]
# <LANG-LMT>
# Largely, it is a machine learning model that is trained on a large dataset and is capable of generating large amounts of text with a certain degree of accuracy.
#
# A: If you are talking about a computer program that can generate texts, you can look at the topic of Natural Language Generation (NLG) for a more precise definition.
# The main difference between NLG and machine learning is that NLG is a subfield of AI and is used to generate text from an input, while machine learning is used to analyze data, make predictions and classify it.
```
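
The sample output above begins with a prompt in the `[INST]` / `<<SYS>>` layout. As a rough illustration of what the chat template appears to produce for a single system and user message, the sketch below rebuilds that string by hand. The helper `build_llama2_prompt` is hypothetical and only mirrors the printed output; the authoritative template is whatever the tokenizer's configuration defines, so prefer `apply_chat_template` in real use.

```python
def build_llama2_prompt(messages):
    """Render one system + user exchange in the [INST] / <<SYS>> layout.

    Hypothetical helper for illustration; it handles only a single user
    turn with an optional system message.
    """
    system = next((m["content"] for m in messages if m["role"] == "system"), None)
    user = next(m["content"] for m in messages if m["role"] == "user")
    if system is not None:
        # The system message is wrapped in <<SYS>> tags inside the user turn.
        user = f"<<SYS>>\n{system}\n<</SYS>>\n\n{user}"
    return f"<s>[INST] {user} [/INST]"


message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
print(build_llama2_prompt(message))
```

Printing the rendered prompt before generation is a quick way to check that the system message and user question land where the model expects them.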
Results on the GPT4ALL benchmark:

| Tasks         | Metric   |  Value |   | Stderr |
|---------------|----------|-------:|---|-------:|
| arc_challenge | acc      | 0.2807 | ± | 0.0131 |
|               | acc_norm | 0.3106 | ± | 0.0135 |
| arc_easy      | acc      | 0.6107 | ± | 0.0100 |
|               | acc_norm | 0.5547 | ± | 0.0102 |
| boolq         | acc      | 0.5865 | ± | 0.0086 |
| hellaswag     | acc      | 0.4478 | ± | 0.0050 |
|               | acc_norm | 0.5924 | ± | 0.0049 |
| openbookqa    | acc      | 0.2160 | ± | 0.0184 |
|               | acc_norm | 0.3600 | ± | 0.0215 |
| piqa          | acc      | 0.7280 | ± | 0.0104 |
|               | acc_norm | 0.7301 | ± | 0.0104 |
| winogrande    | acc      | 0.5856 | ± | 0.0138 |
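
To read the ± column, it can help to turn each value and standard error into an approximate interval. The sketch below does this for a few rows of the table, assuming the Stderr column is a standard error of the mean and that a normal approximation with `z=1.96` (roughly 95% coverage) is acceptable. The helper name `confidence_interval` is made up for this example.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate interval value ± z * stderr (normal approximation)."""
    return (value - z * stderr, value + z * stderr)


# A few (value, stderr) pairs copied from the results table.
results = {
    ("arc_challenge", "acc"): (0.2807, 0.0131),
    ("hellaswag", "acc_norm"): (0.5924, 0.0049),
    ("piqa", "acc"): (0.7280, 0.0104),
    ("winogrande", "acc"): (0.5856, 0.0138),
}

for (task, metric), (value, stderr) in results.items():
    low, high = confidence_interval(value, stderr)
    print(f"{task:<14}{metric:<9}{value:.4f}  [{low:.4f}, {high:.4f}]")
```

Tasks with a larger standard error get wider intervals, so small differences on them should be read with more caution.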