Does the GPT-JT-6B-v1 model have the ability to handle follow-up questions like ChatGPT?
ChatGPT: The dialogue format makes it possible for ChatGPT to answer follow-up questions.
It is aware of what we asked earlier and uses that understanding in the present question. Is this kind of behavior supported by GPT-JT-6B-v1?
Please guide.
This isn't the prettiest way of doing it per se, but a lot of chatbots are made by effectively tacking the previous prompts and responses onto the input for your next generation.
Example:
Input: Hey, how are you today?
Response: Very well thank you, what about you?
###
Input: I am great.
Response: What are you going to do?
###
Input: Most likely read a couple of books and relax.
Response: Fantastic!
Here, each ### acts as the end token that separates one exchange from the next, and they just truncate the oldest part of the chat history when it gets too big for the model's context window.
This example was taken from the following article, which also explains in more depth how they built a chatbot using GPT: https://nlpcloud.com/build-gpt-j-gpt-neox-discord-chatbot-with-nlpcloud.html
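If it helps, the history-stitching idea above can be sketched in a few lines of Python. This is only a rough illustration, not the article's code: the names `format_turn` and `build_prompt`, the `max_tokens` budget of 1500, and the whitespace-based token counter are all assumptions made for the example. In practice you would count tokens with the model's own tokenizer.

```python
from typing import Callable, List, Tuple

SEPARATOR = "###"  # end token between exchanges, as in the example above


def format_turn(user_input: str, response: str = "") -> str:
    """Format one exchange; an empty response leaves the slot open for the model."""
    return f"Input: {user_input}\nResponse: {response}".rstrip(" ")


def build_prompt(
    history: List[Tuple[str, str]],
    new_input: str,
    max_tokens: int = 1500,  # assumed budget, leaves room for the reply
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),  # crude stand-in
) -> str:
    """Join past exchanges with the new input, dropping the oldest ones if too long."""
    turns = [format_turn(user, reply) for user, reply in history]
    current = format_turn(new_input)
    while True:
        prompt = f"\n{SEPARATOR}\n".join(turns + [current])
        # Stop when the prompt fits, or when only the new input is left.
        if count_tokens(prompt) <= max_tokens or not turns:
            return prompt
        turns.pop(0)  # truncate the oldest exchange first


history = [
    ("Hey, how are you today?", "Very well thank you, what about you?"),
    ("I am great.", "What are you going to do?"),
]
prompt = build_prompt(history, "Most likely read a couple of books and relax.")
print(prompt)
```

You would then send `prompt` to the model, cut the generated text at the first `###`, and append the new (input, response) pair to `history` before the next turn. To count real tokens, something like `count_tokens=lambda s: len(tokenizer.encode(s))` with the model's tokenizer should work in place of the whitespace counter.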