Text Generation
Transformers
PyTorch
Safetensors
English
falcon
custom_code
text-generation-inference
Instructions to use tiiuae/falcon-7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use tiiuae/falcon-7b with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="tiiuae/falcon-7b", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use tiiuae/falcon-7b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "tiiuae/falcon-7b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker
docker model run hf.co/tiiuae/falcon-7b
- SGLang
How to use tiiuae/falcon-7b with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "tiiuae/falcon-7b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "tiiuae/falcon-7b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "tiiuae/falcon-7b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

- Docker Model Runner
How to use tiiuae/falcon-7b with Docker Model Runner:
docker model run hf.co/tiiuae/falcon-7b
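
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_completion_request` and `complete` are hypothetical, the ports are the ones used in the vLLM and SGLang commands above, and the response is assumed to follow the OpenAI completions shape (`choices[0].text`).

```python
import json
import urllib.request

# Ports taken from the server commands above: vLLM on 8000, SGLang on 30000.
VLLM_URL = "http://localhost:8000"
SGLANG_URL = "http://localhost:30000"


def build_completion_request(base_url, prompt, model="tiiuae/falcon-7b",
                             max_tokens=512, temperature=0.5):
    """Build the same POST request that the curl examples send (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, prompt, **kwargs):
    """Send the request and return the generated text.

    Requires a running server. Assumes an OpenAI-style response body
    in which the text is found at choices[0]["text"].
    """
    request = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a vLLM server started as shown above, `complete(VLLM_URL, "Once upon a time,")` should return the continuation as a string. Passing `SGLANG_URL` instead targets the SGLang server.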
Commit History
enh: add link to updated model 545fd35
Move to in-library checkpoint (for real this time) (#84) 898df13
Update README.md f779652
Add recommendations for inference and finetuning 7327008
Update citation info a0395a0
Update metadata f6d4744
Remove TII Falcon LLM license da8d49a
Update license info to Apache 2.0 21be553
license: tii-falcon-llm 87c4c0d
Update config.json f3185ff
Update modelling_RW.py 1ba2370
Add PyTorch 2.0 notice 308ced2
Add link to RefinedWeb 650217d
Fix typo c1a49e6
Update model card 591607b
Create README.md 555b780
Create LICENSE.txt 15f1e0e
Update modelling_RW.py 552521c
Daniel Hesslow committed on
Upload tokenizer b65214c
Daniel Hesslow committed on
Update modelling_RW.py 854fab6
Daniel Hesslow committed on
Upload RWForCausalLM 9562025
Daniel Hesslow committed on
Upload RWForCausalLM 80b61a6
Daniel Hesslow committed on