Commit ec745b4 by yankexe · 1 Parent(s): 930959e

Add air-gapped usage for embedding using vLLM 0.20.0 and above

Files changed (1)
  1. README.md +15 -1
README.md CHANGED
@@ -4650,6 +4650,20 @@ The **gte-multilingual-base** model is the latest in the [GTE](https://huggingfa
4650
  - Embedding Dimension: 768
4651
  - Max Input Tokens: 8192
4652
 
4653
 
4654
  ## Usage
4655
 
@@ -4851,4 +4865,4 @@ If you find our paper or models helpful, please consider cite:
4851
  pages={1393--1412},
4852
  year={2024}
4853
  }
4854
- ```
 
4650
  - Embedding Dimension: 768
4651
  - Max Input Tokens: 8192
4652
 
4653
+ ## Air-Gapped Usage
4654
+
4655
+ > Tested with **vLLM 0.20.0** and above.
4656
+
4657
+ - Set `HF_HOME` to a temporary directory, e.g. `/tmp/hf`.
4658
+ - **Or** set `HF_MODULES_CACHE` to a temporary directory, e.g. `/tmp/hf-modules-cache`. This is where `configuration.py` and `modeling.py` from the repository root are staged ([reference](https://github.com/huggingface/transformers/blob/4e9f6fc67ce6290b3ab6efe2ddb1fcfc3e554382/src/transformers/dynamic_module_utils.py#L35-L47)).
4659
+ - Set the environment variable `HF_HUB_OFFLINE=1`.
4660
+
4661
+ ```bash
4662
+ vllm serve yankexe/gte-multilingual-base-air-gapped \
4663
+ --hf-overrides='{"architectures":["GteNewModel"]}' \
4664
+ --runner=pooling \
4665
+ --trust-remote-code
4666
+ ```
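The environment setup described in the bullets above can be sketched as a small shell snippet. This is a minimal sketch and not part of the commit. The function name `setup_airgapped_env` is hypothetical, the path is the example value from the bullets, and it assumes the model files are already available locally, since `HF_HUB_OFFLINE=1` disables downloads from the Hub.

```bash
# Hypothetical helper (not part of the commit): export the variables
# listed above before starting the server.
setup_airgapped_env() {
  # Use the caller-supplied directory, or fall back to the example path.
  hf_home="${1:-/tmp/hf}"
  mkdir -p "$hf_home"
  export HF_HOME="$hf_home"
  # Tell the Hugging Face libraries not to contact the Hub.
  export HF_HUB_OFFLINE=1
}

setup_airgapped_env /tmp/hf
# Next, run the `vllm serve` command shown above in the same shell.
```

Running the helper and the server in the same shell matters, because exported variables are only inherited by processes started from that shell.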
4667
 
4668
  ## Usage
4669
 
 
4865
  pages={1393--1412},
4866
  year={2024}
4867
  }
4868
+ ```