HNTAI / TODO.md (543 bytes; last commit f91c303 by sachinchandrankallar, "patient summary working", 10 months ago)

TODO: Fix GGUF Model Context Window Error and Optimize Speed

Tasks

Modify generate method in model_loader_gguf.py to dynamically adjust max_tokens based on prompt length
Tune n_threads in model initialization for maximum speed
Test the changes to ensure nothing breaks

Details

Approximate prompt tokens by word count (split on whitespace)
Calculate allowed max_tokens = 4000 - prompt_tokens
Reduce max_tokens if necessary and log a warning
Raise an error if the prompt is too long to fit in the context window
Set n_threads to os.cpu_count() for speed
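
The steps in Details could be sketched roughly as follows. This is a minimal illustration, not the actual code in model_loader_gguf.py: the helper names (estimate_prompt_tokens, clamp_max_tokens, pick_n_threads) and the CONTEXT_BUDGET constant are hypothetical, and "too long" is assumed to mean that the estimated prompt leaves no room for generated tokens.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Token budget taken from the note "4000 - prompt_tokens".
CONTEXT_BUDGET = 4000


def estimate_prompt_tokens(prompt: str) -> int:
    """Approximate the token count by counting whitespace-separated words."""
    return len(prompt.split())


def clamp_max_tokens(prompt: str, requested_max_tokens: int,
                     budget: int = CONTEXT_BUDGET) -> int:
    """Return a max_tokens value that fits in the remaining budget.

    Assumption: a prompt whose estimate meets or exceeds the budget
    is treated as an error, since nothing could be generated.
    """
    prompt_tokens = estimate_prompt_tokens(prompt)
    allowed = budget - prompt_tokens
    if allowed <= 0:
        raise ValueError(
            f"Prompt too long: ~{prompt_tokens} tokens, budget is {budget}"
        )
    if requested_max_tokens > allowed:
        logger.warning(
            "Reducing max_tokens from %d to %d to fit the context window",
            requested_max_tokens, allowed,
        )
        return allowed
    return requested_max_tokens


def pick_n_threads() -> int:
    """Use every logical CPU; os.cpu_count() may return None, so fall back to 1."""
    return os.cpu_count() or 1
```

The generate method would call clamp_max_tokens(prompt, max_tokens) before running inference, and the result of pick_n_threads() would be passed as n_threads when the model is created. Note that a whitespace word count usually underestimates the real token count, because tokenizers split many words into several subword tokens, so leaving extra headroom in the budget may be worthwhile.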