# TODO: Fix GGUF Model Context Window Error and Optimize Speed

## Tasks

- [x] Modify the `generate` method in `model_loader_gguf.py` to dynamically adjust `max_tokens` based on prompt length
- [x] Tune `n_threads` in model initialization for maximum speed
- [ ] Test the changes to ensure nothing breaks

## Details

- Approximate prompt tokens by word count (split on whitespace)
- Calculate allowed `max_tokens = 4000 - prompt_tokens`
- Reduce `max_tokens` if it exceeds the allowed value, and log a warning
- Raise an error if the prompt is too long
- Set `n_threads` to `os.cpu_count()` for speed
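
The steps under Details can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual code in `model_loader_gguf.py`: the helper names (`approx_prompt_tokens`, `clamp_max_tokens`, `default_n_threads`) are hypothetical, and treating "too long" as "no budget left for generation" is an assumption. Only the 4000-token budget, the whitespace word count, and `os.cpu_count()` come from the notes above.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Token budget taken from the Details list (4000 - prompt_tokens).
CONTEXT_BUDGET = 4000


def approx_prompt_tokens(prompt: str) -> int:
    """Approximate the prompt's token count by splitting on whitespace."""
    return len(prompt.split())


def clamp_max_tokens(prompt: str, requested_max_tokens: int,
                     budget: int = CONTEXT_BUDGET) -> int:
    """Return a max_tokens value that fits in the remaining budget.

    Hypothetical helper: raises ValueError when the prompt alone uses up
    the budget (assumed meaning of "prompt too long").
    """
    prompt_tokens = approx_prompt_tokens(prompt)
    allowed = budget - prompt_tokens
    if allowed <= 0:
        raise ValueError(
            f"Prompt too long: ~{prompt_tokens} tokens, budget is {budget}"
        )
    if requested_max_tokens > allowed:
        logger.warning(
            "Reducing max_tokens from %d to %d to fit the context window",
            requested_max_tokens, allowed,
        )
        return allowed
    return requested_max_tokens


def default_n_threads() -> int:
    """Use all logical CPUs; os.cpu_count() may return None, so fall back to 1."""
    return os.cpu_count() or 1
```

A whitespace word count usually undercounts real tokens, because tokenizers split many words into several pieces, so the 4000 budget only works if it leaves some headroom below the model's actual context size. Counting with the model's own tokenizer would be more exact, at the cost of an extra tokenization pass.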