# TODO: Fix GGUF Model Context Window Error and Optimize Speed

## Tasks
- [x] Modify generate method in model_loader_gguf.py to dynamically adjust max_tokens based on prompt length
- [x] Tune n_threads in model initialization for maximum speed
- [ ] Test the changes to ensure nothing breaks

## Details
- Approximate prompt tokens by word count (split on whitespace)
- Calculate allowed max_tokens = 4000 - prompt_tokens
- Reduce max_tokens to the allowed value if it exceeds it, and log a warning
- Raise an error if the prompt is too long to leave room for any output
- Set n_threads to os.cpu_count() for speed
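
The details above can be sketched roughly as follows. This is a minimal illustration, not the actual code in model_loader_gguf.py: the helper names (`estimate_prompt_tokens`, `clamp_max_tokens`, `pick_n_threads`), the `CONTEXT_BUDGET` constant, and the choice of `ValueError` are all assumptions made for the example. The fallback to 1 thread is also an addition, since `os.cpu_count()` can return `None`.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Token budget taken from the notes above (4000); the constant name is hypothetical.
CONTEXT_BUDGET = 4000


def estimate_prompt_tokens(prompt: str) -> int:
    """Approximate the prompt's token count by splitting on whitespace."""
    return len(prompt.split())


def clamp_max_tokens(prompt: str, max_tokens: int, budget: int = CONTEXT_BUDGET) -> int:
    """Return a max_tokens value that fits in the budget left after the prompt."""
    prompt_tokens = estimate_prompt_tokens(prompt)
    allowed = budget - prompt_tokens

    # No room left for any generated tokens: the prompt is too long.
    if allowed <= 0:
        raise ValueError(
            f"Prompt too long: ~{prompt_tokens} tokens, budget is {budget}"
        )

    # Requested output does not fit: shrink it and warn.
    if max_tokens > allowed:
        logger.warning(
            "Reducing max_tokens from %d to %d to fit the context window",
            max_tokens,
            allowed,
        )
        return allowed

    return max_tokens


def pick_n_threads() -> int:
    """Use every available core; fall back to 1 if the count is unknown."""
    return os.cpu_count() or 1
```

A word count is only a rough proxy: subword tokenizers usually produce more tokens than there are whitespace-separated words, so the estimate tends to run low and the 4000 budget may need extra headroom.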