How to use DS-Archive/limarp-miqu-1-70b-qlora with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("models/miqu-1-70b-sf")
model = PeftModel.from_pretrained(base_model, "DS-Archive/limarp-miqu-1-70b-qlora")
```
| library_name: peft | |
| tags: | |
| - generated_from_trainer | |
| - llama | |
| - llama 2 | |
| model-index: | |
| - name: volume/limarp-70b-qlora | |
| results: [] | |
| datasets: | |
| - lemonilia/LimaRP | |
| language: | |
| - en | |
| [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl) | |
| <details><summary>See axolotl config</summary> | |
| axolotl version: `0.4.0` | |
| ```yaml | |
| base_model: models/miqu-1-70b-sf | |
| model_type: LlamaForCausalLM | |
| tokenizer_type: LlamaTokenizer | |
| is_llama_derived_model: true | |
| load_in_8bit: false | |
| load_in_4bit: true | |
| strict: false | |
| datasets: | |
|   - path: train-all-max-alpaca-llama.jsonl | |
|     type: completion | |
| dataset_prepared_path: | |
| val_set_size: 0.0 | |
| output_dir: ./volume/limarp-70b-qlora | |
| adapter: qlora | |
| lora_model_dir: | |
| sequence_len: 16384 | |
| sample_packing: true | |
| pad_to_sequence_len: true | |
| lora_r: 32 | |
| lora_alpha: 16 | |
| lora_dropout: 0.05 | |
| lora_target_modules: | |
| lora_target_linear: true | |
| lora_fan_in_fan_out: | |
| wandb_project: 70b-lora | |
| wandb_entity: | |
| wandb_watch: | |
| wandb_name: | |
| wandb_log_model: | |
| gradient_accumulation_steps: 4 | |
| micro_batch_size: 1 | |
| num_epochs: 2 | |
| optimizer: adamw_bnb_8bit | |
| lr_scheduler: cosine | |
| learning_rate: 0.0001 | |
| train_on_inputs: true | |
| group_by_length: false | |
| bf16: true | |
| fp16: false | |
| tf32: true | |
| gradient_checkpointing: true | |
| gradient_checkpointing_kwargs: | |
|   use_reentrant: true | |
| early_stopping_patience: | |
| resume_from_checkpoint: | |
| local_rank: | |
| logging_steps: 1 | |
| xformers_attention: | |
| flash_attention: true | |
| warmup_steps: 10 | |
| eval_steps: | |
| eval_table_size: | |
| save_steps: | |
| debug: | |
| deepspeed: | |
| weight_decay: 0.0 | |
| fsdp: | |
| fsdp_config: | |
| special_tokens: | |
|   bos_token: "<s>" | |
|   eos_token: "</s>" | |
|   unk_token: "<unk>" | |
| ``` | |
| </details><br> | |
| # limarp-miqu-1-70b-qlora | |
| Experimental LimaRP QLoRA trained at a context length of 16384 tokens (longer than the longest LimaRP sample when tokenized with Llama's tokenizer) on the fixed dequantized miqu-1-70b model by 152334H. | |
| I wasn't particularly happy with the results I got when I tried applying the LoRA at varying weights to the miqu-1-70b model. This may be because the model was dequantized from a Q5_K_M GGUF, or because it is already an instruct-tuned model. | |
| However, I decided to go ahead and release this in case someone else finds a use for it. Provided as-is and YMMV. | |
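For readers unfamiliar with what "applying the lora at varying weights" means, here is a minimal plain-Python sketch. It is not the code used for this release. It assumes the standard LoRA formulation, in which the low-rank update `B @ A` is scaled by `lora_alpha / lora_r` (16 / 32 = 0.5 with the config above) before being added to a base weight. The extra `weight` multiplier, the helper names, and the tiny rank-1 matrices are hypothetical illustration values.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(base, lora_a, lora_b, lora_alpha, lora_r, weight=1.0):
    """Return base + weight * (lora_alpha / lora_r) * (lora_b @ lora_a)."""
    scale = weight * lora_alpha / lora_r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


# Values taken from the axolotl config: lora_r = 32, lora_alpha = 16.
LORA_R, LORA_ALPHA = 32, 16

# Toy matrices (assumption): a real adapter would have rank 32, not rank 1.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape (rank, in_features)
lora_b = [[2.0], [4.0]]    # shape (out_features, rank)

merged_full = apply_lora(base, lora_a, lora_b, LORA_ALPHA, LORA_R, weight=1.0)
merged_half = apply_lora(base, lora_a, lora_b, LORA_ALPHA, LORA_R, weight=0.5)
```

Lowering `weight` shrinks the adapter's contribution linearly, which is one way to trade the adapter's style against the base model's behavior.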
| ## Model description | |
| The intended prompt format is the Alpaca instruction format of LimaRP v3: | |
| ``` | |
| ### Instruction: | |
| Character's Persona: {bot character description} | |
| User's Persona: {user character description} | |
| Scenario: {what happens in the story} | |
| Play the role of Character. Taking the above information into consideration, you must engage in a roleplaying chat with User below this line. Do not write dialogues and narration for User. | |
| ### Input: | |
| User: {utterance} | |
| ### Response: | |
| Character: {utterance} | |
| ### Input: | |
| User: {utterance} | |
| ### Response: | |
| Character: {utterance} | |
| (etc.) | |
| ``` | |
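The template above can be assembled programmatically. The following is a minimal sketch, assuming blank lines between the blocks (the exact whitespace is an assumption, since the template as shown does not preserve it). The function name `build_prompt` and the `turns` structure are hypothetical helpers, not part of any library.

```python
SYSTEM_LINE = (
    "Play the role of Character. Taking the above information into "
    "consideration, you must engage in a roleplaying chat with User below "
    "this line. Do not write dialogues and narration for User."
)


def build_prompt(char_persona, user_persona, scenario, turns):
    """Build a LimaRP v3 Alpaca-style prompt.

    turns is a list of (speaker, utterance) pairs, where speaker is
    "User" or "Character". The prompt ends with an open Character turn
    for the model to complete.
    """
    parts = [
        "### Instruction:\n"
        f"Character's Persona: {char_persona}\n"
        f"User's Persona: {user_persona}\n"
        f"Scenario: {scenario}\n"
        f"{SYSTEM_LINE}"
    ]
    for speaker, utterance in turns:
        if speaker == "User":
            parts.append(f"### Input:\nUser: {utterance}")
        elif speaker == "Character":
            parts.append(f"### Response:\nCharacter: {utterance}")
        else:
            raise ValueError(f"unknown speaker: {speaker!r}")
    # Leave the final response open so the model writes Character's reply.
    parts.append("### Response:\nCharacter:")
    return "\n\n".join(parts)


example = build_prompt(
    "A stern librarian.",
    "A curious student.",
    "A late evening in the archive.",
    [("User", "Is the library still open?")],
)
```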
| Inspired by the previously named "Roleplay" preset in SillyTavern, with this version of LimaRP it is possible to append a length modifier to the response instruction sequence, like this: | |
| ``` | |
| ### Input: | |
| User: {utterance} | |
| ### Response: (length = medium) | |
| Character: {utterance} | |
| ``` | |
| This has an immediately noticeable effect on bot responses. The lengths used during training are: | |
| `micro`, `tiny`, `short`, `medium`, `long`, `massive`, `huge`, `enormous`, `humongous`, `unlimited`. | |
| **The recommended starting length is medium**. Keep in mind that the AI can ramble or impersonate | |
| the user with very long messages. | |
| The length control effect is reproducible, but messages will not follow the requested | |
| lengths precisely; rather, they fall within certain ranges on average, as seen in this table | |
| with data from tests made with one reply at the beginning of the conversation: | |
|  | |
| Response length control also appears to work well deep into the conversation. **If the | |
| modifier is omitted, the model will choose the most appropriate response length** (although it might | |
| not necessarily be what the user desires). | |
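As a small illustration of the length modifier, the sketch below builds the response instruction sequence with or without it. The helper name `response_header` is hypothetical. The list of lengths is the one given above, and passing `None` corresponds to omitting the modifier.

```python
# Length labels listed in this card as used during training.
LENGTHS = (
    "micro", "tiny", "short", "medium", "long",
    "massive", "huge", "enormous", "humongous", "unlimited",
)


def response_header(length="medium"):
    """Return the response instruction sequence, optionally with a length modifier.

    The default is "medium", the recommended starting length. Pass None to
    omit the modifier and let the model pick the response length.
    """
    if length is None:
        return "### Response:"
    if length not in LENGTHS:
        raise ValueError(f"unknown length {length!r}; expected one of {LENGTHS}")
    return f"### Response: (length = {length})"
```

Rejecting unknown labels is a design choice in this sketch: the card only describes behavior for the lengths seen in training.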
| ## Intended uses & limitations | |
| The model will show biases similar to those observed in niche roleplaying forums on the Internet, besides those exhibited by the base model. | |
| ## Training and evaluation data | |
| For more details about LimaRP, see the dataset page. | |
| ## Training procedure | |
| ### Training hyperparameters | |
| The following hyperparameters were used during training: | |
| - learning_rate: 0.0001 | |
| - train_batch_size: 1 | |
| - eval_batch_size: 1 | |
| - seed: 42 | |
| - gradient_accumulation_steps: 4 | |
| - total_train_batch_size: 4 | |
| - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 | |
| - lr_scheduler_type: cosine | |
| - lr_scheduler_warmup_steps: 10 | |
| - num_epochs: 2 | |
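To make the relationship between these values concrete, here is a minimal sketch of the arithmetic. The total batch size is the micro batch size times the gradient accumulation steps times the number of data-parallel devices, so the reported value of 4 is consistent with 1 × 4 × 1. The schedule function assumes the common linear-warmup-then-cosine-decay shape. The function names are hypothetical, and `total_steps` is not stated in this card, so any value passed for it is illustrative only.

```python
import math


def effective_batch_size(micro_batch_size, grad_accum_steps, num_devices=1):
    """Samples contributing to each optimizer step."""
    return micro_batch_size * grad_accum_steps * num_devices


def lr_at_step(step, total_steps, base_lr=1e-4, warmup_steps=10):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# micro_batch_size = 1 and gradient_accumulation_steps = 4, as listed above.
total_batch = effective_batch_size(1, 4)

# Hypothetical run length of 100 optimizer steps (assumption, not from the card).
peak_lr = lr_at_step(10, 100)
```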
| ### Framework versions | |
| - PEFT 0.7.2.dev0 | |
| - Transformers 4.37.0 | |
| - Pytorch 2.1.2+cu118 | |
| - Datasets 2.16.1 | |
| - Tokenizers 0.15.0 |