---

language: en
license: mit
tags:
  - image-to-json
  - fine-tuning
datasets:
  - naver-clova-ix/cord-v2
---


# Fine-Tuned LLAVA Model

This repository hosts the fine-tuned LLAVA model files. The model has been adapted to parse receipt images and extract their contents as JSON. It was fine-tuned on the [cord-v2](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset.
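The card does not document the exact output format. The usage example further down decodes the response with `token2json`, which suggests Donut-style targets, but that is an assumption. Under that assumption, a JSON annotation would be flattened into a `<s_key>value</s_key>` tag sequence, with `<sep/>` between list items. The hypothetical `json2token` below sketches that serialization. It is not the author's training code. The receipt values are made up for illustration and use CORD-style field names.

```python
from typing import Any


def json2token(obj: Any) -> str:
    """Hypothetical Donut-style serializer (assumed format, not confirmed by this card)."""
    if isinstance(obj, dict):
        # Each key becomes a pair of opening/closing tags around its serialized value
        return "".join(f"<s_{k}>{json2token(v)}</s_{k}>" for k, v in obj.items())
    if isinstance(obj, list):
        # List elements are joined with a separator token
        return "<sep/>".join(json2token(item) for item in obj)
    # Leaf values are emitted as plain text
    return str(obj)


# Illustrative receipt annotation (invented values, CORD-style field names)
receipt = {
    "menu": [
        {"nm": "Ice Tea", "cnt": "1", "price": "8,000"},
        {"nm": "Bread", "cnt": "2", "price": "12,000"},
    ],
    "total": {"total_price": "20,000"},
}
print(json2token(receipt))
```

For this receipt, the sequence starts with `<s_menu><s_nm>Ice Tea</s_nm>`, and `<sep/>` separates the two menu items.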

## Model Details

### Model Versions
- **LLAVA 1.6 Mistral 7B**  
  Fine-tuned on the CORD-v2 dataset.

## How to Use

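You can load and use this model directly from the Hugging Face Hub with the `transformers` library. The example below loads the fine-tuned model in 4-bit precision (this requires the `bitsandbytes` package) and extracts JSON from a receipt image: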

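```python
import io

import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaNextForConditionalGeneration

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"  # base model, used for the processor
REPO_ID = "Farzad-R/llava-v1.6-mistral-7b-cordv2"  # fine-tuned weights

processor = AutoProcessor.from_pretrained(MODEL_ID)

# Load the fine-tuned model in 4-bit NF4 precision to reduce GPU memory use
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.float16
)
model = LlavaNextForConditionalGeneration.from_pretrained(
    REPO_ID,
    torch_dtype=torch.float16,
    quantization_config=quantization_config,
)

# Read the receipt image; "receipt.jpg" is a placeholder path
with open("receipt.jpg", "rb") as f:
    image_bytes = f.read()
image = Image.open(io.BytesIO(image_bytes)).convert("RGB")

# Prepare input (Mistral instruction format with an <image> placeholder)
prompt = "[INST] <image>\nExtract JSON [/INST]"
max_output_token = 256
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=max_output_token)
response = processor.decode(output[0], skip_special_tokens=True)

# Convert response to JSON; `token2json` is not defined in this snippet and must be supplied separately
generated_json = token2json(response)
```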
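The snippet above calls `token2json` without defining it. Below is a hedged stand-in, not the author's implementation. It assumes the model emits Donut-style `<s_key>value</s_key>` tags with `<sep/>` between list items, which is inferred from the helper's name and not confirmed by this card. It also includes `strip_prompt`, a hypothetical helper. `generate` returns the prompt tokens followed by the new tokens, so the decoded `response` still contains the instruction text. `strip_prompt` removes everything up to the last `[/INST]`.

```python
import re
from typing import Any

# Matches an opening tag, a closing tag, or a <sep/> list separator
_TAG = re.compile(r"<s_([^>]+)>|</s_([^>]+)>|<sep/>")


def strip_prompt(response: str, marker: str = "[/INST]") -> str:
    """Hypothetical helper: keep only the text after the last [/INST]."""
    idx = response.rfind(marker)
    return response[idx + len(marker):].strip() if idx != -1 else response.strip()


def _tokenize(seq: str) -> list:
    """Split a tag sequence into (kind, value) pairs: open, close, sep, text."""
    tokens, pos = [], 0
    for m in _TAG.finditer(seq):
        if m.start() > pos:
            tokens.append(("text", seq[pos:m.start()]))
        if m.group(1) is not None:
            tokens.append(("open", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("close", m.group(2)))
        else:
            tokens.append(("sep", ""))
        pos = m.end()
    if pos < len(seq):
        tokens.append(("text", seq[pos:]))
    return tokens


def token2json(seq: str) -> Any:
    """Sketch: parse a Donut-style tag sequence into nested dicts, lists and strings."""
    tokens = _tokenize(seq)
    i = 0

    def parse(stop_key):
        nonlocal i
        items, fields, text = [], {}, ""

        def flush():
            # Close the current element: a dict if it had child tags, else its text
            nonlocal fields, text
            items.append(fields if fields else text.strip())
            fields, text = {}, ""

        while i < len(tokens):
            kind, value = tokens[i]
            i += 1
            if kind == "open":
                fields[value] = parse(value)
            elif kind == "close":
                if value == stop_key:
                    break
                # Any other closing tag is malformed output and is skipped
            elif kind == "sep":
                flush()
            else:
                text += value
        flush()
        # One element is returned as-is; <sep/>-separated elements become a list
        return items[0] if len(items) == 1 else items

    return parse(None)


sample = (
    "[INST]  \nExtract JSON [/INST] "
    "<s_menu><s_nm>Ice Tea</s_nm><s_cnt>1</s_cnt><sep/>"
    "<s_nm>Bread</s_nm><s_cnt>2</s_cnt></s_menu>"
    "<s_total><s_total_price>20,000</s_total_price></s_total>"
)
print(token2json(strip_prompt(sample)))
# {'menu': [{'nm': 'Ice Tea', 'cnt': '1'}, {'nm': 'Bread', 'cnt': '2'}], 'total': {'total_price': '20,000'}}
```

In this sketch, a single-element list collapses to its element. A receipt with one menu item therefore yields a dict under `menu` rather than a one-item list.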
---

To see the fine-tuning process and training configuration, please visit [this GitHub repository](https://github.com/Farzad-R/Finetune-LLAVA-NEXT).

---

## Additional Resources

- [Link to Hyperstack Cloud](https://www.hyperstack.cloud/?utm_source=Influencer&utm_medium=AI%20Round%20Table&utm_campaign=Video%201)
- [GitHub Repository for Fine-Tuning LLAVA](https://github.com/Farzad-R/Finetune-LLAVA-NEXT)
- A link to a YouTube video will be added here soon to provide further insights and demonstrations.