Space: FayssalJ/visual-search

FayssalJ and Claude Opus 4.5 committed on Feb 2

Commit 76fca23 (1 parent: ec0ee7c)

Initial setup: Visual Search with Jina CLIP v2

- indexer/: Local indexing script using Jina CLIP v2
- hf-space/: HuggingFace Space app for search API
- CLAUDE.md: Project documentation

Architecture:
- Local model for indexing (free, no API costs)
- HF Space with ZeroGPU for search (free)
- Pinecone for vector storage (free tier)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Files changed (8)

CLAUDE.md +76 -0
hf-space/README.md +43 -0
hf-space/app.py +163 -0
hf-space/requirements.txt +8 -0
indexer/.env.example +7 -0
indexer/.gitignore +6 -0
indexer/index.py +287 -0
indexer/requirements.txt +8 -0

CLAUDE.md ADDED

	@@ -0,0 +1,76 @@

+# Visual Search Project
+## Overview
+AI-powered visual product search for Shopify stores using Jina CLIP v2 embeddings.
+## Architecture
+```
+┌─────────────────────────────────────────────────────────────┐
+│  INDEXING (Local, one-time)                                 │
+│  Local Jina CLIP v2 → embeddings → Pinecone                 │
+└─────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────┐
+│  SEARCH (HuggingFace Space, free)                           │
+│  User image → HF Space (Jina CLIP v2) → Pinecone → Results  │
+└─────────────────────────────────────────────────────────────┘
+```
+## Components
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| `indexer/` | Local script | Index products to Pinecone |
+| `hf-space/` | HuggingFace Space | Search API endpoint |
+| `shopify/` | Theme integration | Frontend UI |
+## Tech Stack
+- **Model**: Jina CLIP v2 (jinaai/jina-clip-v2)
+- **Vector DB**: Pinecone (free tier)
+- **Search API**: HuggingFace Spaces (ZeroGPU, free)
+- **Frontend**: Shopify theme integration
+## Environment Variables
+### Indexer (.env)
+```
+SHOPIFY_STORE=25c0da-4
+SHOPIFY_ADMIN_TOKEN=shpat_xxxxx
+PINECONE_API_KEY=xxxxx
+PINECONE_HOST=xxxxx.pinecone.io
+```
+### HF Space (Secrets)
+```
+PINECONE_API_KEY=xxxxx
+PINECONE_HOST=xxxxx.pinecone.io
+```
+## Pinecone Index
+- **Name**: products (or shopify-llm)
+- **Dimensions**: 512 (the first 512 of the 1024 dimensions Jina CLIP v2 outputs)
+- **Metric**: cosine
+## Future Plans
+- Sales pattern analysis using visual embeddings
+- Cluster similar products → correlate with sales
+- Predict new product performance
+## Commands
+```bash
+# Index products (run locally)
+cd indexer
+pip install -r requirements.txt
+python index.py --clear
+# Deploy HF Space
+cd hf-space
+# Push to HuggingFace
+```
+## Related
+- Theme repo: Kuwait-v6
+- Store: https://25c0da-4.myshopify.com
+- Store domain: https://alnasser.net
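Review note on the index settings in CLAUDE.md (512 dimensions, cosine metric): with a cosine metric, matches are ranked by the angle between vectors and not by their length. As a rough illustration of what a cosine top-k query computes, here is a minimal in-memory sketch in plain Python. The names `cosine` and `top_k`, the product ids, and the toy 3-dimensional vectors are hypothetical stand-ins for the real 512-dimensional embeddings; this is not how Pinecone is implemented internally.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the k (id, score) pairs most similar to the query."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": hypothetical product ids mapped to tiny vectors.
items = {
    "shoe-a": [1.0, 0.0, 0.0],
    "shoe-b": [0.9, 0.1, 0.0],
    "shirt-c": [0.0, 1.0, 0.0],
}

# The query is twice as long as "shoe-a" but points the same way,
# so it still scores a perfect match against it.
results = top_k([2.0, 0.0, 0.0], items)
print(results)
```

Because only direction matters, the indexer and the search endpoint stay compatible as long as both produce embeddings from the same model and the same 512 leading dimensions.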

hf-space/README.md ADDED

	@@ -0,0 +1,43 @@

+---
+title: Visual Product Search
+emoji: 🔍
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.44.0
+app_file: app.py
+pinned: false
+license: mit
+---
+# Visual Product Search API
+AI-powered visual search using **Jina CLIP v2** embeddings.
+## Features
+- Upload an image to find visually similar products
+- Uses Jina CLIP v2 for state-of-the-art image embeddings
+- Queries Pinecone vector database for similarity search
+## API Usage
+```python
+from gradio_client import Client, handle_file
+client = Client("YOUR_USERNAME/visual-search")
+result = client.predict(
+    handle_file("path/to/image.jpg"),  # gradio_client 1.x expects file inputs wrapped in handle_file
+    api_name="/predict"
+)
+print(result)
+```
+## Setup
+Set these secrets in HuggingFace Space settings:
+- `PINECONE_API_KEY`: Your Pinecone API key
+- `PINECONE_HOST`: Your Pinecone index host (without https://)
+## Model
+Uses [jinaai/jina-clip-v2](https://huggingface.co/jinaai/jina-clip-v2) - a multilingual multimodal embedding model.
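Review note on the README's API usage: the function wired to the Search button (`search_simple` in app.py) returns plain text, one line per match in the form `N. title (handle) - score`. A caller that needs structured data could parse that text. The sketch below assumes exactly that line format and assumes handles contain no spaces or parentheses; `parse_results` and the sample products are hypothetical.

```python
import re

# Matches lines such as "1. Red Sneaker (red-sneaker) - 0.87".
# Assumption: the handle contains no parentheses or whitespace.
LINE_RE = re.compile(r"^(\d+)\.\s+(.*)\s+\(([^()\s]+)\)\s+-\s+(-?\d+\.\d+)$")


def parse_results(text):
    """Turn the text output into a list of dicts; skip lines that do not match."""
    products = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rank, title, handle, score = match.groups()
            products.append({
                "rank": int(rank),
                "title": title,
                "handle": handle,
                "score": float(score),
            })
    return products


# Made-up sample in the same format search_simple produces.
sample = (
    "1. Red Sneaker (red-sneaker) - 0.87\n"
    "2. Blue Boot (Winter) (blue-boot) - 0.61"
)
parsed = parse_results(sample)
print(parsed)
```

Messages such as "No similar products found" do not match the pattern, so they yield an empty list. Returning JSON from the Space would avoid the parsing step altogether.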

hf-space/app.py ADDED

	@@ -0,0 +1,163 @@

+"""
+Visual Search API - HuggingFace Space
+Provides image embedding endpoint using Jina CLIP v2.
+Queries Pinecone for similar products.
+Deploy to HuggingFace Spaces with ZeroGPU (free).
+"""
+import os
+import gradio as gr
+import torch
+import numpy as np
+from PIL import Image
+# Pinecone config from HF Secrets
+PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
+PINECONE_HOST = os.environ.get('PINECONE_HOST')
+# Model (loaded on first use)
+model = None
+def load_model():
+    """Load Jina CLIP v2 model."""
+    global model
+    if model is None:
+        print("Loading Jina CLIP v2...")
+        from transformers import AutoModel
+        model = AutoModel.from_pretrained(
+            "jinaai/jina-clip-v2",
+            trust_remote_code=True
+        )
+        if torch.cuda.is_available():
+            model = model.cuda()
+        model.eval()
+        print("Model loaded!")
+    return model
+def get_embedding(image: Image.Image) -> list:
+    """Generate 512-dim embedding for an image."""
+    m = load_model()
+    with torch.no_grad():
+        emb = m.encode_image(image)
+        if hasattr(emb, 'cpu'):
+            emb = emb.cpu().numpy()
+        emb = emb.flatten()
+        # Truncate before normalizing so the returned vector has unit length
+        if len(emb) > 512:
+            emb = emb[:512]
+        emb = emb / np.linalg.norm(emb)  # L2 normalize
+        return emb.tolist()
+def query_pinecone(embedding: list, top_k: int = 12) -> list:
+    """Query Pinecone for similar products."""
+    if not PINECONE_API_KEY or not PINECONE_HOST:
+        return []
+    import requests
+    resp = requests.post(
+        f"https://{PINECONE_HOST}/query",
+        headers={
+            "Api-Key": PINECONE_API_KEY,
+            "Content-Type": "application/json"
+        },
+        json={
+            "vector": embedding,
+            "topK": top_k,
+            "includeMetadata": True
+        },
+        timeout=15
+    )
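+    if resp.status_code != 200:
+        # Log the failure so an empty result is distinguishable from "no matches"
+        print(f"Pinecone query failed: HTTP {resp.status_code}")
+        return []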
+    matches = resp.json().get('matches', [])
+    return [
+        {
+            'handle': m.get('metadata', {}).get('handle', m.get('id')),
+            'title': m.get('metadata', {}).get('title', ''),
+            'score': m.get('score', 0),
+            'image_url': m.get('metadata', {}).get('image_url', '')
+        }
+        for m in matches
+    ]
+def search(image: Image.Image) -> dict:
+    """
+    Main search function.
+    Returns embedding and similar products.
+    """
+    if image is None:
+        return {"error": "No image provided"}
+    # Get embedding
+    embedding = get_embedding(image)
+    # Query Pinecone
+    products = query_pinecone(embedding)
+    return {
+        "embedding": embedding,
+        "products": products
+    }
+def search_simple(image: Image.Image) -> str:
+    """Simple search returning product handles."""
+    if image is None:
+        return "No image"
+    embedding = get_embedding(image)
+    products = query_pinecone(embedding)
+    if not products:
+        return "No similar products found"
+    return "\n".join([
+        f"{i+1}. {p['title']} ({p['handle']}) - {p['score']:.2f}"
+        for i, p in enumerate(products)
+    ])
+# Gradio Interface
+with gr.Blocks(title="Visual Search API") as demo:
+    gr.Markdown("# Visual Product Search")
+    gr.Markdown("Upload an image to find similar products.")
+    with gr.Row():
+        with gr.Column():
+            image_input = gr.Image(type="pil", label="Upload Image")
+            search_btn = gr.Button("Search", variant="primary")
+        with gr.Column():
+            output = gr.Textbox(label="Results", lines=15)
+    search_btn.click(
+        fn=search_simple,
+        inputs=[image_input],
+        outputs=[output],
+        api_name="predict"  # expose the endpoint as /predict, as the usage docs state
+    )
+    gr.Markdown("---")
+    gr.Markdown("### API Endpoint")
+    gr.Markdown("""
+    Use the `/predict` endpoint for programmatic access:
+    ```python
+    from gradio_client import Client, handle_file
+    client = Client("YOUR_SPACE_URL")
+    result = client.predict(handle_file(image_path), api_name="/predict")
+    ```
+    """)
+if __name__ == "__main__":
+    demo.launch()
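Review note on `get_embedding`: it combines two steps, L2 normalization and truncation to 512 dimensions. The order of those steps changes the length of the resulting vector but not its direction, so cosine ranking is the same either way; only truncating first yields a true unit vector. The sketch below shows this with a toy 4-dimensional vector truncated to 2 dimensions. All names and values are illustrative, and plain lists stand in for the numpy arrays used in app.py.

```python
import math


def norm(vec):
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    length = norm(vec)
    return [x / length for x in vec]


full = [3.0, 4.0, 12.0, 84.0]  # toy "embedding" with norm 85
dim = 2                         # toy stand-in for 512

normalize_then_truncate = l2_normalize(full)[:dim]
truncate_then_normalize = l2_normalize(full[:dim])

# Same direction, different lengths.
print(norm(normalize_then_truncate))
print(norm(truncate_then_normalize))
```

With a cosine index the difference is harmless, but it would matter if the index metric were ever switched to dot product or Euclidean distance.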

hf-space/requirements.txt ADDED

	@@ -0,0 +1,8 @@

+torch
+transformers
+pillow
+numpy
+requests
+einops
+timm
+gradio

indexer/.env.example ADDED

	@@ -0,0 +1,7 @@

+# Shopify Store
+SHOPIFY_STORE=25c0da-4
+SHOPIFY_ADMIN_TOKEN=shpat_xxxxx
+# Pinecone
+PINECONE_API_KEY=xxxxx
+PINECONE_HOST=xxxxx.pinecone.io

indexer/.gitignore ADDED

	@@ -0,0 +1,6 @@

+.env
+*.log
+__pycache__/
+*.pyc
+venv/
+.venv/

indexer/index.py ADDED

	@@ -0,0 +1,287 @@

+#!/usr/bin/env python3
+"""
+Visual Search Product Indexer
+Indexes Shopify products into Pinecone using local Jina CLIP v2 model.
+Uses the SAME model as the HF Space search endpoint for compatible embeddings.
+Usage:
+    python index.py                    # Index products matching the default --tags filter
+    python index.py --limit 10         # Test with 10 products
+    python index.py --clear            # Clear index first
+    python index.py --dry-run          # Test without uploading
+"""
+import os
+import sys
+import argparse
+import time
+from io import BytesIO
+from pathlib import Path
+try:
+    import torch
+    from PIL import Image
+    import requests
+    from tqdm import tqdm
+    from pinecone import Pinecone
+except ImportError as e:
+    print(f"Missing package: {e}")
+    print("Run: pip install -r requirements.txt")
+    sys.exit(1)
+def load_env():
+    """Load .env file."""
+    env_path = Path(__file__).parent / '.env'
+    if env_path.exists():
+        print(f"Loading {env_path}")
+        for line in env_path.read_text().splitlines():
+            line = line.strip()
+            if line and not line.startswith('#') and '=' in line:
+                key, value = line.split('=', 1)
+                os.environ[key.strip()] = value.strip().strip('"\'')
+load_env()
+# Config
+SHOPIFY_STORE = os.environ.get('SHOPIFY_STORE', '25c0da-4')
+SHOPIFY_ADMIN_TOKEN = os.environ.get('SHOPIFY_ADMIN_TOKEN')
+PINECONE_API_KEY = os.environ.get('PINECONE_API_KEY')
+PINECONE_HOST = os.environ.get('PINECONE_HOST')
+API_VERSION = "2024-01"
+# Model (loaded lazily)
+model = None
+device = None
+def check_config():
+    """Validate environment variables."""
+    missing = []
+    if not SHOPIFY_ADMIN_TOKEN:
+        missing.append('SHOPIFY_ADMIN_TOKEN')
+    if not PINECONE_API_KEY:
+        missing.append('PINECONE_API_KEY')
+    if not PINECONE_HOST:
+        missing.append('PINECONE_HOST')
+    if missing:
+        print("Missing environment variables:")
+        for v in missing:
+            print(f"  - {v}")
+        print("\nCopy .env.example to .env and fill in values")
+        sys.exit(1)
+def load_model():
+    """Load Jina CLIP v2 model."""
+    global model, device
+    print("Loading Jina CLIP v2 model...")
+    print("(First run downloads ~2GB)")
+    from transformers import AutoModel
+    device = "cuda" if torch.cuda.is_available() else "cpu"
+    print(f"Using: {device.upper()}")
+    model = AutoModel.from_pretrained(
+        "jinaai/jina-clip-v2",
+        trust_remote_code=True
+    ).to(device).eval()
+    print("Model loaded!")
+def get_pinecone():
+    """Connect to Pinecone."""
+    print("Connecting to Pinecone...")
+    pc = Pinecone(api_key=PINECONE_API_KEY)
+    index = pc.Index(host=f"https://{PINECONE_HOST}")
+    stats = index.describe_index_stats()
+    print(f"Connected! {stats.get('total_vector_count', 0)} vectors")
+    return index
+def fetch_products(limit=None, tags=None):
+    """Fetch products from Shopify."""
+    print(f"Fetching products from {SHOPIFY_STORE}...")
+    if tags:
+        print(f"  Tags filter: {tags}")
+    products = []
+    url = f"https://{SHOPIFY_STORE}.myshopify.com/admin/api/{API_VERSION}/products.json?limit=250&status=active&order=created_at%20desc"
+    headers = {"X-Shopify-Access-Token": SHOPIFY_ADMIN_TOKEN}
+    while url:
+        resp = requests.get(url, headers=headers, timeout=30)
+        resp.raise_for_status()
+        batch = resp.json().get('products', [])
+        # Filter by tags: keep products carrying at least one requested tag
+        if tags:
+            wanted = {t.strip().lower() for t in tags.split(',') if t.strip()}
+            batch = [p for p in batch if wanted & {
+                x.strip().lower() for x in p.get('tags', '').split(',')
+            }]
+        products.extend(batch)
+        print(f"  {len(products)} products...", end='\r')
+        if limit and len(products) >= limit:
+            products = products[:limit]
+            break
+        # Pagination: requests parses the Link header into resp.links
+        url = resp.links.get('next', {}).get('url')
+    print(f"\nFetched {len(products)} products")
+    return products
+def download_image(url):
+    """Download image as PIL."""
+    try:
+        url = url + ('&' if '?' in url else '?') + 'width=512'
+        resp = requests.get(url, timeout=15)
+        resp.raise_for_status()
+        return Image.open(BytesIO(resp.content)).convert('RGB')
+    except Exception:  # network or decode failure; caller counts it as an error
+        return None
+def get_embedding(image):
+    """Generate embedding."""
+    global model
+    try:
+        with torch.no_grad():
+            emb = model.encode_image(image)
+            if hasattr(emb, 'cpu'):
+                emb = emb.cpu().numpy()
+            emb = emb.flatten()
+            # Truncate before normalizing so the stored vector has unit length
+            if len(emb) > 512:
+                emb = emb[:512]
+            emb = emb / (emb ** 2).sum() ** 0.5  # L2 normalize
+            return emb.tolist()
+    except Exception as e:
+        print(f"\nEmbedding error: {e}")
+        return None
+def get_price(product):
+    """Extract price from variants."""
+    try:
+        return float(product.get('variants', [{}])[0].get('price', 0))
+    except Exception:  # no variants, or a missing or malformed price
+        return 0.0
+def main():
+    parser = argparse.ArgumentParser(description='Index products for visual search')
+    parser.add_argument('--limit', type=int, help='Limit products')
+    parser.add_argument('--tags', type=str, default='clothing,footwear', help='Comma-separated tags to keep (pass "" to index all products)')
+    parser.add_argument('--batch-size', type=int, default=100, help='Pinecone batch size')
+    parser.add_argument('--clear', action='store_true', help='Clear index first')
+    parser.add_argument('--dry-run', action='store_true', help='No upload')
+    args = parser.parse_args()
+    print("=" * 50)
+    print("  Visual Search Indexer")
+    print("=" * 50)
+    check_config()
+    load_model()
+    index = None
+    if not args.dry_run:
+        index = get_pinecone()
+        if args.clear:
+            print("Clearing index...")
+            index.delete(delete_all=True)
+            time.sleep(2)
+    products = fetch_products(limit=args.limit, tags=args.tags)
+    if not products:
+        print("No products found!")
+        return
+    print(f"\nIndexing {len(products)} products...")
+    vectors = []
+    ok, skip, err = 0, 0, 0
+    for product in tqdm(products, desc="Processing"):
+        if not product.get('images'):
+            skip += 1
+            continue
+        try:
+            # Get default image
+            images = product['images']
+            img_data = next((i for i in images if i.get('position') == 1), images[0])
+            img_url = img_data['src']
+            # Download & embed
+            img = download_image(img_url)
+            if not img:
+                err += 1
+                continue
+            emb = get_embedding(img)
+            if not emb:
+                err += 1
+                continue
+            # Build vector with metadata for future analysis
+            tags = [t.strip() for t in product.get('tags', '').split(',') if t.strip()]
+            vectors.append({
+                'id': str(product['id']),
+                'values': emb,
+                'metadata': {
+                    'product_id': product['id'],
+                    'handle': product['handle'],
+                    'title': product['title'],
+                    'vendor': product.get('vendor', ''),
+                    'product_type': product.get('product_type', ''),
+                    'tags': tags[:20],
+                    'price': get_price(product),
+                    'created_at': product.get('created_at', ''),
+                    'image_url': img_url
+                }
+            })
+            ok += 1
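+            # Batch upload (in a dry run, drop the batch so memory stays bounded)
+            if len(vectors) >= args.batch_size:
+                if not args.dry_run:
+                    index.upsert(vectors=vectors)
+                vectors = []
+        except Exception as e:
+            tqdm.write(f"Error on product {product.get('id')}: {e}")
+            err += 1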
+    # Final batch
+    if vectors and not args.dry_run:
+        index.upsert(vectors=vectors)
+    print("\n" + "=" * 50)
+    print("  Done!")
+    print("=" * 50)
+    print(f"  Indexed: {ok}")
+    print(f"  Skipped: {skip}")
+    print(f"  Errors:  {err}")
+    if args.dry_run:
+        print("  (dry run - nothing uploaded)")
+    print("=" * 50)
+if __name__ == "__main__":
+    main()
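Review note on `fetch_products`: pagination follows the `rel="next"` entry of the response's `Link` header until none remains. The header handling can be isolated into a small pure function, which makes it easy to test without calling Shopify. The sketch below uses only the standard library; `next_page_url`, the `example.myshopify.com` host, and the `page_info` values are made up for illustration.

```python
import re

# One comma-separated entry looks like: <https://...>; rel="next"
LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    for url, rel in LINK_RE.findall(link_header or ""):
        if rel == "next":
            return url
    return None


# Hypothetical header with both a previous and a next page.
header = (
    '<https://example.myshopify.com/admin/api/2024-01/products.json'
    '?page_info=abc&limit=250>; rel="previous", '
    '<https://example.myshopify.com/admin/api/2024-01/products.json'
    '?page_info=def&limit=250>; rel="next"'
)
print(next_page_url(header))
```

A fetch loop would then request the first page, call `next_page_url(resp.headers.get('Link'))`, and stop once it returns `None`.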

indexer/requirements.txt ADDED

	@@ -0,0 +1,8 @@

+torch
+transformers
+pillow
+pinecone-client
+requests
+tqdm
+einops
+timm