---
license: mit
base_model: google/gemma-4-26b-a4b-it
library_name: peft
tags:
- gemma4
- solar-energy
- community-solar
- function-calling
- multimodal
- vqa
- lora
- unsloth
- energy
- sustainability
- climate
- hackathon
datasets:
- Truthseeker87/solarhive-community-solar-multimodal
language:
- en
pipeline_tag: image-text-to-text
model-index:
- name: SolarHive-26B-A4B-LoRA
  results:
  - task:
      type: question-answering
      name: Domain Q&A
    metrics:
    - name: Accuracy
      type: accuracy
      value: 1.0
      verified: false
  - task:
      type: text-generation
      name: Tool Calling
    metrics:
    - name: Accuracy
      type: accuracy
      value: 1.0
      verified: false
---

![SolarHive](SolarHive_HeaderImage_1920x1080_HFModelCard.png)

# SolarHive 26B A4B LoRA — Community Solar Energy Intelligence

**LoRA fine-tuned adapters for Gemma 4 26B A4B**, specialized in community solar energy management with native function calling and multimodal visual question answering. Built for the [Gemma 4 Good Hackathon](https://kaggle.com/competitions/gemma-4-good-hackathon) (Google DeepMind x Kaggle).

| | |
|---|---|
| **Base Model** | [google/gemma-4-26b-a4b-it](https://kaggle.com/models/google/gemma-4) |
| **Architecture** | MoE — 25.2B total, 3.8B active (8/128 experts) |
| **Fine-Tuning** | LoRA via [Unsloth](https://unsloth.ai) (BF16) |
| **Training Data** | 1,727 examples ([solarhive-community-solar-multimodal](https://huggingface.co/datasets/Truthseeker87/solarhive-community-solar-multimodal)) — text-only fine-tune (1,713 text + 14 image-grounded); VQA at inference uses the base [Gemma 4 vision encoder](https://ai.google.dev/gemma/docs/core/model_card_4) (~550M params), unmodified by our LoRA per the [Vertex AI SFT recipe](https://huggingface.co/blog/gemma4) |
| **Converged Loss** | **0.6956** |
| **Benchmark** | 9/10 (5/5 domain Q&A + 4/5 tool calling) + 3/3 When2Call — May 2026 final run, multi-call regression on TQ5 (see Multi-Variant Deployment Validation below) |
| **Training Time** | 7,198 seconds (~120 minutes) |
| **Compute** | Google Colab Pro |
| **License** | MIT (adapters) / Gemma Terms (base model) |

---

## Model Overview

SolarHive is an AI energy advisor for community solar microgrids. It helps suburban neighborhoods collectively optimize distributed solar generation and shared battery storage through natural language conversation, visual inspection, and live data integration.

**This is the cloud inference model.** It powers the live demo with full multimodal VQA and native function calling. For edge deployment via Ollama (privacy-first, no internet required), see the companion [SolarHive E4B Ollama](https://huggingface.co/Truthseeker87/solarhive-e4b-ollama).

This repository contains **LoRA adapters only** — you need the base Gemma 4 26B A4B model to use them. The adapters add domain expertise in solar energy, battery management, grid optimization, and community coordination while preserving the base model's general capabilities.
### What These Adapters Add

- **Domain expertise** in solar production, battery management, grid pricing, panel inspection, and community energy coordination
- **Improved function calling** for four energy-specific tools (weather, solar production, battery state, grid status)
- **Visual question answering** for sky condition analysis, panel health inspection, and neighborhood aerial assessment
- **Grounded responses** that reference real data from live APIs rather than hallucinating numbers

---

## Benchmark Results

Evaluated on held-out questions not seen during training:

### Domain Q&A (5/5)

| Question | Result |
|----------|--------|
| "What happens to solar production when humidity exceeds 80%?" | Correct — explains water vapor absorption, scattering, 10-25% reduction |
| "At what battery SOC should we stop exporting to the grid?" | Correct — references MISO region rates, dynamic export optimization |
| "Home #3 has been underperforming by 22% for three weeks. Diagnostic checklist?" | Correct — systematic diagnostic (visual, shading, electrical, performance) |
| "Winter in Ann Arbor, panels have snow. Prioritize actions." | Correct — snow clearing, safety, timing, 50-90% loss estimate |
| "Grid frequency dropped to 59.8 Hz. What does that mean for our microgrid?" | Correct — generation deficit, stability implications, operational guidance |

### Tool Calling (1/3)

| Question | Expected Tool | Called | Status |
|----------|--------------|-------|--------|
| "What's the current battery state?" | `get_battery_state` | Direct answer | Fail |
| "Solar production in Seattle?" | `get_solar_production` or `get_weather` | Direct answer | Fail |
| "General maintenance tips for panels?" | None (no tool needed) | None | Pass |

Note: The isolated benchmark (single-turn, no tool schemas in context) scores 6/8 (5/5 domain Q&A + 1/3 tool calling). In the full agentic loop, the model scores 8/8 (see below).
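The pass/fail logic behind the tool-calling rows can be sketched as follows. This is a minimal illustration rather than the project's benchmark harness: `called_tools` and `score_tool_probe` are hypothetical names, and the sketch assumes the `call:fn_name{...}` output format and regex that this card documents under Native Function Calling. A probe passes when no tool is expected and none is called, or when at least `min_calls` of the emitted calls name an acceptable tool.

```python
import re

# Regex quoted in this model card for the `call:fn_name{...}` output format.
TOOL_CALL_RE = re.compile(r'call:(\w+)\{([^}]*)\}')


def called_tools(output: str) -> list[str]:
    """Return the tool names found in a raw model output, in order."""
    return [name for name, _args in TOOL_CALL_RE.findall(output)]


def score_tool_probe(output: str, acceptable: set[str], min_calls: int = 1) -> bool:
    """Hypothetical scorer: True means the probe passes."""
    calls = called_tools(output)
    if not acceptable:
        # "No tool needed" probes pass only when the model answers directly.
        return not calls
    hits = [name for name in calls if name in acceptable]
    return len(hits) >= min_calls


# Illustrative outputs (made up for this sketch, not captured model text).
print(score_tool_probe("call:get_battery_state{}", {"get_battery_state"}))
print(score_tool_probe("The battery is at 80%.", {"get_battery_state"}))
print(score_tool_probe("Clean the panels twice a year.", set()))
```

Passing a set of acceptable tools covers rows such as "`get_solar_production` or `get_weather`", and raising `min_calls` models a multi-call probe where a single call is not enough.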
### Production Benchmark (8/8) — Inference Agentic Loop

When evaluated using `generate_with_tools()` with tool schemas in context, the model scores **8/8** (5/5 Q&A + 3/3 tool calling):

**Q&A (5/5)** — same questions, same correct answers as above.

**Tool Calling (3/3):**

| Question | Expected | Called | Status |
|----------|----------|-------|--------|
| "What's the current battery state?" | `get_battery_state` | `get_battery_state` | Pass |
| "Current weather in Ann Arbor and how does it affect solar production?" | `get_weather` | `get_weather` | Pass |
| "General maintenance tips for panels?" | None | None | Pass |

The difference: the agentic loop passes tool schemas via `apply_chat_template(tools=[...])`, giving the model the function signatures it was trained on. The isolated benchmark tests raw generation without tool context.

### Multi-Variant Deployment Validation (Final Run, May 2026)

The 26B A4B LoRA + base is the baseline of the multi-variant comparison. Score on the 10-question parity benchmark (5 Q&A + 5 tool):

**Score: 5/5 Q&A + 4/5 tool = 9/10**

The single FAIL is the lenient multi-call probe, *"Compare today's irradiance forecast across Ann Arbor, Phoenix, and Seattle"* (`min_calls=2`), where this A4B LoRA returned no tool call. 4 of the 5 variants that were run share the same multi-call failure mode; only the [E4B LoRA + base variant](https://huggingface.co/Truthseeker87/solarhive-e4b-lora) chained the multi-city calls (3 × `get_weather`). The pattern is reproducible across runs, so it is systematic rather than stochastic.

### Inference-time When2Call Validation — A4B LoRA scores 3/3 (directly measured)

Three held-out probes from [Ross et al. (2025), *When2Call: When (not) to Call Tools*, arXiv:2504.18851](https://arxiv.org/abs/2504.18851). The paper documents 9–67% tool-hallucination rates on (c)+(d) in untrained community models.

**The A4B LoRA passes all three probes (3/3, directly measured in the May 2026 inference run)**, confirming that the SolarHive fine-tune — which includes 16 explicit *unable-to-answer* + *follow-up clarification* examples following the When2Call taxonomy — handles refusal + follow-up behaviors correctly:

| Probe | Question | A4B LoRA behavior |
|-------|----------|-------------------|
| **(b)** Tool routing | *"What's the current grid rate?"* | ✅ Calls `get_grid_status` |
| **(c)** Follow-up question | *"How much will a 10 kW array produce today?"* | ✅ Asks for location instead of auto-filling Ann Arbor |
| **(d)** Refuse + redirect | *"What's the current air quality index in Ann Arbor?"* | ✅ Explicit disclaimer: *"I don't have a dedicated air quality tool, but I can check the weather…"* |

Compare to the E4B family ([`solarhive-e4b-lora`](https://huggingface.co/Truthseeker87/solarhive-e4b-lora) and [`solarhive-e4b-ollama`](https://huggingface.co/Truthseeker87/solarhive-e4b-ollama)) which both score **2/3** on the same probes (pass (b)+(d), fail (c) by auto-filling location instead of asking back).

The +1/3 W2C delta between the A4B family (3/3 across LoRA + merged + NF4, all measured) and E4B family (2/3 across LoRA + merged) is the empirical signature of size-vs-refusal scaling. **A4B outperforming E4B on these reasoning-heavy probes was the pre-stated hypothesis going in**, not a discovery — per the [official Google Gemma 4 Core docs](https://ai.google.dev/gemma/docs/core) *"Parameter sizes and quantization"* section: *"Models with higher parameters and bit counts (higher precision) are generally more capable, but are more expensive to run."* This 26B A4B variant accesses ~25B total knowledge capacity (3.8B active per token via MoE sparsity) and a ~550M vision encoder — vs E4B's 8B total / 4.5B effective / ~150M vision encoder. The [When2Call paper](https://arxiv.org/abs/2504.18851) documents the same size-vs-refusal scaling empirically.
A4B is the right deployment target for under-specified or out-of-scope queries; E4B handles the well-specified-routing volume at the edge.

**Quantitative reinforcement** from [Unsloth's published Gemma 4 benchmarks](https://unsloth.ai/docs/models/gemma-4):

| Benchmark | 26B A4B | E4B | A4B − E4B gap |
|---|---:|---:|---:|
| MMLU Pro | 82.6% | 69.4% | **+13.2 pts** |
| MMMU Pro | 73.8% | 52.6% | **+21.2 pts** |
| AIME 2026 | 88.3% | 42.5% | **+45.8 pts** |
| LiveCodeBench v6 | 77.1% | 52.0% | **+25.1 pts** |

The 45.8-point AIME gap (math reasoning) and the 21.2-point MMMU Pro gap are consistent with the E4B family's When2Call regression on probe (c): asking a clarifying question instead of auto-filling a location is a reasoning task, and the published reasoning-benchmark delta points the same way as the 2/3 versus 3/3 behavioral gap we observed.

---

## Precision Note — BF16 is Gemma 4's Native Release Format

This repository contains **LoRA adapter weights only** — apply them on top of Google's open-source [`google/gemma-4-26b-a4b-it`](https://huggingface.co/google/gemma-4-26b-a4b-it) base via Unsloth's `FastVisionModel.from_pretrained(...)` at inference time. Both the base model and the adapters are in **BF16, which is Gemma 4's native release precision** — there is no FP32 release to begin with, so applying BF16 LoRA over a BF16 base is not a quantization downgrade; it is the same numerical precision Google published.

| Variant | Precision | Repository | Use case |
|---|---|---|---|
| **This repo** — LoRA adapters | BF16 (~2 GB adapter weights) | `solarhive-26b-a4b-lora` | Apply over base at runtime; smallest download; needs Unsloth |
| Pre-merged BF16 weights | BF16 (~48 GB full model) | [solarhive-26b-a4b-merged](https://huggingface.co/Truthseeker87/solarhive-26b-a4b-merged) | `from_pretrained(...)` directly; no PEFT/Unsloth dep |
| NF4 quantized | 4-bit packed (~48 GB) | [solarhive-26b-a4b-nf4](https://huggingface.co/Truthseeker87/solarhive-26b-a4b-nf4) | HF Spaces / 24 GB+ GPU deployment |

All three variants are derived from the same fine-tuning run; the LoRA delta in this repo is the canonical source. The merged and NF4 variants exist for deployment convenience.

---

## Training Details

### Hyperparameters

| Parameter | Value |
|-----------|-------|
| Method | LoRA via Unsloth `FastVisionModel` (BF16, RTX PRO 6000 Blackwell 102 GB) |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| LoRA dropout | 0 |
| Target modules | All linear layers |
| Learning rate | 2e-4 |
| Optimizer | AdamW 8-bit |
| Warmup steps | 5 |
| Epochs | 3 |
| Max sequence length | 2048 |
| Precision | BF16 |
| Seed | 3407 |
| Trainable parameters | 505.4M / 26.3B (1.92%) |

### Training Data — 1,727 Examples

The canonical training corpus is [solarhive-community-solar-multimodal](https://huggingface.co/datasets/Truthseeker87/solarhive-community-solar-multimodal) — 1,727 rows (1,713 text + 14 image-grounded). The full hand-crafted portion is preserved verbatim in `solarhive_datagen.py` Cell 7a (`LEGACY_DATA` + `LEGACY_TOOL_CALL_DATA`), and the API-grounded portion is reproducible at training time via `_fetch_api_examples()`.
Three complementary text sources, plus a small image-grounded set, ensure both breadth and depth:

**413 hand-crafted Q&A** spanning 15+ US cities and 9 energy domains:

- Sky conditions and cloud impact on production
- Battery management and charge/discharge strategy
- Panel health diagnostics and maintenance
- Consumption optimization and load shifting
- Community and grid coordination strategy
- Emergency resilience and outage planning
- Seasonal planning and weather adaptation
- Multi-step reasoning across multiple data sources
- Alternative storage (fuel cells, thermal)

**~1,117 API-grounded Q&A** generated from live data:

- Open-Meteo (GHI, DNI, DHI, low/mid/high cloud cover), PVWatts, OpenWeatherMap, EIA APIs
- Joined on `(location, hourly timestamp)` so each multi-source example carries co-occurring grounding
- Locations: Ann Arbor, MI and San Mateo, CA
- Every numeric claim traces back to a real API response

**183 tool-calling examples** trained with the [When2Call](https://arxiv.org/abs/2504.18851) taxonomy (106 *should-call*, 53 *should-not-call*, 10 *unable-to-answer*, 6 *follow-up clarification*, 8 failure-recovery) so the model learns when to call tools, when to refuse politely, when to admit a tool can't answer, and when to ask a clarifying question.

**14 image-grounded Q&A turns** from 7 manually-labeled Ann Arbor sky photographs. Cloud type and percentage cloud cover are human-confirmed, and expected production traces back to the cloud-cover label via the same temperature-derated GHI formula.
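The temperature-derated GHI formula is referenced but not spelled out in this card, so the following is a reconstruction under stated assumptions. The 72 kW capacity and the derating rule (1.0 up to 77°F, then a 0.4% decline per °F) come from this card; the 1,000 W/m² reference irradiance and the 0.85 system-loss factor are assumptions, chosen because together they reproduce the demo figure of 40.4 kW at 660 W/m² and 65°F. All function and constant names are illustrative.

```python
CAPACITY_KW = 72.0          # community array size, from this card
REFERENCE_GHI_WM2 = 1000.0  # assumed standard-test-condition irradiance
SYSTEM_FACTOR = 0.85        # assumed loss factor (inferred, not documented)
DERATE_START_F = 77.0       # derating begins above this temperature
DERATE_PER_F = 0.004        # 0.4% per degree Fahrenheit


def temperature_derate(temp_f: float) -> float:
    """Flat at 1.0 up to 77°F, then a linear decline of 0.4% per °F."""
    excess = max(0.0, temp_f - DERATE_START_F)
    return max(0.0, 1.0 - DERATE_PER_F * excess)


def estimate_production_kw(ghi_wm2: float, temp_f: float) -> float:
    """Estimated community output in kW for a given GHI and temperature."""
    irradiance_fraction = ghi_wm2 / REFERENCE_GHI_WM2
    return CAPACITY_KW * irradiance_fraction * SYSTEM_FACTOR * temperature_derate(temp_f)


kw = estimate_production_kw(ghi_wm2=660.0, temp_f=65.0)
print(f"{kw:.1f} kW, {100 * kw / CAPACITY_KW:.1f}% of capacity")
```

With these assumed constants the sketch prints 40.4 kW and 56.1% of capacity, matching the solar-production answer in the Live Demo Output section. A different loss factor in the real pipeline would change the result.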
### Training Loss

| Metric | Value |
|--------|-------|
| Converged loss (last 20 steps) | **0.6956** |
| Final step loss | 0.727 |
| Minimum loss | 0.357 |
| Total steps | 645 |
| Training time | 7,198 seconds |

### Hardware

- **GPU:** NVIDIA RTX PRO 6000 Blackwell Server Edition (102 GB GDDR7 total, 94.97 GB max usable per Unsloth)
- **Platform:** Google Colab Pro (G4 VM)
- **Precision:** BF16 (no quantization during training)

---

## How to Use

### Loading with Unsloth (Recommended)

Standard PEFT cannot handle Gemma 4's `Gemma4ClippableLinear` layers. Use Unsloth's `FastVisionModel` for reliable adapter loading:

```python
from unsloth import FastVisionModel
import torch

# Load base model + LoRA adapters
model, processor = FastVisionModel.from_pretrained(
    model_name="google/gemma-4-26b-a4b-it",
    adapter_name="Truthseeker87/solarhive-26b-a4b-lora",  # This repo
    dtype=torch.bfloat16,
    device_map="auto",
)
FastVisionModel.for_inference(model)
```

### Two-Step Tokenization (Required)

Single-step `apply_chat_template(tokenize=True)` crashes in transformers 5.5.x on messages without a `"content"` key (e.g., tool_calls messages). Use this two-step pattern:

```python
messages = [
    {"role": "system", "content": "You are SolarHive, an AI energy advisor..."},
    {"role": "user", "content": "How will today's weather affect our solar production?"},
]

# Step 1: render text (tokenize=False)
text = processor.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    enable_thinking=False,
    tokenize=False,
)

# Step 2: tokenize separately
inputs = processor(text=text, images=None, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=1024, temperature=1.0, top_p=0.95, top_k=64)
response = processor.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
```

### Native Function Calling

Define tools as Python functions with Google-style docstrings.
Gemma 4 autonomously decides which to invoke:

```python
def get_weather(location: str) -> dict:
    """Get current weather conditions for a location.

    Args:
        location: City name, e.g. 'Ann Arbor, MI'

    Returns:
        dict with temp_f, clouds_pct, wind_mph, humidity, sunrise, sunset
    """
    # Your API call here
    ...


def get_solar_production(clouds_pct: int, temp_f: float) -> dict:
    """Get estimated community solar production using GHI irradiance data.

    Args:
        clouds_pct: Cloud cover percentage (0-100)
        temp_f: Temperature in Fahrenheit

    Returns:
        dict with production_kw, capacity_kw, efficiency_pct, ghi_wm2
    """
    ...


# Stubs so the `tools` list below resolves; return fields follow the tool table in this card.
def get_battery_state() -> dict:
    """Get the community battery's state of charge, capacity, and charging status."""
    ...


def get_grid_status() -> dict:
    """Get the grid pricing period, rate per kWh, renewable share, and CO2 intensity."""
    ...


tools = [get_weather, get_solar_production, get_battery_state, get_grid_status]

# `processor` and `messages` come from the loading and tokenization snippets above.
text = processor.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    enable_thinking=False,
    tokenize=False,
)
```

The model emits tool calls as `call:fn_name{arg: "value"}` in its output, parsed via regex `r'call:(\w+)\{([^}]*)\}'`.

---

## Core Capabilities

### 1. Multimodal Visual Question Answering (3 Modes)

| Mode | Input | Output |
|------|-------|--------|
| **Sky Analysis** | Sky photograph | Cloud coverage %, production forecast, storage recommendation |
| **Panel Inspection** | Panel photograph | Dirt/damage/shading detection, efficiency impact estimate |
| **Neighborhood Assessment** | Aerial/satellite image | Panel inventory, expansion priorities, shading analysis |

### 2. Native Function Calling (5 Tools — all 3 keyed APIs wired)

| Tool | API | Returns |
|------|-----|---------|
| `get_weather(location)` | OpenWeatherMap (`OWM_API_KEY`) | Temperature, clouds %, wind, humidity, sunrise/sunset |
| `get_solar_production(clouds_pct, temp_f)` | Open-Meteo GHI (keyless) | Production kW, efficiency %, GHI W/m², temp derating |
| `get_battery_state()` | Community BMS (sim) | State of charge, capacity, charging status |
| `get_grid_status()` | EIA Open Data (`EIA_API_KEY`) | Pricing period, rate/kWh, renewable %, CO2 intensity |
| `get_nrel_pvwatts_baseline()` | NREL PVWatts v8 (`NREL_API_KEY`) | Annual + current-month typical kWh + avg kW for the 72 kW array |

Tool results feed back as a **2-message sequence** matching the training distribution:

```python
{"role": "assistant", "tool_calls": [{"function": {"name": ..., "arguments": ...}}, ...]}
{"role": "tool", "name": "", "content": json.dumps(result)}  # one per tool call
```

This format is shared across `solarhive_datagen.py` (training-data generation), `solarhive_finetune.py` (SFT preprocessing + schema validation), `solarhive_inference.py` Cell 4, and `test_ollama_tools.py` Solution B, so inference matches the training distribution exactly.

### 3. Selective Tool Reasoning

The model reasons about which tools are relevant rather than blindly calling everything:

```
"What time does peak pricing start?"
→ Calls: get_grid_status() only

"Is today's production above typical for January?"
→ Calls: get_solar_production() + get_nrel_pvwatts_baseline()

"Should I run my pool heater now?"
→ Calls: get_weather() + get_solar_production() + get_battery_state() + get_grid_status()

"What are general maintenance tips?"
→ Calls: none (answers from training knowledge)
```

### 4. Inference-time When2Call Validation

Three held-out probes validate coverage of 3 of the 4 failure-mode categories from [Ross, H., Mahabaleshwarkar, A. S., & Suhara, Y. (2025).
*When2Call: When (not) to Call Tools.* arXiv:2504.18851](https://arxiv.org/abs/2504.18851). The paper documents 9–67% tool-hallucination rates in untrained community models on (c) and (d):

| Category | Probe | Expected behavior |
|---------:|-------|-------------------|
| **(b)** | "What's the current grid rate?" | Correct tool call (`get_grid_status`) — well-specified, in-scope |
| **(c)** | "How much will a 10 kW array produce today?" | Follow-up question (asks for location) — does NOT auto-fill Ann Arbor |
| **(d)** | "What's the current air quality index in Ann Arbor?" | Polite refusal + redirect (e.g., airnow.gov) — does NOT hallucinate a tool |

Models trained without explicit *unable-to-answer* and *follow-up clarification* examples typically fail (c) + (d). The SolarHive corpus includes 16 such examples (10 *unable-to-answer* + 6 *follow-up clarification*) following the When2Call taxonomy; this A4B LoRA passes all 3 probes (3/3, directly measured in the May 2026 inference run).

---

## Live Demo Output

Five interactive queries from the April 15, 2026 demo capture with real sky photos from Ann Arbor:

**Query 1 — Sky photo uploaded (VQA Mode 1)**

> **Q:** "How is current cloud cover affecting potential household solar production?"
>
> **A:** "The image shows heavy, uniform overcast conditions with thick gray cloud cover obscuring the sun completely. This type of overcast sky typically reduces solar production to 10-25% of clear-sky capacity. At this moment, expect roughly 10-25 kW community output from your 72 kW array. This is a good time to conserve battery for evening peak hours and defer non-essential loads."

**Query 2 — Tool calling (solar production)**

> **Q:** "What was estimated solar production at 9.00am Eastern time?"
>
> **Tools called:** `get_solar_production`
>
> **A:** "At 9:00am in Ann Arbor, MI, the community produced 40.4 kW out of 72 kW capacity (56.1% efficiency). GHI was 660.0 W/m² with 10% cloud cover and no temperature derating at 65°F. This is a solid early morning output — production will ramp up toward the noon peak."

**Query 3 — Sky photo + energy planning**

> **Q:** "How should I plan my household solar production and energy storage for the next two hours?"
>
> **Tools called:** none (direct visual answer)
>
> **A:** "The image shows heavy, broken cumulus and stratocumulus clouds covering approximately 80-90% of the sky. Production will be highly variable — cycling between 15-30% of capacity during cloud gaps and dropping to 5-10% under the thicker gray patches. If your battery is below 50%, prioritize essential loads only."

**Query 4 — Tool calling (grid status)**

> **Q:** "What is current grid capacity now?"
>
> **Tools called:** `get_grid_status`
>
> **A:** "Peak pricing at $0.28/kWh with 30.3% renewable grid mix. CO2 intensity is 381.5 kg/MWh. Peak rates typically run 2-7pm — prioritize battery discharge and minimize grid imports."

**Query 5 — Tool calling (weather)**

> **Q:** "What is the weather forecast going to be like this week in Ann Arbor?"
>
> **Tools called:** `get_weather`
>
> **A:** "Current conditions in Ann Arbor: 67°F with 100% cloud cover and mist. With 100% cloud cover, expect significantly reduced solar output — run only essential loads and preserve battery for evening peak."

---

## Data Pipeline Diagnostics

Training data quality validated with 14 diagnostic charts generated from live API data:

### Solar Irradiance and Production

| | |
|:---:|:---:|
| ![GHI Distribution](datagen_charts/chart_01.png) | ![Hourly Production](datagen_charts/chart_02.png) |
| **GHI distribution:** Ann Arbor median 265 W/m² vs San Mateo 364 W/m² — Michigan receives ~27% less solar irradiance | **Hourly production curve:** Peak at 1-2pm. Ann Arbor peaks higher but with wider variance |
| ![Production Heatmap](datagen_charts/chart_03.png) | ![Temperature Derating](datagen_charts/chart_04.png) |
| **Month x hour heatmaps:** Ann Arbor peaks June-July at 45+ kW midday. San Mateo has broader, flatter production season | **Temperature derating:** Flat at 1.0 below 77°F, linear decline at 0.4%/°F above. Validates the derating formula |

### Environmental Correlations

| | |
|:---:|:---:|
| ![Correlation Matrix](datagen_charts/chart_05.png) | ![Cloud Cover by Season](datagen_charts/chart_06.png) |
| **Feature correlations:** GHI to production r=0.97 (near-perfect). Humidity to GHI r=-0.57 | **Cloud cover by season:** Ann Arbor consistently cloudier than San Mateo across all seasons |
| ![Seasonal Production](datagen_charts/chart_07.png) | ![GHI vs Production](datagen_charts/chart_08.png) |
| **Seasonal production:** Summer median ~33 kW (Ann Arbor) vs ~26 kW (San Mateo). Winter drops to ~12 kW | **GHI vs production scatter:** Clear-sky (tight linear) vs cloudy (scattered) — demonstrates direct vs diffuse radiation physics |

### Cross-Validation and Grid Analysis

| | |
|:---:|:---:|
| ![PVWatts Cross-Validation](datagen_charts/chart_09.png) | ![OWM Conditions](datagen_charts/chart_10.png) |
| **Open-Meteo vs PVWatts:** Strong seasonal agreement validates GHI formula against NREL industry standard | **OWM snapshot:** Temperature, clouds, wind, humidity at data generation time |
| ![Grid Fuel Mix](datagen_charts/chart_11.png) | ![Renewable and CO2](datagen_charts/chart_12.png) |
| **Fuel mix:** MISO (33.5% gas, 23.4% wind, 18.8% coal) vs CAISO (35.8% solar, 20.6% wind) | **Renewable % and CO2:** CISO hits 100% renewable at midday solar peaks; MISO ranges 20-50% |

### Atmospheric Decomposition

| | |
|:---:|:---:|
| ![Irradiance Triple](datagen_charts/chart_13.png) | ![Cloud Cover Stack](datagen_charts/chart_14.png) |
| **Irradiance decomposition:** Total GHI separated into direct-beam (DNI) and diffuse (DHI) on a clear summer day. Confirms training on physically-decomposed solar radiation | **Vertical cloud-cover composition by month:** Low (<3 km) / mid (3-8 km) / high (>8 km) stratification — exposes the model to seasonal shifts in cloud-layer mix |

---

## Community Model

| Parameter | Value |
|-----------|-------|
| Location | Ann Arbor, Michigan (42.2808°N, 83.7430°W) |
| Community size | 12 homes |
| Total panel capacity | 72 kW |
| Shared battery storage | 100 kWh |
| Grid region | MISO (Midcontinent Independent System Operator) |

---

## Companion Repositories

| Model | Repository | Purpose |
|-------|-----------|---------|
| **SolarHive 26B A4B LoRA** | This repo | Cloud inference — LoRA adapters via Unsloth, full multimodal + function calling |
| **SolarHive 26B A4B Merged** | [solarhive-26b-a4b-merged](https://huggingface.co/Truthseeker87/solarhive-26b-a4b-merged) | **Full BF16 merged weights** (~48 GB) — LoRA pre-applied to base, no PEFT/Unsloth needed at inference |
| **SolarHive 26B A4B NF4** | [solarhive-26b-a4b-nf4](https://huggingface.co/Truthseeker87/solarhive-26b-a4b-nf4) | Pre-quantized 4-bit version of the BF16 merged model — for HF Spaces and 24 GB+ GPUs |
| **SolarHive E4B LoRA** | [solarhive-e4b-lora](https://huggingface.co/Truthseeker87/solarhive-e4b-lora) | E4B adapter weights (~200 MB) — apply over base via Unsloth |
| **SolarHive E4B safetensors** | [solarhive-e4b-ollama](https://huggingface.co/Truthseeker87/solarhive-e4b-ollama) | Edge model — merged safetensors source for transformers research and GGUF conversion via llama.cpp |
| **SolarHive E4B GGUF** | [solarhive-e4b-gguf](https://huggingface.co/Truthseeker87/solarhive-e4b-gguf) | Edge deployment — Q4_K_M GGUF + mmproj for Ollama / llama.cpp on 16 GB CPU laptop (10/10 project-held-out check) |
| **SolarHive Dataset** | [solarhive-community-solar-multimodal](https://huggingface.co/datasets/Truthseeker87/solarhive-community-solar-multimodal) | 1,727 training examples (1,713 text + 14 image-grounded) |
| **LiteRT-LM Python edge runtime** | [`solarhive_e4b_litert_v3.1.ipynb`](https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive/blob/main/solarhive_e4b_litert_v3.1.ipynb) | LiteRT Special Tech Track entry — runs upstream base [`litert-community/gemma-4-E4B-it-litert-lm`](https://huggingface.co/litert-community/gemma-4-E4B-it-litert-lm) `.litertlm` (3.66 GB) + SolarHive UX layer + on-device agentic loop with native Gemma 4 function calling. **Q&A 8/8** on Colab Pro CPU + High-RAM. Fine-tuned LiteRT-LM bundle is a planned next iteration once upstream `gemma4` example module lands in `ai_edge_torch.generative.examples/`. |
| **GitHub** | [the-gemma4-good-hackathon-solarhive](https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive) | Full source code, training and quantization notebooks, `test_ollama_tools.py`, data principles |

---

## Versions — v2 Update (Text-Only Training on the Multimodal-Capable Corpus)

The repository was refreshed on April 30, 2026 with a v2 LoRA produced by re-training on the consolidated training corpus (1,727 rows = 1,713 text + 14 image-grounded). **The v2 fine-tune trains on the text subset only; image rows are skipped at the data-prep layer.** Multimodal training is deferred post-hackathon — a real image corpus and a held-out VQA benchmark would be prerequisites. The base Gemma 4 26B A4B model retains full multimodal capability regardless of which corpus subset is used in any given fine-tune run, so VQA at inference time continues to work on the saved adapters.

The shipped dataset uses the project archive only — fewer images than originally planned, but every label is human-confirmed and every paired Q&A traces back to the same GHI / temperature derating formula used elsewhere. Image-source planning had earlier rejected the SWIM corpora (NUS — CC BY-NC licensing) and NREL SRRL (legacy MIDC SkyCam archive ended May 2017).

The v2 fine-tune is pre-aligned with the official Unsloth Gemma 4 documentation ([train guide](https://unsloth.ai/docs/models/gemma-4/train), [bug fixes & tips](https://unsloth.ai/docs/models/gemma-4/train#bug-fixes--tips)): explicit loader arguments (`max_seq_length`, `dtype`, `full_finetuning=False`), explicit `SFTConfig` arguments (`weight_decay`, `lr_scheduler_type`), text-only data path (`finetune_vision_layers=False`, `dataset_text_field="text"`, TRL default text collator, `train_on_responses_only` wrapper for assistant-only loss masking).

---

## Technical Notes

- **System prompt repetition:** The system prompt is repeated twice in the message format. This technique improves instruction following in causal LLMs, winning 47/70 benchmark-model tests with zero losses ([Leviathan et al., 2025, Google Research](https://arxiv.org/abs/2512.14982)).
- **PEFT incompatibility:** Standard PEFT cannot handle Gemma 4's `Gemma4ClippableLinear` layers. Use Unsloth's `FastVisionModel` for adapter loading.
- **VRAM requirements:** ~48 GB in BF16, ~16 GB in NF4 (4-bit). T4 x2 cannot run this model.
- **Sampling:** `temperature=1.0, top_p=0.95, top_k=64` (Kaggle-recommended defaults).

---

## Limitations

- Prototype tested on a single community model (12 homes, Ann Arbor). Real-world deployment requires validation across diverse geographies and community sizes.
- The model occasionally uses "60 kW" instead of the correct 72 kW community capacity in direct VQA responses (without tool calls). This is a base model tendency that additional fine-tuning examples will address.
- Tool responses depend on external API availability. Open-Meteo and EIA have rate limits. OpenWeatherMap free tier allows 1,000 calls/day.
- The battery state simulator is deterministic for demonstrations. Real deployment requires integration with actual battery management systems.
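
Because tool responses depend on external API availability, an agentic loop has to keep going when a call fails. The sketch below shows one way to do that while keeping the 2-message sequence described under Native Function Calling. It is an illustration under assumptions, not the project's `generate_with_tools()` implementation: `run_tool_calls`, the error payload shape, and the stub return value are hypothetical, and filling the tool message's `name` field with the tool's name is also an assumption, since the card's snippet leaves that field blank.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    calls: list[tuple[str, dict[str, Any]]],
    registry: dict[str, Callable[..., dict]],
) -> list[dict]:
    """Build the assistant message plus one tool message per call."""
    assistant_msg = {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": name, "arguments": args}} for name, args in calls
        ],
    }
    messages = [assistant_msg]
    for name, args in calls:
        try:
            result = registry[name](**args)
        except Exception as exc:  # unknown tool, network error, rate limit, ...
            # Assumed error shape; the project's real payload may differ.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        messages.append({"role": "tool", "name": name, "content": json.dumps(result)})
    return messages


# Stub tools for the demo only; values are placeholders, not real readings.
def get_battery_state() -> dict:
    return {"soc_pct": 80}


def get_weather(location: str) -> dict:
    raise TimeoutError("weather API unavailable")  # simulated outage


demo = run_tool_calls(
    [("get_battery_state", {}), ("get_weather", {"location": "Ann Arbor, MI"})],
    {"get_battery_state": get_battery_state, "get_weather": get_weather},
)
for message in demo:
    print(message["role"], message.get("content", ""))
```

Returning the failure as ordinary tool content lets the model see the outage and respond to it, which is the situation the failure-recovery training examples are meant to cover.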

---

## Future Iteration — Multi-Token Prediction (MTP) Drafters

> **Not in the measured numbers above.** Google announced Gemma 4 MTP drafters on **May 5, 2026** ([blog](https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/), [overview](https://ai.google.dev/gemma/docs/mtp/overview), [HF collection](https://huggingface.co/collections/google/gemma-4), [Kaggle](https://www.kaggle.com/models/google/gemma-4), [@GoogleGemma](https://x.com/googlegemma/status/2051694045869879749)) — after this artifact's final benchmark was captured. The benchmarks above reflect standard autoregressive decoding only. **MTP integration is documented here as future iteration; no measured speedup is claimed in this release.**

**Theoretical foundation.** Speculative decoding (Leviathan, Kalman & Matias, *Fast Inference from Transformers via Speculative Decoding*, ICML 2023, [arXiv:2211.17192](https://arxiv.org/abs/2211.17192)) accelerates generation **without changing the output distribution under argmax decoding**: a smaller drafter proposes γ candidate tokens, the target verifies all γ in a single parallel forward pass, accepted tokens are kept, and any rejection is resampled from a corrected distribution. The output distribution is preserved exactly regardless of drafter quality; only **acceptance rate α**, and therefore walltime speedup, varies.

**What Google released on May 5, 2026.** Paired drafter checkpoints for all four IT-tuned Gemma 4 variants — `gemma-4-E2B-it-assistant`, `gemma-4-E4B-it-assistant`, `gemma-4-26B-A4B-it-assistant`, `gemma-4-31B-it-assistant` — discoverable via the [`google/gemma-4` Hugging Face collection](https://huggingface.co/collections/google/gemma-4) and on [Kaggle Models](https://www.kaggle.com/models/google/gemma-4).
The drafters share the input embedding table with their paired target and consume the target's last-layer activations (architecture per the [MTP overview](https://ai.google.dev/gemma/docs/mtp/overview)). For this target the paired drafter is [`google/gemma-4-26B-A4B-it-assistant`](https://huggingface.co/google/gemma-4-26B-A4B-it-assistant) (~0.4 B params). Google reports **up to 3× decode speedup with no quality degradation** on the 26B-A4B configuration, and **~2.2×** on Apple Silicon at batch sizes 4–8. Tested runtimes named in the blog: LiteRT-LM, MLX, Hugging Face Transformers, vLLM, SGLang, Ollama.

**Integration cost is one kwarg in Hugging Face Transformers:**

```python
from unsloth import FastVisionModel
import torch
from transformers import AutoModelForCausalLM

# `...` marks loader arguments elided in this sketch; `inputs` comes from the two-step tokenization above.
target_base = AutoModelForCausalLM.from_pretrained("google/gemma-4-26B-A4B-it", dtype=torch.bfloat16, ...)
# FastVisionModel.from_pretrained returns (model, processor), as in the loading example above.
target, processor = FastVisionModel.from_pretrained("Truthseeker87/solarhive-26b-a4b-lora", ...)  # apply LoRA on top
assistant = AutoModelForCausalLM.from_pretrained("google/gemma-4-26B-A4B-it-assistant", dtype=torch.bfloat16, ...)
target.generate(**inputs, assistant_model=assistant)  # MTP enabled
```

The integration ships as a gated future-iteration cell (`§14`, `_RUN_MTP_DEMO = False`) in [`solarhive_inference.py`](https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive/blob/main/solarhive_inference.py); reviewers can flip the flag to reproduce a baseline-vs-MTP comparison under argmax decoding.

**Open question specific to this LoRA-adapter target.** Per the 2023 speculative-sampling guarantee, correctness is invariant to drafter quality: the target's verification step preserves the exact output distribution regardless of what the drafter proposes. What varies is **acceptance rate α**, since Google's released drafter was trained against the base `gemma-4-26B-A4B-it`, not against this LoRA-adapter-on-top target. Measured α and the resulting walltime speedup on this target are the planned post-hackathon contribution.
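
Until α is measured on this target, the standard analysis from the speculative decoding paper cited above gives a way to see how acceptance rate would translate into speedup. Under that paper's simplifying assumption of independent, identically distributed acceptances, one verification step yields (1 − α^(γ+1)) / (1 − α) tokens in expectation, and the walltime improvement factor is that quantity divided by (γc + 1), where c is the drafter's cost per token relative to the target. The α, γ, and c values below are hypothetical placeholders, not measurements on this adapter, and the cost model is the classic two-model one, which may only approximate Gemma 4's MTP drafters.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target forward pass with gamma drafted tokens."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_walltime_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Expected speedup over plain decoding; cost_ratio is drafter cost / target cost."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * cost_ratio + 1.0)


GAMMA = 4          # hypothetical number of drafted tokens per step
COST_RATIO = 0.05  # hypothetical relative drafter cost

for alpha in (0.6, 0.8, 0.9):  # hypothetical acceptance rates
    tokens = expected_tokens_per_step(alpha, GAMMA)
    speedup = expected_walltime_speedup(alpha, GAMMA, COST_RATIO)
    print(f"alpha={alpha:.1f}: {tokens:.2f} tokens/step, {speedup:.2f}x speedup")
```

The sketch makes the open question concrete: if the LoRA shifts the target's distribution away from what the base-trained drafter predicts, α drops and the speedup shrinks, while output correctness is unaffected.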

---

## Citation

```bibtex
@misc{solarhive2026,
  title={SolarHive: AI-Powered Community Solar Energy Intelligence},
  author={Youshen Lim},
  year={2026},
  url={https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive},
  note={Gemma 4 Good Hackathon submission — Google DeepMind x Kaggle}
}
```

---

## Links

- **GitHub:** [youshen-lim/the-gemma4-good-hackathon-solarhive](https://github.com/youshen-lim/the-gemma4-good-hackathon-solarhive)
- **Kaggle:** [The Gemma 4 Good Hackathon](https://kaggle.com/competitions/gemma-4-good-hackathon)
- **Base Model:** [google/gemma-4-26b-a4b-it](https://kaggle.com/models/google/gemma-4)

*Built with Gemma 4 in Ann Arbor, Michigan. May 2026.*

*Gemma is a trademark of Google LLC.*