--- title: AI Prof emoji: 🎓 colorFrom: indigo colorTo: purple sdk: gradio sdk_version: 5.50.0 app_file: app.py python_version: "3.12" tags: - track:backyard - sponsor:openbmb - sponsor:openai - sponsor:nvidia - sponsor:modal - achievement:offbrand - achievement:llama - achievement:fieldnotes --- # AI Prof A real-time AI teaching assistant for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon) (**Backyard AI** track). Upload your real lecture slides and AI Prof walks through them one by one — reading each slide *as an image* (diagrams, equations, layout, not just scraped text) and explaining it like a TA would in class. Ask a question at any time and it pauses, answers, and picks back up. Built for a classmate who has the slides but misses the in-class explanation. ## Demo - **Live Space:** [AI Prof on Hugging Face](https://huggingface.co/spaces/build-small-hackathon/ai-prof) - **Demo video:** **TODO before submission — add the public video URL here** - **Social post:** **TODO before submission — add the public post URL here** - **Team:** [@pranavkarthik10](https://huggingface.co/pranavkarthik10) ### Judge-friendly walkthrough 1. Upload a lecture PDF. 2. Watch AI Prof index every slide before teaching, giving the agent a map of the complete lecture. 3. Start the lecture and see the professor explain slides aloud while adding notes or equations to the whiteboard. 4. Interrupt with a typed or spoken question. The professor can answer, navigate to a supporting slide, and continue. ## How it works - **Eyes — [MiniCPM-V](https://huggingface.co/openbmb/MiniCPM-V-4) (~4.1B, GGUF / llama.cpp):** reads each slide image. - **Brain — [Nemotron 3 Nano](https://huggingface.co/collections/nvidia/nvidia-nemotron-v3):** turns the slide reading into a teaching plan, selects tools, explains concepts, and answers interjections. - **Voice:** VoxCPM2 speaks the professor's response; distil-Whisper transcribes student questions. - **Runtime:** Modal hosts the four OpenAI-compatible inference services and scales them down when idle. - **Frontend:** a custom Gradio classroom UI hosted as a Hugging Face Space. Every model stays under the hackathon's ≤32B-per-model cap. ## Why it is agentic AI Prof does more than summarize a PDF. Nemotron receives a compact index of the entire deck, the current slide reading, recent conversation, and whiteboard state. For every teaching beat it returns validated narration and actions. The orchestrator can navigate slides, write a note, typeset an equation, clear the board, or continue the lecture. Student questions cancel the active lecture turn and trigger a new plan grounded in the full deck. Current tools: `goto_slide`, `next_slide`, `prev_slide`, `write_note`, `write_latex`, and `clear_whiteboard`. ## Small-model stack | Role | Model | Size / serving | | --- | --- | --- | | Slide vision | MiniCPM-V-4 | ~4.1B, Q4_K_M GGUF with llama.cpp | | Professor agent | NVIDIA Nemotron 3 Nano | 30B total / 3B active, vLLM | | Speech synthesis | VoxCPM2 | vLLM-Omni | | Speech recognition | distil-Whisper large-v3 | faster-whisper | ## Run it ```bash uv venv --python 3.12 && uv pip install -r requirements.txt .venv/bin/python app.py # opens the Gradio UI ``` Out of the box it runs in **mock mode** (no weights) so you can drive the full UX — upload a PDF, watch it stream explanations, ask questions. To plug in the real models, copy `.env.example` to `.env` and point `VISION_BASE_URL` / `BRAIN_BASE_URL` at your OpenAI-compatible endpoints (llama.cpp `llama-server` for MiniCPM-V, vLLM/llama.cpp for Nemotron). MiniCPM-V is included as a Modal service: ```bash modal run modal_app_vision.py::download_model modal run modal_app_vision.py::warm modal deploy modal_app_vision.py ``` Set `VISION_BASE_URL` to the deployed `serve` URL. The service uses MiniCPM-V-4 Q4_K_M plus its F16 vision projector on one L4 and scales down after five idle minutes. ## Pre-indexed deck cache AI Prof hashes each uploaded PDF and caches its rendered slide images, extracted text, MiniCPM-V readings, and complete deck index. Re-uploading the same deck with the same vision model and DPI skips indexing entirely. The cache is local by default. To share a prepared demo deck across local testing and the public Space, create a Hugging Face dataset repo and configure: ```bash HF_DECK_CACHE_REPO=pranavkarthik10/ai-prof-decks HF_TOKEN=hf_... HF_DECK_CACHE_WRITE=true ``` Upload the demo PDF once while writes are enabled. The processed deck is stored under `decks//` in the dataset. For the public Space, set `HF_DECK_CACHE_REPO` but leave `HF_DECK_CACHE_WRITE=false`; judges can then load that exact deck from the **Prepared lectures** picker without uploading the PDF or granting the app write access. Do not enable remote writes for arbitrary uploads to a public dataset. Processed slides can contain the original lecture material. Use a private dataset for personal testing or only prewarm material you have permission to publish.