"""
Gradio app that exposes the 4 RAG tools over ESL/ISLP/FES/PDSH/R4DS as an MCP server.

This module is the "HF Spaces" variant of the server (streamable HTTP transport):

- Web UI for human use (debugging and exploration).
- MCP endpoint at `/gradio_api/mcp/` (enabled by `demo.launch(mcp_server=True)`).

The tool logic lives in `rag_books_mcp.tools`. This module only builds the UI.

To run locally:

    uv run python -m rag_books_mcp.app

To deploy to HF Spaces:

    python deploy_to_hf_space.py  # requires HF_TOKEN
"""

from __future__ import annotations

import gradio as gr

from rag_books_mcp.tools import (
    cite_foundation,
    get_section,
    list_available_topics,
    search_theory,
)


# --- UI tabs (each one corresponds to an MCP tool) ---


def _build_search_tab() -> gr.Interface:
    return gr.Interface(
        fn=search_theory,
        inputs=[
            gr.Textbox(
                label="query",
                value="bias-variance tradeoff",
                placeholder="Natural-language query",
            ),
            gr.Radio(
                choices=["all", "both", "esl", "islp", "fes", "pdsh", "r4ds"],
                value="all",
                label="book",
                info="R4DS uses R/tidyverse; its principles carry over to pandas/seaborn.",
            ),
            gr.Slider(minimum=1, maximum=10, step=1, value=5, label="top_k"),
        ],
        outputs=gr.Markdown(label="Results"),
        title="🔎 search_theory",
        description=(
            "Semantic search over ESL, ISLP, FES, PDSH, and R4DS. Returns the "
            "most relevant fragments, ranked by similarity."
        ),
        api_name="search_theory",
    )


def _build_get_section_tab() -> gr.Interface:
    return gr.Interface(
        fn=get_section,
        inputs=[
            gr.Radio(choices=["esl", "islp", "fes", "pdsh", "r4ds"], value="islp", label="book"),
            gr.Textbox(
                label="chapter",
                value="8 Tree-Based Methods",
                placeholder="Chapter name (partial matching supported)",
            ),
            gr.Textbox(
                label="section",
                value="",
                placeholder="(Optional) Section name",
            ),
            gr.Slider(minimum=1, maximum=15, step=1, value=5, label="max_chunks"),
        ],
        outputs=gr.Markdown(label="Section"),
        title="📑 get_section",
        description=(
            "Retrieves a specific section from ESL, ISLP, FES, PDSH, or R4DS. If "
            "no match is found via metadata, falls back to semantic search."
        ),
        api_name="get_section",
    )


def _build_cite_tab() -> gr.Interface:
    return gr.Interface(
        fn=cite_foundation,
        inputs=[
            gr.Textbox(
                label="topic",
                value="ridge regression",
                placeholder="Topic to ground (e.g. 'bagging', 'feature selection', 'EDA')",
            ),
            gr.Radio(
                choices=["brief", "medium", "deep"],
                value="medium",
                label="detail_level",
            ),
        ],
        outputs=gr.Markdown(label="Foundation"),
        title="📚 cite_foundation",
        description=(
            "Theoretical grounding that cites the 5 books: ISLP (intuitive), "
            "ESL (rigorous), FES (feature engineering), PDSH (Python code), and "
            "R4DS (iterative EDA and data-wrangling workflow)."
        ),
        api_name="cite_foundation",
    )


def _build_list_topics_tab() -> gr.Interface:
    return gr.Interface(
        fn=list_available_topics,
        inputs=[],
        outputs=gr.Markdown(label="Indexed content"),
        title="🗂️ list_available_topics",
        description="Lists the chapters and sections indexed in ChromaDB.",
        api_name="list_available_topics",
    )


def build_demo() -> gr.Blocks:
    """Build the tabbed UI for the MCP server."""
    with gr.Blocks(title="rag-books-mcp · ESL + ISLP + FES + PDSH + R4DS") as demo:
        gr.Markdown(
            """
            # 📖 RAG Books MCP: ESL + ISLP + FES + PDSH + R4DS

            MCP server that exposes semantic search over five reference
            books on statistical learning, data science, and data wrangling:

            - **ESL**: *The Elements of Statistical Learning* (Hastie, Tibshirani, Friedman)
            - **ISLP**: *An Introduction to Statistical Learning with Python* (James, Witten, Hastie, Tibshirani)
            - **FES**: *Feature Engineering and Selection* (Kuhn, Johnson)
            - **PDSH**: *Python Data Science Handbook* (VanderPlas)
            - **R4DS**: *R for Data Science, 2nd Ed.* (Wickham, Çetinkaya-Rundel, Grolemund); _examples in R/tidyverse, universal principles for EDA and data wrangling_

            > ℹ️ R4DS is licensed under CC BY-NC-ND 3.0 US. It is included in
            > this Space for academic use, with explicit attribution to its
            > authors. Takedown mechanism: see the [DATA_CARD of the v2 dataset](https://huggingface.co/datasets/gusdelact/rag-esl-islp-chromadb).

            **MCP endpoint:** `/gradio_api/mcp/` (streamable HTTP).

            **Embeddings:** `sentence-transformers/all-MiniLM-L6-v2` (local, no API key).

            **Vector store:** ChromaDB with 3689 chunks (1093 ESL + 884 ISLP + 465 FES + 563 PDSH + 684 R4DS).

            Use the tabs below to try the tools from the browser, or
            connect them to your MCP client (Kiro, Claude Desktop, Cursor, etc.).
            """
        )
        gr.TabbedInterface(
            interface_list=[
                _build_search_tab(),
                _build_cite_tab(),
                _build_get_section_tab(),
                _build_list_topics_tab(),
            ],
            tab_names=["search_theory", "cite_foundation", "get_section", "list_available_topics"],
        )
    return demo


def main() -> None:
    demo = build_demo()
    # mcp_server=True publishes the MCP endpoint at /gradio_api/mcp/.
    # server_name="0.0.0.0" is required on HF Spaces; locally Gradio binds to 127.0.0.1.
    demo.launch(mcp_server=True, server_name="0.0.0.0")


if __name__ == "__main__":
    main()
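
The module above publishes its MCP endpoint at `/gradio_api/mcp/`. As a rough sketch of how a client could derive that URL and prepare a configuration entry, the helper below joins a base URL with the path from the docstring. The base URL, the `rag-books` server name, and the `mcpServers`/`url` config shape are assumptions for illustration; each MCP client defines its own format, so check its documentation before relying on this.

```python
import json

# Path taken from the module docstring above.
MCP_PATH = "/gradio_api/mcp/"


def mcp_endpoint(base_url: str) -> str:
    """Join a base URL with the MCP path without doubling the slash."""
    return base_url.rstrip("/") + MCP_PATH


def client_config(base_url: str, name: str = "rag-books") -> dict:
    """Build a config entry pointing at the MCP endpoint.

    The dict shape and the default server name are hypothetical;
    adapt them to whatever your MCP client expects.
    """
    return {"mcpServers": {name: {"url": mcp_endpoint(base_url)}}}


if __name__ == "__main__":
    # 7860 is Gradio's default local port; change it if you launch elsewhere.
    print(json.dumps(client_config("http://127.0.0.1:7860"), indent=2))
```

For a deployed Space, pass the Space's public base URL instead of the local address.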