
Tebyan Medical (تبيان الطبي): Architecture

System Overview

Tebyan Medical is a production-grade Arabic medical report analysis platform. Users upload lab reports (PDF or image); the system extracts findings, generates a clinical interpretation in Arabic, and provides an interactive voice-enabled chat assistant.

Frontend (Next.js 15)          Backend (FastAPI)              External Services
─────────────────────          ─────────────────              ─────────────────
Upload → /api/analyze ────────► AgentCoordinator              Groq (LLM/STT)
Chat   → /api/chat   ────────► RAG + LLM streaming            Supabase (pgvector)
Voice  → /api/voice  ────────► WhisperSTT / TTS               Cohere (rerank)
Risk   → /api/risk   ────────► RiskEngine                     Google Vision/TTS

Backend Layer

Entry Point

backend/main.py: FastAPI application. Registers all routes, mounts middleware, and wires dependency-injected singletons via @lru_cache.
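A minimal sketch of the singleton wiring mentioned above. The `Settings` class and `get_settings` provider are hypothetical names used for illustration, not names taken from the repository. A zero-argument function wrapped in `functools.lru_cache` builds its return value once and serves the same instance on every later call, which is why it works as a dependency provider.

```python
from functools import lru_cache


class Settings:
    """Hypothetical config holder; stands in for any expensive-to-build service."""

    def __init__(self) -> None:
        self.analysis_model = "llama-3.3-70b-versatile"


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Built on the first call, then served from the cache on every later call.
    return Settings()


# In a FastAPI route this provider would typically be passed as Depends(get_settings).
first = get_settings()
second = get_settings()
print(first is second)
```

In tests, `get_settings.cache_clear()` resets the cached instance so a fresh one is built on the next call.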

Multi-Agent Pipeline (services/agents/)

Replaces the flat services/agent/pipeline.py (kept for backward compatibility) with a structured agent graph:

AgentCoordinator
├── OCRAgent              - PDF (fitz + EasyOCR) + Image (Google Vision fallback)
├── ExtractionAgent       - regex parse → LLM fallback → physiological bounds filter
├── ClassificationAgent   - panel detection (CBC, Thyroid, Liver, Kidney, Lipid, Diabetes)
├── MedicalReasoningAgent - RAG retrieval + panel-specific LLM prompt
└── SafetyAgent           - PDPL/NDMO compliance filter + emergency detection

Each agent extends AgentBase which provides retry-with-backoff and structured logging via AgentContext.
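The retry-with-backoff behaviour could look roughly like the sketch below. Only the names `AgentBase` and `AgentContext` come from this document; the fields, the `execute` method, the attempt count, and the delay values are assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentContext:
    """Shared state passed between agents (field names are assumptions)."""
    data: Dict[str, Any] = field(default_factory=dict)
    logs: List[Dict[str, Any]] = field(default_factory=list)


class AgentBase:
    name = "agent"
    max_attempts = 3   # assumed default
    base_delay = 0.5   # seconds; assumed default

    def run(self, ctx: AgentContext) -> None:
        raise NotImplementedError

    def execute(self, ctx: AgentContext) -> bool:
        """Call run() with exponential backoff; return True on success."""
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.run(ctx)
                ctx.logs.append({"agent": self.name, "attempt": attempt, "status": "ok"})
                return True
            except Exception as exc:
                ctx.logs.append({"agent": self.name, "attempt": attempt,
                                 "status": "error", "error": str(exc)})
                if attempt < self.max_attempts:
                    # Delay doubles each time: base, 2x base, 4x base, ...
                    time.sleep(self.base_delay * 2 ** (attempt - 1))
        return False
```

A concrete agent only overrides `run`; every attempt, successful or not, leaves a structured entry in `ctx.logs`.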

RAG Stack (services/rag/, services/search/)

  • Embedding: intfloat/multilingual-e5-large (1024-dim, via HuggingFace)
  • Vector store: Supabase pgvector (match_documents RPC, 2834+ medical chunks)
  • Query expansion: Groq LLM generates 3 alternate queries; query_parser.py adds Arabic synonym expansions
  • Retrieval: BM25 fallback + Cohere rerank-v3.5 cross-encoder
  • Cache: 5-minute TTL in-memory LRU (services/cache.py) prevents redundant embedding calls
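The cache entry in the list above combines two eviction rules: entries expire after a fixed TTL, and the least recently used entry is dropped when the cache is full. A minimal sketch of that combination follows; the class name, method names, and `max_size` default are assumptions, and the actual services/cache.py may be structured differently.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    def __init__(self, max_size: int = 256, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Maps key -> (expiry timestamp, value); insertion order tracks recency.
        self._items: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # entry outlived its TTL
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)  # evict least recently used
```

Keying the cache on the normalised query text means an identical question within five minutes skips the embedding call entirely.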

Risk Engine (services/risk/)

Evidence-based clinical threshold scoring for 6 conditions. FeatureExtractor normalises findings into a 35-feature vector; each scorer (_score_diabetes, _score_cardiovascular, etc.) applies WHO/ADA/ACC clinical cutoffs. ML .pkl model files in services/risk/models/ override rule-based scores when present.
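As an illustration of threshold scoring with an optional model override, the sketch below applies the ADA diagnostic cutoffs for HbA1c and fasting plasma glucose. The feature keys, the 0 / 0.5 / 1 score scale, and the callable-model interface are assumptions; the real `_score_diabetes` may use more features and a finer scale.

```python
from typing import Callable, Mapping, Optional

# ADA diagnostic cutoffs (HbA1c in %, fasting plasma glucose in mg/dL).
HBA1C_PREDIABETES, HBA1C_DIABETES = 5.7, 6.5
FPG_PREDIABETES, FPG_DIABETES = 100.0, 126.0


def score_diabetes(
    features: Mapping[str, float],
    ml_model: Optional[Callable[[Mapping[str, float]], float]] = None,
) -> float:
    """Return a risk score in [0, 1]; the three-level scale is an assumption."""
    if ml_model is not None:
        # A trained model, when present, overrides the rule-based score.
        return ml_model(features)

    hba1c = features.get("hba1c")
    fpg = features.get("fasting_glucose")

    def at_least(value: Optional[float], cutoff: float) -> bool:
        return value is not None and value >= cutoff

    if at_least(hba1c, HBA1C_DIABETES) or at_least(fpg, FPG_DIABETES):
        return 1.0
    if at_least(hba1c, HBA1C_PREDIABETES) or at_least(fpg, FPG_PREDIABETES):
        return 0.5
    return 0.0
```

Because missing features are simply skipped, the scorer still returns a value when a report contains only one of the two markers.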

Voice (services/voice/)

  • STT: WhisperSTT wraps Groq whisper-large-v3. Accepts WebM/MP4/OGG/WAV (25 MB max).
  • TTS: Provider chain, Google Cloud TTS (Wavenet-A) → gTTS (free fallback) → ElevenLabs.

LLM Router (services/llm/router.py)

LLMRouter wraps a primary provider (GroqProvider) and optional fallback (HuggingFaceProvider). Model selection: llama-3.3-70b-versatile for analysis, llama-3.1-8b-instant for chat.
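The primary-plus-fallback routing can be sketched as follows. The `LLMProvider` protocol, the `complete` method, and the `task` argument are assumptions for illustration; only the class name `LLMRouter` and the two model ids come from this document.

```python
from typing import Optional, Protocol


class LLMProvider(Protocol):
    def complete(self, prompt: str, model: str) -> str: ...


# Model ids from the text, keyed by a hypothetical task label.
MODELS = {
    "analysis": "llama-3.3-70b-versatile",
    "chat": "llama-3.1-8b-instant",
}


class LLMRouter:
    def __init__(self, primary: LLMProvider,
                 fallback: Optional[LLMProvider] = None) -> None:
        self.primary = primary
        self.fallback = fallback

    def complete(self, prompt: str, task: str = "chat") -> str:
        model = MODELS[task]
        try:
            return self.primary.complete(prompt, model)
        except Exception:
            if self.fallback is None:
                raise  # no fallback configured: surface the original error
            # Simplification: a real fallback provider would likely map
            # this id to one of its own models.
            return self.fallback.complete(prompt, model)
```

Callers name the task rather than the model, so changing the model for analysis or chat is a one-line edit to `MODELS`.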

Security (middleware/)

  • AuditMiddleware: Writes one JSON record per request to rotating log files (logs/audit/audit.jsonl). Marks PDPL-sensitive paths. Skips health/docs endpoints.
  • validate_upload: Magic-byte sniffing (anti-MIME-spoofing), 20 MB size limit, extension blocklist.
  • sanitize_text: Strips HTML tags, null bytes, XSS patterns, and SQL injection signatures from all user text inputs.
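Magic-byte sniffing means trusting the first bytes of the file rather than the declared content type or the extension. The sketch below shows the idea for PDF, PNG, and JPEG using their standard signatures. The function signature, the blocklist contents, and the error messages are assumptions and may not match the actual `validate_upload`.

```python
import os
from typing import Optional

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB limit from the text

# Standard file signatures for the accepted types.
MAGIC_BYTES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}
BLOCKED_EXTENSIONS = {".exe", ".bat", ".sh", ".js"}  # illustrative only


def sniff_type(data: bytes) -> Optional[str]:
    """Return the MIME type implied by the leading bytes, if recognised."""
    for mime, signature in MAGIC_BYTES.items():
        if data.startswith(signature):
            return mime
    return None


def validate_upload(filename: str, declared_mime: str, data: bytes) -> str:
    """Return the detected MIME type, or raise ValueError if the upload is rejected."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 20 MB")
    extension = os.path.splitext(filename)[1].lower()
    if extension in BLOCKED_EXTENSIONS:
        raise ValueError(f"extension {extension} is blocked")
    detected = sniff_type(data)
    if detected is None:
        raise ValueError("unrecognised file content")
    if detected != declared_mime:
        # Declared type and actual bytes disagree: likely MIME spoofing.
        raise ValueError("declared type does not match file content")
    return detected
```

An executable renamed to report.pdf fails at the `sniff_type` step because its leading bytes do not match any accepted signature.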

Rate Limiting (services/ratelimit.py)

In-memory sliding window (no external dependency). Limits: analyze=5/min, chat=30/min, search=60/min. Uses X-Forwarded-For for IP detection behind proxies.
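A sliding-window limiter of this kind keeps a queue of recent request timestamps per client and route, and discards those older than the window before counting. The sketch below uses the limits stated above; the class name, method names, and the `client_ip` helper are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional, Tuple

LIMITS = {"analyze": 5, "chat": 30, "search": 60}  # requests per minute, from the text
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)
        self._clock = clock  # injectable so tests can control time

    def allow(self, ip: str, route: str) -> bool:
        now = self._clock()
        hits = self._hits[(ip, route)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= LIMITS[route]:
            return False
        hits.append(now)
        return True


def client_ip(forwarded_for: Optional[str], peer_ip: str) -> str:
    """Pick the client address; X-Forwarded-For lists the original client first."""
    if forwarded_for:
        return forwarded_for.split(",")[0].strip()
    return peer_ip
```

Since clients can set X-Forwarded-For themselves, the header is only reliable when the app sits behind a proxy that overwrites or appends to it.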


Frontend Layer

Next.js 15 App Router, RTL Arabic, Tailwind CSS v4, Framer Motion.

| Component | Purpose |
| --- | --- |
| upload-section.tsx | File picker + /api/analyze call + loading state |
| analysis-history.tsx | Saved analyses list + semantic search + health trend chart |
| health-trend-chart.tsx | Recharts line chart: tracks lab values over time with alerts |
| risk-dashboard.tsx | Calls /api/risk + renders 6 radial gauge cards (collapsible) |
| chat-bot.tsx | Floating chat panel: streaming SSE + voice input/output |
| voice-recorder.tsx | MediaRecorder → /api/voice/transcribe + TTS playback |
| compare-analyses.tsx | Side-by-side analysis diff |

Data Flow: Analysis Request

1. User uploads PDF/image
2. validate_upload(): size + MIME + magic bytes
3. AgentCoordinator.run()
   a. OCRAgent              → raw_text
   b. ExtractionAgent       → findings[] (with impossible-value filter)
   c. ClassificationAgent   → panel_code
   d. MedicalReasoningAgent → RAG context + LLM report
   e. SafetyAgent           → filtered report
4. Response: { findings, summary, report }
5. Frontend saves to Supabase via /api/analyses/save
6. RiskDashboard calls /api/risk with findings
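Step 3 of the flow above runs the agents strictly in order, each reading what the previous one wrote. A minimal sketch of that sequencing, including the fallback-on-failure behaviour noted under Key Design Decisions, is shown below. The `run_pipeline` function, the stage tuples, and the placeholder stage bodies are hypothetical; the real `AgentCoordinator` is not shown in this document.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
# (stage name, stage function, values to write if the stage fails)
Stage = Tuple[str, Callable[[Context], None], Dict[str, Any]]


def run_pipeline(ctx: Context, stages: List[Stage]) -> Context:
    """Run stages in order; a failing stage writes its fallback instead of aborting."""
    ctx.setdefault("logs", [])
    for name, stage, fallback in stages:
        try:
            stage(ctx)
            ctx["logs"].append((name, "ok"))
        except Exception as exc:
            ctx.update(fallback)
            ctx["logs"].append((name, f"failed: {exc}"))
    return ctx


# Placeholder stages; the real agents call OCR, regex/LLM extraction, and so on.
def ocr(ctx: Context) -> None:
    raise RuntimeError("unreadable scan")  # simulate an OCR failure


def extract(ctx: Context) -> None:
    # Empty text yields no findings rather than an exception.
    ctx["findings"] = [{"name": "placeholder"}] if ctx["raw_text"] else []


stages: List[Stage] = [
    ("ocr", ocr, {"raw_text": ""}),
    ("extraction", extract, {"findings": []}),
]
result = run_pipeline({"file_bytes": b""}, stages)
print(result["raw_text"], result["findings"], result["logs"])
```

Here the simulated OCR failure leaves `raw_text` empty, and the extraction stage still completes with an empty findings list.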

Key Design Decisions

  • No streaming for analysis: Analysis takes 8–15 s, and a partially streamed response is not valid JSON until it is complete. The response is returned in full after all agents complete.
  • RAG cache before agent: MedicalReasoningAgent checks rag_cache before calling pgvector, which avoids redundant 300 ms embedding round-trips on identical queries.
  • Agents over monolith: Each agent is independently retryable and observable via AgentContext.logs. Failures degrade gracefully: an OCR failure sets raw_text="", and downstream agents handle empty input without crashing.
  • Rule-based risk scoring: ML .pkl models are optional overrides. The platform is useful immediately without training data.
  • In-memory rate limiter: Avoids Redis dependency for MVP. Replace with Redis-backed limiter for multi-process deployments.