Tebyan Medical (تبيان الطبي): Architecture

System Overview

Tebyan Medical is a production-grade Arabic medical report analysis platform. Users upload lab reports (PDF or image), the system extracts findings, generates a clinical interpretation in Arabic, and provides an interactive voice-enabled chat assistant.

Frontend (Next.js 15)          Backend (FastAPI)              External Services
─────────────────────          ─────────────────              ─────────────────
Upload → /api/analyze ────────► AgentCoordinator              Groq (LLM/STT)
Chat   → /api/chat   ────────► RAG + LLM streaming            Supabase (pgvector)
Voice  → /api/voice  ────────► WhisperSTT / TTS               Cohere (rerank)
Risk   → /api/risk   ────────► RiskEngine                     Google Vision/TTS

Backend Layer

Entry Point

backend/main.py — FastAPI application. Registers all routes, mounts middleware, and wires dependency-injected singletons via @lru_cache.
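As a minimal sketch of the @lru_cache singleton pattern mentioned above (not the actual contents of backend/main.py): the LLMRouter class here is a hypothetical stand-in and get_llm_router is an assumed name. A zero-argument function cached with maxsize=1 builds its object once and returns the same instance afterwards.

```python
from functools import lru_cache


class LLMRouter:
    """Hypothetical stand-in for a service that is expensive to construct."""

    def __init__(self) -> None:
        self.ready = True


@lru_cache(maxsize=1)
def get_llm_router() -> LLMRouter:
    # The first call constructs the router; every later call returns
    # the cached instance, so the process holds exactly one.
    return LLMRouter()
```

In FastAPI such a provider is typically handed to a route through Depends(get_llm_router), and get_llm_router.cache_clear() resets the singleton between tests.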

Multi-Agent Pipeline (`services/agents/`)

This package replaces the flat services/agent/pipeline.py (kept for backward compatibility) with a structured agent graph:

AgentCoordinator
├── OCRAgent           — PDF (fitz + EasyOCR) + Image (Google Vision fallback)
├── ExtractionAgent    — regex parse → LLM fallback → physiological bounds filter
├── ClassificationAgent — panel detection (CBC, Thyroid, Liver, Kidney, Lipid, Diabetes)
├── MedicalReasoningAgent — RAG retrieval + panel-specific LLM prompt
└── SafetyAgent        — PDPL/NDMO compliance filter + emergency detection
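The ExtractionAgent stage in the tree above can be illustrated with a small sketch of regex parsing followed by a physiological bounds filter. The pattern, the Finding type, and the bounds table are all assumptions made for illustration (the real bounds are not given in this document), and the LLM fallback step is left out.

```python
import re
from dataclasses import dataclass

# Illustrative plausibility bounds only; NOT the platform's real table.
PHYSIOLOGICAL_BOUNDS = {
    "hemoglobin": (1.0, 25.0),   # g/dL
    "glucose": (10.0, 1500.0),   # mg/dL
}

# Matches lines such as "Hemoglobin: 13.5 g/dL" or "Glucose = 95".
LINE_PATTERN = re.compile(
    r"^\s*(?P<name>[A-Za-z][A-Za-z0-9 ]*?)\s*[:=]\s*(?P<value>-?\d+(?:\.\d+)?)",
    re.MULTILINE,
)


@dataclass
class Finding:
    name: str
    value: float


def extract_findings(raw_text: str) -> list[Finding]:
    findings = []
    for match in LINE_PATTERN.finditer(raw_text):
        name = match.group("name").strip().lower()
        value = float(match.group("value"))
        bounds = PHYSIOLOGICAL_BOUNDS.get(name)
        if bounds is None:
            # Unknown analyte: a fuller pipeline might defer to the LLM fallback.
            continue
        low, high = bounds
        if low <= value <= high:
            # Values outside the bounds are treated as OCR or parsing errors.
            findings.append(Finding(name, value))
    return findings
```

With these assumed bounds, a line reading "Glucose: 950000" is dropped as impossible while "Glucose = 95 mg/dL" is kept.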

Each agent extends AgentBase which provides retry-with-backoff and structured logging via AgentContext.
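A rough sketch of what retry-with-backoff and structured logging could look like in such a base class follows. The method names (run, execute), the attempt count, and the delay values are assumptions rather than the project's actual AgentBase API.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    raw_text: str = ""
    logs: list[dict] = field(default_factory=list)


class AgentBase:
    name = "base"
    max_attempts = 3      # assumed value
    base_delay = 0.5      # seconds; doubles after each failed attempt

    def run(self, ctx: AgentContext) -> AgentContext:
        raise NotImplementedError

    def execute(self, ctx: AgentContext, sleep=time.sleep) -> AgentContext:
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = self.run(ctx)
                ctx.logs.append(
                    {"agent": self.name, "attempt": attempt, "status": "ok"}
                )
                return result
            except Exception as exc:
                ctx.logs.append(
                    {"agent": self.name, "attempt": attempt,
                     "status": "error", "error": str(exc)}
                )
                if attempt == self.max_attempts:
                    raise  # retries exhausted; let the caller decide
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(self.base_delay * 2 ** (attempt - 1))
        return ctx
```

Passing sleep as a parameter keeps the retry loop testable without real delays.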

RAG Stack (`services/rag/`, `services/search/`)

Embedding: intfloat/multilingual-e5-large (1024-dim, via HuggingFace)
Vector store: Supabase pgvector (match_documents RPC, 2834+ medical chunks)
Query expansion: Groq LLM generates 3 alternate queries; query_parser.py adds Arabic synonym expansions
Retrieval: BM25 fallback + Cohere rerank-v3.5 cross-encoder
Cache: 5-minute TTL in-memory LRU (services/cache.py) prevents redundant embedding calls
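The caching item above describes a TTL-bounded in-memory LRU. A minimal sketch of that idea, assuming a class named TTLCache and an injectable clock (neither is taken from services/cache.py), could look like this:

```python
import time
from collections import OrderedDict


class TTLCache:
    def __init__(self, max_size=256, ttl_seconds=300.0, clock=time.monotonic):
        self._data = OrderedDict()   # key -> (expires_at, value)
        self._max_size = max_size
        self._ttl = ttl_seconds      # 300 s matches the 5-minute TTL
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]      # expired: drop and report a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Keying the cache on the normalised query text would let identical queries skip the embedding call until the entry expires.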

Risk Engine (`services/risk/`)

Evidence-based clinical threshold scoring for 6 conditions. FeatureExtractor normalises findings into a 35-feature vector; each scorer (_score_diabetes, _score_cardiovascular, etc.) applies WHO/ADA/ACC clinical cutoffs. ML .pkl model files in services/risk/models/ override rule-based scores when present.
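To make the rule-based scoring concrete, here is a hedged sketch of a diabetes scorer. The cutoffs are the ADA diagnostic thresholds (HbA1c 5.7% and 6.5%, fasting plasma glucose 100 and 126 mg/dL); the feature names, the 0.1/0.5/0.9 score values, and the ml_model hook are assumptions and do not reflect the real _score_diabetes implementation.

```python
from typing import Callable, Optional


def _band(value: float, prediabetes_min: float, diabetes_min: float) -> float:
    # Map a lab value to an illustrative risk score (values are assumed).
    if value >= diabetes_min:
        return 0.9
    if value >= prediabetes_min:
        return 0.5
    return 0.1


def score_diabetes(
    features: dict[str, float],
    ml_model: Optional[Callable[[dict[str, float]], float]] = None,
) -> Optional[float]:
    if ml_model is not None:
        # A trained model, when available, overrides the rule-based score.
        return ml_model(features)
    scores = []
    if "hba1c_percent" in features:
        scores.append(_band(features["hba1c_percent"], 5.7, 6.5))
    if "fasting_glucose_mg_dl" in features:
        scores.append(_band(features["fasting_glucose_mg_dl"], 100.0, 126.0))
    # The most severe marker wins; None means there was no usable evidence.
    return max(scores) if scores else None
```

Returning None rather than a low score when no marker is present keeps "no data" distinguishable from "low risk".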

Voice (`services/voice/`)

STT: WhisperSTT wraps Groq whisper-large-v3. Accepts WebM/MP4/OGG/WAV (25 MB max).
TTS: Provider chain — Google Cloud TTS (Wavenet-A) → gTTS (free fallback) → ElevenLabs.
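The TTS provider chain can be pictured as an ordered list of callables tried in turn. The sketch below assumes each provider is a plain function from text to audio bytes; synthesize and AllProvidersFailed are hypothetical names, and no real Google, gTTS, or ElevenLabs client is called.

```python
from typing import Callable, Sequence

TTSProvider = Callable[[str], bytes]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain errors or returns no audio."""


def synthesize(
    text: str, providers: Sequence[tuple[str, TTSProvider]]
) -> tuple[str, bytes]:
    errors = []
    for name, provider in providers:
        try:
            audio = provider(text)
        except Exception as exc:
            # Record the failure and move on to the next provider.
            errors.append(f"{name}: {exc}")
            continue
        if audio:
            return name, audio
        errors.append(f"{name}: empty audio")
    raise AllProvidersFailed("; ".join(errors))
```

Returning the provider name alongside the audio makes it easy to log which link of the chain actually served the request.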

LLM Router (`services/llm/router.py`)

LLMRouter wraps a primary provider (GroqProvider) and optional fallback (HuggingFaceProvider). Model selection: llama-3.3-70b-versatile for analysis, llama-3.1-8b-instant for chat.
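A simplified sketch of this routing logic is shown below. The provider interface (a callable taking a model name and a prompt) and the complete method are assumptions; in particular, the real fallback provider may map the task to its own model name instead of reusing the Groq one.

```python
from typing import Callable, Optional

# Model names come from the text above; the task keys are assumed.
MODEL_BY_TASK = {
    "analysis": "llama-3.3-70b-versatile",
    "chat": "llama-3.1-8b-instant",
}

Provider = Callable[[str, str], str]  # (model, prompt) -> completion


class LLMRouter:
    def __init__(self, primary: Provider, fallback: Optional[Provider] = None):
        self.primary = primary
        self.fallback = fallback

    def complete(self, task: str, prompt: str) -> str:
        model = MODEL_BY_TASK[task]
        try:
            return self.primary(model, prompt)
        except Exception:
            if self.fallback is None:
                raise
            # Primary failed: retry once on the fallback provider.
            return self.fallback(model, prompt)
```

Using the larger model for one-shot analysis and the smaller one for chat trades reasoning depth against latency on the interactive path.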

Security (`middleware/`)

AuditMiddleware: Writes one JSON record per request to rotating log files (logs/audit/audit.jsonl). Marks PDPL-sensitive paths. Skips health/docs endpoints.
validate_upload: Magic-byte sniffing (anti-MIME-spoofing), 20 MB size limit, extension blocklist.
sanitize_text: Strips HTML tags, null bytes, XSS patterns, and SQL injection signatures from all user text inputs.
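The upload checks listed above can be sketched as follows. The file signatures for PDF, PNG, and JPEG are standard; the blocklist contents, the exception type, the function signature, and the reading of "20 MB" as 20 × 1024 × 1024 bytes are assumptions.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # assumed interpretation of "20 MB"

# Leading bytes that identify each accepted file type.
MAGIC_SIGNATURES = {
    "application/pdf": b"%PDF-",
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}

# Hypothetical blocklist; the real one is not given in this document.
BLOCKED_EXTENSIONS = {".exe", ".bat", ".sh", ".js", ".html"}


class UploadRejected(ValueError):
    pass


def validate_upload(filename: str, data: bytes) -> str:
    if len(data) > MAX_UPLOAD_BYTES:
        raise UploadRejected("file exceeds size limit")
    if Path(filename).suffix.lower() in BLOCKED_EXTENSIONS:
        raise UploadRejected("blocked file extension")
    # Trust the content, not the declared MIME type or the file name.
    for mime, signature in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return mime
    raise UploadRejected("unrecognised file signature")
```

Because the detected type comes from the bytes themselves, an executable renamed to report.pdf is rejected even though its name and declared MIME type look valid.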

Rate Limiting (`services/ratelimit.py`)

In-memory sliding window (no external dependency). Limits: analyze=5/min, chat=30/min, search=60/min. Uses X-Forwarded-For for IP detection behind proxies.
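A minimal sketch of an in-memory sliding-window limiter with the limits quoted above might look like this. The class name, the allow and client_ip helpers, and the 60-second window constant are assumptions, not the contents of services/ratelimit.py.

```python
import time
from collections import defaultdict, deque

LIMITS = {"analyze": 5, "chat": 30, "search": 60}  # requests per window
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    def __init__(self, clock=time.monotonic):
        self._hits = defaultdict(deque)  # (ip, route) -> request timestamps
        self._clock = clock

    def allow(self, ip: str, route: str) -> bool:
        now = self._clock()
        hits = self._hits[(ip, route)]
        # Discard timestamps that have slid out of the window.
        while hits and now - hits[0] >= WINDOW_SECONDS:
            hits.popleft()
        if len(hits) >= LIMITS[route]:
            return False
        hits.append(now)
        return True


def client_ip(headers: dict[str, str], peer_ip: str) -> str:
    # The left-most X-Forwarded-For entry is the original client.
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip
```

X-Forwarded-For is only trustworthy when the reverse proxy overwrites it; a client that can reach the backend directly can set the header to any value and sidestep per-IP limits.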

Frontend Layer

Next.js 15 App Router, RTL Arabic, Tailwind CSS v4, Framer Motion.

| Component | Purpose |
| --- | --- |
| `upload-section.tsx` | File picker + `/api/analyze` call + loading state |
| `analysis-history.tsx` | Saved analyses list + semantic search + health trend chart |
| `health-trend-chart.tsx` | Recharts line chart; tracks lab values over time with alerts |
| `risk-dashboard.tsx` | Calls `/api/risk` + renders 6 radial gauge cards (collapsible) |
| `chat-bot.tsx` | Floating chat panel; streaming SSE + voice input/output |
| `voice-recorder.tsx` | MediaRecorder → `/api/voice/transcribe` + TTS playback |
| `compare-analyses.tsx` | Side-by-side analysis diff |

Data Flow — Analysis Request

1. User uploads PDF/image
2. validate_upload() — size + MIME + magic bytes
3. AgentCoordinator.run()
   a. OCRAgent       → raw_text
   b. ExtractionAgent → findings[] (with impossible-value filter)
   c. ClassificationAgent → panel_code
   d. MedicalReasoningAgent → RAG context + LLM report
   e. SafetyAgent    → filtered report
4. Response: { findings, summary, report }
5. Frontend saves to Supabase via /api/analyses/save
6. RiskDashboard calls /api/risk with findings
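Step 3 of the flow above can be sketched as a coordinator that runs stages in order over a shared context and keeps going when one fails. The stage signature, the run_pipeline name, and the context fields are assumptions; the real AgentCoordinator may handle errors differently.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentContext:
    file_bytes: bytes
    raw_text: str = ""
    findings: list = field(default_factory=list)
    panel_code: str = "unknown"
    summary: str = ""
    report: str = ""
    logs: list = field(default_factory=list)


Stage = Callable[[AgentContext], None]


def run_pipeline(ctx: AgentContext, stages: list[tuple[str, Stage]]) -> dict:
    for name, stage in stages:
        try:
            stage(ctx)  # each stage mutates the shared context
            ctx.logs.append(f"{name}: ok")
        except Exception as exc:
            # Degrade gracefully: keep the defaults (for example raw_text="")
            # and let downstream stages work with empty input.
            ctx.logs.append(f"{name}: failed ({exc})")
    return {"findings": ctx.findings, "summary": ctx.summary, "report": ctx.report}
```

Because every field has a safe default, an OCR failure leaves raw_text empty and the later stages still produce a well-formed response.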

Key Design Decisions

No streaming for analysis: Analysis takes 8–15 s, and a partially streamed JSON body cannot be parsed until it is complete, so the response is returned in full after all agents finish.
RAG cache before agent: MedicalReasoningAgent checks rag_cache before calling pgvector — avoids redundant 300 ms embedding round-trips on identical queries.
Agents over monolith: Each agent is independently retryable and observable via AgentContext.logs. Failures degrade gracefully — OCR failure sets raw_text="", downstream agents handle empty input without crashing.
Rule-based risk scoring: ML .pkl models are optional overrides. The platform is useful immediately without training data.
In-memory rate limiter: Avoids Redis dependency for MVP. Replace with Redis-backed limiter for multi-process deployments.
