رغد

feat: complete platform — auth, deployment, hardening

344e369 1 day ago

	# Tebyan Medical (تبيان الطبي): Architecture

	## System Overview

	Tebyan Medical is a production-grade Arabic medical report analysis platform. Users upload lab reports (PDF or image); the system extracts findings, generates a clinical interpretation in Arabic, and provides an interactive, voice-enabled chat assistant.

	```
	Frontend (Next.js 15)              Backend (FastAPI)        External Services
	─────────────────────              ─────────────────        ─────────────────
	Upload → /api/analyze  ────────►   AgentCoordinator         Groq (LLM/STT)
	Chat   → /api/chat     ────────►   RAG + LLM streaming      Supabase (pgvector)
	Voice  → /api/voice    ────────►   WhisperSTT / TTS         Cohere (rerank)
	Risk   → /api/risk     ────────►   RiskEngine               Google Vision/TTS
	```

	---

	## Backend Layer

	### Entry Point

	`backend/main.py` — FastAPI application. Registers all routes, mounts middleware, and wires dependency-injected singletons via `@lru_cache`.
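
The singleton wiring can be illustrated with the standard library alone. This is a minimal sketch, not the project's code: `get_llm_router` and the stub class are hypothetical names. In a FastAPI app, a factory like this would typically be passed to `Depends(...)` so every request receives the same instance.

```python
from functools import lru_cache


class LLMRouter:
    """Stand-in for a service object that is expensive to build."""

    def __init__(self) -> None:
        self.ready = True


@lru_cache(maxsize=1)
def get_llm_router() -> LLMRouter:
    # The first call constructs the object; later calls return the cached one.
    return LLMRouter()


first = get_llm_router()
second = get_llm_router()
same_instance = first is second
```

Because the factory takes no arguments, the cache holds exactly one entry, which is what makes it behave like a process-wide singleton.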

	### Multi-Agent Pipeline (`services/agents/`)

	This package replaces the flat `services/agent/pipeline.py` (kept for backward compatibility) with a structured agent graph:

	```
	AgentCoordinator
	├── OCRAgent — PDF (fitz + EasyOCR) + Image (Google Vision fallback)
	├── ExtractionAgent — regex parse → LLM fallback → physiological bounds filter
	├── ClassificationAgent — panel detection (CBC, Thyroid, Liver, Kidney, Lipid, Diabetes)
	├── MedicalReasoningAgent — RAG retrieval + panel-specific LLM prompt
	└── SafetyAgent — PDPL/NDMO compliance filter + emergency detection
	```
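
The `ExtractionAgent` step in the tree above (regex parse followed by a physiological bounds filter) might look roughly like the sketch below. The pattern, the two analytes, and the bounds are illustrative assumptions, and the LLM fallback is omitted.

```python
import re

# Illustrative plausibility bounds (assumed values, not the project's table).
PLAUSIBLE_RANGE = {
    "hemoglobin": (1.0, 25.0),   # g/dL
    "glucose": (10.0, 2000.0),   # mg/dL
}

# Matches fragments such as "Hemoglobin: 13.5" or "glucose 98".
LINE_PATTERN = re.compile(r"(?i)\b(hemoglobin|glucose)\b\s*:?\s*(\d+(?:\.\d+)?)")


def extract_findings(raw_text: str) -> list[dict]:
    findings = []
    for name, value in LINE_PATTERN.findall(raw_text):
        key, number = name.lower(), float(value)
        low, high = PLAUSIBLE_RANGE[key]
        if low <= number <= high:  # drop OCR misreads such as 1350 g/dL
            findings.append({"name": key, "value": number})
    return findings


sample = "Hemoglobin: 13.5\nGlucose 98\nHemoglobin: 1350"
findings = extract_findings(sample)
```

Here the third line of `sample` is discarded because a hemoglobin of 1350 falls outside the assumed plausible range.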

	Each agent extends `AgentBase`, which provides retry-with-backoff and structured logging via `AgentContext`.
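
A minimal sketch of that contract, assuming three attempts and exponential backoff. The field names, the `process` hook, and `FlakyOCRAgent` are hypothetical and only illustrate the pattern.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    raw_text: str = ""
    logs: list = field(default_factory=list)

    def log(self, agent: str, event: str, **extra) -> None:
        self.logs.append({"agent": agent, "event": event, **extra})


class AgentBase:
    max_attempts = 3
    base_delay = 0.01  # seconds; doubled after each failed attempt

    def run(self, ctx: AgentContext) -> AgentContext:
        name = type(self).__name__
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.process(ctx)
                ctx.log(name, "ok", attempt=attempt)
                return ctx
            except Exception as exc:
                ctx.log(name, "error", attempt=attempt, error=str(exc))
                if attempt == self.max_attempts:
                    raise
                time.sleep(self.base_delay * 2 ** (attempt - 1))
        return ctx

    def process(self, ctx: AgentContext) -> None:
        raise NotImplementedError


class FlakyOCRAgent(AgentBase):
    """Fails once, then succeeds, to exercise the retry path."""

    def __init__(self) -> None:
        self.calls = 0

    def process(self, ctx: AgentContext) -> None:
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient OCR failure")
        ctx.raw_text = "Hemoglobin: 13.5"


ctx = FlakyOCRAgent().run(AgentContext())
```

After the run, `ctx.logs` holds one `error` record for the first attempt and one `ok` record for the second.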

	### RAG Stack (`services/rag/`, `services/search/`)

	- Embedding: `intfloat/multilingual-e5-large` (1024-dim, via HuggingFace)
	- Vector store: Supabase pgvector (`match_documents` RPC, 2834+ medical chunks)
	- Query expansion: Groq LLM generates 3 alternate queries; `query_parser.py` adds Arabic synonym expansions
	- Retrieval: BM25 fallback + Cohere `rerank-v3.5` cross-encoder
	- Cache: 5-minute TTL in-memory LRU (`services/cache.py`) prevents redundant embedding calls
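
The cache bullet above can be sketched as a small LRU with per-entry expiry. This is an assumption about the shape of `services/cache.py`, not its actual code; the class name and the injectable `clock` parameter are hypothetical.

```python
import time
from collections import OrderedDict


class TTLCache:
    def __init__(self, max_items=256, ttl_seconds=300.0, clock=time.monotonic):
        self._items = OrderedDict()  # key -> (expires_at, value)
        self._max_items = max_items
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entry
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used


now = [0.0]  # fake clock so the example does not need to sleep
cache = TTLCache(max_items=2, ttl_seconds=300.0, clock=lambda: now[0])
cache.set("q1", [0.1, 0.2])
hit_before_expiry = cache.get("q1")
now[0] = 301.0
hit_after_expiry = cache.get("q1")
```

Keying the cache on the normalised query text lets a repeated question skip the embedding call entirely until the entry expires.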

	### Risk Engine (`services/risk/`)

	Evidence-based clinical threshold scoring for 6 conditions. `FeatureExtractor` normalises findings into a 35-feature vector; each scorer (`_score_diabetes`, `_score_cardiovascular`, etc.) applies WHO/ADA/ACC clinical cutoffs. ML `.pkl` model files in `services/risk/models/` override rule-based scores when present.
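
As an illustration of threshold scoring, the sketch below applies the ADA diagnostic cutoffs for fasting plasma glucose (100 and 126 mg/dL) and HbA1c (5.7% and 6.5%). The feature names, the 0 to 1 score scale, and the callable `model` standing in for a loaded `.pkl` file are assumptions, not the project's actual interface.

```python
def score_diabetes(features: dict) -> dict:
    """Rule-based score from ADA cutoffs; the numeric scale is an assumption."""
    glucose = features.get("fasting_glucose_mg_dl")
    hba1c = features.get("hba1c_percent")

    level, score = "unknown", None
    if glucose is not None or hba1c is not None:
        level, score = "normal", 0.1
        if (glucose is not None and glucose >= 100) or (hba1c is not None and hba1c >= 5.7):
            level, score = "prediabetes", 0.5
        if (glucose is not None and glucose >= 126) or (hba1c is not None and hba1c >= 6.5):
            level, score = "diabetes", 0.9
    return {"condition": "diabetes", "level": level, "score": score}


def score_with_optional_model(features: dict, model=None) -> dict:
    result = score_diabetes(features)
    if model is not None:
        # A trained model, when present, overrides the rule-based score.
        result["score"] = float(model(features))
        result["source"] = "ml"
    else:
        result["source"] = "rules"
    return result


features = {"fasting_glucose_mg_dl": 110, "hba1c_percent": 5.4}
rule_result = score_with_optional_model(features)
ml_result = score_with_optional_model(features, model=lambda f: 0.72)
```

With no model supplied, the rules alone produce a usable result, which matches the optional-override behaviour described above.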

	### Voice (`services/voice/`)

	- STT: `WhisperSTT` wraps Groq `whisper-large-v3`. Accepts WebM/MP4/OGG/WAV (25 MB max).
	- TTS: Provider chain — Google Cloud TTS (Wavenet-A) → gTTS (free fallback) → ElevenLabs.
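
A provider chain of this kind can be sketched as an ordered list of callables tried in turn. The function names and stub providers below are hypothetical; real providers would call the respective SDKs.

```python
from typing import Callable, Sequence

TTSProvider = Callable[[str], bytes]


def synthesize(text: str, providers: Sequence[tuple[str, TTSProvider]]) -> tuple[str, bytes]:
    """Try each provider in order and return (provider_name, audio_bytes)."""
    errors = []
    for name, provider in providers:
        try:
            return name, provider(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS providers failed: " + "; ".join(errors))


def google_cloud_tts(text: str) -> bytes:
    raise ConnectionError("quota exceeded")  # simulated outage


def gtts_fallback(text: str) -> bytes:
    return b"audio:" + text.encode("utf-8")


used, audio = synthesize("hello", [("google", google_cloud_tts), ("gtts", gtts_fallback)])
```

Returning the provider name alongside the audio makes it easy to log which link of the chain actually served the request.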

	### LLM Router (`services/llm/router.py`)

	`LLMRouter` wraps a primary provider (`GroqProvider`) and optional fallback (`HuggingFaceProvider`). Model selection: `llama-3.3-70b-versatile` for analysis, `llama-3.1-8b-instant` for chat.
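
The routing rule can be sketched as a task-to-model lookup in front of the primary provider. The callable signatures are assumptions made for the example; only the two model names come from the description above.

```python
MODEL_FOR_TASK = {
    "analysis": "llama-3.3-70b-versatile",
    "chat": "llama-3.1-8b-instant",
}


class LLMRouter:
    def __init__(self, primary, fallback=None):
        self.primary = primary    # callable(model, prompt) -> str
        self.fallback = fallback  # callable(prompt) -> str, optional

    def complete(self, task: str, prompt: str) -> str:
        model = MODEL_FOR_TASK[task]
        try:
            return self.primary(model, prompt)
        except Exception:
            if self.fallback is None:
                raise
            return self.fallback(prompt)


def fake_groq(model: str, prompt: str) -> str:
    return f"[{model}] {prompt}"


router = LLMRouter(primary=fake_groq)
chat_reply = router.complete("chat", "hi")
```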

	### Security (`middleware/`)

	- `AuditMiddleware`: Writes one JSON record per request to rotating log files (`logs/audit/audit.jsonl`). Marks PDPL-sensitive paths. Skips health/docs endpoints.
	- `validate_upload`: Magic-byte sniffing (anti-MIME-spoofing), 20 MB size limit, extension blocklist.
	- `sanitize_text`: Strips HTML tags, null bytes, XSS patterns, and SQL injection signatures from all user text inputs.
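
Magic-byte sniffing compares the first bytes of the upload against the signature expected for the declared type. The sketch below is an assumption about how `validate_upload` could work; the accepted types and the blocklist entries are illustrative, while the file signatures themselves are the standard ones for PDF, PNG, and JPEG.

```python
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB limit from the description above

MAGIC_PREFIXES = {
    "application/pdf": (b"%PDF-",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
}
BLOCKED_EXTENSIONS = {".exe", ".js", ".sh", ".bat"}  # illustrative blocklist


def validate_upload(filename: str, declared_mime: str, data: bytes) -> str:
    extension = os.path.splitext(filename)[1].lower()
    if extension in BLOCKED_EXTENSIONS:
        raise ValueError(f"blocked extension: {extension}")
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds 20 MB")
    prefixes = MAGIC_PREFIXES.get(declared_mime)
    # The declared MIME type must be supported AND match the real content.
    if prefixes is None or not data.startswith(prefixes):
        raise ValueError("content does not match declared type")
    return declared_mime


accepted = validate_upload("report.pdf", "application/pdf", b"%PDF-1.7\n...")
```

A PNG renamed to `report.pdf` and sent as `application/pdf` fails the prefix check, which is the spoofing case the sniffing is meant to catch.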

	### Rate Limiting (`services/ratelimit.py`)

	In-memory sliding window (no external dependency). Limits: analyze=5/min, chat=30/min, search=60/min. Uses `X-Forwarded-For` for IP detection behind proxies.
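
A sliding-window limiter keeps the timestamps of recent requests per client and rejects a request once the window is full. The sketch below is a minimal single-process version; the class name, the `client_ip` helper, and the injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    def __init__(self, limit: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        recent = self.hits[key]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget requests that left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def client_ip(headers: dict, peer_ip: str) -> str:
    # The left-most X-Forwarded-For entry is the original client.
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


now = [0.0]  # fake clock so the example does not need to sleep
analyze_limiter = SlidingWindowLimiter(limit=5, clock=lambda: now[0])
ip = client_ip({"x-forwarded-for": "203.0.113.7, 10.0.0.1"}, "10.0.0.1")
decisions = [analyze_limiter.allow(ip) for _ in range(6)]
now[0] = 60.0
allowed_after_window = analyze_limiter.allow(ip)
```

`X-Forwarded-For` is client-controlled unless a trusted proxy overwrites it, so this lookup is only as reliable as the proxy configuration in front of the app.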

	---

	## Frontend Layer

	Next.js 15 App Router, RTL Arabic, Tailwind CSS v4, Framer Motion.

	| Component | Purpose |
	|---|---|
	| `upload-section.tsx` | File picker + `/api/analyze` call + loading state |
	| `analysis-history.tsx` | Saved analyses list + semantic search + health trend chart |
	| `health-trend-chart.tsx` | Recharts line chart that tracks lab values over time with alerts |
	| `risk-dashboard.tsx` | Calls `/api/risk` + renders 6 radial gauge cards (collapsible) |
	| `chat-bot.tsx` | Floating chat panel with streaming SSE + voice input/output |
	| `voice-recorder.tsx` | MediaRecorder → `/api/voice/transcribe` + TTS playback |
	| `compare-analyses.tsx` | Side-by-side analysis diff |

	---

	## Data Flow — Analysis Request

	```
	1. User uploads PDF/image
	2. validate_upload() — size + MIME + magic bytes
	3. AgentCoordinator.run()
	   a. OCRAgent → raw_text
	   b. ExtractionAgent → findings[] (with impossible-value filter)
	   c. ClassificationAgent → panel_code
	   d. MedicalReasoningAgent → RAG context + LLM report
	   e. SafetyAgent → filtered report
	4. Response: { findings, summary, report }
	5. Frontend saves to Supabase via /api/analyses/save
	6. RiskDashboard calls /api/risk with findings
	```
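
Step 3 of the flow above can be sketched as a plain function that chains the agents and degrades when OCR fails. The agents are passed in as callables purely for illustration, and the `summary` string is a placeholder, since the source does not say how the summary is produced.

```python
def run_pipeline(file_bytes, ocr, extract, classify, reason, safety):
    logs = []
    try:
        raw_text = ocr(file_bytes)
    except Exception as exc:
        logs.append(f"ocr failed: {exc}")
        raw_text = ""  # degrade instead of aborting the request

    findings = extract(raw_text)
    panel_code = classify(findings)
    report = safety(reason(findings, panel_code))
    summary = f"{len(findings)} finding(s), panel {panel_code}"  # placeholder
    return {"findings": findings, "summary": summary, "report": report}, logs


def broken_ocr(_file_bytes):
    raise RuntimeError("unreadable scan")


response, logs = run_pipeline(
    b"...",
    ocr=broken_ocr,
    extract=lambda text: [{"name": "hemoglobin", "value": 13.5}] if text else [],
    classify=lambda findings: "CBC" if findings else "UNKNOWN",
    reason=lambda findings, panel: "ok" if findings else "No findings could be extracted.",
    safety=lambda report: report,
)
```

Even with OCR failing, the caller still receives a well-formed `{ findings, summary, report }` response, and the failure is visible in `logs`.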

	---

	## Key Design Decisions

	- No streaming for analysis: Analysis takes 8–15 s, and a partially streamed JSON body would not parse on the client, so the response is returned in full after all agents complete.
	- RAG cache before agent: MedicalReasoningAgent checks `rag_cache` before calling pgvector — avoids redundant 300 ms embedding round-trips on identical queries.
	- Agents over monolith: Each agent is independently retryable and observable via `AgentContext.logs`. Failures degrade gracefully — OCR failure sets `raw_text=""`, downstream agents handle empty input without crashing.
	- Rule-based risk scoring: ML `.pkl` models are optional overrides. The platform is useful immediately without training data.
	- In-memory rate limiter: Avoids a Redis dependency for the MVP. Replace it with a Redis-backed limiter for multi-process deployments, where each process would otherwise keep its own counters.