# BioFlow > **Multimodal Biological Design & Discovery Intelligence Engine** > A low-code workflow platform for unified biological discovery pipelines ![Python](https://img.shields.io/badge/Python-3.10-blue) ![Next.js](https://img.shields.io/badge/Next.js-16-black) ![Qdrant](https://img.shields.io/badge/Qdrant-Vector_DB-red) ![CUDA](https://img.shields.io/badge/CUDA-11.8-green) ![Team](https://img.shields.io/badge/Team-Lacoste-purple) --- ## Problem Statement Biological R&D knowledge is fragmented across disconnected silos: - **Textual literature** (papers, lab notes) - **3D structural data** (PDB files) - **Chemical sequences** (SMILES) Researchers must manually navigate incompatible formats, creating bottlenecks and "blind spots" where critical connections are missed. ## Our Solution **BioFlow** is a visual workflow engine that unifies biological discovery pipelines. Rather than a single "black box" model, we function as an **intelligent platform** — allowing researchers to chain state-of-the-art open-source biological models into coherent discovery workflows. ### Key Features | Feature | Description | |---------|-------------| | **Visual Pipeline Builder** | Drag-and-drop node editor for constructing discovery workflows | | **DeepPurpose Integration** | Drug-Target Interaction prediction with Morgan + CNN encoding | | **Molecule & Protein Visualization** | Interactive 2D SMILES and 3D PDB structure viewing (powered by 3Dmol.js and SmilesDrawer) | | **Qdrant Vector Search** | High-dimensional similarity search across 23,531+ compounds | | **3D Embedding Explorer** | Real PCA projections of drug-target chemical space | | **Validator Agents** | Automated toxicity and novelty checking | --- ## Architecture ``` ┌──────────────────────────────────────────┐ │ BioFlow │ │ Visual Pipeline Builder (UI) │ └─────────────────┬────────────────────────┘ │ ┌─────────────────────────────────┼─────────────────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Data Input │ │ DeepPurpose │ │ OpenBioMed │ │ SMILES/Protein │────────────▶│ DTI Model │────────────▶│ Multimodal │ │ Sequences │ │ Morgan + CNN │ │ Embeddings │ └─────────────────┘ └────────┬────────┘ └────────┬────────┘ │ │ └───────────────┬───────────────┘ │ ▼ ┌─────────────────┐ │ Qdrant │ │ Vector DB │ │ HNSW Indexing │ │ 23,531 vectors │ └────────┬────────┘ │ ┌─────────────────────────────┼─────────────────────────────┐ │ │ │ ▼ ▼ ▼ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ Similarity │ │ Validator │ │ Results │ │ Search Agent │ │ Agent │ │ Output │ │ Top-K Retrieval │ │ Toxicity/Novelty│ │ Candidates │ └─────────────────┘ └─────────────────┘ └─────────────────┘ ``` --- ## Model Performance | Dataset | Concordance Index | Pearson | MSE | |---------|-------------------|---------|-----| | **KIBA** | 0.7003 | 0.5219 | 0.0008 | | **BindingDB_Kd** | 0.8083 | 0.7679 | 0.6668 | | **DAVIS** | 0.7914 | 0.5446 | 0.4684 | --- ## Quick Start ### Prerequisites - Python 3.10+ - Node.js 18+ - Docker Desktop - CUDA 11.8 (optional, for GPU acceleration) ### 1. Clone & Setup ```bash git clone https://github.com/hamzasammoud11-dotcom/lacoste001.git cd lacoste001 # Python environment python -m venv .venv .venv\Scripts\activate # Windows pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 pip install DeepPurpose qdrant-client fastapi uvicorn scikit-learn ``` ### 2. Start Qdrant Vector Database ```bash docker run -d --name qdrant -p 6333:6333 -p 6334:6334 qdrant/qdrant:latest ``` ### 3. Ingest Data (One-time) ```bash python ingest_qdrant.py # Loads KIBA dataset → DeepPurpose embeddings → Qdrant # ~23,531 drug-target pairs indexed ``` ### 4. Start Backend API ```bash python -m uvicorn bioflow.api.server:app --host 0.0.0.0 --port 8001 ``` ### 5. Start Frontend ```bash cd ui pnpm install pnpm dev # Open http://localhost:3000 ``` ### 6. Start Langflow (Visual Workflow Builder) ```bash # You can use the provided script ./run_langflow.bat # Or manually: pip install langflow langflow run --host 0.0.0.0 --port 7860 # Access via http://localhost:3000/workflow (embedded) # Or directly at http://localhost:7860 ``` --- ## Visual Workflow Builder (Langflow Integration) BioFlow integrates **Langflow** as the visual workflow engine, providing a full-screen drag-and-drop pipeline builder accessible from `/workflow`. ### Building a DTI Pipeline in Langflow 1. **Import the Template Flow**: - Open Langflow (`/workflow` or `localhost:7860`) - Click "New Project" → "Import" - Load `langflow/bioflow_dti_pipeline.json` 2. **Configure the Pipeline**: - **Drug Input**: Enter SMILES string (e.g., `CC(=O)Nc1ccc(O)cc1`) - **Target Input**: Enter protein sequence - **API Nodes**: Point to `http://localhost:8001/api/*` 3. **Run the Flow**: - Click "Run" to execute DeepPurpose encoding → Qdrant search → Results --- ## Project Structure ``` ├── config.py # Shared configuration ├── ingest_qdrant.py # ETL: TDC → DeepPurpose → Qdrant ├── deeppurpose002.py # Model training script ├── bioflow/ │ └── api/ │ └── server.py # FastAPI backend ├── runs/ │ └── 20260125_104915_KIBA/ │ ├── model.pt # Trained model weights │ └── config.pkl # Model configuration ├── ui/ │ ├── app/ │ │ ├── workflow/ # Visual Pipeline Builder │ │ ├── explorer/ # 3D Embedding Visualization │ │ ├── discovery/ # Drug Discovery Interface │ │ └── data/ # Data Browser │ └── components/ └── data/ └── kiba.tab # Cached TDC dataset ``` --- ## API Endpoints | Endpoint | Method | Description | |----------|--------|-------------| | `/health` | GET | Service health + model metrics | | `/api/points` | GET | Get 3D PCA points for visualization | | `/api/search` | POST | Similarity search by SMILES/sequence | ### Example: Search Similar Compounds ```bash curl -X POST "http://localhost:8001/api/search" \ -H "Content-Type: application/json" \ -d '{"smiles": "CC(=O)Nc1ccc(O)cc1", "top_k": 10}' ``` --- ## Qdrant Integration Strategy ### 1. Multimodal Bridge Using OpenBioMed for joint embeddings across proteins, molecules, and text — enabling **cross-modal retrieval**. ### 2. Dynamic Workflow Memory Pipeline nodes store intermediate results in Qdrant collections, enabling agent-to-agent communication. ### 3. High-Dimensional Scalability HNSW indexing handles bio-embeddings at scale, keeping similarity searches interactive and real-time. ## Resources - [DeepPurpose](https://github.com/kexinhuang12345/DeepPurpose) — DTI Prediction Toolkit - [OpenBioMed](https://github.com/PharMolix/OpenBioMed) — Multimodal AI Framework - [Qdrant](https://qdrant.tech/) — Vector Database - [TDC](https://tdcommons.ai/) — Therapeutics Data Commons --- ## License MIT License - See [LICENSE](LICENSE) for details.