{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Explore Biomedical QA in OpenBioMed\n", "\n", "BioMedGPT-R1-17B is composed of a reasoning language model (BioMedGPT-R1-LM-14B), a molecule encoder (GraphMVP), a protein encoder (ESM2-3B) and two modality adaptors. Approximately 70GB GPU Memory is required to load BioMedGPT-R1-17B (40GB GPU with fp16).\n", "\n", "Feel free to [download](https://huggingface.co/PharMolix/BioMedGPT-R1) our trained models, put them under checkpoints, and explore the biomedical QA!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Change working directory\n", "import os\n", "import sys\n", "parent = os.path.dirname(os.path.abspath(''))\n", "print(parent)\n", "sys.path.append(parent)\n", "os.chdir(parent)\n", "\n", "import logging\n", "logging.basicConfig(level=logging.ERROR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In OpenBioMed, we provide an unified interface for multimodal biomedical chatbot. To build an agent, you just need to configure the path to the model checkpoint and which device to deploy the model." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "from open_biomed.models.foundation_models.biomedgpt import BioMedGPTR14Chat\n", "from open_biomed.data import Molecule, Protein, Text\n", "\n", "# Initialize chatbot\n", "agent = BioMedGPTR14Chat.from_pretrained(\"./checkpoints/biomedgpt-r1\", \"cuda:7\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we provide examples on molecule QA and protein QA. \n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Human: Please describe this molecule.\n", "Assistant: The molecule is a monoterpenoid indole alkaloid with formula C21H24N2O4, originally isolated from Tabernaemontana corymbosa. It has a role as a plant metabolite. It is a delta-lactam, a monoterpenoid indole alkaloid and an organic heterohexacyclic compound.\n" ] } ], "source": [ "# Molecule QA\n", "molecule = \"CN1CC[C@]23[C@@H]4[C@H]5[C@@H](CC2=O)C(=CCO[C@H]5CC(=O)N4C6=C3C=C(C=C6)O)C1\"\n", "question = \"Please describe this molecule.\"\n", "\n", "# Initialize a molecule via SMILES string and append it to context\n", "# Then ask a question about it\n", "agent.reset()\n", "molecule = Molecule.from_smiles(molecule)\n", "agent.append_molecule(molecule)\n", "answer = agent.chat(Text.from_str(question))\n", "\n", "print(\"Human: \", question)\n", "print (\"Assistant: \", answer)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Human: There is a mutation from G to A at position 14. What is the mutational effects?\n", "Assistant: Okay, I'm trying to figure out the mutational effects. Hmm, that looks a bit complicated, but I can try to break it down.\n", "\n", "First, I'm thinking about the function of the provided protein. Transports viral genome to neighboring plant cells via plasmodesmata, resulting in systemic infection. The movement protein oligomerizes plasmodesmata and enlarges their pore size (Probable). \n", "\n", "Then there's a mutation from G to A at position 14. What is the mutational effects? Wait. I know it.\n", "\n", "\n", "Complete loss of movement.\n" ] } ], "source": [ "# Protein QA\n", "protein = \"MDCVLRSYLLLAFGFLICLFLFCLVVFIWFVYKQILFRTTAQSNEARHNHSTVV\"\n", "question = \"There is a mutation from G to A at position 14. What is the mutational effects?\"\n", "\n", "# Initialize a protein with an amino acid sequence and append it to context\n", "# Then ask a question about it\n", "agent.reset()\n", "protein = Protein.from_fasta(protein)\n", "agent.append_protein(protein)\n", "answer = agent.chat(Text.from_str(question))\n", "\n", "print(\"Human: \", question)\n", "print (\"Assistant: \", answer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, you can also have some regular text-only chat." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Human: Give me some brief advice for future R&D on EGFR-targeted drugs.\n", "Assistant: Okay, so I need to come up with some brief advice for future R&D on EGFR-targeted drugs. Hmm, where do I start? I remember that EGFR stands for epidermal growth factor receptor, and it's involved in cell proliferation and survival. Mutations in the EGFR gene can lead to cancer, so targeting EGFR could be a way to treat various cancers.\n", "\n", "The user mentioned resistance mechanisms like T790M mutation and C797T mutation. I think the T790M mutation occurs in the exon 20 region of the EGFR gene and causes resistance to first-generation EGFR tyrosine kinase inhibitors (TKIs) like gefitinib and erlotinib. The C797T mutation occurs in the exon 19 region of the EGFR gene and causes resistance to third-generation EGFR TKIs like osimertinib. \n", "\n", "So, what can be done about these resistance mechanisms? One idea is to combine EGFR TKIs with other targeted therapies to overcome resistance. For example, combining EGFR TKIs with MEK inhibitors or PI3K inhibitors might help. Another idea is to develop next-generation EGFR TKIs that can overcome both the T790M and C797T mutations. Additionally, using biomarkers to identify patients who are more likely to respond to EGFR TKIs could improve treatment outcomes. Also, developing EGFR-targeted vaccines or immunotherapies could be an alternative approach to treating EGFR-positive cancers. \n", "\n", "Wait, am I missing anything? Oh right, there's also the possibility of combining EGFR TKIs with anti-PD-1/PD-L1 checkpoint inhibitors. That could potentially enhance the efficacy of EGFR TKIs by overcoming immune suppression in the tumor microenvironment. \n", "\n", "I'm not entirely sure about all the details, but I think I've covered the main points. I'll try to put it all together in a clear and concise manner.\n", "\n", "\n", "To address resistance mechanisms and improve the efficacy of EGFR TKIs, consider the following strategies:\n", "\n", "1. **Combination Therapy**: Combine EGFR TKIs with MEK inhibitors, PI3K inhibitors, or anti-PD-1/PD-L1 checkpoint inhibitors to overcome resistance caused by the T790M or C797T mutation.\n", "2. **Next-Generation EGFR TKIs**: Develop next-generation EGFR TKIs that can overcome both the T790M and C797T mutations.\n", "3. **Biomarker Use**: Utilize biomarkers such as the T790M mutation to identify patients who are more likely to respond to EGFR TKIs.\n", "4. **EGFR-Targeted Vaccines or Immunotherapies**: Explore the development of EGFR-targeted vaccines or immunotherapies as an alternative approach to treating EGFR-positive cancers.\n" ] } ], "source": [ "# Text-only Chat\n", "question = \"Give me some brief advice for future R&D on EGFR-targeted drugs.\"\n", "\n", "agent.reset()\n", "answer = agent.chat(Text.from_str(question))\n", "\n", "print(\"Human: \", question)\n", "print (\"Assistant: \", answer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also provide an intermediate version of BioMedGPT-R1, [BioMedGPT-R1-DS-Qwen](https://huggingface.co/PharMolix/BioMedGPT-R1-DS-Qwen), which is only finetuned for cross-modal alignment. Here is an example on multimodal reasoning." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n" ] } ], "source": [ "from open_biomed.models.foundation_models.biomedgpt import BioMedGPTR14Chat\n", "from open_biomed.data import Molecule, Protein, Text\n", "\n", "# Initialize chatbot\n", "agent = BioMedGPTR14Chat.from_pretrained(\"./checkpoints/biomedgpt-r1-ds-qwen\", \"cuda:7\")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Human: Please describe this molecule and figure out potential applications in pharmaceuticals.\n", "Assistant: Okay, so I'm trying to figure out what this molecule is and how it might be useful in pharmaceuticals. Let me start by looking at the SMILES notation provided: CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3. Hmm, that looks a bit complicated, but I think I can break it down.\n", "\n", "First, I notice that there's a ring structure involved because of the numbers in the SMILES string. The \"C1\" and \"C2\" suggest that there are two rings connected somehow. The part \"CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)\" seems to describe a bicyclic structure with a ketone group (C(=O)) and a chlorine substituent (Cl). The other part, \"C3=CC=CC=C3\", looks like a benzene ring attached to the rest of the molecule.\n", "\n", "So, putting it all together, this molecule has a bicyclic core with a ketone and a chlorine atom, and it's connected to a benzene ring. I wonder what kind of properties this structure might have. Maybe it's a type of heterocyclic compound, which are often found in pharmaceuticals because they can have various biological activities.\n", "\n", "I should check if this molecule is similar to any known drugs or compounds. Let me think... The presence of a benzene ring and a bicyclic structure with a ketone makes me think of some antidepressants or antipsychotics, which often have complex ring systems. For example, drugs like sertraline or olanzapine have multiple rings and functional groups that interact with neurotransmitters.\n", "\n", "Another thing to consider is the substituents. The chlorine atom could be important for the molecule's pharmacokinetics, like how it's absorbed or metabolized in the body. Chlorine is a common substituent in pharmaceuticals because it can increase the lipophilicity of a molecule, which affects how well it crosses cell membranes.\n", "\n", "I also notice that the molecule has an amine group (CN) and a ketone. Amines are common in drugs because they can form hydrogen bonds, which is important for binding to target proteins. The ketone might be involved in enzyme interactions or could serve as a site for further chemical modifications.\n", "\n", "Now, thinking about potential applications, this molecule could be a candidate for treating conditions like depression or schizophrenia, given the structural similarities to existing medications. It might act as a selective serotonin reuptake inhibitor (SSRI) or a dopamine antagonist. Alternatively, it could have other therapeutic uses depending on how it interacts with biological systems.\n", "\n", "I should also consider the molecule's solubility and stability. The combination of a bicyclic structure and a benzene ring might make it somewhat hydrophobic, which could affect its bioavailability. Maybe there are ways to modify the structure to improve these properties, such as adding hydrophilic groups or changing the substituents.\n", "\n", "Testing this molecule in vitro would be the next step. I'd want to see how it interacts with target receptors or enzymes. For example, if it's an SSRI, it should inhibit the reuptake of serotonin in a neuronal cell model. If it's a dopamine antagonist, it might block dopamine receptors in a binding assay.\n", "\n", "In vivo studies would follow, where the molecule is tested in animal models to observe its effects on behavior and to assess toxicity. This would help determine the appropriate dosage and identify any adverse effects.\n", "\n", "Overall, while I don't have all the answers yet, breaking down the structure and comparing it to known drugs gives me a starting point for understanding its potential in pharmaceuticals. Further research and testing would be needed to fully evaluate its therapeutic applications.\n", "\n", "\n", "The molecule described by the SMILES string `CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3` is a bicyclic compound with a ketone group and a chlorine substituent, connected to a benzene ring. This structure suggests it could be a candidate for pharmaceutical applications, potentially as an antidepressant or antipsychotic due to structural similarities with existing drugs. The presence of a chlorine atom may influence pharmacokinetics, while the amine and ketone groups could play roles in biological interactions. Further research, including in vitro and in vivo testing, would be necessary to evaluate its therapeutic potential and determine appropriate dosages and safety profiles.\n" ] } ], "source": [ "# Multimodal Reasoning\n", "molecule = \"CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3\"\n", "question = \"Please describe this molecule and figure out potential applications in pharmaceuticals.\"\n", "\n", "# Initialize a molecule via SMILES string and append it to context\n", "# Then ask a question about it\n", "agent.reset()\n", "molecule = Molecule.from_smiles(molecule)\n", "agent.append_molecule(molecule)\n", "answer = agent.chat(Text.from_str(question))\n", "\n", "print(\"Human: \", question)\n", "print (\"Assistant: \", answer)" ] } ], "metadata": { "kernelspec": { "display_name": "biomed", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.21" } }, "nbformat": 4, "nbformat_minor": 2 }