---
license: apache-2.0
language:
- en
pipeline_tag: token-classification
tags:
- RepresentationLearning
- Genomics
- Variant
- Classification
- Mutations
- Embedding
- VariantClassification
---

# Model - GvEM (Genomic Variant Embedding Model)

**GvEM** is a PyTorch-based deep learning model that embeds and models genomic mutation data from VCF (Variant Call Format) files using a biologically informed hierarchy:

**Pathway → Chromosome → Gene → Mutations**

---

## Hierarchy of input data

    example_data = {
        'sample1': {
            'pathway1': {
                'chr1': {
                    'gene1': [
                        {
                            'impact': 'HIGH',
                            'reference': 'A',
                            'alternate': 'T'
                        }
                    ]
                }
            }
        }
    }

---

## Features

* **VCF Parser**: Converts standard VCF files into a hierarchical JSON-like structure.
* **MutationEmbedder**: Learns embeddings for categorical mutation features (scalable).
* **GeneEncoder**: Processes lists of mutations using a Transformer and hierarchical attention to produce gene-level representations.
* **ChromosomeEncoder**: Aggregates gene encodings.
* **PathwayEncoder**: Aggregates chromosome encodings to yield the final sample representation.
* **Scalable**: Easily extensible to new fields or biological groupings.
* **HuggingFace Compatible**: Designed for sharing and experimentation on the 🤗 Hub.

---

## Uses

### Direct Use

* Obtaining sample-level embeddings
* Mutation pattern learning
* Transfer learning across genomic datasets

### Downstream Use

* Variant-based disease prediction (e.g., cancer, rare diseases, ASD)
* Multi-omics fusion models (tabular + image + VCF)
* Cohort-level mutation analysis
* Fine-tuning for prognosis, drug response prediction, or variant effect interpretation

### Limitations

* Not intended for clinical decision-making without expert oversight.
* Input variants must already be annotated.
* Not intended for non-human genomes unless explicitly fine-tuned for those organisms.
* High-resolution functional variant prediction is not yet supported; it is planned for future development.

---

## MODEL STILL UNDER DEVELOPMENT
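
## Example: grouping variants into the input hierarchy

The nested input structure described above (sample, then pathway, chromosome, gene, and a list of mutations) can be illustrated with a minimal, standard-library-only sketch. This is not the GvEM VCF parser: the helper names `build_hierarchy` and `count_mutations`, and the flat record keys `sample`, `pathway`, `chromosome`, and `gene`, are hypothetical and chosen only for illustration. The sketch assumes variants have already been annotated with an impact level and mapped to a gene and pathway.

```python
def build_hierarchy(records):
    """Group flat, already-annotated variant records into
    sample -> pathway -> chromosome -> gene -> [mutations].

    The record keys used here are assumptions for this sketch,
    not a format defined by GvEM.
    """
    nested = {}
    for rec in records:
        # Walk (and create on demand) the three outer levels.
        genes = (
            nested.setdefault(rec["sample"], {})
            .setdefault(rec["pathway"], {})
            .setdefault(rec["chromosome"], {})
        )
        # Append the mutation features to the gene's mutation list.
        genes.setdefault(rec["gene"], []).append(
            {
                "impact": rec["impact"],
                "reference": rec["reference"],
                "alternate": rec["alternate"],
            }
        )
    return nested


def count_mutations(nested):
    """Count mutations per sample by walking every level of the hierarchy."""
    counts = {}
    for sample, pathways in nested.items():
        counts[sample] = sum(
            len(mutations)
            for chromosomes in pathways.values()
            for genes in chromosomes.values()
            for mutations in genes.values()
        )
    return counts


# Hypothetical annotated variants for a single sample.
records = [
    {"sample": "sample1", "pathway": "pathway1", "chromosome": "chr1",
     "gene": "gene1", "impact": "HIGH", "reference": "A", "alternate": "T"},
    {"sample": "sample1", "pathway": "pathway1", "chromosome": "chr1",
     "gene": "gene1", "impact": "MODERATE", "reference": "G", "alternate": "C"},
    {"sample": "sample1", "pathway": "pathway1", "chromosome": "chr2",
     "gene": "gene2", "impact": "LOW", "reference": "C", "alternate": "T"},
]

example_data = build_hierarchy(records)
mutation_counts = count_mutations(example_data)
```

Each nesting level in the resulting dictionary corresponds to one encoder stage listed under Features, so a structure like this is the shape of input that the gene, chromosome, and pathway encoders would aggregate in turn.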