---
language:
- dna
tags:
- biology
- genomics
- transposable-elements
- dnabert
- bilstm
- sequence-classification
license: mit
---

# TE-GER: Superfamily Classification

Part of the **TE-GER** (Transposable Elements Genomic Entity Recognition) toolkit.

This is the TE-GER superfamily classification model. It provides fine-grained transposable element (TE) annotation across 21 superfamilies (Gypsy, Copia, Mutator, HAT, etc.) in genomic sequences, using a hybrid DNABERT-2 + BiLSTM architecture.

## Model Architecture

- **Base:** [DNABERT-2](https://huggingface.co/zhihan1996/DNABERT-2-117M) (DNA language model)
- **Head:** Bidirectional LSTM + Linear Classifier
- **Input:** 512 bp sliding windows over raw FASTA sequences
- **Task:** Sequence classification (token-level TE annotation)

## Usage

Use this model via the [TE-GER CLI](https://github.com/johanpina/te-ger):

```bash
python Te_annotator.py genome.fasta output.gff3 --level superfamilies
```

## Labels

- `0`: Background
- `1`: ACADEM-1
- `2`: BELPAO
- `3`: CACTA
- `4`: COPIA
- `5`: CR1
- `6`: DIRS
- `7`: ERV
- `8`: GYPSY
- `9`: HAT
- `10`: HELITRON
- `11`: I
- `12`: KOLOBOK
- `13`: L1
- `14`: LARD
- `15`: LINE
- `16`: LTR
- `17`: MULE
- `18`: P
- `19`: PIFHARBINGER
- `20`: PIGGYBAC
- `21`: PLE
- `22`: R1
- `23`: RTE
- `24`: SINE
- `25`: TC1MARINER
- `26`: TIR
- `27`: TRNA

## Citation

Developed by Johan S. Piña, 2025
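
## Example: Windowing and Label Lookup

The 512 bp sliding-window input and the label ids listed in this card can be illustrated with a short, dependency-free sketch. This is not the TE-GER implementation: the function name `sliding_windows`, the default non-overlapping stride, and the choice to keep a shorter final window are all assumptions made for illustration. Only the window size and the id-to-label mapping come from this card.

```python
from typing import Iterator, Tuple

WINDOW_SIZE = 512  # window length in bp, as stated in this model card

# Label ids copied from the Labels section of this card.
ID2LABEL = dict(enumerate([
    "Background", "ACADEM-1", "BELPAO", "CACTA", "COPIA", "CR1", "DIRS",
    "ERV", "GYPSY", "HAT", "HELITRON", "I", "KOLOBOK", "L1", "LARD",
    "LINE", "LTR", "MULE", "P", "PIFHARBINGER", "PIGGYBAC", "PLE", "R1",
    "RTE", "SINE", "TC1MARINER", "TIR", "TRNA",
]))


def sliding_windows(
    sequence: str,
    window: int = WINDOW_SIZE,
    stride: int = WINDOW_SIZE,  # assumption: the real stride/overlap may differ
) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) pairs that cover `sequence` left to right.

    Hypothetical helper: the last window may be shorter than `window`
    when the sequence length is not a multiple of the stride.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not sequence:
        return
    start = 0
    while True:
        yield start, sequence[start:start + window]
        if start + window >= len(sequence):
            break  # the current window already reaches the sequence end
        start += stride


if __name__ == "__main__":
    toy_sequence = "ACGT" * 275  # 1100 bp toy sequence, not real data
    for start, chunk in sliding_windows(toy_sequence):
        print(start, len(chunk))
    print(ID2LABEL[8])  # name for label id 8 in the list above
```

Each window would then be tokenized and passed to the model, and predicted ids mapped back to names through `ID2LABEL`. In practice the TE-GER CLI shown under Usage handles these steps and writes the GFF3 output.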