mjbommar committed on
Commit 387487e · verified · 1 parent: b4f00df

README: badge lines as bullets so each renders on its own line

Files changed (1): README.md +5 -5
README.md CHANGED
@@ -45,11 +45,11 @@ model-index:
 
 A 37.76M-backbone-parameter BERT-style encoder for fine-grained file-content-type detection from binary data. Takes any 4 KB byte buffer (regardless of source offset) and produces a 512-dimensional embedding that classifiers map to one of [libmagic](https://github.com/file/file)'s 125 MIME labels. Designed for inputs where you only have a chunk: a forensic-carved fragment, a random disk-block read, a streaming HTTP upload, a single network packet payload.
 
-**🔗 Model**: [`mjbommar/mimelens-001-medium-bpe-16k-s1`](https://huggingface.co/mjbommar/mimelens-001-medium-bpe-16k-s1)
-**👥 Family**: [`mjbommar/mimelens-001`](https://huggingface.co/mjbommar/mimelens-001) (28 pretrained cells; family hub forthcoming)
-**🔀 Tokenizer**: [`mjbommar/binary-tokenizer-001-16k`](https://huggingface.co/mjbommar/binary-tokenizer-001-16k)
-**📄 Paper**: *MimeLens: Pretrained Encoders for Fine-Grained Content-Type Detection* (Bommarito 2026). [GitHub](https://github.com/mjbommar/binary-embedding-paper) (source release forthcoming)
-**📊 Pretraining corpus**: [`mjbommar/binary-30k-tokenized`](https://huggingface.co/datasets/mjbommar/binary-30k-tokenized) plus magic-frags, glaurung, Windows drivers (33 GB stratified)
+- **🔗 Model**: [`mjbommar/mimelens-001-medium-bpe-16k-s1`](https://huggingface.co/mjbommar/mimelens-001-medium-bpe-16k-s1)
+- **👥 Family**: [`mjbommar/mimelens-001`](https://huggingface.co/mjbommar/mimelens-001) (28 pretrained cells; family hub forthcoming)
+- **🔀 Tokenizer**: [`mjbommar/binary-tokenizer-001-16k`](https://huggingface.co/mjbommar/binary-tokenizer-001-16k)
+- **📄 Paper**: *MimeLens: Pretrained Encoders for Fine-Grained Content-Type Detection* (Bommarito 2026). [GitHub](https://github.com/mjbommar/binary-embedding-paper) (source release forthcoming)
+- **📊 Pretraining corpus**: [`mjbommar/binary-30k-tokenized`](https://huggingface.co/datasets/mjbommar/binary-30k-tokenized) plus magic-frags, glaurung, Windows drivers (33 GB stratified)
 
 ---
 
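The README context above says the encoder accepts any 4 KB byte buffer regardless of source offset. The following is a minimal sketch of how one might cut such buffers from a larger byte stream before handing them to a tokenizer and model. It assumes "4 KB" means 4096 bytes, and the helper names `iter_chunks` and `random_fragment` are hypothetical. They are not part of MimeLens or any published API, and no tokenizer or model calls are shown.

```python
import random
from typing import Iterator, Optional, Tuple

CHUNK_SIZE = 4096  # assumption: "4 KB" in the README means 4096 bytes


def iter_chunks(
    data: bytes, size: int = CHUNK_SIZE, stride: Optional[int] = None
) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs of at most `size` bytes (hypothetical helper).

    With the default stride the chunks are non-overlapping; the final chunk
    may be shorter than `size`, and how the model expects short inputs to be
    handled is not specified here.
    """
    step = size if stride is None else stride
    for offset in range(0, len(data), step):
        yield offset, data[offset:offset + size]


def random_fragment(
    data: bytes, size: int = CHUNK_SIZE, rng: Optional[random.Random] = None
) -> Tuple[int, bytes]:
    """Return one full-size fragment at a random offset (hypothetical helper).

    Loosely mimics a random disk-block read; inputs shorter than `size`
    are returned whole at offset 0.
    """
    if len(data) < size:
        return 0, data
    rng = rng or random.Random()
    offset = rng.randrange(0, len(data) - size + 1)
    return offset, data[offset:offset + size]
```

For a 10,240-byte input, `iter_chunks` yields offsets 0, 4096, and 8192, with the last chunk holding the remaining 2048 bytes. Whether to drop, pad, or keep such a short tail is a choice this sketch leaves to the caller.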