SurweeshSP/mathtok

SurweeshSP committed 3 days ago

Commit b09b54e · verified · 1 parent: f2b7e07

Update README.md

Files changed (1)

README.md +4 -2

README.md CHANGED

@@ -1,6 +1,4 @@
-# MathTok
-**A Hybrid Canonicalized AST-Based Tokenization Framework for Mathematical Language Modeling**
 ---
 language:
 - en
@@ -67,6 +65,10 @@ pretty_name: MathTok
 thumbnail: assets/mathtok_architecture_improvements.svg
 ---
 ## Overview
 MathTok is a research-grade tokenizer pipeline that converts raw mathematical expressions (LaTeX or ASCII) into a structured, semantically rich token stream. Unlike standard BPE or SentencePiece tokenizers, MathTok is *structure-aware*: it builds an Abstract Syntax Tree (AST) from each expression and serializes it via DFS preorder traversal, preserving full mathematical structure.

 ---
 language:
 - en
 thumbnail: assets/mathtok_architecture_improvements.svg
 ---
+# MathTok
+**A Hybrid Canonicalized AST-Based Tokenization Framework for Mathematical Language Modeling**
 ## Overview
 MathTok is a research-grade tokenizer pipeline that converts raw mathematical expressions (LaTeX or ASCII) into a structured, semantically rich token stream. Unlike standard BPE or SentencePiece tokenizers, MathTok is *structure-aware*: it builds an Abstract Syntax Tree (AST) from each expression and serializes it via DFS preorder traversal, preserving full mathematical structure.
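
The Overview describes building an AST and serializing it with a DFS preorder traversal. The sketch below illustrates that general idea only; it is not MathTok's parser or API. It assumes an ASCII arithmetic expression that Python's standard `ast` module can parse, and the function name `preorder_tokens` and the token labels (`ADD`, `MUL`, `VAR:`, `NUM:`) are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical token labels for a few binary operators (not MathTok's vocabulary).
OP_NAMES = {
    ast.Add: "ADD",
    ast.Sub: "SUB",
    ast.Mult: "MUL",
    ast.Div: "DIV",
    ast.Pow: "POW",
}


def preorder_tokens(expr: str) -> list[str]:
    """Serialize an ASCII arithmetic expression as a DFS preorder token list."""

    def visit(node: ast.AST) -> list[str]:
        if isinstance(node, ast.BinOp):
            # Preorder: emit the operator first, then the left and right subtrees.
            return [OP_NAMES[type(node.op)]] + visit(node.left) + visit(node.right)
        if isinstance(node, ast.Name):
            return [f"VAR:{node.id}"]
        if isinstance(node, ast.Constant):
            return [f"NUM:{node.value}"]
        raise ValueError(f"unsupported node: {type(node).__name__}")

    # Python's own expression parser stands in for a math-specific parser here.
    tree = ast.parse(expr, mode="eval")
    return visit(tree.body)


print(preorder_tokens("a*x + b"))
# ['ADD', 'MUL', 'VAR:a', 'VAR:x', 'VAR:b']
```

Because the operator precedes its operands and each operator here has a fixed arity of two, the tree can be rebuilt from the flat token list without parentheses, which is one reason a preorder serialization retains the expression's nesting.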