SurweeshSP committed on
Commit b09b54e · verified · 1 Parent(s): f2b7e07

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,6 +1,4 @@
-# MathTok
 
-**A Hybrid Canonicalized AST-Based Tokenization Framework for Mathematical Language Modeling**
 ---
 language:
 - en
@@ -67,6 +65,10 @@ pretty_name: MathTok
 thumbnail: assets/mathtok_architecture_improvements.svg
 ---
 
+# MathTok
+
+**A Hybrid Canonicalized AST-Based Tokenization Framework for Mathematical Language Modeling**
+
 ## Overview
 
 MathTok is a research-grade tokenizer pipeline that converts raw mathematical expressions (LaTeX or ASCII) into a structured, semantically-rich token stream. Unlike standard BPE or SentencePiece tokenizers, MathTok is *structure-aware*: it builds an Abstract Syntax Tree (AST) from each expression and serializes it via DFS preorder traversal, preserving full mathematical structure.
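
The Overview paragraph describes building an AST from an expression and serializing it in DFS preorder. As a rough illustration of that idea only, the sketch below parses an ASCII expression with Python's standard `ast` module and emits each operator before its operands. It is not MathTok's implementation: the function name `preorder_tokens` and the token labels (`ADD`, `VAR:x`, `NUM:2`, and so on) are hypothetical, and LaTeX input and canonicalization are not covered.

```python
import ast

# Hypothetical token labels; MathTok's actual vocabulary is not shown in this diff.
_BINARY_OPS = {
    ast.Add: "ADD",
    ast.Sub: "SUB",
    ast.Mult: "MUL",
    ast.Div: "DIV",
    ast.Pow: "POW",
}


def preorder_tokens(expr: str) -> list[str]:
    """Parse an ASCII expression and serialize its AST in DFS preorder."""
    root = ast.parse(expr, mode="eval").body
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            # Preorder: emit the operator first, then the left and right subtrees.
            tokens.append(_BINARY_OPS[type(node.op)])
            visit(node.left)
            visit(node.right)
        elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            tokens.append("NEG")
            visit(node.operand)
        elif isinstance(node, ast.Name):
            tokens.append(f"VAR:{node.id}")
        elif isinstance(node, ast.Constant):
            tokens.append(f"NUM:{node.value}")
        else:
            raise ValueError(f"unsupported syntax: {type(node).__name__}")

    visit(root)
    return tokens


print(preorder_tokens("x**2 + 3*y"))
# ['ADD', 'POW', 'VAR:x', 'NUM:2', 'MUL', 'NUM:3', 'VAR:y']
```

Because every operator in this sketch has a fixed arity, the flat preorder sequence determines the tree uniquely without parentheses. That is one sense in which such a stream keeps structure that a subword tokenizer working on the raw string would not.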