license: apache-2.0
language: multilingual
library_name: transformers.js
pipeline_tag: text-classification
base_model: huawei-noah/TinyBERT_General_4L_312D
tags:
  - autofill
  - field-classification
  - bert
  - tinybert
  - onnx
  - transformers.js
  - browser

TinyBERT Address Autofill

A compact field-type classifier for HTML form autofill, developed by Firefox's Credentials Management Team. Given a string describing a single form field's attributes, it predicts one of 66 autofill field types (given-name, family-name, email, postal-code, address-line1, cc-number, etc.) or other when the field should not be filled.

The model is fine-tuned from huawei-noah/TinyBERT_General_4L_312D on a corpus of manually annotated shopping and address forms collected by Mozilla, and is intended to run client-side inside Firefox (or any Transformers.js host) as a replacement or augmentation for the existing regex-based heuristic field detector.

ONNX variants

All variants live under onnx/ and are loadable through Transformers.js by passing the corresponding dtype argument.

File	Precision	Size	Transformers.js `dtype`
`onnx/model.onnx`	fp32	57.6 MB	`fp32`
`onnx/model_fp16.onnx`	fp16	28.9 MB	`fp16`
`onnx/model_quantized.onnx`	int8 dynamic (default)	14.6 MB	`q8`
`onnx/model_int8.onnx`	int8 dynamic	14.6 MB	`int8`
`onnx/model_uint8.onnx`	uint8 dynamic	14.6 MB	`uint8`
`onnx/model_q4.onnx`	4-bit weight-only on MatMul	42.3 MB	`q4`
`onnx/model_q4f16.onnx`	4-bit on top of fp16	22.4 MB	`q4f16`
`onnx/model_bnb4.onnx`	bitsandbytes NF4	41.9 MB	`bnb4`

How to use

Transformers.js (browser)

import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline(
  "text-classification",
  "vazish/tinybert-address-autofill",
  { dtype: "q8" }   // smallest download (14.6 MB); try "fp16" for higher fidelity (28.9 MB)
);

const out = await classifier(
  "a-c-postal-code billing zip code dwfrm billing address fields postal code"
);
// → [{ label: "postal-code", score: 0.99 }]

Python (Optimum + ONNX Runtime)

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained(
    "vazish/tinybert-address-autofill",
    file_name="onnx/model.onnx",   # or onnx/model_quantized.onnx, etc.
)
tokenizer = AutoTokenizer.from_pretrained("vazish/tinybert-address-autofill")
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

clf("email email mail **email")
# → [{"label": "email", "score": 0.99}]

Input format

The model expects a single string per field, built by concatenating that field's HTML attributes after light normalisation:

Concatenate (in order): type + autocomplete + id + name + placeholder + the field's computed <label> text.
Split camelCase boundaries to whitespace (firstName → first name).
Lowercase the whole thing.
If the field declares an autocomplete attribute, prepend an a-c-<value> token (e.g. a-c-postal-code).
Optionally include adjacent-field context — bb-prefixed tokens for the previous field on the same form and aa-prefixed tokens for the next. Including adjacent context improves accuracy by roughly 8 percentage points relative to the same model trained on isolated fields.

Example input for a "first name" field followed by a "last name" field:

first name first name enter first name aaa-c-family-name aalast aaname
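
The steps above can be sketched in Python as follows. This is a minimal sketch, not the team's actual preprocessing code: the function names, the dictionary layout for a field, and the attribute values in the usage example are hypothetical. It also assumes that the autocomplete value appears only as the a-c- token, that empty attributes are skipped, and that bb tokens go before the field's own tokens and aa tokens after; the text above does not spell these details out.

```python
import re

# Attribute order from the list above; autocomplete is handled separately
# (assumption: it is emitted only as the "a-c-<value>" token).
ATTRIBUTE_ORDER = ("type", "id", "name", "placeholder", "label")


def normalise(text):
    """Split camelCase boundaries, lowercase, and collapse whitespace."""
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return " ".join(text.lower().split())


def field_tokens(field):
    """Return the whitespace-separated tokens for one field (a plain dict)."""
    tokens = []
    autocomplete = (field.get("autocomplete") or "").strip().lower()
    if autocomplete:
        tokens.append("a-c-" + autocomplete)
    for key in ATTRIBUTE_ORDER:
        tokens.extend(normalise(field.get(key) or "").split())
    return tokens


def build_input(field, previous=None, following=None):
    """Build the model input, optionally with bb/aa adjacent-field context."""
    parts = []
    if previous is not None:
        parts += ["bb" + token for token in field_tokens(previous)]
    parts += field_tokens(field)
    if following is not None:
        parts += ["aa" + token for token in field_tokens(following)]
    return " ".join(parts)


# Hypothetical attribute values, chosen to reproduce the example string above.
first = {"id": "firstName", "name": "firstName", "label": "Enter first name"}
last = {"autocomplete": "family-name", "id": "lastName"}
print(build_input(first, following=last))
# → first name first name enter first name aaa-c-family-name aalast aaname
```

The sketch leaves separators such as `_` and `-` untouched because the steps above do not mention them; the real preprocessing may treat them differently.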

Training


Base model	`huawei-noah/TinyBERT_General_4L_312D` (4 layers, hidden 312, intermediate 1200, 12 heads, ~14M params, max sequence length 512)
Head	`BertForSequenceClassification`, 66 output classes
Training set	~360 real shopping / checkout / address forms, 6,691 labelled fields
Validation / test	~246 forms, 4,300 fields, split into validation and test
Regions covered	US, CA, GB, FR, DE, BR, ES, JP, AT, IN, IT, PL, AU, CH (supported); some additional regions also represented for evaluation
Optimizer / schedule	Hugging Face `Trainer` defaults, 50 epochs
Hardware	Apple M1 MacBook Pro, ~75 minutes wall time

Each form field is annotated with data-mozautofill-type="<type>" set to the expected autofill class; fields that should not be filled receive no attribute and are mapped to other.
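
To illustrate the annotation scheme, the sketch below reads labels back out of an annotated form using only the Python standard library. It is an assumption-laden example rather than the project's tooling: the class and function names are hypothetical, and treating input, select, and textarea as the set of fillable elements is a guess.

```python
from html.parser import HTMLParser

# Assumption: these are the element types that count as form fields.
FIELD_TAGS = {"input", "select", "textarea"}


class LabelExtractor(HTMLParser):
    """Collect (field identifier, label) pairs from annotated form HTML."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        if tag not in FIELD_TAGS:
            return
        attr_map = dict(attrs)
        # Fields without the annotation attribute are mapped to "other".
        label = attr_map.get("data-mozautofill-type") or "other"
        identifier = attr_map.get("name") or attr_map.get("id") or ""
        self.labels.append((identifier, label))


def extract_labels(html):
    parser = LabelExtractor()
    parser.feed(html)
    parser.close()
    return parser.labels


sample = (
    '<form>'
    '<input name="email" data-mozautofill-type="email">'
    '<input name="q">'
    '<select id="country" data-mozautofill-type="country"></select>'
    '</form>'
)
print(extract_labels(sample))
# → [('email', 'email'), ('q', 'other'), ('country', 'country')]
```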

Evaluation

Evaluated on the project's held-out test set (2,168 labelled fields drawn from real address / shopping forms) using ONNX Runtime on CPU.

Total — strict exact-match accuracy.
Close — counts predictions on closely related labels as correct (e.g. street-address predicted when ground truth is address-line1, tel predicted when ground truth is tel-national).
Blank — false-fill rate. Fraction of other-labelled fields the model predicted as a real autofill type. Lower is better; this metric matters most for user experience because high false-fill means filling search boxes, comments, and gift-card fields with personal data.
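
The three metrics defined above can be computed from parallel lists of gold and predicted labels, as in the sketch below. The function names are hypothetical, and the close-label groups contain only the two pairs mentioned above; the full grouping used in the evaluation is not given here, and treating the groups as symmetric is an assumption.

```python
# Assumption: only the two related-label pairs named in the text, treated
# as symmetric groups. The real evaluation may use a larger mapping.
CLOSE_GROUPS = [
    {"street-address", "address-line1"},
    {"tel", "tel-national"},
]


def is_close(gold, predicted):
    """True for an exact match or for two labels in the same close group."""
    if gold == predicted:
        return True
    return any(gold in group and predicted in group for group in CLOSE_GROUPS)


def evaluate(gold_labels, predicted_labels):
    """Return Total, Close, and Blank as fractions between 0 and 1."""
    if len(gold_labels) != len(predicted_labels) or not gold_labels:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(gold_labels, predicted_labels))
    total = sum(g == p for g, p in pairs) / len(pairs)
    close = sum(is_close(g, p) for g, p in pairs) / len(pairs)
    # Blank: share of "other" fields that received a real autofill type.
    other_predictions = [p for g, p in pairs if g == "other"]
    blank = (
        sum(p != "other" for p in other_predictions) / len(other_predictions)
        if other_predictions
        else 0.0
    )
    return {"total": total, "close": close, "blank": blank}


gold = ["email", "address-line1", "other", "other", "tel-national"]
pred = ["email", "street-address", "other", "email", "postal-code"]
scores = evaluate(gold, pred)
print(scores)
```

In this toy input, two of five predictions match exactly, one more is a close match, and one of the two other fields is wrongly filled.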

Variant	Total	Close	Blank	Throughput (CPU)
fp32	89.62%	91.51%	2.40%	~218/s
fp16	89.71%	91.61%	2.31%	~132/s
bnb4	88.42%	90.64%	2.77%	~214/s
q4	88.01%	90.54%	2.58%	~209/s
q4f16	88.01%	90.54%	2.58%	~95/s
uint8	87.27%	89.53%	3.27%	~163/s
int8 / quantized	84.82%	87.73%	1.94%	~257/s

For reference, the existing Firefox regex-based heuristic detector reaches roughly 85% total accuracy on comparable test sets.

Highlights:

fp16 matches fp32 to within about 0.1 percentage points on every metric while halving the file size. It is the recommended high-fidelity variant. CPU throughput is roughly 40% lower than fp32 (~132/s vs ~218/s) because most CPUs lack native fp16 ops, but the gap closes on hardware with fp16 support and on WebGPU.
int8 / quantized has the lowest exact accuracy but the lowest false-fill rate of any variant (1.94%, below the fp32 baseline). It errs toward other when uncertain — the safer failure mode for an autofill UI. This is the recommended size-constrained default.
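4-bit variants (q4, q4f16, bnb4) cluster around 88% total accuracy. q4f16 is the smallest of the three at 22.4 MB, but all three are larger than the 14.6 MB 8-bit variants because 4-bit quantization is applied only to MatMul weights.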
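
As a rough aid for choosing a variant, the sketch below encodes the sizes from the ONNX variants table and the Total and Blank figures from the evaluation table, then picks the most accurate dtype that fits a size budget and an optional false-fill ceiling. The helper name and the selection rule are hypothetical; the q8 and int8 entries both reuse the "int8 / quantized" row.

```python
# dtype -> (size in MB, Total accuracy %, Blank false-fill %), copied from
# the two tables in this README.
VARIANTS = {
    "fp32": (57.6, 89.62, 2.40),
    "fp16": (28.9, 89.71, 2.31),
    "q8": (14.6, 84.82, 1.94),
    "int8": (14.6, 84.82, 1.94),
    "uint8": (14.6, 87.27, 3.27),
    "q4": (42.3, 88.01, 2.58),
    "q4f16": (22.4, 88.01, 2.58),
    "bnb4": (41.9, 88.42, 2.77),
}


def pick_dtype(max_size_mb, max_blank_pct=None):
    """Return the dtype with the highest Total accuracy within the limits.

    Returns None when no variant satisfies the constraints. Ties keep the
    entry listed first in VARIANTS.
    """
    candidates = [
        (dtype, total)
        for dtype, (size, total, blank) in VARIANTS.items()
        if size <= max_size_mb
        and (max_blank_pct is None or blank <= max_blank_pct)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[1])[0]


print(pick_dtype(30))                      # size budget only
print(pick_dtype(15, max_blank_pct=2.0))   # size budget plus false-fill cap
```

With a 15 MB budget and no false-fill cap the rule selects uint8, which has higher exact accuracy but the worst false-fill rate; adding the cap selects q8, matching the recommendation above.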

Limitations

Trained primarily on the supported-region list above. Accuracy on regions with no training data drops by ~5–10 percentage points; adding region-specific samples to the training set typically recovers most of that gap.
Underrepresented field types (address-line3, additional-name, phonetic-*, tel-local-prefix, etc.) have very few training examples and are sometimes confidently misclassified.
Reduced-precision variants disagree with fp32 on roughly 0.1% (fp16) to ~5% (int8) of inputs. The evaluation table above shows the net effect of these disagreements on each metric.
The model assumes the team's preprocessing format (camelCase-split, lowercased, with optional a-c-/bb/aa markers). Feeding raw HTML attribute strings without this normalisation will degrade accuracy.
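
To measure how often a reduced-precision variant disagrees with fp32 on your own inputs, you can compare the top labels the two variants return for the same strings. The helper below is a hypothetical sketch that works on plain label lists, so it is independent of how the predictions were produced.

```python
def disagreement_rate(reference_labels, candidate_labels):
    """Fraction of inputs where two variants predict different labels."""
    if len(reference_labels) != len(candidate_labels):
        raise ValueError("label lists must have the same length")
    if not reference_labels:
        raise ValueError("label lists must not be empty")
    differing = sum(
        ref != cand for ref, cand in zip(reference_labels, candidate_labels)
    )
    return differing / len(reference_labels)


# Toy labels standing in for fp32 and int8 predictions on four inputs.
fp32_labels = ["email", "tel", "other", "other"]
int8_labels = ["email", "tel", "other", "email"]
print(disagreement_rate(fp32_labels, int8_labels))
# → 0.25
```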

Citation

This model is built on TinyBERT:

@inproceedings{jiao-etal-2020-tinybert,
  title     = {{TinyBERT}: Distilling {BERT} for Natural Language Understanding},
  author    = {Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin
               and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020},
  pages     = {4163--4174},
  url       = {https://aclanthology.org/2020.findings-emnlp.372}
}

If you use this checkpoint, please also cite the Mozilla autofill ML investigation that produced it (citation forthcoming).

License

Apache 2.0.