license: apache-2.0
language: multilingual
library_name: transformers.js
pipeline_tag: text-classification
base_model: huawei-noah/TinyBERT_General_4L_312D
tags:
  - autofill
  - field-classification
  - bert
  - tinybert
  - onnx
  - transformers.js
  - browser

TinyBERT Address Autofill

A compact field-type classifier for HTML form autofill, developed by the Firefox Credentials Management team. Given a string describing a single form field's attributes, it predicts one of 66 classes: an autofill field type (given-name, family-name, email, postal-code, address-line1, cc-number, etc.) or other when the field should not be filled.

The model is fine-tuned from huawei-noah/TinyBERT_General_4L_312D on a corpus of manually annotated shopping and address forms collected by Mozilla, and is intended to run client-side inside Firefox (or any Transformers.js host) as a replacement or augmentation for the existing regex-based heuristic field detector.

ONNX variants

All variants live under onnx/ and are loadable through Transformers.js by passing the corresponding dtype argument.

| File | Precision | Size | Transformers.js dtype |
|---|---|---|---|
| onnx/model.onnx | fp32 | 57.6 MB | fp32 |
| onnx/model_fp16.onnx | fp16 | 28.9 MB | fp16 |
| onnx/model_quantized.onnx | int8 dynamic (default) | 14.6 MB | q8 |
| onnx/model_int8.onnx | int8 dynamic | 14.6 MB | int8 |
| onnx/model_uint8.onnx | uint8 dynamic | 14.6 MB | uint8 |
| onnx/model_q4.onnx | 4-bit weight-only on MatMul | 42.3 MB | q4 |
| onnx/model_q4f16.onnx | 4-bit on top of fp16 | 22.4 MB | q4f16 |
| onnx/model_bnb4.onnx | bitsandbytes NF4 | 41.9 MB | bnb4 |
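The Python example under "How to use" selects a variant by file path (file_name="onnx/model.onnx") rather than by Transformers.js dtype. The sketch below keeps the two naming schemes in one place. ONNX_VARIANTS, onnx_file_for and variants_under are hypothetical helpers, not part of Transformers.js or Optimum; the paths and sizes are copied from the table above and nothing is fetched from the Hub.

```python
# Hypothetical lookup: Transformers.js dtype -> (repo-relative ONNX path, size in MB).
# Values are copied from the ONNX variants table; nothing here queries the Hub.
ONNX_VARIANTS = {
    "fp32": ("onnx/model.onnx", 57.6),
    "fp16": ("onnx/model_fp16.onnx", 28.9),
    "q8": ("onnx/model_quantized.onnx", 14.6),
    "int8": ("onnx/model_int8.onnx", 14.6),
    "uint8": ("onnx/model_uint8.onnx", 14.6),
    "q4": ("onnx/model_q4.onnx", 42.3),
    "q4f16": ("onnx/model_q4f16.onnx", 22.4),
    "bnb4": ("onnx/model_bnb4.onnx", 41.9),
}


def onnx_file_for(dtype: str) -> str:
    """Return the repo-relative ONNX path for a Transformers.js dtype string."""
    try:
        return ONNX_VARIANTS[dtype][0]
    except KeyError:
        raise ValueError(
            f"unknown dtype {dtype!r}; expected one of {sorted(ONNX_VARIANTS)}"
        ) from None


def variants_under(max_mb: float) -> list[str]:
    """List the dtypes whose file fits within a size budget, smallest first."""
    fitting = [d for d, (_, mb) in ONNX_VARIANTS.items() if mb <= max_mb]
    # sorted() is stable, so equal-sized variants keep their table order.
    return sorted(fitting, key=lambda d: ONNX_VARIANTS[d][1])
```

For example, onnx_file_for("q8") returns "onnx/model_quantized.onnx", which can be passed as file_name in the Python example, and variants_under(25) lists the three 8-bit variants followed by q4f16.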

How to use

Transformers.js (browser)

import { pipeline } from "@huggingface/transformers";

const classifier = await pipeline(
  "text-classification",
  "vazish/tinybert-address-autofill",
  { dtype: "q8" }   // smallest file (14.6 MB); try "fp16" for highest fidelity
);

const out = await classifier(
  "a-c-postal-code billing zip code dwfrm billing address fields postal code"
);
// → [{ label: "postal-code", score: 0.99 }]

Python (Optimum + ONNX Runtime)

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForSequenceClassification.from_pretrained(
    "vazish/tinybert-address-autofill",
    file_name="onnx/model.onnx",   # or onnx/model_quantized.onnx, etc.
)
tokenizer = AutoTokenizer.from_pretrained("vazish/tinybert-address-autofill")
clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

clf("email email mail **email")
# → [{"label": "email", "score": 0.99}]

Input format

The model expects a single string per field, built by concatenating that field's HTML attributes after light normalisation:

  1. Concatenate (in order): type + autocomplete + id + name + placeholder + the field's computed <label> text.
  2. Split camelCase boundaries to whitespace (firstName → first name).
  3. Lowercase the whole thing.
  4. If the field declares an autocomplete attribute, prepend an a-c-<value> token (e.g. a-c-postal-code).
  5. Optionally include adjacent-field context: bb-prefixed tokens for the previous field on the same form and aa-prefixed tokens for the next. Including adjacent context improves accuracy by roughly 8 percentage points relative to the same model trained on isolated fields.
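Steps 1 to 4 can be sketched in Python as follows. This is a literal reading of the list, not the team's actual preprocessing code: build_field_string and its parameter names are hypothetical, the label text is assumed to be already extracted from the page, and treating underscores as word separators is an assumption suggested by the "dwfrm billing address fields postal code" input above rather than a documented step.

```python
import re

# A lowercase letter or digit immediately followed by an uppercase letter,
# e.g. the "tN" in "firstName".
_CAMEL_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")


def build_field_string(
    type_: str = "",
    autocomplete: str = "",
    id_: str = "",
    name: str = "",
    placeholder: str = "",
    label: str = "",
) -> str:
    """Hypothetical helper: normalise one field's attributes (steps 1-4)."""
    # Step 1: concatenate in the documented order, skipping empty attributes.
    parts = [type_, autocomplete, id_, name, placeholder, label]
    text = " ".join(p for p in parts if p)
    # Step 2: split camelCase boundaries ("firstName" -> "first Name").
    text = _CAMEL_BOUNDARY.sub(r"\1 \2", text)
    # Assumption, not a documented step: underscores also separate words.
    text = text.replace("_", " ")
    # Step 3: lowercase; also collapse repeated whitespace.
    text = " ".join(text.lower().split())
    # Step 4: prepend an a-c-<value> token if autocomplete is declared.
    value = autocomplete.strip().lower()
    if value:
        text = f"a-c-{value} {text}"
    return text
```

Under this reading, build_field_string(autocomplete="postal-code", name="postalCode") returns "a-c-postal-code postal-code postal code": the raw autocomplete value stays in the concatenation from step 1 and the a-c- token is added in front by step 4.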

Example input for a "first name" field followed by a "last name" field:

first name first name enter first name aaa-c-family-name aalast aaname
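Step 5 can be sketched the same way. The helper name add_context is hypothetical, and two details are assumptions: every whitespace-separated token of the neighbouring field's string receives the prefix, and previous-field (bb) tokens go before the current field while next-field (aa) tokens go after it. With those assumptions, and taking "a-c-family-name last name" as the next field's string, the sketch reproduces the example above.

```python
from typing import Optional


def add_context(
    current: str, previous: Optional[str] = None, next_: Optional[str] = None
) -> str:
    """Hypothetical helper: attach neighbouring-field tokens with bb/aa prefixes."""
    # Assumption: each token of the previous field gets "bb" and is placed first.
    before = [f"bb{tok}" for tok in (previous or "").split()]
    # Assumption: each token of the next field gets "aa" and is placed last.
    after = [f"aa{tok}" for tok in (next_ or "").split()]
    return " ".join(before + current.split() + after)
```

Calling add_context("first name first name enter first name", next_="a-c-family-name last name") yields the example string exactly.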

Training

| Setting | Value |
|---|---|
| Base model | huawei-noah/TinyBERT_General_4L_312D (4 layers, hidden size 312, intermediate size 1200, 12 attention heads, ~14M parameters, max sequence length 512) |
| Head | BertForSequenceClassification, 66 output classes |
| Training set | ~360 real shopping / checkout / address forms, 6,691 labelled fields |
| Validation / test | ~246 forms, 4,300 fields, split into validation and test sets |
| Regions covered | US, CA, GB, FR, DE, BR, ES, JP, AT, IN, IT, PL, AU, CH (supported); some additional regions are also represented for evaluation |
| Optimizer / schedule | Hugging Face Trainer defaults, 50 epochs |
| Hardware | Apple M1 MacBook Pro, ~75 minutes wall time |

Each form field is annotated with data-mozautofill-type="<type>" set to the expected autofill class; fields that should not be filled receive no attribute and are mapped to other.
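A minimal sketch of reading those annotations back out of a saved form, using only Python's standard html.parser module. AnnotationCollector and collect_labels are hypothetical names, and the sketch treats every input, select and textarea element as a field; the team's own tooling may select fields differently (for example, by skipping hidden inputs).

```python
from html.parser import HTMLParser

# Assumption: these element types are the candidate form fields.
FIELD_TAGS = {"input", "select", "textarea"}


class AnnotationCollector(HTMLParser):
    """Collect (attributes, label) pairs, mapping unannotated fields to "other"."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: list[tuple[dict[str, str], str]] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag not in FIELD_TAGS:
            return
        # Boolean attributes (e.g. "required") arrive with a value of None.
        attr_map = {key: (value or "") for key, value in attrs}
        label = attr_map.get("data-mozautofill-type") or "other"
        self.fields.append((attr_map, label))


def collect_labels(html: str) -> list[tuple[dict[str, str], str]]:
    """Parse an HTML string and return one (attributes, label) pair per field."""
    parser = AnnotationCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Self-closing tags such as <input ... /> are also handled, because HTMLParser's default handle_startendtag calls handle_starttag.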

Evaluation

Evaluated on the project's held-out test set (2,168 labelled fields drawn from real address / shopping forms) using ONNX Runtime on CPU.

  • Total: strict exact-match accuracy.
  • Close: counts predictions of closely related labels as correct (e.g. street-address predicted when the ground truth is address-line1, or tel predicted when the ground truth is tel-national).
  • Blank: false-fill rate, i.e. the fraction of other-labelled fields that the model predicted as a real autofill type. Lower is better. This metric matters most for user experience, because a high false-fill rate means filling search boxes, comment boxes, and gift-card fields with personal data.

| Variant | Total | Close | Blank | Throughput (CPU) |
|---|---|---|---|---|
| fp32 | 89.62% | 91.51% | 2.40% | ~218/s |
| fp16 | 89.71% | 91.61% | 2.31% | ~132/s |
| bnb4 | 88.42% | 90.64% | 2.77% | ~214/s |
| q4 | 88.01% | 90.54% | 2.58% | ~209/s |
| q4f16 | 88.01% | 90.54% | 2.58% | ~95/s |
| uint8 | 87.27% | 89.53% | 3.27% | ~163/s |
| int8 / quantized | 84.82% | 87.73% | 1.94% | ~257/s |

For reference, the existing Firefox regex-based heuristic detector reaches roughly 85% total accuracy on comparable test sets.
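The three metrics can be computed from parallel lists of ground-truth and predicted labels, as sketched below. evaluate and CLOSE_PAIRS are hypothetical names. Only the two related-label pairs named in the Close definition are included, because the full list used for the table is not given here, and they are applied in the stated direction (ground truth, prediction).

```python
# Placeholder: only the two examples from the Close definition, as
# (ground truth, prediction) pairs. The project's full list is not given.
CLOSE_PAIRS = {
    ("address-line1", "street-address"),
    ("tel-national", "tel"),
}


def evaluate(
    y_true: list[str],
    y_pred: list[str],
    close_pairs: set[tuple[str, str]] = CLOSE_PAIRS,
) -> dict[str, float]:
    """Compute Total, Close and Blank as defined above (as fractions, not %)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    # Total: strict exact match.
    exact = sum(t == p for t, p in pairs)
    # Close: exact match, or a listed related-label pair.
    close = sum(t == p or (t, p) in close_pairs for t, p in pairs)
    # Blank: among fields labelled "other", the share given a real type.
    other_preds = [p for t, p in pairs if t == "other"]
    false_fills = sum(p != "other" for p in other_preds)
    return {
        "total": exact / n,
        "close": close / n,
        "blank": false_fills / len(other_preds) if other_preds else 0.0,
    }
```

Note that Blank uses only the other-labelled fields as its denominator, so it is not directly comparable to the error rate implied by Total.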

Highlights:

  • fp16 matches fp32 to within 0.1 percentage points on every metric while halving the file size, and is the recommended high-fidelity variant. Its CPU throughput is roughly 40% lower than fp32 (~132/s vs ~218/s) because most CPUs lack native fp16 arithmetic, but the gap closes on hardware with fp16 support and on WebGPU.
  • int8 / quantized has the lowest exact accuracy but also the lowest false-fill rate of any variant (1.94%, below the fp32 baseline of 2.40%). It errs toward other when uncertain, which is the safer failure mode for an autofill UI. This is the recommended default when size is constrained.
  • The 4-bit variants (q4, q4f16, bnb4) cluster around 88% total accuracy. q4f16 is the smallest of the three at 22.4 MB, but all three are larger than the 8-bit files (14.6 MB).
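Because false fills matter most for the UI, a host could push any variant further toward other by treating low-confidence predictions as other. The sketch below is a hypothetical post-processing step, not something this model card or the Firefox integration describes; gated_label is an illustrative name and the 0.5 threshold is an arbitrary placeholder that would need tuning against the Blank metric. It consumes the list of {"label", "score"} dicts that the pipelines in "How to use" return.

```python
def gated_label(predictions: list[dict], threshold: float = 0.5) -> str:
    """Return the top-scoring label, or "other" if its score is below threshold."""
    if not predictions:
        return "other"
    top = max(predictions, key=lambda p: p["score"])
    # Placeholder policy: refuse to fill when the model is unsure.
    return top["label"] if top["score"] >= threshold else "other"
```

Raising the threshold should trade Total accuracy for a lower Blank rate; where that trade-off is worth making would have to be measured on the validation set.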

Limitations

  • Trained primarily on the supported regions listed above. For regions with no training data, accuracy drops by roughly 5–10 percentage points; adding region-specific samples to the training set typically recovers most of that gap.
  • Underrepresented field types (address-line3, additional-name, phonetic-*, tel-local-prefix, etc.) have very few training examples and are sometimes confidently misclassified.
  • Reduced-precision variants disagree with fp32 on roughly 0.1% (fp16) to ~5% (int8) of inputs. The effect of that disagreement on each metric is summarized in the evaluation table above.
  • The model assumes the team's preprocessing format (camelCase-split, lowercased, with optional a-c-/bb/aa markers). Feeding raw HTML attribute strings without this normalisation will degrade accuracy.
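To check how often a variant disagrees with fp32 on your own inputs, run both over the same field strings and compare the predicted labels. A minimal sketch: disagreement_rate and disagreement_pairs are hypothetical helpers, and the two label lists would come from two pipelines such as the Python example in "How to use", each loaded with a different ONNX file.

```python
from collections import Counter


def disagreement_rate(reference: list[str], candidate: list[str]) -> float:
    """Fraction of inputs on which two variants predict different labels."""
    if len(reference) != len(candidate) or not reference:
        raise ValueError("need two non-empty prediction lists of equal length")
    differing = sum(r != c for r, c in zip(reference, candidate))
    return differing / len(reference)


def disagreement_pairs(reference: list[str], candidate: list[str]) -> Counter:
    """Count (reference label, candidate label) pairs where the variants differ."""
    return Counter((r, c) for r, c in zip(reference, candidate) if r != c)
```

The pair counts show the direction of each flip; for instance, a variant whose disagreements are mostly (type, "other") is erring toward not filling.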

Citation

This model is built on TinyBERT:

@inproceedings{jiao-etal-2020-tinybert,
  title     = {{TinyBERT}: Distilling {BERT} for Natural Language Understanding},
  author    = {Jiao, Xiaoqi and Yin, Yichun and Shang, Lifeng and Jiang, Xin
               and Chen, Xiao and Li, Linlin and Wang, Fang and Liu, Qun},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2020},
  year      = {2020},
  pages     = {4163--4174},
  url       = {https://aclanthology.org/2020.findings-emnlp.372}
}

If you use this checkpoint, please also cite the Mozilla autofill ML investigation that produced it (citation forthcoming).

License

Apache 2.0.