StentorLabs committed on
Commit
639e20d
·
verified ·
1 Parent(s): 49c9bd6

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -272,8 +272,7 @@ These are candid, first-hand observations about how this model actually behaves.
272
 
273
  6. **The model talks about education and academics — a lot.** Trained on FineWeb-HQ (a high-quality filtered web corpus with significant PDF and educational content) and StenCore (100% PDFs), the model has a strong prior toward academic language, school systems, curriculum, research, and formal writing. Prompts unrelated to education will frequently be redirected toward educational framing anyway.
274
 
275
- 7. **This model will usually stop on its own.** Unlike other Stentor models that tend to run until they hit `max_new_tokens`, Stentor2-12M will typically emit an EOS token and halt by itself — usually somewhere after ~400 tokens, though the exact stopping point varies. You don't need a tight token cap to prevent runaway generation. That said, it's still recommended to set a generous ceiling (e.g. `max_new_tokens=1000`) rather than leaving it uncapped, just as a safety net in case the model doesn't stop on a given run.
276
-
277
  ---
278
 
279
  ## PDF Tokens & The Replacement Character
 
272
 
273
  6. **The model talks about education and academics — a lot.** Trained on FineWeb-HQ (a high-quality filtered web corpus with significant PDF and educational content) and StenCore (100% PDFs), the model has a strong prior toward academic language, school systems, curriculum, research, and formal writing. Prompts unrelated to education will frequently be redirected toward educational framing anyway.
274
 
275
+ 7. **This model will usually stop on its own.** Unlike other Stentor models, which tend to run until they hit `max_new_tokens`, Stentor2-12M will typically emit an EOS token and halt by itself. The stopping point varies widely between runs: it might come at the 20-token mark or at the 500-token mark. You don't need a tight token cap to prevent runaway generation. That said, setting a generous ceiling (e.g. `max_new_tokens=1000`) rather than leaving generation uncapped is still recommended as a safety net in case the model doesn't stop on a given run.
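
As a rough illustration of the advice in item 7, the sketch below sets a generous `max_new_tokens` ceiling and reports whether a run ended on EOS or hit the cap. It assumes the model loads through the standard Hugging Face `transformers` causal-LM classes and that the tokenizer defines an EOS token; the helper names (`generation_kwargs`, `stopped_on_eos`, `generate`) are hypothetical and not part of the README, and the repo id is left for the caller to supply.

```python
from typing import Dict, Sequence, Tuple


def generation_kwargs(eos_token_id: int, max_new_tokens: int = 1000) -> Dict[str, int]:
    """Build keyword arguments for `model.generate` with a generous safety cap."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        "max_new_tokens": max_new_tokens,  # safety net, not a tight limit
        "eos_token_id": eos_token_id,      # lets generation halt on EOS
        "pad_token_id": eos_token_id,      # assumption: reuse EOS as the pad token
    }


def stopped_on_eos(new_token_ids: Sequence[int], eos_token_id: int) -> bool:
    """True if the last newly generated token is EOS (the model stopped itself)."""
    return len(new_token_ids) > 0 and new_token_ids[-1] == eos_token_id


def generate(model_id: str, prompt: str, max_new_tokens: int = 1000) -> Tuple[str, bool]:
    """Generate text and report whether the run ended on EOS or at the cap."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    kwargs = generation_kwargs(tokenizer.eos_token_id, max_new_tokens)
    output = model.generate(**inputs, **kwargs)
    # For decoder-only models the output holds prompt + continuation; drop the prompt.
    new_ids = output[0, inputs["input_ids"].shape[1]:].tolist()
    text = tokenizer.decode(new_ids, skip_special_tokens=True)
    return text, stopped_on_eos(new_ids, tokenizer.eos_token_id)
```

Calling `generate("<model-repo-id>", "Once upon a time")` with the actual repo id returns the continuation and a flag; a `False` flag means the run reached the ceiling without emitting EOS, which is the case the cap is meant to catch.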
 
276
  ---
277
 
278
  ## PDF Tokens & The Replacement Character