Salesforce/MTA-Vision-DeepSearch
Learning from Language Feedback via Variational Policy Distillation
The Illusion of Certainty: Decoupling Capability and Calibration in On-Policy Distillation