new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Jul 3

Similarity-Distance-Magnitude Universal Verification

We address the neural network robustness problem by adding Similarity (i.e., correctly predicted depth-matches into training)-awareness and Distance-to-training-distribution-awareness to the existing output Magnitude (i.e., decision-boundary)-awareness of the softmax function. The resulting SDM activation function provides strong signals of the relative epistemic (reducible) predictive uncertainty. We use this novel behavior to further address the complementary HCI problem of mapping the output to human-interpretable summary statistics over relevant partitions of a held-out calibration set. Estimates of prediction-conditional uncertainty are obtained via a parsimonious learned transform over the class-conditional empirical CDFs of the output of a final-layer SDM activation function. For decision-making and as an intrinsic model check, estimates of class-conditional accuracy are obtained by further partitioning the high-probability regions of this calibrated output into class-conditional, region-specific CDFs. The uncertainty estimates from SDM calibration are remarkably robust to test-time distribution shifts and out-of-distribution inputs; incorporate awareness of the effective sample size; provide estimates of uncertainty from the learning and data splitting processes; and are well-suited for selective classification and conditional branching for additional test-time compute based on the predictive uncertainty, as for selective LLM generation, routing, and composition over multiple models and retrieval. Finally, we construct SDM networks, LLMs with uncertainty-aware verification and interpretability-by-exemplar as intrinsic properties. We provide open-source software implementing these results.

  • 1 authors
·
Feb 27, 2025

NGRPO: Negative-enhanced Group Relative Policy Optimization

RLVR has enhanced the reasoning capabilities of Large Language Models (LLMs) across various tasks. However, GRPO, a representative RLVR algorithm, suffers from a critical limitation: when all responses within a group are either entirely correct or entirely incorrect, the model fails to learn from these homogeneous responses. This is particularly problematic for homogeneously incorrect groups, where GRPO's advantage function yields a value of zero, leading to null gradients and the loss of valuable learning signals. To overcome this issue, we propose NGRPO (Negative-enhanced Group Relative Policy Optimization), an algorithm designed to convert homogeneous errors into robust learning signals. First, NGRPO introduces Advantage Calibration. This mechanism hypothesizes the existence of a virtual maximum-reward sample during advantage calculation, thereby altering the mean and variance of rewards within a group and ensuring that the advantages for homogeneously incorrect samples are no longer zero. Second, NGRPO employs Asymmetric Clipping, which relaxes the update magnitude for positive samples while imposing stricter constraints on that of negative samples. This serves to stabilize the exploration pressure introduced by the advantage calibration. Our experiments on Qwen2.5-Math-7B demonstrate that NGRPO significantly outperforms baselines such as PPO, GRPO, DAPO, and PSR-NSR on mathematical benchmarks including MATH500, AMC23, and AIME2025. These results validate NGRPO's ability to learn from homogeneous errors, leading to stable and substantial improvements in mathematical reasoning. Our code is available at https://github.com/nangongrui-ngr/NGRPO.

  • 11 authors
·
Sep 23, 2025

The Young Supernova Experiment Data Release 1 (YSE DR1): Light Curves and Photometric Classification of 1975 Supernovae

We present the Young Supernova Experiment Data Release 1 (YSE DR1), comprised of processed multi-color Pan-STARRS1 (PS1) griz and Zwicky Transient Facility (ZTF) gr photometry of 1975 transients with host-galaxy associations, redshifts, spectroscopic/photometric classifications, and additional data products from 2019 November 24 to 2021 December 20. YSE DR1 spans discoveries and observations from young and fast-rising supernovae (SNe) to transients that persist for over a year, with a redshift distribution reaching z~0.5. We present relative SN rates from YSE's magnitude- and volume-limited surveys, which are consistent with previously published values within estimated uncertainties for untargeted surveys. We combine YSE and ZTF data, and create multi-survey SN simulations to train the ParSNIP and SuperRAENN photometric classification algorithms; when validating our ParSNIP classifier on 472 spectroscopically classified YSE DR1 SNe, we achieve 82% accuracy across three SN classes (SNe Ia, II, Ib/Ic) and 90% accuracy across two SN classes (SNe Ia, core-collapse SNe). Our classifier performs particularly well on SNe Ia, with high (>90%) individual completeness and purity, which will help build an anchor photometric SNe Ia sample for cosmology. We then use our photometric classifier to characterize our photometric sample of 1483 SNe, labeling 1048 (~71%) SNe Ia, 339 (~23%) SNe II, and 96 (~6%) SNe Ib/Ic. YSE DR1 provides a training ground for building discovery, anomaly detection, and classification algorithms, performing cosmological analyses, understanding the nature of red and rare transients, exploring tidal disruption events and nuclear variability, and preparing for the forthcoming Vera C. Rubin Observatory Legacy Survey of Space and Time.

  • 83 authors
·
Feb 20, 2023

Complex Query Answering with Neural Link Predictors

Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions (land), disjunctions (lor) and existential quantifiers (exists), while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. We then analyse two solutions to the optimisation problem, including gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods -- black-box neural models trained on millions of generated queries -- without the need of training on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms. All our source code and datasets are available online, at https://github.com/uclnlp/cqd.

  • 4 authors
·
Nov 6, 2020

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Reinforcement Learning with Verifiable Rewards (RLVR) offers a robust mechanism for enhancing mathematical reasoning in large models. However, we identify a systematic lack of emphasis on more challenging questions in existing methods from both algorithmic and data perspectives, despite their importance for refining underdeveloped capabilities. Algorithmically, widely used Group Relative Policy Optimization (GRPO) suffers from an implicit imbalance where the magnitude of policy updates is lower for harder questions. Data-wise, augmentation approaches primarily rephrase questions to enhance diversity without systematically increasing intrinsic difficulty. To address these issues, we propose a two-dual MathForge framework to improve mathematical reasoning by targeting harder questions from both perspectives, which comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. Specifically, DGPO first rectifies the implicit imbalance in GRPO via difficulty-balanced group advantage estimation, and further prioritizes harder questions by difficulty-aware question-level weighting. Meanwhile, MQR reformulates questions across multiple aspects to increase difficulty while maintaining the original gold answer. Overall, MathForge forms a synergistic loop: MQR expands the data frontier, and DGPO effectively learns from the augmented data. Extensive experiments show that MathForge significantly outperforms existing methods on various mathematical reasoning tasks. The code and augmented data are all available at https://github.com/AMAP-ML/MathForge.

GD-ML AMAP-ML
·
Jan 28 20

Measuring the Symmetry--Data Exchange Rate

Equivariance theory predicts that an architectural symmetry prior reduces sample complexity by a factor of |G|; this is widely cited but rarely measured as a scaling law with controls that separate the prior from its confounds. On a controlled C_n-symmetric task, we report three findings. First, a wrong-group control with identical orbit size and matched compute is worse than no constraint (joint pairwise CI [+0.79, +3.26] excludes zero, robust across estimators); misaligned constraint is actively harmful, not merely unhelpful. Second, an augmentation baseline equipped with test-time orbit averaging matches the equivariant model exactly -- bit-identical per-epoch validation curves across matched cells -- so the architecture-vs-augmentation gap is conditional on asymmetric test-time computation, not unconditional. Third, the relative exchange rate beta_diff = 1.28 is consistent in sign and order of magnitude with the theoretical 1.0 (single-level CI [+0.92, +2.05]); the more conservative two-level bootstrap (seeds x group sizes) widens this to [-0.63, +1.72], including zero, and a finer-N replication on a sqrt(2)-spaced grid is inconclusive (point estimate -0.82). The methodological contributions -- the relative-rate estimator that cancels the shared-difficulty confound, the wrong-group control, and a pre-specified failure taxonomy -- transfer to any inductive bias whose strength can be parameterised. Honest scoping: the primary estimator beta_diff was adopted post-hoc after the initial analysis revealed a positive-slope identifiability problem; the design was never externally pre-registered; and the headline number rests on an OLS slope over seven group sizes on a coarse N grid. This is an exploratory study, not a confirmatory measurement; the wrong-group result is the cleanest finding and the one we report with the most confidence. A registered replication on fresh seeds is future work.

  • 1 authors
·
May 30 2

RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics

Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for mRNA sequence generation that operates directly on a learned manifold of real data. By performing iterative local optimization constrained to this manifold, RNAGenScape preserves biological viability, accesses reliable guidance, and avoids excursions into nonfunctional regions of the ambient sequence space. The framework integrates three components: (1) an autoencoder jointly trained with a property predictor to learn a property-organized latent manifold, (2) a denoising autoencoder that projects updates back onto the manifold, and (3) a property-guided Langevin dynamics procedure that performs optimization along the manifold. Across three real-world mRNA datasets spanning two orders of magnitude in size, RNAGenScape increases median property gain by up to 148% and success rate by up to 30% while ensuring biological viability of generated sequences, and achieves competitive inference efficiency relative to existing generative approaches.

  • 11 authors
·
Oct 14, 2025

VSSD: Vision Mamba with Non-Casual State Space Duality

Vision transformers have significantly advanced the field of computer vision, offering robust modeling capabilities and global receptive field. However, their high computational demands limit their applicability in processing long sequences. To tackle this issue, State Space Models (SSMs) have gained prominence in vision tasks as they offer linear computational complexity. Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their applications in non-causal vision tasks. To address this limitation, we introduce Visual State Space Duality (VSSD) model, which has a non-causal format of SSD. Specifically, we propose to discard the magnitude of interactions between the hidden state and tokens while preserving their relative weights, which relieves the dependencies of token contribution on previous tokens. Together with the involvement of multi-scan strategies, we show that the scanning results can be integrated to achieve non-causality, which not only improves the performance of SSD in vision tasks but also enhances its efficiency. We conduct extensive experiments on various benchmarks including image classification, detection, and segmentation, where VSSD surpasses existing state-of-the-art SSM-based models. Code and weights are available at https://github.com/YuHengsss/VSSD.

  • 4 authors
·
Jul 26, 2024 2

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Although LLM inference has emerged as a critical workload for many downstream applications, efficiently inferring LLMs is challenging due to the substantial memory footprint and bandwidth requirements. In parallel, compute capabilities have steadily outpaced both memory capacity and bandwidth over the last few decades, a trend that remains evident in modern GPU hardware and exacerbates the challenge of LLM inference. As such, new algorithms are emerging that trade increased computation for reduced memory operations. To that end, we present XQuant, which takes advantage of this trend, enabling an order-of-magnitude reduction in memory consumption through low-bit quantization with substantial accuracy benefits relative to state-of-the-art KV cache quantization methods. We accomplish this by quantizing and caching the layer input activations X, instead of using standard KV caching, and then rematerializing the Keys and Values on-the-fly during inference. This results in an immediate 2times memory savings compared to KV caching. By applying XQuant, we achieve up to sim 7.7times memory savings with <0.1 perplexity degradation compared to the FP16 baseline. Furthermore, our approach leverages the fact that X values are similar across layers. Building on this observation, we introduce XQuant-CL, which exploits the cross-layer similarity in the X embeddings for extreme compression. Across different models, XQuant-CL attains up to 10times memory savings relative to the FP16 baseline with only 0.01 perplexity degradation, and 12.5times memory savings with only 0.1 perplexity degradation. XQuant exploits the rapidly increasing compute capabilities of hardware platforms to eliminate the memory bottleneck, while surpassing state-of-the-art KV cache quantization methods and achieving near-FP16 accuracy across a wide range of models.

  • 10 authors
·
Aug 14, 2025 2

How Alignment Shrinks the Generative Horizon

Despite their impressive capabilities, aligned large language models (LLMs) often generate outputs that lack diversity. What drives this stability in the generation? We investigate this phenomenon through the lens of probability concentration in the model's output distribution. To quantify this concentration, we introduce the Branching Factor (BF) -- a token-invariant measure of the effective number of plausible next steps during generation. Our empirical analysis reveals two key findings: (1) BF often decreases as generation progresses, suggesting that LLMs become more predictable as they generate. (2) alignment tuning substantially sharpens the model's output distribution from the outset, reducing BF by nearly an order of magnitude (e.g., from 12 to 1.2) relative to base models. This stark reduction helps explain why aligned models often appear less sensitive to decoding strategies. Building on this insight, we find this stability has surprising implications for complex reasoning. Aligned Chain-of-Thought (CoT) models (e.g., DeepSeek-distilled models), for instance, leverage this effect; by generating longer reasoning chains, they push generation into later, more deterministic (lower BF) stages, resulting in more stable outputs. We hypothesize that alignment tuning does not fundamentally change a model's behavior, but instead steers it toward stylistic tokens (e.g., "Sure") that unlock low-entropy trajectories already present in the base model. This view is supported by nudging experiments, which show that prompting base models with such tokens can similarly reduce BF. Together, our findings establish BF as a powerful diagnostic for understanding and controlling LLM outputs - clarifying how alignment reduces variability, how CoT promotes stable generations, and how base models can be steered away from diversity.

  • 2 authors
·
Jun 21, 2025 1

FED-Bench: A Cross-Granular Benchmark for Disentangled Evaluation of Facial Expression Editing

Facial expression image editing requires fine-grained control to strictly preserve human identity and background while precisely manipulating expression. However, existing editing benchmarks primarily focus on general scenarios, lacking high-quality facial images and corresponding editing instructions. Furthermore, current evaluation metrics exhibit systemic biases in this task, often favoring lazy editing or overfit editing. To bridge these gaps, we propose FED-Bench, a comprehensive benchmark featuring rigorous testing and an accurate evaluation suite. First, we carefully construct a benchmark of 747 triplets through a cascaded and scalable pipeline, each comprising an original image, an editing instruction, and a ground-truth image for precise evaluation. Second, we introduce FED-Score, a cross-granularity evaluation protocol that disentangles assessment into three dimensions: Alignment for verifying instruction following, Fidelity for testing image quality and identity preservation, and Relative Expression Gain for quantifying the magnitude of expression changes, effectively mitigating the aforementioned evaluation biases. Third, we benchmark 18 image editing models, revealing that current approaches struggle to simultaneously achieve high fidelity and accurate expression manipulation, with fine-grained instruction following identified as the primary bottleneck. Finally, leveraging the scalable characteristic of introduced benchmark engine, we provide a 20k+ in-the-wild facial training set and demonstrate its effectiveness by fine-tuning a baseline model that achieves significant performance gains. Our benchmark and related code will be made publicly open soon.

  • 11 authors
·
Mar 30

Memory as Resonance: A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds

The memory of contemporary Large Language Models is bound by a physical paradox: as they learn, they fill up. The linear accumulation (O(N)) of Key-Value states treats context as a warehouse of static artifacts, eventually forcing a destructive choice between amnesia and latency. We challenge this discrete orthodoxy, proposing that long-term memory is not the storage of items, but the persistence of a trajectory. We introduce Phonetic Trajectory Memory (PTM), a neuro-symbolic architecture that encodes language not as a sequence of tensors, but as a continuous path on an ergodic manifold governed by irrational rotation matrices. By decoupling the navigation (an invariant O(1) geometric signal) from the reconstruction (a probabilistic generative act), PTM achieves a compression magnitude of greater than 3,000x relative to dense caches. We demonstrate that retrieval becomes a process of resonance: the phonetic trace stabilizes the model against hallucination via "Signal Consensus" mechanism, securing up to approximately 92% factual accuracy. While this aggressive abstraction alters generative texture, it unlocks immediate access latency (approximately 34ms) independent of depth. Our results suggest that infinite context does not require infinite silicon; it requires treating memory not as data to be stored, but as a reconstructive process acting on a conserved, undying physical signal.

  • 3 authors
·
Dec 23, 2025 2

Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations - a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences, requires the propagation of all query-candidate sentence-pairs throughout a stack of cross-attention layers. This exhaustive process becomes computationally prohibitive when the number of candidate sentences is large. In contrast, sentence embedding techniques learn a sentence-to-vector mapping and compute the similarity between the sentence vectors via simple elementary operations. In this paper, we introduce Distilled Sentence Embedding (DSE) - a model that is based on knowledge distillation from cross-attentive models, focusing on sentence-pair tasks. The outline of DSE is as follows: Given a cross-attentive teacher model (e.g. a fine-tuned BERT), we train a sentence embedding based student model to reconstruct the sentence-pair scores obtained by the teacher model. We empirically demonstrate the effectiveness of DSE on five GLUE sentence-pair tasks. DSE significantly outperforms several ELMO variants and other sentence embedding methods, while accelerating computation of the query-candidate sentence-pairs similarities by several orders of magnitude, with an average relative degradation of 4.6% compared to BERT. Furthermore, we show that DSE produces sentence embeddings that reach state-of-the-art performance on universal sentence representation benchmarks. Our code is made publicly available at https://github.com/microsoft/Distilled-Sentence-Embedding.

  • 6 authors
·
Aug 14, 2019

A$^2$TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn interactions. Existing approaches to such process credit assignment either depend on separate external process reward models that introduce additional consumption, or tree-based structural rollout that merely redistributes the outcome signal while constraining trajectory diversity. A promising alternative leverages the per-turn change in the policy's predicted probability of the ground-truth, termed Information Gain (IG), as an intrinsic process signal without an external evaluator. However, prior work on leveraging IG signals within the RL training loop faces three systematic challenges: normalizing across turns that face heterogeneous positional contexts can distort the relative standing of individual turns, accumulating a variable number of terms causes advantage magnitudes to drift with trajectory depth, and a fixed clipping range governs policy updates identically for turns with vastly different IG signals. In this paper, we propose A^2TGPO (Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping), which retains IG as the intrinsic signal but re-designs how it is normalized, accumulated, and consumed: (i) turn-group normalization: normalizes IG within each (prompt, turn-index) group so that each turn is compared only against peers at the same interaction depth; (ii) variance-rescaled discounted accumulation: divides cumulative normalized IG by square root of accumulated terms to keep advantage magnitudes comparable across turn positions; and (iii) adaptive turn-level clipping: modulates each turn's clipping range based on its normalized IG, widening the update region for informative turns and narrowing it for uninformative ones.

tencent Tencent
·
May 6 4

Efficient, Property-Aligned Fan-Out Retrieval via RL-Compiled Diffusion

Many modern retrieval problems are set-valued: given a broad intent, the system must return a collection of results that optimizes higher-order properties (e.g., diversity, coverage, complementarity, coherence) while remaining grounded with respect to a fixed database. Set-valued objectives are typically non-decomposable and are not captured by existing supervised (query, content) datasets which only prioritize top-1 retrieval. Consequently, fan-out retrieval is often employed to generate diverse subqueries to retrieve item sets. While reinforcement learning (RL) can optimize set-level objectives via interaction, deploying an RL-tuned LLM for fan-out retrieval is prohibitively expensive at inference time. Conversely, diffusion-based generative retrieval enables efficient single-pass fan-out in embedding space, but requires objective-aligned training targets. To address these issues, we propose R4T (Retrieve-for-Train), which uses RL once as an objective transducer in a three-step process: (i) train a fan-out LLM with composite set-level rewards, (ii) synthesize objective-consistent training pairs, and (iii) train a lightweight diffusion retriever to model the conditional distribution of set-valued outputs. Across large-scale fashion and music benchmarks consisting of curated item sets, we show that R4T improves retrieval quality relative to strong baselines while reducing query-time fan-out latency by an order of magnitude.

  • 11 authors
·
Mar 5

"I May Not Have Articulated Myself Clearly": Diagnosing Dynamic Instability in LLM Reasoning at Inference Time

Reasoning failures in large language models (LLMs) are typically measured only at the end of a generation, yet many failures manifest as a process-level breakdown: the model "loses the thread" mid-reasoning. We study whether such breakdowns are detectable from inference-time observables available in standard APIs (token log probabilities), without any training or fine-tuning. We define a simple instability signal that combines consecutive-step distributional shift (JSD) and uncertainty (entropy), summarize each trace by its peak instability strength, and show that this signal reliably predicts failure. Across GSM8K and HotpotQA, instability strength predicts wrong answers with above-chance AUC and yields monotonic bucket-level accuracy decline at scale across model sizes. Crucially, we show that instability is not uniformly harmful: early instability can reflect subsequent stabilization and a correct final answer (corrective instability), whereas late instability is more often followed by failure (destructive instability), even at comparable peak magnitudes, indicating that recoverability depends not only on how strongly the distribution changes but also on when such changes occur relative to the remaining decoding horizon. The method is model-agnostic, training-free, and reproducible, and is presented as a diagnostic lens rather than a corrective or control mechanism.

  • 4 authors
·
Feb 2 3

Improving Neural Network Training by Decoupling the Magnitude and Direction of Weight Vectors

Modern neural network training relies on optimizers such as Adam and Muon which act on each weight matrix as a single object. Yet every weight matrix carries two distinct quantities -- a magnitude and a direction -- and all optimizers stepping in the matrix as a whole couple their dynamics: the directional change from an update depends on the current magnitude, while the magnitude drifts as a byproduct of learning the direction, so neither is governed directly by the learning rate. Typical training therefore leans on surrounding recipes such as weight decay and warmup to keep learning stable at scale, though these regulate the coupling only indirectly; other recent methods instead constrain the weight to a fixed-norm sphere, but add no learnable magnitude, leaving scale control to normalization layers alone. We propose Magnitude--Direction (MD) Decoupling, an optimizer modification that factorizes each weight into a fixed-norm direction on a hypersphere and learnable per-row and per-column magnitude gains, updated at separate learning rates, all while the model still sees a single fused weight tensor. The method is agnostic to the base optimizer and removes the need for weight decay and warmup. Across both Adam and Muon, MD Decoupling improves on well-tuned baselines, transfers the optimal LR across model width without retuning, and continues to help at scale on large Mixture-of-Experts (MoE) models. Treating magnitude and direction as separately controlled quantities thus yields more predictable training dynamics and a simple, broadly applicable improvement to modern optimizers.

  • 4 authors
·
Jun 23

VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery

Monocular depth estimation can be broadly categorized into two directions: relative depth estimation, which predicts normalized or inverse depth without absolute scale, and metric depth estimation, which aims to recover depth with real-world scale. While relative methods are flexible and data-efficient, their lack of metric scale limits their utility in downstream tasks. A promising solution is to infer absolute scale from textual descriptions. However, such language-based recovery is highly sensitive to natural language ambiguity, as the same image may be described differently across perspectives and styles. To address this, we introduce VGLD (Visually-Guided Linguistic Disambiguation), a framework that incorporates high-level visual semantics to resolve ambiguity in textual inputs. By jointly encoding both image and text, VGLD predicts a set of global linear transformation parameters that align relative depth maps with metric scale. This visually grounded disambiguation improves the stability and accuracy of scale estimation. We evaluate VGLD on representative models, including MiDaS and DepthAnything, using standard indoor (NYUv2) and outdoor (KITTI) benchmarks. Results show that VGLD significantly mitigates scale estimation bias caused by inconsistent or ambiguous language, achieving robust and accurate metric predictions. Moreover, when trained on multiple datasets, VGLD functions as a universal and lightweight alignment module, maintaining strong performance even in zero-shot settings. Code will be released upon acceptance.

  • 2 authors
·
May 5, 2025

CfA3: 185 Type Ia Supernova Light Curves from the CfA

We present multi-band photometry of 185 type-Ia supernovae (SN Ia), with over 11500 observations. These were acquired between 2001 and 2008 at the F. L. Whipple Observatory of the Harvard-Smithsonian Center for Astrophysics (CfA). This sample contains the largest number of homogeneously-observed and reduced nearby SN Ia (z < 0.08) published to date. It more than doubles the nearby sample, bringing SN Ia cosmology to the point where systematic uncertainties dominate. Our natural system photometry has a precision of 0.02 mag or better in BVRIr'i' and roughly 0.04 mag in U for points brighter than 17.5 mag. We also estimate a systematic uncertainty of 0.03 mag in our SN Ia standard system BVRIr'i' photometry and 0.07 mag for U. Comparisons of our standard system photometry with published SN Ia light curves and comparison stars, where available for the same SN, reveal agreement at the level of a few hundredths mag in most cases. We find that 1991bg-like SN Ia are sufficiently distinct from other SN Ia in their color and light-curve-shape/luminosity relation that they should be treated separately in light-curve/distance fitter training samples. The CfA3 sample will contribute to the development of better light-curve/distance fitters, particularly in the few dozen cases where near-infrared photometry has been obtained and, together, can help disentangle host-galaxy reddening from intrinsic supernova color, reducing the systematic uncertainty in SN Ia distances due to dust.

  • 8 authors
·
Jan 29, 2009

Cosmic Evolution Early Release Science (CEERS) survey: The colour evolution of galaxies in the distant Universe

The wavelength-coverage and sensitivity of JWST now enables us to probe the rest-frame UV - optical spectral energy distributions (SEDs) of galaxies at high-redshift (z>4). From these SEDs it is, in principle, through SED fitting possible to infer key physical properties, including stellar masses, star formation rates, and dust attenuation. These in turn can be compared with the predictions of galaxy formation simulations allowing us to validate and refine the incorporated physics. However, the inference of physical properties, particularly from photometry alone, can lead to large uncertainties and potential biases. Instead, it is now possible, and common, for simulations to be forward-modelled to yield synthetic observations that can be compared directly to real observations. In this work, we measure the JWST broadband fluxes and colours of a robust sample of 5<z<10 galaxies using the Cosmic Evolution Early Release Science (CEERS) Survey. We then analyse predictions from a variety of models using the same methodology and compare the NIRCam/F277W magnitude distribution and NIRCam colours with observations. We find that the predicted and observed magnitude distributions are similar, at least at 5<z<8. At z>8 the distributions differ somewhat, though our observed sample size is small and thus susceptible to statistical fluctuations. Likewise, the predicted and observed colour evolution show broad agreement, at least at 5<z<8. There is however some disagreement between the observed and modelled strength of the strong line contribution. In particular all the models fails to reproduce the F410M-F444W colour at z>8, though, again, the sample size is small here.

  • 23 authors
·
Nov 14, 2023