arXiv cs.LG
↗ arxiv.org/list/cs.LG/recent · research · en · weight 1.1
- A Discordance-Aware Multimodal Framework with Multi-Agent Clinical Reasoning
arXiv:2604.16333v1 Announce Type: new Abstract: Knee osteoarthritis frequently exhibits discordance between structural damage observed in imaging and patient-reported symptoms such as pain. This mismatch complicates clinical interpretation and patient stratification and remains insufficiently modeled in existing decision support systems. We propose a discordance aware multimodal framework that co…
◆ 1.0#cs.lg#cs.ai
- Annotation Entropy Predicts Per-Example Learning Dynamics in LoRA Fine-Tuning
arXiv:2604.16332v1 Announce Type: new Abstract: We find that LoRA fine-tuning exhibits un-learning on contested examples: items with high annotator disagreement show increasing loss during training, a qualitatively distinct pattern largely absent under full fine-tuning and consistent across all six models tested (four encoder, two decoder-only). This discovery emerges from correlating annotation …
◆ 1.0#cs.lg#cs.cl
- Reasoning on the Manifold: Bidirectional Consistency for Self-Verification in Diffusion Language Models
arXiv:2604.16565v1 Announce Type: new Abstract: While Diffusion Large Language Models (dLLMs) offer structural advantages for global planning, efficiently verifying that they arrive at correct answers via valid reasoning traces remains a critical challenge. In this work, we propose a geometric perspective: Reasoning on the Manifold. We hypothesize that valid generation trajectories reside as stab…
◆ 1.3#cs.lg#cs.ai
- An Interpretable Framework Applying Protein Words to Predict Protein-Small Molecule Complementary Pairing Rules
arXiv:2604.16550v1 Announce Type: new Abstract: Despite the high accuracy of 'black box' deep learning models, drug discovery still relies on protein-ligand interaction principles and heuristics. To improve interpretability of protein-small molecule binding predictions, we developed the PWRules framework, which applies binding affinity data to identify privileged small molecule fragments and subs…
◆ 1.0#cs.lg#cs.ai
- Multi-Label Phase Diagram Prediction in Complex Alloys via Physics-Informed Graph Attention Networks
arXiv:2604.16468v1 Announce Type: new Abstract: Accurate phase equilibria are foundational to alloy design because they encode the underlying thermodynamics governing stability, transformations, and processing windows. However, while the CALculation of Phase Diagrams (CALPHAD) provides a rigorous thermodynamic framework, exploring multicomponent composition-temperature space remains computational…
◆ 1.0#cs.lg#cond-mat.mtrl-sci
- (Sparse) Attention to the Details: Preserving Spectral Fidelity in ML-based Weather Forecasting Models
arXiv:2604.16429v1 Announce Type: new Abstract: We introduce Mosaic, a probabilistic weather forecasting model that addresses two principal sources of spectral degradation in ML-based weather prediction: (1) deterministic training against ensemble means and (2) compressive encoding creating an information bottleneck. Mosaic generates ensemble members through learned functional perturbations and o…
◆ 1.0#cs.lg#cs.ai
- Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP
arXiv:2604.16410v1 Announce Type: new Abstract: CLIP adaptation can improve in-domain accuracy while degrading out-of-domain transfer, but comparisons between Full Fine-Tuning (Full FT) and LoRA are often confounded by different learning-rate conventions. We study how adaptation method and optimization scale jointly shape attention drift and transfer retention in CLIP using a controlled matched-l…
◆ 1.0#cs.lg
- In Search of Lost DNA Sequence Pretraining
arXiv:2604.16570v1 Announce Type: new Abstract: DNA sequence encoding is fundamental to gene function prediction, protein synthesis, and diverse downstream biological tasks. Despite the substantial progress achieved by large-scale DNA sequence pretraining, existing studies have overwhelmingly emphasized pretraining scale and custom downstream evaluation datasets, while neglecting some essential c…
◆ 1.0#cs.lg#cs.ai
- The Global Neural World Model: Spatially Grounded Discrete Topologies for Action-Conditioned Planning
arXiv:2604.16585v1 Announce Type: new Abstract: We present the Global Neural World Model (GNWM), a self-stabilizing framework that achieves topological quantization through balanced continuous entropy constraints. Operating as a continuous, action-conditioned Joint-Embedding Predictive Architecture (JEPA), the GNWM maps environments onto a discrete 2D grid, enforcing translational equivariance wi…
◆ 1.0#cs.lg#cs.ai
- POLAR: Online Learning for LoRA Adapter Caching and Routing in Edge LLM Serving
arXiv:2604.16583v1 Announce Type: new Abstract: Edge deployment of large language models (LLMs) increasingly relies on libraries of lightweight LoRA adapters, yet GPU/DRAM can keep only a small resident subset at a time. Serving a request through a non-resident adapter requires paging its weights from storage, incurring measurable latency. This creates a two-timescale online control problem: on a…
◆ 1.3#cs.lg#cs.ai
- SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
arXiv:2604.16362v1 Announce Type: new Abstract: Data scarcity and weak supervision continue to limit the performance of machine learning models in many real-world applications, such as mammography, where Multiple Instance Learning (MIL) often offers the best formulation. While recent foundation models provide strong semantic representations out of the box, effective augmentation of such represent…
◆ 1.0#cs.lg#cs.ai
- UniMamba: A Unified Spatial-Temporal Modeling Framework with State-Space and Attention Integration
arXiv:2604.16325v1 Announce Type: new Abstract: Multivariate time series forecasting is fundamental to numerous domains such as energy, finance, and environmental monitoring, where complex temporal dependencies and cross-variable interactions pose enduring challenges. Existing Transformer-based methods capture temporal correlations through attention mechanisms but suffer from quadratic computatio…
◆ 1.0#cs.lg#cs.ai
- FedLLM: A Privacy-Preserving Federated Large Language Model for Explainable Traffic Flow Prediction
arXiv:2604.16612v1 Announce Type: new Abstract: Traffic prediction plays a central role in intelligent transportation systems (ITS) by supporting real-time decision-making, congestion management, and long-term planning. However, many existing approaches face practical limitations. Most spatio-temporal models are trained on centralized data, rely on numerical representations, and offer limited exp…
◆ 1.3#cs.lg
- Randomized Antipodal Search Done Right for Data Pareto Improvement of LLM Unlearning
arXiv:2604.16591v1 Announce Type: new Abstract: Large language models (LLMs) sometimes memorize undesirable knowledge, which must be removed after deployment. Prior work on machine unlearning has focused largely on optimization methods that adjust parameters to enforce forgetting while preserving retention. However, these approaches assume that the forget and retain sets are readily available, wh…
◆ 1.3#cs.lg#cs.ai
- Global Attention with Linear Complexity for Exascale Generative Data Assimilation in Earth System Prediction
arXiv:2604.16590v1 Announce Type: new Abstract: Accurate weather and climate prediction relies on data assimilation (DA), which estimates the Earth system state by integrating observations with models. While exascale computing has significantly advanced earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck, limiting uncertainty quantification…
◆ 1.0#cs.lg#cs.ai
- Hybrid Spectro-Temporal Fusion Framework for Structural Health Monitoring
arXiv:2604.16589v1 Announce Type: new Abstract: Structural health monitoring plays a critical role in ensuring structural safety by analyzing vibration responses from engineering systems. This paper proposes a Spectro-Temporal Alignment framework and a Hybrid Spectro-Temporal Fusion framework that integrate arrival-time interval descriptors with spectral features to capture both fine-scale and co…
◆ 1.0#cs.lg#cs.ai
- A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era
arXiv:2604.16586v1 Announce Type: new Abstract: Molecular property prediction integrates quantum chemistry, cheminformatics, and deep learning to connect molecular structure with physicochemical and biological behavior. This survey traces four complementary paradigms, including Quantum, Descriptor Machine Learning, Geometric Deep Learning, and Foundation Models, and outlines a unified taxonomy li…
◆ 1.0#cs.lg#cs.ai
- Towards Trustworthy Depression Estimation via Disentangled Evidential Learning
arXiv:2604.16579v1 Announce Type: new Abstract: Automated depression estimation is highly vulnerable to signal corruption and ambient noise in real-world deployment. Prevailing deterministic methods produce uncalibrated point estimates, exposing safety-critical clinical systems to the severe risk of overconfident misdiagnoses. To establish a highly resilient and trustworthy assessment paradigm, w…
◆ 1.0#cs.lg#cs.ai
- NCO4CVRP: Neural Combinatorial Optimization for the Capacitated Vehicle Routing Problem
arXiv:2604.16581v1 Announce Type: new Abstract: Neural Combinatorial Optimization (NCO) has emerged as a powerful framework for solving combinatorial optimization problems by integrating deep learning-based models. This work focuses on improving existing inference techniques to enhance solution quality and generalization. Specifically, we modify the Random Re-Construct (RRC) approach of the Light…
◆ 1.0#cs.lg#cs.ai
- Continuous ageing trajectory representations for knee-aware lifetime prediction of lithium-ion batteries across heterogeneous dataset
arXiv:2604.16580v1 Announce Type: new Abstract: Accurate assessment of lithium-ion battery ageing is challenged by cell-to-cell variability, heterogeneous cycling protocols, and limited transferability of data-driven models across datasets. In particular, robust identification of degradation transitions, such as the knee point, and reliable early-life prediction of remaining useful life (RUL) rem…
◆ 1.0#cs.lg#cs.ai
- Evaluating Temporal and Structural Anomaly Detection Paradigms for DDoS Traffic
arXiv:2604.16575v1 Announce Type: new Abstract: Unsupervised anomaly detection is widely used to detect Distributed Denial-of-Service (DDoS) attacks in cloud-native 5G networks, yet most studies assume a fixed traffic representation, either temporal or structural, without validating which feature space best matches the data. We propose a lightweight decision framework that prioritizes temporal or…
◆ 1.0#cs.lg#cs.ai
- From User Recognition to Activity Counting: An Identity-Agnostic Approach to Multi-User WiFi Sensing
arXiv:2604.16572v1 Announce Type: new Abstract: Wi-Fi Channel State Information (CSI) enables device-free human activity recognition, but existing multi-user approaches assume a fixed set of known users during both training and inference. This closed-set assumption limits deployment, as models trained on a specific user set degrade when applied to new individuals or environments. We reformulate m…
◆ 1.0#cs.lg
- Positive-Only Drifting Policy Optimization
arXiv:2604.16519v1 Announce Type: new Abstract: In the field of online reinforcement learning (RL), traditional Gaussian policies and flow-based methods are often constrained by their unimodal expressiveness, complex gradient clipping, or stringent trust-region requirements. Moreover, they all rely on post-hoc penalization of negative samples to correct erroneous actions. This paper introduces Po…
◆ 1.0#cs.lg#cs.ro
- Cross-Modal Generation: From Commodity WiFi to High-Fidelity mmWave and RFID Sensing
arXiv:2604.16558v1 Announce Type: new Abstract: AIGC has shown remarkable success in CV and NLP, and has recently demonstrated promising potential in the wireless domain. However, significant data imbalance exists across RF modalities, with abundant WiFi data but scarce mmWave and RFID data due to high acquisition cost. This makes it difficult to train high-quality generative models for these dat…
◆ 1.0#cs.lg
- S-GRPO: Unified Post-Training for Large Vision-Language Models
arXiv:2604.16557v1 Announce Type: new Abstract: Current post-training methodologies for adapting Large Vision-Language Models (LVLMs) generally fall into two paradigms: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL). Despite their prevalence, both approaches suffer from inefficiencies when applied in isolation. SFT forces the model's generation along a single expert trajectory, ofte…
◆ 1.0#cs.lg#cs.cl
- LLM as a Tool, Not an Agent: Code-Mined Tree Transformations for Neural Architecture Search
arXiv:2604.16555v1 Announce Type: new Abstract: Neural Architecture Search (NAS) aims to automatically discover high-performing deep neural network (DNN) architectures. However, conventional algorithm-driven NAS relies on carefully hand-crafted search spaces to ensure executability, which restricts open-ended exploration. Recent coding-based agentic approaches using large language models (LLMs) r…
◆ 1.3#cs.lg#cs.ai
- Towards Reliable Testing of Machine Unlearning
arXiv:2604.16536v1 Announce Type: new Abstract: Machine learning components are now central to AI-infused software systems, from recommendations and code assistants to clinical decision support. As regulations and governance frameworks increasingly require deleting sensitive data from deployed models, machine unlearning is emerging as a practical alternative to full retraining. However, unlearnin…
◆ 1.0#cs.lg#cs.ai
- SCATR: Simple Calibrated Test-Time Ranking
arXiv:2604.16535v1 Announce Type: new Abstract: Test-time scaling (TTS) improves large language models (LLMs) by allocating additional compute at inference time. In practice, TTS is often achieved through parallel scaling: generating multiple candidate responses and selecting the best via a Best-of-N (BoN) strategy. Its effectiveness therefore hinges on the scoring function. Learned scorers such …
◆ 1.3#cs.lg#cs.ai
- BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"
arXiv:2604.16324v1 Announce Type: new Abstract: The activation memory required for exact backpropagation scales linearly with network depth, context length, and feature dimensionality, forming an $O(L \cdot BN)$ spatial bottleneck (where $B$ is the sequence-batch cardinality and $N$ is the feature dimension). This constraint historically throttles the scaling of deep neural networks. While randomized auto…
◆ 1.0#cs.lg
- Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo
arXiv:2604.16453v1 Announce Type: new Abstract: We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabi…
◆ 1.3#cs.lg#cs.ai
- Dimensional Criticality at Grokking Across MLPs and Transformers
arXiv:2604.16431v1 Announce Type: new Abstract: Abrupt transitions between distinct dynamical regimes are a hallmark of complex systems. Grokking in deep neural networks provides a striking example -- an abrupt transition from memorization to generalization long after training accuracy saturates -- yet robust macroscopic signatures of this transition remain elusive. Here we introduce TDU-…
◆ 1.3#cs.lg#cond-mat.dis-nn
- Non-Stationarity in the Embedding Space of Time Series Foundation Models
arXiv:2604.16428v1 Announce Type: new Abstract: Time series foundation models (TSFMs) are widely used as generic feature extractors, yet the notion of non-stationarity in their embedding spaces remains poorly understood. Recent work often conflates non-stationarity with distribution shift, blurring distinctions fundamental to classical time-series analysis and long-standing methodologies such as …
◆ 1.0#cs.lg#cs.ai
- Functional Similarity Metric for Neural Networks: Overcoming Parametric Ambiguity via Activation Region Analysis
arXiv:2604.16426v1 Announce Type: new Abstract: As modern deep learning architectures grow in complexity, representational ambiguity emerges as a critical barrier to their interpretability and reliable merging. For ReLU networks, identical functional mappings can be achieved through entirely different weight configurations due to algebraic symmetries: neuron permutation and positive diagonal scal…
◆ 1.0#cs.lg
- FedOBP: Federated Optimal Brain Personalization through Cloud-Edge Element-wise Decoupling
arXiv:2604.16574v1 Announce Type: new Abstract: Federated Learning (FL) faces challenges from client data heterogeneity and resource-constrained mobile devices, which can degrade model accuracy. Personalized Federated Learning (PFL) addresses this issue by adapting shared global knowledge to local data distributions. A promising approach in PFL is model decoupling, which separates the model into …
◆ 1.0#cs.lg#cs.ai
- Shifting the Gradient: Understanding How Defensive Training Methods Protect Language Model Integrity
arXiv:2604.16423v1 Announce Type: new Abstract: Defensive training methods such as positive preventative steering (PPS) and inoculation prompting (IP) offer surprising results through seemingly similar processes: both add trait-inducing objects to large language models (LLMs) during training, and both defend the LLM against acquiring the trait. The surprising success of these methods comes with t…
◆ 1.3#cs.lg#cs.ai
- G-PARC: Graph-Physics Aware Recurrent Convolutional Neural Networks for Spatiotemporal Dynamics on Unstructured Meshes
arXiv:2604.16533v1 Announce Type: new Abstract: Physics-aware recurrent convolutional networks (PARC) have demonstrated strong performance in predicting nonlinear spatiotemporal dynamics by embedding differential operators directly into the computational graph of a neural network. However, pixel-based convolutions are restricted to static, uniform Cartesian grids, making them ill-suited to follow…
◆ 1.0#cs.lg#cs.ai
- CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion
arXiv:2604.16411v1 Announce Type: new Abstract: We study asynchronous alignment, a first-class multimodal learning setting in which a dense primary stream must be fused with sporadic external context whose value depends on when it arrives. Unlike standard multimodal benchmarks that assume structural synchrony, this setting requires models to reason explicitly about freshness and trust. We focus o…
◆ 1.0#cs.lg
- SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics
arXiv:2604.16358v1 Announce Type: new Abstract: MLLMs are increasingly deployed in multi-turn settings, where attackers can escalate unsafe intent through the evolving visual-text history and exploit long-context safety decay. Yet safety alignment is still dominated by single-turn data and fixed-template dialogues, leaving a mismatch between training and deployment. To bridge this gap, we propose …
◆ 1.3#cs.lg#cs.cl
- Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
arXiv:2604.16335v1 Announce Type: new Abstract: Despite recent progress in Large Language Model (LLM) Agents for Software Engineering (SWE) tasks, end-to-end fine-tuning typically relies on verifiable terminal rewards such as whether all unit tests pass. While these binary signals reflect whether the final solution is correct, they provide little guidance for shaping intermediate behaviors during…
◆ 1.3#cs.lg#cs.ai
- Preventing overfitting in deep learning using differential privacy
arXiv:2604.16334v1 Announce Type: new Abstract: The use of Deep Neural Network based systems in the real world is growing. They have achieved state-of-the-art performance on many image, speech and text datasets. They have been shown to be powerful systems that are capable of learning detailed relationships and abstractions from the data. This is a double-edged sword which makes such systems vulne…
◆ 1.0#cs.lg#cs.ai
- PINNACLE: An Open-Source Computational Framework for Classical and Quantum PINNs
arXiv:2604.15645v1 Announce Type: new Abstract: We present PINNACLE, an open-source computational framework for physics-informed neural networks (PINNs) that integrates modern training strategies, multi-GPU acceleration, and hybrid quantum-classical architectures within a unified modular workflow. The framework enables systematic evaluation of PINN performance across benchmark problems including …
◆ 0.7#cs.lg#physics.comp-ph
- NK-GAD: Neighbor Knowledge-Enhanced Unsupervised Graph Anomaly Detection
arXiv:2604.15668v1 Announce Type: new Abstract: Graph anomaly detection aims to identify irregular patterns in graph-structured data. Most unsupervised GNN-based methods rely on the homophily assumption that connected nodes share similar attributes. However, real-world graphs often exhibit attribute-level heterophily, where connected nodes have dissimilar attributes. Our analysis of attribute-lev…
◆ 0.7#cs.lg
- Optimizing Stochastic Gradient Push under Broadcast Communications
arXiv:2604.15549v1 Announce Type: new Abstract: We consider the problem of minimizing the convergence time for decentralized federated learning (DFL) in wireless networks under broadcast communications, with focus on mixing matrix design. The mixing matrix is a critical hyperparameter for DFL that simultaneously controls the convergence rate across iterations and the communication demand per iter…
◆ 0.7#cs.lg#cs.dc
- Predicting Where Steering Vectors Succeed
arXiv:2604.15557v1 Announce Type: new Abstract: Steering vectors work for some concepts and layers but fail for others, and practitioners have no way to predict which setting applies before running an intervention. We introduce the Linear Accessibility Profile (LAP), a per-layer diagnostic that repurposes the logit lens as a predictor of steering vector effectiveness. The key measure, $A_{\mathrm…
◆ 0.7#cs.lg#cs.cl
- Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models
arXiv:2604.15577v1 Announce Type: new Abstract: Consider an auto-regressive model that produces outputs x (e.g., answers to questions, molecules) each of which can be summarized by an attribute vector y (e.g., helpfulness vs. harmlessness, or bio-availability vs. lipophilicity). An arbitrary reward function r(y) encodes tradeoffs between these properties. Typically, tilting the model's sampling d…
◆ 0.7#cs.lg#cs.ai
- PAWN: Piece Value Analysis with Neural Networks
arXiv:2604.15585v1 Announce Type: new Abstract: Predicting the relative value of any given chess piece in a position remains an open challenge, as a piece's contribution depends on its spatial relationships with every other piece on the board. We demonstrate that incorporating the state of the full chess board via latent position representations derived using a CNN-based autoencoder significantly…
◆ 0.7#cs.lg#cs.ai
- Adapting in the Dark: Efficient and Stable Test-Time Adaptation for Black-Box Models
arXiv:2604.15609v1 Announce Type: new Abstract: Test-Time Adaptation (TTA) for black-box models accessible only via APIs remains a largely unexplored challenge. Existing approaches such as post-hoc output refinement offer limited adaptive capacity, while Zeroth-Order Optimization (ZOO) enables input-space adaptation but faces high query costs and optimization challenges in the unsupervised TTA se…
◆ 0.7#cs.lg#cs.cv
- VoodooNet: Achieving Analytic Ground States via High-Dimensional Random Projections
arXiv:2604.15613v1 Announce Type: new Abstract: We present VoodooNet, a non-iterative neural architecture that replaces the stochastic gradient descent (SGD) paradigm with a closed-form analytic solution via Galactic Expansion. By projecting input manifolds into a high-dimensional, high-entropy "Galactic" space ($d \gg 784$), we demonstrate that complex features can be untangled without the therm…
◆ 0.7#cs.lg#cs.ai
- Flexible Empowerment at Reasoning with Extended Best-of-N Sampling
arXiv:2604.15614v1 Announce Type: new Abstract: This paper proposes a novel method that incorporates empowerment when reasoning actions in reinforcement learning (RL), thereby achieving the flexibility of exploration-exploitation dilemma (EED). In previous methods, empowerment for promoting exploration has been provided as a bonus term to the task-specific reward function as an intrinsically-moti…
◆ 0.7#cs.lg
- Majority Voting for Code Generation
arXiv:2604.15618v1 Announce Type: new Abstract: We investigate Functional Majority Voting (FMV), a method based on functional consensus for code generation with Large Language Models, which identifies a representative solution from multiple generations using their runtime execution signatures on test inputs. We find that FMV is an effective test-time inference strategy, substantially boosting per…
◆ 0.7#cs.lg
- Graph self-supervised learning based on frequency corruption
arXiv:2604.15699v1 Announce Type: new Abstract: Graph self-supervised learning can reduce the need for labeled graph data and has been widely used in recommendation, social networks, and other web applications. However, existing methods often underuse high-frequency signals and may overfit to specific local patterns, which limits representation quality and generalization. We propose Frequency-Cor…
◆ 0.7#cs.lg#cs.si
- Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
arXiv:2604.15705v1 Announce Type: new Abstract: Reinforcement Fine-Tuning (RFT) has established itself as a critical paradigm for the alignment of Multi-modal Large Language Models (MLLMs) with complex human values and domain-specific requirements. Nevertheless, current research primarily focuses on mitigating exogenous distribution shifts arising from data-centric factors, the non-stationarity i…
◆ 0.9#cs.lg
- Reasoning-targeted Jailbreak Attacks on Large Reasoning Models via Semantic Triggers and Psychological Framing
arXiv:2604.15725v1 Announce Type: new Abstract: Large Reasoning Models (LRMs) have demonstrated strong capabilities in generating step-by-step reasoning chains alongside final answers, enabling their deployment in high-stakes domains such as healthcare and education. While prior jailbreak attack studies have focused on the safety of final answers, little attention has been given to the safety of …
◆ 0.7#cs.lg#cs.ai
- M3R: Localized Rainfall Nowcasting with Meteorology-Informed MultiModal Attention
arXiv:2604.15377v1 Announce Type: new Abstract: Accurate and timely rainfall nowcasting is crucial for disaster mitigation and water resource management. Despite recent advances in deep learning, precipitation prediction remains challenging due to limitations in effectively leveraging diverse multimedia data sources. We introduce M3R, a Meteorology-informed MultiModal attention-based architecture…
◆ 0.7#cs.lg#cs.cv
- FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models
arXiv:2604.15488v1 Announce Type: new Abstract: Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers a cost-effective way to adjust model behavior without updating its parameters, existing methods often fail to be simultaneously effective, utility-preserving, and training-efficient due to their rigi…
◆ 0.9#cs.lg#cs.ai
- Lightweight Geometric Adaptation for Training Physics-Informed Neural Networks
arXiv:2604.15392v1 Announce Type: new Abstract: Physics-Informed Neural Networks (PINNs) often suffer from slow convergence, training instability, and reduced accuracy on challenging partial differential equations due to the anisotropic and rapidly varying geometry of their loss landscapes. We propose a lightweight curvature-aware optimization framework that augments existing first-order optimize…
◆ 0.7#cs.lg#cs.ai
- StoSignSGD: Unbiased Structural Stochasticity Fixes SignSGD for Training Large Language Models
arXiv:2604.15416v1 Announce Type: new Abstract: Sign-based optimization algorithms, such as SignSGD, have garnered significant attention for their remarkable performance in distributed learning and training large foundation models. Despite their empirical superiority, SignSGD is known to diverge on non-smooth objectives, which are ubiquitous in modern machine learning due to ReLUs, max-pools, and…
◆ 0.7#cs.lg#cs.ai
- Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit
arXiv:2604.15356v1 Announce Type: new Abstract: Recent work on KV cache quantization, culminating in TurboQuant, has approached the Shannon entropy limit for per-vector compression of transformer key-value caches. We observe that this limit applies to a strictly weaker problem than the one that actually matters: compressing the KV cache as a sequence. The tokens stored in a KV cache are not arbit…
◆ 0.7#cs.lg#cs.ai
- The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason
arXiv:2604.15350v1 Announce Type: new Abstract: We discover that large language models exhibit spectral phase transitions in their hidden activation spaces when engaging in reasoning versus factual recall. Through systematic spectral analysis across 11 models spanning 5 architecture families (Qwen, Pythia, Phi, Llama, DeepSeek-R1), we identify seven core phenomen…
◆ 1.3#cs.lg
- Aletheia: Gradient-Guided Layer Selection for Efficient LoRA Fine-Tuning Across Architectures
arXiv:2604.15351v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) has become the dominant parameter-efficient fine-tuning method for large language models, yet standard practice applies LoRA adapters uniformly to all transformer layers regardless of their relevance to the downstream task. We introduce Aletheia, a gradient-guided layer selection method that identifies the most task-releva…
◆ 0.7#cs.lg#cs.cl
- Mapping High-Performance Regions in Battery Scheduling across Data Uncertainty, Battery Design, and Planning Horizons
arXiv:2604.15360v1 Announce Type: new Abstract: This study presents a triadic analysis of energy storage operation under multi-stage model predictive control, investigating the interplay between data characteristics, forecast uncertainty, planning horizon, and battery c-rate. Synthetic datasets are generated to systematically explore variations in data profiles and uncertainty, enabling parametri…
◆ 0.7#cs.lg#cs.sy
- PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research
arXiv:2604.15411v1 Announce Type: new Abstract: The paradigm of agentic science requires AI systems to conduct robust reasoning and engage in long-horizon, autonomous exploration. However, current scientific benchmarks remain confined to domain knowledge comprehension and complex reasoning, failing to evaluate the exploratory nature and procedural complexity of real-world research. In this work, …
◆ 0.9#cs.lg#cs.ai
- Python library supporting Discrete Variational Formulations and training solutions with Collocation-based Robust Variational Physics Informed Neural Networks (DVF-CRVPINN)
arXiv:2604.15398v1 Announce Type: new Abstract: We explore the possibility of solving Partial Differential Equations (PDEs) using discrete weak formulations. We propose a programming environment for defining a discrete computational domain, introducing discrete functions defined over a set of points, constructing discrete inner products, and introducing discrete weak formulations employing Kronec…
◆ 0.7#cs.lg#cs.na - Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation
arXiv:2604.15400v1 Announce Type: new Abstract: We present causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynamics. Using same-prompt bifurcation, in which we repeatedly sample identical inputs to observe spontaneous divergence, we isolate trajectory dynamics from prompt-level confounds. On Qwen2.5-1.5B acros…
◆ 0.9#cs.lg#cs.ai - Dispatch-Aware Ragged Attention for Pruned Vision Transformers
arXiv:2604.15408v1 Announce Type: new Abstract: Token pruning methods for Vision Transformers (ViTs) promise quadratic reductions in attention FLOPs by dropping uninformative patches. Yet when pruned sequences are executed with state-of-the-art variable-length attention APIs -- including FlashAttention-2's varlen and PyTorch's NestedTensor SDPA -- the wall-clock attention latency doesn't scale accor…
◆ 0.7#cs.lg#cs.ai - The Illusion of Equivalence: Systematic FP16 Divergence in KV-Cached Autoregressive Inference
arXiv:2604.15409v1 Announce Type: new Abstract: KV caching is a ubiquitous optimization in autoregressive transformer inference, long presumed to be numerically equivalent to cache-free computation. This assumption fails under standard FP16 precision: cache-ON and cache-OFF execution paths employ different floating-point accumulation orderings which, due to FP16 non-associativity, produce a deter…
◆ 0.7#cs.lg#cs.ai - Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning
arXiv:2604.15414v1 Announce Type: new Abstract: Continual reinforcement learning must balance retention with adaptation, yet many methods still rely on "single-model preservation", committing to one evolving policy as the main reusable solution across tasks. Even when a previously successful policy is retained, it may no longer provide a reliable starting point for rapid adaptation after int…
◆ 0.7#cs.lg#cs.ai - Neural Continuous-Time Markov Chain: Discrete Diffusion via Decoupled Jump Timing and Direction
arXiv:2604.15694v1 Announce Type: new Abstract: Discrete diffusion models based on continuous-time Markov chains (CTMCs) have shown strong performance on language and discrete data generation, yet existing approaches typically parameterize the reverse rate matrix as a single object -- via concrete scores, clean-data predictions ($x_0$-parameterization), or denoising distributions -- rather than a…
◆ 0.7#cs.lg#math.pr - Transfer Learning from Foundational Optimization Embeddings to Unsupervised SAT Representations
arXiv:2604.15448v1 Announce Type: new Abstract: Foundational optimization embeddings have recently emerged as powerful pre-trained representations for mixed-integer programming (MIP) problems. These embeddings were shown to enable cross-domain transfer and reduce reliance on solver-generated labels. In this work, we investigate whether such representations generalize beyond optimization to decisi…
◆ 0.7#cs.lg#cs.ai - Evaluating LLM Simulators as Differentially Private Data Generators
arXiv:2604.15461v1 Announce Type: new Abstract: LLM-based simulators offer a promising path for generating complex synthetic data where traditional differentially private (DP) methods struggle with high-dimensional user profiles. But can LLMs faithfully reproduce statistical distributions from DP-protected inputs? We evaluate this using PersonaLedger, an agentic financial simulator, seeded with D…
◆ 0.9#cs.lg#cs.cl - ${\pi}_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities
arXiv:2604.15483v1 Announce Type: new Abstract: We present a new robotic foundation model, called ${\pi}_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. ${\pi}_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances, provide zero-shot cross-embodiment generalization, for example ena…
◆ 0.7#cs.lg#cs.ro - ProtoTTA: Prototype-Guided Test-Time Adaptation
arXiv:2604.15494v1 Announce Type: new Abstract: Deep networks that rely on prototypes -- interpretable representations that can be related to the model input -- have gained significant attention for balancing high accuracy with inherent interpretability, which makes them suitable for critical domains such as healthcare. However, these models are limited by their reliance on training data, which hampers…
◆ 0.7#cs.lg#cs.cv - Natural gradient descent with momentum
arXiv:2604.15554v1 Announce Type: new Abstract: We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks. Natural gradient descent (NGD) for the optimization of a loss function can be seen as a preconditioned gradient de…
◆ 0.7#cs.lg#cs.ai - Why Colors Make Clustering Harder: Global Integrality Gaps, the Price of Fairness, and Color-Coupled Algorithms in Chromatic Correlation Clustering
arXiv:2604.15738v1 Announce Type: new Abstract: Chromatic Correlation Clustering (CCC) extends Correlation Clustering by assigning semantic colors to edges and requiring each cluster to receive a single color label. Unlike standard CC, whose LP relaxation has integrality gap 2 on complete graphs and admits a 2.06-approximation, the analogous LP for CCC has a strict lower bound of 2.11, and the be…
◆ 0.7#cs.lg - Harmonizing Multi-Objective LLM Unlearning via Unified Domain Representation and Bidirectional Logit Distillation
arXiv:2604.15482v1 Announce Type: new Abstract: Large Language Model (LLM) unlearning is crucial for removing hazardous or privacy-leaking information from the model. Practical LLM unlearning demands satisfying multiple challenging objectives simultaneously: removing undesirable knowledge, preserving general utility, avoiding over-refusal of neighboring concepts, and, crucially, ensuring robust…
◆ 0.9#cs.lg#cs.ai - Learning Affine-Equivariant Proximal Operators
arXiv:2604.15556v1 Announce Type: new Abstract: Proximal operators are fundamental across many applications in signal processing and machine learning, including solving ill-posed inverse problems. Recent work has introduced Learned Proximal Networks (LPNs), providing parametric functions that compute exact proximals for data-driven and potentially non-convex regularizers. However, in many setting…
◆ 0.7#cs.lg#cs.cv - Stargazer: A Scalable Model-Fitting Benchmark Environment for AI Agents under Astrophysical Constraints
arXiv:2604.15664v1 Announce Type: new Abstract: The rise of autonomous AI agents suggests that dynamic benchmark environments with built-in feedback on scientifically grounded tasks are needed to evaluate the capabilities of these agents in research work. We introduce Stargazer, a scalable environment for evaluating AI agents on dynamic, iterative physics-grounded model-fitting tasks using infere…
◆ 0.7#cs.lg - Collective Kernel EFT for Pre-activation ResNets
arXiv:2604.15742v1 Announce Type: new Abstract: In finite-width deep neural networks, the empirical kernel $G$ evolves stochastically across layers. We develop a collective kernel effective field theory (EFT) for pre-activation ResNets based on a $G$-only closure hierarchy and diagnose its finite validity window. Exploiting the exact conditional Gaussianity of residual increments, we derive an ex…
◆ 0.7#cs.lg#hep-th - Faster LLM Inference via Sequential Monte Carlo
arXiv:2604.15672v1 Announce Type: new Abstract: Speculative decoding (SD) accelerates language model inference by drafting tokens from a cheap proposal model and verifying them against an expensive target model via rejection sampling. Because rejection truncates the draft block at the first error, throughput degrades when draft and target diverge. Rather than rejecting draft tokens outright, we p…
◆ 0.9#cs.lg#cs.cl - Hierarchical Active Inference using Successor Representations
arXiv:2604.15679v1 Announce Type: new Abstract: Active inference, a neurally inspired model for inferring actions based on the free energy principle (FEP), has been proposed as a unifying framework for understanding perception, action, and learning in the brain. Active inference has previously been used to model ecologically important tasks such as navigation and planning, but scaling it to solve…
◆ 0.7#cs.lg#cs.ai