Curvature-guided anisotropic noise injection for robust multimodal data processing in neuroscience and perception science
Article excerpt
Introduction: Large-batch training is widely used to scale multimodal neural networks that integrate heterogeneous inputs such as visual, textual, and physiological signals. However, increasing the batch size suppresses the stochastic fluctuations of mini-batch sampling, which can trap multimodal models in sharp, modality-dominant minima and produce a persistent generalization gap.

Methods: To address this problem, we propose Geometric Anisotropic Noise Injection (GANI), a curvature-aware optimization framework inspired by information geometry and multisensory integration. GANI decouples deterministic large-batch descent from stochastic geometric exploration. It approximates local curvature through an exponential moving average of first-order gradients and injects structured anisotropic noise during parameter updates, thereby restoring the geometry-aware exploration dynamics of small-batch stochastic gradient descent with linear computational complexity.

Results: Theoretical analysis shows that curvature-aligned stochasticity can accelerate escape from sharp modality-specific basins and guide parameters toward flatter regions. Experimental evaluations across multimodal benchmark settings and stress tests demonstrate that GANI reduces the generalization gap, improves convergence stability, bounds the maximum Hessian eigenvalue, and maintains stronger performance under visual noise and missing textual information than standard large-batch SGD and common adaptive optimizers.

Discussion: By linking optimization geometry with multimodal representation dynamics, GANI provides an efficient and interpretable mechanism for robust heterogeneous data processing. The framework offers potential value for uncertainty-aware multisensory integration, brain-inspired perception science, and scalable multimodal learning under noisy or incomplete sensory conditions.
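
To make the update described in the Methods concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The excerpt does not give the exact update rule, so several choices here are illustrative guesses: the curvature proxy is taken to be a diagonal exponential moving average of squared gradients, the injected noise is Gaussian with per-coordinate standard deviation proportional to the square root of that proxy, and the names `gani_step`, `beta`, and `sigma` are hypothetical.

```python
import numpy as np


def gani_step(theta, grad, v, rng, lr=0.01, beta=0.9, sigma=0.1, eps=1e-12):
    """One hypothetical GANI-style update (diagonal sketch, O(d) per step).

    theta : current parameters, shape (d,)
    grad  : large-batch gradient at theta, shape (d,)
    v     : running curvature proxy, shape (d,)
    """
    # Assumption: an EMA of squared first-order gradients serves as a
    # diagonal stand-in for local curvature.
    v = beta * v + (1.0 - beta) * grad**2
    # Anisotropic noise: larger standard deviation along coordinates
    # whose curvature proxy is larger.
    noise = sigma * np.sqrt(v + eps) * rng.standard_normal(theta.shape)
    # Deterministic descent step plus the separate stochastic exploration term.
    theta = theta - lr * (grad + noise)
    return theta, v


if __name__ == "__main__":
    # Toy quadratic loss 0.5 * sum(h * theta**2) with one sharp and one flat direction.
    h = np.array([10.0, 0.1])
    theta = np.array([1.0, 1.0])
    v = np.zeros_like(theta)
    rng = np.random.default_rng(0)
    for _ in range(200):
        grad = h * theta  # exact ("large-batch") gradient of the toy loss
        theta, v = gani_step(theta, grad, v, rng)
    print(theta)
```

In this sketch, setting `sigma=0` recovers plain gradient descent, which keeps the deterministic step and the exploration term separable. Every operation is elementwise, so the cost per step is linear in the number of parameters.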