Modern Multimodal Large Reasoning Models (MLRMs) can infer precise locations from seemingly innocuous images by performing step-by-step hierarchical reasoning.
Figure 1: Geographic inference vulnerability in MLRMs. Given a personal image, MLRMs employ hierarchical reasoning to progressively narrow location estimates from continental to street-level precision.
Multimodal large reasoning models (MLRMs) pose significant privacy risks by inferring precise geographic locations from personal images through hierarchical chain-of-thought reasoning. Existing privacy protection techniques, designed primarily for perception-based models, prove ineffective against the sophisticated multi-step reasoning processes by which MLRMs analyze environmental cues.
We introduce ReasonBreak, a novel adversarial framework specifically designed to disrupt hierarchical reasoning in MLRMs through concept-aware perturbations. Our approach is founded on the key insight that effective disruption of geographic reasoning requires perturbations aligned with conceptual hierarchies rather than uniform noise. ReasonBreak strategically targets critical conceptual dependencies within reasoning chains, generating perturbations that invalidate specific inference steps and cascade through subsequent reasoning stages.
To facilitate this approach, we contribute GeoPrivacy-6K, a comprehensive dataset comprising 6,341 ultra-high-resolution images with hierarchical concept annotations. Extensive evaluation across seven state-of-the-art MLRMs (including GPT-o3, GPT-5, and Gemini 2.5 Pro) demonstrates ReasonBreak's superior effectiveness: it achieves a 14.4-percentage-point improvement in tract-level protection (33.8% vs. 19.4%) and nearly doubles block-level protection (33.5% vs. 16.8%). This work establishes a new paradigm for privacy protection against reasoning-based threats.
Figure 2: The ReasonBreak Framework Overview. 1) The input image undergoes Adaptive Decomposition into an m* × n* grid of blocks. 2) Each block B_k is assigned a set of relevant concepts C_k via spatial overlap analysis. 3) The Minimax Target Selection uses the assigned concept set C_k and a pre-computed Embedding Bank E to find a hard-negative prior e_k^prior. 4) This prior is fed into the learnable Decoder G_θ to synthesize a block-specific perturbation δ_k. 5) The final adversarial image I' is reconstructed by adding the perturbations to their corresponding clean blocks. The dashed boxes at the bottom illustrate the three possible outcomes of the concept assignment logic in step (2): a block may be assigned a single concept (left), multiple concepts (middle), or the default set of all image concepts if it has no spatial overlap (right).
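The five steps in Figure 2 can be sketched end to end. The following is a minimal, hypothetical NumPy rendering of the pipeline: all function names are illustrative rather than the authors' API, the hard-negative selection is one plausible reading of "minimax target selection" (pick the non-assigned concept whose embedding is closest, in worst-case cosine similarity, to the assigned concepts), and a seeded random map stands in for the learned decoder G_θ.

```python
import numpy as np

def decompose(image, m, n):
    """Step 1: split an H x W x C image into an m x n grid of blocks."""
    H, W, _ = image.shape
    bh, bw = H // m, W // n
    return [((i * bh, j * bw, bh, bw), image[i*bh:(i+1)*bh, j*bw:(j+1)*bw])
            for i in range(m) for j in range(n)]

def assign_concepts(box, concept_boxes, all_concepts):
    """Step 2: concepts whose bounding box overlaps the block.
    Falls back to the full concept set when nothing overlaps (right-hand
    case in the figure)."""
    y, x, h, w = box
    hits = [c for c, (cy, cx, ch, cw) in concept_boxes.items()
            if x < cx + cw and cx < x + w and y < cy + ch and cy < y + h]
    return hits if hits else list(all_concepts)

def select_prior(concepts, bank):
    """Step 3 (one plausible reading): among non-assigned concepts, pick the
    hard negative -- the embedding with the highest worst-case cosine
    similarity to the assigned concepts."""
    cand = {k: v for k, v in bank.items() if k not in concepts}
    if not cand:                      # default case: block holds all concepts
        cand = dict(bank)
    T = np.stack([bank[c] for c in concepts])
    Tn = T / np.linalg.norm(T, axis=1, keepdims=True)
    worst_sim = lambda v: float((Tn @ (v / np.linalg.norm(v))).min())
    return max(cand, key=lambda k: worst_sim(cand[k]))

def perturb(block, prior, eps=8 / 255):
    """Steps 4-5: stand-in for the learned decoder G_theta -- deterministically
    map the prior embedding to a bounded, block-shaped perturbation."""
    seed = int.from_bytes(prior.tobytes()[:8], "little") % (2**32)
    delta = np.random.default_rng(seed).uniform(-eps, eps, size=block.shape)
    return np.clip(block + delta, 0.0, 1.0)

# Toy run: 64x64 image, four concepts, two annotated regions.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
bank = {c: rng.random(16) for c in ["beach", "forest", "street", "mountain"]}
concept_boxes = {"street": (0, 0, 32, 32), "forest": (32, 32, 32, 32)}

adv = img.copy()
for box, blk in decompose(img, 2, 2):
    cs = assign_concepts(box, concept_boxes, set(bank))
    prior = bank[select_prior(cs, bank)]
    y, x, h, w = box
    adv[y:y+h, x:x+w] = perturb(blk, prior)
```

The real decoder is trained so that δ_k invalidates the targeted inference step; the bounded random perturbation above only preserves the data flow and the L∞ budget of the sketch.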
We evaluated ReasonBreak on the DoxBench dataset against seven state-of-the-art MLRMs. The results demonstrate significant improvements in privacy protection rates, particularly at finer geographic granularities (Tract and Block levels).
Table 1: Privacy protection performance across geographical granularities on DoxBench.
To advance research in reasoning-aware privacy, we introduce GeoPrivacy-6K, a specialized dataset of 6,341 ultra-high-resolution images. Unlike standard datasets, GeoPrivacy-6K is annotated with hierarchical geographic concepts (spanning continental, national, city, and local levels) and precise spatial bounding boxes.
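To make the annotation scheme concrete, here is a hypothetical record illustrating the four hierarchy levels and per-cue bounding boxes. The actual GeoPrivacy-6K release format is not specified on this page; every field name below is an assumption, and the validator simply checks the coarse-to-fine ordering described above.

```python
# Illustrative (not official) annotation record for one GeoPrivacy-6K image.
record = {
    "image": "example.jpg",
    "concepts": [
        {"level": "continental", "label": "Europe",      "bbox": None},
        {"level": "national",    "label": "France",      "bbox": None},
        {"level": "city",        "label": "Paris",       "bbox": None},
        {"level": "local",       "label": "street sign", "bbox": [120, 48, 96, 32]},
    ],
}

LEVELS = ("continental", "national", "city", "local")

def validate(rec):
    """Check coarse-to-fine level ordering and x, y, w, h bounding boxes."""
    order = [LEVELS.index(c["level"]) for c in rec["concepts"]]
    assert order == sorted(order), "concepts must run coarse -> fine"
    for c in rec["concepts"]:
        if c["bbox"] is not None:
            x, y, w, h = c["bbox"]
            assert w > 0 and h > 0, "bbox must have positive extent"
    return True
```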
@article{zhang2025reasonbreak,
  title={Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models},
  author={Zhang, Jiaming and Wang, Che and Cao, Yang and Huang, Longtao and Lim, Wei Yang Bryan},
  journal={arXiv preprint arXiv:2512.08503},
  year={2025}
}