We developed AnyAttack, a framework that turns ordinary images into targeted adversarial examples that fool Vision-Language Models. By pre-training on the LAION-400M dataset, our method can turn a benign image (such as a photo of a dog) into one that tricks VLMs into generating any specified output (such as "this is violent content"), and it works across both open-source and commercial models.
Vision-Language Models (VLMs) have revolutionized multimodal AI applications, yet their vulnerability to adversarial manipulation presents significant security challenges. Traditional targeted attacks require predefined labels, severely limiting their scalability and real-world impact.
We introduce AnyAttack, a novel self-supervised framework that achieves unprecedented attack flexibility through large-scale foundation-model training. By pre-training an adversarial noise generator on the LAION-400M dataset without label supervision, our approach can transform any benign image into an attack vector targeting any desired output, across different VLM architectures.
Our comprehensive evaluation demonstrates AnyAttack's effectiveness across five open-source VLMs (CLIP, BLIP, BLIP2, InstructBLIP, MiniGPT-4) on diverse multimodal tasks including retrieval, classification, and image captioning. Most notably, AnyAttack successfully transfers to commercial systems (Google Gemini, Claude Sonnet, Microsoft Copilot, OpenAI GPT), revealing systemic vulnerabilities in the VLM ecosystem.
This work establishes the first foundation model for adversarial attacks, fundamentally reshaping the threat landscape and highlighting the urgent need for robust defense mechanisms against this new class of scalable, transferable attacks.
Our proposed framework, AnyAttack, introduces a novel two-phase approach to generating targeted adversarial examples without label supervision:
In the pre-training phase, we leverage the large-scale LAION-400M dataset (𝒟p) to develop a universal understanding of adversarial patterns. We train a decoder network F to produce adversarial noise δ while using a frozen encoder E as the surrogate model.
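As a rough illustration, here is a minimal PyTorch-style sketch of one pre-training step. The decoder-from-embedding interface, the tanh perturbation bound, and the cosine-similarity objective are our own assumptions for the sketch, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as nnF

def pretrain_step(decoder, encoder, clean_images, target_images,
                  optimizer, epsilon=8 / 255):
    """One self-supervised pre-training step (illustrative sketch).

    decoder: the noise generator F (trainable).
    encoder: the frozen surrogate E (parameters assumed requires_grad=False).
    """
    with torch.no_grad():
        # Self-supervised "label": the target image's own embedding under E.
        target_emb = encoder(target_images)

    # Assumed design: F decodes the target embedding into image-shaped noise.
    delta = decoder(target_emb)
    delta = epsilon * torch.tanh(delta)          # keep the perturbation within a budget
    adv_images = (clean_images + delta).clamp(0, 1)

    # Gradients flow through E's activations into the decoder; E's weights stay fixed.
    adv_emb = encoder(adv_images)

    # Pull the adversarial embedding toward the target embedding
    # (a cosine-similarity loss is one plausible instantiation).
    loss = 1 - nnF.cosine_similarity(adv_emb, target_emb, dim=-1).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```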
In the fine-tuning phase, we adapt the pre-trained decoder F to specific downstream tasks and datasets (𝒟f).
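In the same hedged spirit as the sketch above, fine-tuning can reuse the identical training step, only swapping the data source (and typically shrinking the learning rate); `downstream_loader` and the optimizer settings below are illustrative, not taken from the paper:

```python
# Adapt the pre-trained decoder F to the downstream dataset D_f.
finetune_opt = torch.optim.Adam(decoder.parameters(), lr=1e-5)
for clean_images, target_images in downstream_loader:   # pairs drawn from D_f
    pretrain_step(decoder, encoder, clean_images, target_images, finetune_opt)
```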
This two-phase approach enables AnyAttack to achieve unprecedented flexibility: any benign image can be transformed into an adversarial example capable of inducing any desired output from target VLMs. By pre-training on massive-scale data, our method develops transferable adversarial capabilities that generalize across models and tasks.
We evaluated AnyAttack across both open-source and commercial Vision-Language Models, demonstrating its unprecedented transferability and effectiveness.
Our results reveal a systemic vulnerability across the entire VLM ecosystem. Despite being trained on different datasets with different architectures, both open-source and commercial models remain susceptible to our self-supervised attack approach. This highlights the urgent need for robust defense mechanisms against this new class of transferable adversarial attacks.
AnyAttack demonstrates that self-supervised adversarial learning at scale creates a fundamentally new security challenge for Vision-Language Models, requiring urgent development of robust defenses against this class of transferable attacks.
We have open-sourced our LAION-400M pre-trained adversarial image generator, which produces targeted adversarial examples in a single forward pass. This is a significant efficiency gain over conventional approaches, which require costly iterative gradient computations for every adversarial example.
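To illustrate what single-forward-pass generation looks like in practice, here is a hypothetical usage sketch; `load_pretrained_generator`, the checkpoint name, and the surrounding variables are placeholders rather than the released repository's actual API:

```python
import torch

# Hypothetical loading interface for the released generator checkpoint.
generator = load_pretrained_generator("anyattack_laion400m.pt").eval()

with torch.no_grad():
    target_emb = encoder(target_image)              # embedding of the desired target
    delta = generator(target_emb)                   # single forward pass, no gradients
    adv_image = (benign_image + delta).clamp(0, 1)  # targeted adversarial example
```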
Most importantly, our pre-trained generator may offer a promising alternative to conventional adversarial training on large models. By generating diverse adversarial examples efficiently, this approach could enable more practical and scalable robustness enhancements for the next generation of multimodal AI systems.
@inproceedings{zhang2025anyattack,
title={{AnyAttack}: Towards Large-scale Self-supervised Adversarial Attacks on Vision-Language Models},
author={Zhang, Jiaming and Ye, Junhong and Ma, Xingjun and Li, Yige and Yang, Yunfan and Chen, Yunhao and Sang, Jitao and Yeung, Dit-Yan},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}