Watermarks Attack Watermarks: Re-Watermarking as a Generic Removal Strategy

Authors: Maria Bulychev, Neil G. Marchant, Benjamin I. P. Rubinstein

Published: 2026-05-16 03:57:37+00:00

Comment: 9 pages, 6 figures

AI Summary

The paper introduces "re-watermarking" as a simple yet effective attack strategy against invisible image watermarking schemes, which reliably suppresses the original signal and allows for ownership forgery. This attack, which leverages watermarks to attack other watermarks, requires no gradients or detection keys. Furthermore, a classifier is developed to identify the victim watermarking method, enabling a fully blind and automated attack pipeline that achieves substantial watermark removal.

Abstract

Watermarking seeks an imperceptible change to an input image that will trigger a detector, in order to assert provenance and protect intellectual property. The literature has shown great interest in attacks on watermarking schemes: attackers are clearly motivated to steal copyrighted material or circumvent legislated deepfake protections. In this work, we make a simple yet powerful observation: such attacks on watermarking, like watermarks themselves, seek an imperceptible change to an input image (now already watermarked) that will trigger a detector. This analogy between watermark attacks and watermarking itself is highly suggestive: watermarks could be used to attack watermarks. Our first contribution validates this hypothesis. In rigorous experiments spanning 96 combinations of dataset, victim, and attack watermarks, we show that simply re-watermarking an already watermarked image reliably suppresses the original signal, without requiring gradients, surrogate models, or detection keys. Our second contribution is a simple classifier for detecting the presence and identity of an existing watermark in a given image. Surprisingly, experimental findings demonstrate outstanding overall accuracies of 0.878-0.953. This result is of independent interest as a security vulnerability: research shows that method-specific attacks achieve substantially stronger removal than black-box attacks. Taken together, watermark identification combined with re-watermarking successfully reduces bit accuracy by at least 25% and up to 48%. Our work constitutes a cheap, generic, and highly effective attack pipeline, calling into question the robustness of current watermarking schemes to such a simple attack, as well as the value of existing sophisticated attacks.

Key findings

Re-watermarking reliably suppresses original watermark signals across all tested schemes, reducing bit accuracy by 25-48% with significantly less perceptual degradation than diffusion-based baseline attacks. A ConvNeXt-V2 classifier effectively identifies specific watermarking methods from images with high accuracy (0.878-0.953), challenging undetectability claims and enabling targeted attacks. The end-to-end attack pipeline is highly successful, not only removing watermarks but also allowing adversaries to embed new messages and claim false ownership.
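The findings above are reported in terms of bit accuracy. As a minimal sketch, assuming the usual definition (the fraction of embedded message bits that the detector decodes correctly), the metric could be computed as follows. The function name `bit_accuracy` and the example bit strings are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def bit_accuracy(embedded: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit.

    Assumed definition: the summary reports bit accuracy but does not
    spell out the formula.
    """
    if len(embedded) != len(decoded) or len(embedded) == 0:
        raise ValueError("messages must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(embedded, decoded) if a == b)
    return matches / len(embedded)


# Illustrative 8-bit message (made up for this example).
message = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical decoder output after an attack flipped three bits.
decoded_after_attack = [1, 1, 1, 0, 0, 1, 1, 0]

print(bit_accuracy(message, message))               # 1.0
print(bit_accuracy(message, decoded_after_attack))  # 0.625
```

For a uniformly random message, a decoder that guesses blindly is expected to score about 0.5, so a removal attack is typically judged by how close it pushes bit accuracy toward that level.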

Approach

The authors propose a two-step attack pipeline. First, a lightweight ConvNeXt-V2 Large classifier identifies the specific watermarking method present in an image. Second, based on this identification, the best-performing re-watermarking strategy is applied: ZoDiac for in-processing victims, or the victim's own method reapplied for post-processing victims. Either choice overwrites or corrupts the original watermark signal.
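The two-step dispatch described above can be sketched roughly as follows. This is an illustration only: the scheme labels, the `"none"` class, the `"zodiac"` key, and the `classify`/`embedders` callables are hypothetical placeholders (the summary names ZoDiac but does not list the victim schemes or any API), and returning the image unchanged when no watermark is detected is an assumption.

```python
from typing import Callable, Dict, Optional

# Hypothetical class labels; the summary does not enumerate the victim schemes.
IN_PROCESSING = {"inproc_scheme_a"}
POST_PROCESSING = {"postproc_scheme_b", "postproc_scheme_c"}
NO_WATERMARK = "none"


def choose_attack(victim: str) -> Optional[str]:
    """Map the identified victim scheme to the re-watermarking method to apply."""
    if victim == NO_WATERMARK:
        return None  # assumption: nothing to remove
    if victim in IN_PROCESSING:
        return "zodiac"  # in-processing victims are re-watermarked with ZoDiac
    if victim in POST_PROCESSING:
        return victim  # post-processing victims: reapply the same method
    raise ValueError(f"unknown watermarking scheme: {victim!r}")


def blind_attack(
    image: object,
    classify: Callable[[object], str],
    embedders: Dict[str, Callable[[object], object]],
) -> object:
    """Step 1: identify the victim scheme. Step 2: re-watermark accordingly."""
    attack = choose_attack(classify(image))
    if attack is None:
        return image
    return embedders[attack](image)


# Usage with stand-in callables (a real pipeline would plug in the trained
# classifier and actual watermark embedders here).
stub_embedders = {
    "zodiac": lambda img: f"{img}+zodiac",
    "postproc_scheme_b": lambda img: f"{img}+postproc_scheme_b",
}
print(blind_attack("img", lambda _: "inproc_scheme_a", stub_embedders))
print(blind_attack("img", lambda _: "postproc_scheme_b", stub_embedders))
```

Keeping the selection logic in a small lookup separate from the embedders makes explicit that the attack needs only the classifier's label, not gradients or detection keys.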

Datasets

DiffusionDB, MS-COCO, ImageNet-22K

Model(s)

UNKNOWN

Author countries

Australia
