Asymmetric Phase Coding Audio Watermarking

Authors: Guang Yang, Amir Ghasemian, Ninareh Mehrabi, Homa Hosseinmardi

Published: 2026-05-08 04:54:59+00:00

Comment: 13 pages, 12 figures, 3 tables

AI Summary

This paper introduces Asymmetric Phase Coding (APC), a training-free cryptographic audio watermarking scheme that provides auditable provenance for audio as a defense against deepfakes. APC combines Ed25519 digital signatures, Reed-Solomon error correction, and a hybrid embedding mechanism that pairs phase-domain coding with magnitude QIM. The resulting watermark is robust, blind-extractable, and non-repudiable, offering an alternative to passive forensic detectors.

Abstract

The proliferation of deepfake audio challenges voice-based authentication systems; passive forensic detectors are sensitive to evolving generative models and to real-world channel distortions. We propose Asymmetric Phase Coding (APC), a training-free cryptographic signing layer for audio, designed as a compact and auditable provenance primitive that can stand alone or be stacked with learned watermarks. APC combines Ed25519 digital signatures (EdDSA, FIPS 186-5; 64-byte signatures) with Reed-Solomon error correction, pseudo-random STFT phase-bin selection, and a redundant quantization-index-modulation (QIM) code on log-magnitude differences of adjacent bin pairs, yielding a compact, non-repudiable, blind-extractable watermark. We evaluate APC on 1,000 LibriSpeech test-clean clips (10 s each, 44.1 kHz) under eight attack configurations (identity, 10% end-cropping, 20% end-cropping, 8 kHz low-pass, 16 kHz round-trip resampling, FLAC re-encoding, MP3 at 128 kbps, and OGG-Vorbis at 128 kbps) and achieve cryptographic verification rates between 97.5% and 98.3% under every condition, with a mean PESQ of 3.02 and CPU latency in the tens of milliseconds. We explicitly compare APC against recent neural baselines (AudioSeal, WavMark, SilentCipher), detail the threat model (forgery resistance vs. erasure), characterize the dataset, define all metrics, quantify an adaptive white-box erasure attack, and release code, keys, and metadata for reproducibility.
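Blind extraction requires the embedder and the verifier to agree on which STFT bins carry payload without exchanging side information. The following is a minimal sketch of one way such keyed pseudo-random bin selection could work; it is not taken from the paper. The function name `select_bins`, the SHA-256 seed derivation, and the parameters `seed_material`, `n_bins`, `n_select`, and `lo` are all hypothetical, and Python's `random` module stands in for whatever generator the authors use (it is not cryptographically secure).

```python
import hashlib
import random


def select_bins(seed_material: bytes, frame_index: int,
                n_bins: int, n_select: int, lo: int = 1) -> list[int]:
    """Pick n_select distinct bin indices in [lo, n_bins) for one STFT frame.

    Assumption: the seed is derived from public material (for example a
    hash of the signer's public key), so any verifier can recompute it.
    """
    # Mix the frame index into the seed so each frame gets its own pattern.
    digest = hashlib.sha256(seed_material + frame_index.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # sample() draws without replacement, so no bin is used twice per frame.
    return sorted(rng.sample(range(lo, n_bins), n_select))


# Embedder and extractor call the function with identical arguments
# and therefore visit the same bins.
embed_side = select_bins(b"hypothetical-public-seed", frame_index=7, n_bins=513, n_select=16)
extract_side = select_bins(b"hypothetical-public-seed", frame_index=7, n_bins=513, n_select=16)
```

Skipping bin 0 (`lo=1`) avoids the DC component, which is a design choice of this sketch rather than a documented property of APC.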


Key findings
APC achieved cryptographic verification rates between 97.5% and 98.3% on 1,000 LibriSpeech test-clean clips across eight attack configurations (including identity, MP3/OGG at 128 kbps, resampling, and cropping) at a mean PESQ of 3.02. The magnitude-QIM channel significantly improved robustness against lossy codecs. A white-box erasure attack showed that verification collapses only after substantial perceptual degradation (a drop of 1.3 PESQ points), in contrast to metadata-only signatures, which can be removed at zero cost.
Approach
APC embeds a Reed-Solomon-encoded Ed25519 digital signature into audio using a hybrid approach. The primary channel uses pseudo-random STFT phase-bin selection, and a survivability channel uses a quantization-index-modulation (QIM) code on log-magnitude differences of adjacent bin pairs. The combination is intended to keep the signature recoverable under real-world distortions such as lossy coding, resampling, and cropping.
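The magnitude channel described above can be illustrated with a generic QIM sketch on the log-magnitude difference of one adjacent bin pair. This is a textbook two-lattice QIM under stated assumptions, not the authors' implementation: the step size `DELTA`, the symmetric split of the correction between the two bins, and the majority vote in `extract_redundant` are all hypothetical choices.

```python
import math

DELTA = 1.0  # hypothetical quantization step, in natural-log magnitude units


def qim_quantize(d: float, bit: int, delta: float = DELTA) -> float:
    """Snap d to the nearest point of the lattice {k*delta + bit*delta/2}."""
    offset = bit * delta / 2.0
    return round((d - offset) / delta) * delta + offset


def embed_bit(mag_a: float, mag_b: float, bit: int, delta: float = DELTA):
    """Adjust two positive magnitudes so log(a) - log(b) lies on the bit's lattice."""
    d = math.log(mag_a) - math.log(mag_b)
    shift = (qim_quantize(d, bit, delta) - d) / 2.0
    # Split the correction evenly; the product mag_a * mag_b is preserved.
    return mag_a * math.exp(shift), mag_b * math.exp(-shift)


def extract_bit(mag_a: float, mag_b: float, delta: float = DELTA) -> int:
    """Decode by choosing the lattice closest to the observed difference."""
    d = math.log(mag_a) - math.log(mag_b)
    err0 = abs(d - qim_quantize(d, 0, delta))
    err1 = abs(d - qim_quantize(d, 1, delta))
    return 0 if err0 <= err1 else 1


def extract_redundant(pairs, delta: float = DELTA) -> int:
    """Majority vote over several pairs that carry the same bit."""
    votes = sum(extract_bit(a, b, delta) for a, b in pairs)
    return 1 if 2 * votes > len(pairs) else 0
```

Because the two lattices are `delta / 2` apart, a bit decodes correctly as long as the channel moves the log-magnitude difference by less than `delta / 4`. A larger step tolerates more codec distortion but perturbs the spectrum more, which is the usual robustness versus quality trade-off in QIM.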
Datasets
LibriSpeech test-clean
Model(s)
None (training-free, non-neural approach)
Author countries
USA