AudioHijack: Imperceptible Audio Attacks Hijack AI Voice Models with Up to 96% Success Rate

Security · 1 source · May 18

red-teaming prompt-injection microsoft mistral audio-language-models

Summary

• Researchers embed imperceptible malicious commands in audio clips, hijacking voice AI models with success rates of 79–96%
• An attack signal that takes about 30 minutes to craft can be reused indefinitely, regardless of what users say (context-agnostic)
• Validated against 13 models, including services from Microsoft and Mistral; triggered web searches, file downloads, and email exfiltration
• Attacker needs control of the audio only, not the prompt, enabling passive attacks on other users' active sessions

Details

#	Type	Key Point	Context
1	Research	AudioHijack presented at IEEE Symposium on Security and Privacy by Zhejiang University team	Lead author Meng Chen, a PhD student at Zhejiang University, developed AudioHijack for presentation at the IEEE Symposium on Security and Privacy in San Francisco — a top-tier peer-reviewed security venue, lending the research significant credibility.
2	Security Alert	79–96% success rate across 13 models including commercial services from Microsoft and Mistral	The attack was validated against 13 leading open and commercial large audio-language models (LALMs). Demonstrated malicious outcomes include sensitive web searches, file downloads from attacker-controlled servers, and emails containing user data, all of which are real agentic capabilities of modern LALMs.
3	Tech Info	Iterative waveform optimization embeds imperceptible malicious commands in audio files	The algorithm repeatedly adjusts a digital audio clip's numerical waveform values — changes inaudible to humans — while measuring the model's response and refining until the model executes the desired malicious action. This is a well-established adversarial example approach now extended to generative audio models.
4	Security Alert	Context-agnostic: 30 minutes of training produces a reusable attack signal effective indefinitely	Once trained, the malicious signal works against the target model regardless of user instructions. This dramatically lowers the barrier to sustained attacks — a single 30-minute investment creates a reusable weapon deployable across all future user sessions with the targeted model.
5	Security Alert	Attacker controls only audio data, not user instructions, enabling passive third-party attacks	Prior attacks on generative models assumed the attacker was also the user, controlling both the audio input and the accompanying instructions. AudioHijack drops that assumption: by manipulating only the audio being processed, an attacker can hijack a model during an active session that belongs to a different user.
6	Market Impact	Attack vectors span YouTube videos, music, voice notes, Zoom calls, and live voice chats	Any audio a user asks an AI model about is a potential attack surface. Embedding malicious audio in widely distributed content (YouTube videos, music) or in Zoom meeting recordings sent to transcription services creates scalable passive attack channels. Unpublished follow-on research demonstrates real-time injection into live AI voice chats.
7	Insight	Structural flaw: LALMs cannot separate legitimate user audio from attacker-injected instructions	The core vulnerability mirrors prompt injection in text LLMs: the model receives instructions in the same medium as the content it is asked to process, so it has no reliable way to tell them apart. As voice AI gains agentic capabilities (web access, email, file downloads), this attack surface will grow. It is also harder to filter than text-based attacks because the malicious signal is imperceptible to a human reviewer.
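
The optimization loop in row 3 and the context-agnostic property in row 4 can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the AudioHijack algorithm itself: the "model" is a hypothetical logistic scorer rather than a real LALM, inaudibility is approximated by a small per-sample amplitude bound (`eps`), and random noise clips stand in for the varied contexts the signal must survive. The loop measures the model's response, takes a signed gradient step, and clips the perturbation back inside the bound.

```python
import numpy as np

N = 16000  # one second of 16 kHz audio (illustrative size)
rng = np.random.default_rng(0)

# Hypothetical stand-in for the victim model: a logistic scorer that returns
# the probability of the attacker's target behavior. A real LALM is far more
# complex; this only makes the loop runnable.
W = rng.normal(scale=0.05, size=N)
B = -2.0


def target_prob(wave):
    return 1.0 / (1.0 + np.exp(-(wave @ W + B)))


def grad_wrt_input(wave):
    # Gradient of log(target_prob) with respect to the waveform samples.
    return (1.0 - target_prob(wave)) * W


def craft_universal_delta(clips, eps=0.002, step=0.0005, iters=20):
    """Signed-gradient ascent on one perturbation shared by all clips."""
    delta = np.zeros(N)
    for _ in range(iters):
        # Measure the model's response on every clip and average the gradients,
        # so the perturbation is not tuned to any single input.
        grad = np.mean([grad_wrt_input(c + delta) for c in clips], axis=0)
        # Step toward the target, then clip each sample to [-eps, eps]
        # (the assumed proxy for "inaudible").
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
    return delta


def make_clips(n):
    return [0.1 * rng.standard_normal(N) for _ in range(n)]


train_clips = make_clips(8)    # inputs used while crafting the signal
heldout_clips = make_clips(4)  # unseen inputs, standing in for new contexts

delta = craft_universal_delta(train_clips)
before = np.array([target_prob(c) for c in heldout_clips])
after = np.array([target_prob(c + delta) for c in heldout_clips])
print("held-out target probability before:", before.round(3))
print("held-out target probability after: ", after.round(3))
```

For this linear stand-in the gradient direction does not depend on the clip, so transfer to unseen inputs is trivial. With a real model that transfer is the hard part, which is presumably where the reported 30 minutes of training goes.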

Research = academic publication context; Security Alert = specific attack capability and vulnerability; Tech Info = attack mechanism details; Market Impact = real-world attack vectors and scale; Insight = structural and strategic implications

What This Means

As voice AI systems gain the ability to take real-world actions such as browsing the web, sending emails, and executing code, adversarial audio attacks like AudioHijack become a serious and scalable threat. Roughly 30 minutes of optimization yields a signal that can be reused against the targeted model in any audio it processes, from YouTube videos to Zoom recordings, without the user noticing. AI practitioners deploying audio-language models in agentic or customer-facing contexts should treat untrusted audio as a code-injection surface and begin evaluating defenses before this class of attack moves from academic research to active exploitation.
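
One way to read "treat untrusted audio as a code-injection surface" is to gate side-effecting tool calls whenever the current turn included audio from an unverified source. The sketch below is a hypothetical mitigation pattern, not a defense proposed or evaluated by the AudioHijack authors. The tool names, the `ToolCall` type, and the `confirm` callback are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool names; a real deployment would list its own
# side-effecting tools here.
SIDE_EFFECT_TOOLS = frozenset({"web_search", "download_file", "send_email"})


@dataclass(frozen=True)
class ToolCall:
    name: str
    argument: str


def gate_tool_calls(
    proposed: list[ToolCall],
    audio_is_trusted: bool,
    confirm: Callable[[ToolCall], bool],
) -> list[ToolCall]:
    """Return only the calls that are allowed to run for this turn."""
    allowed = []
    for call in proposed:
        # A call is risky when it has side effects and the turn included
        # audio the deployment cannot vouch for.
        risky = call.name in SIDE_EFFECT_TOOLS and not audio_is_trusted
        # Risky calls run only if the human user explicitly confirms them.
        if not risky or confirm(call):
            allowed.append(call)
    return allowed


# Example: the model proposes two calls after processing an untrusted clip,
# and the user declines every confirmation prompt.
proposed = [
    ToolCall("summarize", "meeting.wav"),
    ToolCall("send_email", "attacker@example.com"),
]
allowed = gate_tool_calls(proposed, audio_is_trusted=False, confirm=lambda call: False)
print([call.name for call in allowed])
```

A gate like this does not stop the model from being hijacked. It only limits what a hijacked model can do without the user noticing, so it would complement rather than replace model-level defenses.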

Sources

Voice AI Systems Are Vulnerable to Hidden Audio Attacks · Spectrum