Qwen2.5-Max: A Secure Mixture-of-Experts Model
Qwen2.5-Max, whose latest stable release is dated January 28, 2025, is a Mixture-of-Experts (MoE) large language model developed by Alibaba. Like other frontier models, it is strong at text generation, multilingual comprehension, and complex reasoning. Recent red-teaming results also indicate that Qwen2.5-Max resists adversarial attacks, such as prompt injection and jailbreaks, better than DeepSeek-V3-0324 does.
Security Testing with Recon
Protect AI, the company behind the security testing tool Recon, recently conducted a comparative analysis of Qwen2.5-Max and DeepSeek-V3-0324.
According to their report:
“DeepSeek-V3-0324 is more vulnerable than Qwen2.5-Max, with Recon achieving an almost 25% higher attack success rate (ASR).”
Despite these stronger results, Qwen2.5-Max is far from immune to attack. Prompt injection was its weakest point, with an attack success rate just under 50%. Evasion techniques and jailbreak attempts were somewhat less effective, each succeeding roughly 40% of the time.
Exposing DeepSeek-V3’s Vulnerabilities
Recon evaluates AI security across six key attack categories:
- Evasion techniques
- System prompt leaks
- Prompt injection attacks
- AI jailbreak attempts
- General safety controls
- Adversarial suffix resistance
Beyond simulated attacks, Recon also measures how well models resist generating harmful or illegal content. Adversarial suffix testing, for example, appends specially crafted text to a prompt in an attempt to push the model into producing restricted output.
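Every comparison in this article is expressed as an attack success rate (ASR): the share of attack attempts in a category that get the model to produce the restricted output. Recon's internals and data format are not described here, so the following is only a minimal sketch of how ASR can be tallied per category. It assumes a hypothetical `AttackResult` record for each attempt, and the outcomes in the example are toy values rather than Recon's data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AttackResult:
    """One red-teaming attempt (hypothetical record, not Recon's format)."""
    category: str    # e.g. "prompt_injection", "evasion", "jailbreak"
    succeeded: bool  # True if the model produced the restricted output


def asr_by_category(results: list[AttackResult]) -> dict[str, float]:
    """Attack success rate per category, as a percentage of attempts."""
    attempts: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for r in results:
        attempts[r.category] += 1
        if r.succeeded:
            successes[r.category] += 1
    # ASR = successful attacks / total attempts in that category
    return {c: 100.0 * successes[c] / attempts[c] for c in attempts}


if __name__ == "__main__":
    # Toy outcomes for illustration only; these are not Recon's results.
    toy = [
        AttackResult("prompt_injection", True),
        AttackResult("prompt_injection", True),
        AttackResult("prompt_injection", True),
        AttackResult("prompt_injection", False),
        AttackResult("evasion", True),
        AttackResult("evasion", False),
        AttackResult("evasion", False),
        AttackResult("evasion", False),
    ]
    print(asr_by_category(toy))  # {'prompt_injection': 75.0, 'evasion': 25.0}
```

Because ASR is computed separately for each category, a lower ASR means the model resisted a larger share of attempts of that type. It does not indicate how attacks are distributed across categories.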
When tested against both models, Qwen2.5-Max consistently showed a lower ASR than DeepSeek-V3-0324 across multiple attack types:
Attack Type | Qwen2.5-Max ASR | DeepSeek-V3-0324 ASR |
---|---|---|
Prompt Injection | 47% | 77% |
Evasion Techniques | 39.4% | 69.2% |
Jailbreak Attempts | ~40% | Higher ASR |
These results indicate that DeepSeek-V3-0324 is substantially more vulnerable to adversarial attacks than Qwen2.5-Max, with a gap of roughly 30 percentage points in both prompt injection and evasion.
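The quoted "almost 25% higher" figure does not say whether it is an absolute or a relative difference, or which attack categories it covers. Those two readings can diverge sharply. The sketch below computes both for the categories in the table that have numeric values on both sides. The `asr_gap` helper is illustrative, and jailbreak attempts are left out because no DeepSeek-V3-0324 figure is given.

```python
def asr_gap(qwen_asr: float, deepseek_asr: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative increase in percent)."""
    absolute = deepseek_asr - qwen_asr
    relative = (deepseek_asr / qwen_asr - 1.0) * 100.0
    return absolute, relative


# ASR values (%) from the table above: (Qwen2.5-Max, DeepSeek-V3-0324).
# Jailbreak is omitted because no DeepSeek-V3-0324 figure is given.
TABLE = {
    "Prompt Injection": (47.0, 77.0),
    "Evasion Techniques": (39.4, 69.2),
}

if __name__ == "__main__":
    for attack, (qwen, deepseek) in TABLE.items():
        pp, rel = asr_gap(qwen, deepseek)
        print(f"{attack}: +{pp:.1f} pp absolute, +{rel:.1f}% relative")
```

For prompt injection, this gives a 30.0 percentage-point gap, or about a 63.8% relative increase. For evasion, the gap is 29.8 points, about a 75.6% relative increase.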
DeepSeek-V3’s Strengths
Despite its security weaknesses, DeepSeek-V3-0324 outperforms Qwen2.5-Max in several key benchmarks. Unlike ASR (where a lower score is better), a higher score in these tests reflects stronger performance.
Benchmark | DeepSeek-V3-0324 | Qwen2.5-Max |
---|---|---|
MMLU-Pro (General Knowledge) | 81.2 | 75.9 |
GPQA Diamond (Advanced Sciences) | 68.4 | 59.1 |
MATH-500 (Mathematics) | 94.0 | 90.2 |
AIME 2024 (Competition Mathematics) | 59.4 | 39.6 |
LiveCodeBench (Programming) | 49.2 | 39.2 |
These results highlight DeepSeek-V3-0324's strengths in general knowledge, science, mathematics, and coding, which make it a strong performer despite its higher security risks.
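To make the size of each lead concrete, this short sketch subtracts the Qwen2.5-Max score from the DeepSeek-V3-0324 score for each benchmark in the table above. The `margins` helper is illustrative. Rounding to one decimal place keeps floating-point noise out of the printed differences.

```python
# Scores from the benchmark table above: (DeepSeek-V3-0324, Qwen2.5-Max).
# Higher is better for every benchmark listed.
scores = {
    "MMLU-Pro": (81.2, 75.9),
    "GPQA Diamond": (68.4, 59.1),
    "MATH-500": (94.0, 90.2),
    "AIME 2024": (59.4, 39.6),
    "LiveCodeBench": (49.2, 39.2),
}


def margins(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """DeepSeek-V3-0324 score minus Qwen2.5-Max score, rounded to 0.1."""
    return {name: round(ds - qwen, 1) for name, (ds, qwen) in table.items()}


if __name__ == "__main__":
    # Print benchmarks from largest to smallest DeepSeek-V3-0324 lead.
    for name, m in sorted(margins(scores).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: +{m}")
```

The margins range from 3.8 points on MATH-500 to 19.8 points on AIME 2024.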