Benchmarks Find ‘DeepSeek-V3-0324 Is More Vulnerable Than Qwen2.5-Max’

Last updated: April 4, 2025 3:02 pm

Qwen2.5-Max: A Secure Mixture-of-Experts Model

Qwen2.5-Max is a Mixture-of-Experts (MoE) language model developed by Alibaba; its latest stable release is dated January 28, 2025. Like other advanced AI models, it excels at text generation, multilingual comprehension, and complex reasoning. Notably, recent security benchmarks indicate that Qwen2.5-Max resists adversarial attacks better than DeepSeek-V3-0324.

Contents

Qwen2.5-Max: A Secure Mixture-of-Experts Model
Security Testing with Recon
Exposing DeepSeek-V3’s Vulnerabilities
DeepSeek-V3’s Strengths

Security Testing with Recon

Protect AI, the company behind the security testing tool Recon, recently conducted a comparative analysis of Qwen2.5-Max and DeepSeek-V3-0324.

According to their report:

“DeepSeek-V3-0324 is more vulnerable than Qwen2.5-Max, with Recon achieving an almost 25% higher attack success rate (ASR).”

Despite its stronger showing, Qwen2.5-Max remains susceptible to attack. Tests showed that prompt injection was the most common vulnerability, accounting for 48% of all successful attacks, while evasion and jailbreak attempts each had a lower ASR of around 40%.

Exposing DeepSeek-V3’s Vulnerabilities

Recon evaluates AI security across six key attack categories:

Evasion techniques
System prompt leaks
Prompt injection attacks
AI jailbreak attempts
General safety controls
Adversarial suffix resistance

Beyond running simulated attacks, Recon also tests how readily a model can be pushed into generating harmful or illegal content. Adversarial suffix testing, for example, appends crafted text to a prompt in an attempt to manipulate the model into producing restricted output.

When Recon was run against both models, Qwen2.5-Max consistently showed a lower ASR than DeepSeek-V3-0324 across multiple attack types:

Attack Type	Qwen2.5-Max ASR	DeepSeek-V3-0324 ASR
Prompt Injection	47%	77%
Evasion Techniques	39.4%	69.2%
Jailbreak Attempts	~40%	Higher (exact figure not given)

These results indicate that DeepSeek-V3-0324 is significantly more vulnerable to these attack types than Qwen2.5-Max.
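
To make the size of the gap concrete, the following is a minimal Python sketch that takes the two rows of the table above with numeric values on both sides and computes the difference in two ways: in percentage points and as a relative increase over the Qwen2.5-Max figure. The dictionary layout and the function name `asr_gap` are illustrative assumptions, not part of Recon or the Protect AI report, and the per-category gaps are not the same measure as the overall figure quoted from the report.

```python
# ASR figures (percent) copied from the table above; lower is better.
# The data layout and helper name are assumptions made for illustration.
reported_asr = {
    "Prompt Injection": {"qwen": 47.0, "deepseek": 77.0},
    "Evasion Techniques": {"qwen": 39.4, "deepseek": 69.2},
}


def asr_gap(qwen, deepseek):
    """Return (percentage-point gap, relative increase in percent)."""
    points = deepseek - qwen
    relative = points / qwen * 100
    return round(points, 1), round(relative, 1)


gaps = {
    attack: asr_gap(row["qwen"], row["deepseek"])
    for attack, row in reported_asr.items()
}

for attack, (points, relative) in gaps.items():
    print(f"{attack}: +{points} points, +{relative}% relative")
```

Keeping the two measures separate matters when reading claims such as "X% higher ASR", since a gap in percentage points and a relative increase can differ widely for the same pair of numbers.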

DeepSeek-V3’s Strengths

Despite its security weaknesses, DeepSeek-V3-0324 outperforms Qwen2.5-Max in several key benchmarks. Unlike ASR (where a lower score is better), a higher score in these tests reflects stronger performance.

Benchmark	DeepSeek-V3-0324	Qwen2.5-Max
MMLU-Pro (General Knowledge)	81.2	75.9
GPQA Diamond (Advanced Sciences)	68.4	59.1
MATH-500 (Mathematics)	94.0	90.2
AIME 2024 (Competition Mathematics)	59.4	39.6
LiveCodeBench (Programming)	49.2	39.2

These results highlight DeepSeek-V3-0324’s strengths in general knowledge, advanced science, mathematics, and coding, making it a strong competitor on performance despite its higher security risks.
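
As a rough illustration of where the performance lead is widest, this short Python sketch computes the per-benchmark margin from the scores in the table above. The variable names are assumptions chosen for the example, and the margins are simple differences in reported score, not a statistical comparison.

```python
# Benchmark scores copied from the table above: (DeepSeek-V3-0324, Qwen2.5-Max).
# Higher is better. Variable names are illustrative assumptions.
scores = {
    "MMLU-Pro": (81.2, 75.9),
    "GPQA Diamond": (68.4, 59.1),
    "MATH-500": (94.0, 90.2),
    "AIME 2024": (59.4, 39.6),
    "LiveCodeBench": (49.2, 39.2),
}

# Margin by which DeepSeek-V3-0324 leads on each benchmark, in score points.
margins = {
    name: round(deepseek - qwen, 1)
    for name, (deepseek, qwen) in scores.items()
}

# Benchmark with the widest lead.
largest = max(margins, key=margins.get)

for name, margin in margins.items():
    print(f"{name}: +{margin}")
print(f"Largest margin: {largest}")
```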
