Unveiling the Qwen 3 235b: A Deep Dive into Local AI Performance vs Deepseek R1 671b
The world of AI is evolving at breakneck speed, and one of the most anticipated showdowns in recent months is the Qwen 3 235b vs Deepseek R1 671b. This comprehensive review explores how the Qwen 3 235b, a cutting-edge large language model, stacks up against the formidable Deepseek R1 671b across three distinct hardware setups. From high-end workstations to budget-friendly builds, the results are nothing short of groundbreaking. Whether you’re a tech enthusiast or a professional deploying AI locally, this analysis will help you decide which model delivers the best performance for your needs.
Benchmarking the Qwen 3 235b vs Deepseek R1 671b: Real-World Testing
To truly understand the capabilities of the Qwen 3 235b, it was put through rigorous benchmarking tests on three rigs: the Quad 3090 AI server, the Threadripper 7995WX powerhouse, and the budget-friendly $500 Z440 build. The tests focused on real-world scenarios, including code generation, text parsing, and creative tasks like SVG creation. The results revealed that the Qwen 3 235b not only holds its own but often surpasses the Deepseek R1 671b in speed and accuracy. For users prioritizing efficiency and versatility, the Qwen 3 235b emerges as a compelling choice.
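The exact harness behind these numbers isn't published with the article, but measuring generation speed against a local Ollama instance takes only a few lines. A minimal sketch follows; the model tag and prompt are assumptions, while the eval_count and eval_duration fields are what Ollama's /api/generate actually returns.

```python
# Minimal tokens/sec probe against a local Ollama server (a sketch,
# not the harness used in the video). Assumes Ollama is listening on
# its default port and the model tag below has already been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen3:235b-a22b"  # assumed tag; substitute whatever you have pulled

def generation_speed(prompt: str) -> float:
    payload = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    # eval_count = generated tokens, eval_duration = nanoseconds spent generating
    return body["eval_count"] / (body["eval_duration"] / 1e9)

if __name__ == "__main__":
    print(f"{generation_speed('Write a haiku about GPUs.'):.1f} tokens/sec")
```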
Local LLM Testing: Questions That Define AI Potential
One of the most intriguing aspects of this review was the use of specialized tests to evaluate the Qwen 3 235b’s adaptability. Tasks like parsing arbitrary arrays, generating game code, and even decoding 100 digits of Pi showcased the model’s problem-solving prowess. These challenges are crucial for developers and researchers relying on local AI systems. The Qwen 3 235b’s ability to tackle complex queries without cloud dependency highlights its potential as a standalone solution. For those seeking a reliable local AI server, this model is a game-changer.
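As a taste of how answers like these can be scored automatically, here is a minimal sketch for the Pi question; the digit-extraction approach is an illustration of the idea, not the reviewer's actual grader.

```python
# Crude automatic check for the "first 100 digits of Pi" question (sketch).
from mpmath import mp

def check_pi_answer(answer: str, digits: int = 100) -> bool:
    """True if `answer` starts with the first `digits` digits of Pi (3, then decimals)."""
    mp.dps = digits + 10  # working precision with guard digits
    wanted = str(mp.pi).replace(".", "")[:digits]
    # Strip everything but digits from the model's reply before comparing.
    got = "".join(ch for ch in answer if ch.isdigit())[:digits]
    return got == wanted

# A reply that repeats the first ten decimals fails past digit ten:
print(check_pi_answer("3." + "1415926535" * 10))  # False
```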
The $500 Z440 Build: Affordable Power for Qwen 3 235b
Running the Deepseek R1 671b on a $500 Z440 build might seem like the ultimate budget play, but the Qwen 3 235b's performance on the same class of hardware proves otherwise. By leveraging the Z440's 32GB to 64GB RAM setup and fast SSD storage, the Qwen 3 235b delivers results that rival its more expensive counterparts. This makes it an ideal choice for small businesses or hobbyists looking to experiment with AI without breaking the bank. The flexibility of the Z440 rig also allows for upgrades, ensuring long-term value.
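Some back-of-the-envelope arithmetic shows why SSD speed matters so much at this scale; the bits-per-weight figures below are rough GGUF averages, and the article does not say which quantization was actually used.

```python
# Approximate on-disk weight sizes for common GGUF quantizations.
# Bits-per-weight values are rough averages; real files carry extra
# metadata, and the quant used in the video is not stated.
PARAMS_B = {"qwen3-235b-a22b": 235, "deepseek-r1-671b": 671}
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

for model, billions in PARAMS_B.items():
    for quant, bpw in BITS_PER_WEIGHT.items():
        gigabytes = billions * bpw / 8  # (B params * bits) / 8 = GB
        print(f"{model:20s} {quant:7s} ~{gigabytes:5.0f} GB")

# Anything larger than system RAM has to be memory-mapped and streamed
# from the SSD, which is why NVMe speed dominates on a 32-64GB box.
```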
High-End vs Mid-Range Builds: Where Does Qwen 3 235b Shine?
The Qwen 3 235b was also tested on the Threadripper 7995WX rig, a high-end workstation featuring a 7995WX CPU, four RTX 3090 GPUs, and 256GB of RAM. Here, the model demonstrated exceptional scalability, handling tasks with minimal latency. Meanwhile, the mid-range build highlighted its efficiency in resource-constrained environments. Whether you’re investing in a high-end AI server or a compact setup, the Qwen 3 235b adapts seamlessly, making it a versatile contender in the Qwen 3 235b vs Deepseek R1 671b debate.
Creative Tasks: SVG Generation and Beyond
One of the standout features of the Qwen 3 235b is its creativity. In the SVG test, it generated intricate graphics with precision, outperforming the Deepseek R1 671b in both speed and complexity. This makes it a favorite for designers and developers working on visual projects. The model’s ability to handle tasks like “Pico de Gato” (a tricky programming challenge) further underscores its reliability. For users who need AI that’s as creative as it is functional, the Qwen 3 235b is a top pick.
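An objective way to score a run like this is to check whether the reply contains an SVG that actually parses; the sketch below uses element count as a crude complexity proxy and is an illustration, not the exact test from the video.

```python
# Crude scorer for the SVG test: does the reply contain an SVG that
# parses as XML, and how many elements does it have? (sketch)
import re
import xml.etree.ElementTree as ET

def score_svg(reply: str) -> int:
    """Return the element count of the first parseable <svg>...</svg>, else 0."""
    match = re.search(r"<svg.*?</svg>", reply, re.DOTALL | re.IGNORECASE)
    if not match:
        return 0
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        return 0
    return sum(1 for _ in root.iter())

print(score_svg('<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'))  # 2
```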
The Verdict: Is Qwen 3 235b the Future of Local AI?
After extensive testing across multiple rigs, it’s clear that the Qwen 3 235b holds its own against the Deepseek R1 671b. While the Deepseek model has its strengths, the Qwen 3 235b’s efficiency, adaptability, and performance in real-world scenarios make it a stronger option for many use cases. Whether you’re building a local AI server or optimizing an existing setup, the Qwen 3 235b vs Deepseek R1 671b comparison shows that the former is a worthy investment.
Support the Journey: Join the Community
For readers interested in more deep dives into AI hardware and software, subscribe to Digital Space Port on YouTube and explore their website for detailed server build guides. By supporting the channel through memberships, Buy Me a Coffee, or Patreon, you help fuel future content on the latest in AI innovation.
Final Thoughts
The Qwen 3 235b’s ability to outperform the Deepseek R1 671b across diverse rigs solidifies its position as a leader in local AI. With its robust benchmarking results and adaptability, it’s no wonder this model is gaining traction among developers and enthusiasts alike. As the AI landscape continues to evolve, the Qwen 3 235b vs Deepseek R1 671b showdown is just the beginning of an exciting new era.
Poll 1 – This is a CAT: http://youtube.com/post/Ugkx_GPFWSH6m6mJYmPtWLumV4LdKX1Es8mf?si=in5WOyGit9h2FHqj
Poll 2 – Armageddon with a Twist: http://youtube.com/post/Ugkx19eoGaw6sXp4Aje7Rhf560lB1elE7D5j?si=SMUK1CdAaRPtsJYN
Is Qwen better than DeepSeek when it comes to questions and topics about companies' future outlook, finance, and trading?
2:54
[sad llama.cpp noises]
For the quad 3090 rig, a pair of NVLink bridges will improve the tokens/sec, since reasoning models are very reliant on bandwidth. If you're curious, the DeepSeek whitepaper does a good job explaining why. PCIe 4 tops out at about 64GB/s, while NVLink reaches about 300GB/s.
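Taking the comment's own figures at face value, the gap is easy to put in milliseconds; the 2 GB chunk size below is an arbitrary illustration, not a measured figure.

```python
# Time to shuttle a 2 GB chunk between GPUs, using the comment's own
# bandwidth claims. Chunk size is illustrative only.
CHUNK_GB = 2.0
LINKS_GBPS = {"PCIe 4.0 x16": 64, "NVLink (as claimed)": 300}

for link, bandwidth in LINKS_GBPS.items():
    print(f"{link:20s} {CHUNK_GB / bandwidth * 1000:6.1f} ms")
```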
By the way, I've tried Qwen 3 on both Ollama and llama.cpp and there wasn't much improvement, if any, with llama.cpp. I usually notice a big boost on llama.cpp with non-reasoning models like Gemma 3.
30 minutes is a good maximum length for a vid. Anything longer, divide it up. No content lost. Viewers often have lives with other demands. Maybe multiple vids with a summary/nuances vid? BTW, ask for the penultimate letter in the sentence. Interesting that some models include the period as the last letter.
Why has QWEN3 removed all the other models from Ollama? There are only the small ones left.
Re: Cat. The Z layers are scrambled. The thing in front should be sent backwards, and the two triangles at the top should be sent to the back.
Curious if there's a noticeable difference between PCIe Gen 1 and Gen 3 for, say, running [specific LLM model]? A video comparing them would be awesome.
DeepSeek 671B, so far, has not been surpassed by a local AI. It may take longer and be heavier for a modest machine, but in the end it makes up for it with more cohesive responses. The others only demonstrate the ability to make mistakes and fake their benchmark results, as if they wouldn't be unmasked in actual use.
Why compare to R1? Isn't that already outdated and bad compared to everything new?
mistaken cat
Great video as always, some recommendations:
– use Unsloth dynamic quants v2
– use LM Studio as the backend; at least currently it's much faster
this post is written just to drive more activity to the channel, thx
I liked the cat abstraction.
I'm running qwen3:235b-a22b and deepseek-r1:671b on 8 Nvidia H200 GPUs at the same time.
Quality-wise, I'd say they are very close, which is impressive. However, even though it's less than half the size of R1, the Qwen model only matches R1's speed.
Both give me around 25 tokens per second.
Is that good for 8 H200 GPUs? I kind of expected more, even though I'm running on Ollama. Gonna try to compare the benchmarks with vLLM as well.
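A minimal sketch of such a vLLM comparison might look like this; the Hugging Face model ID and sampling settings are assumptions, not the commenter's exact setup.

```python
# Minimal vLLM throughput probe across 8 GPUs (sketch; model ID and
# sampling settings are assumptions, not the commenter's configuration).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-235B-A22B", tensor_parallel_size=8)
params = SamplingParams(max_tokens=256, temperature=0.7)

outputs = llm.generate(["Explain NVLink in one paragraph."], params)
for out in outputs:
    print(out.outputs[0].text)
```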
Nice!!! My needs are humble… just enough to run a 70b (prefer 128b) model for creative writing. Something that keeps track of all my notes and timeline.
Hi there @DigitalSpaceport,
I'm in the UK and fairly new to running local LLMs (been at it about a year).
I'm using a ZBook Fury workstation with a pair of Ada-generation RTX 5000 GPUs, each with 16GB of VRAM, plus 128GB of system RAM, an i9-14900K, and the maximum 12 terabytes of M.2 storage.
I've managed to get Ollama set up with Docker and Open WebUI. I muddled through it, though; it took me days.
One thing I would really love to do is get my LLMs to speak their answers, all on my local machine,
but my knowledge is very rudimentary at present.
Is it possible to get the answers spoken, with hardly any coding experience, without copy-pasting into a text-to-speech application?
I am also building a set of custom external GPUs using a pair of Nvidia Tesla V100 SXM2 32GB modules to enable me to run much more powerful models.
As you can tell, I've got the A.I. bug.
If you don't get back to me, as I know you're probably snowed under with questions from the community, I just wanted you to know that you have a great channel with so much interesting stuff that I need to learn about. Glad I found you; keep up the cool A.I. stuff. I'm as hooked on your channel as I am on running the LLMs.
You should at least test in Q8 to know its full performance.
There is a new KTransformers v0.3 that supports Qwen 3 MoE. It should really improve the performance for hybrid inference (CPU + GPU).
It would be great to see how it improves performance of your AI server. I don't have enough RAM to test it locally.
Try moving the cat's head to the foreground by layers; it seems to me there's a mistake in the arrangement of the layers there.
Can you please test Gemma 3 on an AMD 9070 XT?
I test the different LLMs with this prompt: build the game Qix and make it work in the chat. So far only Gemini 2.5 does the job.
Thanks