Unlock Creativity with Qwen3: The Best Open-Source Model

1. Introduction

In the rapidly evolving landscape of artificial intelligence, open-source models are gaining significant traction. The recent release of Qwen 3 has stirred the AI community, offering a powerful alternative to proprietary models like Gemini 2.5 Pro. This article delves into the features, performance, and advantages of Qwen 3, highlighting why it’s a game-changer in the AI domain.


2. Understanding Qwen 3

2.1. What is Qwen 3?

Qwen 3 is an open-source large language model (LLM) developed to provide high-performance AI capabilities. With its open weights and source code, it offers transparency and flexibility for developers and researchers.

2.2. Key Features

  • Open-Source: Fully accessible code and weights.

  • Hybrid Thinking Mode: Adjustable reasoning capabilities.

  • Tool Integration: Seamless function calling during chain-of-thought processes.

  • Multiple Model Variants: Including both Mixture of Experts and dense models.


3. Benchmark Comparisons

3.1. Performance Metrics

Qwen 3’s flagship model, Qwen3-235B-A22B, demonstrates impressive performance across various benchmarks:

  • LiveCodeBench: Scores 70.7%, surpassing Gemini 2.5 Pro’s 70.4%.

  • Codeforces Elo Rating: Achieves 2056, compared to Gemini 2.5 Pro’s 2001.

  • BFCL (Berkeley Function Calling Leaderboard): Attains a score of 70.8, outperforming Gemini 2.5 Pro’s 62.9.

3.2. Function Calling Capabilities

Qwen 3 excels in function calling tasks, crucial for agentic applications and coding assistance. Its superior performance in BFCL benchmarks underscores its proficiency in this area.


4. Hybrid Thinking Mode

4.1. Thinking vs. Non-Thinking Modes

Qwen 3 introduces a hybrid approach to problem-solving:

  • Thinking Mode: Engages in step-by-step reasoning for complex tasks.

  • Non-Thinking Mode: Provides rapid responses for straightforward queries.
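
In thinking mode, Qwen 3 wraps its reasoning in <think>…</think> tags ahead of the final answer. A minimal sketch of a helper that separates the two, assuming at most one leading think block (how Qwen 3 formats its output; in non-thinking mode the block is simply absent):

```python
import re

def split_thinking(response: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    Looks for a single <think>...</think> block; if none is
    present (non-thinking mode), reasoning comes back empty.
    """
    match = re.search(r"<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", response.strip()
```

For example, split_thinking("<think>step one</think>The answer is 4.") separates the reasoning trace from the user-facing answer, which is handy when you want to log or hide the chain of thought.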

4.2. Adjustable Thinking Budget

Users can configure the model’s reasoning depth by adjusting the token budget, balancing performance and speed according to task requirements.


5. Model Variants

5.1. Mixture of Experts (MoE) Models

  • Qwen3-235B-A22B: 235 billion parameters with 22 billion active parameters.

  • Qwen3-30B-A3B: 30 billion parameters with 3 billion active parameters, optimized for efficiency.

5.2. Dense Models

Qwen 3 offers six dense models ranging from 600 million to 32 billion parameters, catering to various computational capacities and application needs.
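
As a rough guide to which variant fits a given machine, weight memory scales with parameter count times quantization width. A back-of-the-envelope estimator (the 1.2 overhead factor for KV cache and runtime buffers is an assumption, not a measured value):

```python
def est_memory_gib(params_billions: float, bits: int = 4,
                   overhead: float = 1.2) -> float:
    """Rough memory footprint of a quantized model, in GiB.

    bits: quantization width (4 for q4, 16 for fp16).
    overhead: assumed fudge factor for KV cache and runtime buffers.
    """
    weight_bytes = params_billions * 1e9 * bits / 8
    return weight_bytes * overhead / 2**30
```

By this estimate, the 32B dense model needs roughly 18 GiB at 4-bit quantization, while the 0.6B model fits in under 2 GiB even at fp16; actual requirements vary with context length and runtime.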


6. Training and Data

6.1. Pre-training Stages

Qwen 3 underwent a comprehensive training process:

  • Stage 1: Pre-trained on over 30 trillion tokens to establish foundational language skills.

  • Stage 2: Focused on knowledge-intensive data, including STEM and reasoning tasks.

  • Stage 3: Extended context length to 32K tokens using high-quality long-context data.

6.2. Post-training Enhancements

Post-training involved:

  • Long Chain-of-Thought Training: Enhanced reasoning abilities.

  • Reinforcement Learning: Improved model exploration and exploitation capabilities.

  • Thinking Model Fusion: Integrated quick response capabilities.

  • General Reinforcement Learning: Strengthened general capabilities and corrected undesired behaviors.


7. Tool Integration and Use Cases

7.1. Tool Calling During Chain of Thought

Qwen 3’s ability to perform tool calls within its reasoning process enables complex task execution, such as:

  • Fetching data from APIs.

  • Organizing files based on type.

  • Generating and executing code snippets.
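
Under the hood, a tool-calling loop amounts to parsing the model’s structured call and dispatching it to a registered function. The sketch below is illustrative only: the registry, function name, canned return value, and JSON shape are assumptions for demonstration, not Qwen 3’s actual wire format:

```python
import json

# Hypothetical tool registry with a stub that returns canned data;
# a real tool would hit the GitHub API here.
TOOLS = {
    "get_github_stars": lambda repo: {"repo": repo, "stars": 12345},
}

def dispatch_tool_call(raw_call: str) -> dict:
    """Parse a JSON tool call like {"name": ..., "arguments": {...}}
    and execute the matching registered function."""
    call = json.loads(raw_call)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])
```

In a full agent loop, the model’s tool-call output would be fed to dispatch_tool_call and the result appended to the conversation so the model can continue reasoning with it.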

7.2. Integration with Zapier MCP

Through Zapier’s MCP server, Qwen 3 can connect with over 7,000 applications, facilitating extensive automation and integration capabilities.


8. Comparison with Gemini 2.5 Pro

8.1. Performance Benchmarks

While Gemini 2.5 Pro leads on certain benchmarks, Qwen 3 trails closely behind and surpasses it in specific areas such as function calling and code generation.

8.2. Open-Source Advantage

Unlike Gemini 2.5 Pro, Qwen 3’s open-source nature allows for:

  • Greater transparency.

  • Customization and fine-tuning.

  • Broader accessibility for research and development.


9. Deployment and Accessibility

9.1. Running Qwen 3 Locally

Qwen 3 can be deployed locally using platforms like LM Studio, offering users control over their AI applications without reliance on external APIs.

9.2. Platform Support

The model is compatible with various frameworks, including:

  • Ollama

  • MLX

  • llama.cpp

  • KTransformers


10. Conclusion

Qwen 3 emerges as a formidable open-source LLM, challenging proprietary models with its robust performance, hybrid thinking capabilities, and extensive tool integration. Its accessibility and flexibility make it a valuable asset for developers, researchers, and organizations seeking advanced AI solutions.


11. FAQs

Q1: What sets Qwen 3 apart from other open-source models?

A1: Qwen 3’s hybrid thinking mode, superior function calling capabilities, and extensive tool integration distinguish it from other open-source LLMs.

Q2: Can Qwen 3 be fine-tuned for specific applications?

A2: Yes, its open-source nature allows for customization and fine-tuning to cater to specific use cases.

Q3: How does Qwen 3 handle complex tasks?

A3: Utilizing its thinking mode, Qwen 3 engages in step-by-step reasoning, making it adept at handling complex problems requiring deeper analysis.

Q4: Is Qwen 3 suitable for real-time applications?

A4: Absolutely. Its non-thinking mode provides quick responses, making it ideal for applications where speed is crucial.

Q5: Where can I access Qwen 3?

A5: Qwen 3 is available on platforms like Hugging Face, LM Studio, and can be integrated using frameworks such as Ollama and Llama.cpp.


36 COMMENTS

  1. Good league promotion, now Mr Matthew is affiliated with Zapier… I bet that is a result of his work being highlighted on a Google show. Way to go, congratulations. This channel was one of the first I ever followed on AI developments…

  2. I can't believe you made a video out of reading Qwen3 release material, parroting the fake benchmark data and not even bothering to test it, while claiming it's amazing and being all enthusiastic. You literally have zero credibility anymore. Why the ridiculous thumbnail like you just discovered that Qwen3 is Jesus reincarnated as an LLM? You can't possibly think people are liking this crap.

  3. Not multimodal, unfortunately. But otherwise the best 32B model I've ever tested locally, and I have tested A LOT. In thinking mode it outperforms even the best 70B models I have tested. It's also great that you can simply switch it to fast mode and even then it's still strong. There are some pitfalls, though, like the fact that by default you can only use 32K context without changing some configuration options that Ollama doesn't even expose. So with the main versions on their site, you are stuck with 32K. Also, at low temperatures the reasoning can get stuck. Reasoning worked best at 0.8 for me and was still good at 0.6 (which is also the recommended default). But I've also only tested the q4 from Ollama. There are probably better quantizations, especially from Unsloth, which already offers a 128K context variant. I couldn't go full context with these yet either, but Ollama is just stupid (no KV cache splitting, llama.cpp's split options not exposed).

  4. I used the biggest one in HuggingChat and it's BAAAD. It's not even possible to steer it with a custom system prompt (for the moment it doesn't work even if it's activated by the user). Matthew's videos of the last year are all the same: benchmarks, fast generations and his enthusiasm…

  5. Qwen3 also fails needle-in-a-haystack on every level. I've given it the same 10k instruction document over 100 times now, and it's not once been able to keep the information straight. Referencing and using info from part 5 of the data set in other parts, mixing everything up so badly that the output is legit gibberish and useless. If I were to give it a NIHS score, it gets 20%. The worst I've ever seen.

  6. Qwen3 is honestly terrible. It's actively worse than Qwen 2.5 in everything other than answering knowledge-based questions. It doesn't follow instructions at all, and it can't be used to role-play an agent. Seriously, it is the worst model I've seen in the last year.

  7. I did a quick test with qwen3:32b and it was much better without thinking. I wanted some Dart code. With thinking I got JavaScript, not Dart, and the code did not do exactly what I wanted. Without thinking I got some excellent Dart code. From now on qwen3:32b (no thinking) and gemma3:27b are my two favorite LLMs.

  8. @ Matthew Berman:
    I've discovered that apparently, according to the LLM, its knowledge cutoff date is October 2023. Can you verify this? Does this matter in your opinion? (I found it did in my case, based on the prompt I provided it.)

    It's still pretty damn good though, but it ain't Gemini 2.5 Pro.

    Anyone else want to chime in? I welcome all opinions, as long as they're constructive to some degree (if you want to vent frustration then that's fine too).

  9. TIMESTAMPS, Sponsor Skips & Summary (by VidSkipper AI): Qwen3 is a new open-source model comparable to Gemini 2.5 Pro. It features a hybrid thinking model, optimized for agent-based tasks and coding with tool calling during chain of thought.
    0:00 🚀 Qwen 3 Overview
    • 🚀 Qwen 3, an open-source model, rivals Gemini 2.5 Pro, excelling in coding and function calling.
    • ⚙️ Features hybrid thinking, balancing deep reasoning with fast responses, ideal for tasks like coding.
    • 💾 Trained on 36 trillion tokens, incorporating web data, PDFs, textbooks, and synthetic data for enhanced knowledge.
    2:08 🧠 Hybrid Thinking Model
    • 🧠 Hybrid thinking model allows adjusting the 'thinking budget,' optimizing for complex tasks requiring deeper thought.
    • 💻 Suitable for coding, allowing configuration of task-specific budgets, achieving balance between cost-efficiency and quality.
    • 🛠️ Optimized for MCP tool usage, integrates with tools via Zapier's MCP server.

    Sponsor segment: 4:01-5:25 (84s) ⏭

    5:25 👨‍💻 Qwen 3 Models
    • 👨‍💻 Released two Mixture of Experts models and six dense models, including flagship 235B parameter model.
    • 💾 Flagship model: Qwen 3 235B with 22 billion active parameters, 128 experts, 128K context length.
    • ⚡ Efficient model: 30B parameter model with 3 billion active parameters, ideal for fast performance.
    6:44 🛠️ Advanced Capabilities
    • 🛠️ Tool calling during chain of thought, enabling complex tasks like fetching GitHub stars and plotting bar charts.
    • 🗂️ Can organize desktops by file type within the same inference run, showcasing advanced computer use.
    • 🧪 Utilizes pre-training data from web, PDFs, textbooks, and synthetic data, enhanced by previous Qwen models.
    9:58 ⚙️ Training and Availability
    • ⚙️ Four-stage training pipeline includes long chain of thought, reasoning reinforcement learning, and thinking model fusion.
    • 💡 Integrates non-thinking capabilities for quick responses, fine-tuned for reasoning and rapid response balance.
    • 🚀 Employs reinforcement learning across 20 general domain tasks to strengthen general capabilities.
    11:22 🥇 Benchmarks and Testing
    • 💻 Available on LM Studio, Ollama, MLX, llama.cpp, and KTransformers, outperforming Llama 4.
    • 🥇 Excels in benchmarks like MMLU and GPQA, demonstrating superior performance across various tasks.
    • 🧪 Independent benchmarks confirm flagship model's strong performance in scientific reasoning.

    SPONSORED SEGMENTS
    4:01-5:25 (84s): ⏭ Sponsor

    ** Generated using ✨ VidSkipper AI Chrome Extension

  10. I wish new release videos like this came with a quick summary at the beginning… List each version of the new model, whether it can run on each of the common VRAM sizes (8/12/24/48/80/132…) and if it is multi-modal. I know that it could hurt your ratings because right now you have my attention for the whole video while I wait with growing frustration for the two items of information that I am looking for. More often than not, I walk away with an implied answer to both questions. For example having watched this video entirely, I sort of know that the 30B version can run in 92gb and (because you never mentioned it) that the model is not multi-modal. So I cross Qwen-3 off my testing list and move on. As always, thanks for a great video!

  11. Summary by Gemini

    Here's a summary of the video:
    * Performance and Benchmarks: Qwen 3 rivals Gemini 2.5 Pro in various benchmarks, showcasing strong performance in coding and agentic tasks [00:07]. It excels in function calling and demonstrates impressive results in LiveCodeBench and Codeforces [00:41].
    * Hybrid Thinking Model: Qwen 3 introduces a hybrid approach, allowing users to adjust the "thinking budget" for the model. This means it can provide quick responses for simple tasks or take more time for complex problems [02:13].
    * Model Specifications: The Qwen 3 family includes various models, with the flagship model having 235 billion parameters and 22 billion active parameters [05:27]. There are also smaller, more efficient models available [06:03].
    * Tool Calling and Computer Use: Qwen 3 can perform tool calling during chain of thought, enabling it to handle complex tasks like fetching GitHub stars and plotting a bar chart [06:49]. It can also perform computer tasks, such as organizing files [07:41].
    * Pre-training and Post-training: Qwen 3 was trained on 36 trillion tokens across 119 languages, using a multi-stage process that included synthetic data generation [08:25]. The post-training process focused on developing the hybrid model and improving its reasoning and response capabilities [09:57].
    * Comparison to Llama 4: Qwen 3 outperforms Llama 4 in several benchmarks, making it a strong contender in the open-source model landscape [11:37].
    * Availability and Speed: The model is available for download on platforms like LM Studio and Ollama [11:18]. The 30 billion parameter model demonstrates impressive speed, especially on systems with powerful hardware [13:35].
    Do you have any other questions about this video, or would you like to explore other videos?