NVIDIA’s DGX Spark entered the desktop AI market in October 2025 at $3,999, positioning itself as a “desktop AI supercomputer.” The system packs 128GB of unified memory and promises one petaflop of FP4 AI performance in a Mac Mini-sized chassis.
The benchmark results below show how it compares on value and performance with alternatives such as AMD’s Strix Halo and Apple’s Mac Studio.
DGX Spark: Technical specifications
The DGX Spark features NVIDIA’s GB10 Grace Blackwell Superchip with:
- 20 CPU cores (10 Cortex-X925 + 10 Cortex-A725)
- 128GB unified LPDDR5X memory
- 273 GB/s memory bandwidth (shared between CPU and GPU)
- Dual 100 Gb ConnectX-7 networking for clustering
- 1 petaflop of sparse FP4 AI compute
The system’s defining advantage is its ability to load models with up to 120B parameters in memory, but its LPDDR5X memory bandwidth of 273 GB/s becomes the primary bottleneck for token generation.
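A rough sanity check makes the bottleneck concrete: on a memory-bound system, decode speed is capped by how many bytes of weights must be streamed from memory per generated token. The sketch below estimates that ceiling, assuming gpt-oss-120b activates roughly 5.1B parameters per token (it is a mixture-of-experts model) at about 0.5 bytes per parameter in MXFP4; both figures are approximations, not vendor-confirmed numbers.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound system:
# every generated token must stream the active weights from memory once.

def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_b: float,
                         bytes_per_param: float) -> float:
    """Theoretical tokens/sec ceiling = bandwidth / bytes read per token."""
    bytes_per_token = active_params_b * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# DGX Spark: 273 GB/s. gpt-oss-120b: ~5.1B active params/token (assumption),
# MXFP4 weights at ~0.5 bytes/param (assumption).
ceiling = decode_ceiling_tok_s(273, 5.1, 0.5)
print(f"~{ceiling:.0f} tok/s theoretical ceiling")
```

The measured ~38.5 tok/s lands well below this ~107 tok/s ceiling, since KV-cache reads, activations, and framework overhead also consume bandwidth, but the formula shows why bandwidth, not compute, governs generation speed.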
Raw performance benchmarks
llama.cpp results
Early benchmarks from llama.cpp developer Georgi Gerganov provide baseline performance metrics. The tests measured prompt processing (how quickly the model ingests input) and token generation (response speed):
Source: Hardware-Corner.net 1
The pattern is clear: DGX Spark excels at prompt processing (compute-bound) but struggles with token generation (memory-bound).
Ollama performance tests
Official Ollama benchmarks using firmware version 580.95.05 and Ollama v0.12.6 tested multiple models with standardized conditions:
Source: Ollama Blog 2
Note: OpenAI’s gpt-oss models tested by Ollama use the official MXFP4 format with BF16 in the attention layers, not the q8_0-quantized versions found in some online GGUFs.
Competitive analysis: DGX Spark vs. Alternatives
Head-to-Head comparison (GPT-OSS 120B Model)
When comparing systems on the demanding GPT-OSS 120B model (MXFP4 format), performance differences become stark:
Sources: Hardware-Corner.net 3, IntuitionLabs.ai 4
Key performance insights
- Prompt processing: DGX Spark and 3×RTX 3090 are nearly identical (1,723 vs 1,642 tokens/sec), with DGX Spark slightly ahead due to FP4 efficiency. The AMD Strix Halo lags significantly at 340 tokens/sec despite similar FP4 capabilities.
- Token generation: The 3×RTX 3090 setup dominates at 124 tokens/sec, more than 3× faster than DGX Spark’s 38.55 tokens/sec. This confirms that LPDDR5X memory bandwidth (273 GB/s) is the bottleneck compared to GDDR6X aggregate bandwidth.
- Memory capacity advantage: DGX Spark’s 128GB unified memory enables it to run models that would crash on 24GB GPUs. A single RTX 3090 cannot run 120B models without offloading to slower system RAM.
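If decode really is bandwidth-bound, the speedup between two systems should track their memory-bandwidth ratio. A quick check against the figures above (using this article’s 273 GB/s and ~936 GB/s numbers):

```python
# Bandwidth-bound decode should scale roughly with memory bandwidth.
spark_bw, rtx3090_bw = 273, 936        # GB/s (figures cited in this article)
spark_tok, rtx3090_tok = 38.55, 124    # measured tok/s on GPT-OSS 120B

bw_ratio = rtx3090_bw / spark_bw           # predicted speedup
measured_ratio = rtx3090_tok / spark_tok   # observed speedup
print(f"predicted {bw_ratio:.2f}x vs measured {measured_ratio:.2f}x")
```

The measured 3.2× speedup sits close to the 3.4× bandwidth ratio, consistent with decode being memory-bound on both systems.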
Sources: LMSYS Org 5, Substack 6
The chart shows decode speeds across various models using Ollama at a fixed batch size.
The chart demonstrates that:
- For smaller models (GPT-OSS 20B, Llama-3.1 8B), performance is nearly identical
- For medium models (Gemma-3 12B, DeepSeek-R1 14B), DGX Spark holds a slight edge
- For large models (Gemma-3 27B, Qwen-3 32B), Mac Mini M4 Pro actually outperforms DGX Spark in decode speed
- Both systems struggle with very large models, but remain usable
Price-performance analysis
Note: Prices are approximate as of November 2025
AMD Strix Halo: The budget alternative
The Framework Desktop with AMD Ryzen AI Max 385 (Strix Halo) offers compelling value at nearly half the price:7
- Similar 128GB unified memory configuration
- Comparable memory bandwidth (~256 GB/s LPDDR5X)
- Supports standard operating systems (Windows/Linux)
- Performance within 10-15% of DGX Spark for most workloads
However, Strix Halo lacks:
- Hardware FP4 acceleration (Blackwell’s key advantage)
- NVIDIA’s CUDA ecosystem and TensorRT optimizations
- Pre-configured AI development environment
Apple Mac Studio: The high-bandwidth option
Apple’s Mac Studio with M3 Ultra and 256GB unified memory presents a different trade-off:8
- 3× higher memory bandwidth (819 GB/s vs 273 GB/s)
- Superior token generation performance (70.79 vs 38.55 tok/s on 120B models)
- Doubles as a complete workstation
- Higher price ($4,999+)
Limitations include:
- No CUDA support
- Limited AI framework compatibility
- Performance degradation with extreme context sizes (34 tok/s dropping to 6 tok/s at high context)
Multi-GPU DIY builds: The raw performance option
A 3×RTX 3090 configuration delivers the best raw performance for token generation:
- 124 tokens/sec on 120B models (3.2× faster than DGX Spark)
- Higher aggregate memory bandwidth (~936 GB/s)
- Lower total cost using used GPUs (~$800 each)
Trade-offs include:
- Complex setup and configuration
- Higher power consumption (1050W vs 210W)
- Larger physical footprint
- No out-of-box software stack
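The power gap also compounds into running costs. A back-of-the-envelope comparison, assuming 8 hours of load per day and $0.15/kWh (both are assumptions; adjust for your duty cycle and local rates):

```python
# Extra annual electricity cost of a 3x RTX 3090 rig vs DGX Spark,
# using the load figures from the trade-off list above.
rig_w, spark_w = 1050, 210    # watts under load
hours_per_year = 8 * 365      # assumed duty cycle
rate = 0.15                   # assumed $/kWh

extra_kwh = (rig_w - spark_w) / 1000 * hours_per_year
extra_cost = extra_kwh * rate
print(f"~${extra_cost:.0f}/year extra for the multi-GPU rig")
```

At these assumptions the multi-GPU rig costs roughly $370/year more to run, a meaningful but not decisive fraction of the ~$1,600 hardware price gap.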
When is DGX Spark better?
Despite performance limitations, DGX Spark excels in specific scenarios:9
1. Rapid prototyping and development
- Pre-configured Ubuntu environment with AI tools installed
- NVIDIA’s official playbooks for common workflows
- Same software stack as enterprise DGX systems
- Smooth transition from desktop to datacenter deployment
2. Model experimentation at scale
- Run 70B-120B models that won’t fit on consumer GPUs
- Test quantization formats (FP4, FP8, INT4) with hardware acceleration
- Experiment with multi-agent systems and RAG applications
3. Distributed inference research
- Dual QSFP 200Gb networking enables two-unit clusters
- Can run models up to 405B parameters when clustered
- EXO Labs demonstrated 2.8× speedup combining DGX Spark with Mac Studio using disaggregated prefill/decode
4. Educational and academic use
- Universities placing units in research labs (Stanford, MIT CSAIL)
- Included $90 NVIDIA Deep Learning Institute course
- Teaching tool for AI hardware architecture and memory hierarchies
Alternatives to consider
For budget-conscious researchers
Recommendation: AMD Strix Halo systems (Framework Desktop, GMKTec boxes)
- 50% lower cost than DGX Spark
- 90% of the performance for most workloads
- Standard OS compatibility
For production inference
Recommendation: Multi-GPU workstation (RTX 3090s or RTX 4090s)
- Superior token generation throughput
- Scalable to larger model sizes
- Better performance per dollar
For an all-around workstation
Recommendation: Mac Studio M3/M4 Ultra
- Excellent memory bandwidth
- Complete macOS ecosystem
- Strong performance for AI and traditional computing
For model training
Recommendation: Cloud instances (AWS p5, Azure ND H100v5)
- DGX Spark is unsuitable for serious training (limited by memory bandwidth)
- Cloud provides better hardware for training workloads
- The pay-per-use model is more economical
Limitations
Memory bandwidth bottleneck
The 273 GB/s LPDDR5X bandwidth severely limits token generation, especially for large models. This is roughly 1/3 the bandwidth of the Mac Studio M3 Ultra and significantly lower than that of multi-GPU setups.
Price
At $31.24 per GB of memory, DGX Spark charges a significant premium over alternatives. Critics describe it as “selling VRAM at $250/GB when you factor in the AI compute.”
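The per-gigabyte figure follows directly from list price and capacity; the Mac Studio comparison below uses the price and capacity cited earlier in this article:

```python
# Unified-memory cost at list price (figures from this article).
spark = 3999 / 128   # DGX Spark: $3,999 for 128 GB
mac = 4999 / 256     # Mac Studio M3 Ultra: $4,999 for 256 GB
print(f"DGX Spark: ${spark:.2f}/GB, Mac Studio: ${mac:.2f}/GB")
```

By this measure the Mac Studio delivers unified memory at roughly $19.53/GB, well under the DGX Spark’s $31.24/GB.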
Limited ecosystem (at launch)
Early adopters reported:
- Some PyTorch wheels for CUDA on ARM were missing initially
- Not all AI frameworks are fully optimized for GB10 at launch
- NVIDIA has since addressed many issues with playbooks and updates
Platform lock-in
- Proprietary Ubuntu build (not standard Linux)
- Difficult to use as a general-purpose computer
- No Windows support (by design)
Conclusion
DGX Spark occupies a unique niche in the desktop AI landscape. It’s not the fastest system for inference, nor the most economical. Instead, it offers convenience, ecosystem integration, and the ability to run models that won’t fit elsewhere.
For most users focused on performance per dollar, AMD Strix Halo systems or multi-GPU builds provide better value. For those who need extreme memory bandwidth, the Mac Studio M3 Ultra excels. For production workloads, cloud instances remain the superior choice.
However, for AI developers who value:
- Turnkey deployment
- NVIDIA ecosystem compatibility
- Experimentation with cutting-edge models
- Smooth path to enterprise scaling
DGX Spark delivers a compelling, if expensive, solution. It’s less about raw benchmark numbers and more about enabling AI development workflows that were previously complex or impossible on desktop hardware.
Methodology
This analysis synthesizes benchmark data from multiple independent sources:
- Hardware-Corner.net 10 : Allan Witt’s llama.cpp benchmarks comparing DGX Spark, AMD Strix Halo, and multi-GPU systems.
- Ollama Official Blog 11 : Standardized performance tests using Ollama v0.12.6 with firmware 580.95.05.
- IntuitionLabs.ai 12 : Comprehensive review with SGLang and Ollama benchmarks across multiple platforms.
- Level1Techs Forum 13 : Wendell’s hands-on review focusing on the software ecosystem and practical use cases.
All benchmarks use publicly available models with consistent test conditions where possible. Variations in results between sources are minimal (typically <5%) and attributable to firmware versions, software configurations, and testing methodologies.
Further reading
- Top 30 Cloud GPU Providers & Their GPUs
- GPU Software for AI: CUDA vs. ROCm
- Top 20+ AI Chip Makers: NVIDIA & Its Competitors
- Multi-GPU Benchmark: B200 vs H200 vs H100 vs MI300X